Applying generator expressions to scalar functions
We'll look at a more complex kind of generator expression, one that maps data values from one kind of data to another. In this case, we'll apply a fairly complex function to the individual data values created by a generator.
We'll call these non-generator functions scalar, as they work with simple atomic values. To work with collections of data, a scalar function will be embedded in a generator expression.
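Here is a minimal illustration of that idea. The standard library's math.radians() is a scalar function: it takes one float and returns one float. Embedding it in a generator expression applies it to every value in a collection, one value at a time, without building an intermediate list. The helper name deg_to_rad() is hypothetical and does not come from the original text.

```python
from math import radians
from typing import Iterable, Iterator


def deg_to_rad(values: Iterable[float]) -> Iterator[float]:
    # Hypothetical helper: radians() is the scalar function; the generator
    # expression lifts it so it works lazily over a whole collection.
    return (radians(v) for v in values)


print(list(deg_to_rad([0.0, 90.0, 180.0])))
```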
To continue the example started earlier, we'll provide a haversine function and then use a generator expression to apply a scalar haversine() function to a sequence of pairs from our KML file.
The haversine() function looks like the following code:
from math import radians, sin, cos, sqrt, asin
from typing import Tuple

MI = 3959
NM = 3440
KM = 6371

Point = Tuple[float, float]

def haversine(p1: Point, p2: Point, R: float=NM) -> float:
    lat_1, lon_1 = p1
    lat_2, lon_2 = p2
    Δ_lat = radians(lat_2 - lat_1)
    Δ_lon = radians(lon_2 - lon_1)
    lat_1 = radians(lat_1)
    lat_2 = radians(lat_2)
    a = sqrt(
        sin(Δ_lat/2)**2 +
        cos(lat_1)*cos(lat_2)*sin(Δ_lon/2)**2
    )
    c = 2*asin(a)
    return R * c
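One way to sanity-check the function is to compare it against a value that is easy to work out by hand. Along a meridian the longitude difference is zero, so the formula reduces to the arc length for the latitude difference: R × π/180 per degree, or roughly 60 nautical miles. The sketch below repeats haversine() so that it runs on its own. The two sample points are arbitrary and are not taken from the KML data.

```python
from math import radians, sin, cos, sqrt, asin, pi
from typing import Tuple

MI = 3959  # mean Earth radius, statute miles
NM = 3440  # mean Earth radius, nautical miles
KM = 6371  # mean Earth radius, kilometers
Point = Tuple[float, float]


def haversine(p1: Point, p2: Point, R: float = NM) -> float:
    lat_1, lon_1 = p1
    lat_2, lon_2 = p2
    Δ_lat = radians(lat_2 - lat_1)
    Δ_lon = radians(lon_2 - lon_1)
    lat_1 = radians(lat_1)
    lat_2 = radians(lat_2)
    a = sqrt(sin(Δ_lat/2)**2 + cos(lat_1)*cos(lat_2)*sin(Δ_lon/2)**2)
    c = 2*asin(a)  # central angle in radians
    return R * c


# One degree of latitude along the same meridian should be R * pi / 180.
one_degree = haversine((37.0, -76.0), (38.0, -76.0))
print(round(one_degree, 2))  # 60.04
print(NM * pi / 180)
```

Changing R to MI or KM rescales the same central angle into other units.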
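This is a relatively simple implementation copied from the World Wide Web. The constants MI, NM, and KM are the Earth's mean radius in statute miles, nautical miles, and kilometers. The R parameter selects the units of the result and defaults to nautical miles. The start and end points, as well as the return value, have type hints. The explicit type alias Point = Tuple[float, float] makes it possible for the mypy tool to confirm that this function is used properly.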
The following code is how we could use our collection of functions to examine some KML data and produce a sequence of distances:
trip = (
    (start, end, round(haversine(start, end), 4))
    for start, end in legs(float_from_pair(lat_lon_kml()))
)
for start, end, dist in trip:
    print(start, end, dist)
The essence of the processing is the generator expression assigned to the trip variable. It builds three-tuples of a start point, an end point, and the distance from start to end, rounded to four decimal places. The distance is in nautical miles, the default R. The start and end pairs come from the legs() function, which works with floating-point data built from the latitude-longitude pairs extracted from a KML file.
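The helper functions lat_lon_kml(), float_from_pair(), and legs() were defined earlier and read a real KML file. The sketch below replaces them with hypothetical stand-ins so that the whole pipeline can run on its own. The stand-in for lat_lon_kml() yields three waypoints copied from the output listing rather than parsing a file. The stand-in for legs() is one plausible way to pair each point with its successor. The trip generator expression itself is unchanged.

```python
from math import radians, sin, cos, sqrt, asin
from typing import Iterable, Iterator, Tuple

NM = 3440  # mean Earth radius, nautical miles
Point = Tuple[float, float]


def haversine(p1: Point, p2: Point, R: float = NM) -> float:
    # Same great-circle formula as the haversine() shown above.
    lat_1, lon_1 = p1
    lat_2, lon_2 = p2
    Δ_lat = radians(lat_2 - lat_1)
    Δ_lon = radians(lon_2 - lon_1)
    a = sqrt(sin(Δ_lat/2)**2
             + cos(radians(lat_1))*cos(radians(lat_2))*sin(Δ_lon/2)**2)
    return R * 2*asin(a)


def lat_lon_kml() -> Iterator[Tuple[str, str]]:
    # HYPOTHETICAL stand-in: the real function parses a KML file.
    # These (lat, lon) text pairs are copied from the output listing.
    yield from [("37.549", "-76.331169"),
                ("38.330166", "-76.458504"),
                ("38.976334", "-76.473503")]


def float_from_pair(pairs: Iterable[Tuple[str, str]]) -> Iterator[Point]:
    # HYPOTHETICAL stand-in: convert each text pair to a pair of floats.
    return ((float(lat), float(lon)) for lat, lon in pairs)


def legs(points: Iterable[Point]) -> Iterator[Tuple[Point, Point]]:
    # HYPOTHETICAL stand-in: yield (p0, p1), (p1, p2), ... lazily.
    it = iter(points)
    begin = next(it, None)
    if begin is None:
        return
    for end in it:
        yield begin, end
        begin = end


trip = (
    (start, end, round(haversine(start, end), 4))
    for start, end in legs(float_from_pair(lat_lon_kml()))
)
for start, end, dist in trip:
    print(start, end, dist)
```

Nothing is read or computed until the for statement starts pulling values, because every stage is lazy.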
The output looks like the following:
(37.54901619777347, -76.33029518659048) (37.840832, -76.273834) 17.7246
(37.840832, -76.273834) (38.331501, -76.459503) 30.7382
(38.331501, -76.459503) (38.845501, -76.537331) 31.0756
(36.843334, -76.298668) (37.549, -76.331169) 42.3962
(37.549, -76.331169) (38.330166, -76.458504) 47.2866
(38.330166, -76.458504) (38.976334, -76.473503) 38.8019
Each individual processing step has been defined succinctly. The overview, similarly, can be expressed succinctly as a composition of functions and generator expressions.
Clearly, there are several further processing steps we might want to apply to this data. The first, of course, is to use the format() method of a string to produce better-looking output.
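Here is one possible layout using str.format(). The helper name format_leg() and the column layout are our own choices, not the original's. A field such as {0[0]:.4f} indexes into the first positional argument (the start tuple) and formats its latitude with four decimal places. The sample leg is copied from the output above. Its distance is in nautical miles, because haversine() defaults to R=NM.

```python
from typing import Tuple

Point = Tuple[float, float]


def format_leg(start: Point, end: Point, dist: float) -> str:
    # Hypothetical helper: {0[0]} is element 0 of positional argument 0.
    # {2:5.1f} right-aligns the distance in a 5-character field.
    return "({0[0]:.4f}, {0[1]:.4f}) to ({1[0]:.4f}, {1[1]:.4f}): {2:5.1f} NM".format(
        start, end, dist)


print(format_leg((37.549, -76.331169), (38.330166, -76.458504), 47.2866))
```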
More importantly, there are a number of aggregate values we'd like to extract from this data. We'll call these values reductions of the available data. We'd like to reduce the data to get the maximum and minimum latitude, for example, to show the extreme north and south ends of this route. We'd like to reduce the data to get the maximum distance in one leg as well as the total distance for all legs.
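Each of these reductions can be written as a built-in max(), min(), or sum() applied to a generator expression. This minimal sketch assumes the legs have already been collected into a list. Here the list is typed in from the output above rather than computed. Unlike the trip generator, a list can be iterated as many times as needed, so each reduction makes its own pass. The latitudes() helper is hypothetical.

```python
from typing import Iterator, List, Tuple

Point = Tuple[float, float]
Leg = Tuple[Point, Point, float]

# The six (start, end, distance) legs from the output listing above.
trip: List[Leg] = [
    ((37.54901619777347, -76.33029518659048), (37.840832, -76.273834), 17.7246),
    ((37.840832, -76.273834), (38.331501, -76.459503), 30.7382),
    ((38.331501, -76.459503), (38.845501, -76.537331), 31.0756),
    ((36.843334, -76.298668), (37.549, -76.331169), 42.3962),
    ((37.549, -76.331169), (38.330166, -76.458504), 47.2866),
    ((38.330166, -76.458504), (38.976334, -76.473503), 38.8019),
]


def latitudes(legs: List[Leg]) -> Iterator[float]:
    # Hypothetical helper: every latitude from both ends of every leg.
    return (point[0] for start, end, _ in legs for point in (start, end))


north = max(latitudes(trip))                # extreme north end of the route
south = min(latitudes(trip))                # extreme south end of the route
longest = max(dist for _, _, dist in trip)  # longest single leg
total = sum(dist for _, _, dist in trip)    # total distance, all legs
print(north, south, longest, round(total, 4))
```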
The problem we'll have in Python is that the generator in the trip variable can be used only once. We can't easily perform several reductions of this detailed data. We can use itertools.tee() to iterate over the data several times, but tee() has to hold items in memory until every copy has consumed them. The alternative, reading and parsing the KML file again for each reduction, seems wasteful.
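The one-use limitation is easy to demonstrate. In the sketch below, distances() is a hypothetical stand-in for the trip generator, yielding the leg distances from the output above. The first version tries two reductions on a single generator. The second splits it with itertools.tee().

```python
from itertools import tee
from typing import Iterator, Tuple

LEG_DISTANCES = [17.7246, 30.7382, 31.0756, 42.3962, 47.2866, 38.8019]


def distances() -> Iterator[float]:
    # Hypothetical stand-in for the trip generator: yields each value once.
    return (d for d in LEG_DISTANCES)


def naive_reductions() -> Tuple[float, float]:
    dist_iter = distances()
    longest = max(dist_iter)  # consumes every item
    total = sum(dist_iter)    # generator already exhausted, so this is 0
    return longest, total


def tee_reductions() -> Tuple[float, float]:
    # tee() returns two independent iterators over one source.
    d1, d2 = tee(distances(), 2)
    longest = max(d1)
    # d1 was consumed completely before d2 started, so tee() buffered every
    # item for d2; for this access pattern, list() would do just as well.
    total = sum(d2)
    return longest, total


print(naive_reductions())
print(tee_reductions())
```

The naive version silently returns a total of 0, which is the kind of bug this limitation invites.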
We can make our processing more efficient by materializing intermediate results. We'll look at this in the next section. Then we will see how to compute multiple reductions of the available data.