
Extending a simple loop

We have two kinds of extensions we could factor into a simple loop. We'll look first at a filter extension, which rejects values from further consideration. The rejected values may be data outliers, or perhaps source data that's improperly formatted. Then, we'll look at a mapping extension, which performs a simple transformation to create new objects from the original objects. In our case, we'll be transforming strings to floating-point numbers, but the idea of extending a simple for statement with a mapping applies to many situations. We'll start by refactoring the above pairs() function. What if we need to adjust the sequence of points to discard a value? This will introduce a filter extension that rejects some data values.

The loop we're designing simply returns pairs without performing any additional application-related processing, so the complexity is minimal. This simplicity makes us somewhat less likely to confuse the processing state.

Adding a filter extension to this design could look something like the following code snippet:

from typing import Iterator, Iterable, Tuple
Pairs_Iter = Iterator[Tuple[float, float]]
LL_Iter = Iterable[
    Tuple[Tuple[float, float], Tuple[float, float]]]
def legs_filter(lat_lon_iter: Pairs_Iter) -> LL_Iter:
    begin = next(lat_lon_iter)
    for end in lat_lon_iter:
        # Some rule for rejecting; here, skip zero-length legs.
        if begin == end:
            continue
        yield begin, end
        begin = end

We have plugged in a processing rule to reject certain values. As the loop remains succinct and expressive, we are confident that the processing will be done properly. Also, we can easily write a test for this function, because it works with any iterator of pairs, irrespective of the long-term destination of the resulting legs.

We haven't said much about the rejection rule itself. It is a condition that uses the begin variable, the end variable, or both to reject a point from further consideration. For example, it may reject begin == end to avoid zero-length legs.
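One way to keep the loop unchanged while swapping rules is to pass the rule in as a function. The following is a minimal sketch under that assumption; legs_filter_by() and zero_length() are hypothetical names, not functions defined elsewhere in this chapter, and the points are made-up sample values.

```python
from typing import Callable, Iterator, Tuple

Point = Tuple[float, float]
Leg = Tuple[Point, Point]

def legs_filter_by(
    reject: Callable[[Point, Point], bool],
    lat_lon_iter: Iterator[Point],
) -> Iterator[Leg]:
    """Yield (begin, end) legs, dropping any end point that reject() flags."""
    begin = next(lat_lon_iter)
    for end in lat_lon_iter:
        if reject(begin, end):
            continue  # keep the current begin; the rejected end point is dropped
        yield begin, end
        begin = end

def zero_length(begin: Point, end: Point) -> bool:
    """Hypothetical rule: reject a leg whose endpoints coincide."""
    return begin == end

points = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
filtered = list(legs_filter_by(zero_length, iter(points)))
print(filtered)  # [((0.0, 0.0), (1.0, 1.0)), ((1.0, 1.0), (2.0, 1.0))]
```

Note the iter(points) call: because the function starts with next(), it needs an iterator, not merely an iterable such as a list. A different rule, such as an outlier check, could be passed in without touching the loop.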

The next refactoring will introduce an additional mapping to a loop. Adding mappings is common when a design is evolving. In our case, we have a sequence of string values. We need to convert these to float values for later use. This is a relatively simple mapping that shows the design pattern.

The following is one way to handle this data mapping, through a generator expression that wraps a generator function:

trip = list(
    legs(
        (float(lat), float(lon)) 
        for lat,lon in lat_lon_kml(row_iter_kml(source))
    )
)

We've applied the legs() function to a generator expression that creates float values from the output of the lat_lon_kml() function. We can read this in the opposite order as well. The lat_lon_kml() function's output is transformed into a pair of float values, which is then transformed into a sequence of legs.
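To see the mapping and the pairing work together without a KML file, here is a small sketch. It assumes legs() is the pairing generator described earlier (re-sketched here so the block runs on its own) and replaces lat_lon_kml(row_iter_kml(source)) with a hypothetical list of made-up string pairs.

```python
from typing import Iterator, Tuple, TypeVar

T = TypeVar("T")

def legs(points: Iterator[T]) -> Iterator[Tuple[T, T]]:
    """Pair each point with its successor (a sketch of the earlier legs())."""
    begin = next(points)
    for end in points:
        yield begin, end
        begin = end

# Hypothetical stand-in for lat_lon_kml(row_iter_kml(source)): text pairs.
text_pairs = [("1.0", "2.0"), ("3.0", "4.0"), ("5.0", "6.0")]

trip = list(
    legs(
        (float(lat), float(lon))  # the mapping: str -> float
        for lat, lon in text_pairs
    )
)
print(trip)  # [((1.0, 2.0), (3.0, 4.0)), ((3.0, 4.0), (5.0, 6.0))]
```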

This is starting to get complex. We've got a large number of nested functions here. We're applying float(), legs(), and list() to a data generator. One common way of refactoring complex expressions is to separate the generator expression from any materialized collection. We can do the following to simplify the expression:

ll_iter = (
    (float(lat), float(lon)) 
    for lat,lon in lat_lon_kml(row_iter_kml(source))
)
print(tuple(legs(ll_iter)))

We've assigned the generator expression to a variable named ll_iter. This variable isn't a collection object; it's a generator of items. We're not using a list comprehension to create an object; we've merely assigned the generator expression to a variable name. We've then used the ll_iter variable in another expression.

Evaluating the tuple() function is what actually builds a proper object so that we can print the output. The objects produced by ll_iter are created only as needed.
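The lazy behavior can be observed directly. This sketch wraps float() in a hypothetical tracked_float() helper that records each conversion, so we can count when work happens; the input pairs are made up.

```python
calls = []

def tracked_float(text: str) -> float:
    """Hypothetical float() wrapper that records each conversion."""
    calls.append(text)
    return float(text)

text_pairs = [("1.0", "2.0"), ("3.0", "4.0")]
ll_iter = (
    (tracked_float(lat), tracked_float(lon))
    for lat, lon in text_pairs
)
print(len(calls))        # 0: creating the generator converts nothing
first = tuple(ll_iter)   # tuple() drives the generator through every item
print(len(calls))        # 4: two conversions per pair
second = tuple(ll_iter)  # the generator is already exhausted
print(second)            # ()
```

The last two lines show a consequence worth remembering: a generator can be consumed only once, so a second tuple() over the same ll_iter produces nothing.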

There are other refactorings we might like to do. In general, the source of the data is something we often want to change. In our example, the lat_lon_kml() function is tightly bound to the rest of the expression. This makes reuse difficult when we have a different data source.

If the float() operation is something we'd like to parameterize so that we can reuse it, we can define a function around the generator expression. We'll extract some of the processing into a separate function merely to group the operations. In our case, the string-pair to float-pair conversion is unique to particular source data. We can rewrite a complex float-from-string expression into a simpler function, such as:

from typing import Iterator, Tuple, Text, Iterable
Text_Iter = Iterable[Tuple[Text, Text]]
LL_Iter = Iterable[Tuple[float, float]]
def float_from_pair(lat_lon_iter: Text_Iter) -> LL_Iter:
    return (
        (float(lat), float(lon)) 
        for lat,lon in lat_lon_iter
    )

The float_from_pair() function applies the float() function to the first and second values of each item in the iterable, yielding a two-tuple of floats for each input item. We've relied on the for clause of the generator expression to decompose each incoming two-tuple into lat and lon.

The type hints insist that the input match the Text_Iter type alias: it must be an iterable source of pairs of Text values. The result uses the LL_Iter type alias: it must be an iterable of pairs of float values. The LL_Iter type alias may be used elsewhere in a complex set of function definitions.
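Because the decomposition happens in the for clause, a malformed item is only detected when the generator reaches it, and type hints alone don't prevent that at runtime. The sketch below illustrates this with made-up input; it writes str where the original uses Text (typing.Text is an alias for str).

```python
from typing import Iterable, Iterator, Optional, Tuple

def float_from_pair(
    lat_lon_iter: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[float, float]]:
    # The for clause unpacks each item into exactly two names: lat and lon.
    return ((float(lat), float(lon)) for lat, lon in lat_lon_iter)

# Made-up input: the second item has three fields, so it can't be unpacked.
rows = [("1.0", "2.0"), ("3.0", "4.0", "5.0")]
floats = float_from_pair(rows)  # no error yet: nothing has been unpacked
print(next(floats))             # (1.0, 2.0)

error: Optional[ValueError] = None
try:
    next(floats)
except ValueError as exc:       # raised only when the malformed item is reached
    error = exc
print(type(error).__name__)     # ValueError
```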

We can use this function in the following context:

legs(
    float_from_pair(
        lat_lon_kml(
            row_iter_kml(source))))

We're going to create legs that are built from float values that come from a KML file. It's fairly easy to visualize the processing, as each stage in the process is a simple prefix function. Each function's input is the output from the next function in the nested processing steps.
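The nested prefix functions can be run end to end with small stand-ins. In this sketch, row_iter_kml() and lat_lon_kml() are hypothetical simplifications: instead of parsing KML, they split made-up "lon,lat,alt" text lines and reorder each row into a (lat, lon) pair. The legs() function is re-sketched as the pairing generator described earlier.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

def row_iter_kml(source: Iterable[str]) -> Iterator[List[str]]:
    """Hypothetical stand-in: split 'lon,lat,alt' lines instead of parsing KML."""
    return (line.split(",") for line in source)

def lat_lon_kml(row_iter: Iterable[List[str]]) -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in: reorder each row into a (lat, lon) text pair."""
    return ((lat, lon) for lon, lat, alt in row_iter)

def float_from_pair(
    lat_lon_iter: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[float, float]]:
    return ((float(lat), float(lon)) for lat, lon in lat_lon_iter)

def legs(points: Iterator[T]) -> Iterator[Tuple[T, T]]:
    begin = next(points)
    for end in points:
        yield begin, end
        begin = end

source = ["2.0,1.0,0", "4.0,3.0,0", "6.0,5.0,0"]  # made-up sample lines
trip = list(
    legs(
        float_from_pair(
            lat_lon_kml(
                row_iter_kml(source)))))
print(trip)  # [((1.0, 2.0), (3.0, 4.0)), ((3.0, 4.0), (5.0, 6.0))]
```

Reading from the innermost call outward follows the data: text lines become rows, rows become text pairs, text pairs become float pairs, and float pairs become legs.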

When parsing, we often have sequences of string values. For numeric applications, we'll need to convert strings to float, int, or Decimal values. This often involves inserting a function such as the float_from_pair() function into a sequence of expressions that clean up the source data.
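If the conversion itself should be a parameter, one option is to pass the converting function in. This is a minimal sketch; number_from_pair() is a hypothetical generalization of float_from_pair(), and the sample strings are taken from the string output shown below.

```python
from decimal import Decimal
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

N = TypeVar("N")

def number_from_pair(
    convert: Callable[[str], N],
    pair_iter: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[N, N]]:
    """Hypothetical generalization of float_from_pair(): convert both items."""
    return ((convert(a), convert(b)) for a, b in pair_iter)

text_pairs = [("37.840832", "-76.27383399999999")]
as_float = list(number_from_pair(float, text_pairs))
as_decimal = list(number_from_pair(Decimal, text_pairs))
print(as_decimal)  # [(Decimal('37.840832'), Decimal('-76.27383399999999'))]
```

Decimal keeps every digit of the source text exactly, while float stores the nearest binary approximation; which one suits depends on the application.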

Our previous output was all strings; it looked like the following code snippet:

(('37.54901619777347', '-76.33029518659048'), 
 ('37.840832', '-76.27383399999999'), 
 ... 
 ('38.976334', '-76.47350299999999'))

We'll want data like the following code snippet, where we have floats:

(((37.54901619777347, -76.33029518659048), 
 (37.840832, -76.273834)), ((37.840832, -76.273834), 
 ... 
 ((38.330166, -76.458504), (38.976334, -76.473503)))

We'll need to create a pipeline of simpler transformation functions. Here, we arrived at ll_iter = ((float(lat), float(lon)) for lat,lon in lat_lon_kml(...)). We can exploit the substitution rule for functions and replace a complex expression such as ((float(lat), float(lon)) for lat,lon in lat_lon_kml(...)) with a function that has the same value, in this case float_from_pair(lat_lon_kml(...)). This kind of refactoring assures us that the simplification has the same effect as the more complex expression.
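The substitution can be checked mechanically. In this sketch, a short list of string pairs copied from the output above stands in for lat_lon_kml(...); the inline generator expression and the float_from_pair() call should produce identical results.

```python
from typing import Iterable, Iterator, Tuple

def float_from_pair(
    lat_lon_iter: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[float, float]]:
    return ((float(lat), float(lon)) for lat, lon in lat_lon_iter)

# Stand-in for lat_lon_kml(...): string pairs from the output shown earlier.
text_pairs = [
    ("37.54901619777347", "-76.33029518659048"),
    ("37.840832", "-76.27383399999999"),
    ("38.976334", "-76.47350299999999"),
]

inline = tuple((float(lat), float(lon)) for lat, lon in text_pairs)
named = tuple(float_from_pair(text_pairs))
print(inline == named)  # True: the call has the same value as the expression
```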

There are some simplifications that we'll look at in Chapter 5, Higher-Order Functions. We will revisit this in Chapter 6, Recursions and Reductions, to see how to apply these simplifications to the file-parsing problem.