Structuring flat sequences
Sometimes, we'll have raw data in the form of a flat list of values that we'd like to bunch up into subgroups. This is a bit more complex. We can use the itertools module's groupby() function to implement this, but that will have to wait until Chapter 8, The Itertools Module.
Let's say we have a simple flat list, as follows:
flat = ['2', '3', '5', '7', '11', '13', '17', '19', '23', '29',
    '31', '37', '41', '43', '47', '53', '59', '61', '67', '71',
    ... ]
We can write nested generator functions to build a sequence-of-sequence structure from flat data. To do this, we'll need a single iterator that we can use multiple times. The expression looks like the following code snippet:
>>> flat_iter = iter(flat)
>>> (tuple(next(flat_iter) for i in range(5))
...     for row in range(len(flat)//5)
... )
<generator object <genexpr> at 0x101cead70>
>>> list(_)
[('2', '3', '5', '7', '11'),
 ('13', '17', '19', '23', '29'),
 ('31', '37', '41', '43', '47'),
 ('53', '59', '61', '67', '71'),
 ...
]
First, we create an iterator that exists outside either of the two loops that we'll use to create our sequence-of-sequences. The generator expression uses tuple(next(flat_iter) for i in range(5)) to create five-item tuples from the iterable values in the flat_iter variable. This expression is nested inside another generator that repeats the inner loop the proper number of times to create the required sequence of values.
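To see why the iterator has to live outside both loops, here's a minimal sketch on a shortened, six-item list. The names shared, fresh, rows_shared, and rows_fresh are made up for this illustration. It contrasts one shared iterator with the mistake of creating a fresh iterator for each row:

```python
flat = ['2', '3', '5', '7', '11', '13']

# One shared iterator: each inner tuple resumes where the last one stopped.
shared = iter(flat)
rows_shared = [tuple(next(shared) for _ in range(3)) for _ in range(2)]

# A fresh iterator per row (a deliberate mistake): every row restarts at '2'.
rows_fresh = []
for _ in range(2):
    fresh = iter(flat)
    rows_fresh.append(tuple(next(fresh) for _ in range(3)))

print(rows_shared)  # [('2', '3', '5'), ('7', '11', '13')]
print(rows_fresh)   # [('2', '3', '5'), ('2', '3', '5')]
```

The iterator's position is the only state that carries over from one row to the next, which is why it must be created once.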
This works only when the length of the flat list is an exact multiple of the row size. If the last row is only partially filled, we'll need to process its items separately.
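As a small illustration of the uneven case, the following sketch uses a seven-item list and rows of three (both sizes are chosen only for this example). The nested expression builds the two full rows, and whatever it didn't consume is still waiting in the iterator:

```python
flat = ['2', '3', '5', '7', '11', '13', '17']  # 7 items, not a multiple of 3

flat_iter = iter(flat)
rows = list(
    tuple(next(flat_iter) for _ in range(3))
    for _ in range(len(flat) // 3)
)
leftover = tuple(flat_iter)  # whatever the full rows did not consume

print(rows)      # [('2', '3', '5'), ('7', '11', '13')]
print(leftover)  # ('17',)
```

Because len(flat)//3 rounds down, the outer loop never asks for more items than exist, and the remaining items can be collected in a separate step.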
We can use this kind of function to group data into same-sized tuples, with an odd-sized tuple at the end, using the following definitions:
from typing import List, Sequence, Tuple, TypeVar

ItemType = TypeVar("ItemType")
Flat = Sequence[ItemType]
Grouped = List[Tuple[ItemType, ...]]

def group_by_seq(n: int, sequence: Flat) -> Grouped:
    flat_iter = iter(sequence)
    full_sized_items = list(
        tuple(next(flat_iter) for i in range(n))
        for row in range(len(sequence)//n)
    )
    trailer = tuple(flat_iter)
    if trailer:
        return full_sized_items + [trailer]
    else:
        return full_sized_items
Within the group_by_seq() function, an initial list is built and assigned to the variable full_sized_items. Each tuple in this list is of size n. If there are leftovers, the trailing items are used to build a tuple with a non-zero length that we append to the list of full-sized items. If the trailer tuple has length zero, it can be safely ignored.
The type hints include a generic definition of ItemType as a type variable. The intent of a type variable is to show that whatever type is an input to this function will be returned from the function. A sequence of strings or a sequence of floats would both work properly.
The input is summarized as a Sequence of items. The output is a List of Tuples of items. The items are all of a common type, described with the ItemType type variable.
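The following sketch shows the type variable at work. It repeats the function in a slightly condensed form so that the snippet runs on its own; the sample inputs and the names strings and floats are chosen only for this illustration:

```python
from typing import List, Sequence, Tuple, TypeVar

ItemType = TypeVar("ItemType")

def group_by_seq(
    n: int, sequence: Sequence[ItemType]
) -> List[Tuple[ItemType, ...]]:
    # Same logic as the definition above, condensed for this example.
    flat_iter = iter(sequence)
    full_sized_items = [
        tuple(next(flat_iter) for _ in range(n))
        for _ in range(len(sequence) // n)
    ]
    trailer = tuple(flat_iter)
    return full_sized_items + [trailer] if trailer else full_sized_items

# A sequence of strings in, tuples of strings out.
strings: List[Tuple[str, ...]] = group_by_seq(
    3, ['2', '3', '5', '7', '11', '13', '17'])
# A sequence of floats in, tuples of floats out.
floats: List[Tuple[float, ...]] = group_by_seq(2, [2.0, 3.0, 5.0, 7.0])

print(strings)  # [('2', '3', '5'), ('7', '11', '13'), ('17',)]
print(floats)   # [(2.0, 3.0), (5.0, 7.0)]
```

The first call ends with an odd-sized tuple because seven items don't divide evenly into rows of three; the second has no trailer at all.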
This isn't as delightfully simple and functional-looking as other algorithms we've looked at. We can rework this into a simpler generator function that produces an iterable of rows instead of a list.
The following code uses a while loop as part of tail-recursion optimization:
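from itertools import islice
from typing import Iterator, Tuple, TypeVar

ItemType = TypeVar("ItemType")
Flat_Iter = Iterator[ItemType]
Grouped_Iter = Iterator[Tuple[ItemType, ...]]

def group_by_iter(n: int, iterable: Flat_Iter) -> Grouped_Iter:
    # islice() ends quietly when the input is exhausted.
    row = tuple(islice(iterable, n))
    while row:
        yield row
        row = tuple(islice(iterable, n))
We've created a row of the required length from the input iterable. At the end of the input iterable, the value of tuple(islice(iterable, n)) will be a zero-length tuple. This can be the base case of a recursive definition. This was manually optimized into the terminating condition of the while statement. A final row shorter than n is yielded as it is. Note that the rows are built with islice() rather than with tuple(next(iterable) for i in range(n)): since Python 3.7 (PEP 479), a StopIteration raised by next() inside a generator expression is converted to a RuntimeError instead of ending the expression.
The type hints have been modified to reflect the way this works with an iterator. It is not limited to sequences. Because each row must resume where the previous one stopped, the function needs a single iterator, so it has to be used like this: group_by_iter(7, iter(flat)). The iter() function must be used to create an iterator from a collection; given the list itself, islice() would start again from the first item on every pass and the loop would never end.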
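To try the call pattern, here's a self-contained sketch. It assumes a body for group_by_iter() that builds each row with itertools.islice() (an assumption of this sketch, chosen because islice() simply stops when the input runs out). The sample data and the names from_list, squares, and from_gen are made up for the illustration:

```python
from itertools import islice
from typing import Iterator, Tuple, TypeVar

ItemType = TypeVar("ItemType")

def group_by_iter(
    n: int, iterable: Iterator[ItemType]
) -> Iterator[Tuple[ItemType, ...]]:
    # Assumption of this sketch: rows are built with islice().
    row = tuple(islice(iterable, n))
    while row:
        yield row
        row = tuple(islice(iterable, n))

flat = ['2', '3', '5', '7', '11', '13', '17']

# A list must be wrapped with iter() so each row resumes where the last ended.
from_list = list(group_by_iter(3, iter(flat)))

# A generator is already an iterator; it has no len(), yet grouping still works.
squares = (x * x for x in range(5))
from_gen = list(group_by_iter(2, squares))

print(from_list)  # [('2', '3', '5'), ('7', '11', '13'), ('17',)]
print(from_gen)   # [(0, 1), (4, 9), (16,)]
```

The second call is the case the sequence-based version can't handle, since it depends on len() to count the full rows.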