Expert Python Programming(Third Edition)
上QQ阅读APP看书,第一时间看更新

Weaknesses and alternatives

For a very long time, one of the most common pitfalls regarding dictionaries was expecting that they preserve the order of elements in which new keys were added. The situation has changed a bit in Python 3.6, and the problem was finally solved in Python 3.7 on the level of language specification.

But, before we dig deeper into the situation of Python 3.6 and later releases, we need to make a small detour and examine the problem as if we were still stuck in the past, when the only Python releases available were older than 3.6. In the past, you could have a situation where the consecutive dictionary keys also had hashes that were consecutive values too. And, for a very long time, this was the only situation when you could expect that you would iterate over dictionary elements in the same order as they were added to the dictionary. The easiest way to present this is by using integer numbers, as hashes of integer numbers are the same as their value:

>>> {number: None for number in range(5)}.keys()
dict_keys([0, 1, 2, 3, 4])

Using other datatypes that hash differently could show that the order is not preserved. Here is an example that was executed in CPython 3.5:

>>> {str(number): None for number in range(5)}.keys()
dict_keys(['1', '2', '4', '0', '3'])
>>> {str(number): None for number in reversed(range(5))}.keys()
dict_keys(['2', '3', '1', '4', '0'])

As shown in the preceding code, for CPython 3.5 (and also earlier versions), the resulting order is both dependent on the hashing of the object and also on the order in which the elements were added. This is definitely not what can be relied on, because it can vary with different Python implementations.

So, what about Python 3.6 and later releases? Starting from Python 3.6, the CPython interpreter uses a new compact dictionary representation that has a noticeably smaller memory footprint and also preserves order as a side effect of that new implementation. In Python 3.6, the order preserving nature of dictionaries was only an implementation detail, but in Python 3.7, it has been officially declared in the Python language specification. So, starting from Python 3.7, you can finally rely on the item insertion order of dictionaries.

In parallel to the CPython implementation of dictionaries, Python 3.6 introduced another change in the syntax that is related to the order of items in dictionaries. As defined in the PEP 486 "Preserving the order of **kwargs in a function" document, the order of keyword arguments collected using the **kwargs syntax must be the same as presented in function call. This behavior can be clearly presented with the following example:

>>> def fun(**kwargs):
... print(kwargs)
...
>>> fun(a=1, b=2, c=3)
{'a': 1, 'b': 2, 'c': 3}
>>> fun(c=1, b=2, a=3)
{'c': 1, 'b': 2, 'a': 3}

However the preceding changes can be used effectively only in the newest releases of Python. So, what should you do if you have a library that must work on older versions of Python too, and some parts of its code requires order-preserving dictionaries? The best option is to be clear about your expectations regarding dictionary ordering and use a type that explicitly preserves the order of elements.

Fortunately, the Python standard library provides an ordered dictionary type called OrderedDict in the collections module. The constructor of this type accepts iterable as the initialization argument. Each element of that argument should be a pair of a dictionary key and value, as in the following example:

>>> from collections import OrderedDict
>>> OrderedDict((str(number), None) for number in range(5)).keys()
odict_keys(['0', '1', '2', '3', '4'])

It also has some additional features, such as popping items from both ends using the popitem() method, or moving the specified element to one of the ends using the move_to_end() method. A full reference on that collection is available in the Python documentation (refer to https://docs.python.org/3/library/collections.html). Even if you target only Python in version 3.7 or newer, which guarantees the preservation of the item insertion order, the OrderedDict type is still useful. It allows you to make your intention clear. If you define your variable with OrderedDict instead of a plain dict, it becomes obvious that, in this particular case, the order of inserted items is important. 

The last interesting note is that, in very old code bases, you can find dict as a primitive set implementation that ensures uniqueness of elements. While this will give proper results, you should avoid such use of that type unless you target Python versions lower than 2.3. Using dictionaries in this way is wasteful in terms of resources. Python has a built-in set type that serves this purpose. In fact, it has very similar internal implementation to dictionaries in CPython, but offers some additional features, as well as specific set-related optimizations.