Deciding what to hash
Not every object should provide a hash value. Specifically, if we're creating a class of stateful, mutable objects, the class should never return a hash value. The definition of __hash__
should be None
.
Immutable objects, on the other hand, might sensibly return a hash value so that the object can be used as the key in a dictionary or a member of a set. In this case, the hash value needs to parallel the way the test for equality works. It's bad to have objects that claim to be equal and have different hash values. The reverse—objects with the same hash that are actually not equal—is acceptable.
The __eq__()
method, which we'll also look at in the section on comparison operators, is intimately tied up with hashing.
There are three tiers of equality comparison:
- Same Hash Value: This means that two objects could be equal. The hash value provides us with a quick check for likely equality. If the hash value is different, the two objects cannot possibly be equal, nor can they be the same object.
- Compare As Equal: This means that the hash values must also have been equal. This is the definition of the
==
operator. The objects may be the same object. - Same IDD: This means that they are the same object. They also compare as equal and will have the same hash value. This is the definition of the
is
operator.
The Fundamental Law of Hash (FLH) is this: objects that compare as equal have the same hash value.
We can think of a hash comparison as being the first step in an equality test.
The inverse, however, is not true. Objects can have the same hash value but compare as not equal. This is valid and leads to some expected processing overhead when creating sets or dictionaries. We can't reliably create distinct 64 bit hash values from much larger data structures. There will be unequal objects that are reduced to coincidentally equal hash values.
Coincidentally, equal hash values are an expected overhead when working with sets
and dicts
. These collections have internal algorithms to use alternate locations in the event of hash collisions.
There are three use cases for defining equality tests and hash values via the __eq__()
and __hash__()
method functions:
- Immutable objects: These are stateless objects of types such as tuples, namedtuples, and frozensets that cannot be updated. We have two choices:
- Define neither
__hash__()
nor__eq__()
. This means doing nothing and using the inherited definitions. In this case,__hash__()
returns a trivial function of the ID value for the object, and__eq__()
compares the ID values. The default equality test may sometimes be counterintuitive. Our application might require two instances ofCard( 1, Clubs )
to test as equal and compute the same hash; this won't happen by default. - Define both
__hash__()
and__eq__()
. Note that we're expected to define both for an immutable object.
- Define neither
- Mutable objects: These are stateful objects that can be modified internally. We have one design choice:
- Define
__eq__()
but set__hash__
toNone
. These cannot be used asdict
keys or items insets
.
- Define
Note that there's an additional possible combination: defining __hash__()
but using a default definition for __eq__()
. This is simply a waste of code, as the default __eq__()
method is the same as the is
operator. The default __hash__()
method would have involved writing less code for the same behavior.
We'll look at each of the three situations in detail.