Introduction
See also: Hash functionAssume we want to map keys from some universe into bins (labelled ). The algorithm will have to handle some data set of keys, which is not known in advance. Usually, the goal of hashing is to obtain a low number of collisions (keys from that land in the same bin). A deterministic hash function cannot offer any guarantee in an adversarial setting if the size of is greater than, since the adversary may choose to be precisely the preimage of a bin. This means that all data keys land in the same bin, making hashing useless. Furthermore, a deterministic hash function does not allow for rehashing: sometimes the input data turns out to be bad for the hash function (e.g. there are too many collisions), so one would like to change the hash function.
The solution to these problems is to pick a function randomly from a family of hash functions. A family of functions is called a universal family if, .
In other words, any two keys of the universe collide with probability at most when the hash function is drawn randomly from . This is exactly the probability of collision we would expect if the hash function assigned truly random hash codes to every key. Sometimes, the definition is relaxed to allow collision probability . This concept was introduced by Carter and Wegman in 1977, and has found numerous applications in computer science (see, for example ). If we have an upper bound of on the collision probability, we say that we have -almost universality.
Many, but not all, universal families have the following stronger uniform difference property:
- , when is drawn randomly from the family, the difference is uniformly distributed in . Note that the definition of universality is only concerned with whether, which counts collisions. The uniform difference property is stronger.
(Similarly, a universal family can be XOR universal if, the value is uniformly distributed in where is the bitwise exclusive or operation. This is only possible if is a power of two.)
An even stronger condition is pairwise independence: we have this property when we have the probability that will hash to any pair of hash values is as if they were perfectly random: . Pairwise independence is sometimes called strong universality.
Another property is uniformity. We say that a family is uniform if all hash values are equally likely: for any hash value . Universality does not imply uniformity. However, strong universality does imply uniformity.
Given a family with the uniform distance property, one can produce a pairwise independent or strongly universal hash family by adding a uniformly distributed random constant with values in to the hash functions. (Similarly, if is a power of two, we can achieve pairwise independence from an XOR universal hash family by doing an exclusive or with a uniformly distributed random constant.) Since a shift by a constant is sometimes irrelevant in applications (e.g. hash tables), a careful distinction between the uniform distance property and pairwise independent is sometimes not made.
For some applications (such as hash tables), it is important for the least significant bits of the hash values to be also universal. When a family is strongly universal, this is guaranteed: if is a strongly universal family with, then the family made of the functions for all is also strongly universal for . Unfortunately, the same is not true of (merely) universal families. For example the family made of the identity function is clearly universal, but the family made of the function fails to be universal.
Read more about this topic: Universal Hashing
Famous quotes containing the word introduction:
“We used chamber-pots a good deal.... My mother ... loved to repeat: When did the queen reign over China? This whimsical and harmless scatological pun was my first introduction to the wonderful world of verbal transformations, and also a first perception that a joke need not be funny to give pleasure.”
—Angela Carter (19401992)
“For better or worse, stepparenting is self-conscious parenting. Youre damned if you do, and damned if you dont.”
—Anonymous Parent. Making It as a Stepparent, by Claire Berman, introduction (1980, repr. 1986)
“The role of the stepmother is the most difficult of all, because you cant ever just be. Youre constantly being testedby the children, the neighbors, your husband, the relatives, old friends who knew the childrens parents in their first marriage, and by yourself.”
—Anonymous Stepparent. Making It as a Stepparent, by Claire Berman, introduction (1980, repr. 1986)