Pearson Hashing

Pearson hashing is a hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte lookup table containing a permutation of the values 0 through 255.

This hash function is a CBC-MAC that uses an 8-bit random block cipher implemented via the permutation table. An 8-bit block cipher has negligible cryptographic security, so the Pearson hash function is not cryptographically strong; but it offers these benefits:

  • It is extremely simple.
  • It executes quickly on resource-limited processors.
  • There is no simple class of inputs for which collisions (identical outputs) are especially likely.
  • Given a small, privileged set of inputs (e.g., reserved words for a compiler), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.

The algorithm can be described by the following pseudocode, which computes the hash of message C using the permutation table T:

h := 0 for each c in C loop index := h xor c h := T end loop return h

Famous quotes containing the word pearson:

    Misquotation is, in fact, the pride and privilege of the learned. A widely-read man never quotes accurately, for the rather obvious reason that he has read too widely.
    —Hesketh Pearson (1887–1964)