Rolling Hash - Content Based Slicing Using Rabin-Karp Hash

Content Based Slicing Using Rabin-Karp Hash

One of the interesting use cases of the rolling hash function is that it can create dynamic, content-based chunks of a stream or file. This is especially useful when it is required to send only the changed chunks of a large file over a network and a simple byte addition at the front of the file would cause all the fixed size windows to become updated, while in reality, only the first ‘chunk’ has been modified.

The simplest approach to calculate the dynamic chunks is to calculate the rolling hash and if it matches a pattern (like the lower N bits are all zeroes) then it’s a chunk boundary. This approach will ensure that any change in the file will only affect it’s current and possibly the next chunk, but nothing else.

When the boundaries are known, the chunks need to be compared by their hash values to detect which one was modified and needs transfer across the network.

Read more about this topic:  Rolling Hash

Famous quotes containing the words content and/or based:

    Thoughts tending to content flatter themselves
    That they are not the first of fortune’s slaves,
    Nor shall not be the last, like silly beggars
    Who, sitting in the stocks, refuge their shame
    That many have and others must sit there,
    And in this thought they find a kind of ease.
    William Shakespeare (1564–1616)

    Italy is such a delightful place to live in if you happen to be a man. There one may enjoy that exquisite luxury of Socialism—that true Socialism which is based not on equality of income or character, but on the equality of manners. In the democracy of the caffè or the street the great question of our life has been solved, and the brotherhood of man is a reality. But it is accomplished at the expense of the sisterhood of women.
    —E.M. (Edward Morgan)