Rolling Hash - Content Based Slicing Using Rabin-Karp Hash

Content Based Slicing Using Rabin-Karp Hash

One of the interesting use cases of the rolling hash function is that it can create dynamic, content-based chunks of a stream or file. This is especially useful when it is required to send only the changed chunks of a large file over a network and a simple byte addition at the front of the file would cause all the fixed size windows to become updated, while in reality, only the first ‘chunk’ has been modified.

The simplest approach to calculate the dynamic chunks is to calculate the rolling hash and if it matches a pattern (like the lower N bits are all zeroes) then it’s a chunk boundary. This approach will ensure that any change in the file will only affect it’s current and possibly the next chunk, but nothing else.

When the boundaries are known, the chunks need to be compared by their hash values to detect which one was modified and needs transfer across the network.

Read more about this topic:  Rolling Hash

Famous quotes containing the words content and/or based:

    Strange that so few ever come to the woods to see how the pine lives and grows and spires, lifting its evergreen arms to the light,—to see its perfect success; but most are content to behold it in the shape of many broad boards brought to market, and deem that its true success! But the pine is no more lumber than man is, and to be made into boards and houses is no more its true and highest use than the truest use of a man is to be cut down and made into manure.
    Henry David Thoreau (1817–1862)

    A two-parent family based on love and commitment can be a wonderful thing, but historically speaking the “two-parent paradigm” has left an extraordinary amount of room for economic inequality, violence and male dominance.
    Stephanie Coontz (20th century)