Content Based Slicing Using Rabin-Karp Hash
One of the interesting use cases of the rolling hash function is that it can create dynamic, content-based chunks of a stream or file. This is especially useful when it is required to send only the changed chunks of a large file over a network and a simple byte addition at the front of the file would cause all the fixed size windows to become updated, while in reality, only the first ‘chunk’ has been modified.
The simplest approach to calculate the dynamic chunks is to calculate the rolling hash and if it matches a pattern (like the lower N bits are all zeroes) then it’s a chunk boundary. This approach will ensure that any change in the file will only affect it’s current and possibly the next chunk, but nothing else.
When the boundaries are known, the chunks need to be compared by their hash values to detect which one was modified and needs transfer across the network.
Read more about this topic: Rolling Hash
Famous quotes containing the words content and/or based:
“No healthy man, in his secret heart, is content with his destiny. He is tortured by dreams and images as a child is tortured by the thought of a state of existence in which it would live in a candy store and have two stomachs.”
—H.L. (Henry Lewis)
“Our children evaluate themselves based on the opinions we have of them. When we use harsh words, biting comments, and a sarcastic tone of voice, we plant the seeds of self-doubt in their developing minds.... Children who receive a steady diet of these types of messages end up feeling powerless, inadequate, and unimportant. They start to believe that they are bad, and that they can never do enough.”
—Stephanie Martson (20th century)