Rsync algorithm

The rsync algorithm is an algorithm (invented by Australian computer programmer Andrew Tridgell) for efficiently transmitting a file across a communications link when the receiving computer already has a different version of the same file. It is used by the rsync file transfer protocol and utility.

The recipient splits its copy of the file into fixed-size non-overlapping chunks, say of size S, and computes two checksums of each chunk: the MD4 hash, and a weaker 'rolling checksum'. It sends these checksums to the sender.

The sender computes the rolling checksum for every chunk of size S in its own version of the file, even overlapping chunks. This can be done efficiently because of a special property of the rolling checksum: If the rolling checksum of bytes n through n+S-1 is R, it is easy to compute the rolling checksum of bytes n+1 through n+S from R, byte n, and byte n+S; the intervening bytes need not be reexamined.

The sender then compares its rolling checksums with the set sent by the recipient to determine if there are any matches. If there are, the match is verified by computing the MD4 checksum for the matching block and comparing it with the MD4 checksum sent by the recipient.

The sender then sends the recipient the parts of its file that didn't match any of the recipient's blocks, and assembly instructions about how to merge these blocks into the recipient's version to create a file identical to the sender's copy.

If the sender's and recipient's versions of the file have many sections in common, relatively little data needs to be transferred to synchronize the two files.

External Links




copyright 2004 FactsAbout.com