Git Essentials (Second Edition)

Git doesn't use deltas

Now it's time to look at another well-known difference between Git and other versioning systems. Take Subversion as an example: when you make a new commit, Subversion creates a new numbered revision that contains only the deltas from the previous one. This is a smart way to archive changes, especially for big text files: if only one line of text changes, the new commit is much smaller than a full copy of the file.

In Git, by contrast, even if you change a single character in a big text file, a new version of the whole file is stored: Git doesn't do deltas (at least not in this case), and every commit is actually a snapshot of the entire repository.

At this point, people usually exclaim: "Gosh, Git wastes a large amount of disk space!" Well, this is simply untrue.

In a typical source code repository with a fair number of commits, Git usually won't need more space than other versioning systems. As an example, when Mozilla went from Subversion to Git, the exact same repository went from 12 GB to 420 MB of disk space; see this comparison page to learn more: https://git.wiki.kernel.org/index.php/GitSvnComparsion

Furthermore, Git has a clever way to deal with files; let's take another look at the last commit:

[28] ~/grocery (master)
$ git cat-file -p e4a5e7b
tree 4c931e9fd8ca4581ddd5de9efd45daf0e5c300a0
parent a57d783905e6a35032d9b0583f052fb42d5a1308
author Ferdinando Santacroce <ferdinando.santacroce@gmail.com> 1503586854 +0200
committer Ferdinando Santacroce <ferdinando.santacroce@gmail.com> 1503586854 +0200

Add an apple

Okay, now to the tree:

[29] ~/grocery (master)
$ git cat-file -p 4c931e9
100644 blob 907b75b54b7c70713a79cc6b7b172fb131d3027d README.md
100644 blob e4ceb844d94edba245ba12246d3eb6d9d3aba504 shoppingList.txt

Note down the two hashes; now we have to look at the tree of the first commit, so cat-file that commit:

[30] ~/grocery (master)
$ git cat-file -p a57d783
tree a31c31cb8d7cc16eeae1d2c15e61ed7382cebf40
author Ferdinando Santacroce <ferdinando.santacroce@gmail.com> 1502970693 +0200
committer Ferdinando Santacroce <ferdinando.santacroce@gmail.com> 1502970693 +0200

Add a banana to the shopping list

Then cat-file the tree:

[31] ~/grocery (master)
$ git cat-file -p a31c31c
100644 blob 907b75b54b7c70713a79cc6b7b172fb131d3027d README.md
100644 blob 637a09b86af61897fb72f26bfb874f2ae726db82 shoppingList.txt

Guess what! The hash of the README.md file is the same in the trees of the first and second commits. This reveals another simple but clever strategy that Git adopts to manage files. Because a blob's hash is computed from the file's content, an untouched file always maps to the same blob; so when committing, Git creates a tree whose entry for that file points to the already existing blob, reusing it instead of storing a second copy.
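To see why an untouched file keeps the same hash, it can help to reproduce the blob ID by hand. The following is a minimal Python sketch, assuming a repository that uses Git's default SHA-1 object format; the helper name `git_blob_id` and the sample contents are hypothetical and are not the actual files of the grocery repository.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git would give a blob with this content (SHA-1 format)."""
    # A blob is hashed as: "blob <size in bytes>", a NUL byte, then the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents, used only for illustration.
readme_first_commit = b"hello\n"
readme_second_commit = b"hello\n"       # untouched between the two commits
list_first_commit = b"banana\n"
list_second_commit = b"banana\napple\n"  # modified in the second commit

# Same content, same ID: the second tree can point to the existing blob.
print(git_blob_id(readme_first_commit) == git_blob_id(readme_second_commit))

# Different content, different ID: a new blob has to be stored.
print(git_blob_id(list_first_commit) == git_blob_id(list_second_commit))
```

The ID depends only on the content, not on the file name or the commit, which is why the same hash can appear in any number of trees while the blob is stored only once. In a real repository, `git hash-object <file>` prints this ID for a file.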

The same applies to trees: if the working directory contains folders whose files remain untouched, Git reuses the existing tree objects for those folders when we make a new commit.
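The reuse of trees can be sketched in the same way. The Python snippet below is an illustration under stated assumptions: it assumes the SHA-1 object format, the helper names `git_object_id` and `git_tree_id` are hypothetical, and the `docs` folder does not exist in the grocery repository; only the three blob hashes are taken from the listings above.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    """Hash an object as '<kind> <size>', a NUL byte, then the body (SHA-1)."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def git_tree_id(entries) -> str:
    """Compute a tree ID from (mode, name, hex_id) tuples."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, treating directories as if they ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry is: "<mode> <name>", a NUL byte, then the raw 20-byte ID.
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return git_object_id(b"tree", body)


README = "907b75b54b7c70713a79cc6b7b172fb131d3027d"
LIST_V1 = "637a09b86af61897fb72f26bfb874f2ae726db82"
LIST_V2 = "e4ceb844d94edba245ba12246d3eb6d9d3aba504"

# Hypothetical layout: README.md sits in a docs/ folder that never changes.
docs_v1 = git_tree_id([("100644", "README.md", README)])
docs_v2 = git_tree_id([("100644", "README.md", README)])

root_v1 = git_tree_id([("40000", "docs", docs_v1), ("100644", "shoppingList.txt", LIST_V1)])
root_v2 = git_tree_id([("40000", "docs", docs_v2), ("100644", "shoppingList.txt", LIST_V2)])

print(docs_v1 == docs_v2)  # the untouched folder maps to the same tree object
print(root_v1 == root_v2)  # the root tree changes because one entry changed
```

Since a tree's ID is derived only from its entries, an untouched folder yields the same ID in every commit, and only the trees on the path from a changed file up to the root need to be written again.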