Even deeper - the Git storage object model
Okay, now we know there are different Git objects, and we can look inside them using some plumbing commands. But how and where does Git store them?
Do you remember the .git folder? Let's poke our nose inside it:
[20] ~/grocery (master)
$ ll .git/
total 13
drwxr-xr-x 1 san 1049089 0 Aug 18 17:22 ./
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 ../
-rw-r--r-- 1 san 1049089 294 Aug 17 13:52 COMMIT_EDITMSG
-rw-r--r-- 1 san 1049089 208 Aug 17 13:51 config
-rw-r--r-- 1 san 1049089 73 Aug 17 11:11 description
-rw-r--r-- 1 san 1049089 23 Aug 17 11:11 HEAD
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 hooks/
-rw-r--r-- 1 san 1049089 217 Aug 18 17:22 index
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 info/
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 logs/
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 objects/
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 refs/
Within it, there is an objects subfolder; let's take a look:
[21] ~/grocery (master)
$ ll .git/objects/
total 4
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 ./
drwxr-xr-x 1 san 1049089 0 Aug 18 17:22 ../
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 63/
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 90/
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 a3/
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 a5/
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 c7/
drwxr-xr-x 1 san 1049089 0 Aug 17 11:11 info/
drwxr-xr-x 1 san 1049089 0 Aug 18 17:12 pack/
Apart from the info and pack folders, which do not concern us right now, there are some other folders with strange two-character names; let's look inside the 63 folder:
[22] ~/grocery (master)
$ ll .git/objects/63/
total 1
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 ./
drwxr-xr-x 1 san 1049089 0 Aug 18 17:15 ../
-r--r--r-- 1 san 1049089 20 Aug 17 13:34 7a09b86af61897fb72f26bfb874f2ae726db82
Hmmm...
Look at the file within it, and think: 63 + 7a09b86af61897fb72f26bfb874f2ae726db82 is actually the hash of our shoppingList.txt blob!
Git's approach is both smart and simple: to speed up lookups in the filesystem, it creates a set of folders, each named after the first two characters of an object's hash; inside each folder, it stores the object in a file named after the remaining 38 characters of the hash, regardless of the kind of Git object.
So, the a31c31cb8d7cc16eeae1d2c15e61ed7382cebf40 tree is stored in the a3 folder, and the a57d783905e6a35032d9b0583f052fb42d5a1308 commit in the a5 one.
Isn't that the most clever and simple thing you have ever seen?
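The mapping between a hash and its location on disk can be sketched in a few lines of Python. This is only an illustration of the naming rule described above, not how Git itself is implemented; the function names object_path and hash_from_path are made up for this example.

```python
from pathlib import PurePosixPath


def object_path(object_hash, git_dir=".git"):
    """Return the loose-object path for a full 40-character hash."""
    if len(object_hash) != 40:
        raise ValueError("expected a full 40-character hash")
    # The first two characters name the folder, the other 38 name the file.
    return PurePosixPath(git_dir, "objects", object_hash[:2], object_hash[2:])


def hash_from_path(path):
    """Rebuild the hash by joining the folder name and the file name."""
    path = PurePosixPath(path)
    return path.parent.name + path.name


# The tree hash and the blob path below are the ones shown in this section.
print(object_path("a31c31cb8d7cc16eeae1d2c15e61ed7382cebf40"))
# .git/objects/a3/1c31cb8d7cc16eeae1d2c15e61ed7382cebf40
print(hash_from_path(".git/objects/63/7a09b86af61897fb72f26bfb874f2ae726db82"))
# 637a09b86af61897fb72f26bfb874f2ae726db82
```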
Now, if you try to inspect those files with a plain cat command, you will be disappointed: they are not readable as they are, because Git compresses them with the zlib library to save space on your disk. This is why we use the git cat-file -p command, which decompresses them on the fly for us.
This highlights once again the simplicity of Git: no separate metadata store, no internal database, no needless complexity. Plain files and folders are enough to manage any repository.
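To make the zlib step concrete, here is a minimal Python sketch that builds a loose object in memory and reads it back. It relies on a detail this section does not spell out: before compressing, Git prepends a small header made of the object type, a space, the content size in bytes, and a null byte, and the object's hash is the SHA-1 of header plus content. The helper names are hypothetical, and the "banana" content is only an assumption about what shoppingList.txt held at the first commit.

```python
import hashlib
import zlib


def encode_object(obj_type, content):
    """Return (hash, compressed bytes) for a loose object built in memory."""
    # Header: "<type> <size>" followed by a null byte, then the raw content.
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    store = header + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def decode_object(compressed):
    """Decompress a loose object and split it into (type, content)."""
    store = zlib.decompress(compressed)
    header, _, content = store.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


# Assumed file content; the real file may differ.
object_hash, compressed = encode_object("blob", b"banana\n")
print(object_hash)
print(decode_object(compressed))
```

Pointing decode_object at the bytes of a real file under .git/objects/ should behave the same way, which is roughly the work git cat-file does before pretty-printing the result.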
At this point, we know how Git stores objects, and where they are archived; we also know that there is no database, no central repository, or anything like that. So how is Git able to reconstruct the history of our repository? How can it tell which commit precedes or follows another one?
To find out, we need a new commit. So, let's modify the shoppingList.txt file:
[23] ~/grocery (master)
$ echo "apple" >> shoppingList.txt

[24] ~/grocery (master)
$ git add shoppingList.txt

[25] ~/grocery (master)
$ git commit -m "Add an apple"
[master e4a5e7b] Add an apple
 1 file changed, 1 insertion(+)
Use the git log command to check the new commit; the --oneline option allows us to see the log in a more compact way:
[26] ~/grocery (master)
$ git log --oneline
e4a5e7b Add an apple
a57d783 Add a banana to the shopping list
Okay, we have a new commit, with its hash. Time to see what's inside it:
[27] ~/grocery (master)
$ git cat-file -p e4a5e7b
tree 4c931e9fd8ca4581ddd5de9efd45daf0e5c300a0
parent a57d783905e6a35032d9b0583f052fb42d5a1308
author Ferdinando Santacroce <ferdinando.santacroce@gmail.com> 1503586854 +0200
committer Ferdinando Santacroce <ferdinando.santacroce@gmail.com> 1503586854 +0200

Add an apple
There's something new!
I'm talking about the parent a57d783905e6a35032d9b0583f052fb42d5a1308 line; did you see it? A parent of a commit is simply the commit that precedes it. In fact, a57d783 is the hash of the first commit we made. So, every commit except the very first one has a parent, and by following these relations between commits, we can always navigate from any commit down to the first one, the already mentioned root commit.
If you remember, the first commit did not have a parent, and this is the main (and only) difference between the first commit and all the others. While walking back through the history of our repository, Git simply knows it is done when it finds a commit without a parent.
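The walk from a commit back to the root can be sketched in Python as follows. This is a toy model, not Git's implementation: commits live in a plain dictionary instead of the objects folder, the author and committer lines are left out for brevity, and the newest commit is keyed by its abbreviated hash e4a5e7b because that is all the listings show. The function names are made up for this example.

```python
def parse_parents(commit_text):
    """Return the parent hashes listed in a commit's header lines."""
    parents = []
    for line in commit_text.splitlines():
        if line == "":
            break  # a blank line ends the header; the message follows
        if line.startswith("parent "):
            parents.append(line.split(" ", 1)[1])
    return parents


def walk_history(start, commits):
    """Follow parent links from start until a commit has no parent."""
    history = []
    current = start
    while current is not None:
        history.append(current)
        parents = parse_parents(commits[current])
        # Only the first parent is followed; our commits have at most one.
        current = parents[0] if parents else None
    return history


ROOT = "a57d783905e6a35032d9b0583f052fb42d5a1308"

# Simplified commit bodies (author and committer lines omitted).
commits = {
    "e4a5e7b": (
        "tree 4c931e9fd8ca4581ddd5de9efd45daf0e5c300a0\n"
        "parent " + ROOT + "\n"
        "\n"
        "Add an apple\n"
    ),
    ROOT: (
        "tree a31c31cb8d7cc16eeae1d2c15e61ed7382cebf40\n"
        "\n"
        "Add a banana to the shopping list\n"
    ),
}

print(walk_history("e4a5e7b", commits))
# ['e4a5e7b', 'a57d783905e6a35032d9b0583f052fb42d5a1308']
```

The loop stops exactly where the text says Git does: at the commit whose header has no parent line.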