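Using git for the first time is like arriving in a foreign country where you can neither read the script nor speak the language. As long as you know where you are and where you are going, everything is fine; the moment you get lost, the trouble begins.
There are already plenty of articles online about learning the basic git commands. This is not one of them; it tries a different approach.
Newcomers are always intimidated by git, and it is hard not to be. Git is certainly a powerful tool, but it is not a friendly one: lots of new concepts, commands that do entirely different things depending on whether they are given a file as an argument, cryptic feedback, and so on.
I think the way over the first hurdle is to do more than just run git commit/push. If we take some time to understand what git is actually made of, we will save ourselves a lot of trouble.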
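A first look at .git
So let's get started. When you create a repository with git init, git creates a magical directory: .git. This directory holds all the information git needs to work. To put it plainly, if you want to remove git from your project while keeping the project files, just delete the .git folder. But are you sure you want to do that?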
```
├── HEAD
├── branches
├── config
├── description
├── hooks
│   ├── pre-commit.sample
│   ├── pre-push.sample
│   └── ...
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
```
This is what the .git directory looks like before your first commit:
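- HEAD: we will come back to this one later.
- config: this file holds your repository's settings, for example the URL of your remote, your email address, your user name, and so on. Each time you run "git config ..." in the terminal without --global or --system, this is the file that changes.
- description: used by gitweb (a sort of predecessor of github) to display a description of the repository.
- hooks: an interesting feature. Git comes with a set of scripts that you can have run automatically at each meaningful stage of git's work. These scripts, called hooks, can run before or after a commit, rebase, pull, and so on. The name of the script says when it runs. For example, a useful pre-push hook might check that style rules are respected, to keep the remote repository consistent.
- info/exclude: you can list the files you do not want git to deal with in a .gitignore file. The exclude file serves the same purpose, except that it is not shared. It is handy when, say, you do not want to track your personal IDE configuration files, although .gitignore is usually enough (let me know in the comments if you use it).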
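To make the hooks entry a little more concrete, here is a minimal Python sketch that lists which hooks in a .git directory would actually be picked up. It assumes a POSIX system, and it relies on two details that the list above does not spell out: the files ending in ".sample" are inert templates, and a hook has to be executable. The `active_hooks` and `demo` helpers and the throwaway directory are made up for illustration.

```python
import os
import tempfile


def active_hooks(git_dir):
    """Return the names of hooks that look runnable (a sketch, not Git's own logic)."""
    hooks_dir = os.path.join(git_dir, "hooks")
    found = []
    for name in sorted(os.listdir(hooks_dir)):
        path = os.path.join(hooks_dir, name)
        # The samples created by `git init` end in ".sample" and are ignored.
        if name.endswith(".sample"):
            continue
        # A hook must be a regular, executable file.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            found.append(name)
    return found


def demo():
    # Hypothetical throwaway layout standing in for a real .git directory.
    with tempfile.TemporaryDirectory() as git_dir:
        hooks_dir = os.path.join(git_dir, "hooks")
        os.mkdir(hooks_dir)
        for name in ("pre-commit.sample", "pre-push"):
            path = os.path.join(hooks_dir, name)
            with open(path, "w") as fh:
                fh.write("#!/bin/sh\nexit 0\n")
            os.chmod(path, 0o755)
        return active_hooks(git_dir)


print(demo())
```

Renaming pre-commit.sample to pre-commit in the demo would make it show up in the list as well.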
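What a commit really is
Each time you create a file and track it, git compresses it and stores it in its own data structure. The compressed object gets a unique name, a hash, and is stored under the objects directory.
Before exploring the objects directory, we have to ask ourselves what a commit actually is. A commit is roughly a snapshot of your working directory, but it is also a bit more than that.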
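In fact, when you commit, git does only one of two things for each file in order to build the snapshot of your working directory:
- If the file has not changed, git simply adds the name of the compressed file (its hash) to the snapshot.
- If the file has changed, git compresses it, stores the compressed file in the objects directory, and then adds the name of the compressed file (its hash) to the snapshot.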
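This is a simplification; the whole process is a bit more complex and will be covered in a future post.
Once the snapshot is created, it is itself compressed and named with a hash. So where do all these compressed objects end up? In the objects directory.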
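The "compress it and name it with a hash" step can be sketched in a few lines of Python. This assumes the default SHA-1 object format and the small "type size NUL" header that git puts in front of the content before hashing; neither detail is spelled out above, and `make_object` is just an illustrative name.

```python
import hashlib
import zlib


def make_object(obj_type, content):
    """Return (hash, compressed_bytes) for a git object (sketch, SHA-1 repos only)."""
    # Git prepends a small header: "<type> <size>\0".
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    store = header + content
    # The hash of header + content becomes the object's name...
    object_id = hashlib.sha1(store).hexdigest()
    # ...and the zlib-compressed bytes are what ends up on disk.
    return object_id, zlib.compress(store)


object_id, compressed = make_object("blob", b"")
print(object_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The empty content is deliberate: an empty file always hashes to e69de29…, whatever the file is called, because the file name is not part of the blob.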
```
├── 4c
│   └── f44f1e3fe4fb7f8aa42138c324f63f5ac85828 // hash
├── 86
│   └── 550c31847e518e1927f95991c949fc14efc711 // hash
├── e6
│   └── 9de29bb2d1d6434b8b29ae775ad8c2e48c5391 // hash
├── info // let's ignore that
└── pack // let's ignore that too
```
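This is what the objects directory looks like after I create an empty file, file_1.txt, and commit it. Note that if the hash of your file is "89faaee…", git stores it in the "89" directory and names the file "faaee…".
You will see three hashes. One corresponds to file_1.txt and another to the snapshot created at commit time. What about the third? The commit itself is also an object, and it too is compressed and stored in the objects directory.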
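The naming rule just described (first two characters for the directory, the rest for the file) is easy to express in code. A small sketch, with `object_path` as a made-up helper name and forward slashes assumed as the separator:

```python
import posixpath


def object_path(git_dir, object_id):
    """Map an object hash to the path of its file under objects/ (sketch)."""
    # The first two hex characters name the directory, the rest names the file.
    return posixpath.join(git_dir, "objects", object_id[:2], object_id[2:])


print(object_path(".git", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"))
# .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

This only covers objects stored as individual files; anything that has been moved into the pack directory is not found this way.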
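For now, what you need to remember is that a commit is made of four things:
- The hash of the working directory snapshot
- A commit message
- Information about the committer
- The hash of the parent commit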
If we decompress a commit, you can see for yourself what is inside:
```
# By looking at the history you can easily find your commit hash.
# You also don't have to paste the whole hash, only enough
# characters to make the hash unique.
git cat-file -p 4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828
```
This is what I get:
```
tree 86550c31847e518e1927f95991c949fc14efc711
author Pierre De Wulf <test@gmail.com> 1455775173 -0500
committer Pierre De Wulf <test@gmail.com> 1455775173 -0500

commit A
```
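As you can see, we get what we expected: the snapshot hash, the author, and the commit message. Two things are important here:
- As expected, the snapshot hash "86550…" is also an object and can be found in the objects directory.
- Because this is my first commit, there is no parent.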
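To see how the four parts fit together, here is a rough Python sketch that assembles the text of a commit object from them and hashes it the way git hashes any object. `build_commit` and `object_id` are illustrative names, and the header format is an assumption not spelled out above. The printed hash would only equal the one above if every byte, including the trailing newline of the message, were identical, and that is not checked here.

```python
import hashlib


def build_commit(tree, parents, author, committer, message):
    """Assemble the text of a commit object from its parts (sketch)."""
    lines = [f"tree {tree}"]
    # A first commit has no parent line; a merge commit has more than one.
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return body.encode()


def object_id(obj_type, content):
    """Hash "<type> <size>\\0" + content with SHA-1 (assumed object format)."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


who = "Pierre De Wulf <test@gmail.com> 1455775173 -0500"
commit = build_commit(
    tree="86550c31847e518e1927f95991c949fc14efc711",
    parents=[],
    author=who,
    committer=who,
    message="commit A",
)
print(commit.decode())
print(object_id("commit", commit))
```

Because the parent hash is part of the hashed text, changing anything in an earlier commit changes the hash of every commit that follows it.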
So what is actually in my snapshot?
```
git cat-file -p 86550c31847e518e1927f95991c949fc14efc711
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    file_1.txt
```
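Here we find the last of the three objects we saw in the objects directory, and the only one listed in the snapshot. It is a blob, an object that holds the contents of a file, but that is a story for another time.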
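What `git cat-file -p` prints for the snapshot is a prettified view. On disk, each entry of a tree object is, as far as I understand the format, the mode and name followed by a NUL byte and the 20 raw bytes of the hash. The sketch below builds and parses such a body; `build_tree` and `parse_tree` are made-up names, and the simple sort by name ignores the special ordering git applies to directories.

```python
def build_tree(entries):
    """Serialize (mode, name, hex_hash) entries into a tree body (sketch)."""
    body = b""
    # Simplification: plain sort by name; git orders directory names slightly differently.
    for mode, name, hex_hash in sorted(entries, key=lambda e: e[1]):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_hash)
    return body


def parse_tree(body):
    """Inverse of build_tree: return a list of (mode, name, hex_hash) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].decode().split(" ", 1)
        raw_hash = body[nul + 1:nul + 21]
        entries.append((mode, name, raw_hash.hex()))
        pos = nul + 21
    return entries


tree = build_tree(
    [("100644", "file_1.txt", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")]
)
print(parse_tree(tree))
```

Note that the file name lives in the tree, not in the blob, which is why two files with identical contents can share a single blob.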
Branches, tags, HEAD: all one family
So now you know that every object in git has its own hash. Let's look at HEAD now. What is in HEAD?
```
cat HEAD
ref: refs/heads/master
```
So HEAD is not a hash. That makes sense, because HEAD can be seen as a pointer to the branch you are currently on. If we look at refs/heads/master, this is what we find:
```
cat refs/heads/master
4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828
```
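Look familiar? Yes, it is exactly the hash of our first commit. This shows that branches and tags are simply pointers to a commit. Once you understand that, you can delete any branches and tags you like and the commits they pointed to will still be there, just harder to reach (at least until git's garbage collection prunes unreachable objects). If you want to learn more about this, see the git book.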
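Following HEAD down to a commit hash, as we just did by hand with cat, could be sketched like this in Python. `resolve_ref` and `demo` are illustrative names, the temporary directory is a stand-in for a real .git folder, and the sketch only handles refs stored as plain files (it knows nothing about packed refs).

```python
import os
import tempfile


def resolve_ref(git_dir, name="HEAD"):
    """Follow "ref: ..." indirections until a hash is found (sketch)."""
    with open(os.path.join(git_dir, name)) as fh:
        value = fh.read().strip()
    if value.startswith("ref: "):
        # Symbolic ref: the rest of the line names another file under .git.
        return resolve_ref(git_dir, value[len("ref: "):])
    return value


def demo():
    # Hypothetical stand-in for the .git directory described above.
    with tempfile.TemporaryDirectory() as git_dir:
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        with open(os.path.join(git_dir, "HEAD"), "w") as fh:
            fh.write("ref: refs/heads/master\n")
        with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as fh:
            fh.write("4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828\n")
        return resolve_ref(git_dir)


print(demo())  # 4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828
```

Seen this way, creating a branch amounts to writing one 40-character hash into a new file under refs/heads.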
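Wrapping up
By now you should understand that what git does when you commit is "zip" your current working directory and store it, along with some other information, in the objects directory. But if you know git well enough, you have complete control over which files go into a commit and which do not.
What I mean is that a commit is not really a snapshot of your current working directory, but a snapshot of the files you want to commit. Where does git keep the files you want to commit before you commit them? In the index file.
About the index file:
The following is taken from git/Documentation/technical/index-format.txt: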
```text
Git index format
================

== The Git index file has the following format

  All binary numbers are in network byte order. Version 2 is described
  here unless stated otherwise.

   - A 12-byte header consisting of

     4-byte signature:
       The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")

     4-byte version number:
       The current supported versions are 2, 3 and 4.

     32-bit number of index entries.

   - A number of sorted index entries (see below).

   - Extensions

     Extensions are identified by signature. Optional extensions can
     be ignored if Git does not understand them.

     Git currently supports cached tree and resolve undo extensions.

     4-byte extension signature. If the first byte is 'A'..'Z' the
     extension is optional and can be ignored.

     32-bit size of the extension

     Extension data

   - 160-bit SHA-1 over the content of the index file before this
     checksum.

== Index entry

  Index entries are sorted in ascending order on the name field,
  interpreted as a string of unsigned bytes (i.e. memcmp() order, no
  localization, no special casing of directory separator '/'). Entries
  with the same name are sorted by their stage field.

  32-bit ctime seconds, the last time a file's metadata changed
    this is stat(2) data

  32-bit ctime nanosecond fractions
    this is stat(2) data

  32-bit mtime seconds, the last time a file's data changed
    this is stat(2) data

  32-bit mtime nanosecond fractions
    this is stat(2) data

  32-bit dev
    this is stat(2) data

  32-bit ino
    this is stat(2) data

  32-bit mode, split into (high to low bits)

    4-bit object type
      valid values in binary are 1000 (regular file), 1010 (symbolic link)
      and 1110 (gitlink)

    3-bit unused

    9-bit unix permission. Only 0755 and 0644 are valid for regular files.
    Symbolic links and gitlinks have value 0 in this field.

  32-bit uid
    this is stat(2) data

  32-bit gid
    this is stat(2) data

  32-bit file size
    This is the on-disk size from stat(2), truncated to 32-bit.

  160-bit SHA-1 for the represented object

  A 16-bit 'flags' field split into (high to low bits)

    1-bit assume-valid flag

    1-bit extended flag (must be zero in version 2)

    2-bit stage (during merge)

    12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
    is stored in this field.

  (Version 3 or later) A 16-bit field, only applicable if the
  "extended flag" above is 1, split into (high to low bits).

    1-bit reserved for future

    1-bit skip-worktree flag (used by sparse checkout)

    1-bit intent-to-add flag (used by "git add -N")

    13-bit unused, must be zero

  Entry path name (variable length) relative to top level directory
    (without leading slash). '/' is used as path separator. The special
    path components ".", ".." and ".git" (without quotes) are disallowed.
    Trailing slash is also disallowed.

    The exact encoding is undefined, but the '.' and '/' characters
    are encoded in 7-bit ASCII and the encoding cannot contain a NUL
    byte (iow, this is a UNIX pathname).

  (Version 4) In version 4, the entry path name is prefix-compressed
    relative to the path name for the previous entry (the very first
    entry is encoded as if the path name for the previous entry is an
    empty string). At the beginning of an entry, an integer N in the
    variable width encoding (the same encoding as the offset is encoded
    for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
    by a NUL-terminated string S. Removing N bytes from the end of the
    path name for the previous entry, and replacing it with the string S
    yields the path name for this entry.

  1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
  while keeping the name NUL-terminated.

  (Version 4) In version 4, the padding after the pathname does not
  exist.

  Interpretation of index entries in split index mode is completely
  different. See below for details.

== Extensions

=== Cached tree

  Cached tree extension contains pre-computed hashes for trees that can
  be derived from the index. It helps speed up tree object generation
  from index for a new commit.

  When a path is updated in index, the path must be invalidated and
  removed from tree cache.

  The signature for this extension is { 'T', 'R', 'E', 'E' }.

  A series of entries fill the entire extension; each of which
  consists of:

  - NUL-terminated path component (relative to its parent directory);

  - ASCII decimal number of entries in the index that is covered by the
    tree this entry represents (entry_count);

  - A space (ASCII 32);

  - ASCII decimal number that represents the number of subtrees this
    tree has;

  - A newline (ASCII 10); and

  - 160-bit object name for the object that would result from writing
    this span of index as a tree.

  An entry can be in an invalidated state and is represented by having
  a negative number in the entry_count field. In this case, there is no
  object name and the next entry starts immediately after the newline.
  When writing an invalid entry, -1 should always be used as entry_count.

  The entries are written out in the top-down, depth-first order. The
  first entry represents the root level of the repository, followed by the
  first subtree--let's call this A--of the root level (with its name
  relative to the root level), followed by the first subtree of A (with
  its name relative to A), ...

=== Resolve undo

  A conflict is represented in the index as a set of higher stage entries.
  When a conflict is resolved (e.g. with "git add path"), these higher
  stage entries will be removed and a stage-0 entry with proper resolution
  is added.

  When these higher stage entries are removed, they are saved in the
  resolve undo extension, so that conflicts can be recreated (e.g. with
  "git checkout -m"), in case users want to redo a conflict resolution
  from scratch.

  The signature for this extension is { 'R', 'E', 'U', 'C' }.

  A series of entries fill the entire extension; each of which
  consists of:

  - NUL-terminated pathname the entry describes (relative to the root of
    the repository, i.e. full pathname);

  - Three NUL-terminated ASCII octal numbers, entry mode of entries in
    stage 1 to 3 (a missing stage is represented by "0" in this field);
    and

  - At most three 160-bit object names of the entry in stages from 1 to 3
    (nothing is written for a missing stage).

=== Split index

  In split index mode, the majority of index entries could be stored
  in a separate file. This extension records the changes to be made on
  top of that to produce the final index.

  The signature for this extension is { 'l', 'i', 'n', 'k' }.

  The extension consists of:

  - 160-bit SHA-1 of the shared index file. The shared index file path
    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
    index does not require a shared index file.

  - An ewah-encoded delete bitmap, each bit represents an entry in the
    shared index. If a bit is set, its corresponding entry in the
    shared index will be removed from the final index. Note, because
    a delete operation changes index entry positions, but we do need
    original positions in replace phase, it's best to just mark
    entries for removal, then do a mass deletion after replacement.

  - An ewah-encoded replace bitmap, each bit represents an entry in
    the shared index. If a bit is set, its corresponding entry in the
    shared index will be replaced with an entry in this index
    file. All replaced entries are stored in sorted order in this
    index. The first "1" bit in the replace bitmap corresponds to the
    first index entry, the second "1" bit to the second entry and so
    on. Replaced entries may have empty path names to save space.

  The remaining index entries after replaced ones will be added to the
  final index. These added entries are also sorted by entry name then
  stage.

== Untracked cache

  Untracked cache saves the untracked file list and necessary data to
  verify the cache. The signature for this extension is { 'U', 'N',
  'T', 'R' }.

  The extension starts with

  - A sequence of NUL-terminated strings, preceded by the size of the
    sequence in variable width encoding. Each string describes the
    environment where the cache can be used.

  - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
    ctime field until "file size".

  - Stat data of core.excludesfile

  - 32-bit dir_flags (see struct dir_struct)

  - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
    does not exist.

  - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
    not exist.

  - NUL-terminated string of per-dir exclude file name. This usually
    is ".gitignore".

  - The number of following directory blocks, variable width
    encoding. If this number is zero, the extension ends here with a
    following NUL.

  - A number of directory blocks in depth-first-search order, each
    consists of

    - The number of untracked entries, variable width encoding.

    - The number of sub-directory blocks, variable width encoding.

    - The directory name terminated by NUL.

    - A number of untracked file/dir names terminated by NUL.

  The remaining data of each directory block is grouped by type:

  - An ewah bitmap, the n-th bit marks whether the n-th directory has
    valid untracked cache entries.

  - An ewah bitmap, the n-th bit records "check-only" bit of
    read_directory_recursive() for the n-th directory.

  - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
    is valid for the n-th directory and exists in the next data.

  - An array of stat data. The n-th data corresponds with the n-th
    "one" bit in the previous ewah bitmap.

  - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
    in the previous ewah bitmap.

  - One NUL.
```
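The 12-byte header and the trailing checksum described at the top of that excerpt are simple enough to decode by hand. The Python sketch below does so on a hand-built index with zero entries and no extensions rather than on a real .git/index file; `parse_index_header` and `checksum_ok` are illustrative names, and nothing here parses the entries or the extensions.

```python
import hashlib
import struct


def parse_index_header(data):
    """Decode the 12-byte header of a git index file (sketch)."""
    # ">" selects network (big-endian) byte order, as the format requires.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file")
    if version not in (2, 3, 4):
        raise ValueError(f"unsupported index version {version}")
    return version, entry_count


def checksum_ok(data):
    """Check the trailing 160-bit SHA-1 over everything that precedes it."""
    return hashlib.sha1(data[:-20]).digest() == data[-20:]


# Hand-built, hypothetical index: version 2, zero entries, no extensions.
header = struct.pack(">4sII", b"DIRC", 2, 0)
index_bytes = header + hashlib.sha1(header).digest()

print(parse_index_header(index_bytes))  # (2, 0)
print(checksum_ok(index_bytes))         # True
```

To try it on a real repository, read .git/index in binary mode and pass the bytes to the same two functions.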