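Last week, while reading MIT's <<Introduction to algorithms>>, I thought the B-tree looked a bit fiddly to implement, which made it a good exercise.
It took me a whole day of dawdling to finish it, which is really not very efficient.
The logic is not complicated, but there are many branches, and the indices are where mistakes creep in most easily. The book's description of deletion is not very clear, so that part took some extra effort. Before writing any code, the key is to understand how a B-tree actually works.
I implemented insert first, because it is the simpler of the two and a good way to deepen my understanding of B-trees. Then I implemented delete.
The key comments are all in the source code (download). Halfway through insert I realized I had made a rookie mistake: the parameter passing is muddled. For example, insert() lives in Node, so it has to be called on a Node, yet the method also requires a Node to be passed in as an argument. This means one node can modify another node. If one node really needs to modify another, the method could be made static; if not, the node is simply being passed twice. I never fixed this, though, so all the later methods follow the same pattern: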
node.method(node, ...)
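A minimal sketch of why this pattern is awkward, written in Python for brevity. It is not the code from the download; the class `Node` and the method names `add_key_redundant`, `add_key` and `add_key_to` are hypothetical, made up only to illustrate the two cleaner alternatives mentioned above.

```python
class Node:
    def __init__(self):
        self.keys = []

    # The redundant pattern: called on one node, but it works on the
    # node passed as an argument, so the receiver is ignored.
    def add_key_redundant(self, node, key):
        node.keys.append(key)
        node.keys.sort()

    # Alternative 1: a plain instance method. The receiver is the node
    # being modified, so no extra node argument is needed.
    def add_key(self, key):
        self.keys.append(key)
        self.keys.sort()

    # Alternative 2: a static helper. The target node is passed
    # explicitly, exactly once.
    @staticmethod
    def add_key_to(node, key):
        node.keys.append(key)
        node.keys.sort()


a, b = Node(), Node()
a.add_key_redundant(b, 5)   # surprising: a is untouched, b changes
a.add_key(3)                # modifies a
Node.add_key_to(b, 1)       # modifies b
print(a.keys, b.keys)       # prints [3] [1, 5]
```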
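I only did simple testing. For insert, I inserted the values in [1,18] one by one and, after each insertion, compared the tree with the one in reference [1] to see whether they matched. For delete, I first built a tree containing the values [1,18], then deleted from 18 down to 1, verifying after each deletion.
My hands are sore..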
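Checking every step by hand is tiring, so part of the verification could be automated. The following is a rough Python sketch, not the code from the download: it assumes a hypothetical `Node` with a `keys` list and a `children` list (empty for a leaf), distinct keys, and minimum degree `t`. The function `check_btree` is also a made-up name. It checks the key-count bounds, the ordering, and that all leaves sit at the same depth, and returns the keys in sorted order.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # an empty list means "leaf"


def check_btree(root, t):
    """Return all keys in order if root is a valid B-tree of minimum
    degree t; raise AssertionError otherwise."""
    leaf_depths = set()

    def walk(node, depth, lo, hi):
        n = len(node.keys)
        assert n <= 2 * t - 1, "too many keys"
        if node is not root:
            assert n >= t - 1, "too few keys"
        assert node.keys == sorted(node.keys), "keys out of order"
        for k in node.keys:
            assert lo is None or lo < k, "key below allowed range"
            assert hi is None or k < hi, "key above allowed range"
        if not node.children:
            leaf_depths.add(depth)
            return list(node.keys)
        assert len(node.children) == n + 1, "wrong number of children"
        bounds = [lo] + node.keys + [hi]
        out = []
        for i, child in enumerate(node.children):
            out += walk(child, depth + 1, bounds[i], bounds[i + 1])
            if i < n:
                out.append(node.keys[i])
        return out

    keys = walk(root, 0, None, None)
    assert len(leaf_depths) == 1, "leaves at different depths"
    return keys


# A hand-built tree with t = 2 (each non-root node holds 1 to 3 keys).
root = Node([3, 6], [Node([1, 2]), Node([4, 5]), Node([7, 8, 9])])
print(check_btree(root, 2))   # prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Calling such a checker after every insert or delete, and comparing the returned list with the expected set of keys, would catch most index mistakes without drawing each tree.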
References:
2. Collection of BTree info
3. MIT <<Introduction to algorithms>>
Appendix: the methods given in reference 3, with some comments of mine added:
Insert
B-TREE-SPLIT-CHILD(x, i, y)
    z ← ALLOCATE-NODE()
    leaf[z] ← leaf[y]
    n[z] ← t - 1
    for j ← 1 to t - 1
        do key_j[z] ← key_{j+t}[y]
    if not leaf[y]
        then for j ← 1 to t
                 do c_j[z] ← c_{j+t}[y]
    n[y] ← t - 1
    for j ← n[x] + 1 downto i + 1
        do c_{j+1}[x] ← c_j[x]
    c_{i+1}[x] ← z
    for j ← n[x] downto i
        do key_{j+1}[x] ← key_j[x]
    key_i[x] ← key_t[y]
    n[x] ← n[x] + 1
    DISK-WRITE(y)
    DISK-WRITE(z)
    DISK-WRITE(x)
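The split can be sketched in Python as follows. This is only an illustration under some assumptions, not the code from the download: the `Node` class and `split_child` are hypothetical names, indices are 0-based (so the median key_t[y] of the pseudocode is `y.keys[t - 1]`), list slicing replaces the copy loops, and the disk operations are omitted.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # an empty list means "leaf"


def split_child(x, i, t):
    """Split the full child x.children[i] (2t - 1 keys) around its
    median key, which moves up into x at position i."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child must be full"
    # z takes the upper t - 1 keys and, for an internal y, the upper t children.
    z = Node(y.keys[t:], y.children[t:])
    median = y.keys[t - 1]
    # y keeps the lower t - 1 keys and the lower t children.
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    # Make room in x for the median key and the new child.
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)


# t = 2: the first child [1, 2, 3] is full and gets split around 2.
x = Node([10], [Node([1, 2, 3]), Node([20])])
split_child(x, 0, 2)
print(x.keys, [c.keys for c in x.children])   # prints [2, 10] [[1], [3], [20]]
```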
B-TREE-INSERT(T, k)
    r ← root[T]
    if n[r] = 2t - 1
        then s ← ALLOCATE-NODE()
             root[T] ← s
             leaf[s] ← FALSE
             n[s] ← 0
             c_1[s] ← r
             B-TREE-SPLIT-CHILD(s, 1, r)
             B-TREE-INSERT-NONFULL(s, k)
        else B-TREE-INSERT-NONFULL(r, k)
B-TREE-INSERT-NONFULL(x, k)
    i ← n[x]
    if leaf[x]
        then while i ≥ 1 and k < key_i[x]
                 do key_{i+1}[x] ← key_i[x]
                    i ← i - 1
             key_{i+1}[x] ← k
             n[x] ← n[x] + 1
             DISK-WRITE(x)
        else while i ≥ 1 and k < key_i[x]
                 do i ← i - 1
             i ← i + 1
             DISK-READ(c_i[x])
             if n[c_i[x]] = 2t - 1
                 then B-TREE-SPLIT-CHILD(x, i, c_i[x])
                      if k > key_i[x]
                          then i ← i + 1
             B-TREE-INSERT-NONFULL(c_i[x], k)
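Putting the three procedures together, a compact Python sketch of insertion might look like this. It is an illustration only, not the code from the download: `Node`, `BTree`, `_split_child`, `_insert_nonfull` and `inorder` are hypothetical names, indices are 0-based, the disk operations are dropped, and the backward scan over the keys is replaced by `bisect.bisect_right`, which returns the number of keys less than or equal to k and therefore the same position the while loops arrive at.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # an empty list means "leaf"


class BTree:
    def __init__(self, t):
        self.t = t
        self.root = Node()

    def _split_child(self, x, i):
        t = self.t
        y = x.children[i]
        z = Node(y.keys[t:], y.children[t:])       # upper half of y
        x.keys.insert(i, y.keys[t - 1])            # median moves up into x
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:          # full root: grow the tree upward
            s = Node(children=[r])
            self.root = s
            self._split_child(s, 0)
            r = s
        self._insert_nonfull(r, k)

    def _insert_nonfull(self, x, k):
        i = bisect.bisect_right(x.keys, k)         # slot for k / index of child to descend into
        if not x.children:                         # leaf: insert in place
            x.keys.insert(i, k)
            return
        if len(x.children[i].keys) == 2 * self.t - 1:
            self._split_child(x, i)                # never descend into a full node
            if k > x.keys[i]:
                i += 1
        self._insert_nonfull(x.children[i], k)


def inorder(node):
    """Collect the keys of the subtree in sorted order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


tree = BTree(2)
for k in range(1, 19):       # the same values used in the test above: [1,18]
    tree.insert(k)
print(inorder(tree.root))
```

Because a full child is split before the descent, the recursion never has to back up, which is what lets the insertion finish in a single downward pass.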
Deletion
There are two special cases to consider when deleting an element:
- the element in an internal node may be a separator for its child nodes
- deleting an element may put its node under the minimum number of elements and children.
Algorithm
1. If the key k is in node x and x is a leaf, delete the key k from x.
2. If the key k is in node x and x is an internal node, do the following.
   a. If the child y that precedes k in node x has at least t keys, then find the predecessor k′ of k in the subtree rooted at y. Recursively delete k′, and replace k by k′ in x. (Finding k′ and deleting it can be performed in a single downward pass.) That is, replace k with the largest key of the left subtree. (A question I had: if y is a leaf with exactly t keys, it has t - 1 keys after the deletion. Could a later deletion from y then leave it with t - 2 keys? See rule 3: a child with only t - 1 keys is topped up before we descend into it.)
   b. Symmetrically, if the child z that follows k in node x has at least t keys, then find the successor k′ of k in the subtree rooted at z. Recursively delete k′, and replace k by k′ in x. (Finding k′ and deleting it can be performed in a single downward pass.) That is, replace k with the smallest key of the right subtree.
   c. Otherwise, if both y and z have only t - 1 keys, merge k and all of z into y, so that x loses both k and the pointer to z, and y now contains 2t - 1 keys. Then, free z and recursively delete k from y. That is, merge the two children.
3. If the key k is not present in internal node x, determine the root c_i[x] of the appropriate subtree that must contain k, if k is in the tree at all. If c_i[x] has only t - 1 keys, execute step 3a or 3b as necessary to guarantee that we descend to a node containing at least t keys. Then, finish by recursing on the appropriate child of x. (This is done while traversing down.)
   a. If c_i[x] has only t - 1 keys but has an immediate sibling with at least t keys, give c_i[x] an extra key by moving a key from x down into c_i[x], moving a key from c_i[x]'s immediate left or right sibling up into x, and moving the appropriate child pointer from the sibling into c_i[x].
   b. If c_i[x] and both of c_i[x]'s immediate siblings have t - 1 keys, merge c_i[x] with one sibling, which involves moving a key from x down into the new merged node to become the median key for that node.
   In short: borrow an element from a sibling if possible; otherwise merge c_i[x] with a sibling and the key that separates them. This keeps every node at or above its lower bound (special case 2 above).
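The rules above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code from the download: `Node`, `_merge`, `_delete`, `delete` and `inorder` are hypothetical names, indices are 0-based, keys are assumed distinct, and for simplicity the predecessor or successor is located first and then removed by a second recursive call instead of in a single pass. The comments name the rule each branch corresponds to.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # an empty list means "leaf"


def _merge(x, i):
    """Merge x.keys[i] and child i + 1 into child i."""
    y, z = x.children[i], x.children.pop(i + 1)
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children


def _delete(x, k, t):
    i = bisect.bisect_left(x.keys, k)            # first index with keys[i] >= k
    found = i < len(x.keys) and x.keys[i] == k
    if not x.children:                           # rule 1: x is a leaf
        if found:
            x.keys.pop(i)
        return
    if found:                                    # rule 2: k is in internal node x
        y, z = x.children[i], x.children[i + 1]
        if len(y.keys) >= t:                     # 2a: replace k by its predecessor
            node = y
            while node.children:
                node = node.children[-1]
            x.keys[i] = node.keys[-1]
            _delete(y, x.keys[i], t)
        elif len(z.keys) >= t:                   # 2b: replace k by its successor
            node = z
            while node.children:
                node = node.children[0]
            x.keys[i] = node.keys[0]
            _delete(z, x.keys[i], t)
        else:                                    # 2c: merge k and z into y
            _merge(x, i)
            _delete(y, k, t)
        return
    c = x.children[i]                            # rule 3: k can only be below child i
    if len(c.keys) == t - 1:
        left = x.children[i - 1] if i > 0 else None
        right = x.children[i + 1] if i < len(x.keys) else None
        if left is not None and len(left.keys) >= t:      # 3a: borrow from the left
            c.keys.insert(0, x.keys[i - 1])
            x.keys[i - 1] = left.keys.pop()
            if left.children:
                c.children.insert(0, left.children.pop())
        elif right is not None and len(right.keys) >= t:  # 3a: borrow from the right
            c.keys.append(x.keys[i])
            x.keys[i] = right.keys.pop(0)
            if right.children:
                c.children.append(right.children.pop(0))
        elif left is not None:                            # 3b: merge into the left sibling
            _merge(x, i - 1)
            c = left
        else:                                             # 3b: merge the right sibling in
            _merge(x, i)
    _delete(c, k, t)


def delete(root, k, t):
    """Delete k and return the (possibly new) root."""
    _delete(root, k, t)
    if not root.keys and root.children:          # the root emptied out: the tree shrinks
        return root.children[0]
    return root


def inorder(node):
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


# A hand-built tree with t = 2 holding the keys 1..9.
root = Node([4], [Node([2], [Node([1]), Node([3])]),
                  Node([6, 8], [Node([5]), Node([7]), Node([9])])])
for k in [4, 1, 6, 9]:
    root = delete(root, k, 2)
    print(k, inorder(root))
```

The only place the tree loses a level is the check in `delete`: when a merge pulls the last key out of the root, the merged child becomes the new root.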