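Original article: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql
Introduction
Many people have had to handle hierarchical data in MySQL at some point, and managing trees is certainly not what a relational database is best at. A relational table is a flat list, not a hierarchy like XML, and the parent-child relationship of a tree has no native representation in MySQL.
For our purposes, a tree is a collection of data in which each item has exactly one parent and zero or more children, except the root, which has no parent. Trees show up in many applications: forums, mailing lists, organization charts, content management categories, and product catalogs. In this article we use the product categories of a fictional electronics store: ELECTRONICS at the root, with TELEVISIONS (TUBE, LCD, PLASMA) and PORTABLE ELECTRONICS (MP3 PLAYERS containing FLASH, CD PLAYERS, 2 WAY RADIOS) beneath it.
These categories form a tree, and this article looks at two ways of working with it in MySQL, starting with the traditional adjacency list model.
The Adjacency List Model
Typically the example categories would be stored in a table like the following (the full CREATE and INSERT statements are included so you can follow along):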
CREATE TABLE category(
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL
);
INSERT INTO category VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2),
(4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE ELECTRONICS',1),(7,'MP3 PLAYERS',6),(8,'FLASH',7),
(9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6);
SELECT * FROM category ORDER BY category_id;
+-------------+----------------------+--------+
| category_id | name | parent |
+-------------+----------------------+--------+
| 1 | ELECTRONICS | NULL |
| 2 | TELEVISIONS | 1 |
| 3 | TUBE | 2 |
| 4 | LCD | 2 |
| 5 | PLASMA | 2 |
| 6 | PORTABLE ELECTRONICS | 1 |
| 7 | MP3 PLAYERS | 6 |
| 8 | FLASH | 7 |
| 9 | CD PLAYERS | 6 |
| 10 | 2 WAY RADIOS | 6 |
+-------------+----------------------+--------+
10 rows in set (0.00 sec)
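In the adjacency list model, each row holds a pointer to its parent. The root, ELECTRONICS in this example, has a NULL parent. The model's advantage is simplicity: it is easy to see that FLASH is a child of MP3 PLAYERS, which is a child of PORTABLE ELECTRONICS, which in turn is a child of ELECTRONICS. While the adjacency list is easy to handle in client-side code, working with it in pure SQL can be more problematic.
Retrieving a Full Tree
The first common task when dealing with hierarchical data is displaying the whole tree, usually with some form of indentation. The most common way of doing this in pure SQL is a series of self-joins: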
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';
+-------------+----------------------+--------------+-------+
| lev1 | lev2 | lev3 | lev4 |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS | TUBE | NULL |
| ELECTRONICS | TELEVISIONS | LCD | NULL |
| ELECTRONICS | TELEVISIONS | PLASMA | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL |
+-------------+----------------------+--------------+-------+
6 rows in set (0.00 sec)
Finding All the Leaf Nodes
We can find all the leaf nodes (nodes with no children) with a LEFT JOIN:
SELECT t1.name FROM
category AS t1 LEFT JOIN category as t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;
+--------------+
| name |
+--------------+
| TUBE |
| LCD |
| PLASMA |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+--------------+
Retrieving a Single Path
A self-join also lets us see the full path through the tree to a given node:
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS' AND t4.name = 'FLASH';
+-------------+----------------------+-------------+-------+
| lev1 | lev2 | lev3 | lev4 |
+-------------+----------------------+-------------+-------+
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
+-------------+----------------------+-------------+-------+
1 row in set (0.01 sec)
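The main drawback of this approach is that it needs one self-join per level of the hierarchy, so the query has to be rewritten whenever the tree gets deeper, and performance degrades with each join added.
Limitations of the Adjacency List Model
Working with the adjacency list model in pure SQL is difficult at best. Before we can see the full path of a category we have to know the level at which it resides. In addition, deleting nodes requires special care, because it can orphan an entire subtree (delete PORTABLE ELECTRONICS and all of its children are orphaned). Some of these limitations can be addressed with client-side code or stored procedures. In procedural code we can start at the bottom of the tree and iterate upward to return the full tree or a single path. We can also delete a node without orphaning its subtree by promoting one of its children and re-pointing the remaining children at the new parent.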
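The bottom-up walk mentioned above can be sketched in client code. The Python below is a minimal illustration and not part of the original article: CATEGORY is a hypothetical in-memory copy of the category table, and path_to_root is an illustrative helper name.

```python
# Hypothetical in-memory copy of the category table: category_id -> (name, parent).
CATEGORY = {
    1: ("ELECTRONICS", None),
    2: ("TELEVISIONS", 1),
    3: ("TUBE", 2),
    4: ("LCD", 2),
    5: ("PLASMA", 2),
    6: ("PORTABLE ELECTRONICS", 1),
    7: ("MP3 PLAYERS", 6),
    8: ("FLASH", 7),
    9: ("CD PLAYERS", 6),
    10: ("2 WAY RADIOS", 6),
}


def path_to_root(category_id, rows=CATEGORY):
    """Follow parent pointers upward and return the names root-first."""
    path = []
    seen = set()
    current = category_id
    while current is not None:
        if current in seen:
            # A corrupted table could contain a loop of parent pointers.
            raise ValueError("cycle detected in parent pointers")
        seen.add(current)
        name, parent = rows[current]
        path.append(name)
        current = parent
    path.reverse()
    return path
```

Calling path_to_root(8) returns the path from ELECTRONICS down to FLASH, however deep the node sits, which is exactly what the fixed chain of self-joins cannot do.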
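The Nested Set Model
What I would like to focus on in this article is a different approach, the Nested Set Model. In the nested set model we look at the hierarchy in a new way: not as nodes and lines, but as nested containers. Picture the electronics categories as containers: ELECTRONICS encloses TELEVISIONS and PORTABLE ELECTRONICS, and each of those encloses its own subcategories.
Notice that the hierarchy is preserved, because parent categories envelop their children. We represent this form of hierarchy in a table by giving each node a left and a right value: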
CREATE TABLE nested_category (
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
lft INT NOT NULL,
rgt INT NOT NULL
);
INSERT INTO nested_category VALUES(1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),(3,'TUBE',3,4),
(4,'LCD',5,6),(5,'PLASMA',7,8),(6,'PORTABLE ELECTRONICS',10,19),(7,'MP3 PLAYERS',11,14),(8,'FLASH',12,13),
(9,'CD PLAYERS',15,16),(10,'2 WAY RADIOS',17,18);
SELECT * FROM nested_category ORDER BY category_id;
+-------------+----------------------+-----+-----+
| category_id | name | lft | rgt |
+-------------+----------------------+-----+-----+
| 1 | ELECTRONICS | 1 | 20 |
| 2 | TELEVISIONS | 2 | 9 |
| 3 | TUBE | 3 | 4 |
| 4 | LCD | 5 | 6 |
| 5 | PLASMA | 7 | 8 |
| 6 | PORTABLE ELECTRONICS | 10 | 19 |
| 7 | MP3 PLAYERS | 11 | 14 |
| 8 | FLASH | 12 | 13 |
| 9 | CD PLAYERS | 15 | 16 |
| 10 | 2 WAY RADIOS | 17 | 18 |
+-------------+----------------------+-----+-----+
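We use lft and rgt because left and right are reserved words in MySQL; the MySQL manual has the full list of reserved words.
So how do we determine the left and right values? We number the outer edges of the containers, starting at the left edge of the outermost node and continuing to the right, so ELECTRONICS gets 1 and 20, TELEVISIONS gets 2 and 9, and so on.
The same numbering can also be laid over an ordinary tree diagram.
When working with the tree, we move from left to right, one level at a time, descending to each node's children before assigning its right-hand number and moving on to the sibling on its right. This approach is called the modified preorder tree traversal algorithm.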
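As a rough illustration of the numbering walk, the Python sketch below derives lft and rgt values from adjacency-list rows. It is not from the original article: number_tree is a hypothetical helper, and it assumes a single root and that children are visited in the order their rows appear.

```python
# Adjacency-list rows as in the category table: (category_id, name, parent).
ADJACENCY_ROWS = [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
]


def number_tree(rows):
    """Return {category_id: (lft, rgt)} using a modified preorder traversal."""
    children = {}
    root = None
    for category_id, _name, parent in rows:
        if parent is None:
            root = category_id  # assumption: exactly one root row
        else:
            children.setdefault(parent, []).append(category_id)

    bounds = {}
    counter = 0

    def visit(category_id):
        nonlocal counter
        counter += 1
        lft = counter  # number the node's left edge on the way down
        for child in children.get(category_id, []):
            visit(child)
        counter += 1  # number its right edge after all descendants
        bounds[category_id] = (lft, counter)

    visit(root)
    return bounds
```

For the sample rows, number_tree(ADJACENCY_ROWS) reproduces the values stored in nested_category, for example (1, 20) for ELECTRONICS and (11, 14) for MP3 PLAYERS.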
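Retrieving a Full Tree
We can retrieve the full tree with a self-join that links parents with nodes, on the basis that a node's lft value always lies between its parent's lft and rgt values: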
SELECT node.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND parent.name = 'ELECTRONICS'
ORDER BY node.lft;
+----------------------+
| name |
+----------------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+----------------------+
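Unlike the adjacency list examples above, this query works regardless of the depth of the tree. We do not need the node's rgt value in the BETWEEN clause, because it always falls inside the same parent interval as its lft value.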
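Because the containment test does not depend on depth, the same query returns the subtree under any node. The Python sketch below runs it against SQLite's in-memory database as a stand-in for MySQL (an assumption made so the example is self-contained); make_db and subtree are illustrative names, not part of the article.

```python
import sqlite3

# Same rows as the nested_category table: (category_id, name, lft, rgt).
ROWS = [
    (1, "ELECTRONICS", 1, 20), (2, "TELEVISIONS", 2, 9), (3, "TUBE", 3, 4),
    (4, "LCD", 5, 6), (5, "PLASMA", 7, 8), (6, "PORTABLE ELECTRONICS", 10, 19),
    (7, "MP3 PLAYERS", 11, 14), (8, "FLASH", 12, 13), (9, "CD PLAYERS", 15, 16),
    (10, "2 WAY RADIOS", 17, 18),
]


def make_db():
    """Create an in-memory SQLite copy of nested_category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nested_category ("
        "category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO nested_category VALUES (?, ?, ?, ?)", ROWS)
    return conn


def subtree(conn, root_name):
    """Names of root_name and all of its descendants, in lft order."""
    sql = """
        SELECT node.name
        FROM nested_category AS node,
             nested_category AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
          AND parent.name = ?
        ORDER BY node.lft
    """
    return [row[0] for row in conn.execute(sql, (root_name,))]
```

With this setup, subtree(make_db(), "PORTABLE ELECTRONICS") returns that category followed by MP3 PLAYERS, FLASH, CD PLAYERS, and 2 WAY RADIOS, with no change to the query text.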
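Finding All the Leaf Nodes
Finding all leaf nodes in the nested set model is even simpler than the LEFT JOIN used in the adjacency list model. If you look at the nested_category table, you will notice that the lft and rgt values of leaf nodes are consecutive numbers. To find the leaf nodes, we only need the rows where rgt = lft + 1: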
SELECT name
FROM nested_category
WHERE rgt = lft + 1;
+--------------+
| name |
+--------------+
| TUBE |
| LCD |
| PLASMA |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+--------------+
Retrieving a Single Path
With the nested set model, we can retrieve a single path without a chain of self-joins:
SELECT parent.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'FLASH'
ORDER BY parent.lft;
+----------------------+
| name |
+----------------------+
| ELECTRONICS |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
+----------------------+
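Finding the Depth of the Nodes
We have seen how to show the entire tree, but what if we also want the depth of each node, to better identify how it fits into the hierarchy? We can add COUNT and GROUP BY to the existing query: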
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| ELECTRONICS | 0 |
| TELEVISIONS | 1 |
| TUBE | 2 |
| LCD | 2 |
| PLASMA | 2 |
| PORTABLE ELECTRONICS | 1 |
| MP3 PLAYERS | 2 |
| FLASH | 3 |
| CD PLAYERS | 2 |
| 2 WAY RADIOS | 2 |
+----------------------+-------+
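The same depths can also be computed on the client in one pass over the rows sorted by lft. This is a different technique from the self-join above, shown only as a sketch; depths is a hypothetical helper and the rows are assumed to form a valid nested set.

```python
# (name, lft, rgt) rows from nested_category.
NESTED_ROWS = [
    ("ELECTRONICS", 1, 20), ("TELEVISIONS", 2, 9), ("TUBE", 3, 4),
    ("LCD", 5, 6), ("PLASMA", 7, 8), ("PORTABLE ELECTRONICS", 10, 19),
    ("MP3 PLAYERS", 11, 14), ("FLASH", 12, 13), ("CD PLAYERS", 15, 16),
    ("2 WAY RADIOS", 17, 18),
]


def depths(rows):
    """Return [(name, depth)] in lft order using a stack of open ancestors."""
    result = []
    open_rgts = []  # rgt values of the ancestors that still enclose the current node
    for name, lft, rgt in sorted(rows, key=lambda row: row[1]):
        # Drop ancestors whose interval ended before this node starts.
        while open_rgts and open_rgts[-1] < lft:
            open_rgts.pop()
        result.append((name, len(open_rgts)))
        open_rgts.append(rgt)
    return result
```

The number of ancestors still open when a node is reached is its depth, which matches COUNT(parent.name) - 1 in the SQL version.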
We can use the depth value to indent the category names with the CONCAT and REPEAT string functions:
SELECT CONCAT( REPEAT(' ', COUNT(parent.name) - 1), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name                  |
+-----------------------+
| ELECTRONICS           |
|  TELEVISIONS          |
|   TUBE                |
|   LCD                 |
|   PLASMA              |
|  PORTABLE ELECTRONICS |
|   MP3 PLAYERS         |
|    FLASH              |
|   CD PLAYERS          |
|   2 WAY RADIOS        |
+-----------------------+
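Of course, in a client-side application you will more likely use the depth value directly to display the hierarchy. Web developers could loop through the rows and use depth to control how each entry is nested or indented.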
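One way a web page might consume the (name, depth) rows is to open and close list elements as the depth rises and falls. The Python sketch below assumes HTML ul/li markup as the target, which is an assumption rather than something the text specifies; render_nested_list is an illustrative name, and the rows are assumed to arrive in lft order with depth never increasing by more than one.

```python
import html


def render_nested_list(rows):
    """Turn (name, depth) pairs, given in lft order, into nested <ul>/<li> markup."""
    parts = []
    prev_depth = -1
    for name, depth in rows:
        if depth > prev_depth:
            parts.append("<ul>")  # going one level deeper: open a new list
        else:
            parts.append("</li>")  # close the previous sibling or leaf
            for _ in range(prev_depth - depth):
                parts.append("</ul></li>")  # climb back up one level per step
        parts.append("<li>" + html.escape(name))
        prev_depth = depth
    for _ in range(prev_depth + 1):
        parts.append("</li></ul>")  # close everything still open
    return "".join(parts)
```

For example, render_nested_list([("A", 0), ("B", 1), ("C", 1)]) produces a list containing A, whose nested list holds B and C.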
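Depth of a Sub-Tree
When we need depth information for a sub-tree, we cannot restrict either the node or the parent table in the self-join, because that would corrupt the results. Instead, we add a third self-join, along with a subquery that determines the depth of the node that will be the starting point of the sub-tree: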
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
nested_category AS parent,
nested_category AS sub_parent,
(
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'PORTABLE ELECTRONICS'
GROUP BY node.name
ORDER BY node.lft
)AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
AND sub_parent.name = sub_tree.name
GROUP BY node.name
ORDER BY node.lft;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS | 0 |
| MP3 PLAYERS | 1 |
| FLASH | 2 |
| CD PLAYERS | 1 |
| 2 WAY RADIOS | 1 |
+----------------------+-------+
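This query works with any node name, including the root node. The depth values are always relative to the named node.
Find the Immediate Subordinates of a Node
Imagine you are showing a category of electronics products on a retailer's website. When a user clicks a category, you want to show its immediate sub-categories but not the entire tree beneath it, that is, only the nodes at depth 1 relative to it. For example, for PORTABLE ELECTRONICS we want to show MP3 PLAYERS, CD PLAYERS, and 2 WAY RADIOS, but not FLASH.
This is easily done by adding a HAVING clause to the previous query: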
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
nested_category AS parent,
nested_category AS sub_parent,
(
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'PORTABLE ELECTRONICS'
GROUP BY node.name
ORDER BY node.lft
)AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
AND sub_parent.name = sub_tree.name
GROUP BY node.name
HAVING depth <= 1
ORDER BY node.lft;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS | 0 |
| MP3 PLAYERS | 1 |
| CD PLAYERS | 1 |
| 2 WAY RADIOS | 1 |
+----------------------+-------+
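If you do not wish to show the parent node, change HAVING depth <= 1 to HAVING depth = 1.
Aggregate Functions in a Nested Set
Let's add a product table to demonstrate aggregate functions: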
CREATE TABLE product
(
product_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(40),
category_id INT NOT NULL
);
INSERT INTO product(name, category_id) VALUES('20" TV',3),('36" TV',3),
('Super-LCD 42"',4),('Ultra-Plasma 62"',5),('Value Plasma 38"',5),
('Power-MP3 5gb',7),('Super-Player 1gb',8),('Porta CD',9),('CD To go!',9),
('Family Talk 360',10);
SELECT * FROM product;
+------------+-------------------+-------------+
| product_id | name | category_id |
+------------+-------------------+-------------+
| 1 | 20" TV | 3 |
| 2 | 36" TV | 3 |
| 3 | Super-LCD 42" | 4 |
| 4 | Ultra-Plasma 62" | 5 |
| 5 | Value Plasma 38" | 5 |
| 6 | Power-MP3 5gb | 7 |
| 7 | Super-Player 1gb | 8 |
| 8 | Porta CD | 9 |
| 9 | CD To go! | 9 |
| 10 | Family Talk 360 | 10 |
+------------+-------------------+-------------+
Now let's write a query that returns the category tree along with a product count for each category:
SELECT parent.name, COUNT(product.name)
FROM nested_category AS node ,
nested_category AS parent,
product
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.category_id = product.category_id
-- Each output row is a parent category, so group and sort by the parent row.
GROUP BY parent.name, parent.lft
ORDER BY parent.lft;
+----------------------+---------------------+
| name | COUNT(product.name) |
+----------------------+---------------------+
| ELECTRONICS | 10 |
| TELEVISIONS | 5 |
| TUBE | 2 |
| LCD | 1 |
| PLASMA | 2 |
| PORTABLE ELECTRONICS | 5 |
| MP3 PLAYERS | 2 |
| FLASH | 1 |
| CD PLAYERS | 2 |
| 2 WAY RADIOS | 1 |
+----------------------+---------------------+
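This is the usual whole-tree query with a COUNT and GROUP BY added, plus a join to the product table: the WHERE clause links each product to its own category (node), and every parent whose interval contains that node counts the product. As you can see, each category has a count, and the counts of child categories are included in their parents.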
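To make the roll-up explicit, the Python sketch below does by hand what the join and GROUP BY do: each product is counted once for every category whose interval contains its own category. The data mirrors the sample tables; rollup_counts is a hypothetical helper, not part of the article.

```python
# (category_id, name, lft, rgt) rows from nested_category.
CATEGORIES = [
    (1, "ELECTRONICS", 1, 20), (2, "TELEVISIONS", 2, 9), (3, "TUBE", 3, 4),
    (4, "LCD", 5, 6), (5, "PLASMA", 7, 8), (6, "PORTABLE ELECTRONICS", 10, 19),
    (7, "MP3 PLAYERS", 11, 14), (8, "FLASH", 12, 13), (9, "CD PLAYERS", 15, 16),
    (10, "2 WAY RADIOS", 17, 18),
]
# category_id of each of the ten sample products, in product_id order.
PRODUCT_CATEGORY_IDS = [3, 3, 4, 5, 5, 7, 8, 9, 9, 10]


def rollup_counts(categories, product_category_ids):
    """Return [(category name, product count)] in lft order, children rolled into parents."""
    lft_of = {category_id: lft for category_id, _name, lft, _rgt in categories}
    counts = {}
    for product_category_id in product_category_ids:
        node_lft = lft_of[product_category_id]
        for _category_id, name, lft, rgt in categories:
            if lft <= node_lft <= rgt:  # this category encloses the product's category
                counts[name] = counts.get(name, 0) + 1
    ordered = sorted(categories, key=lambda category: category[2])
    # Like the inner join, categories with no products at all are left out.
    return [(name, counts[name]) for _id, name, _lft, _rgt in ordered if name in counts]
```

For the sample data, rollup_counts(CATEGORIES, PRODUCT_CATEGORY_IDS) reports 10 products under ELECTRONICS and 5 each under TELEVISIONS and PORTABLE ELECTRONICS.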
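Adding New Nodes
Now that we have learned how to query the tree, let's look at how to add a new node. Recall the nested set numbering: ELECTRONICS spans 1 to 20, TELEVISIONS 2 to 9, and PORTABLE ELECTRONICS 10 to 19.
If we want to add a new node between TELEVISIONS and PORTABLE ELECTRONICS, the new node gets lft and rgt values of 10 and 11, and every value to its right is increased by two. This could be wrapped in a stored procedure in MySQL 5; here I assume MySQL 4.1 (translator's note: this is an old article, written when 4.1 was the current stable release), so the statements are issued one by one: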
LOCK TABLE nested_category WRITE;
SELECT @myRight := rgt FROM nested_category
WHERE name = 'TELEVISIONS';
UPDATE nested_category SET rgt = rgt + 2 WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft + 2 WHERE lft > @myRight;
INSERT INTO nested_category(name, lft, rgt) VALUES('GAME CONSOLES', @myRight + 1, @myRight + 2);
UNLOCK TABLES;
We can then check the nesting with the indented tree query:
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name                  |
+-----------------------+
| ELECTRONICS           |
|  TELEVISIONS          |
|   TUBE                |
|   LCD                 |
|   PLASMA              |
|  GAME CONSOLES        |
|  PORTABLE ELECTRONICS |
|   MP3 PLAYERS         |
|    FLASH              |
|   CD PLAYERS          |
|   2 WAY RADIOS        |
+-----------------------+
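If we instead want to add a node as a child of a node that has no existing children, we need to modify the procedure slightly. Let's add a new FRS node below 2 WAY RADIOS: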
LOCK TABLE nested_category WRITE;
SELECT @myLeft := lft FROM nested_category
WHERE name = '2 WAY RADIOS';
UPDATE nested_category SET rgt = rgt + 2 WHERE rgt > @myLeft;
UPDATE nested_category SET lft = lft + 2 WHERE lft > @myLeft;
INSERT INTO nested_category(name, lft, rgt) VALUES('FRS', @myLeft + 1, @myLeft + 2);
UNLOCK TABLES;
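In this example we shift everything to the right of the new parent's lft value, which includes the parent's own rgt value, and then place the new node immediately after that lft value. As you can see, the new node is properly nested: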
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name                  |
+-----------------------+
| ELECTRONICS           |
|  TELEVISIONS          |
|   TUBE                |
|   LCD                 |
|   PLASMA              |
|  GAME CONSOLES        |
|  PORTABLE ELECTRONICS |
|   MP3 PLAYERS         |
|    FLASH              |
|   CD PLAYERS          |
|   2 WAY RADIOS        |
|    FRS                |
+-----------------------+
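Both insert variants differ only in which edge of the anchor node is used, so they can share one routine. The Python sketch below uses SQLite in memory as a stand-in for MySQL and wraps the statements in a transaction in place of LOCK TABLES (both are substitutions made for the sake of a runnable example); make_db and insert_node are hypothetical names.

```python
import sqlite3

ROWS = [
    ("ELECTRONICS", 1, 20), ("TELEVISIONS", 2, 9), ("TUBE", 3, 4), ("LCD", 5, 6),
    ("PLASMA", 7, 8), ("PORTABLE ELECTRONICS", 10, 19), ("MP3 PLAYERS", 11, 14),
    ("FLASH", 12, 13), ("CD PLAYERS", 15, 16), ("2 WAY RADIOS", 17, 18),
]


def make_db():
    """Create an in-memory SQLite copy of nested_category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nested_category ("
        "category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO nested_category(name, lft, rgt) VALUES (?, ?, ?)", ROWS)
    return conn


def insert_node(conn, new_name, anchor_name, as_child=False):
    """Insert new_name as the right-hand sibling of anchor_name, or as its first child."""
    # Sibling: open a gap after the anchor's rgt. Child: open it after the anchor's lft.
    column = "lft" if as_child else "rgt"
    with conn:  # commits on success, rolls back if any statement fails
        (edge,) = conn.execute(
            f"SELECT {column} FROM nested_category WHERE name = ?", (anchor_name,)
        ).fetchone()
        conn.execute("UPDATE nested_category SET rgt = rgt + 2 WHERE rgt > ?", (edge,))
        conn.execute("UPDATE nested_category SET lft = lft + 2 WHERE lft > ?", (edge,))
        conn.execute(
            "INSERT INTO nested_category(name, lft, rgt) VALUES (?, ?, ?)",
            (new_name, edge + 1, edge + 2),
        )
```

Calling insert_node(conn, "GAME CONSOLES", "TELEVISIONS") and then insert_node(conn, "FRS", "2 WAY RADIOS", as_child=True) reproduces the two inserts performed above.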
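Deleting Nodes
The last basic task is removing nodes. The course of action depends on the node's position in the hierarchy: deleting a leaf node is easier than deleting a node with children, because there are no orphaned nodes to handle.
Deleting a leaf node is just the opposite of adding one: we delete the node and subtract its width from every value to its right: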
LOCK TABLE nested_category WRITE;
SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1
FROM nested_category
WHERE name = 'GAME CONSOLES';
DELETE FROM nested_category WHERE lft BETWEEN @myLeft AND @myRight;
UPDATE nested_category SET rgt = rgt - @myWidth WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft - @myWidth WHERE lft > @myRight;
UNLOCK TABLES;
Running the indented tree query again confirms that the node was removed without corrupting the hierarchy:
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name                  |
+-----------------------+
| ELECTRONICS           |
|  TELEVISIONS          |
|   TUBE                |
|   LCD                 |
|   PLASMA              |
|  PORTABLE ELECTRONICS |
|   MP3 PLAYERS         |
|    FLASH              |
|   CD PLAYERS          |
|   2 WAY RADIOS        |
|    FRS                |
+-----------------------+
The same approach also works for deleting a node together with all of its children:
LOCK TABLE nested_category WRITE;
SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1
FROM nested_category
WHERE name = 'MP3 PLAYERS';
DELETE FROM nested_category WHERE lft BETWEEN @myLeft AND @myRight;
UPDATE nested_category SET rgt = rgt - @myWidth WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft - @myWidth WHERE lft > @myRight;
UNLOCK TABLES;
Once more, we query to confirm that the entire sub-tree was deleted:
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name                  |
+-----------------------+
| ELECTRONICS           |
|  TELEVISIONS          |
|   TUBE                |
|   LCD                 |
|   PLASMA              |
|  PORTABLE ELECTRONICS |
|   CD PLAYERS          |
|   2 WAY RADIOS        |
|    FRS                |
+-----------------------+
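The width arithmetic behind the delete can be checked with a small client-side sketch. As before, SQLite in memory stands in for MySQL and a transaction replaces LOCK TABLES (both are assumptions of this example), and it starts from the original ten sample rows rather than the modified tree; make_db and delete_subtree are hypothetical names.

```python
import sqlite3

ROWS = [
    ("ELECTRONICS", 1, 20), ("TELEVISIONS", 2, 9), ("TUBE", 3, 4), ("LCD", 5, 6),
    ("PLASMA", 7, 8), ("PORTABLE ELECTRONICS", 10, 19), ("MP3 PLAYERS", 11, 14),
    ("FLASH", 12, 13), ("CD PLAYERS", 15, 16), ("2 WAY RADIOS", 17, 18),
]


def make_db():
    """Create an in-memory SQLite copy of nested_category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nested_category ("
        "category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO nested_category(name, lft, rgt) VALUES (?, ?, ?)", ROWS)
    return conn


def delete_subtree(conn, name):
    """Delete the named node and all its descendants, then close the numbering gap."""
    with conn:  # one transaction: commit on success, roll back on error
        lft, rgt = conn.execute(
            "SELECT lft, rgt FROM nested_category WHERE name = ?", (name,)
        ).fetchone()
        width = rgt - lft + 1  # how many numbers the subtree occupies
        conn.execute("DELETE FROM nested_category WHERE lft BETWEEN ? AND ?", (lft, rgt))
        conn.execute("UPDATE nested_category SET rgt = rgt - ? WHERE rgt > ?", (width, rgt))
        conn.execute("UPDATE nested_category SET lft = lft - ? WHERE lft > ?", (width, rgt))
```

Deleting MP3 PLAYERS from the original data removes two rows (the node and FLASH) and shifts everything to the right by four, leaving a contiguous numbering from 1 to 16.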
The other scenario is deleting a parent node but not its children. The children are then promoted to the level of the deleted parent:
LOCK TABLE nested_category WRITE;
SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1
FROM nested_category
WHERE name = 'PORTABLE ELECTRONICS';
DELETE FROM nested_category WHERE lft = @myLeft;
UPDATE nested_category SET rgt = rgt - 1, lft = lft - 1 WHERE lft BETWEEN @myLeft AND @myRight;
UPDATE nested_category SET rgt = rgt - 2 WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft - 2 WHERE lft > @myRight;
UNLOCK TABLES;
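Here we subtract two from every value to the right of the node (a node without children is two wide), and one from the values of its children (to close the gap left by the parent's lft value). Once again, we can confirm that the children were promoted: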
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+---------------+
| name          |
+---------------+
| ELECTRONICS   |
|  TELEVISIONS  |
|   TUBE        |
|   LCD         |
|   PLASMA      |
|  CD PLAYERS   |
|  2 WAY RADIOS |
|   FRS         |
+---------------+
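The promote-the-children variant can be sketched the same way. This Python example again uses SQLite in memory and a transaction in place of MySQL and LOCK TABLES (assumptions of the example), and it runs against the original ten sample rows; make_db and delete_and_promote are hypothetical names.

```python
import sqlite3

ROWS = [
    ("ELECTRONICS", 1, 20), ("TELEVISIONS", 2, 9), ("TUBE", 3, 4), ("LCD", 5, 6),
    ("PLASMA", 7, 8), ("PORTABLE ELECTRONICS", 10, 19), ("MP3 PLAYERS", 11, 14),
    ("FLASH", 12, 13), ("CD PLAYERS", 15, 16), ("2 WAY RADIOS", 17, 18),
]


def make_db():
    """Create an in-memory SQLite copy of nested_category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nested_category ("
        "category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO nested_category(name, lft, rgt) VALUES (?, ?, ?)", ROWS)
    return conn


def delete_and_promote(conn, name):
    """Delete only the named node; its children move up one level."""
    with conn:  # one transaction: commit on success, roll back on error
        lft, rgt = conn.execute(
            "SELECT lft, rgt FROM nested_category WHERE name = ?", (name,)
        ).fetchone()
        conn.execute("DELETE FROM nested_category WHERE lft = ?", (lft,))
        # Former descendants slide left by one to fill the deleted lft slot.
        conn.execute(
            "UPDATE nested_category SET rgt = rgt - 1, lft = lft - 1 "
            "WHERE lft BETWEEN ? AND ?",
            (lft, rgt),
        )
        # Everything to the right loses both of the deleted node's numbers.
        conn.execute("UPDATE nested_category SET rgt = rgt - 2 WHERE rgt > ?", (rgt,))
        conn.execute("UPDATE nested_category SET lft = lft - 2 WHERE lft > ?", (rgt,))
```

After delete_and_promote(conn, "PORTABLE ELECTRONICS") on the original data, MP3 PLAYERS, CD PLAYERS, and 2 WAY RADIOS sit directly under ELECTRONICS, and FLASH remains nested inside MP3 PLAYERS.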
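Other deletion scenarios, such as promoting one of the children to the parent's position or moving the children under a sibling of the parent, are not covered here.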