十八般武藝玩轉GaussDB(DWS)性能調優(三):好味道表定義

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"摘要"},{"type":"text","text":":表結構設計是數據庫建模的一個關鍵環節,表定義好壞直接決定了集羣的有效容量以及業務查詢性能,本文從產品架構、功能實現以及業務特徵的角度闡述在GaussDB(DWS)的中表定義時需要關注的一些關鍵因素。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GaussDB(DWS)是企業級的大規模並行處理關係型數據庫,採用Shared-nothing架構的MPP(Massive Parallel Processing)系統,支持PB級別數據量的處理,適用於詳單查詢、數據倉庫、混合負載和大數據分析等場景。Shared-nothing架構天然支持數據打散分佈到各個數據節點(DataNode)以及多節點協同計算機制,同時這種機制對錶定義涉及提出了更高的訴求,表定義會直接影響集羣的有效容量以及業務查詢性能。本文從產品架構、功能實現以及業務特徵的角度闡述GaussDB(DWS)的中表定義需要關注的一些關鍵因素。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1 存儲方式設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GaussDB(DWS)支持行存儲(row-based storage)和列存儲(column-based storage)兩種存儲方式,這兩種存儲格式分別適用不同的業務場景。通常來講典型的點查詢爲主的場景推薦使用行存儲,典型的統計分析型業務推薦使用列存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1.1"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"行存儲"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"行存儲模式下,一條數據的所有列組合在一起稱之爲一個tuple多個tuple組成一個page,所有的page構成表的數據文件。pages是行存數據存取的最小單元,一個page默認8KB。page的基本結構如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/03/039cbc73b5812ab135a6392422417aa9.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"行存儲模式下,所有數據列集中存儲在一個tuple中,所以行存儲的更新(UPDATE)、刪除(DELETE)、索引點查性能較好,但是當查詢列只涉及所有列的很少一部分的時候,所有列的數據也都會被讀取,導致大量的無效IO,因此推薦比較簡單點查場景或者存在頻繁的數據更新的業務採用行存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1.2"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"列存儲"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"列存儲下把數據表中的每一列單獨存儲,每個列的 6w條數據組成一個CU,每個列的所有的CU構成一個列的數據文件,每個列都會有單獨的數據文件。CU的基本結構如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81a29a00216ebc6177dc6c2f265d2856.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"列數據之間具有更高的相似度,所以列存儲的壓縮性能更好。當只查詢少量列且查詢數據量較大時,列存儲的IO性能收益很明顯。因爲數據分列存儲,導致更新(UPDATE)、刪除(DELETE)、索引點查性的時候需要訪問或者刷新更多的文件,導致大量的隨機IO;因此相比行存儲,列存儲的更新、刪除、索引點查詢的性能較差。同時列存儲天然的可以跟向量化執行引擎對接,在表關聯、匯聚等重計算場景下可以使用向量化執行引擎提升計算性能,因此統計分析等重IO和重計算型業務推薦使用列存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1.3"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"表存儲方式選擇"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表的存儲類型是表定義設計的第一步,"},{"type":"text","marks":[{"type":"strong"}],"text":"客戶業務屬性是表的存儲類型的決定性因素"},{"type":"text","text":"。根據以上我們對行存儲和列存儲原理的介紹,重查詢分析(大量的多表關聯、group by操作)場景推薦使用使用列存表,典型的有數倉場景;以點查詢爲主的場景推薦使用行存表,典型的有詳單查詢場景。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/04/04312ed7101eb4a792b667692aa1f53d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GaussDB(DWS)支持單個database中同時存儲行存儲和列存儲類型的表,以更好的支持混合負載場景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1.4"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"表存儲方式定義"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表的行/列存儲通過表定義的orientation屬性定義。當指定orientation屬性爲row時,表爲行存儲;當指定orientation屬性爲column時,表爲列存儲;如果不指定,默認爲行存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"行存儲表定義方式如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"CREATE TABLE storage\n(\n c_id int,\n c_d_id int NOT NULL,\n c_w_id int NOT NULL,\n c_first varchar(16) NOT NULL\n)WITH(orientation=row)\nDISTRIBUTE BY HASH(c_d_id);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"列存儲表定義方式如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"CREATE TABLE storage\n(\n c_id int,\n c_d_id int NOT NULL,\n c_w_id int NOT NULL,\n c_first varchar(16) NOT NULL\n)WITH(orientation=column)\nDISTRIBUTE BY HASH(c_d_id);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2 數據分佈方式設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GaussDB(DWS)的MPP架構,天然支持通過散列的方式進行水平分表,將業務數據表的元組打散存儲到各個數據節點(DataNode)上,通過並行利用各個數據節點的IO能力提升數據掃描的效率。爲了優化高頻關聯小表的查詢性能,GuassDB(DWS)支持複製的數據分佈方式。"},{"type":"text","marks":[{"type":"strong"}],"text":"表的分佈方式取決於表的業務屬性,事實表一般數據量較大,且數據增加或者變化很頻繁,建議使用散列分佈;維度表數據量較小,且數據一般不會變化,只有定期更新操作,建議使用複製分佈"},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.1"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"散列分佈"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"散列分佈是按照某種散列規則,把表數據map到指定的數據節點(DataNode)上進行存儲的方式。散列分佈可以利用各個節點的IO資源,提升各個數據節點的IO能力。GaussDB(DWS)中採用hash的散列策略,按照表定義時指定的分佈列組合,對一條記錄的某一個或幾個字段進行hash運算後,生成對應的hash值,然後根據DN實例與哈希值的映射關係獲得該元組的目標存儲位置。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3a/3a8ea7069f74cdcd595777bcc1d6439a.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於散列分佈的表,分佈列的選擇非常重要。當分佈列選擇合理時,Hash散列策略可以大大減小計算節點之間的數據交互,大幅提升查詢性能;但是當hash分佈列選擇不合理時,會導致數據傾斜(某個或者某些DataNode的數據量嚴重超過其它DataNode的數據量),因爲短板效應導致集羣的有效容量下降。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"散列主要使用於客戶業務表,這些表有數據量大、數據量逐漸增加的特徵,適用散列分佈可以有效的提升表查詢性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.2"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"複製分佈"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"複製分佈(replication)策略將表中的全量數據在集羣的每一個DN實例上保留一份。在關聯操作中複製表可以避免數據重分佈操作,減小網絡開銷,同時減少了plan segment(每個plan segment都會起對應的線程)的個數;但是複製分佈策略會導致比較嚴重的數據冗餘,因此只有小表才適合複製分佈策略。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/be/be6c927f15acc320c96127d6b6fed5d3.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實際生產上只有小數據量、查詢頻繁、更新(DELETE/INSERT/UFPATE)很少的表(基本都是維度表)纔會定義replication分佈策略"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.3"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分佈方式選擇"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表數據分佈方式主要依據表的業務屬性和數據屬性決定,簡單總結如下"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/dd/dd48b65ce4f4eeb70158f2292ddd038d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.4"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分佈列定義"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表的複製分佈屬性可以通過表定義指定:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"CREATE TABLE storage\n(\n c_id int,\n c_d_id int NOT NULL,\n c_w_id int NOT NULL,\n c_first varchar(16) NOT NULL\n)WITH(orientation=row)\nDISTRIBUTE BY REPLICATRRION;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表的散列分分佈屬性可以通過表定義:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"CREATE TABLE storage\n(\n c_id int,\n c_d_id int NOT NULL,\n c_w_id int NOT NULL,\n c_first varchar(16) NOT NULL\n)WITH(orientation=row)\nDISTRIBUTE BY HASH(c_d_id);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3 分佈列設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於採取散列分佈策略的表,分佈列的選擇取決於表數據的特徵以及表相關的業務查詢特徵,推薦使用經常做關聯查詢的列、且數據分佈均勻的列作爲分佈列。好的分佈列可以通過減少跨節點的數據計劃節省網絡資源開銷,優化查詢性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a4/a45fb42fbc831d99d8f99600e91b63ff.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3.1"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分佈列選擇策略"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hash分佈表的分佈列選取至關重要,需要滿足以下原則:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"列值應比較離散,以便數據能夠均勻分佈到各個DN"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分佈列值分佈不均勻會導致數據在數據節點分佈不均勻(某些DataNode上數據量大,某些DataNode上數據量小),這會導致不同DataNode上數據掃面的計算量不均衡,從而拖慢整個表掃描的性能;同時會因爲部分DataNode的磁盤容量提前爆滿,集羣只讀,導致集羣有效容量下降。通常情況下使用表的主鍵列或者唯一索引列作爲表的分佈列是一個不錯的選擇"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"考慮選擇查詢中的連接條件爲分佈列"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GaussDB(DWS)的散列策略是hash,根據GaussDB(DWS)的分佈式查詢框架,當兩表等值關聯(join)列剛好是表的分佈列時(如果分佈列是多列,那麼要求所有列都存在等值關聯條件),join任務可以不再數據重分佈的情況下直接Join,這樣可以省去數據重分佈的時間開銷和網絡資源開銷,從而提升查詢計算性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"c)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"在滿足前面兩條原則的情況下儘量不要選取存在常量等值filter的列"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GaussDB(DWS)會協調節點(Coordinator)上進行任務規劃,此時會根據表的過濾條件(Filter)進行掃面操作剪枝優化,以較小IO資源開銷。如果表dwcjk的分佈列是zqdh,且表dwcjk掃描時存在Filter條件zqdh=’000001’,而根據散列策略zqdh=’000001’的值都分佈在數據節點DN1上,那麼協調節點(Coordinator)上進行任務規劃時會對dwcjk表的掃描操作進行剪枝(指定只有在數據節點DN1對錶dwcjk進行數據掃描操作)。這樣對於表掃描的實際壓力會值落在節點DN1,導致不同數據節點的IO壓力不均衡。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"注意此策略主要適用於統計分析類的重查詢場景,對於詳單查詢等以點查爲主要場景的查詢類業務,在滿足前兩個約束的前提下,可以優選存在常量等值Filter約束列作爲分佈列。因爲這種場景在數據節點上使用索引加速查詢,查詢耗時往往以ms或者幾十ms計,通過剪枝把查詢任務map到具體的某個數據節點上執行,節省無效操作(不用連接到所有的數據節點上操作),同時也會大大的提高併發能力"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3.2"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分佈列選擇的限制"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GaussDB(DWS)的列存儲格式的表不支持主鍵和唯一約束,行存儲格式表支持主鍵和唯一約束。但是存儲格式表的主鍵和唯一約束的創建存在嚴格約束:分佈列的集合是主鍵列或者索引列的子集。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"多個列作爲分佈列時,分佈列的順序會影響數據分佈,即同一條記錄在distribute by hash(col1, col2)方式下,跟在distribute by hash(col2, col1)分佈方式下可能會map到不同的DataNode上進行存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GaussDB(DWS)對分佈列的個數沒有限制,但是建議分佈列的個數儘量少,一方面可以減小數據map到不同DN的計算開銷,同時也可以更好的全匹配join條件,提升查詢性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3.3"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分佈列離散性校驗"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於當前已創建並且導入數據的表,可以使用如下SQL檢驗表數據分佈的離散型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"-- 'public'是表的schema名稱,'storage'是表名\nSELECT * FROM table_distribution('public.storage') ORDER BY dnsize;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a8/a8a1df955ad64a745340b1b864ec3cc3.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於已經創建並且導入數據的表,如果我們認爲當前的分佈列不夠離散,在修改爲其它列之前,可嘗試使用如下SQL判斷目標分佈列的離散性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"-- 'public'是表的schema名稱,'storage'是表名,c_id是要檢測的列名\nSELECT * FROM table_skewness('public.storage', 'c_id') ORDER BY seqnum;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ca/ca8085deb5223a483591c96c533dcb3e.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當確定目標分佈列之後,可以使用如下SQL實現分佈列的修改"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"-- 'public'是表的schema名稱,'storage'是表名,c_id是修改後的目標分佈列\nALTER TABLE public.storage DISTRIBUTE BY HASH(c_id);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4 表分區設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通俗的講表,分區就是把一個大表按照條件分割爲若干個小表,這種分割體現在數據庫內部的數據管理上,對錶數據的常規操作(UPDATE/DELETE/INSERT/SELECT)是透明的。一般對數據和查詢都有明顯區間段特徵的表使用分區策略,可通過減少不必要的數據掃描提升查詢性能。如下case中,使用分區表可以減少60%的數據掃描量,SQL查詢整體性能提升46左右。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/68/68a2ff8fe47b394c655e9096b68b5183.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"4.1"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分區表的優勢"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分區表把邏輯上的一張表根據範圍分區策略分成幾張物理塊庫進行存儲,這張邏輯上的表稱之爲分區表,物理塊稱之爲分區。分區表是一張邏輯表,不存儲數據,數據實際是存儲在分區上的。分區表和普通表相比具有以下優點:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"改善查詢性能"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對分區對象的查詢可以僅搜索自己關心的分區,提高檢索效率"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"增強可用性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果分區表的某個分區出現故障,表在其他分區的數據仍然可用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"c)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"提升可維護性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於需要週期性刪除的過期歷史數據,可以通過drop/truncate分區的方式快速高效處理"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"4.2"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分區策略選擇"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當表有以下特徵時,可以考慮使用表分區策略"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"數據具有明顯區間性的字段"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分區表需要根據有明顯區間性字段進行表分區。通常我們比如日期、區域、數值等字段進行分區,時間字段是最常見的分區字段。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"業務查詢有明顯的區間範圍特徵"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查詢數據可落到區間範圍指定的分區內,這樣才能通過分區剪枝,只掃描查詢需要的分區,從而提升數據掃描效率,降低數據掃描的IO開銷。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"c)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"表數據量比較大"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"小表掃描本身耗時不大,分區表的性能收益不明顯,因此只建議對大表採取分區策略。列存儲模式下因爲每個列是單獨的文件出處,且最小的存儲單元CU可存儲6w行數據,因此對於列存分區表,建議每個分區的數據不小於DN個數*6w"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"4.3"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分區表定義"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分區表策略定義分爲兩種方式"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"簡單區間切割"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這種是最常見的通用的分區定義策略,適合所有的分區定義場景。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"CREATE TABLE web_returns_p1\n(\n wr_returned_date_sk integer,\n wr_returned_time_sk integer,\n wr_item_sk integer NOT NULL,\n wr_refunded_customer_sk integer\n)\nWITH (orientation = column)\nDISTRIBUTE BY HASH (wr_item_sk)\nPARTITION BY RANGE(wr_returned_date_sk)\n(\n PARTITION p2016 VALUES LESS THAN(20161231),\n PARTITION p2017 VALUES LESS THAN(20171231),\n PARTITION p2018 VALUES LESS THAN(20181231),\n PARTITION p2019 VALUES LESS THAN(20191231),\n PARTITION pxxxx VALUES LESS THAN(maxvalue)\n);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"指定策略切割"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此方式適用於分區間隔固定、批量創建分區的場景。當分區個數很多時,此方式可大大節省創建分區的工作量"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"CREATE TABLE web_returns_p1\n(\n wr_returned_date_sk integer,\n wr_returned_time_sk integer,\n wr_item_sk integer NOT NULL,\n wr_refunded_customer_sk integer\n)\nWITH (orientation = column)\nDISTRIBUTE BY HASH (wr_item_sk)\nPARTITION BY RANGE(wr_returned_date_sk)\n(\n PARTITION p2016 START(20161231) END(20191231) EVERY(10000),\n PARTITION p0 END(maxvalue)\n);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"4.4"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分區鍵選擇限制"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"類似表分佈列的選擇,對於行存儲格式的表,如果表存在主鍵或者唯一約束,分區鍵應當是是主鍵列或者唯一約束的索引列的子集。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"4.5"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分區表查詢"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"只有查詢語句可以進行分區剪枝的時候,分區表查詢纔會產生數據掃描上的性能收益。一般只有當分區鍵跟常量值存在直接的比較(>、=)操作時,分區表纔可以正常剪枝。我們可以通過對查詢語句執行explain命令查看分區剪枝的效果"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/aa/aa0e6f96ce98e9a3057290bf5bff2d18.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有時我們期望編寫的SQL語句可以進行分區剪枝,但是實際上並沒有走到分區剪枝,這種預期外的行爲往往是因爲以下因素導致"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分區鍵上有函數"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當分區鍵上存在函數調用時,分區表無法剪枝"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/81749d40141c6a9d442f6cf177173e23.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"分區鍵字段的存在隱式類型轉換"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這種場景往往是因爲分區鍵跟常量值的數據類型不一致,導致計劃規劃時分區鍵的數據類型發生隱式類型轉換,導致分區無法剪枝"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/85/85d05719531332a20c7a2f518734fa77.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5 字段設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表字段設計時需要注意以下內容"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"使用執行效率比較高的數據類型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一般來說整型數據的運算(包括=、>、<、≧、≦、≠等常規的比較運算,以及group by等運算)效率比字符串、浮點數要高。能使用整型的場景儘量使用整型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"使用短字段的數據類型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"長度較短的數據類型不僅可以減小數據文件的大小,提升IO性能;同時也可以減小計算相關計算時的內存消耗,提升計算性能。比如我們需要一個整型數據,如果可以用smallint就儘量不用int,如果可以用int就儘量不用bigint。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"c)"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"strong"}],"text":"關聯列使用一致的數據類型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表關聯列儘量使用相同的數據類型。如果表關聯列數據類型不同,在執行時數據庫會動態地轉化爲相同的數據類型進行比較,這種轉換會帶來一定的性能開銷,同時可能會因爲類型轉換導致表關聯操作時發生數據重分佈,導致額外的性能和資源開銷。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"6 約束設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) 非空(not null)約束"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"明確不存在null值的字段加上not null約束。在特定場景下,優化器會對包含not null的查詢語句進行自動優化,提升查詢效率。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) 主鍵/唯一約束"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"行存儲表支持唯一/主鍵約束,如果表是散列分佈,那麼約束列必須包含所有的分佈列;如果表做了分區,那麼約束列也必須包含所有的分區列。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) 局部聚簇約束"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"局部聚簇(partial cluster key,簡稱PCK)是列存儲表一種局部聚簇技術,這種技術可以讓表數據在批量入庫的時,先對錶進行局部排序,然後再寫盤。這樣可以讓相同/相似的數據連續存儲,提高數據的壓縮比;同時在查詢時也可以依賴列存儲表的min/max稀疏索引實現表的CU過濾,從實現高效的數據過濾效果(參見"},{"type":"link","attrs":{"href":"https://link.zhihu.com/?target=https%3A//bbs.huaweicloud.com/forum/thread-69066-1-1.html","title":null},"content":[{"type":"text","text":"《GaussDB(DWS)性能調優:列存表scan性能優化》"}]},{"type":"text","text":")。一張表上只能建立一個PCK,一個PCK可以包含多列,但是一般不建議超過2列。通常我們使用經常出現的、過濾效果比較好的簡單表達式中的列作爲PCK列,局部聚簇約束的定義方式跟主鍵約束的定義方式類似"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"CREATE TABLE web_returns_p1\n(\n wr_returned_date_sk integer,\n wr_returned_time_sk integer,\n wr_item_sk integer NOT NULL,\n wr_refunded_customer_sk integer,\n PARTIAL CLUSTER KEY(wr_returned_date_sk)\n)\nWITH (orientation = column)\nDISTRIBUTE BY HASH (wr_item_sk);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e7/e75b752dc716528d0d57dd246434f458.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"7 表定義總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後簡單總結下表定義流程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/97/9785039d3df42e5af497605899820709.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://bbs.huaweicloud.com/blogs?utm_source=infoq&utm_medium=bbs-ex&utm_campaign=other&utm_content=content","title":""},"content":[{"type":"text","text":"點擊關注,第一時間瞭解華爲雲新鮮技術~"}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章