Choosing IDs: Are You Doing It Right?

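{"type":"doc","content":[
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"What Is an ID"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ID is short for identifier: a name that uniquely identifies an object or a thing. In a software system, an ID labels a set of information. It is one of the lowest-level, most fundamental concepts in an information system, and it stays with the system from its birth to its retirement."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yet many systems start out with an auto-increment MySQL Int ID, a choice that, in the author's view, is usually made without much thought about IDs. By the time the problems surface, changing the ID is close to impossible. Because references between information systems are everywhere, a poorly chosen ID has far-reaching consequences. Some common pitfalls: an aggregation-layer service has to combine IDs from different underlying services and only then discovers that their types differ; resources are enumerated by an attacker, and only then does the team notice that the early IDs were auto-increment Ints; an Int64 is passed to JavaScript and comes out wrong; and so on."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The author surveyed how a number of products choose their IDs; as the table below shows, the choices vary. So how should an ID be chosen? Beyond properties such as type and length, what else needs to be considered? Choosing an ID is not a matter of technical complexity. It is mostly a matter of how well the properties of an ID are understood, and of having a unified convention. This article focuses on analyzing those properties."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/68\/681d97f869ef75726aca8c567fcfbc74.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Properties"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We list five properties and, looking at both the nature of the thing itself and its relationships with other things, divide them into two groups:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Intrinsic properties: type, length"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Domain properties: uniqueness, sparsity, monotonicity"}]}]}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Uniqueness"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Uniqueness is the essential property of an ID: an ID must let us identify exactly one object within its domain, otherwise it is not an identifier. Achieving uniqueness requires a generation strategy, and the industry already has several proven solutions."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For Int IDs, the simplest strategy is to rely on the database, for example MySQL auto-increment IDs. The drawbacks are equally obvious: it does not support horizontally sharded architectures, and it couples the ID to the database. Each database may implement the feature differently, so switching databases means changing code, which hurts extensibility. Database auto-increment IDs also have a security problem: they are easy to enumerate. A more mature solution is a centralized ID generator. The most widely used generators are based on the Snowflake algorithm and its variants; benchmarks reported online put the throughput at up to about 260,000 IDs per second, which is enough to generate globally unique IDs efficiently."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Generation of String IDs has been standardized and mostly relies on SDKs. UUID, for example, was standardized by the Open Software Foundation (OSF), and every mainstream language ships an SDK that implements the standard. A UUID is built from a timestamp, the network card (MAC) address, a clock sequence and so on, which allows fast generation on a single machine in a distributed environment, with a timestamp resolution of 100 ns. MongoDB's primary key is handled as a String; its layout is ObjectId = epoch time + machine identifier + process ID (PID) + counter, and it is generated by the driver of each language."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We often hear that an ID should be meaningless. Some people take this to mean that an ID should not contain any information about a concrete context, so that the number of identifiable objects is not reduced and uniqueness is not put at risk. The author disagrees, because distributed ID generation strategies have to include concrete context: we typically use time and machine information to meet the requirements placed on IDs in a distributed environment. The UUID described above embeds the network card address and the time. The classic Snowflake algorithm in the figure below likewise embeds a 41-bit timestamp and a 5-bit data center ID to guarantee uniqueness and an increasing trend:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8e\/8ed547296849b8971975b8b8eb3b025b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the author's view, 'meaningless' means that an ID should not contain concrete business information, so that business changes cannot challenge its uniqueness. For example, avoid using an email address as an ID, and avoid embedding the user's birthday in an account ID. When an ID from one domain is referenced in another domain, it carries the business meaning of that external reference, and this should be avoided too. In the example below, the user membership table uses the user ID directly as its primary key; when a new requirement asks for one user to hold several types of membership at the same time, the table becomes very hard to extend."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"UserID \/\/ primary key, account ID, globally unique\nMembership \/\/ membership type of this account"}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Type"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The choice of type comes down to two options: character types and integer types. Both are, in essence, standards that grew out of supporting human thinking."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Integers are the standard of the human counting system. They come with a complete set of arithmetic rules grounded in natural axioms, and they are supported directly at every level: CPU, operating system, compiler and even middleware. Modern CPUs generally have 64-bit integer registers, MySQL offers auto-increment integer primary keys, and Redis keeps a pool of shared integer objects to save memory. Thanks to this support, integers are cheaper to store and faster to operate on, and operations on them usually take O(1) time."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Characters are the standard of the human language system and exist precisely for labeling things. Characters have no natural arithmetic rules; the usual comparison strategy is character by character, which takes O(n) time. How characters are implemented depends on the encoding standard and the operating system, for example ASCII, Unicode and UTF-8. Characters can label a much larger range, in theory an unbounded space."}]},
{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Comparison"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The strength of integer types is that they take full advantage of the performance optimizations in the underlying infrastructure, so that the supporting systems and middleware run at their best. This advantage comes from the computer architecture itself, and so do its constraints. The strength of character types is the huge identifier space, which is paid for in storage space and computing performance. The detailed comparison is as follows:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a5\/a5a9aa86c27a98e1d6a206eb9d3d7de0.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In terms of operations, integers have a natural advantage: comparing two integers takes O(1) time. Character types have no natural operations, and comparing them usually requires a defined collation. In MySQL and Oracle, for example, strings are compared position by position according to the collation in use (by character code value under a binary collation), which takes O(n) time."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In terms of identifier space, Int64 is limited to 64 bits; the space is enormous but still finite. The space that character types can address is unbounded, and business information is easy to embed. What makes this wider range possible? Simply using more bits of storage. The author also looks forward to 128-bit integers becoming available, just as integers grew from 16 to 32 to today's 64 bits. At that point the Int space will be large enough to convert directly to and from UUID."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In terms of constraints, Int is the most strictly standardized type. It leaves little room for improvisation and its width is tightly fixed, so it is a strongly constrained ID into which business information is hard to embed. String IDs are easy to extend, and equally easy to use in ways that violate the convention. For Int, this strictness is both a weakness and a strength."}]},
{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Choosing"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the rapid progress of CPUs and storage, storage cost and computational complexity are no longer decisive factors, and the respective advantages of String and Int IDs are no longer very pronounced. As a result, debates about which type is better are common on the Internet. The author's recommendations are:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Consistency matters most. Look at the existing systems: if they mostly use Int, related systems should keep using Int; if they mostly use String, use String."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Consider the basic characteristics of the middleware, that is, whether the chosen ID lets middleware such as MySQL or MongoDB run at its best. If most of the infrastructure is MySQL, an increasing Int ID is the better fit; if it is all MongoDB, its automatically generated character-type ID can be used."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"IDs for massive distributed data, such as log tracing across a service call chain, can use a character type, because character-type distributed IDs are simpler to generate. IDs that need to embed a fair amount of business data, such as Alibaba's SPM tracking codes, should use a character type."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A more general recommendation: first, put consistency first and avoid mixing Int and String IDs within one group of related systems. Second, prefer Int. Int is more strictly constrained, is better supported by middleware and is easier to migrate, and it usually copes with massive data volumes as well: the Snowflake algorithm and its improved variants can generate several hundred thousand IDs per second, which covers the vast majority of distributed scenarios."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Length"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now that storage is no longer a bottleneck, the recommendation for Int IDs is to mandate Int64. The author has twice been through a company-wide migration of user IDs from Int32 to Int64."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For String IDs, define a length convention. Some lengths to borrow from: a UUID is usually 36 characters, a MongoDB ObjectId is 24 characters, and a YouTube video ID is 11 characters."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There is a common pitfall here: JavaScript's support for Int64 is still insufficient, and an Int64 value that needs more than 53 bits loses precision. This is because JavaScript's built-in Number type is an IEEE 754 double-precision floating-point number, whose bit layout is shown in the figure below. The largest safe integer is the case where the 52 fraction bits are exactly used up (together with the implicit leading bit), that is:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"katexinline","attrs":{"mathString":"$$2^{53} - 1 = 9007199254740991$$"}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e3\/e369e9379159b0d836a207f871deceac.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JavaScript offers BigInt as a solution, but the author estimates that browser coverage will not be broad enough for general use until around 2025 ("},{"type":"link","attrs":{"href":"https:\/\/caniuse.com\/bigint?fileGuid=t3TV9DvWCJxTpgjy","title":"","type":null},"content":[{"type":"text","text":"https:\/\/caniuse.com\/bigint"}]},{"type":"text","text":")."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Increasing Trend"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because of the way database indexes and contiguous storage work, an increasing trend in IDs is highly desirable. It is not hard to achieve: standard ID generation schemes already support it with high performance. There are two kinds of increasing trend:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Non-strictly increasing: IDs generated later are, on the whole, larger than IDs generated earlier. Most RDBMSs store index data in B-tree structures, and the data in index pages is stored contiguously in logical order. With a non-increasing primary key, MySQL has to move data around to insert each new record in the right place, and frequent moves degrade database performance considerably. Distributed ID generation algorithms usually embed a timestamp in the ID to obtain an increasing trend, as the Snowflake algorithm and MongoDB's primary key do."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Strictly increasing: the next ID is guaranteed to be larger than the previous one. Special requirements such as transaction version numbers, incremental IM messages and ordering need strictly increasing IDs for business correctness. This constraint can usually only be implemented centrally on a single machine."}]}]}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Sparsity"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Increasing the sparsity between IDs raises the cost of malicious enumeration and collision attacks; the sparser, the better. YouTube, for example, identifies videos with 11 characters. Assuming a 64-symbol alphabet, 11 characters allow 64^11, about 7.4 × 10^19, possible combinations, while YouTube is estimated to host somewhere between 500 million and 10 billion videos. Even at the upper estimate, fetching a video by typing 11 random characters into the URL would succeed only about once in every 7 billion attempts."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The IDs produced by distributed generation strategies are usually sparse enough. In the Snowflake diagram above, a single machine can in theory generate at most 2^12 IDs per millisecond; adding the machine bits, the theoretical number of IDs that can be generated per second is about 4 billion. For most information resources, the IDs actually generated each second are scattered across that space of 4 billion possible IDs, which is sparse enough. For the PGC content produced at the author's company, for instance, the sparsity is usually above 10 billion."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ID sparsity greatly reduces the chance that unpublished resources are guessed and then probed for vulnerabilities in a targeted attack. Having a course leaked by an attacker before it is released, for example, puts the team in a very awkward position. In addition, if the total amount of data is confidential, sparse IDs keep attackers from inferring it."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sparsity does not, however, stop published resources from being enumerated: their IDs are easy to collect with a crawler. Fully public resources usually do not need such protection. In the table above, Weibo users and Meituan products still use consecutive Ints because they do not need to be kept secret. For highly confidential resources, sparsity alone is far from enough to enforce access control. It has to be combined with other measures, such as rate-limiting key APIs per token, and code obfuscation in the app together with request signing and encryption."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A MySQL auto-increment ID has a sparsity of 1, which makes it unsuitable for resources that must resist enumeration. The author has also seen companies go to the other extreme: every ID, including Snowflake IDs whose sparsity is already above 10 billion, must be converted into an encrypted String ID before being sent to the web client. Such decisions are made on assumption rather than analysis and add complexity without much value."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Practice"}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"ID Domains"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In a microservice architecture, when IDs from different services have to be aggregated, introducing a domain identifier is necessary, because each ID has to be resolved against the microservice that owns it."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The main purpose of an ID domain is to mark which business domain, and therefore which microservice, an ID belongs to. Adding a domain to an ID is like adding namespaces to a programming language. Should there be several levels of domains? That is possible, but each level adds complexity, so the usual recommendation is a single level that simply tells which business domain the ID belongs to. When an ID is passed to the aggregation layer, the domain identifier is passed and stored together with it, and a reverse-lookup service that maps a bare ID to its owning service should not be allowed."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"AType \/\/ domain identifier, a business category that maps to a microservice\nID \/\/ ID within this product type"}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Changes"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apart from the essential property, uniqueness, every other property of an ID may change: type, length, sparsity and monotonicity. The cost of change differs from property to property."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Changing the type is usually the hardest. Int and String are fundamentally different, and supporting systems such as databases implement them very differently, so besides the code changes, handling historical data is full of difficulties. Because ID references are everywhere, changing the type also forces changes in related systems that are very difficult or even impossible: it affects the storage of users' purchase records, browsing history and favorites, and even the daily reports of the big data team."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Changing sparsity or monotonicity only requires changing how IDs are generated. It is easy, for example, to switch from MySQL auto-increment to an ID generator in order to increase sparsity. Handling historical data, however, remains difficult."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Changing the length is relatively easy. Going from Int32 to Int64, for example, involves no change to business logic and no processing of historical data. As mentioned above, the author has been through two company-wide Int32 to Int64 migrations, and both went smoothly. Some people worry that Snowflake's 41-bit timestamp lasts only about 69 years, but the author expects 128-bit integers to be common by then, so one more length change will solve it."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Compatibility"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ideally, a suitable ID generation strategy is chosen when the system is first built, so that sparsity, length, generation method and type all meet the requirements. But what if the choice was poor and can no longer be changed? Make it compatible. Compatibility always involves converting between related IDs, and the guiding principle is to "},{"type":"text","marks":[{"type":"strong"}],"text":"limit the spread of dual IDs as much as possible"},{"type":"text","text":". Consider the following two approaches:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Store the mapping. For example, apply a salted hash to the numeric ID (or generate a corresponding UUID) to obtain a suitable String ID, and store the pair in the database. Again, limit the spread of dual IDs as much as possible."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Convert reversibly with a rule or a symmetric encryption algorithm, without storing anything. An Int is fairly easy to convert to a String by rule or algorithm, but not the other way around, since the huge String space cannot be compressed into the smaller Int space by an algorithm. With this approach, take great care that the new ID is not persisted by related systems, including client-side databases, otherwise the algorithm can never be changed."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In practice, neither approach is ideal, but each has scenarios where it is needed."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the figure below, an aggregation-layer service combines different resource IDs, some from underlying services that use Int and some from services that use String. Approach 1 applies here: the aggregation service keeps the ID conversion inside itself and still exposes the original types, which avoids spreading dual IDs. The author is against having the source service expose both types of ID, because that spreads dual IDs, complicates references, and leads to a series of maintenance problems."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e1\/e1fcf689d08220eadcaf81e0793125cd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the figure below, both the underlying services and the aggregation-layer services have already stored large numbers of consecutive (insecure) IDs, and the security problem is exposed later. Approach 2 can be used here. The first step is to stop the bleeding by switching ID generation from auto-increment to an ID generator; the second is to encrypt IDs with an SDK before exposing them, to reduce the risk of enumeration. As the figure shows, however, every SDK uses the same secret key, which breaks the boundaries between systems, and rotating the key may itself cause incidents. This is picking the best of a bad lot. If readers have a better practice, such as interface adaptation, they are welcome to discuss it in the comments."}]},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3a\/3a81d422b3d2a59dc15f315bbbc94118.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ID choices are often made casually, yet their impact is far-reaching. The problem is not technical complexity but how well the properties of an ID are understood and how well the architecture constrains them. With microservices in particular, each service is designed independently and ID choices get out of control even more easily."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article analyzed IDs in terms of type, length, uniqueness, sparsity and monotonicity, and proposed some general constraints. It also gave a rough recommendation: for business data, use Int64 IDs produced by an ID generator and keep an increasing trend; for massive distributed data in a narrow domain, a standard String ID such as UUID is acceptable."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The guiding principle for compatibility is to limit the spread of dual IDs. You may have run into the practices listed here and formed your own views; you are welcome to discuss them in the comments, and even more welcome to share other scenarios and solutions you have encountered."}]},
{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"About the Author"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"奇正 was previously a senior engineer at Adobe and Baidu and is now technical director at an Internet company, focusing on business architecture and project management."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}
]}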