Data Catalogs Are Dead; Long Live Data Discovery

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As companies increasingly rely on data as a key asset to power digital products, inform decision making, and drive innovation, understanding the health and reliability of that data has become fundamental to the business. For decades, organizations have relied on data catalogs for data governance. But is that enough?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this article, Debashis Saha and Barr Moses discuss why data catalogs fall short of the needs of the modern data stack, and how a new approach, data discovery, provides the path that metadata management and data reliability need. Saha is VP of Engineering at AppZen and previously worked at eBay and Intuit; Moses is co-founder and CEO of Monte Carlo."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is no secret that understanding how data affects the business requires knowing where the data lives and who has access to it. In fact, building a successful data platform hinges on organizing and managing data centrally while keeping it easy to access."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Much like the book catalog of a physical library, a data catalog serves as an inventory of metadata and gives users the information they need to evaluate data accessibility, health, and location. In the current era of self-service business intelligence (SSBI), data catalogs also serve as powerful tools for data management and data governance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As a result, building a data catalog is a top priority for most data leaders."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At a minimum, a data catalog should be able to answer the following questions:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Where should I look for my data?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Does this data matter?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What does this data represent?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Is this data relevant and important?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How can I use this data?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, as data operations mature and data pipelines grow more complex, traditional data catalogs show their limits and often fail to meet these requirements."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is why some of the best data engineering teams keep innovating on their approach to metadata management. Here is what they are doing."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Where data catalogs fall short"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data catalogs can document data, but they still largely fail to solve the fundamental problem of letting users “discover” data and gain real-time insight into its actual state."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data catalogs as we know them have not kept pace, for three main reasons: (1) a lack of automation, (2) an inability to scale as the data stack grows and diversifies, and (3) a format that is not distributed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"A growing need for automation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Traditional data catalogs and governance approaches typically rely on data teams to do the heavy lifting of manual data entry and to update the catalog as data assets change. This approach is time-consuming and requires a great deal of manual effort. Automating this work would free up data engineers and analysts to focus on projects that actually deliver value."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Keeping track of the state of data is an ongoing job for data professionals, which means the industry needs more powerful and more customizable automation. The following scenario may sound familiar:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before a meeting with stakeholders, do you often find yourself frantically scrolling through chat history to figure out which data sets feed a given report or model? To work out why the data behind a key report stopped updating last week, do you and your team huddle around a whiteboard to map out every upstream and downstream connection?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Details aside, the problem probably looks something like the picture below. To everyone involved, data lineage looks like a tangled mess."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/6c\/6c7cb97a85961fbad4c37adabcc0d0a4.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: image by EgudinKa on Shutterstock: "},{"type":"link","attrs":{"href":"http:\/\/www.shutterstock.com\/","title":"","type":null},"content":[{"type":"text","text":"http:\/\/www.shutterstock.com\/"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If this sounds familiar, you are not alone. Many companies that need to untangle this mess have spent years manually mapping their digital assets. Some invest resources in short-term fixes or use in-house tools to search and browse their data. Even when these efforts reach their goal, they place a heavy burden on the organization and cost the data engineering team time and money that could have gone to product development or to actually using the data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"The ability to scale as data changes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data catalogs work well for structured data, but in 2020 not all data is structured. As machine-generated data grows and companies invest in machine learning initiatives, unstructured data is increasingly the norm, accounting for more than 90 percent of all newly generated data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unstructured data is typically stored in data lakes, has no predefined model, and must go through multiple transformations before it can be used. It is also highly dynamic: its shape, source, and meaning change constantly as it passes through processing stages such as transformation, modeling, and aggregation. What we do with unstructured data (transforming, modeling, aggregating, and visualizing it) makes it hard to catalog in its “desired state”."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For data consumers, a catalog needs to do more than give a basic description of the data they access and use. More importantly, it needs to convey an understanding of the data based on the consumer's intent and purpose. A data producer's description of an asset may differ sharply from a data consumer's understanding of what it does, and even two data consumers may differ widely in how they interpret the same data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, a data set pulled from Salesforce can mean something quite different to a data engineer than to someone on the sales team. The data engineer may understand what the “DW_7_V3” field means, but the sales team will be left scratching their heads, unable to tell whether the Salesforce data set has anything to do with their “2021 Revenue Forecast” dashboard. The list of such examples goes on."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Static descriptions of data are inherently limited. In 2021, truly understanding data means accepting and adapting to the fact that it is dynamic and constantly evolving."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Data is distributed; catalogs are not"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Modern data architectures are becoming distributed (see the article on “the data mesh”), and semi-structured and unstructured data are becoming the norm, yet most data catalogs still treat data as a one-dimensional entity. As data is aggregated and transformed and flows through different parts of the data stack, it becomes very difficult to document."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/2e\/2e57dd7f48e40f160110ffccab0c3ccd.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Caption: Traditional data catalogs generate metadata (data about the data) at the time the data is ingested. But data changes constantly, which makes it hard to keep track of its health as it evolves in the pipeline. Image courtesy of the author, Barr Moses."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today, data is trending toward being self-describing: the data itself is packaged together with the metadata that describes its format and meaning."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because traditional data catalogs are not distributed, they can hardly serve as a single source of truth (SSOT) for data. The problem will only get worse as a wider range of users, from BI analysts to operations teams, demand easy access to data, and as the pipelines that power machine learning, operations, and analytics grow more complex."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A modern data catalog needs to federate the meaning of data across domains. Data teams need to understand how data domains relate to one another and which aspects call for an aggregated view. Questions that span distributed data can only be answered as a whole with some degree of aggregation. In other words, what is needed is a distributed, federated data catalog."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Building a better data platform means taking the right approach to the data catalog from the very start. Doing so helps teams democratize data, simplify data exploration, bring important data assets together, and realize as much of the data's potential as possible."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Data discovery: the next version of the data catalog"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For a data catalog to work properly, the data needs a rigid model. But as data pipelines grow more complex and unstructured data becomes pervasive, our understanding of what the data does, what it is for, and how it is used may no longer reflect reality."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We believe that next-generation data catalogs need the ability to learn, understand, and reason about data, so that users can put its insights to work in a self-service way. But how do we get there?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/47\/47718d8ff10a69ba414f97a96d3eab3b.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Caption: Data discovery provides distributed, real-time insight into data across different domains while adhering to a single set of central governance standards. Image courtesy of the author, Barr Moses."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond the data catalog itself, metadata and data management strategies must also incorporate data discovery. Data discovery is a new approach to understanding the health of distributed data assets in real time. It borrows from the distributed, domain-oriented architecture proposed by Zhamak Dehghani and the data mesh model from Thoughtworks. Data discovery holds that each data owner should be accountable for their data as a product and should facilitate communication between data distributed across different locations. Once data has been served to a given domain and transformed there, the domain's data owners can use it for their own operational or analytical needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data discovery can replace the data catalog by providing a dynamic, domain-specific understanding of data based on how a particular set of consumers ingests, stores, aggregates, and uses it. As with a data catalog, governance standards and tooling are federated across domains to allow greater accessibility and interoperability. Unlike a data catalog, which presents the data's ideal or “cataloged” state, data discovery shows the current state of the data in real time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data discovery can answer questions not only about the data's ideal state but also about its current state in each domain:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Which data set is most recent? Which data sets can be deprecated?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When was this table last updated?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What does a given field mean in my domain?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Who has accessed this data? When was it last used, and by whom?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What are the upstream and downstream dependencies of this data?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Is this data production quality?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Does this data meet my domain's business needs?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What do users expect of this data, and can those expectations be met?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We believe the next generation of data catalogs, that is, data discovery, should have the following capabilities:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Self-service discovery and automation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data teams should be able to use the data catalog easily, even without a dedicated support team. Self-service, automation, and workflow orchestration in data tooling prevent silos from forming between the stages of the data pipeline and make data easier to understand and access. Better accessibility naturally leads to wider data adoption, which in turn lightens the load on the data engineering team."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Scalability as data evolves"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As companies ingest more and more data and unstructured data becomes the norm, the ability to scale to meet these demands is critical to the success of data initiatives. Data discovery uses machine learning to build a holistic view of data assets as they scale, so that users' understanding keeps pace with the data as it evolves. Data consumers can then make better-informed, more timely decisions instead of relying on outdated documentation (that is, metadata that has itself gone stale) or, worse, on gut feeling."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"ty
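pe":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Data lineage for distributed discovery"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data discovery relies heavily on data lineage. Automated table-level and field-level lineage maps the upstream and downstream dependencies between data assets. Lineage surfaces the right information at the right time, which is a core function of data discovery. It also shows how data assets relate to one another, so that users can troubleshoot more effectively when a data pipeline breaks. Such breakages are increasingly common as the modern data stack evolves to accommodate more complex use cases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Data reliability: the gold standard for data you can use at any time"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In fact, your team is probably already investing in data discovery in some form, whether through manual data validation, custom validation rules written by engineers, or simply the cost of decisions made on broken data or on errors that went unnoticed. Modern data teams have started to use automated approaches to make sure data can be trusted at every stage of the pipeline, ranging from data quality monitoring to more robust end-to-end data observability platforms that monitor the pipeline and alert on issues. When data breaks, these solutions notify you so that you can identify the root cause right away, fix the problem quickly, and prevent further downtime."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With data discovery, data teams can confirm that their assumptions about the data match reality, which enables dynamic discovery and high reliability across the entire data infrastructure, regardless of domain."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Looking ahead"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bad data is worse than no data. Likewise, a data catalog without data discovery is worse than having no data catalog at all. To make data truly discoverable, it is not enough for it to be “cataloged”; the data in use must also be accurate, clean, and fully observable. In other words, it has to be reliable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"tex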
t","text":"Strong data discovery depends on data management that is automated, scalable, and suited to the newly distributed nature of data systems. To truly enable data discovery in an organization, therefore, we need to rethink how we approach the data catalog."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Only by understanding the state of your data and how it is used, across domains and throughout its life cycle, can you begin to trust it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Barr Moses is CEO and co-founder of Monte Carlo. Monte Carlo's mission is to work broadly with the data community to unlock the full potential of data and help companies realize its value, with a focus on making data reliable and reducing operational complexity for customers."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/data-catalogs-are-dead-long-live-data-discovery-a0dc8d02bd34","title":"","type":null},"content":[{"type":"text","text":"https:\/\/towardsdatascience.com\/data-catalogs-are-dead-long-live-data-discovery-a0dc8d02bd34"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}