Transwarp's R&D Director Explains the Story Behind TDH 8.0 | Essential Reading for TDH 8.0 Users

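{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In March 2021, Transwarp released version 8.0 of TDH, its high-speed big data platform. Many users are keenly interested in this release. This series of articles introduces the new features and technical innovations of TDH 8.0 one by one, to help enterprise data platform users understand cutting-edge big data technology more fully and deeply and make better technology choices.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"The Product Mission of TDH","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let's start with where the name TDH comes from. TDH stands for Transwarp Data Hub. Put simply, Data Hub means that we want to build the hub for big data.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0f/0f88709b38b20f4a5b05a226d2d8ce03.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ever since Transwarp was founded in 2013, we have wanted to ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"provide a big data platform and a set of tools so that users can bring all their data together, work on it with those tools, and create value for their businesses.","attrs":{}},{"type":"text","text":" To achieve this, the platform needs to meet the following requirements:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, it is enterprise software: it is made up of many submodules and is fairly complex;","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, it must meet one-stop data processing needs, helping users complete the full data processing pipeline;","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Third, it must handle multiple data models: structured data, graph data, text data, and so on;","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, it must have strong storage and compute capabilities, so that customers can explore the value in massive volumes of data.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Actually building an enterprise-grade, one-stop, multi-model big data platform is quite hard. Transwarp's big data platform has overcome many technical challenges along the way, and today's discussion centers on the multi-model big data platform.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/62/620278300c3a5cf09214ec4975d7786b.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I remember that when Transwarp was founded in 2013, big data was very hot: new technologies kept emerging, and the market as a whole was still feeling its way with them. Most big data infrastructure software companies of that period chose relatively mature open-source products and assembled them directly into their own big data solutions. Their reasoning was that many internet companies in China and abroad had already proven the technology reliable, so there was no need to reinvent the wheel.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To this day, from a technical point of view, I still do not think that was the right approach, especially for infrastructure software.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We are dealing with complex enterprise systems, and we have to acknowledge the complexity of the problems in front of us. A solution stacked together directly from open-source products can solve problems in the specific scenarios each product targets, but deciding which scenario calls for which product takes considerable expertise. More importantly, our ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"enterprise customers' businesses have a very long history, far longer than that of internet companies and longer than that of big data technology itself.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/05/053e93cba6b8c5afacff4bc9785bccdc.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Measured against their businesses, big data technology can solve some pain points, but it is not systematic enough. Users cannot keep developing over the long term on just one or two products. There are two reasons: first, open-source big data technologies offer relatively limited functionality; second, most open-source communities are still led by engineers outside China, so the problems of domestic scenarios receive less attention.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is completely different from internet companies, which have no legacy business and can build their business around the technology, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"so we cannot assume that an open-source technology validated at internet companies will necessarily work for traditional enterprises.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of course, by now big data technology has been proven fit for enterprises' critical production systems, which is something Transwarp has always maintained. But how to build a good product that incorporates these technologies while still supporting complex enterprise scenarios has been a constant headache for me and my team.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"TDH Architecture Design Principles: Users First, Efficiency Second","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First came the question of cost. As a startup, especially in the first few years, we did not have enough engineers to study every open-source product on the market in depth. So the path we chose was to ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"learn the core big data technologies while writing as much of the product code ourselves as possible","attrs":{}},{"type":"text","text":", iterating on and improving some of those technologies along the way.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In-house development may be slow in the early stages of building a product, and quality can be hard to control. But once a prototype exists, later iterations move quickly. The reason is simple: you know your own architecture well, you know exactly where to extend and where to refactor, and the evolution of the code stays under reasonable planning and control. As one of my colleagues put it: it is all code we wrote ourselves, so what could we not implement?","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bf/bf066213bbb584c8baa4bf59a98c0f56.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because we had few people and the platform needed many features, from the earliest design ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the overall TDH architecture was well modularized, so each engineer could focus on work within their own module","attrs":{}},{"type":"text","text":". This was efficient and easy to test. Experienced engineering leads defined interfaces with more room for extension, and we also planned for further iteration as requirements evolved.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So the external factor was the complex enterprise scenarios we faced, and the internal factor was our wish to build a self-developed, controllable big data platform in an efficient way. Together, these led us to ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"settle on abstracting a unified distributed compute engine and a unified distributed storage engine, with each product team implementing its own storage structures","attrs":{}},{"type":"text","text":" on top to meet customers' business needs. This design also laid the foundation for the multi-model big data platform we have today.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the architecture evolved, customer requirements repeatedly confirmed that this design was right.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Here is an example","attrs":{}},{"type":"text","text":". During one graph database deployment, we found while building the graph that one vertex had an unusually large in-degree and out-degree, thousands or even tens of thousands of times the graph's average. Curious, we wanted to inspect the raw data, so we used a session-level hot configuration to switch the engine used by the graph database into SQL mode, and found that the data had been matched against the wrong schema, producing a large amount of bad data. That is one benefit of a unified engine. The unified storage engine is similar: when scaling out or in, or when a disk fails, the operational procedures and commands are the same regardless of the data model, so there is no need to learn a separate set of operations for each component.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Leaving aside distributed systems such as ElasticSearch, whose approach to distributed operations is rather idiosyncratic, simply learning the different command sets takes real effort. Transwarp's multi-model big data platform has other nice properties as well: multiple data models can be processed in a single process or in separate processes, which makes resource usage easier to tune; strong SQL support lowers the cost of migrating existing workloads; and a unified operations approach and philosophy makes operations somewhat easier.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Eight Years of Team Effort: What Makes the TDH Architecture Advanced","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We can illustrate this with a few concrete comparisons:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bf/bf066213bbb584c8baa4bf59a98c0f56.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Integrated vs. Assembled","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open-source software usually targets one or a few specific scenarios, so an open-source big data platform that supports enterprise-grade requirements has to be assembled from many components. Compared with the open-source big data stack, Transwarp's big data platform software is more capable, and its architecture is far less complex than the Hadoop ecosystem. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"For the same level of functional complexity, Transwarp uses far fewer components and modules than a solution assembled from open-source products, and that is an advantage.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because it is simpler, unnecessary interactions between components are eliminated. Admittedly, for scenarios with a single, narrow requirement our platform is still somewhat heavyweight today. As the software matures, we will slim it down through modularization and similar means so that it fits small scenarios better.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Traditional Enterprise Scenarios vs. Internet Scenarios","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We touched on this topic earlier; let's go into more detail here. Traditional enterprises have long histories. Take banking as an example: the business is in fact very mature. When we talk about creating new scenarios and new value, the first thing to consider is compatibility. We cannot bypass the existing business to create new business; that is unrealistic. So in practice, the first question we consider is how existing workloads can be migrated to TDH smoothly.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I think internet companies and traditional enterprises face two different classes of problems. The technologies used to solve them can borrow from each other, but you cannot say that one is more advanced or more useful than the other. It is a bit like 'Guan Gong fighting Qin Qiong': pitting against each other two warriors who lived centuries apart.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"When choosing a technical direction, TDH likes to try new technologies, but it does not chase novelty for its own sake; it looks for what fits.","attrs":{}},{"type":"text","text":" A new technology, a valuable technology, must be able to land in enterprise applications. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Whether it can be put into production is the most important criterion in our technology choices.","attrs":{}},{"type":"text","text":" As a result, TDH uses new big data technologies while staying very down-to-earth in deployment, iterating continuously around customer needs. This is a healthy cycle, and it will gradually build the product's core competitiveness.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. JVM vs. C Lang","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"People in the tech community often face this choice. Let me state our view directly: Java is easy to learn but hard to master, while native languages have a higher ceiling. Transwarp's unified compute engine runs mainly on the JVM, while the storage engine is written in C++. This combination suits our customers' current needs well. The storage engine is stable: we used C++ to build a solid memory model and transaction management, and capabilities such as disaster recovery and scale-out keep improving with each release. The compute engine is feature-rich, and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"in our code we pay particular attention to fitting the JVM's GC model and JIT, which lets us quickly develop a compute engine that is strong in both performance and functionality.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Challenges · Experiments · Goals · Waiting for You","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over the past year or more, our team has kept experimenting in order to break through on several key performance targets. Getting from the initial idea of this architecture to actually building it was not smooth sailing; it was a fairly bumpy road. Development was a matter of hitting one pitfall after another. What I remember most is dealing with problems in low-level runtime components such as the operating system and the JVM. The classic case, of course, is battling the GC, but that is so routine that there is not much to say about it. Today I would rather tell a slightly more obscure story, one about the JIT.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The JIT is key to the performance of a Java program; how well a piece of Java code runs depends on how the C2 compiler handles it. We ran into many cases where performance degraded at runtime, meaning, simply put, that the program got slower the longer it ran, and by reading the JIT-generated assembly we found the crux of some of these problems. Since then, when designing our engineering frameworks, we have paid close attention to how code behaves after JIT compilation. Without solving these problems, we could not have put such complex functionality into a single JVM to support so many data models. China's homegrown infrastructure software is still young, and we have a great deal of work ahead of us. We will invest more effort in the platform's usability, stability, and performance, while also developing more features. We hope TDH can help customers create greater value.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article: ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/_O17d9Q-eYwRutGXd58-0Q","title":"","type":null},"content":[{"type":"text","text":"Transwarp's R&D Director Explains the Story Behind TDH 8.0 | Essential Reading for TDH 8.0 Users","attrs":{}}]}]}
]}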