Real-Time Computing: Business Disadvantages, Common Misconceptions, and Ways to Improve

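{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#1E4E79","name":"user"}}],"text":"How Does a Technical Advantage Become a Business Disadvantage?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The term “real-time” is too vague, so let us quantify it in terms of “timeliness”:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A timeliness of one day or longer is what business practice conventionally calls “offline computing”;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A timeliness on the order of hours is what we call “near-real-time computing”;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A timeliness on the order of minutes or seconds is what we call “real-time computing”."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Timeliness” is often confused with “time granularity”, but the two are not directly related:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Offline computing” with a timeliness of one day can still produce results at a time granularity of one second; it is just that the results for yesterday's data only come out today;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Real-time computing” with a timeliness of one second can likewise produce results at a time granularity of one day; the current day's result is output the same day and refreshed every second, but it remains incomplete until the day ends."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Clearly, high-timeliness computing has a big advantage in applicability: a high-timeliness technology can serve scenarios with low timeliness requirements, whereas a low-timeliness technology can never satisfy a scenario that demands high timeliness."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The defining trait of real-time computing is high timeliness. What effect does this have on data business?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Somewhat counterintuitively, the effect is rather negative:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Its high timeliness places real-time computing downstream, at the tail end, of the life cycle of data business innovation and rollout."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is in fact the current state of real-time computing in business, and the logic is not hard to work through:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"You need real-time computing only when you have to watch and act on information that is current to the minute or second. Do not confuse “timeliness” with “time granularity” here: if what you need is historical data at minute granularity, you do not need real-time computing."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Even a data business that does use high-timeliness data needs real-time computing only at the production stage. If we divide the innovation and rollout process of a data business into four stages (exploration, research, experimentation, and production), the first three may need data at fine time granularity but hardly ever need high-timeliness data: exploration, research, and experimentation each take days, so data that is fresh to the minute or second adds little there."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In a data business, exploration, research, and experimentation carry high added value, yet real-time computing cannot take part in them; it appears only in the final production stage. Although it is technically indispensable, real-time computing sits downstream at the tail end of the business process and therefore contributes comparatively little to innovation and rollout."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Combined with the current division of data work into research and engineering, this high timeliness puts real-time computing work in a real bind."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As an aside, going back to the quantification at the start: from a data business perspective, “real-time computing” and “offline computing” probably differ mainly in degree of timeliness. Yet by using a binary definition such as “real-time computing” versus “offline computing”, we may unconsciously walk into a cognitive trap, which has also led to some misconceptions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#1E4E79","name":"user"}}],"text":"What Misconceptions Do We Commonly Hold?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are two main widespread misconceptions:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"We subconsciously assume that the more timely the data, the more valuable it must be. This ignores how the business actually works, so we fail to focus on the business areas where real-time computing fits best and adds the most value."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"We subconsciously assume that the key to advancing real-time computing, and its bottleneck, is technical: as long as the technology improves, business use will follow. This makes us used to waiting passively for requirements and, given the nature of real-time computing, makes it even easier for us to drift away from the business."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On closer analysis, these misconceptions are also closely tied to the way data work is currently divided."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today, data work is usually split into research and engineering, or put simply, into two roles: data scientist and data engineer. That is the case at the author's company and team."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the early days, one person usually filled both roles. Later, because of the technical barrier and the development time involved, some people split off to work full time as data engineers, responsible for data development, business implementation, and eventually platform building. This is more efficient and scales better."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Real-time computing naturally falls mostly within the data engineer's remit. On the business side, however, this division of labor causes some deeper problems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, as noted above, the high timeliness of real-time computing limits its involvement in the innovation and development process of data business. Under the current division of labor, if data engineers focus only on business implementation, their room in the business naturally shrinks even further."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, thanks to technical progress and the frameworks that data engineers have built on top of it, the cost and barrier of developing real-time computing applications have already dropped substantially. In the future, data scientists will be better placed to do the implementation themselves, which will further squeeze the data engineer's room."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, because of the high timeliness of real-time computing, the businesses it serves tend to be more customized. Under the current division of labor, getting real-time computing adopted and rolled out depends on, and is constrained by, the requirements of data scientists, which has some negative effects:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Businesses that need high-timeliness data are highly customized, so during exploration, research, and experimentation data scientists may find it hard to use the existing platform and instead build their data pipelines by hand. This creates path dependence and leaves them with little awareness of real-time computing solutions;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data engineers treat data scientists as the main audience for their work, but because data scientists have little awareness of real-time computing, they cannot give data engineers useful feedback, which holds back the rollout and adoption of real-time computing."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These factors interact and intertwine to form the bottleneck that blocks the rollout of real-time computing. This is the predicament real-time computing faces today."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In real-time computing, focusing only on implementation and waiting passively for requirements will increasingly hold back business innovation and rollout; it is not sustainable. The question is no longer whether something can be built; we need to think more about what to build."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#1E4E79","name":"user"}}],"text":"How Should We Adjust Our Strategy for Advancing the Business?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The predicament of real-time computing arises from the division of labor in data work resonating with the high timeliness of real-time computing. To get out of it, we can first try to break through this division of labor, face the product directly, and actively look for businesses we can enter:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Get close to the business, and actively find problems and push for their resolution, while watching out for the classic “we need a faster horse” kind of pseudo-requirement;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Within the innovation and rollout process of data business, take part in stages further upstream through a sensible division of labor;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Approach adoption and rollout from a product-facing perspective."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, the work of advancing real-time computing used to be aimed mainly at data scientists: data engineers implemented the real-time part according to requirements the data scientists had already broken down. Now, data engineers and data scientists together face the product's full requirements directly and take part in the upstream part of the data business through a sensible division of labor."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This has two benefits: both sides can fill the gaps in their business and technical knowledge, and the work of promoting real-time computing can be re-aimed at the product, with better feedback and results."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, we should find valuable business areas for real-time computing and the right way to enter them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The top-priority businesses that real-time computing should enter ought to have two characteristics:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"High-timeliness data can generate more value, or high-timeliness data is simply required;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"The business outcome can be quantified as objectively as possible, with little need for subjective human judgment."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Personalized recommendation, for example, is a business that suits real-time computing, and one angle of entry is to systematize and platformize the whole business process."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, personalized recommendation directly satisfies both characteristics:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Many controlled experiments have shown that real-time recommendation performs better, and some recommendation businesses must be based on real-time data;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Generally speaking, hit rate and usage rate are enough to measure the value of the business."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Strictly speaking, personalized recommendation is not an especially “novel” place to focus. Intuitively, when real-time computing comes up, personalized recommendation is the first application most people think of. That does not mean, however, that there is little room to maneuver."}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Personalized recommendation takes many forms, such as product recommendation, friend recommendation, and battle-assist (助戰) recommendation. Unifying them leaves plenty of work to do;"}]}]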
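},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Personalized recommendation is not just the recommendation itself; it also involves probability control, public-opinion control, tier upgrades and downgrades, fallbacks, and other factors, so there is plenty of room to work at the whole-system level as well."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, data alerting (abnormal users, abnormal transactions, and so on) and data interfaces (real-time dashboards and so on) are also top-priority businesses that suit real-time computing. The analysis is similar; the key lies in the angle of entry."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}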