Data Collection Solution Design and Practice

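{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"I. What Is Data Collection?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Data collection” refers to the techniques, and the process of applying them, for capturing, processing, and sending specific user behaviors or events, such as how many times a user taps a given icon or how long they watch a given video."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Technically, data collection comes down to listening for events while the application runs, then checking and capturing the events of interest when they occur."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A data analysis pipeline usually consists of five steps: data collection, data transmission, data modeling, data statistics\/analysis, and data visualization and feedback. We consider the first step, data collection, the most critical. How rich the collected data is, how accurate it is, and how promptly it arrives all directly affect the quality of the analysis, so choosing the right collection method and deciding which data to collect are essential to doing data analysis well."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"II. The Current State of Data Collection"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Enterprises run into all sorts of questions when collecting data: how to collect, what to collect, and by what means?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today, enterprises usually take one of three approaches: third-party analytics tools, statistical analysis on the business database, or code instrumentation through back-end interfaces combined with the business database for fine-grained analysis."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Third-party analytics tools"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Third-party analytics tools such as Umeng (友盟) and Baidu Tongji (百度統計) let you view statistics directly once you embed an app SDK or JS SDK. This approach is simple and free, and it largely covers high-level, basic analysis needs such as visit counts and active users. But users of such tools soon find that simplicity and zero cost come with some problems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"a. "},{"type":"text","marks":[{"type":"strong"}],"text":"Incomplete data collection makes in-depth analysis impossible"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These SDKs can only collect basic user-behavior data, such as basic device information and the basic actions a user performs; they cannot capture finer-grained dimensions. For example, in a submit operation, the personal information and the service item being submitted cannot be collected, so later analysis has nothing to work with, like a cook asked to make a meal without rice."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"b. "},{"type":"text","marks":[{"type":"strong"}],"text":"Security concerns"},{"type":"text","text":": with cloud-based analytics platforms, many enterprises are unwilling to put core data on a third party's platform."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"c. "},{"type":"text","marks":[{"type":"strong"}],"text":"Being tied to a third-party platform limits flexibility and cannot meet customization needs, "},{"type":"text","text":"such as dynamically controlling the various collection policies."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Statistical analysis on the business database"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Routine statistics computed from data already stored in the business database, such as service items and user information, are real-time and accurate, but this approach has shortcomings too."}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, performance is poor and batch data operations are impractical. Business tables are designed for small, high-concurrency, low-latency operations, whereas analysis typically runs batch operations over large volumes of data, so it performs badly."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, necessary data fields are missing. The business database exists to keep normal business running, and some information that analysis needs never appears in it. Browser version and device information, for example, are needed to analyze user conversion across device versions but play no part in the normal business flow, so that analysis cannot be done."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Code instrumentation through back-end interfaces"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Code instrumentation, where pre-written code sends data when an action occurs on a control, can capture user-behavior data at a fine granularity. However, user binding, user behavior-path analysis, page-duration statistics, and the downstream flow of data are not handled systematically, and the final analysis usually still has to be combined with the business database, so it suffers the same performance problems as analysis on the business database."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"III. Overall Design of the Data Collection System"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given the situation above and the characteristics of our government-services project, we built our own data collection system. We wanted to keep the collected data secure, carry user and event properties to support fine-grained data operations, preserve the performance of batch analysis, and be able to change collection policies dynamically and flexibly as requirements shift."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"conten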
t":[{"type":"text","text":"Our data collection system consists of five modules:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"front-end SDKs, back-end services, an ETL parsing module, a WebSocket service for visual tracking, and an admin console."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The system supports SDK integration across several front-end channels (PC, mobile on iOS and Android, and mini programs) and three collection methods: code-based tracking, full tracking, and visual tracking."}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a. Code-based tracking"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is the most commonly used collection method today. It mainly covers JS tracking for web and H5 pages, iOS and Android tracking on mobile, and WeChat mini programs. Because the tracking points are written in code, its strength is that the collected data is comprehensive and accurate."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b. Full tracking"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with traditional collection, full tracking is simpler and faster, and it shows how page elements are being clicked, which helps you understand your own product. Its drawback is that it collects too much: every clickable element is captured, so more data is uploaded and more traffic is consumed. It also cannot capture deeper dimensions such as event properties and user properties."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"c. Visual tracking"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Visual tracking builds on full tracking: business colleagues select elements on the page, and only the selected elements are collected."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Visual tracking is largely the same as full tracking and shares its strengths and weaknesses. It fixes the noisy-data problem of full tracking, but every change to the page structure invalidates the selections, which then have to be redone, so the workload for business staff is considerable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Different combinations of collection methods can be chosen for different business needs, which is what makes truly fine-grained data collection possible."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also support cloud-based configuration, so collection policies can be changed in real time without the app release that such a change traditionally requires."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b6\/b65cc99dae6318fdd81dde74d2f829ee.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"IV. Core Technical Implementation"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Full tracking"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a. Intercepting system events in code"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take iOS as an example. To insert code dynamically before and after a method call, Objective-C lets us use the runtime and hook the relevant methods with Method Swizzling. To make hooking convenient for every class, we can add a category on NSObject named NSObject+MethodSwizzling:"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"+(void)swizzleMethod:(SEL)originalSelector 
swizzledSelector:(SEL)swizzledSelector {\n    Class class = [self class];\n    \/\/ Original method\n    Method originalMethod = class_getInstanceMethod(class, originalSelector);\n    \/\/ New method that replaces the original\n    Method swizzledMethod = class_getInstanceMethod(class, swizzledSelector);\n    \/\/ First try to add the swizzled IMP under the original SEL, in case this class does not implement that SEL itself\n    BOOL didAddMethod = class_addMethod(class, originalSelector,\n        method_getImplementation(swizzledMethod),\n        method_getTypeEncoding(swizzledMethod));\n    if (didAddMethod) { \/\/ Added: this class had no IMP of its own for the original SEL, so point the swizzled SEL at the original IMP\n        class_replaceMethod(class, swizzledSelector,\n            method_getImplementation(originalMethod),\n            method_getTypeEncoding(originalMethod));\n    } else { \/\/ Not added: the original SEL already has an IMP, so simply exchange the two IMPs\n        method_exchangeImplementations(originalMethod, swizzledMethod);\n    }\n}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b. Full collection"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Full collection works by hooking the AppDelegate delegate methods, the UIViewController lifecycle, button taps, gesture events, the tap callbacks of various system controls, application state changes, and so on. "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"
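Action
Event
UIViewController lifecycle methods
Add a category on UIViewController and hook the lifecycle methods
Taps on UIButton and similar controls
Add a category on UIButton and hook the tap event
Gesture events: UITapGestureRecognizer, UIControl, UIResponder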
The corresponding system events"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Taking PV (page view) events as an example, we hook UIViewController:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\/\/ dispatch_once inside +load ensures the swizzling runs only once, even if +load is called manually.\n+ (void)load {\n    static dispatch_once_t onceToken;\n    dispatch_once(&onceToken, ^{\n        @autoreleasepool {\n            [[self class] swizzleMethod:@selector(viewDidAppear:) swizzledSelector:@selector(zg_viewDidAppear:)];\n            [[self class] swizzleMethod:@selector(viewDidDisappear:) swizzledSelector:@selector(zg_viewDidDisappear:)];\n        }\n    });\n}\n\n- (void)zg_viewDidAppear:(BOOL)animated {\n    \/\/ Build the PV event payload and hand it to the tracker\n    CCBFT *tracker = [CCBFT sharedInstance];\n    NSMutableDictionary *data = [NSMutableDictionary dictionary];\n    [data setObject:@\"pv\" forKey:@\"$eid\"];\n    [data setObject:isNil([self CCBFTScreenName]) forKey:@\"$url\"];\n    [data setObject:isNil([self CCBFTScreenTitle]) forKey:@\"$page_title\"];\n    [data setObject:isNil(tracker.ref) forKey:@\"$ref\"];\n    [tracker autoTrack:data];\n    [tracker startTrack:@\"test\"];\n    \/\/ After swizzling, this call runs the original viewDidAppear:\n    [self zg_viewDidAppear:animated];\n}\n- (void)zg_viewDidDisappear:(BOOL)animated {\n    \/\/ End the tracking started in zg_viewDidAppear:\n    CCBFT *tracker = [CCBFT sharedInstance];\n    [tracker endTrack:@\"test\" properties:@{@\"\":@\"\"}];\n    \/\/ After swizzling, this call runs the original viewDidDisappear:\n    [self 
zg_viewDidDisappear:animated];\n}"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Visual tracking"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/69\/69696b6a422abedfe4d9c338f60c0857.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Diagram of the visual tracking scheme"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Each event is recognized by an identifier, and parameters are collected only for the designated events. Event identifiers and parameter information are all written in a configuration table, and tracking is driven by delivering that table dynamically. In the setup state, a specific action opens a socket connection with the front end and sends the current interface information of the app."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the working state, the SDK fetches the current configuration from the server, looks up the views to monitor according to the path table, and attaches a delegate to record behavior. A single configuration table manages these “unique event identifiers”."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are two main parts:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"a. Locating the event"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An event is located mainly by its “unique event identifier”, which we write into the configuration table. There are two kinds of table: a local one and one downloaded from the server."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"b. Reporting the tracking data"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tracking data falls into two types: fixed data and variable business data. Fixed data can be written directly into the configuration table and looked up by the unique identifier. As for business data, the way I understand it is that data always has an owner: it may be a property of a Controller, or it may sit at some level of a Model. That means we can use KVC to walk down to the property recursively and read its value to obtain the business data."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"V. Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In short, good user-behavior analysis depends on the data source, so we need to collect data that is more “complete” and more “fine-grained”. Whatever collection method is chosen, it is only a means; the collection scheme has to be designed flexibly for each application scenario."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That concludes our introduction to the data collection methods we chose for the government-services projects of the Beijing business group. We hope it helps you understand data collection and event tracking!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is reposted from: 金科優源匯 (ID: jkyyh2020)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/xFymhdKaTlbjyHn7n9Z5CQ","title":"xxx","type":null},"content":[{"type":"text","text":"Data Collection Solution Design and Practice"}]}]}]}
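As a rough illustration of the visual-tracking configuration table discussed in the article, one entry might look like the fragment below. This is only a sketch under stated assumptions: the article does not show its actual table format, so every field name (`event_id`, `view_path`, `fixed_properties`, `business_key_paths`) and every value here is hypothetical. The idea it tries to capture is that the unique event identifier and the view path locate the event, fixed data is stored in the table itself, and business data is described by a key path that the SDK could resolve through KVC on the owning controller.

```json
{
  "_comment": "Hypothetical example; all field names and values are illustrative assumptions, not the original system's format",
  "version": "1",
  "events": [
    {
      "event_id": "apply_submit_click",
      "view_path": "ApplyViewController/UIView[0]/UIButton[1]",
      "fixed_properties": {
        "module": "apply"
      },
      "business_key_paths": {
        "item_name": "viewModel.item.name"
      }
    }
  ]
}
```

With a layout like this, changing what is collected would only require delivering a new table, which matches the article's point that collection policies can change without an app release.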