Practical Machine Learning Notes 1: Overview

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Preface:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These are my personal study notes from teaching myself Li Mu's Practical Machine Learning course (Stanford, Fall 2021, Chinese-language sync) on Bilibili.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So far I have watched three videos, and Li Mu's explanations are excellent (yyds). Why is it called practical machine learning? As he explains in the course, it differs from the machine learning courses typically offered by universities or online: it is more down to earth and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"closer to how machine learning is actually deployed in industry, including the problems that come up and how they are solved.","attrs":{}},{"type":"text","text":" In my view, if you are already working or about to start, this is exactly the course you need. (Just a strong recommendation here, haha, because the teaching is that good.)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"bgcolor","attrs":{"color":"#BAE7A1","name":"green"}},{"type":"strong","attrs":{}}],"text":"For links to the rest of the series, scroll to the bottom of the page!","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Machine learning workflow:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Deploying machine learning in industry differs from academic practice. In academia, you train on a given dataset, and if the metrics improve, the model design is considered good; you might write up a paper, publish it, and be done.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Industry has many more factors to consider: whether the deployed model's predictions meet expectations, whether it brings revenue to the business, whether the model still works once the distribution of user data changes, and so on. The model therefore has to be monitored continuously and then retrained and adjusted again and again. The industrial machine learning workflow can be represented by the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0f/0f12511b9633bc9e132da237f7627429.png","alt":null,"title":"Machine learning deployment workflow","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"boxShadow"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure shows, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.","attrs":{}},{"type":"text","text":" The first step is problem formulation. Keep in mind: ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"not every problem can be formulated as a machine learning problem. Some very complex problems can be solved with machine learning, while some relatively simple ones cannot","attrs":{}},{"type":"text","text":". ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.","attrs":{}},{"type":"text","text":" Once the problem is formulated, collect data, process it, and build a dataset. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3.","attrs":{}},{"type":"text","text":" Next, train the model on the dataset and keep fine-tuning it. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.","attrs":{}},{"type":"text","text":" Once training is done, the model goes live to serve some part of the company's business and increase profit. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"5.","attrs":{}},{"type":"text","text":" Going live is not the end, though. We must keep monitoring how the model behaves, for example whether its predictions are accurate and whether the company's profit has grown compared with before. Because the model serves for a long time, the user population may change, which shifts the data distribution and hurts the model's accuracy. We therefore return to step ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.","attrs":{}},{"type":"text","text":" Collect and process new data, then retrain and fine-tune the model. The process is a continuous loop.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Challenges:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Problem formulation: ","attrs":{}},{"type":"text","text":"In industry, not every problem that can be formulated as a machine learning problem should be solved that way; cost matters too. Machine learning is worth applying only where the business already accounts for a relatively large share of revenue and machine learning can raise that return further. In other words, solve the most valuable industrial problems.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Data: ","attrs":{}},{"type":"text","text":"In the information age there is no shortage of data, only a shortage of useful, high-quality data. Data also raises privacy issues.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Model training: ","attrs":{}},{"type":"text","text":"Models keep getting larger, need more and more data, and cost more and more to train. Balancing these is a challenge.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Model deployment: ","attrs":{}},{"type":"text","text":"Heavy computation is unfriendly to real-time inference.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Monitoring: ","attrs":{}},{"type":"text","text":"Data distribution shift and fairness issues (the model itself is impartial, but the data it is trained on is biased).","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Roles:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null}
,"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Domain experts: ","attrs":{}},{"type":"text","text":"Have business domain knowledge, know which data matters and how to obtain it, and can demonstrate the impact of machine learning models on the business.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Data scientists: ","attrs":{}},{"type":"text","text":"Focus mainly on data mining, model training, and deployment.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Machine learning experts: ","attrs":{}},{"type":"text","text":"Train, select, and tune SOTA machine learning models.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Software development engineers: ","attrs":{}},{"type":"text","text":"Build the data pipelines, train models, and maintain models (swapping models, retraining, etc.) and code.","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Skill progression path:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/98/98d306ec99e6cb7d138c81204f106687.png","alt":null,"title":"Skill progression path","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Main topics:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Collecting and processing data; data distribution shift","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Model validation, ensembling, hyperparameter tuning, transfer learning, multimodal learning","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Deployment, performance considerations, hardware selection, model distillation","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Model fairness, model explainability","attrs":{}}]}]}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Series navigation:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 2: Data Acquisition ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/dec72118e895e75277e29dd16","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/dec72118e895e75277e29dd16","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 3: Web Scraping ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/ee87dd03d59f8a6c5b4e6e887","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/ee87dd03d59f8a6c5b4e6e887","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 4: Data Labeling ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/019b54c1ea5aff5fcf2095eae","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/019b54c1ea5aff5fcf2095eae","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 5: Exploratory Data Analysis ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/cb001e4a42c1d99d2e2a0d4dd","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/cb001e4a42c1d99d2e2a0d4dd","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 6: Data Cleaning ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/4e1c4d16b1b27cda01a6155f8","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/4e1c4d16b1b27cda01a6155f8","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 7: Data Transformation ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/a02f23baf6afa1b4a29d9ce06","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/a02f23baf6afa1b4a29d9ce06","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 8: Feature Engineering ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/a63e4d7486f6a7180591d3a64","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/a63e4d7486f6a7180591d3a64","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 9: Data Section Summary ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/f89f336ef589a81ced0b7263d","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/f89f336ef589a81ced0b7263d","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 10: Machine Learning Models ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/c2a90c25f168b242f3fd066a5","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/c2a90c25f168b242f3fd066a5","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 11: Decision Trees ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/13eadf2dbf9c1a010d1edbfbc","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/13eadf2dbf9c1a010d1edbfbc","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 12: Linear Models ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/d543865af854129654000837e","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/d543865af854129654000837e","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 13: Stochastic Gradient Descent ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/aa9b4e0293b203634b6827c4f","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/aa9b4e0293b203634b6827c4f","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 14: Multilayer Perceptrons ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/808f216ce8b15122f0c2fbd67","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/808f216ce8b15122f0c2fbd67","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 15: Convolutional Neural Networks ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/b1cde2c53bccc1888ebb5e14c","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/b1cde2c53bccc1888ebb5e14c","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 16: Recurrent Neural Networks ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/4f27c495504b7029cb175050e","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/4f27c495504b7029cb175050e","attrs":{}}]}]}]}