Practical Machine Learning Notes 1: Overview

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Preface:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These are my notes from studying Mu Li's Practical Machine Learning course (Stanford, Fall 2021, Chinese-language simulcast) on Bilibili.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I have watched three videos so far, and the lectures are excellent. Why is it called practical machine learning? As the instructor explains, this course differs from the machine learning courses usually offered at universities or online: it is more down to earth, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"staying closer to how machine learning is actually deployed in industry, the problems that come up there, and how they are solved.","attrs":{}},{"type":"text","text":" In my view, this course is exactly what you need if you are already working or about to start working (that is simply a strong recommendation, because the lectures are that good).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"bgcolor","attrs":{"color":"#BAE7A1","name":"green"}},{"type":"strong","attrs":{}}],"text":"Scroll to the bottom of the page for links to the rest of the series!","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Machine learning workflow:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
Deploying machine learning in industry differs from academia. In academia, once a model trained on a dataset shows an improvement, the design is judged to be good, and the work may end with a published paper.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Industry has to consider many more factors: whether the deployed model's predictions meet expectations, whether it brings revenue to the business, whether the model still works after the distribution of user data changes, and so on. The model therefore has to be monitored continuously and retrained and adjusted over time. The industrial machine learning workflow is shown in the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0f/0f12511b9633bc9e132da237f7627429.png","alt":null,"title":"机器学习落地工作流","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"boxShadow"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure shows, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.","attrs":{}},{"type":"text","text":" the first step is problem formulation. Keep in mind that ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"not every problem can be formulated as a machine learning problem: some very complex problems can be solved with machine learning, while some fairly simple ones cannot","attrs":{}},{"type":"text","text":". ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.","attrs":{}},{"type":"text","text":" Once the problem is formulated, collect data and process it into a dataset. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3.","attrs":{}},{"type":"text","text":" Next, train the model on the dataset and keep fine-tuning it. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4.","attrs":{}},{"type":"text","text":" Once trained, the model is deployed to serve one of the company's business lines and increase profit. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"5.","attrs":{}},{"type":"text","text":" After deployment the model cannot simply be left alone. It has to be monitored continuously, for example whether its predictions are accurate and whether profit has grown compared with before. Because the model serves over a long period, the user base may change and with it the data distribution, which hurts the model's accuracy. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2.","attrs":{}},{"type":"text","text":" 
So new data must be collected and processed again, and the model retrained and fine-tuned. This cycle repeats continuously.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Challenges:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Problem formulation: ","attrs":{}},{"type":"text","text":"In industry, not every problem that can be formulated as a machine learning problem should be solved with machine learning; the costs have to be weighed as well. Only business lines that already account for a large share of revenue, and where machine learning can raise returns further, are worth the effort. In other words, solve the most valuable industrial problems.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Data: ","attrs":{}},{"type":"text","text":"In the information age there is no shortage of data, only of useful, high-quality data. Data also raises privacy concerns.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Model training: ","attrs":{}},{"type":"text","text":"Models keep getting larger, need more and more data, and cost more and more to train. Balancing these is a challenge.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Model deployment: ","attrs":{}},{"type":"text","text":"Heavy computation is unfriendly to real-time inference.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Monitoring: ","attrs":{}},{"type":"text","text":"Shifts in data distribution, and fairness issues (the model itself may be fair, but the data it was trained on is biased).","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Roles:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null}
,"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Domain experts: ","attrs":{}},{"type":"text","text":"Have business domain knowledge, know which data matters and how to obtain it, and can demonstrate the impact of a machine learning model on the business.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Data scientists: ","attrs":{}},{"type":"text","text":"Focus mainly on data mining, model training, and deployment.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Machine learning experts: ","attrs":{}},{"type":"text","text":"Train, select, and tune SOTA machine learning models.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Software development engineers: ","attrs":{}},{"type":"text","text":"Build the data pipeline, train models, and maintain both the models (swapping or retraining them, for example) and the code.","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Skill progression path:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/98/98d306ec99e6cb7d138c81204f106687.png","alt":null,"title":"技能提升路径","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Main topics:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"How to collect and process data; data distribution shift","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","att
rs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Model validation, ensembling, hyperparameter tuning, transfer learning, multimodality","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"How to deploy, performance considerations, hardware selection, model distillation","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Model fairness, model interpretability","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Series navigation:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 2: Data Acquisition ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/dec72118e895e75277e29dd16","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/dec72118e895e75277e29dd16","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 3: Web Scraping ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/ee87dd03d59f8a6c5b4e6e887","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/ee87dd03d59f8a6c5b4e6e887","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 4: Data Labeling 
","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/019b54c1ea5aff5fcf2095eae","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/019b54c1ea5aff5fcf2095eae","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 5: Exploratory Data Analysis ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/cb001e4a42c1d99d2e2a0d4dd","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/cb001e4a42c1d99d2e2a0d4dd","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 6: Data Cleaning ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/4e1c4d16b1b27cda01a6155f8","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/4e1c4d16b1b27cda01a6155f8","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 7: Data Transformation ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/a02f23baf6afa1b4a29d9ce06","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/a02f23baf6afa1b4a29d9ce06","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 8: Feature Engineering 
","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/a63e4d7486f6a7180591d3a64","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/a63e4d7486f6a7180591d3a64","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 9: Summary of the Data Part ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/f89f336ef589a81ced0b7263d","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/f89f336ef589a81ced0b7263d","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 10: Machine Learning Models ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/c2a90c25f168b242f3fd066a5","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/c2a90c25f168b242f3fd066a5","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 11: Decision Trees ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/13eadf2dbf9c1a010d1edbfbc","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/13eadf2dbf9c1a010d1edbfbc","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 12: Linear Models 
","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/d543865af854129654000837e","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/d543865af854129654000837e","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 13: Stochastic Gradient Descent ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/aa9b4e0293b203634b6827c4f","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/aa9b4e0293b203634b6827c4f","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 14: Multilayer Perceptrons ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/808f216ce8b15122f0c2fbd67","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/808f216ce8b15122f0c2fbd67","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 15: Convolutional Neural Networks ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/b1cde2c53bccc1888ebb5e14c","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/b1cde2c53bccc1888ebb5e14c","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Practical Machine Learning Notes 16: Recurrent Neural Networks 
","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/4f27c495504b7029cb175050e","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/4f27c495504b7029cb175050e","attrs":{}}]}]}]}