Project
1. AML
With labels: - downsampling / XGBoost / HQL
Without labels: - Autoencoder
2. CRANE: fix features / add new features
3. Branchpiitsstop
- R / R Shiny / XGBoost explainer / SHAP values
4. Spark
- Rewrite in PySpark
- Audit report: re-cluster (LDA)
Differences between HQL and SQL: https://blog.csdn.net/qq_28633249/article/details/77884062
Algorithms used in the projects:
XGBoost (principles: https://zhuanlan.zhihu.com/p/92229766 / parameter tuning: https://zhuanlan.zhihu.com/p/29649128);
boosting/bagging/stacking https://zhuanlan.zhihu.com/p/41809927; Decision tree; Autoencoder; LDA
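A minimal sketch of the downsampling step listed for the labelled AML case, assuming binary 0/1 labels and random undersampling of the majority class. The function name, the toy data, and the 1:1 target ratio are illustrative assumptions, not the project's actual setup.

```python
import random
from collections import Counter


def downsample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until every class matches the minority size.

    `rows` and `labels` are parallel lists. The 1:1 ratio is an assumption.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_label, minority_size = min(counts.items(), key=lambda kv: kv[1])

    kept = []
    for label in counts:
        idx = [i for i, y in enumerate(labels) if y == label]
        if label != minority_label:
            idx = rng.sample(idx, minority_size)  # sample without replacement
        kept.extend(idx)

    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 95 normal rows (label 0) and 5 suspicious rows (label 1).
rows = [[float(i)] for i in range(100)]
labels = [1 if i % 20 == 0 else 0 for i in range(100)]
balanced_rows, balanced_labels = downsample_majority(rows, labels)
```

Downsampling should only touch the training partition; validation and test data keep the original class ratio so that the metrics reflect real conditions.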
Machine learning algorithms
1. Common algorithms
LR https://zhuanlan.zhihu.com/p/40994642
SVM https://zhuanlan.zhihu.com/p/84796233
GBDT
Decision tree https://blog.csdn.net/sinat_30353259/article/details/80917362 (CART/ID3/C4.5)
random forest
LightGBM https://zhuanlan.zhihu.com/p/99069186
GBDT/DT https://zhuanlan.zhihu.com/p/81368182
https://zhuanlan.zhihu.com/p/34534004
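For the decision tree variants listed above: ID3 picks splits by information gain (entropy reduction), C4.5 normalizes that into a gain ratio, and CART uses Gini impurity for classification. A small sketch of the two impurity measures and information gain follows; the class labels and the example split are made up, and non-empty label lists are assumed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits; ID3 and C4.5 build on this quantity."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity, the criterion CART uses for classification."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy reduction from splitting `parent` into the `children` lists."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with 4 "fraud" and 4 "ok" samples, split into two children.
parent = ["fraud"] * 4 + ["ok"] * 4
children = [["fraud"] * 3 + ["ok"], ["fraud"] + ["ok"] * 3]
gain = information_gain(parent, children)
```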
2. Common anomaly detection algorithms
Isolation forest https://zhuanlan.zhihu.com/p/27777266
DBSCAN https://zhuanlan.zhihu.com/p/88747614
Autoencoder https://blog.csdn.net/Jasminexjf/article/details/88720999
3. Common graph concepts https://zhuanlan.zhihu.com/p/28298952
PageRank
authority score
hub score
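A rough power-iteration sketch of PageRank, the first graph concept above (authority and hub scores come from the separate HITS algorithm and are not shown). The damping factor 0.85 is the conventional default; the graph and function name are hypothetical, and every link target is assumed to appear as a key in the dict.

```python
def pagerank(graph, damping=0.85, iterations=200, tol=1e-12):
    """Power-iteration PageRank on a dict {node: [nodes it links to]}.

    Nodes with no outgoing links spread their rank evenly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            for target in graph[v]:
                new_rank[target] += damping * rank[v] / len(graph[v])
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical four-node graph: every other node links to "a".
graph = {"a": ["b"], "b": ["a"], "c": ["a"], "d": ["a", "b"]}
scores = pagerank(graph)
```

Nodes "c" and "d" have no incoming links, so they end up with only the teleport share (1 - damping) / n, while "a" collects the most rank.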
4. Clustering https://zhuanlan.zhihu.com/p/37381630
Neural networks
Autoencoder: https://blog.csdn.net/Jasminexjf/article/details/88720999
CNN: https://zhuanlan.zhihu.com/p/44255667
RNN LSTM https://zhuanlan.zhihu.com/p/88892937
Parameters / how to tune: https://zhuanlan.zhihu.com/p/45091568
Summary of neural network optimization algorithms: https://zhuanlan.zhihu.com/p/89957194
LDA: https://zhuanlan.zhihu.com/p/92229766
Basic sorting algorithms https://blog.csdn.net/weixin_39840982/article/details/100751141
Tree traversal algorithms https://zhuanlan.zhihu.com/p/70720129
Python: https://zhuanlan.zhihu.com/p/54430650
SQL: https://zhuanlan.zhihu.com/p/38354000
PySpark: https://www.jianshu.com/p/7a8fca3838a4
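As a quick reminder for the tree traversal entry above, here is a sketch of the four standard binary tree traversals. The `Node` class and the example tree are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(node):
    """Root, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def inorder(node):
    """Left subtree, then root, then right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def postorder(node):
    """Left subtree, then right subtree, then root."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


def level_order(root):
    """Breadth-first traversal using a queue."""
    order = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        order.append(node.value)
        queue.extend(c for c in (node.left, node.right) if c is not None)
    return order


# Hypothetical example tree:
#       4
#      / \
#     2   6
#    / \
#   1   3
root = Node(4, Node(2, Node(1), Node(3)), Node(6))
```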
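General workflow
Requirements / data - build features - feature engineering (PCA / feature selection / create new features) - data level (downsampling / upsampling) - normalize / scaler - feature selection - train / validation / test split - model - metrics (AUC / ROC curve / precision / recall / F1 score) - overfitting / underfitting - explainer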
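A small sketch of the split and scaling steps in the workflow above, assuming a 60/20/20 split and z-score standardization. The helper names and the single toy feature are made up. The scaler is fitted on the training partition only and then reused for validation and test data, which avoids leaking their statistics into training.

```python
import random
import statistics


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into train / validation / test partitions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_scaler(values):
    """Learn mean and standard deviation from the training values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant features
    return mean, std


def transform(values, mean, std):
    """Apply z-score standardization with previously fitted statistics."""
    return [(v - mean) / std for v in values]


values = [float(v) for v in range(100)]  # hypothetical single feature
train, val, test = train_val_test_split(values)
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
val_scaled = transform(val, mean, std)
test_scaled = transform(test, mean, std)
```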
PCA: https://zhuanlan.zhihu.com/p/77151308
ROC curve: https://www.zhihu.com/question/22844912/answer/246037337
SHAP values: https://zhuanlan.zhihu.com/p/85791430
Feature selection: https://www.zhihu.com/question/28641663/answer/110165221
Evaluation methods: https://zhuanlan.zhihu.com/p/106649884
https://www.zhihu.com/question/23259302/answer/527513387
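The metrics named in the workflow (precision, recall, F1, AUC) can be sketched from scratch as follows, assuming binary 0/1 labels. The scores and the 0.5 threshold are made-up illustration values. AUC is computed here by its pairwise definition, the probability that a random positive scores higher than a random negative, which is simple but quadratic in the sample size.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Pairwise AUC; ties between a positive and a negative count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Precision, recall and F1 depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds.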
Spark MLlib