Environment setup
Hadoop installation (single node)
Version: 2.7.7
https://cloud.tencent.com/developer/article/1348631
Scala and Spark installation
Scala: 2.12.8
Spark: 2.4.4 (prebuilt for Hadoop 2.7)
https://www.jianshu.com/p/8c0b1b39d0e5
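Once both installs are done, a quick check that the commands are reachable can save some debugging later. The following is a minimal sketch, assuming the installs put `hadoop`, `scala`, `spark-shell` and `pyspark` on PATH; the `TOOLS` list and the `locate_tools` helper are names made up for this illustration and are not part of Hadoop or Spark.

```python
import shutil

# Command names assumed to be provided by the installs above; adjust to your setup.
TOOLS = ["hadoop", "scala", "spark-shell", "pyspark"]


def locate_tools(names):
    """Map each command name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools(TOOLS).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If a command shows up as not found, the usual cause is that the `bin` directory of that install has not been added to PATH in the shell profile.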
When using the pyspark shell, add these environment variables so that it starts in Jupyter Notebook:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=192.168.1.69"
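A common pitfall with these two exports is copying them from a web page with typographic quotes, which the shell does not treat as quotes. Below is a minimal sketch of a sanity check to run before launching pyspark; `check_pyspark_driver_env` is a hypothetical helper written for this note, and the rules it applies are only assumptions based on the two values shown above.

```python
import os


def check_pyspark_driver_env(env):
    """Return a list of problems found in the PySpark driver settings."""
    problems = []
    if env.get("PYSPARK_DRIVER_PYTHON") != "jupyter":
        problems.append("PYSPARK_DRIVER_PYTHON is not set to 'jupyter'")

    opts = env.get("PYSPARK_DRIVER_PYTHON_OPTS", "")
    words = opts.split()
    if not words or words[0] != "notebook":
        problems.append("PYSPARK_DRIVER_PYTHON_OPTS does not start with 'notebook'")

    # Curly quotes pasted from a web page are ordinary characters to the shell,
    # so they can end up inside the value instead of delimiting it.
    if any(ch in opts for ch in "\u201c\u201d"):
        problems.append("PYSPARK_DRIVER_PYTHON_OPTS contains curly quotes")
    return problems


if __name__ == "__main__":
    for problem in check_pyspark_driver_env(os.environ):
        print("warning:", problem)
```

The function takes a plain mapping rather than reading `os.environ` directly, so it can be tried on sample values without changing the real environment.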
Installing dependencies (matplotlib as an example)
pip3 install matplotlib -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
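To confirm that an install like this is visible to the interpreter the notebook uses, a small import check can help. This is only a sketch: `missing_packages` is a hypothetical helper, and it handles top-level module names only.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # matplotlib is the example dependency from the pip3 command above.
    missing = missing_packages(["matplotlib"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same `python3` that `pip3` installs into; if the notebook kernel uses a different interpreter, the package can be installed yet still fail to import there.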
Summary (mind map)
PS: Click for the original mind map.