http://spark.apache.org/
1. Download the archive and upload it to the server
spark-2.2.0-bin-hadoop2.7.tgz
Extract: tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
2. Prepare 4 machines
bigdata01, bigdata02, bigdata03, bigdata04
Master: bigdata01, bigdata02 (only bigdata01 is started as a Master in this walkthrough)
Worker: bigdata01, bigdata02, bigdata03, bigdata04
3. Edit the configuration files
/root/training/spark-2.2.0-bin-hadoop2.7/conf
3.1 Edit spark-env.sh (basic settings)
mv spark-env.sh.template spark-env.sh
vim spark-env.sh
Choose standalone mode; the relevant section of the template is headed:
# Options for the daemons used in the standalone deploy mode
export JAVA_HOME=/root/training/jdk1.8.0_144 (in vim you can use the command :r!which java to insert the Java path)
export SPARK_MASTER_HOST=bigdata01
export SPARK_MASTER_PORT=7077
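Optionally, each Worker's resources can also be capped in the same file. SPARK_WORKER_CORES and SPARK_WORKER_MEMORY are standard spark-env.sh settings; the values below are a sketch chosen to match the totals shown in the web UI later (4 workers × 1 core / 1 GB):

```shell
# Optional: resources each Worker offers to the cluster (illustrative values)
export SPARK_WORKER_CORES=1      # cores per worker; 4 workers -> "Cores in use: 4 Total"
export SPARK_WORKER_MEMORY=1g    # memory per worker; 4 workers -> "Memory in use: 4.0 GB Total"
```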
3.2 Edit slaves to list the nodes that actually run tasks
mv slaves.template slaves
vim slaves
bigdata01
bigdata02
bigdata03
bigdata04
3.3 Copy to the other machines
for i in {2..4};
do scp -r /root/training/spark-2.2.0-bin-hadoop2.7/ bigdata0$i:$PWD ;
done
4. Start the cluster. It is better to use the individual scripts (start-master.sh and start-slave.sh); since this article is only a simple setup, start-all.sh is used directly.
If passwordless SSH login is not configured, set it up first; otherwise you will be prompted for a password for every node being started.
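A minimal sketch of the passwordless-login setup, run on bigdata01 (the hostnames and the root user are assumptions taken from this walkthrough; adjust to your environment):

```shell
# Generate a key pair once (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Push the public key to every node, including bigdata01 itself;
# you will be asked for each node's password one last time
for i in 1 2 3 4; do
  ssh-copy-id root@bigdata0$i
done
```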
cd /root/training/spark-2.2.0-bin-hadoop2.7
sbin/start-all.sh
jps
Only bigdata01 runs both a Master and a Worker; every other machine runs only a Worker.
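To confirm the cluster actually accepts jobs, the SparkPi example bundled with the distribution can be submitted against the Master URL (the jar name below assumes the stock 2.2.0 binary package):

```shell
cd /root/training/spark-2.2.0-bin-hadoop2.7
bin/spark-submit \
  --master spark://bigdata01:7077 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.2.0.jar 100
```

If the cluster is healthy, the driver log ends with a "Pi is roughly 3.14..." line and the application appears under Completed Applications in the web UI.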
5. View the Spark cluster in a browser
http://bigdata01:8080/ (netty)
URL: spark://bigdata01:7077
REST URL: spark://bigdata01:6066 (cluster mode)
Alive Workers: 4
Cores in use: 4 Total, 0 Used — the number of cores (threads) across all workers
Memory in use: 4.0 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
192.168.111.103:44524
44524: the port the Worker uses to communicate with the Master
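The status shown on the 8080 page is also available as JSON from the master web UI, which is handy for scripting health checks (this assumes the cluster above is running and the endpoint is reachable from where you run curl):

```shell
# Query the standalone master's status as JSON
curl -s http://bigdata01:8080/json
# The response includes fields such as "aliveworkers", "cores", "memory" and "status"
```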