First, start the environment:
1. Start HDFS:
[root@master conf]# start-dfs.sh
2. Then start Spark:
[root@master spark-2.2.0]# sbin/start-all.sh
(Note: start-all.sh takes no --master option; the master address and worker list come from conf/spark-env.sh and conf/slaves.)
3. Attach a spark-shell to the cluster. In this cluster the Spark master runs on slave3.hadoop, which matches the session output below:
[root@master spark-2.2.0]# bin/spark-shell --master spark://slave3.hadoop:7077
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/08/30 13:59:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/30 13:59:39 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.1.5:4040
Spark context available as 'sc' (master = spark://slave3.hadoop:7077, app id = app-20180830135927-0000).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
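Once the prompt comes up, a quick sanity check (not part of the original session) confirms the shell can actually run a job on the cluster; `sc` is the SparkContext the shell creates for you:

```scala
// A trivial distributed job: sum the numbers 1..1000 across the executors.
// RDD.sum() returns a Double.
val total = sc.parallelize(1 to 1000).sum()
println(total)  // 500500.0
```

If this hangs or reports insufficient resources, the shell could not get executors from the cluster.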
When this command runs, the cluster changes as follows: the shell (the driver) starts on the machine where it was launched, while each worker starts a CoarseGrainedExecutorBackend process to host the executors.
Processes on the master machine:
[root@master spark-2.2.0]# jps
8643 DataNode
8548 NameNode
9205 Jps
8874 Master
Processes on a worker machine:
[root@slave1 conf]# jps
3472 DataNode
3540 SecondaryNameNode
3798 Worker
3862 CoarseGrainedExecutorBackend
3958 Jps
Writing the wordcount program:
First, upload the data to be processed to HDFS; I uploaded it to the /spark directory (textFile on a directory reads every file inside it).
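The upload itself might look like this (the local file name wordcount.txt is hypothetical; adjust the NameNode address to your cluster):

```shell
# Create the target directory on HDFS and upload a local text file into it.
hdfs dfs -mkdir -p hdfs://master.hadoop:9000/spark
hdfs dfs -put wordcount.txt hdfs://master.hadoop:9000/spark/

# Verify the upload.
hdfs dfs -ls hdfs://master.hadoop:9000/spark
```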
sc.textFile("hdfs://master.hadoop:9000/spark").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).sortBy(_._2).collect
flatMap splits each line into words, map turns every word into a (word, 1) pair, reduceByKey sums the counts per word, sortBy orders by count, and collect brings the result back to the client as an array. In the run below the same pipeline reads a single file rather than the whole directory; the result looks like this:
scala> sc.textFile("hdfs://slave3.hadoop:9000/a.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).sortBy(_._2).collect
res1: Array[(String, Int)] = Array((jim,1), (jarry,1), (wo,1), (ni,1), (hello,4))
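To make each stage concrete, the same logic can be sketched on a plain Scala collection, with no cluster involved (the sample lines below are made up to match the counts in the output above):

```scala
// Plain-Scala equivalent of the RDD pipeline, for illustration only.
val lines = List("hello jim", "hello jarry", "hello wo", "hello ni")

val words  = lines.flatMap(_.split(" "))   // flatMap: lines -> individual words
val pairs  = words.map((_, 1))             // map: each word becomes (word, 1)
val counts = pairs                          // reduceByKey ~ groupBy + per-key sum
  .groupBy(_._1)
  .map { case (w, ps) => (w, ps.map(_._2).sum) }
val sorted = counts.toList.sortBy(_._2)    // sortBy: ascending by count

println(sorted)  // hello ends up last with count 4; order among ties may vary
```

The only difference on a cluster is that each stage runs partition by partition across the executors, and reduceByKey shuffles pairs so that all counts for the same word land on one node.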
scala>