一、Experiment Objectives
- Learn how to start Spark
- Upload a text file to HDFS
- Write a word-count program in the Scala shell
二、Experiment Procedure
1、Understand the components of Spark
2、Detailed steps
1、Open a terminal and start Hadoop
hadoop@dblab-VirtualBox:/usr/local/hadoop/sbin$ ./start-all.sh
2、Start Spark
hadoop@dblab-VirtualBox:/usr/local/spark/bin$ ./spark-shell
Output like the following indicates that Spark started successfully:
18/08/29 20:09:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/29 20:09:26 WARN Utils: Your hostname, dblab-VirtualBox resolves to a loopback address: 127.0.1.1, but we couldn't find any external IP address!
18/08/29 20:09:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context Web UI available at http://127.0.1.1:4040
Spark context available as 'sc' (master = local[*], app id = local-1535544589211).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.0
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
3、Open a second terminal, create the file to be counted, and upload it to HDFS
hadoop@dblab-VirtualBox:/usr/local/hadoop/bin$ vim a
hadoop@dblab-VirtualBox:/usr/local/hadoop/bin$ ./hdfs dfs -put a /
hadoop@dblab-VirtualBox:/usr/local/hadoop/bin$ cat a
kjd,kjd,ASDF,sjdf,jsadf
klfgldf.fdgjkaj
4、Return to the first terminal and read the file in the Scala shell
scala> sc.textFile("hdfs://localhost:9000/a").flatMap(_.split(",")).map((_,1)).reduceByKey(_+_).collect
res1: Array[(String, Int)] = Array((sjdf,1), (klfgldf.fdgjkaj,1), (kjd,2), (ASDF,1), (jsadf,1))
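To see why the pipeline produces these counts, here is a minimal plain-Scala sketch that mimics each Spark stage on the two sample lines from file `a`, using ordinary collection operations instead of an RDD (Spark's `reduceByKey` is approximated with `groupBy` plus a sum; the names `lines` and `counts` are illustrative only):

```scala
// Plain-Scala sketch of the Spark word-count pipeline above (no cluster needed).
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // The two lines of the sample file `a`
    val lines = Seq("kjd,kjd,ASDF,sjdf,jsadf", "klfgldf.fdgjkaj")

    val counts = lines
      .flatMap(_.split(","))                 // tokenize on commas, like RDD.flatMap
      .map((_, 1))                           // pair each word with 1, like RDD.map
      .groupBy(_._1)                         // group pairs by word
      .map { case (w, ps) => (w, ps.map(_._2).sum) } // sum per word, like reduceByKey(_+_)

    counts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Note that the second line contains no comma, so it survives `split(",")` as the single token `klfgldf.fdgjkaj`, which is why it appears whole in the result with a count of 1 while `kjd` gets 2.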