Big Data Lab: Word Count with Spark

1. Objectives

  1. Learn how to start Spark
  2. Upload a text file to HDFS
  3. Write a word-count program in the Scala shell

2. Procedure

  1. Understand the components of Spark

  2. Detailed steps

    Step 1. Open a terminal and start Hadoop:

hadoop@dblab-VirtualBox:/usr/local/hadoop/sbin$ ./start-all.sh

    Step 2. Start Spark:

hadoop@dblab-VirtualBox:/usr/local/spark/bin$ ./spark-shell

        If you see output like the following, Spark has started successfully:

18/08/29 20:09:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

18/08/29 20:09:26 WARN Utils: Your hostname, dblab-VirtualBox resolves to a loopback address: 127.0.1.1, but we couldn't find any external IP address!

18/08/29 20:09:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

Spark context Web UI available at http://127.0.1.1:4040

Spark context available as 'sc' (master = local[*], app id = local-1535544589211).

Spark session available as 'spark'.

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0

      /_/

         

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

Type in expressions to have them evaluated.

Type :help for more information.

scala>

 

    Step 3. Open a second terminal, create the file to be counted, and upload it to HDFS:

 

hadoop@dblab-VirtualBox:/usr/local/hadoop/bin$ vim a

hadoop@dblab-VirtualBox:/usr/local/hadoop/bin$ ./hadoop fs -mkdir /input

hadoop@dblab-VirtualBox:/usr/local/hadoop/bin$ ./hdfs dfs -put a /

hadoop@dblab-VirtualBox:/usr/local/hadoop/bin$ cat a

kjd,kjd,ASDF,sjdf,jsadf

klfgldf.fdgjkaj

    Step 4. Back in the first terminal, read the file and count the words in the Scala shell:

scala> sc.textFile("hdfs://localhost:9000/a").flatMap(_.split(",")).map((_,1)).reduceByKey(_+_).collect

     The result is as follows:

res1: Array[(String, Int)] = Array((sjdf,1), (klfgldf.fdgjkaj,1), (kjd,2), (ASDF,1), (jsadf,1))
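To see what each stage of that chain does, the same pipeline can be sketched with plain Scala collections instead of an RDD (the object name `WordCountSketch` and the in-memory sample lines are illustrative; the lines mirror the contents of file `a` above, and `groupBy` plus a sum stands in for Spark's `reduceByKey`):

```scala
object WordCountSketch {
  // Same logical pipeline as the Spark job, on local Scala collections:
  // flatMap -> map -> (groupBy + sum, a local stand-in for reduceByKey).
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(","))              // split each line on commas into words
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // group the (word, 1) pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the 1s

  def main(args: Array[String]): Unit = {
    // These two lines mirror the contents of the uploaded file "a".
    println(wordCount(Seq("kjd,kjd,ASDF,sjdf,jsadf", "klfgldf.fdgjkaj")))
  }
}
```

Because "kjd" appears twice in the first line, its count comes out as 2, matching the `(kjd,2)` entry in the Spark result above.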

 
