Environment
OS: CentOS 6.5
Software versions: hadoop-2.6.2, jdk1.8, scala-2.11.7, spark-1.4.1-bin-hadoop2.6
Cluster layout:
master: www 192.168.78.110
slave1: node1 192.168.78.111
slave2: node2 192.168.78.112
hosts file (identical on all three machines):
192.168.78.110 www
192.168.78.111 node1
192.168.78.112 node2
Make sure each of the three machines can ping the other two by hostname.
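The hostname check can also be scripted. A minimal sketch, assuming the host list from the table above; it writes a scratch copy of the hosts entries to /tmp so it is safe to run anywhere, and on a live machine you would point `hosts_file` at /etc/hosts instead:

```shell
# Scratch copy of the hosts entries from this guide (use /etc/hosts on a real node)
hosts_file=/tmp/cluster_hosts
cat > "$hosts_file" <<'EOF'
192.168.78.110 www
192.168.78.111 node1
192.168.78.112 node2
EOF
# Verify that every cluster node has an entry
for h in www node1 node2; do
  grep -qw "$h" "$hosts_file" && echo "$h: ok" || echo "$h: MISSING"
done
```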
1. Download hadoop, scala, and spark, and extract them under /opt/hadoop
[hadoop@www hadoop]$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.1-bin-hadoop2.6.tgz
[hadoop@www hadoop]$ wget -O scala-2.11.7.tgz 'http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz?_ga=1.262254604.1613215006.1446896742' # -O keeps the ?_ga query string out of the saved filename
[hadoop@www hadoop]$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.2/hadoop-2.6.2.tar.gz
[hadoop@www hadoop]$ tar -xzvf spark-1.4.1-bin-hadoop2.6.tgz # extract the archive
[hadoop@www hadoop]$ tar -xzvf scala-2.11.7.tgz
[hadoop@www hadoop]$ tar -xzvf hadoop-2.6.2.tar.gz
The directory should now look like this:
[hadoop@www hadoop]$ pwd
/opt/hadoop
[hadoop@www hadoop]$ ll
total 12
drwxr-xr-x. 11 hadoop hadoop 4096 Nov 8 08:30 hadoop-2.6.2
drwxr-xr-x.  6 hadoop hadoop 4096 Nov 8 18:40 scala-2.11.7
drwxr-xr-x. 11 hadoop hadoop 4096 Nov 8 18:40 spark-1.4.1-bin-hadoop2.6
- Set up the fully distributed Hadoop cluster first; for details see http://blog.csdn.net/erujo/article/details/49716841
- Edit ~/.bashrc to configure the environment variables
[hadoop@www scala-2.11.7]$ vimx ~/.bashrc
# User specific aliases and functions
export JAVA_HOME=/usr/java/jdk1.8.0_65
export SCALA_HOME=/opt/hadoop/scala-2.11.7
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.2
export SPARK_HOME=/opt/hadoop/spark-1.4.1-bin-hadoop2.6
PATH=$PATH:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${HADOOP_HOME}/bin
[hadoop@www scala-2.11.7]$ source !$
source ~/.bashrc
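After sourcing, it is worth confirming that every variable actually resolved. A quick sanity-check sketch; the exports repeat the values set above only so the snippet stands alone, on the real machine just run the loop after `source ~/.bashrc`:

```shell
# Re-declare the variables from ~/.bashrc so this snippet is self-contained
export JAVA_HOME=/usr/java/jdk1.8.0_65
export SCALA_HOME=/opt/hadoop/scala-2.11.7
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.2
export SPARK_HOME=/opt/hadoop/spark-1.4.1-bin-hadoop2.6
for v in JAVA_HOME SCALA_HOME HADOOP_HOME SPARK_HOME; do
  # ${!v} is bash indirect expansion: the value of the variable named by $v
  [ -n "${!v}" ] && echo "$v=${!v}" || echo "$v is NOT set"
done
```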
Test scala:
[hadoop@www scala-2.11.7]$ scala
Welcome to Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65).
Type in expressions to have them evaluated.
Type :help for more information.
scala> // reaching this prompt means scala is working
Copy the file to each slave machine:
[hadoop@www scala-2.11.7]$ scp ~/.bashrc [email protected]:~/.bashrc
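The scp above only reaches one slave; since this cluster has two, a small loop avoids forgetting one. This is a dry-run sketch that prints the commands (slave IPs are the ones from the cluster table); remove the leading `echo` to actually copy:

```shell
# Slave IPs from the cluster table at the top of this guide
for ip in 192.168.78.111 192.168.78.112; do
  echo scp ~/.bashrc "hadoop@$ip:~/.bashrc"   # drop 'echo' to run for real
done
```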
- Configure spark on the master host
4.1 spark-env.sh
[hadoop@www hadoop]$ cd spark-1.4.1-bin-hadoop2.6/conf/
[hadoop@www conf]$ mv spark-env.sh.template spark-env.sh
[hadoop@www conf]$ vimx spark-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_65
export SCALA_HOME=/opt/hadoop/scala-2.11.7
export SPARK_MASTER_IP=192.168.78.110
export SPARK_WORKER_MEMORY=2g
export HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.6.2/etc/hadoop
4.2 slaves
[hadoop@www conf]$ vimx slaves
node1
node2
Once configured, copy the spark directory to the slave nodes.
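A sketch of that copy step, again as a dry run that prints the scp commands for both slaves (drop the `echo` to execute; the paths and hostnames are the ones used throughout this guide, and the scala directory is included since the slaves need it too):

```shell
# Push the spark and scala trees to every slave under the same /opt/hadoop prefix
for host in node1 node2; do
  echo scp -r /opt/hadoop/spark-1.4.1-bin-hadoop2.6 "hadoop@$host:/opt/hadoop/"
  echo scp -r /opt/hadoop/scala-2.11.7 "hadoop@$host:/opt/hadoop/"
done
```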
- Start the spark cluster and check its status
[hadoop@www conf]$ /opt/hadoop/hadoop-2.6.2/sbin/start-all.sh
[hadoop@www conf]$ /opt/hadoop/spark-1.4.1-bin-hadoop2.6/sbin/start-all.sh
Check the running processes.
On the master:
[hadoop@www spark-1.4.1-bin-hadoop2.6]$ jps
8725 Jps
8724 Master
6679 ResourceManager
6504 SecondaryNameNode
6264 NameNode
On a slave:
[hadoop@node1 spark-1.4.1-bin-hadoop2.6]$ jps
8880 Worker
8993 Jps
6770 NodeManager
6349 DataNode
If all of these processes are present, the startup succeeded.
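Checking the jps output by eye is error-prone; a sketch that greps for each expected daemon instead. The sample output below is copied from the master listing above so the snippet is self-contained; on a live node you would use `jps_out=$(jps)`:

```shell
# Sample master output from above; replace with: jps_out=$(jps)
jps_out='8725 Jps
8724 Master
6679 ResourceManager
6504 SecondaryNameNode
6264 NameNode'
# Expected daemons on the master (on a slave: Worker NodeManager DataNode)
for d in Master ResourceManager SecondaryNameNode NameNode; do
  echo "$jps_out" | grep -qw "$d" && echo "$d: running" || echo "$d: MISSING"
done
```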
- Start the spark-shell console
[hadoop@www spark-1.4.1-bin-hadoop2.6]$ spark-shell
Earlier we uploaded a test.log file to the /input directory on HDFS; now we use spark to run a word count on it:
scala> val file = sc.textFile("hdfs://www:9000/input/test.log") // www is the NameNode host from the hosts file above
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
scala> count.collect()
The last lines of the output show:
15/11/08 19:49:28 INFO scheduler.DAGScheduler: Job 0 finished: collect at <console>:26, took 16.682841 s
res0: Array[(String, Int)] = Array((hadoop,1), (hello,2), (world,1))
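As a cross-check, the same word count can be reproduced locally with coreutils. The two-line file contents below are an assumption chosen so the counts match res0 above:

```shell
# Hypothetical test.log contents that yield the counts shown in res0
printf 'hello world\nhello hadoop\n' > /tmp/test.log
# Split on spaces, sort, count duplicates: a coreutils word count
tr ' ' '\n' < /tmp/test.log | sort | uniq -c
# counts: hadoop 1, hello 2, world 1 (matching res0)
```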
The job details are also visible in the web UI at http://192.168.78.110:4040/stages
Stop spark:
[hadoop@www spark-1.4.1-bin-hadoop2.6]$ /opt/hadoop/spark-1.4.1-bin-hadoop2.6/sbin/stop-all.sh