DayDayUP Big Data Study Course [2]: Setting Up a Spark 1.4.1 Cluster Environment

Environment
OS: CentOS 6.5
Software versions: hadoop-2.6.2, jdk1.8, scala-2.11.7, spark-1.4.1-bin-hadoop2.6
Cluster layout:
master: www 192.168.78.110
slave1: node1 192.168.78.111
slave2: node2 192.168.78.112
hosts file:
192.168.78.110 www
192.168.78.111 node1
192.168.78.112 node2
Make sure all three machines can ping one another by hostname.
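For example, a quick connectivity check from the master, using the hostnames from the hosts file above:

[hadoop@www ~]$ ping -c 2 node1
[hadoop@www ~]$ ping -c 2 node2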
1. Download hadoop, scala, and spark, and extract them under /opt/hadoop

[hadoop@www hadoop]$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.1-bin-hadoop2.6.tgz
[hadoop@www hadoop]$ wget http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz?_ga=1.262254604.1613215006.1446896742
[hadoop@www hadoop]$  wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.2/hadoop-2.6.2.tar.gz  
[hadoop@www hadoop]$  tar -xzvf spark-1.4.1-bin-hadoop2.6.tgz  # extract the archives
[hadoop@www hadoop]$  tar -xzvf scala-2.11.7.tgz
[hadoop@www hadoop]$  tar -xzvf hadoop-2.6.2.tar.gz 

The resulting layout:

[hadoop@www hadoop]$ pwd
/opt/hadoop
[hadoop@www hadoop]$ ll
total 12
drwxr-xr-x. 11 hadoop hadoop 4096 Nov  8 08:30 hadoop-2.6.2
drwxr-xr-x.  6 hadoop hadoop 4096 Nov  8 18:40 scala-2.11.7
drwxr-xr-x. 11 hadoop hadoop 4096 Nov  8 18:40 spark-1.4.1-bin-hadoop2.6
  2. Configure the fully distributed Hadoop cluster; for details see http://blog.csdn.net/erujo/article/details/49716841
  3. Edit the ~/.bashrc file to set the environment variables
[hadoop@www scala-2.11.7]$ vim ~/.bashrc 
# User specific aliases and functions
export JAVA_HOME=/usr/java/jdk1.8.0_65
export SCALA_HOME=/opt/hadoop/scala-2.11.7
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.2
export SPARK_HOME=/opt/hadoop/spark-1.4.1-bin-hadoop2.6
PATH=$PATH:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${HADOOP_HOME}/bin
[hadoop@www scala-2.11.7]$ source !$
source ~/.bashrc
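A quick way to confirm that the variables took effect (the expected value comes from the export lines above):

[hadoop@www scala-2.11.7]$ echo $SPARK_HOME
/opt/hadoop/spark-1.4.1-bin-hadoop2.6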

Test scala:

[hadoop@www scala-2.11.7]$ scala
Welcome to Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65).
Type in expressions to have them evaluated.
Type :help for more information.

scala>   // reaching the prompt means scala is working

Copy the environment file to the slave machines:

[hadoop@www scala-2.11.7]$ scp  ~/.bashrc  [email protected]:~/.bashrc
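The same file also has to reach the second slave; for example, using node2's IP from the cluster layout above:

[hadoop@www scala-2.11.7]$ scp ~/.bashrc [email protected]:~/.bashrc

The new variables take effect on the slaves at the next login (or after running source ~/.bashrc there).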
  4. Configure spark on the master host
    4.1 spark-env.sh
[hadoop@www hadoop]$ cd spark-1.4.1-bin-hadoop2.6/conf/
[hadoop@www conf]$ mv spark-env.sh.template spark-env.sh
[hadoop@www conf]$ vim spark-env.sh 
export JAVA_HOME=/usr/java/jdk1.8.0_65
export SCALA_HOME=/opt/hadoop/scala-2.11.7
export SPARK_MASTER_IP=192.168.78.110                        # address the standalone master binds to
export SPARK_WORKER_MEMORY=2g                                # memory each worker may hand out to executors
export HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.6.2/etc/hadoop   # lets Spark locate the HDFS configuration

4.2 slaves

[hadoop@www conf]$ vim slaves
node1
node2

Once configured, copy the spark directory to the slave nodes, as sketched below.
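A minimal sketch, assuming passwordless SSH for the hadoop user and that /opt/hadoop is writable on both slaves; the scala directory referenced in ~/.bashrc needs to be copied along with spark:

[hadoop@www hadoop]$ for h in node1 node2; do
>   scp -r /opt/hadoop/scala-2.11.7 ${h}:/opt/hadoop/
>   scp -r /opt/hadoop/spark-1.4.1-bin-hadoop2.6 ${h}:/opt/hadoop/
> done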

  5. Start the Spark distributed cluster and check its status
[hadoop@www conf]$ /opt/hadoop/hadoop-2.6.2/sbin/start-all.sh
[hadoop@www conf]$ /opt/hadoop/spark-1.4.1-bin-hadoop2.6/sbin/start-all.sh 

Check the running processes.
On the master:

[hadoop@www spark-1.4.1-bin-hadoop2.6]$ jps
8725 Jps
8724 Master
6679 ResourceManager
6504 SecondaryNameNode
6264 NameNode

On a slave (node1):

[hadoop@node1 spark-1.4.1-bin-hadoop2.6]$ jps
8880 Worker
8993 Jps
6770 NodeManager
6349 DataNode

If all of these processes are present, the cluster started successfully. The standalone master's web UI (by default at http://192.168.78.110:8080) is another way to confirm that both workers registered.

  6. Start the spark-shell console
[hadoop@www spark-1.4.1-bin-hadoop2.6]$ spark-shell 

Earlier we uploaded a test.log file to the /input directory in HDFS; we now use Spark to read it as a test. (Note that the hdfs:// URI passed to textFile must match fs.defaultFS from the Hadoop configuration.)
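If the file is not in HDFS yet, it can be uploaded first; a sketch, assuming a local test.log in the current directory:

[hadoop@www ~]$ hdfs dfs -mkdir -p /input
[hadoop@www ~]$ hdfs dfs -put test.log /input/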

scala> val file = sc.textFile("hdfs://master:9000/input/test.log")   // RDD of lines read from HDFS
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)   // word count
scala> count.collect()   // triggers the job and returns the results to the driver

The last lines of the output show:

15/11/08 19:49:28 INFO scheduler.DAGScheduler: Job 0 finished: collect at <console>:26, took 16.682841 s
res0: Array[(String, Int)] = Array((hadoop,1), (hello,2), (world,1))

The job details can also be viewed on the web UI at http://192.168.78.110:4040/stages.

  7. Stop spark

    [hadoop@www spark-1.4.1-bin-hadoop2.6]$ /opt/hadoop/spark-1.4.1-bin-hadoop2.6/sbin/stop-all.sh
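To shut down Hadoop as well, the matching script under the Hadoop sbin directory can be used:

    [hadoop@www spark-1.4.1-bin-hadoop2.6]$ /opt/hadoop/hadoop-2.6.2/sbin/stop-all.sh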
