Introduction to Apache Spark
Apache Spark is an open-source cluster computing framework. Running Spark requires a cluster manager and a distributed storage system: Spark supports the standalone, Hadoop YARN, and Apache Mesos cluster managers, and it can work with distributed storage systems such as the Hadoop Distributed File System (HDFS), the MapR File System (MapR-FS), Cassandra, OpenStack Swift, and Amazon S3. Spark also depends on the Scala programming language.
Setting Up an Apache Spark Cluster
1. Download and install Scala
Scala 2.11.4 download: http://www.scala-lang.org/download/2.11.4.html
Extract: [spark@Node1 apache_spark]$ tar zxvf scala-2.11.4.tgz
Configure: edit the ~/.bash_profile file and add the SCALA_HOME environment variable:
export SCALA_HOME=/home/spark/apache_spark/scala-2.11.4
PATH=$PATH:$SCALA_HOME/bin
export PATH
Verify that the Scala environment works (run source ~/.bash_profile first so the new variables take effect, then scala -version).
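The profile edit above can also be scripted; a minimal sketch, assuming the install path used throughout this guide:

```shell
# Append the Scala variables to the login profile, then reload it.
# The install path below is the one assumed in this guide; adjust if needed.
cat >> ~/.bash_profile <<'EOF'
export SCALA_HOME=/home/spark/apache_spark/scala-2.11.4
PATH=$PATH:$SCALA_HOME/bin
export PATH
EOF
source ~/.bash_profile
# If the archive was extracted there, `scala -version` now reports 2.11.4.
```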
2. Install the JDK
Download: http://www.oracle.com/technetwork/cn/java/javase/downloads/java-se-jdk-7-download-432154-zhs.html
Extract: tar zxvf jdk-7-linux-x64.tar.gz
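The JDK needs matching environment variables as well; a sketch of the extra ~/.bash_profile lines (the directory name jdk1.7.0_51 is the JAVA_HOME used in spark-env.sh below, but check what tar actually extracted to):

```shell
# Assumed JDK install path; match it to the directory tar created
export JAVA_HOME=/home/spark/apache_spark/jdk1.7.0_51
PATH=$PATH:$JAVA_HOME/bin
export PATH
```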
3. Install Spark
Cluster layout
Node1 192.168.100.101 (Master)
Node2 192.168.100.102 (Slave)
Node3 192.168.100.103 (Slave)
Node4 192.168.100.104 (Slave)
Download Spark
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop2.4.tgz
Extract
[spark@Node1 apache_spark]$ tar zxvf spark-1.2.0-bin-hadoop2.4.tgz
Configure environment variables (edit ~/.bash_profile again)
export SPARK_HOME=/home/spark/apache_spark/spark-1.2.0-bin-hadoop2.4
PATH=$PATH:$SPARK_HOME/bin
Edit the configuration files:
cd /home/spark/apache_spark/spark-1.2.0-bin-hadoop2.4/conf
# 1. Create spark-env.sh from its template
mv spark-env.sh.template spark-env.sh
vim spark-env.sh
# Add the environment variables to export
export SCALA_HOME=/home/spark/apache_spark/scala-2.11.4
export JAVA_HOME=/home/spark/apache_spark/jdk1.7.0_51
export SPARK_MASTER_IP=192.168.100.101
export SPARK_WORKER_MEMORY=512M
export MASTER=spark://192.168.100.101:7077
# 2. Create the slaves file from its template
mv slaves.template slaves
vim slaves
# Add the slave nodes
Node2
Node3
Node4
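The two configuration edits above can also be done non-interactively; a sketch with the values copied from this guide (CONF_DIR is a stand-in for the Spark conf directory):

```shell
# CONF_DIR is a placeholder; on the real master point it at
# /home/spark/apache_spark/spark-1.2.0-bin-hadoop2.4/conf
CONF_DIR=${CONF_DIR:-.}

# spark-env.sh: environment exported to the Spark daemons
cat > "$CONF_DIR/spark-env.sh" <<'EOF'
export SCALA_HOME=/home/spark/apache_spark/scala-2.11.4
export JAVA_HOME=/home/spark/apache_spark/jdk1.7.0_51
export SPARK_MASTER_IP=192.168.100.101
export SPARK_WORKER_MEMORY=512M
export MASTER=spark://192.168.100.101:7077
EOF

# slaves: one worker hostname per line
cat > "$CONF_DIR/slaves" <<'EOF'
Node2
Node3
Node4
EOF
```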
Distribute the files to the slave nodes (ssh trust between the nodes is assumed to be configured already).
# Distribute the profile
scp ~/.bash_profile Node2:~/.bash_profile
scp ~/.bash_profile Node3:~/.bash_profile
scp ~/.bash_profile Node4:~/.bash_profile
# Distribute the installation files
scp -r ~/apache_spark Node2:~/
scp -r ~/apache_spark Node3:~/
scp -r ~/apache_spark Node4:~/
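The six scp commands can be collapsed into a loop; a sketch (DRY_RUN=echo only prints each command; clear it to really copy, which relies on the ssh trust noted above):

```shell
# DRY_RUN=echo makes each scp print instead of run; set DRY_RUN= to copy.
DRY_RUN=${DRY_RUN:-echo}
for node in Node2 Node3 Node4; do
  $DRY_RUN scp ~/.bash_profile "$node":~/.bash_profile
  $DRY_RUN scp -r ~/apache_spark "$node":~/
done
```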
Starting and stopping Spark
Start Spark
/home/spark/apache_spark/spark-1.2.0-bin-hadoop2.4/sbin/start-all.sh
Stop Spark
/home/spark/apache_spark/spark-1.2.0-bin-hadoop2.4/sbin/stop-all.sh
View the Spark cluster information: open the master's web UI at http://192.168.100.101:8080 (the standalone master serves it on port 8080 by default; each worker serves its own UI on port 8081).
Miscellaneous
Set the time zone and sync the clock:
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
ntpdate time.windows.com
hwclock --systohc