Spark single-machine setup and quick start

1 Single-machine setup

System environment

cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)

Configure JDK 8

wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz"
tar -xf jdk-8u181-linux-x64.tar.gz

echo 'export JAVA_HOME=/home/work/fsj/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc

source ~/.bashrc
java -version
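
Re-running the echo ... >> ~/.bashrc step above appends the same export lines again. A minimal sketch of an idempotent alternative follows; append_once is a hypothetical helper, not part of the original setup, and the demo writes to a scratch file rather than the real ~/.bashrc.

```shell
# append_once FILE LINE: append LINE to FILE only if no identical line exists yet.
# (hypothetical helper introduced for illustration)
append_once() {
  local file="$1" line="$2"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: quiet
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file so the real ~/.bashrc is untouched.
rc="$(mktemp)"
append_once "$rc" 'export JAVA_HOME=/home/work/fsj/jdk1.8.0_181'
append_once "$rc" 'export PATH=$JAVA_HOME/bin:$PATH'
append_once "$rc" 'export JAVA_HOME=/home/work/fsj/jdk1.8.0_181'  # no-op: line already present
cat "$rc"
```

To apply it for real, pass ~/.bashrc as the first argument and then run source ~/.bashrc as before.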

Configure Spark

Download the latest pre-built Spark package from http://spark.apache.org/downloads.html and extract it. The commands below use spark-2.3.0-bin-hadoop2.7.

echo 'export SPARK_HOME=/home/work/fsj/spark-2.3.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
run-example SparkPi 10  # run the bundled example
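
SparkPi estimates π by sampling random points in a square and counting how many land inside the inscribed circle. The sketch below shows the same idea on one machine with awk, without Spark; estimate_pi and the fixed seed are illustrative choices, not anything taken from the Spark example itself.

```shell
# estimate_pi N: Monte Carlo estimate of pi from N random points in [-1, 1] x [-1, 1].
# (hypothetical helper; single-process stand-in for what SparkPi distributes)
estimate_pi() {
  awk -v n="$1" 'BEGIN {
    srand(42)                      # fixed seed so repeated runs agree
    hits = 0
    for (i = 0; i < n; i++) {
      x = rand() * 2 - 1
      y = rand() * 2 - 1
      if (x * x + y * y <= 1) hits++   # point falls inside the unit circle
    }
    # circle area / square area = pi / 4
    printf "%.4f\n", 4 * hits / n
  }'
}

pi_est="$(estimate_pi 200000)"
echo "Pi is roughly $pi_est"
```

The Spark version splits the sampling loop across partitions (the argument 10 above) and sums the hit counts.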

2 spark-shell

$ spark-shell --master local[2]
2018-09-02 16:12:37 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[2], app id = local-1535875965532).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc
res1: org.apache.spark.SparkContext = org.apache.spark.SparkContext@674aa626

scala> val textFile = spark.read.textFile("README.md")
2018-09-02 16:16:44 WARN  ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
2018-09-02 16:16:45 WARN  ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
2018-09-02 16:16:45 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
textFile: org.apache.spark.sql.Dataset[String] = [value: string]

scala> textFile.first()
res2: String = # Apache Spark

Note that the shell prints the web UI address: http://localhost:4040

3 Standalone project

spark-submit can submit jobs in local mode or in cluster mode. Only local mode is covered here.

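Use Maven to manage the Scala dependencies, and create a new project named SimpleApp.
The pom file needs the scala maven plugin.
Because the plugin brings its own Scala compiler, building the project with Maven does not require a local Scala installation.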

For the code and the full pom, see: https://github.com/shenjiefeng/spark-examples/tree/master/SimpleApp

$ cd /path/to/SimpleApp && mvn clean package  # a domestic (China) Maven mirror is recommended
$ tree
.
├── pom.xml
├── src
│   └── main
│       └── scala
│           └── SimpleApp.scala
└── target
    ├── classes
    │   ├── SimpleApp$$anonfun$1.class
    │   ├── SimpleApp$$anonfun$2.class
    │   ├── SimpleApp.class
    │   └── SimpleApp$.class
    ├── classes.timestamp
    ├── maven-archiver
    │   └── pom.properties
    ├── simple-project-1.0.jar
    ├── surefire
    └── test-classes

$ spark-submit   --class "SimpleApp"   --master local[*]   target/simple-project-1.0.jar
...

Change the log level of the spark-submit command:

$ cd $SPARK_HOME
$ cp conf/log4j.properties.template conf/log4j.properties
log4j.rootCategory=INFO, console  # in conf/log4j.properties, change INFO to WARN
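
The edit above can also be scripted instead of done by hand. This is a minimal sketch, assuming the file contains a single log4j.rootCategory line as in the template; set_root_log_level is a hypothetical helper, and the demo runs on a scratch copy rather than on $SPARK_HOME/conf/log4j.properties.

```shell
# set_root_log_level FILE LEVEL: rewrite the level on the log4j.rootCategory line.
# (hypothetical helper introduced for illustration)
set_root_log_level() {
  file="$1"; level="$2"
  # Keep the appender list (", console") and swap only the level.
  # -i.bak edits in place and leaves a backup at FILE.bak.
  sed -i.bak -E "s/^(log4j\.rootCategory=)[A-Z]+/\1${level}/" "$file"
}

# Demo on a scratch file that mimics the relevant template line.
conf="$(mktemp)"
printf '%s\n' '# Set everything to be logged to the console' \
  'log4j.rootCategory=INFO, console' > "$conf"
set_root_log_level "$conf" WARN
grep '^log4j.rootCategory' "$conf"
```

Against the real file, the call would be set_root_log_level "$SPARK_HOME/conf/log4j.properties" WARN.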

$ spark-submit   --class "SimpleApp"   --master local[*]   target/simple-project-1.0.jar
18/09/02 17:27:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Lines with a: 61, Lines with b: 30
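
The output format matches the SimpleApp from the Spark quick start, which counts the lines of README.md that contain "a" and the lines that contain "b". Assuming the linked SimpleApp does the same (its source is not shown here), the counts can be cross-checked without Spark. count_lines_with is a hypothetical helper, and the demo uses a small scratch file.

```shell
# count_lines_with FILE STR: number of lines in FILE containing STR (case-sensitive).
# (hypothetical helper introduced for illustration)
count_lines_with() {
  # -c: print the count of matching lines, -F: treat STR as a fixed string
  grep -cF -- "$2" "$1"
}

# Demo on a scratch file with three lines.
sample="$(mktemp)"
printf '%s\n' 'Apache Spark' 'big data' 'xyz' > "$sample"
echo "Lines with a: $(count_lines_with "$sample" a), Lines with b: $(count_lines_with "$sample" b)"
```

Pointing it at the README.md that the job reads should give the same two numbers as the spark-submit run, provided the assumption about SimpleApp holds.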

Further reading

  • Cluster setup: https://showme.codes/2017-01-31/setup-spark-dev-env/
  • http://spark.apache.org/docs/latest/quick-start.html