Big Data Basics (4): Installing sbt on Ubuntu and Using It with Spark

Environment:
Ubuntu Server 14.04.4 amd64, Hadoop 2.6.2, Scala 2.11.7, sbt 0.13.11, JDK 1.8


1. Installation Method 1: download the tgz tarball
1 Download
root@spark:~# wget https://dl.bintray.com/sbt/native-packages/sbt/0.13.11/sbt-0.13.11.tgz


2 Extract (first copy or move the downloaded tarball to /usr/local)
root@spark:/usr/local# tar zxvf sbt-0.13.11.tgz
3 Grant execute permission
root@spark:/usr/local/sbt# chmod u+x bin/sbt
4 Environment variables
vi ~/.bashrc
and append:
export SBT_HOME=/usr/local/sbt
export PATH=${SBT_HOME}/bin:$PATH
Then apply the changes:
source ~/.bashrc
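A quick sanity check is to ask sbt for its version (the first run downloads sbt's own dependencies, so it may take a moment):
root@spark:~# sbt sbtVersion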
Reference:
http://www.linuxdiyf.com/linux/14871.html


2. Installation Method 2: Debian package
wget https://dl.bintray.com/sbt/debian/sbt-0.13.11.deb
dpkg -i sbt-0.13.11.deb
apt-get update
apt-get -f install
apt-get install sbt
(If dpkg -i reports missing dependencies, apt-get -f install resolves them before the final install.)
References:
http://stackoverflow.com/questions/13711395/install-sbt-on-ubuntu
http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html





3. Usage 1: Hello World
For example, take the following code:
object Hi {
  def main(args: Array[String]) = println("Hi!")
}
You can run it from the command line:
$ mkdir hello
$ cd hello
$ echo 'object Hi { def main(args: Array[String]) = println("Hi!") }' > hw.scala
$ sbt
...
> run
...
Hi!
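sbt can also run the program in batch mode, without entering the interactive shell:
$ sbt run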


4. Usage 2: Spark
Reference book: Big Data Analytics with Spark, Chapter 5


5 Usage


5.1 Create the project directory and the .scala and .sbt files
mkdir WordCount
// WordCount.scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)     // HDFS input path
    val outputPath = args(1)    // HDFS output directory (must not exist yet)
    val sc = new SparkContext() // configuration is supplied by spark-submit
    val lines = sc.textFile(inputPath)
    val wordCounts = lines.flatMap { line => line.split(" ") }
      .map(word => (word, 1))   // pair each word with a count of 1
      .reduceByKey(_ + _)       // sum the counts per word
    wordCounts.saveAsTextFile(outputPath)
  }
}
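For quick experiments without spark-submit, a variant can set its master and app name in code. The sketch below is illustrative (the object name LocalWordCount and the local[*] master are assumptions, not part of the book's example); it prints the counts instead of writing to HDFS:
// LocalWordCount.scala -- illustrative local-test variant
import org.apache.spark.{SparkConf, SparkContext}

object LocalWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("word-count-local")
      .setMaster("local[*]")    // use all local cores; no cluster needed
    val sc = new SparkContext(conf)
    val wordCounts = sc.textFile(args(0))
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    wordCounts.collect().foreach(println) // print to stdout instead of saving
    sc.stop()
  }
}
Note that with spark-core marked "provided" in the build, the Spark jars are not on sbt's runtime classpath; drop the "provided" qualifier if you want to launch this directly with sbt run.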


// wordcount.sbt
// Reference: http://www.scala-sbt.org/0.13/docs/Hello.html
// Reference: http://www.scala-sbt.org/0.13/docs/Basic-Def.html
// Older sbt style:
// name := "word-count"
// version := "1.0.0"
// scalaVersion := "2.10.6"
// libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
The newer sbt style, here with Scala 2.11.7, is as follows:
// wordcount.sbt
lazy val root = (project in file(".")).
  settings(
    name := "word-count",
    version := "1.0.0",
    scalaVersion := "2.11.7",
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
  )
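A note on the dependency line: the %% operator appends the Scala binary version to the artifact name, which is why the 2.11 build of spark-core gets fetched. The two lines below are equivalent:
// %% appends the Scala binary version (_2.11) to the artifact ID:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
// ...which is the same as naming the suffixed artifact with plain %:
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.5.2" % "provided"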


Result:
root@spark:~/WordCount# ls
wordcount.sbt  WordCount.scala


5.2 Compile
On its first run sbt downloads a lot of jars; compiling with sbt -v package shows the details. On a slow connection this can take quite a while.
cd WordCount
sbt package
Result:
Getting Scala 2.10.6 (for sbt)...
downloading https://repo1.maven.org/maven2/org/scala-lang/jline/2.10.6/jline-2.10.6.jar ...
[SUCCESSFUL ] org.scala-lang#jline;2.10.6!jline.jar (3764ms)
downloading https://repo1.maven.org/maven2/org/fusesource/jansi/jansi/1.4/jansi-1.4.jar ...
[SUCCESSFUL ] org.fusesource.jansi#jansi;1.4!jansi.jar (2979ms)
:: retrieving :: org.scala-sbt#boot-scala
confs: [default]
5 artifacts copied, 0 already retrieved (24494kB/6179ms)
[info] Set current project to root (in build file:/root/)
root@spark:~/WordCount# sbt package
[info] Set current project to word-count (in build file:/root/WordCount/)
[info] Updating {file:/root/WordCount/}root...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 1 Scala source to /root/WordCount/target/scala-2.11/classes...
[info] Packaging /root/WordCount/target/scala-2.11/word-count_2.11-1.0.0.jar ...
[info] Done packaging.
[success] Total time: 38 s, completed Jun 10, 2016 11:33:59 AM
The packaged jar:
root@spark:~/WordCount/target/scala-2.11# ls
classes  word-count_2.11-1.0.0.jar
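To check what went into the package, you can list the jar's contents with the JDK's jar tool; because spark-core is "provided", only the WordCount classes (plus the manifest) should appear:
root@spark:~/WordCount/target/scala-2.11# jar tf word-count_2.11-1.0.0.jar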


5.3 Run the jar with spark-submit
Note: start Hadoop 2.6.2 first.
Check the input file (it can be uploaded with hadoop fs -put /home/alex/t1.log /):
root@spark:~/WordCount# hadoop fs -ls /
-rw-r--r--   1 root supergroup        556 2016-06-06 07:00 /t1.log
root@spark:~/WordCount# hadoop fs -cat /t1.log
The contents of the log file are as follows:
[BEGIN] 2016/6/4 12:30:12
[2016/6/4 12:30:12] Welcome to Ubuntu 14.04.4 LTS (GNU/Linux 3.19.0-59-generic x86_64)
[2016/6/4 12:30:12]
[2016/6/4 12:30:12]  * Documentation:  https://help.ubuntu.com/
[2016/6/4 12:30:12] Last login: Sat Jun  4 06:40:54 2016 from 192.168.10.1
[2016/6/4 12:30:16] root@spark:~# ls
[2016/6/4 12:30:16] cleanjob  derby.log  image.jpg  input  metastore_db  testtable.java
[2016/6/4 12:30:21] root@spark:~# cd /home/alex
[2016/6/4 12:30:22] root@spark:/home/alex# ls
[2016/6/4 12:30:22] pcshare  seed.txt  xdata  xsetups


Submit the job with spark-submit:
root@spark:~/WordCount/target/scala-2.11# /usr/local/spark/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class "WordCount" --master local[*] word-count_2.11-1.0.0.jar /t1.log /outsbt
In the command above, the jar is local (under ~/WordCount/target/scala-2.11/), not on HDFS, and the job runs in single-machine local mode. The input file /t1.log and the output directory /outsbt are both on HDFS.
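The same jar could also be submitted to a cluster instead of local mode. A sketch, assuming a running YARN cluster (which this walkthrough does not set up) and a fresh output path (/outsbt-yarn here is illustrative, since Spark refuses to overwrite an existing output directory):
root@spark:~/WordCount/target/scala-2.11# /usr/local/spark/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class "WordCount" --master yarn-client word-count_2.11-1.0.0.jar /t1.log /outsbt-yarn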
Inspect /outsbt:
root@spark:~/WordCount/target/scala-2.11# hadoop fs -ls /outsbt
Found 2 items
-rw-r--r--   1 root supergroup          0 2016-06-10 11:53 /outsbt/_SUCCESS
-rw-r--r--   1 root supergroup        565 2016-06-10 11:53 /outsbt/part-00000
View the output file:
root@spark:~/WordCount/target/scala-2.11# hadoop fs -cat /outsbt/part-00000
(x86_64),1)
(ls,2)
(4,1)
(12:30:12],4)
(06:40:54,1)
(3.19.0-59-generic,1)
(metastore_db,1)
(cd,1)
(/home/alex,1)
(cleanjob,1)
(root@spark:~#,2)
([2016/6/4,9)
(2016/6/4,1)
(pcshare,1)
(Ubuntu,1)
(xsetups,1)
(12:30:12,1)
(login:,1)
(Welcome,1)
(,11)
(image.jpg,1)
(root@spark:/home/alex#,1)
(to,1)
(*,1)
(Jun,1)
(2016,1)
(Documentation:,1)
(https://help.ubuntu.com/,1)
(12:30:16],2)
(LTS,1)
(xdata,1)
(12:30:22],2)
(derby.log,1)
([BEGIN],1)
(Sat,1)
(seed.txt,1)
(input,1)
(Last,1)
(14.04.4,1)
(from,1)
(12:30:21],1)
((GNU/Linux,1)
(192.168.10.1,1)
(testtable.java,1)
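Note the (,11) entry above: split(" ") yields empty strings wherever the input has consecutive spaces, and they are counted like any other word. An illustrative refinement of the flatMap in WordCount.scala splits on runs of whitespace and drops empty tokens:
// split on runs of whitespace and drop empty tokens:
val wordCounts = lines.flatMap(line => line.split("\\s+"))
  .filter(word => word.nonEmpty)
  .map(word => (word, 1))
  .reduceByKey(_ + _)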


6 References:
Big Data Analytics with Spark -- Chapter 5
http://www.scala-sbt.org/0.13/docs/Hello.html
http://www.scala-sbt.org/0.13/docs/Basic-Def.html