1. WordCount program code
package com.first

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object WordCount {
  def main(args: Array[String]) {
    if (args.length != 2) {
      println("usage is org.test.WordCount <input> <output>")
      return
    }
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // val sc = new SparkContext(args(0), "WordCount",
    //   System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_TEST_JAR")))
    val textFile = sc.textFile(args(0))
    val result = textFile.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    result.saveAsTextFile(args(1))
    // result.foreach(f => println(f))
    sc.stop()
  }
}
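To see what the flatMap/map/reduceByKey pipeline computes without a Spark cluster, the same logic can be sketched on plain Scala collections (groupBy plus a sum stands in for reduceByKey; the input lines here are made up for illustration):

```scala
object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample input standing in for sc.textFile(...)
    val lines = Seq("hello spark", "hello world")
    val counts = lines
      .flatMap(_.split("\\s+"))                              // split each line into words
      .map(word => (word, 1))                                // pair each word with a count of 1
      .groupBy(_._1)                                         // group pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word
    counts.foreach(println)
  }
}
```

This prints each (word, count) pair, e.g. (hello,2); on an RDD, reduceByKey performs the same per-key summation but distributed across partitions.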
2. Submitting the job via spark-submit
In a terminal, change into Spark's bin directory and execute the following (several other submission modes are also available):

./spark-submit --name WordCount1 --class com.first.WordCount --master yarn-cluster /home/hadoop/wangqiujie/wordcount2.jar wanginput/word.txt wangoutput

Here wanginput/word.txt (the input) and wangoutput (the output) are both relative paths.
3. An exception encountered during the run: Exception in createBlockOutputStream
The cause was that the firewall on the 229 node had not been disabled.
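A sketch of how the firewall might be checked and disabled on the affected node; the commands assume a RHEL/CentOS-style host (the post does not say which distribution was used), so adjust for your system:

```shell
# Check whether the firewall is running, then stop it (init-based systems)
service iptables status
service iptables stop
chkconfig iptables off   # keep it disabled across reboots

# On systemd-based systems the equivalent would be:
# systemctl stop firewalld
# systemctl disable firewalld
```

With the firewall down, DataNodes can accept block-transfer connections on their data ports, which is what createBlockOutputStream needs.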