1. Create the SBT project directory structure
mkdir -p ~/kmeans/src/main/scala
2. Write kmeans.sbt
name := "Kmeans Project"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0",
  "org.apache.spark" %% "spark-mllib" % "2.0.0"
)
At first I forgot to add the mllib dependency, which produced the error: "error: object mllib is not a member of package org.apache.spark".
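A note on the dependency syntax above: in sbt, `%%` appends the project's Scala binary version to the artifact name, which is why the two declarations below resolve to the same artifact (a sketch of equivalent forms, not an additional required setting):

```scala
// These two lines are equivalent under scalaVersion := "2.11.8":
// %% appends the Scala binary version "_2.11" to the artifact name.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"
```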
3. Write the Scala source file kmeans_test.scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkConf, SparkContext}

object kmeans_test {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Kmeans Test")
    val sc = new SparkContext(conf)

    // Load the space-separated numeric data and parse each line into a dense vector
    val data = sc.textFile("file:///usr/spark2.0/data/mllib/kmeans_data.txt")
    val parsedData = data.map(s => Vectors.dense(s.split(" ").map(_.toDouble))).cache()

    // Cluster the data into two classes with KMeans
    val numClusters = 2
    val numIterations = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Evaluate the clustering with the Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

    sc.stop()
  }
}
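The `map` step above turns each space-separated line of kmeans_data.txt into a numeric vector, and `computeCost` sums each point's squared distance to its nearest cluster center. The two operations can be sketched in plain Scala, with no Spark dependency (the object and helper names here are illustrative, not part of the MLlib API):

```scala
object ParseSketch {
  // Same parsing as s => Vectors.dense(s.split(" ").map(_.toDouble)) above,
  // minus the MLlib Vectors wrapper.
  def parseLine(s: String): Array[Double] = s.split(" ").map(_.toDouble)

  // Squared Euclidean distance: the per-point quantity that computeCost sums.
  def squaredDist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def main(args: Array[String]): Unit = {
    val p = parseLine("0.1 0.1 0.1")
    // Distance from the point to an assumed center at the origin: ~0.03
    println(squaredDist(p, Array(0.0, 0.0, 0.0)))
  }
}
```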
4. Copy the Scala source into ~/kmeans/src/main/scala/
5. The final project layout looks like this:
find .
.
./kmeans.sbt
./src
./src/main
./src/main/scala
./src/main/scala/kmeans_test.scala
6. Enter the kmeans directory and compile:
cd ~/kmeans
sbt compile
7. After compilation finishes, package the jar:
sbt package
8. After packaging, submit the job with the spark-submit tool:
spark-submit --class kmeans_test target/scala-2.11/kmeans-project_2.11-1.0.jar
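Beyond the WSSSE printed by the job, a trained `KMeansModel` can assign new points to clusters via its `predict` method, which returns the index of the nearest cluster center. The core idea can be sketched in plain Scala (an illustrative stand-in, not the MLlib implementation; the centers below are assumed values):

```scala
object PredictSketch {
  // Nearest-center assignment, the core of KMeansModel.predict:
  // return the index of the center with the smallest squared distance to the point.
  def predict(point: Array[Double], centers: Array[Array[Double]]): Int =
    centers.indices.minBy { i =>
      centers(i).zip(point).map { case (c, x) => (c - x) * (c - x) }.sum
    }

  def main(args: Array[String]): Unit = {
    // Two hypothetical cluster centers, one near the origin and one near (9,9,9)
    val centers = Array(Array(0.1, 0.1, 0.1), Array(9.1, 9.1, 9.1))
    println(predict(Array(9.0, 9.0, 9.0), centers)) // -> 1 (the second center)
  }
}
```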
9. The output is as follows: