1. In IDEA, create a new Project --> Maven --> Next
2. GroupId is usually your company's shared name, ArtifactId is the project name (for example, GroupId com.example and ArtifactId spark-demo) --> Next
3. Click Finish
4. Directory structure
.idea holds IDEA's project metadata and records the working directory's location; if you copy the project to another computer, delete this directory and re-import the project
src is the directory where you write code
pom.xml declares the project's dependencies, i.e. the jar packages it uses
5. Unzip apache-maven-3.3.9-bin.zip
6. Open settings.xml in the conf directory and change the local repository path
Copy line 53 (the commented-out <localRepository> template) and set your own local repository path; I used <localRepository>D:\maven\repository</localRepository>, then save.
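For reference, the shipped settings.xml only documents the default location (${user.home}/.m2/repository) in a comment around that line; after the edit, the relevant fragment looks roughly like this:

<!-- the shipped file keeps the setting commented out:
<localRepository>/path/to/local/repo</localRepository>
-->
<localRepository>D:\maven\repository</localRepository>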
7. In IDEA, open File --> Settings --> search for maven --> point Maven home to the unzipped directory and the user settings file to the modified settings.xml --> OK --> in the popup at the bottom right, choose Enable Auto-Import
8. Looking up and adding Maven dependencies
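For example, searching for junit on https://mvnrepository.com brings up a page with a ready-made snippet that you paste inside the <dependencies> section of pom.xml (the same coordinates reappear in step 12); with Enable Auto-Import from step 7, IDEA downloads the jar as soon as you save:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>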
9. Configure the Maven environment variables
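On Windows this typically means adding a MAVEN_HOME variable pointing at the unzipped Maven directory and appending its bin folder to Path; the directory below is only an example, substitute wherever you unzipped it in step 5:

MAVEN_HOME = D:\maven\apache-maven-3.3.9
Path = %Path%;%MAVEN_HOME%\bin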
10. Open cmd --> run mvn -v
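If the environment variables are set correctly, mvn -v prints the Maven version and related details, roughly like the following sketch (the build hash, paths, and Java version depend on your machine and are placeholders here):

Apache Maven 3.3.9 (...)
Maven home: D:\maven\apache-maven-3.3.9
Java version: 1.8.0_xx, vendor: Oracle Corporation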
11. Configure Scala
Create a scala folder under main (and another under test) --> File --> Project Structure
--> Modules --> select the scala directory under main --> click Sources
--> Modules --> select the scala directory under test --> click Tests
--> Libraries --> click + --> Scala SDK --> OK
12. Configure the pom file
Append the following configuration after the existing content (still inside the <project> element):
<properties>
    <spark.version>2.2.0</spark.version>
    <!-- Scala binary version; must match the _2.11 suffix of the Spark artifacts -->
    <scala.version>2.11</scala.version>
    <hadoop.version>2.7.3</hadoop.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <!-- use the version declared in <properties> instead of a hard-coded 2.6.0 -->
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.39</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
</dependencies>
<build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
</build>
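Note that this <build> section only tells Maven where the Scala sources live; inside IDEA, the Scala SDK added in step 11 does the compiling. If you also want mvn itself to compile the Scala code from the command line, you additionally need a Scala compiler plugin. A minimal sketch using scala-maven-plugin (the version here is an assumption, pick a current one) would go inside <build>:

<plugins>
    <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
            <execution>
                <goals>
                    <goal>compile</goal>
                    <goal>testCompile</goal>
                </goals>
            </execution>
        </executions>
    </plugin>
</plugins>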
13. Test whether the environment works
Create the following object under src/main/scala and run it:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.mutable

object RDDQueueStream {
  def main(args: Array[String]): Unit = {
    // On Windows, point Hadoop at a local installation so Spark can find winutils
    System.setProperty("hadoop.home.dir", "D:\\temp\\hadoop-2.4.1\\hadoop-2.4.1")
    // Reduce log noise so the streaming output is easy to read
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Local mode with 2 threads and a 1-second batch interval
    val conf = new SparkConf().setAppName("RDDQueueStream").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Fill a queue with three RDDs; queueStream consumes one per batch
    val rddQueue = new mutable.Queue[RDD[Int]]()
    for (i <- 1 to 3) {
      rddQueue += ssc.sparkContext.makeRDD(i to 10)
      Thread.sleep(2000)
    }

    // Turn the queue into a DStream, pair each element with its double, and print
    val inputDStream = ssc.queueStream(rddQueue)
    val result = inputDStream.map(x => (x, x * 2))
    result.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
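If everything is wired up correctly, the program prints one batch per second. Since the queue's first RDD is 1 to 10, the first batch should look roughly like this (the timestamp is a placeholder):

-------------------------------------------
Time: ... ms
-------------------------------------------
(1,2)
(2,4)
(3,6)
(4,8)
(5,10)
(6,12)
(7,14)
(8,16)
(9,18)
(10,20)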