- Develop a class that connects to Kafka topics and stores the data in HDFS.
In the Spark project:
./examples/src/main/scala/org/apache/spark/examples/streaming/Kafka.scala
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._

object Kafka {
  def main(args: Array[String]): Unit = {
    if (args.length < 5) {
      System.err.println("Usage: Kafka <zkQuorum> <group> <topics> <numThreads> <output>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads, output) = args
    val sparkConf = new SparkConf().setAppName("Kafka")
    // One batch every 2 seconds.
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("checkpoint")
    // Map each comma-separated topic to the number of consumer threads.
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // Keep only the message value of each (key, value) pair.
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    lines.print()
    // Write each batch under the <output> prefix with the suffix "txt".
    lines.saveAsTextFiles(output, "txt")
    ssc.start()
    ssc.awaitTermination()
  }
}
- Generate a new Spark examples jar:
cd examples
mvn -Pyarn -DskipTests clean package
- Replace the cluster's spark-examples-*.jar with the newly generated one.
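The Kafka class above turns its <topics> and <numThreads> arguments into the topic map passed to KafkaUtils.createStream. A minimal standalone sketch of that one expression follows; it needs no Spark on the classpath. The object name TopicMapSketch, the helper topicMap, and the second topic name "events" are hypothetical and used only for illustration.

```scala
object TopicMapSketch {
  // Same expression as in the Kafka class: every topic in the comma-separated
  // list is mapped to the same thread count.
  def topicMap(topics: String, numThreads: String): Map[String, Int] =
    topics.split(",").map((_, numThreads.toInt)).toMap

  def main(args: Array[String]): Unit = {
    // The values used later in the spark-submit command: topic "test", 1 thread.
    println(topicMap("test", "1")) // Map(test -> 1)
    // "events" is a made-up second topic, only to show the comma-separated form.
    println(topicMap("test,events", "2"))
  }
}
```

Because numThreads.toInt is applied to the raw argument, a non-numeric <numThreads> fails with a NumberFormatException before any stream is created.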
- Start the Kafka server and a Kafka console producer:
cd ${KAFKA_HOME}
bin/kafka-server-start.sh config/server.properties
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
- Start Spark Streaming to connect to Kafka:
bin/spark-submit --master yarn-cluster --class org.apache.spark.examples.streaming.Kafka /opt/spark/lib/spark-examples-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar localhost:2183 group_kafka test 1 topics
Note: group_kafka is the group id for this Spark Streaming consumer and can be any string. The last argument, topics, is the <output> prefix for the files written to HDFS.
- When the YARN application reaches the RUNNING state, type a message in the Kafka console producer:
this is a testing message
- Data is being written into HDFS:
The number between topics and .txt is TIME_IN_MS (milliseconds), the time of the batch.
[hadoop@master root]$ hadoop fs -ls /user/hadoop/
Found 82 items
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:13 /user/hadoop/.sparkStaging
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:13 /user/hadoop/checkpoint
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:11 /user/hadoop/topics-1434604268000
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:11 /user/hadoop/topics-1434604270000
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:11 /user/hadoop/topics-1434604272000
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:11 /user/hadoop/topics-1434604274000
[hadoop@master root]$ hadoop fs -cat /user/hadoop/topics-1434604274000/part-00000
this is a testing message
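The directory names in the listing above can be checked against the 2-second batch interval set in the Kafka class. The following is a small sketch, assuming only the four names shown in the listing; the object name BatchDirSketch and the helper batchTimeMs are hypothetical. It extracts TIME_IN_MS from each name, prints the gap between consecutive batches, and converts the last value to a UTC timestamp.

```scala
import java.time.Instant

object BatchDirSketch {
  // Directory names copied from the hadoop fs -ls output above.
  val dirs = List(
    "topics-1434604268000",
    "topics-1434604270000",
    "topics-1434604272000",
    "topics-1434604274000")

  // TIME_IN_MS is the run of digits after the last '-';
  // anything after the digits (such as a ".txt" suffix) is ignored.
  def batchTimeMs(dir: String): Long =
    dir.substring(dir.lastIndexOf('-') + 1).takeWhile(_.isDigit).toLong

  def main(args: Array[String]): Unit = {
    val times = dirs.map(batchTimeMs)
    // Gap between consecutive batches in milliseconds.
    val gaps = times.sliding(2).map { case Seq(a, b) => b - a }.toList
    println(gaps) // List(2000, 2000, 2000)
    // Batch time of the last directory as a UTC instant.
    println(Instant.ofEpochMilli(times.last)) // 2015-06-18T05:11:14Z
  }
}
```

The 2000 ms gaps match Seconds(2) in the StreamingContext, and 05:11 UTC lines up with the 13:11 modification time in the listing if the cluster clock is at UTC+8.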