This article is part of an original series; please credit the source when reposting.
Original post: http://blog.csdn.net/xeseo
Preface
Preparation
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-kafka</artifactId>
    <version>0.9.2-incubating</version>
</dependency>
However, Storm does not seem to put the external packages on the classpath automatically, so to use this module you still have to copy the jar from external/storm-kafka/ into Storm's lib directory.

Using KafkaSpout
Constructing a KafkaSpout requires the following:
- The Broker addresses of the Kafka cluster (IP + port)
There are two ways to specify them:
1. Use static addresses, i.e. list every Broker in the Kafka cluster directly:
GlobalPartitionInformation info = new GlobalPartitionInformation();
info.addPartition(0, new Broker("10.1.110.24", 9092));
info.addPartition(1, new Broker("10.1.110.21", 9092));
BrokerHosts brokerHosts = new StaticHosts(info);
2. Read them dynamically from ZooKeeper. This is the recommended approach, since Kafka Brokers may be added or removed at any time:
BrokerHosts brokerHosts = new ZkHosts("10.1.110.24:2181,10.1.110.22:2181");
- The topic name
- A unique id for this spout (referred to below as $spout_id)
- The ZooKeeper path under which the current offset is stored (referred to below as $zk_root)
- How to decode the data in this topic
String topic = "test";
String zkRoot = "kafkastorm";
String spoutId = "myKafka";
SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, topic, zkRoot, spoutId);
spoutConfig.scheme = new SchemeAsMultiScheme(new TestMessageScheme());

int spoutNum = 3; // parallelism hint for the spout
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new KafkaSpout(spoutConfig), spoutNum);
Here TestMessageScheme is what tells the KafkaSpout how to decode the raw bytes into the tuples that Storm passes around internally:
public class TestMessageScheme implements Scheme {

    private static final Logger LOGGER = LoggerFactory.getLogger(TestMessageScheme.class);

    @Override
    public List<Object> deserialize(byte[] bytes) {
        try {
            String msg = new String(bytes, "UTF-8");
            return new Values(msg);
        } catch (UnsupportedEncodingException e) {
            LOGGER.error("Cannot parse the provided message!");
        }
        // Returning null causes the spout to skip (ack) the offending
        // message instead of emitting a tuple for it.
        return null;
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("msg");
    }
}
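The decoding above must mirror the producer's encoding byte for byte. As a sanity check, the UTF-8 round trip this scheme relies on can be exercised with plain JDK code; the class and method names below are illustrative, not from the original post:

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;
import java.util.List;

public class SchemeRoundTrip {

    // Mirrors the producer side: encode a String to UTF-8 bytes.
    static byte[] encode(String msg) throws UnsupportedEncodingException {
        return msg.getBytes("UTF-8");
    }

    // Mirrors TestMessageScheme.deserialize: decode UTF-8 bytes back to a one-field tuple.
    static List<Object> deserialize(byte[] bytes) throws UnsupportedEncodingException {
        return Arrays.<Object>asList(new String(bytes, "UTF-8"));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        List<Object> tuple = deserialize(encode("hello storm"));
        if (!"hello storm".equals(tuple.get(0))) {
            throw new AssertionError("round trip failed");
        }
        System.out.println(tuple.get(0)); // prints "hello storm"
    }
}
```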
This decoding scheme has to match the encoding the Producer used when it wrote the data. In my case the Producer pushed the raw bytes of a String, so here I decode back into a String and declare a single output field named "msg".

Using TransactionalTridentKafkaSpout
TridentKafkaConfig kafkaConfig = new TridentKafkaConfig(brokerHosts, topic, spoutId);
kafkaConfig.scheme = new SchemeAsMultiScheme(new TestMessageScheme());
TransactionalTridentKafkaSpout kafkaSpout = new TransactionalTridentKafkaSpout(kafkaConfig);
TridentTopology topology = new TridentTopology();
topology.newStream("test_str", kafkaSpout).shuffle().each(new Fields("msg"), new PrintFunction());
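PrintFunction is not defined anywhere in the post; a minimal sketch of what it might look like, assuming it simply prints each incoming message as a side effect (the class is hypothetical, using the pre-1.0 storm.trident package names that match the 0.9.2-incubating dependency above):

```java
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

// Hypothetical helper, sketched here only so the topology snippet above is complete.
public class PrintFunction extends BaseFunction {

    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        // Print the "msg" field produced by TestMessageScheme.
        System.out.println(tuple.getStringByField("msg"));
        // Emit nothing: each() appends any emitted fields to the stream,
        // and here we only want the printing side effect.
    }
}
```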
Common Problems
By default, KafkaSpout commits the consumed offsets to the ZooKeeper cluster that Storm itself uses. If your Kafka cluster runs on a separate ZooKeeper, point the spout at it explicitly:
spoutConfig.zkServers = new ArrayList<String>() {{
    add("10.1.110.20");
    add("10.1.110.21");
    add("10.1.110.24");
}};
spoutConfig.zkPort = 2181;
The struck-out dependency below is the older storm-kafka-0.8-plus artifact, which has since been merged into Apache Storm as storm-kafka; it pulled in slf4j-simple, which can conflict with Storm's own logging binding and had to be excluded:
<del><dependency>
    <groupId>net.wurstmeister.storm</groupId>
    <artifactId>storm-kafka-0.8-plus</artifactId>
    <version>0.2.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
        </exclusion>
    </exclusions>
</dependency></del>