Today we'll finish the Kafka + Trident + Redis demo from last time. Recall the data flow from the previous post; let's start with the Trident part. The log format is straightforward: timestamp, city name, disease id.
topology.newStream("kafkaspout", kafkaspout)
        .each(new Fields("str"), new Func1(), new Fields("obj"))
        .each(new Fields("obj"), new Func2(), new Fields("descripe"))
        .groupBy(new Fields("descripe"))
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
        .newValuesStream()
        .each(new Fields("descripe", "count"), new Func3(), new Fields());
The Trident processing flow is roughly as above. The content subscribed from Kafka arrives in a field named str. That string is parsed into an object and emitted as the obj field, and then a descripe field is appended: the hour-precision time, the city name, and the disease id concatenated together, e.g. 2017.07.20.10Beijing2. Its purpose is grouping. The stream is grouped by this description and each group is counted; each group's count is exactly the number of cases of one disease in one city within one hour. The counts then flow on to Func3, which checks whether count exceeds a threshold; if it does, the disease is considered to have broken out.
To restate the roles: Func1 deserializes the string into an object, Func2 combines some of its fields into a single description string, and Func3 checks whether the count exceeds the threshold.
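Stripped of Storm, the group-count-threshold idea can be sketched with a plain HashMap (a hypothetical standalone illustration; the names mirror the topology, but none of this code is part of it):

```java
import java.util.HashMap;
import java.util.Map;

public class CountSketch {
    static final int BREAK_VALUE = 10; // same threshold Func3 uses

    // one counter per group key (hour + city + disease id)
    static final Map<String, Long> counts = new HashMap<>();

    // records one event; returns true when the group's count passes the threshold
    static boolean record(String descripe) {
        long count = counts.merge(descripe, 1L, Long::sum);
        return count > BREAK_VALUE;
    }

    public static void main(String[] args) {
        boolean fired = false;
        // eleven identical events for the same hour/city/disease group
        for (int i = 0; i < 11; i++) {
            fired = record("2017.07.20.10Beijing2");
        }
        System.out.println(fired ? "breakout" : "ok"); // prints "breakout" on the 11th
    }
}
```

The persistentAggregate step in the topology plays the role of this map, except that Trident maintains the counts transactionally across batches.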
The code is structured as follows.
MyEvent is the object the string is converted into:
package Util;

import java.io.Serializable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Created by Frank on 2017/7/20.
 */
public class MyEvent implements Serializable {
    private Date time;
    private City city;
    private int disease;

    public MyEvent(String[] arr) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        this.time = sdf.parse(arr[0]);
        this.city = City.valueOf(arr[1]);
        this.disease = Integer.parseInt(arr[2]);
    }

    public Date getTime() {
        return time;
    }

    public void setTime(Date time) {
        this.time = time;
    }

    public City getCity() {
        return city;
    }

    public void setCity(City city) {
        this.city = city;
    }

    public int getDisease() {
        return disease;
    }

    public void setDisease(int disease) {
        this.disease = disease;
    }
}
City is an enum; I only use Beijing, Shanghai, and Guangzhou here:
package Util;

/**
 * Created by Frank on 2017/7/20.
 */
public enum City {
    Beijing, Shanghai, Guangzhou
}
Func1, string to MyEvent:
package Func;

import Util.MyEvent;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;

import java.util.ArrayList;
import java.util.List;

/**
 * Created by Frank on 2017/7/20.
 */
public class Func1 extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        String log = tuple.getStringByField("str");
        System.out.println(log);
        String[] arr = log.split(",");
        try {
            MyEvent myEvent = new MyEvent(arr);
            List<Object> list = new ArrayList<>();
            list.add(myEvent);
            collector.emit(list);
        } catch (Exception e) {
            System.out.println("log convert error"); // if conversion fails, don't emit
        }
    }
}
Func2, combining fields into the "description" string:
package Func;

import Util.MyEvent;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;

import java.util.ArrayList;
import java.util.List;

/**
 * Created by Frank on 2017/7/20.
 */
public class Func2 extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        MyEvent myEvent = (MyEvent) tuple.getValueByField("obj");
        // The Date getters used here are deprecated; Calendar is the recommended replacement
        String hourstr = (myEvent.getTime().getYear() + 1900) + "."
                + (myEvent.getTime().getMonth() + 1) + "."
                + myEvent.getTime().getDate() + "."
                + myEvent.getTime().getHours();
        String citystr = myEvent.getCity().name();
        String descripe = hourstr + citystr + myEvent.getDisease();
        List<Object> list = new ArrayList<>();
        list.add(descripe);
        collector.emit(list);
    }
}
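As the comment notes, the Date getters are deprecated. One possible replacement is to format the hour key with SimpleDateFormat directly, a sketch of which follows (note that unlike the getters, this zero-pads the month, day, and hour fields):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class HourKey {
    // formats a Date down to hour precision, e.g. "2017.07.20.10"
    static String hourKey(Date time) {
        return new SimpleDateFormat("yyyy.MM.dd.HH").format(time);
    }

    public static void main(String[] args) throws ParseException {
        Date t = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse("2017-07-20 10:23:45");
        System.out.println(hourKey(t) + "Beijing" + 2); // 2017.07.20.10Beijing2
    }
}
```

Either way, every event that falls in the same hour, city, and disease id maps to the same key, which is all the groupBy step needs.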
Func3, threshold alerting (printing only for now, Redis not yet wired in):
package Func;

import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;

/**
 * Created by Frank on 2017/7/20.
 */
public class Func3 extends BaseFunction {
    // more than ten cases in one hour counts as an outbreak
    private static final int BREAK_VALUE = 10;

    public void execute(TridentTuple tuple, TridentCollector collector) {
        Long count = tuple.getLongByField("count");
        String descripe = tuple.getStringByField("descripe");
        if (count > BREAK_VALUE) {
            System.out.println(descripe + " breakout");
        }
    }
}
Topology, the wiring:
package topology;

import Func.Func1;
import Func.Func2;
import Func.Func3;
import kafka.api.OffsetRequest;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.kafka.*;
import org.apache.storm.kafka.trident.OpaqueTridentKafkaSpout;
import org.apache.storm.kafka.trident.TridentKafkaConfig;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.tuple.Fields;

/**
 * Created by Frank on 2017/7/16.
 */
public class Topology {
    public static StormTopology buildTopology() {
        BrokerHosts kafkaHosts = new ZkHosts("kafkaserverip:2181");
        TridentKafkaConfig spoutConf = new TridentKafkaConfig(kafkaHosts, "test");
        spoutConf.scheme = new StringMultiSchemeWithTopic();
        // the default is to start from the earliest offset; switch to the latest
        spoutConf.startOffsetTime = OffsetRequest.LatestTime();
        OpaqueTridentKafkaSpout kafkaspout = new OpaqueTridentKafkaSpout(spoutConf);
        TridentTopology topology = new TridentTopology();
        topology.newStream("kafkaspout", kafkaspout)
                .each(new Fields("str"), new Func1(), new Fields("obj"))
                .each(new Fields("obj"), new Func2(), new Fields("descripe"))
                .groupBy(new Fields("descripe"))
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
                .newValuesStream()
                .each(new Fields("descripe", "count"), new Func3(), new Fields());
        return topology.build();
    }

    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("cdc", conf, buildTopology());
    }
}
Now I can't wait to try running it; the logic looks right. To make testing easy I used node-red as the Kafka producer to generate the logs, rather than a real application.
As shown below, clicking the first node sends a message of the form current-time yyyy-MM-dd HH:mm:ss,Beijing,1 to the Kafka topic test; the second node's message ends with 2 and the third with 3.
Click each of them a few times while watching the program's output:
Clicking at random simulates the randomness of diagnoses. When disease 1 in Beijing exceeded 10 cases in the current hour (on the 11th message), the breakout log was printed, so the program behaves as expected.
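For reference, the messages the node-red flow injects look like the output of this sketch (a hypothetical stand-in for the node-red nodes, not code from the project):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class LogGen {
    // builds one test log line in the format the topology expects:
    // "yyyy-MM-dd HH:mm:ss,city,diseaseId"
    static String logLine(Date now, String city, int disease) {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(now)
                + "," + city + "," + disease;
    }

    public static void main(String[] args) {
        // e.g. "2017-07-20 10:23:45,Beijing,1"
        System.out.println(logLine(new Date(), "Beijing", 1));
    }
}
```

Each such line splits cleanly on "," into the three parts the MyEvent constructor consumes.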
The Trident part looks fine, so let's publish the breakout event to the Redis server instead of merely printing it.
Func3, publishing to a Redis channel:
package Func;

import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.operation.TridentOperationContext;
import org.apache.storm.trident.tuple.TridentTuple;
import redis.clients.jedis.Jedis;

import java.util.Map;

/**
 * Created by Frank on 2017/7/20.
 */
public class Func3 extends BaseFunction {
    private static final int BREAK_VALUE = 10;
    private Jedis jedis;

    @Override
    public void prepare(Map conf, TridentOperationContext context) {
        jedis = new Jedis("redisserverip");
        jedis.connect();
        jedis.auth("password");
    }

    public void execute(TridentTuple tuple, TridentCollector collector) {
        Long count = tuple.getLongByField("count");
        String descripe = tuple.getStringByField("descripe");
        if (count > BREAK_VALUE) {
            System.out.println(descripe + " breakout");
            jedis.publish("breakout", descripe + " breakout");
        }
    }
}
In node-red, subscribe to this breakout channel (Redis pub/sub calls these channels; some other systems call them topics, but it's the same idea). Click eleven times: breakout is printed and the message also appears on the Redis channel, as shown below.
The program is now complete.