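1. Project Overview
With the spread of smartphones, more and more users of e-commerce sites now come from mobile devices. Compared with signing in through a traditional browser, the mobile app has become the first choice for many users. E-commerce companies typically promote their apps through a variety of channels, and the statistics from those channels (for example, click counts on ad links placed on different websites, or app download counts) become important business metrics for marketing.
2. Code
2.1 pom file configuration
The pom file is as follows: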
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>1.10.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.11</artifactId>
<version>1.10.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_2.11</artifactId>
<version>1.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-core</artifactId>
<version>1.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.11</artifactId>
<version>1.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-redis_2.11</artifactId>
<version>1.1.5</version>
</dependency>
<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.19</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-statebackend-rocksdb_2.11</artifactId>
<version>1.10.1</version>
</dependency>
<!-- Table API and Flink SQL -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner-blink_2.11</artifactId>
<version>1.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_2.11</artifactId>
<version>1.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-java-bridge_2.11</artifactId>
<version>1.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-common</artifactId>
<version>1.10.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-csv</artifactId>
<version>1.10.1</version>
</dependency>
2.2 POJO classes
AdClickEvent
private Long userId;
private Long adId;
private String province;
private String city;
private Long timestamp;
AdCountViewByProvince
private String province;
private String windowEnd;
private Long count;
BlackListUserWarning
private Long userId;
private Long adId;
private String warningMsg;
ChannelPromotionCount
private String channel;
private String behavior;
private String windowEnd;
private Long count;
MarketingUserBehavior
private Long userId;
private String behavior;
private String channel;
private Long timestamp;
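The listings above show only the fields. Because the jobs below key streams by field name (for example `keyBy("channel", "behavior")`), Flink has to recognize these classes as POJOs, which means a public class, a public no-argument constructor, and fields that are public or reachable through getters and setters. The original post does not show the class bodies, so the following is only a minimal sketch of one bean under that assumption; the wrapper class `PojoSketch` and the `toString` format are illustrative.

```java
// Illustrative wrapper so the sketch compiles as a single file; in the project
// each bean would be its own top-level public class in the beans package.
class PojoSketch {

    public static class MarketingUserBehavior {
        private Long userId;
        private String behavior;
        private String channel;
        private Long timestamp;

        // Flink's POJO rules require a public no-argument constructor.
        public MarketingUserBehavior() {
        }

        public MarketingUserBehavior(Long userId, String behavior, String channel, Long timestamp) {
            this.userId = userId;
            this.behavior = behavior;
            this.channel = channel;
            this.timestamp = timestamp;
        }

        // Getters and setters make the private fields visible to Flink's type extraction.
        public Long getUserId() { return userId; }
        public void setUserId(Long userId) { this.userId = userId; }

        public String getBehavior() { return behavior; }
        public void setBehavior(String behavior) { this.behavior = behavior; }

        public String getChannel() { return channel; }
        public void setChannel(String channel) { this.channel = channel; }

        public Long getTimestamp() { return timestamp; }
        public void setTimestamp(Long timestamp) { this.timestamp = timestamp; }

        @Override
        public String toString() {
            return "MarketingUserBehavior{userId=" + userId
                    + ", behavior='" + behavior + "'"
                    + ", channel='" + channel + "'"
                    + ", timestamp=" + timestamp + "}";
        }
    }

    public static void main(String[] args) {
        System.out.println(new MarketingUserBehavior(1L, "CLICK", "wechat", 1634600000000L));
    }
}
```

The other four beans would follow the same pattern with their own fields.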
2.3 Custom test data source
Define a POJO class MarketingUserBehavior for the source data, then define a SourceFunction named SimulatedMarketingUserBehaviorSource to generate user-behavior source data:
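// Custom simulated source of marketing user-behavior data
public static class SimulatedMarketingUserBehaviorSource implements SourceFunction<MarketingUserBehavior>{
// Flag that controls whether the source keeps running; volatile because cancel() is called from another thread
private volatile boolean running = true;
// Value ranges for user behavior and channel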
List<String> behaviorList = Arrays.asList("CLICK", "DOWNLOAD", "INSTALL", "UNINSTALL");
List<String> channelList = Arrays.asList("app store", "wechat", "weibo");
Random random = new Random();
@Override
public void run(SourceContext<MarketingUserBehavior> ctx) throws Exception {
while(running){
// Generate all fields at random
Long id = random.nextLong();
String behavior = behaviorList.get( random.nextInt(behaviorList.size()) );
String channel = channelList.get( random.nextInt(channelList.size()));
Long timestamp = System.currentTimeMillis();
// Emit the record
ctx.collect(new MarketingUserBehavior(id ,behavior, channel, timestamp));
Thread.sleep(100L);
}
}
@Override
public void cancel() {
running = false;
}
}
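2.4 Statistics by channel
Also define a POJO class ChannelPromotionCount for the window output, and process the stream with a custom pre-aggregation function (AggregateFunction) and a custom full-window function (ProcessWindowFunction).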
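To make the pre-aggregation idea concrete before the Flink job, here is a small plain-Java sketch of the four-method contract. `SimpleAggregate` is a hypothetical stand-in that only mirrors the method names of Flink's `AggregateFunction<IN, ACC, OUT>`; it is not the Flink interface, and `PreAggregationSketch` and `accumulate` are illustrative names.

```java
import java.util.Arrays;
import java.util.List;

class PreAggregationSketch {

    // Hypothetical stand-in mirroring the four methods of Flink's AggregateFunction.
    interface SimpleAggregate<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC accumulator);
        OUT getResult(ACC accumulator);
        ACC merge(ACC a, ACC b);
    }

    // Counts events: the accumulator is a single Long, whatever the number of events.
    static class CountAgg implements SimpleAggregate<String, Long, Long> {
        @Override
        public Long createAccumulator() { return 0L; }

        @Override
        public Long add(String value, Long accumulator) { return accumulator + 1; }

        @Override
        public Long getResult(Long accumulator) { return accumulator; }

        @Override
        public Long merge(Long a, Long b) { return a + b; }
    }

    // Feeds events one at a time; only the accumulator is retained, never the events.
    static <IN, ACC, OUT> ACC accumulate(SimpleAggregate<IN, ACC, OUT> agg, List<IN> events) {
        ACC acc = agg.createAccumulator();
        for (IN event : events) {
            acc = agg.add(event, acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        CountAgg agg = new CountAgg();
        Long first = accumulate(agg, Arrays.asList("CLICK", "DOWNLOAD", "CLICK"));
        Long second = accumulate(agg, Arrays.asList("INSTALL"));
        // Two partial accumulators combine into one result.
        System.out.println(agg.getResult(agg.merge(first, second))); // prints 4
    }
}
```

In the job that follows, the full-window function then receives only this single pre-aggregated value per key and window, and attaches the key fields and the window end time to it.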
Code:
package com.zqs.flink.project.market_analysis;
/**
* @author 只是甲
* @date 2021-10-19
* @remark App Marketing By Channel
*/
import com.zqs.flink.project.market_analysis.beans.ChannelPromotionCount;
import com.zqs.flink.project.market_analysis.beans.MarketingUserBehavior;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
public class AppMarketingByChannel {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// 1. Read data from the custom source
DataStream<MarketingUserBehavior> dataStream = env.addSource(new SimulatedMarketingUserBehaviorSource() )
.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<MarketingUserBehavior>() {
@Override
public long extractAscendingTimestamp(MarketingUserBehavior element) {
return element.getTimestamp();
}
});
// 2. Window and count per channel
SingleOutputStreamOperator<ChannelPromotionCount> resultStream = dataStream
.filter(data -> !"UNINSTALL".equals(data.getBehavior()))
.keyBy("channel", "behavior")
.timeWindow(Time.hours(1), Time.seconds(5)) // sliding window: 1-hour size, 5-second slide
.aggregate(new MarketingCountAgg(), new MarketingCountResult());
resultStream.print();
env.execute("app marketing by channel job");
}
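// Custom simulated source of marketing user-behavior data
public static class SimulatedMarketingUserBehaviorSource implements SourceFunction<MarketingUserBehavior>{
// Flag that controls whether the source keeps running; volatile because cancel() is called from another thread
private volatile boolean running = true;
// Value ranges for user behavior and channel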
List<String> behaviorList = Arrays.asList("CLICK", "DOWNLOAD", "INSTALL", "UNINSTALL");
List<String> channelList = Arrays.asList("app store", "wechat", "weibo");
Random random = new Random();
@Override
public void run(SourceContext<MarketingUserBehavior> ctx) throws Exception {
while(running){
// Generate all fields at random
Long id = random.nextLong();
String behavior = behaviorList.get( random.nextInt(behaviorList.size()) );
String channel = channelList.get( random.nextInt(channelList.size()));
Long timestamp = System.currentTimeMillis();
// Emit the record
ctx.collect(new MarketingUserBehavior(id ,behavior, channel, timestamp));
Thread.sleep(100L);
}
}
@Override
public void cancel() {
running = false;
}
}
// Custom incremental aggregation function
public static class MarketingCountAgg implements AggregateFunction<MarketingUserBehavior, Long, Long>{
@Override
public Long createAccumulator() {
return 0L;
}
@Override
public Long add(MarketingUserBehavior value, Long accumulator) {
return accumulator + 1;
}
@Override
public Long getResult(Long accumulator) {
return accumulator;
}
@Override
public Long merge(Long a, Long b) {
return a + b;
}
}
// Custom full-window function
public static class MarketingCountResult extends ProcessWindowFunction<Long, ChannelPromotionCount, Tuple, TimeWindow>{
@Override
public void process(Tuple tuple, Context context, Iterable<Long> elements, Collector<ChannelPromotionCount> out) throws Exception {
String channel = tuple.getField(0);
String behavior = tuple.getField(1);
String windowEnd = new Timestamp(context.window().getEnd()).toString();
Long count = elements.iterator().next();
out.collect(new ChannelPromotionCount(channel, behavior, windowEnd, count));
}
}
}
Test record:
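2.5 Statistics without channel (total)
We can likewise look at promotion statistics without splitting by channel, which gives the total across all channels. Create the class AppMarketingStatistics.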
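Both jobs in this post use `timeWindow(Time.hours(1), Time.seconds(5))`, a one-hour window that slides every five seconds. As a rough illustration of what that implies, the sketch below lists the windows a single event falls into. It follows the usual start-alignment arithmetic for sliding windows, assuming a non-negative timestamp and a zero window offset; `SlidingWindowSketch` and `windowStarts` are illustrative names, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;

class SlidingWindowSketch {

    // Returns the start timestamps (ms) of every sliding window that contains the event,
    // latest window first. Assumes timestamp >= 0 and a window offset of zero.
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Start of the most recent window that begins at or before the event.
        long lastStart = timestamp - (timestamp % slide);
        // Walk backwards one slide at a time while the window [start, start + size) still covers the event.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long size = 60 * 60 * 1000L; // 1 hour
        long slide = 5 * 1000L;      // 5 seconds
        List<Long> starts = windowStarts(12_000L, size, slide);
        System.out.println(starts.size()); // prints 720
        System.out.println(starts.get(0)); // prints 10000
    }
}
```

Each event is therefore counted in 720 overlapping windows, which is one reason to keep a single pre-aggregated counter per window instead of buffering the raw events.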
The code is as follows:
package com.zqs.flink.project.market_analysis;
import com.zqs.flink.project.market_analysis.beans.ChannelPromotionCount;
import com.zqs.flink.project.market_analysis.beans.MarketingUserBehavior;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.sql.Timestamp;
/**
* @author 只是甲
* @Date 2021-10-19
* @remark App Marketing Statistics
*/
public class AppMarketingStatistics {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// 1. Read data from the custom source
DataStream<MarketingUserBehavior> dataStream = env.addSource(new AppMarketingByChannel.SimulatedMarketingUserBehaviorSource())
.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<MarketingUserBehavior>() {
@Override
public long extractAscendingTimestamp(MarketingUserBehavior element) {
return element.getTimestamp();
}
});
// 2. Window and count the total
SingleOutputStreamOperator<ChannelPromotionCount> resultStream = dataStream
.filter(data -> !"UNINSTALL".equals(data.getBehavior()))
.map(new MapFunction<MarketingUserBehavior, Tuple2<String, Long>>() {
@Override
public Tuple2<String, Long> map(MarketingUserBehavior value) throws Exception {
return new Tuple2<>("total", 1L);
}
})
.keyBy(0)
.timeWindow(Time.hours(1), Time.seconds(5))
.aggregate( new MarketingStatisticsAgg(), new MarketingStatisticsResult() );
resultStream.print();
env.execute("app marketing statistics job");
}
public static class MarketingStatisticsAgg implements AggregateFunction<Tuple2<String, Long>, Long, Long>{
@Override
public Long createAccumulator() {
return 0L;
}
@Override
public Long add(Tuple2<String, Long> value, Long accumulator) {
return accumulator + 1;
}
@Override
public Long getResult(Long accumulator) {
return accumulator;
}
@Override
public Long merge(Long a, Long b) {
return a + b;
}
}
public static class MarketingStatisticsResult implements WindowFunction<Long, ChannelPromotionCount, Tuple, TimeWindow>{
@Override
public void apply(Tuple tuple, TimeWindow window, Iterable<Long> input, Collector<ChannelPromotionCount> out) throws Exception {
String windowEnd = new Timestamp( window.getEnd() ).toString();
Long count = input.iterator().next();
out.collect(new ChannelPromotionCount("total", "total", windowEnd, count));
}
}
}
Test record:
2.6 Blacklist filtering
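The job in this section counts clicks per (user, ad) pair and, once a pair reaches a limit, stops forwarding its clicks and reports it once. The following is a minimal sketch of that rule in plain Java, with a `HashMap` and a `HashSet` standing in for Flink keyed state and a list standing in for the side output. The class `BlackListSketch` and its members are illustrative and not part of the project.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BlackListSketch {
    private final int countUpperBound;
    private final Map<String, Long> counts = new HashMap<>(); // stand-in for the per-key click count state
    private final Set<String> warned = new HashSet<>();       // stand-in for the per-key "already reported" flag
    final List<String> warnings = new ArrayList<>();          // stand-in for the side output

    BlackListSketch(int countUpperBound) {
        this.countUpperBound = countUpperBound;
    }

    // Returns true if the click is forwarded to the main stream, false if it is filtered out.
    boolean process(long userId, long adId) {
        String key = userId + "_" + adId;
        long current = counts.getOrDefault(key, 0L);
        if (current >= countUpperBound) {
            // Report each (user, ad) pair only once; add() returns false if already present.
            if (warned.add(key)) {
                warnings.add("user " + userId + " ad " + adId + " click over " + countUpperBound + " times.");
            }
            return false;
        }
        counts.put(key, current + 1);
        return true;
    }

    // Stand-in for the daily reset. This sketch clears every key at once,
    // whereas a keyed timer in Flink clears only the state of the key it fired for.
    void resetAll() {
        counts.clear();
        warned.clear();
    }

    public static void main(String[] args) {
        BlackListSketch filter = new BlackListSketch(3);
        int passed = 0;
        for (int i = 0; i < 5; i++) {
            if (filter.process(1L, 100L)) {
                passed++;
            }
        }
        System.out.println(passed);                 // prints 3
        System.out.println(filter.warnings.size()); // prints 1
    }
}
```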
Code:
package com.zqs.flink.project.market_analysis;
/**
* @author 只是甲
* @date 2021-10-19
* @remark Ad Statistics By Province
*/
import com.zqs.flink.project.market_analysis.beans.AdClickEvent;
import com.zqs.flink.project.market_analysis.beans.AdCountViewByProvince;
import com.zqs.flink.project.market_analysis.beans.BlackListUserWarning;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;
import java.net.URL;
import java.sql.Timestamp;
public class AdStatisticsByProvince {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.setParallelism(1);
// 1. Read data from a file
URL resource = AdStatisticsByProvince.class.getResource("/AdClickLog.csv");
DataStream<AdClickEvent> adClickEventStream = env.readTextFile(resource.getPath())
.map( line -> {
String[] fields = line.split(",");
return new AdClickEvent(Long.valueOf(fields[0]), Long.valueOf(fields[1]), fields[2], fields[3], Long.valueOf(fields[4]));
})
.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<AdClickEvent>() {
@Override
public long extractAscendingTimestamp(AdClickEvent element) {
return element.getTimestamp() * 1000L;
}
});
// 2. Detect and warn when the same user clicks the same ad too often
SingleOutputStreamOperator<AdClickEvent> filterAdClickStream = adClickEventStream
.keyBy("userId", "adId") // group by user id and ad id
.process(new FilterBlackListUser(100));
// 3. Group by province, then window and aggregate
SingleOutputStreamOperator<AdCountViewByProvince> adCountResultStream = filterAdClickStream
.keyBy(AdClickEvent::getProvince)
.timeWindow(Time.hours(1), Time.minutes(5))
.aggregate(new AdCountAgg(), new AdCountResult());
adCountResultStream.print();
filterAdClickStream.getSideOutput(new OutputTag<BlackListUserWarning>("blacklist"){}).print("blacklist-user");
env.execute("ad count by province job");
}
public static class AdCountAgg implements AggregateFunction<AdClickEvent, Long, Long>{
@Override
public Long createAccumulator() {
return 0L;
}
@Override
public Long add(AdClickEvent value, Long accumulator) {
return accumulator + 1;
}
@Override
public Long getResult(Long accumulator) {
return accumulator;
}
@Override
public Long merge(Long a, Long b) {
return a + b;
}
}
public static class AdCountResult implements WindowFunction<Long, AdCountViewByProvince, String, TimeWindow>{
@Override
public void apply(String province, TimeWindow window, Iterable<Long> input, Collector<AdCountViewByProvince> out) throws Exception {
String windowEnd = new Timestamp( window.getEnd()).toString();
Long count = input.iterator().next();
out.collect( new AdCountViewByProvince(province, windowEnd, count));
}
}
// Custom process function
public static class FilterBlackListUser extends KeyedProcessFunction<Tuple, AdClickEvent, AdClickEvent>{
// Upper bound on the click count
private Integer countUpperBound;
public FilterBlackListUser(Integer countUpperBound){
this.countUpperBound = countUpperBound;
}
// State: number of clicks by the current user on a given ad
ValueState<Long> countState;
// Flag state: whether the current user has already been sent to the blacklist
ValueState<Boolean> isSentState;
@Override
public void open(Configuration parameters) throws Exception {
countState = getRuntimeContext().getState(new ValueStateDescriptor<Long>("ad-count", Long.class, 0L));
isSentState = getRuntimeContext().getState(new ValueStateDescriptor<Boolean>("is-sent", Boolean.class, false));
}
@Override
public void processElement(AdClickEvent value, Context ctx, Collector<AdClickEvent> out) throws Exception {
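// If the user's click count for this ad is below the limit, increment it and emit the event;
// once the limit is reached, drop the event and emit a blacklist warning to the side output.
// First, read the current count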
Long curCount = countState.value();
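// 1. On the first event, register a timer for the next midnight (UTC+8)
if (curCount == 0){
long dayMs = 24 * 60 * 60 * 1000L;
long offsetMs = 8 * 60 * 60 * 1000L; // UTC+8
// Shift into local time before truncating to a day, so the timer always lands in the future
long ts = ((ctx.timerService().currentProcessingTime() + offsetMs) / dayMs + 1) * dayMs - offsetMs;
ctx.timerService().registerProcessingTimeTimer(ts);
}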
// 2. Check whether to raise a warning
if (curCount >= countUpperBound){
// Emit to the side output only if this user has not been reported to the blacklist yet
if (!isSentState.value()){
isSentState.update(true); // update the flag
ctx.output( new OutputTag<BlackListUserWarning>("blacklist"){},
new BlackListUserWarning(value.getUserId(), value.getAdId(), "click over " + countUpperBound + " times." ));
}
return; // skip the steps below
}
// Otherwise increment the click count, update the state, and emit the event to the main stream
countState.update(curCount + 1);
out.collect(value);
}
@Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector<AdClickEvent> out) throws Exception {
// Clear all state
countState.clear();
isSentState.clear();
}
}
}
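The filter above resets its per-key state with a timer that should fire at the next midnight in UTC+8. That arithmetic is easy to get subtly wrong near day boundaries, so here is a small standalone check against `java.time`. `MidnightTimerSketch` and its method names are illustrative, and the fixed +8 offset is an assumption taken from the 8-hour term in the job.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class MidnightTimerSketch {
    static final long DAY_MS = 24 * 60 * 60 * 1000L;
    static final long OFFSET_MS = 8 * 60 * 60 * 1000L; // assumption: local time is UTC+8

    // Epoch-millisecond arithmetic: shift into local time, truncate to the local day,
    // advance one day, then shift back to UTC. Assumes nowMs >= 0.
    static long nextLocalMidnight(long nowMs) {
        return ((nowMs + OFFSET_MS) / DAY_MS + 1) * DAY_MS - OFFSET_MS;
    }

    // The same instant computed with java.time, used here as a reference.
    static long nextLocalMidnightJavaTime(long nowMs) {
        ZoneOffset zone = ZoneOffset.ofHours(8);
        LocalDate today = Instant.ofEpochMilli(nowMs).atOffset(zone).toLocalDate();
        return today.plusDays(1).atStartOfDay().toInstant(zone).toEpochMilli();
    }

    public static void main(String[] args) {
        // 20:00 UTC on 19 Oct is already 04:00 on 20 Oct in UTC+8.
        long now = Instant.parse("2021-10-19T20:00:00Z").toEpochMilli();
        System.out.println(Instant.ofEpochMilli(nextLocalMidnight(now))); // prints 2021-10-20T16:00:00Z
        System.out.println(nextLocalMidnight(now) == nextLocalMidnightJavaTime(now)); // prints true
    }
}
```

Shifting by the offset before dividing is what keeps the result in the future during the first eight local hours of a day; truncating the UTC timestamp first and subtracting the offset afterwards would yield a midnight that has already passed in that interval.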
Test record:
References:
- https://www.bilibili.com/video/BV1qy4y1q728
- https://ashiamd.github.io/docsify-notes/#/study/BigData/Flink/%E5%B0%9A%E7%A1%85%E8%B0%B7Flink%E5%85%A5%E9%97%A8%E5%88%B0%E5%AE%9E%E6%88%98-%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0?id=_1432-%e5%ae%9e%e6%97%b6%e6%b5%81%e9%87%8f%e7%bb%9f%e8%ae%a1%e7%83%ad%e9%97%a8%e9%a1%b5%e9%9d%a2