Understanding Flink EventTime and WaterMark: a demo, with full project source code

First of all, thanks to the blog below; I borrowed its diagrams because I don't think better ones exist.

Blog link: https://blog.csdn.net/a6822342/article/details/78064815

 

Original English article: http://vishnuviswanath.com/flink_eventtime.html

Main text:

Scenario

1. We create a sliding window of size 10 seconds that slides every 5 seconds.

2. Suppose that at 2020-04-28 17:00:00 our simplest streaming job has already been running stably and processing events for some time.

3. The window start times follow Flink's standard alignment rule: window starts are whole multiples of the slide interval (5 seconds), aligned to the epoch rather than to the moment the job started. A small code sketch of this rule follows the scenario list.

Based on this rule, the sliding windows involved in this example cover the following time ranges (left-closed, right-open intervals):

……

[17:00:00, 17:00:10)

[17:00:05, 17:00:15)

[17:00:10, 17:00:20)

[17:00:15, 17:00:25)

……

4. At 17:00:13 the data source produces an event a; at 17:00:14 it produces another event a; and at 17:00:16 it produces a third event a. These three events are enough to illustrate everything that follows.
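
To make the windowing rule concrete, here is a small sketch in plain Scala (not the Flink API; the object and helper names are made up for illustration) of how a 10-second window sliding every 5 seconds assigns an event to windows:

object SlidingWindowRule {

  // Window starts are aligned to whole multiples of the slide interval, and an
  // event with timestamp t (here: milliseconds past 17:00:00) falls into every
  // window [start, start + size) that contains t.
  def windowsFor(t: Long, sizeMs: Long = 10000L, slideMs: Long = 5000L): Seq[(Long, Long)] = {
    val lastStart = t - (t % slideMs)            // latest window start that is <= t
    (lastStart until (t - sizeMs) by -slideMs)   // walk back one slide at a time
      .map(start => (start, start + sizeMs))     // each window is [start, start + size)
  }

  def main(args: Array[String]): Unit = {
    // The event produced at second 13 belongs to [17:00:10, 17:00:20) and [17:00:05, 17:00:15)
    println(windowsFor(13000L))  // Vector((10000,20000), (5000,15000))
  }
}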

The metric to compute

Now we want to count the occurrences of event a in each of these sliding windows.

 

The simplest Flink streaming job

That is, the job uses neither EventTime nor watermarks.

The ideal case

In the ideal case (no network delay), every event is counted in the windows that match its production time: window [17:00:05, 17:00:15) counts 2 (the events produced at :13 and :14), window [17:00:10, 17:00:20) counts 3, and window [17:00:15, 17:00:25) counts 1.

The real-world problem

As described above, the event a produced at 17:00:13 suffers a 6-second network delay and only reaches the streaming job at 17:00:19. The simplest job cannot tell that this is a late event; it counts every event by its arrival time. The result is therefore skewed: window [17:00:05, 17:00:15) counts only 1 (the event arriving at :14), window [17:00:10, 17:00:20) counts 3, and window [17:00:15, 17:00:25) counts 2, because the late event is counted at its arrival time of :19 instead of its production time of :13.

Demo screenshots and source code


import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.windowing.time.Time

object Simplest {

  def main(args: Array[String]): Unit = {

    val senv = StreamExecutionEnvironment.getExecutionEnvironment

    // The simplest streaming job. Each input line is "event name,second at which the event was produced":
    // a,14 (typed at second 14)
    // a,16 (typed at second 16)
    // a,13 (produced at second 13 but typed at second 19, simulating the 6-second network delay)
    val text = senv.socketTextStream("hadoop-01", 9021)
    val counts = text.map(m => (m.split(",")(0), 1))
      .keyBy(0)
      .timeWindow(Time.seconds(10), Time.seconds(5))
      .sum(1)

    counts.print
    senv.execute("EventTime processing example")
  }
}

 

 

The pom.xml is given at the end of this post.

 

Streaming with EventTime enabled

Enabling EventTime

Compared with the simplest job above, the only difference is that EventTime is enabled.

This requires two changes to the original code:

1. Implement a timestamp extractor that pulls the event's production time out of each record. The record format is "a,<second at which the event was produced>". The extractTimestamp method returns the event timestamp; the getCurrentWatermark method can be ignored for now.

2. Register the timestamp extractor on the input stream.

The real-world problem

With EventTime enabled, the late event is at least assigned to the windows that match its production time, so it no longer pollutes later windows: window [17:00:10, 17:00:20) and window [17:00:15, 17:00:25) are now counted correctly (3 and 1). However, window [17:00:05, 17:00:15) still counts 1 where the correct result is 2, because that window is triggered as soon as the clock passes 17:00:15, before the late event arrives at 17:00:19.

Demo screenshots and source code


import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.windowing.time.Time

object SimplestWithEventTime {

  def main(args: Array[String]): Unit = {

    val senv = StreamExecutionEnvironment.getExecutionEnvironment

    // The same job with EventTime enabled.
    // Handling late messages by their event time keeps a late message that belongs
    // to an earlier window from being counted into later windows.
    senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    val input = senv.socketTextStream("hadoop-01", 9021)

    val text = input.assignTimestampsAndWatermarks(new TimestampExtractor)

    val counts = text.map(m => (m.split(",")(0), 1))
      .keyBy(0)
      .timeWindow(Time.seconds(10), Time.seconds(5))
      .sum(1)

    counts.print
    senv.execute("EventTime processing example")
  }
}
import java.text.SimpleDateFormat
import java.util.Date

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

class TimestampExtractor extends AssignerWithPeriodicWatermarks[String] with Serializable {

  override def extractTimestamp(e: String, prevElementTimestamp: Long): Long = {

    // Current system time truncated to the whole minute, as "yyyy-MM-dd HH:mm:00"
    val baseTimeStringType = new SimpleDateFormat("yyyy-MM-dd HH:mm").format(new Date) + ":00"
    val baseTimeDateType = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(baseTimeStringType)

    // Reconstruct the event's production time: start of the current minute plus
    // the second carried in the record ("a,13" -> 13 seconds), in milliseconds
    val baseTimestamp = baseTimeDateType.getTime
    val offsetMillis = 1000 * e.split(",")(1).toLong
    val newEventTimestamp = baseTimestamp + offsetMillis

    println(s"current whole minute: $baseTimeStringType, event offset in ms: $offsetMillis, " +
      s"event timestamp: $newEventTimestamp, current system millis: " + System.currentTimeMillis)

    newEventTimestamp
  }

  // A window is triggered once the watermark reaches or passes the window's end time.
  // The watermark here is based on the system clock in milliseconds, so the event
  // timestamps above must be millisecond timestamps on the same base.
  // In this EventTime-only version the watermark simply tracks the current system time.
  override def getCurrentWatermark: Watermark = {
    new Watermark(System.currentTimeMillis)
  }
}
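
As a quick check of the extractor's arithmetic, here is a small standalone sketch (the object name and the concrete timestamps are hypothetical, mirroring the scenario above): a line "a,13" typed at 17:00:19 is still assigned the event time 17:00:13.

import java.text.SimpleDateFormat

object ExtractorArithmeticCheck {
  def main(args: Array[String]): Unit = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    // The minute in which the line arrives (the extractor derives this from the system clock)
    val baseTimestamp = fmt.parse("2020-04-28 17:00:00").getTime
    // The second carried in the record, converted to milliseconds
    val offsetMillis = 1000L * "a,13".split(",")(1).toLong
    // Prints 2020-04-28 17:00:13 -- the production time, not the arrival time of 17:00:19
    println(fmt.format(new java.util.Date(baseTimestamp + offsetMillis)))
  }
}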


Streaming with EventTime and watermarks

Adding watermarks

Compared with the EventTime-only version above, we now also add a watermark delay.

Only one change to the code above is needed:

1. Modify the getCurrentWatermark method of the timestamp extractor so that the watermark is the current system time minus 5 seconds (see the sketch in the source-code section below).

Results

With the watermark lagging 5 seconds behind the system clock, window [17:00:05, 17:00:15) is not triggered until the clock reaches 17:00:20. By then the event that arrived late at 17:00:19 has been received, so all three windows now produce the correct counts: 2, 3 and 1.

Demo screenshots and source code
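
The job and the extractor are otherwise identical to the previous section; only getCurrentWatermark changes. Here is a minimal sketch of that change (the original simply edits the method inside TimestampExtractor; it is shown below as a subclass with a made-up name so the two versions can be compared side by side):

import org.apache.flink.streaming.api.watermark.Watermark

class TimestampExtractorWithLateness extends TimestampExtractor {

  // The watermark now lags the system clock by 5 seconds: a window [start, end)
  // is triggered only when the system time reaches end + 5s, so an event that is
  // up to 5 seconds late can still be counted in its correct window.
  override def getCurrentWatermark: Watermark = {
    new Watermark(System.currentTimeMillis - 5000)
  }
}

The job then registers this assigner instead: input.assignTimestampsAndWatermarks(new TimestampExtractorWithLateness).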


pom.xml

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>my-flink-project</groupId>
    <artifactId>my-flink-project</artifactId>
    <version>0.1</version>
    <packaging>jar</packaging>

    <name>Flink Quickstart Job</name>
    <url>http://www.myorganization.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.9.1</flink.version>
        <java.version>1.8</java.version>
        <scala.binary.version>2.11</scala.binary.version>
        <maven.compiler.source>${java.version}</maven.compiler.source>
        <maven.compiler.target>${java.version}</maven.compiler.target>
    </properties>

    <repositories>
        <repository>
            <id>apache.snapshots</id>
            <name>Apache Development Snapshot Repository</name>
            <url>https://repository.apache.org/content/repositories/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

    <dependencies>
        <!-- Apache Flink dependencies -->
        <!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <!-- Add connector dependencies here. They must be in the default scope (compile). -->

        <!-- Example:

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        -->

        <!-- Add logging framework, to produce console output when running in the IDE. -->
        <!-- These dependencies are excluded from the application JAR by default. -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.7</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
            <scope>runtime</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>

            <!-- Java Compiler -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>

            <!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
            <!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.0.0</version>
                <executions>
                    <!-- Run shade goal on package phase -->
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <excludes>
                                    <exclude>org.apache.flink:force-shading</exclude>
                                    <exclude>com.google.code.findbugs:jsr305</exclude>
                                    <exclude>org.slf4j:*</exclude>
                                    <exclude>log4j:*</exclude>
                                </excludes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <!-- Do not copy the signatures in the META-INF folder.
                                    Otherwise, this might cause SecurityExceptions when using the JAR. -->
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>myflink.StreamingJob</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>

        <pluginManagement>
            <plugins>

                <!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
                <plugin>
                    <groupId>org.eclipse.m2e</groupId>
                    <artifactId>lifecycle-mapping</artifactId>
                    <version>1.0.0</version>
                    <configuration>
                        <lifecycleMappingMetadata>
                            <pluginExecutions>
                                <pluginExecution>
                                    <pluginExecutionFilter>
                                        <groupId>org.apache.maven.plugins</groupId>
                                        <artifactId>maven-shade-plugin</artifactId>
                                        <versionRange>[3.0.0,)</versionRange>
                                        <goals>
                                            <goal>shade</goal>
                                        </goals>
                                    </pluginExecutionFilter>
                                    <action>
                                        <ignore/>
                                    </action>
                                </pluginExecution>
                                <pluginExecution>
                                    <pluginExecutionFilter>
                                        <groupId>org.apache.maven.plugins</groupId>
                                        <artifactId>maven-compiler-plugin</artifactId>
                                        <versionRange>[3.1,)</versionRange>
                                        <goals>
                                            <goal>testCompile</goal>
                                            <goal>compile</goal>
                                        </goals>
                                    </pluginExecutionFilter>
                                    <action>
                                        <ignore/>
                                    </action>
                                </pluginExecution>
                            </pluginExecutions>
                        </lifecycleMappingMetadata>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>

    <!-- This profile helps to make things run out of the box in IntelliJ -->
    <!-- It adds Flink's core classes to the runtime class path. -->
    <!-- Otherwise they are missing in IntelliJ, because the dependency is 'provided' -->
    <profiles>
        <profile>
            <id>add-dependencies-for-IDEA</id>

            <activation>
                <property>
                    <name>idea.version</name>
                </property>
            </activation>

            <dependencies>
                <dependency>
                    <groupId>org.apache.flink</groupId>
                    <artifactId>flink-java</artifactId>
                    <version>${flink.version}</version>
                    <scope>compile</scope>
                </dependency>
                <dependency>
                    <groupId>org.apache.flink</groupId>
                    <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
                    <version>${flink.version}</version>
                    <scope>compile</scope>
                </dependency>
            </dependencies>
        </profile>
    </profiles>

</project>

 
