Flink Source Code Analysis: The Three Modes of Flink Async I/O

1. Dimension table join

Stream processing systems often need to interact with external systems, for example querying an external database to enrich events with extra user information. The usual implementation sends a query for user a to the database and then waits for the result; until it returns, the query for user b cannot be sent. This is the synchronous access pattern, shown on the left of the figure below.

[Figure: synchronous access (left) vs. asynchronous access (right)]

The brown bars in the figure represent waiting time: the network wait dominates and severely limits both throughput and latency. To fix this, the asynchronous mode handles multiple requests and responses concurrently. You can keep sending queries for users a, b, c, and so on, and process whichever response comes back first, so consecutive requests no longer block on each other, as shown on the right of the figure. This is exactly how Async I/O works.
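As a rough illustration of the difference (a minimal sketch, not Flink code; queryUserInfo is a hypothetical stand-in for the database lookup), the two access patterns can be compared with plain CompletableFuture:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AccessPatternSketch {

    // Synchronous access: each lookup blocks until the previous one has returned.
    static void syncLookups(List<String> userIds) {
        for (String id : userIds) {
            String info = queryUserInfo(id);              // blocks on the network round trip
            System.out.println(id + " -> " + info);
        }
    }

    // Asynchronous access: all lookups are issued right away; each result is handled
    // as soon as it arrives, so the network round trips overlap.
    static void asyncLookups(List<String> userIds) {
        List<CompletableFuture<Void>> pending = userIds.stream()
                .map(id -> CompletableFuture
                        .supplyAsync(() -> queryUserInfo(id))
                        .thenAccept(info -> System.out.println(id + " -> " + info)))
                .collect(Collectors.toList());
        pending.forEach(CompletableFuture::join);         // wait for all responses
    }

    // Hypothetical external lookup standing in for the database query.
    private static String queryUserInfo(String userId) {
        return "info-of-" + userId;
    }

    public static void main(String[] args) {
        syncLookups(List.of("user_a", "user_b", "user_c"));
        asyncLookups(List.of("user_a", "user_b", "user_c"));
    }
}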

2. RichMapFunction

Using RichMapFunction for the dimension-table join is the typical synchronous I/O approach: each lookup blocks until the previous one has finished, which makes it unsuitable for high-concurrency workloads.

2.1 Example

  public static final class MapWithSiteInfoFunc
    extends RichMapFunction<String, String> {
    private static final Logger LOGGER = LoggerFactory.getLogger(MapWithSiteInfoFunc.class);
    private static final long serialVersionUID = 1L;
    private transient ScheduledExecutorService dbScheduler;
    // Cache the dimension data to reduce the number of database requests
    private Map<Integer, SiteAndCityInfo> siteInfoCache;

    @Override
    public void open(Configuration parameters) throws Exception {
      super.open(parameters);
      siteInfoCache = new HashMap<>(1024);
      // Use a scheduled thread to refresh the dimension data periodically
      dbScheduler = new ScheduledThreadPoolExecutor(1, r -> {
        Thread thread = new Thread(r, "site-info-update-thread");
        thread.setUncaughtExceptionHandler((t, e) -> {
          LOGGER.error("Thread " + t + " got uncaught exception: " + e);
        });
        return thread;
      });

      dbScheduler.scheduleWithFixedDelay(() -> {
        try {
          QueryRunner queryRunner = new QueryRunner(JdbcUtil.getDataSource());
          List<Map<String, Object>> info = queryRunner.query(SITE_INFO_QUERY_SQL, new MapListHandler());

          for (Map<String, Object> item : info) {
            siteInfoCache.put((int) item.get("site_id"), new SiteAndCityInfo(
              (int) item.get("site_id"),
              (String) item.getOrDefault("site_name", ""),
              (long) item.get("city_id"),
              (String) item.getOrDefault("city_name", "")
            ));
          }

          LOGGER.info("Fetched {} site info records, {} records in cache", info.size(), siteInfoCache.size());
        } catch (Exception e) {
          LOGGER.error("Exception occurred when querying: " + e);
        }
      }, 0, 10 * 60, TimeUnit.SECONDS);
    }

    @Override
    public String map(String value) throws Exception {
      JSONObject json = JSON.parseObject(value);
      int siteId = json.getInteger("site_id");
     
      String siteName = "", cityName = "";
      SiteAndCityInfo info = siteInfoCache.getOrDefault(siteId, null);
      if (info != null) {
        siteName = info.getSiteName();
        cityName = info.getCityName();
      }

      json.put("site_name", siteName);
      json.put("city_name", cityName);
      return json.toJSONString();
    }

    @Override
    public void close() throws Exception {
      // Clear the cache and close the connection
      siteInfoCache.clear();
      ExecutorUtils.gracefulShutdown(10, TimeUnit.SECONDS, dbScheduler);
      JdbcUtil.close();

      super.close();
    }

    private static final String SITE_INFO_QUERY_SQL = "...";
  }

3. Async I/O

Flink 1.2 introduced Async I/O to speed up Flink's interaction with external systems and raise throughput. Its core idea is to rework the flow in which each processed record is forwarded to the downstream operator. The implementation has a producer side and a consumer side: the producer side introduces an AsyncWaitOperator whose processElement/processWatermark methods kick off the dimension-table lookup for each record and put the still-pending Future objects into a queue; the consumer side introduces an Emitter thread that keeps consuming completed entries from the queue and sends them to the downstream operator.

3.1 Example

In short, using Async I/O with the Flink API means extending the abstract class RichAsyncFunction and implementing its open (initialization), asyncInvoke (asynchronous call per record), and close (cleanup) methods; asyncInvoke is the important one. In the following example, Kafka serves as the stream table storing user browsing events, Elasticsearch serves as the dimension table storing user age information, and Async I/O is used to enrich the browsing events.

Stream table: the user behavior log, i.e. a user clicked or browsed a product at a certain point in time. Hand-crafted test data, sample record:

{"userID": "user_1", "eventTime": "2016-06-06 07:03:42", "eventType": "browse", "productID": 2}

Dimension table: basic user information. Hand-crafted test data stored in Elasticsearch, sample document:

GET dim_user/dim_user/user_1

{
  "_index": "dim_user",
  "_type": "dim_user",
  "_id": "user_1",
  "_version": 1,
  "found": true,
  "_source": {
    "age": 22
  }
}

Implementation:

public class FlinkAsyncIO {
    public static void main(String[] args) throws Exception{

        String kafkaBootstrapServers = "localhost:9092";
        String kafkaGroupID = "async-test";
        String kafkaAutoOffsetReset= "latest";
        String kafkaTopic = "asyncio";
        int kafkaParallelism =2;

        String esHost= "localhost";
        Integer esPort= 9200;
        String esUser = "";
        String esPassword = "";
        String esIndex = "dim_user";
        String esType = "dim_user";

        /** Flink DataStream execution environment */
        Configuration config = new Configuration();
        config.setInteger(RestOptions.PORT,8081);
        config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config);

        /** Add the data source */
        Properties kafkaProperties = new Properties();
        kafkaProperties.put("bootstrap.servers",kafkaBootstrapServers);
        kafkaProperties.put("group.id",kafkaGroupID);
        kafkaProperties.put("auto.offset.reset",kafkaAutoOffsetReset);
        FlinkKafkaConsumer010<String> kafkaConsumer = new FlinkKafkaConsumer010<>(kafkaTopic, new SimpleStringSchema(), kafkaProperties);
        kafkaConsumer.setCommitOffsetsOnCheckpoints(true);
        SingleOutputStreamOperator<String> source = env.addSource(kafkaConsumer).name("KafkaSource").setParallelism(kafkaParallelism);

        // Data transformation
        SingleOutputStreamOperator<Tuple4<String, String, String, Integer>> sourceMap = source.map((MapFunction<String, Tuple4<String, String, String, Integer>>) value -> {
            Tuple4<String, String, String, Integer> output = new Tuple4<>();
            try {
                JSONObject obj = JSON.parseObject(value);
                output.f0 = obj.getString("userID");
                output.f1 = obj.getString("eventTime");
                output.f2 = obj.getString("eventType");
                output.f3 = obj.getInteger("productID");
            } catch (Exception e) {
                e.printStackTrace();
            }
            return output;
        }).returns(new TypeHint<Tuple4<String, String, String, Integer>>(){}).name("Map: ExtractTransform");

        // Filter out malformed records
        SingleOutputStreamOperator<Tuple4<String, String, String, Integer>> sourceFilter = sourceMap.filter((FilterFunction<Tuple4<String, String, String, Integer>>) value -> value.f3 != null).name("Filter: FilterExceptionData");

        // Timeout: by default, a timed-out async I/O request raises an exception and restarts or stops the job; to handle timeouts yourself, override AsyncFunction#timeout.
        // Capacity: the maximum number of in-flight asynchronous requests
        /** Async I/O join between the stream table and the dimension table */
        SingleOutputStreamOperator<Tuple5<String, String, String, Integer, Integer>> result = AsyncDataStream.unorderedWait(sourceFilter, new ElasticsearchAsyncFunction(esHost,esPort,esUser,esPassword,esIndex,esType), 500, TimeUnit.MILLISECONDS, 10).name("Join: JoinWithDim");

        /** Print the result */
        result.print().name("PrintToConsole");
        env.execute();
    }
}

ElasticsearchAsyncFunction:

public class ElasticsearchAsyncFunction extends RichAsyncFunction<Tuple4<String, String, String, Integer>, Tuple5<String, String, String, Integer, Integer>> {
    private String host;
    private Integer port;
    private String user;
    private String password;
    private String index;
    private String type;

    public ElasticsearchAsyncFunction(String host, Integer port, String user, String password, String index, String type) {
        this.host = host;
        this.port = port;
        this.user = user;
        this.password = password;
        this.index = index;
        this.type = type;
    }

    private RestHighLevelClient restHighLevelClient;
    private Cache<String, Integer> cache;
    /**
     * Establish the connection to ES
     *
     * @param parameters
     */
    @Override
    public void open(Configuration parameters) {

        //ES Client
        CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(user, password));
        restHighLevelClient = new RestHighLevelClient(
                RestClient
                        .builder(new HttpHost(host, port))
                        // attach the credentials provider so the configured user/password is actually used
                        .setHttpClientConfigCallback(httpAsyncClientBuilder -> httpAsyncClientBuilder.setDefaultCredentialsProvider(credentialsProvider)));

        // Initialize the cache
        cache = CacheBuilder.newBuilder().maximumSize(2).expireAfterAccess(5, TimeUnit.MINUTES).build();
    }

    /**
     * Close the connection
     *
     * @throws Exception
     */
    @Override
    public void close() throws Exception {
        restHighLevelClient.close();
    }
    /**
     * Asynchronous invocation
     *
     * @param input
     * @param resultFuture
     */
    @Override
    public void asyncInvoke(Tuple4<String, String, String, Integer> input, ResultFuture<Tuple5<String, String, String, Integer, Integer>> resultFuture) {

        // 1. Try the cache first
        Integer cachedValue = cache.getIfPresent(input.f0);
        if (cachedValue != null) {
            System.out.println("從緩存中獲取到維度數據: key=" + input.f0 + ",value=" + cachedValue);
            resultFuture.complete(Collections.singleton(new Tuple5<>(input.f0, input.f1, input.f2, input.f3, cachedValue)));

            // 2. Cache miss: fetch from the external store
        } else {
            searchFromES(input, resultFuture);
        }
    }
    /**
     * Fetch the dimension data from ES when it is not in the cache
     *
     * @param input
     * @param resultFuture
     */
    private void searchFromES(Tuple4<String, String, String, Integer> input, ResultFuture<Tuple5<String, String, String, Integer, Integer>> resultFuture) {

        // 1. Build the output record
        Tuple5<String, String, String, Integer, Integer> output = new Tuple5<>();
        output.f0 = input.f0;
        output.f1 = input.f1;
        output.f2 = input.f2;
        output.f3 = input.f3;

        // 2. The key to look up
        String dimKey = input.f0;

        // 3. Build the ids query
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices(index);
        searchRequest.types(type);
        searchRequest.source(SearchSourceBuilder.searchSource().query(QueryBuilders.idsQuery().addIds(dimKey)));

        // 4. Query with the asynchronous client
        restHighLevelClient.searchAsync(searchRequest, RequestOptions.DEFAULT, new ActionListener<SearchResponse>() {

            // Handle a successful response
            @Override
            public void onResponse(SearchResponse searchResponse) {
                SearchHit[] searchHits = searchResponse.getHits().getHits();
                if (searchHits.length > 0) {
                    JSONObject obj = JSON.parseObject(searchHits[0].getSourceAsString());
                    Integer dimValue = obj.getInteger("age");
                    output.f4 = dimValue;
                    cache.put(dimKey, dimValue);
                    System.out.println("將維度數據放入緩存: key=" + dimKey + ",value=" + dimValue);
                }
                resultFuture.complete(Collections.singleton(output));
            }

            // Handle a failed response
            @Override
            public void onFailure(Exception e) {
                output.f4 = null;
                resultFuture.complete(Collections.singleton(output));
            }
        });

    }

    // Handle timeouts
    @Override
    public void timeout(Tuple4<String, String, String, Integer> input, ResultFuture<Tuple5<String, String, String, Integer, Integer>> resultFuture) {
        searchFromES(input, resultFuture);
    }
}

3.2 Ordered mode

Flink Async I/O actually comes in three variants: the ordered mode, the processing-time unordered mode, and the event-time unordered mode.

The main difference between them is the order in which results are emitted downstream: the ordered mode emits results in the order the input records were received, while the unordered modes emit whichever record finishes first. A diagram of the ordered mode appears further below.
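At the API level, the choice between ordered and unordered is simply which AsyncDataStream method the job calls. The snippet below is a sketch reusing sourceFilter and ElasticsearchAsyncFunction from the example in section 3.1, with the same illustrative timeout and capacity values:

// Unordered: results are emitted as soon as they complete (the variant used in section 3.1).
AsyncDataStream.unorderedWait(
        sourceFilter,
        new ElasticsearchAsyncFunction(esHost, esPort, esUser, esPassword, esIndex, esType),
        500, TimeUnit.MILLISECONDS, 10);

// Ordered: results are buffered so that they leave the operator in arrival order.
AsyncDataStream.orderedWait(
        sourceFilter,
        new ElasticsearchAsyncFunction(esHost, esPort, esUser, esPassword, esIndex, esType),
        500, TimeUnit.MILLISECONDS, 10);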

Whether ordered or unordered, the implementation uses the Future/Promise design pattern and roughly follows the design logic below (a simplified sketch follows the figure):

  1. Producer side: each record is wrapped in a StreamRecordQueueEntry (which internally holds a Future) and put into the StreamElementQueue

  2. Producer side: the interaction with the external system lives in the asyncInvoke method, and its result is written back into the StreamRecordQueueEntry

  3. Consumer side: an Emitter thread reads completed StreamRecordQueueEntry objects from the StreamElementQueue and sends their results to the downstream operator

[Figure: ordered mode]
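Before the source walkthrough, here is a minimal, self-contained sketch of that producer/queue/emitter pattern (simplified names and structure, not the actual Flink classes), showing why the ordered mode preserves arrival order:

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class OrderedQueueSketch {

    // Each record is wrapped in an entry that carries a Future for its async result.
    static class Entry {
        final String record;
        final CompletableFuture<String> future = new CompletableFuture<>();
        Entry(String record) { this.record = record; }
    }

    public static void main(String[] args) {
        Queue<Entry> queue = new ArrayDeque<>();

        // Producer side: wrap each record, enqueue it in arrival order, then fire the
        // asynchronous lookup that will complete its Future some time later.
        for (String record : new String[]{"a", "b", "c"}) {
            Entry entry = new Entry(record);
            queue.add(entry);
            CompletableFuture.runAsync(() -> {
                sleepRandom();                                    // simulate external latency
                entry.future.complete(entry.record + "-enriched");
            });
        }

        // Emitter side: always wait for the *head* entry before emitting, so the output
        // order equals the arrival order even if "c" completes before "a".
        while (!queue.isEmpty()) {
            System.out.println(queue.poll().future.join());
        }
    }

    private static void sleepRandom() {
        try {
            TimeUnit.MILLISECONDS.sleep(ThreadLocalRandom.current().nextLong(100));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}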

Below we analyze the ordered-mode source code, looking at the producer side and the consumer side in turn.

3.2.1 Producer side

AsyncWaitOperator

@Internal
public class AsyncWaitOperator<IN, OUT>
		extends AbstractUdfStreamOperator<OUT, AsyncFunction<IN, OUT>>
		implements OneInputStreamOperator<IN, OUT>, OperatorActions, BoundedOneInput {

	@Override
	public void setup(StreamTask<?, ?> containingTask, StreamConfig config, Output<StreamRecord<OUT>> output) {
		super.setup(containingTask, config, output);

		this.checkpointingLock = getContainingTask().getCheckpointLock();

		this.inStreamElementSerializer = new StreamElementSerializer<>(
			getOperatorConfig().<IN>getTypeSerializerIn1(getUserCodeClassloader()));

		// create the operators executor for the complete operations of the queue entries
		this.executor = Executors.newSingleThreadExecutor();
		// Depending on whether the job used AsyncDataStream.orderedWait or AsyncDataStream.unorderedWait, initialize either the ordered or the unordered queue
		switch (outputMode) {
			case ORDERED:
				queue = new OrderedStreamElementQueue(
					capacity,
					executor,
					this);
				break;
			case UNORDERED:
				queue = new UnorderedStreamElementQueue(
					capacity,
					executor,
					this);
				break;
			default:
				throw new IllegalStateException("Unknown async mode: " + outputMode + '.');
		}
	}

	@Override
	public void open() throws Exception {
		super.open();

		// Start the Emitter thread
		this.emitter = new Emitter<>(checkpointingLock, output, queue, this);
		this.emitterThread = new Thread(emitter, "AsyncIO-Emitter-Thread (" + getOperatorName() + ')');
		emitterThread.setDaemon(true);
		emitterThread.start();

		// process stream elements from state, since the Emit thread will start as soon as all
		// elements from previous state are in the StreamElementQueue, we have to make sure that the
		// order to open all operators in the operator chain proceeds from the tail operator to the
		// head operator.
		if (recoveredStreamElements != null) {
			for (StreamElement element : recoveredStreamElements.get()) {
				if (element.isRecord()) {
					processElement(element.<IN>asRecord());
				}
				else if (element.isWatermark()) {
					processWatermark(element.asWatermark());
				}
				else if (element.isLatencyMarker()) {
					processLatencyMarker(element.asLatencyMarker());
				}
				else {
					throw new IllegalStateException("Unknown record type " + element.getClass() +
						" encountered while opening the operator.");
				}
			}
			recoveredStreamElements = null;
		}

	}

  // processElement handles each incoming record one by one
	@Override
	public void processElement(StreamRecord<IN> element) throws Exception {
    // Wrap the record in a StreamRecordQueueEntry
		final StreamRecordQueueEntry<OUT> streamRecordBufferEntry = new StreamRecordQueueEntry<>(element);

		addAsyncBufferEntry(streamRecordBufferEntry);
		// Call asyncInvoke of the user-defined AsyncFunction implementation (ElasticsearchAsyncFunction in our example); the user code completes the Future held by the StreamRecordQueueEntry through an asynchronous callback
		userFunction.asyncInvoke(element.getValue(), streamRecordBufferEntry);
	}

	private <T> void addAsyncBufferEntry(StreamElementQueueEntry<T> streamElementQueueEntry) throws InterruptedException {
		assert(Thread.holdsLock(checkpointingLock));

		pendingStreamElementQueueEntry = streamElementQueueEntry;
		// Try to put the StreamRecordQueueEntry into the queue
		while (!queue.tryPut(streamElementQueueEntry)) {
			// we wait for the emitter to notify us if the queue has space left again
			checkpointingLock.wait();
		}
		pendingStreamElementQueueEntry = null;
	}
}

OrderedStreamElementQueue

@Internal
public class OrderedStreamElementQueue implements StreamElementQueue {
  // Insert a StreamRecordQueueEntry into the OrderedStreamElementQueue
	@Override
	public <T> boolean tryPut(StreamElementQueueEntry<T> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();

		try {
      // capacity bounds the number of concurrent async requests, i.e. the number of entries in the OrderedStreamElementQueue
			if (queue.size() < capacity) {
				addEntry(streamElementQueueEntry);

				LOG.debug("Put element into ordered stream element queue. New filling degree " +
					"({}/{}).", queue.size(), capacity);

				return true;
			} else {
        // If the insert keeps failing, AsyncWaitOperator#addAsyncBufferEntry retries indefinitely; in the extreme case this triggers Flink's own backpressure, so the user needs no special handling
				LOG.debug("Failed to put element into ordered stream element queue because it " +
					"was full ({}/{}).", queue.size(), capacity);
				return false;
			}
		} finally {
			lock.unlock();
		}
	}
  
  	private <T> void addEntry(StreamElementQueueEntry<T> streamElementQueueEntry) {
		assert(lock.isHeldByCurrentThread());
		// Append the entry to the tail of the queue
		queue.addLast(streamElementQueueEntry);

    // Once the Future inside the entry completes, the following callback is invoked
		streamElementQueueEntry.onComplete(
			(StreamElementQueueEntry<T> value) -> {
				try {
					onCompleteHandler(value);
				} catch (InterruptedException e) {
					// we got interrupted. This indicates a shutdown of the executor
					LOG.debug("AsyncBufferEntry could not be properly completed because the " +
						"executor thread has been interrupted.", e);
				} catch (Throwable t) {
					operatorActions.failOperator(new Exception("Could not complete the " +
						"stream element queue entry: " + value + '.', t));
				}
			},
			executor);
	}
  
  	private void onCompleteHandler(StreamElementQueueEntry<?> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();

		try {
      // If the queue is non-empty and the head entry's Future has completed, signal the Condition to wake up the Emitter thread so it can take the head element
			if (!queue.isEmpty() && queue.peek().isDone()) {
				LOG.debug("Signal ordered stream element queue has completed head element.");
				headIsCompleted.signalAll();
			}
		} finally {
			lock.unlock();
		}
	}
  
  	@Override
	public AsyncResult peekBlockingly() throws InterruptedException {
		lock.lockInterruptibly();

		try {
			// When the Emitter thread fetches an entry, it blocks as long as the queue is empty or the head entry's Future is not done
			while (queue.isEmpty() || !queue.peek().isDone()) {
        // Block on the Condition
				headIsCompleted.await();
			}

			LOG.debug("Peeked head element from ordered stream element queue with filling degree " +
				"({}/{}).", queue.size(), capacity);

			return queue.peek();
		} finally {
			lock.unlock();
		}
	}
}
3.2.2 Consumer side

Emitter

@Internal
public class Emitter<OUT> implements Runnable {
  @Override
	public void run() {
		try {
      // Keep reading the head element; as OrderedStreamElementQueue#peekBlockingly shows, the Emitter thread blocks while the head entry's Future is still pending
			while (running) {
				LOG.debug("Wait for next completed async stream element result.");
				AsyncResult streamElementEntry = streamElementQueue.peekBlockingly();
				// Emit the data to the downstream operator
				output(streamElementEntry);
			}
		} catch (InterruptedException e) {
			if (running) {
				operatorActions.failOperator(e);
			} else {
				// Thread got interrupted which means that it should shut down
				LOG.debug("Emitter thread got interrupted, shutting down.");
			}
		} catch (Throwable t) {
			operatorActions.failOperator(new Exception("AsyncWaitOperator's emitter caught an " +
				"unexpected throwable.", t));
		}
	}
  
  	private void output(AsyncResult asyncResult) throws InterruptedException {
		if (asyncResult.isWatermark()) {
			synchronized (checkpointLock) {
				AsyncWatermarkResult asyncWatermarkResult = asyncResult.asWatermark();

				LOG.debug("Output async watermark.");
				// A watermark is forwarded directly to the downstream operator
				output.emitWatermark(asyncWatermarkResult.getWatermark());

				// Remove the head entry from the queue (note the difference between peek and poll)
				streamElementQueue.poll();

				// notify the main thread that there is again space left in the async collector
				// buffer
				checkpointLock.notifyAll();
			}
		} else {
			AsyncCollectionResult<OUT> streamRecordResult = asyncResult.asResultCollection();

			if (streamRecordResult.hasTimestamp()) {
				timestampedCollector.setAbsoluteTimestamp(streamRecordResult.getTimestamp());
			} else {
				timestampedCollector.eraseTimestamp();
			}

			synchronized (checkpointLock) {
				LOG.debug("Output async stream element collection result.");

				try {
					// Extract the enriched (joined) data from the entry's completed Future
					Collection<OUT> resultCollection = streamRecordResult.get();

					// Emit the data to the downstream operator
					if (resultCollection != null) {
						for (OUT result : resultCollection) {
							timestampedCollector.collect(result);
						}
					}
				} catch (Exception e) {
					operatorActions.failOperator(
						new Exception("An async function call terminated with an exception. " +
							"Failing the AsyncWaitOperator.", e));
				}

				// Remove the head entry from the queue (note the difference between peek and poll)
				streamElementQueue.poll();

				// notify the main thread that there is again space left in the async collector
				// buffer
				checkpointLock.notifyAll();
			}
		}
	}
}

3.3 Processing-time unordered mode

Unlike ordered mode, in unordered mode each StreamRecordQueueEntry is wrapped in an additional Set layer, mainly to deal with watermarks (details in the next subsection). The processing-time unordered mode has no watermarks, but it shares the same logic with the event-time unordered mode, so it also carries this extra Set layer.

In this mode there are no watermark messages, so every record's StreamRecordQueueEntry goes into lastSet (in this mode lastSet and firstSet reference the same object). In the record's onCompleteHandler, the entry is simply taken out of lastSet and put into the completedQueue, from which the Emitter thread sends it to the downstream operator; the result is a completely unordered processing mode.

The architecture diagram of the processing-time unordered mode in 雲邪's blog post 《Flink 原理與實現:Async I/O》 was drawn for Flink 1.3 and no longer applies to Flink 1.9, where this mode no longer needs the uncompletedQueue. The updated architecture is shown below:

[Figure: processing-time unordered mode in Flink 1.9]

Also, data structures such as AsyncCollector that appear in that blog post no longer exist in Flink 1.9; this article is based on Flink 1.9.

3.3.1 Producer side

UnorderedStreamElementQueue

@Internal
public class UnorderedStreamElementQueue implements StreamElementQueue {
  	private <T> void addEntry(StreamElementQueueEntry<T> streamElementQueueEntry) {
		assert(lock.isHeldByCurrentThread());

		if (streamElementQueueEntry.isWatermark()) {
			lastSet = new HashSet<>(capacity);
			
			if (firstSet.isEmpty()) {
				firstSet.add(streamElementQueueEntry);
			} else {
				Set<StreamElementQueueEntry<?>> watermarkSet = new HashSet<>(1);
				watermarkSet.add(streamElementQueueEntry);
				uncompletedQueue.offer(watermarkSet);
			}
			uncompletedQueue.offer(lastSet);
		} else {
      // In processing-time unordered mode only this branch is taken, and lastSet and firstSet reference the same object
			lastSet.add(streamElementQueueEntry);
		}

		streamElementQueueEntry.onComplete(
			(StreamElementQueueEntry<T> value) -> {
				try {
					onCompleteHandler(value);
				} catch (InterruptedException e) {
					// The accept executor thread got interrupted. This is probably cause by
					// the shutdown of the executor.
					LOG.debug("AsyncBufferEntry could not be properly completed because the " +
						"executor thread has been interrupted.", e);
				} catch (Throwable t) {
					operatorActions.failOperator(new Exception("Could not complete the " +
						"stream element queue entry: " + value + '.', t));
				}
			},
			executor);

		numberEntries++;
	}
  
   // Callback invoked when the Future inside the entry completes
  	public void onCompleteHandler(StreamElementQueueEntry<?> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();
      
		try {
      // Move the entry into the completedQueue
      // Putting the entry into lastSet (the same object as firstSet) and then removing it again is admittedly redundant; it only happens because this code path is shared with the event-time unordered mode
			if (firstSet.remove(streamElementQueueEntry)) {
        // Add the entry to the completedQueue
				completedQueue.offer(streamElementQueueEntry);
				// In this mode, the code below is never executed
				while (firstSet.isEmpty() && firstSet != lastSet) {
					firstSet = uncompletedQueue.poll();

					Iterator<StreamElementQueueEntry<?>> it = firstSet.iterator();

					while (it.hasNext()) {
						StreamElementQueueEntry<?> bufferEntry = it.next();

						if (bufferEntry.isDone()) {
							completedQueue.offer(bufferEntry);
							it.remove();
						}
					}
				}

				LOG.debug("Signal unordered stream element queue has completed entries.");
				hasCompletedEntries.signalAll();
			}
		} finally {
			lock.unlock();
		}
	}
  
  	@Override
	public AsyncResult peekBlockingly() throws InterruptedException {
		lock.lockInterruptibly();

		try {
      // The Emitter thread takes entries from the completedQueue; unlike ordered mode, there is no need to check whether the head entry's Future has completed, because only completed entries are ever inserted into the completedQueue
			while (completedQueue.isEmpty()) {
				hasCompletedEntries.await();
			}

			LOG.debug("Peeked head element from unordered stream element queue with filling degree " +
				"({}/{}).", numberEntries, capacity);

			return completedQueue.peek();
		} finally {
			lock.unlock();
		}
	}
  
}
3.3.2 Consumer side

The Emitter thread's consumption logic is the same as in ordered mode.

3.4 Event-time unordered mode

Although records within a given stretch of the stream are unordered in this mode, the presence of watermarks means that the records between watermark1 and watermark2 must still be exactly the original batch, even if their relative order changes. In other words, the records inside a Set may be sent downstream in any order, but the sequence watermark1 → set → watermark2 must not be broken.

If the order between watermarks and record sets were broken, the window triggered by watermark2 could end up containing more or fewer records than it should, compromising the correctness of the computation.

[Figure: event-time unordered mode]

3.4.1 Producer side

AsyncWaitOperator

@Internal
public class AsyncWaitOperator<IN, OUT>
		extends AbstractUdfStreamOperator<OUT, AsyncFunction<IN, OUT>>
		implements OneInputStreamOperator<IN, OUT>, OperatorActions, BoundedOneInput {
  	@Override
	public void processWatermark(Watermark mark) throws Exception {
		WatermarkQueueEntry watermarkBufferEntry = new WatermarkQueueEntry(mark);
    // Handle a watermark
		addAsyncBufferEntry(watermarkBufferEntry);
	}
  
  	@Override
	public void processElement(StreamRecord<IN> element) throws Exception {
		final StreamRecordQueueEntry<OUT> streamRecordBufferEntry = new StreamRecordQueueEntry<>(element);
    // Handle the StreamRecordQueueEntry
		addAsyncBufferEntry(streamRecordBufferEntry);
		userFunction.asyncInvoke(element.getValue(), streamRecordBufferEntry);
	}
  
  	private <T> void addAsyncBufferEntry(StreamElementQueueEntry<T> streamElementQueueEntry) throws InterruptedException {
		assert(Thread.holdsLock(checkpointingLock));

		pendingStreamElementQueueEntry = streamElementQueueEntry;
		// Try to put the StreamRecordQueueEntry or WatermarkQueueEntry into the queue
		while (!queue.tryPut(streamElementQueueEntry)) {
			// we wait for the emitter to notify us if the queue has space left again
			checkpointingLock.wait();
		}
		pendingStreamElementQueueEntry = null;
	}
}

UnorderedStreamElementQueue

@Internal
public class UnorderedStreamElementQueue implements StreamElementQueue {
	@Override
	public <T> boolean tryPut(StreamElementQueueEntry<T> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();
		try {
			if (numberEntries < capacity) {
				addEntry(streamElementQueueEntry);

				LOG.debug("Put element into unordered stream element queue. New filling degree " +
					"({}/{}).", numberEntries, capacity);

				return true;
			} else {
				LOG.debug("Failed to put element into unordered stream element queue because it " +
					"was full ({}/{}).", numberEntries, capacity);

				return false;
			}
		} finally {
			lock.unlock();
		}
	}

	private <T> void addEntry(StreamElementQueueEntry<T> streamElementQueueEntry) {
		assert(lock.isHeldByCurrentThread());

		if (streamElementQueueEntry.isWatermark()) {
      // On a watermark, start a fresh lastSet so the next batch of record entries goes into it
      // Note: firstSet may hold either a WatermarkQueueEntry or StreamRecordQueueEntry objects,
      // but lastSet only ever holds StreamRecordQueueEntry objects
			lastSet = new HashSet<>(capacity); 

			if (firstSet.isEmpty()) {
				firstSet.add(streamElementQueueEntry);
			} else {
				Set<StreamElementQueueEntry<?>> watermarkSet = new HashSet<>(1);
				watermarkSet.add(streamElementQueueEntry);
				uncompletedQueue.offer(watermarkSet);
			}
			uncompletedQueue.offer(lastSet);
		} else {
      // Until the next watermark arrives, record entries keep being added to lastSet
			lastSet.add(streamElementQueueEntry);
		}

		streamElementQueueEntry.onComplete(
			(StreamElementQueueEntry<T> value) -> {
				try {
					onCompleteHandler(value);
				} catch (InterruptedException e) {
					// The accept executor thread got interrupted. This is probably cause by
					// the shutdown of the executor.
					LOG.debug("AsyncBufferEntry could not be properly completed because the " +
						"executor thread has been interrupted.", e);
				} catch (Throwable t) {
					operatorActions.failOperator(new Exception("Could not complete the " +
						"stream element queue entry: " + value + '.', t));
				}
			},
			executor);

		numberEntries++;
	}
		// Callback logic invoked once the Future of a WatermarkQueueEntry or StreamRecordQueueEntry completes
  	public void onCompleteHandler(StreamElementQueueEntry<?> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();

		try {
      // Always try to remove the completed entry from firstSet; this if-check enforces the order between watermarks and record sets
			if (firstSet.remove(streamElementQueueEntry)) {
        // Add the removed entry to the completedQueue
				completedQueue.offer(streamElementQueueEntry);

				while (firstSet.isEmpty() && firstSet != lastSet) {
          // Advance firstSet to the next set
					firstSet = uncompletedQueue.poll();

					Iterator<StreamElementQueueEntry<?>> it = firstSet.iterator();
					// Iterate over the entries of the new firstSet; completed ones are moved into the completedQueue and removed from firstSet
					while (it.hasNext()) {
						StreamElementQueueEntry<?> bufferEntry = it.next();
						if (bufferEntry.isDone()) {
							completedQueue.offer(bufferEntry);
							it.remove();
						}
					}
				}

				LOG.debug("Signal unordered stream element queue has completed entries.");
				hasCompletedEntries.signalAll();
			}
		} finally {
			lock.unlock();
		}
	}
}

3.4.2 Consumer side

The Emitter thread's consumption logic is the same as in ordered mode.

4. Summary

  1. Flink Async I/O uses a queue to hold records before enrichment (ordered mode) or after enrichment (processing-time unordered mode), and decouples production from consumption through this queue and the polling Emitter thread.

  2. Ordered mode inserts the not-yet-completed StreamRecordQueueEntry objects into the queue in arrival order and only ever checks whether the head entry has completed, which keeps the emission order identical to the arrival order.

  3. In processing-time unordered mode, the entry is inserted into the completedQueue inside the result callback, so every entry in that queue already holds the enriched asynchronous result.

  4. In event-time unordered mode, the uncompletedQueue stores entries before enrichment (before the async call returns) and the completedQueue stores entries after enrichment; the firstSet mechanism enforces the order between watermarks and record sets.

References:

http://wuchong.me/blog/2017/05/17/flink-internals-async-io/

https://www.cnblogs.com/ljygz/p/11864176.html

https://www.jianshu.com/p/f9bde854627b

https://blog.csdn.net/weixin_44904816/article/details/104305824
