Flink Source Code Analysis: The Three Modes of Flink Async I/O

1. Dimension table join

Stream processing systems often need to interact with external systems, for example querying an external database to enrich events with extra user information. The usual implementation sends a query for user a to the database and then waits for the result; until it returns, the query for user b cannot be sent. This is the synchronous access pattern, shown on the left of the figure below.

[Figure: synchronous access (left) vs. asynchronous access (right)]

The brown bars in the figure represent waiting time: the network wait dominates and severely limits both throughput and latency. To fix this, the asynchronous mode handles multiple requests and responses concurrently. You can keep sending queries for users a, b, c, and so on, and process whichever response comes back first, so consecutive requests no longer block on each other, as shown on the right of the figure. This is exactly how Async I/O works.
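As a rough illustration of the difference (a minimal sketch, not Flink code; queryUserInfo is a hypothetical stand-in for the database lookup), the two access patterns can be compared with plain CompletableFuture:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AccessPatternSketch {

    // Synchronous access: each lookup blocks until the previous one has returned.
    static void syncLookups(List<String> userIds) {
        for (String id : userIds) {
            String info = queryUserInfo(id);              // blocks on the network round trip
            System.out.println(id + " -> " + info);
        }
    }

    // Asynchronous access: all lookups are issued right away; each result is handled
    // as soon as it arrives, so the network round trips overlap.
    static void asyncLookups(List<String> userIds) {
        List<CompletableFuture<Void>> pending = userIds.stream()
                .map(id -> CompletableFuture
                        .supplyAsync(() -> queryUserInfo(id))
                        .thenAccept(info -> System.out.println(id + " -> " + info)))
                .collect(Collectors.toList());
        pending.forEach(CompletableFuture::join);         // wait for all responses
    }

    // Hypothetical external lookup standing in for the database query.
    private static String queryUserInfo(String userId) {
        return "info-of-" + userId;
    }

    public static void main(String[] args) {
        syncLookups(List.of("user_a", "user_b", "user_c"));
        asyncLookups(List.of("user_a", "user_b", "user_c"));
    }
}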

2. RichMapFunction

Using RichMapFunction for the dimension-table join is the typical synchronous I/O approach: each lookup blocks until the previous one has finished, which makes it unsuitable for high-concurrency workloads.

2.1 Example

  public static final class MapWithSiteInfoFunc
    extends RichMapFunction<String, String> {
    private static final Logger LOGGER = LoggerFactory.getLogger(MapWithSiteInfoFunc.class);
    private static final long serialVersionUID = 1L;
    private transient ScheduledExecutorService dbScheduler;
    // Cache the dimension data to reduce the number of database requests
    private Map<Integer, SiteAndCityInfo> siteInfoCache;

    @Override
    public void open(Configuration parameters) throws Exception {
      super.open(parameters);
      siteInfoCache = new HashMap<>(1024);
      // Use a scheduled thread to refresh the dimension data periodically
      dbScheduler = new ScheduledThreadPoolExecutor(1, r -> {
        Thread thread = new Thread(r, "site-info-update-thread");
        thread.setUncaughtExceptionHandler((t, e) -> {
          LOGGER.error("Thread " + t + " got uncaught exception: " + e);
        });
        return thread;
      });

      dbScheduler.scheduleWithFixedDelay(() -> {
        try {
          QueryRunner queryRunner = new QueryRunner(JdbcUtil.getDataSource());
          List<Map<String, Object>> info = queryRunner.query(SITE_INFO_QUERY_SQL, new MapListHandler());

          for (Map<String, Object> item : info) {
            siteInfoCache.put((int) item.get("site_id"), new SiteAndCityInfo(
              (int) item.get("site_id"),
              (String) item.getOrDefault("site_name", ""),
              (long) item.get("city_id"),
              (String) item.getOrDefault("city_name", "")
            ));
          }

          LOGGER.info("Fetched {} site info records, {} records in cache", info.size(), siteInfoCache.size());
        } catch (Exception e) {
          LOGGER.error("Exception occurred when querying: " + e);
        }
      }, 0, 10 * 60, TimeUnit.SECONDS);
    }

    @Override
    public String map(String value) throws Exception {
      JSONObject json = JSON.parseObject(value);
      int siteId = json.getInteger("site_id");
     
      String siteName = "", cityName = "";
      SiteAndCityInfo info = siteInfoCache.getOrDefault(siteId, null);
      if (info != null) {
        siteName = info.getSiteName();
        cityName = info.getCityName();
      }

      json.put("site_name", siteName);
      json.put("city_name", cityName);
      return json.toJSONString();
    }

    @Override
    public void close() throws Exception {
      // Clear the cache and close the connection
      siteInfoCache.clear();
      ExecutorUtils.gracefulShutdown(10, TimeUnit.SECONDS, dbScheduler);
      JdbcUtil.close();

      super.close();
    }

    private static final String SITE_INFO_QUERY_SQL = "...";
  }

3. Async I/O

Flink 1.2 introduced Async I/O to speed up Flink's interaction with external systems and raise throughput. Its core idea is to rework the flow in which each processed record is forwarded to the downstream operator. The implementation has a producer side and a consumer side: the producer side introduces an AsyncWaitOperator whose processElement/processWatermark methods kick off the dimension-table lookup for each record and put the still-pending Future objects into a queue; the consumer side introduces an Emitter thread that keeps consuming completed entries from the queue and sends them to the downstream operator.

3.1 Example

In short, using Async I/O with the Flink API means extending the abstract class RichAsyncFunction and implementing its open (initialization), asyncInvoke (asynchronous call per record), and close (cleanup) methods; asyncInvoke is the important one. In the following example, Kafka serves as the stream table storing user browsing events, Elasticsearch serves as the dimension table storing user age information, and Async I/O is used to enrich the browsing events.

Stream table: the user behavior log, i.e. a user clicked or browsed a product at a certain point in time. Hand-crafted test data, sample record:

{"userID": "user_1", "eventTime": "2016-06-06 07:03:42", "eventType": "browse", "productID": 2}

Dimension table: basic user information. Hand-crafted test data stored in Elasticsearch, sample document:

GET dim_user/dim_user/user_1

{
  "_index": "dim_user",
  "_type": "dim_user",
  "_id": "user_1",
  "_version": 1,
  "found": true,
  "_source": {
    "age": 22
  }
}

Implementation:

public class FlinkAsyncIO {
    public static void main(String[] args) throws Exception{

        String kafkaBootstrapServers = "localhost:9092";
        String kafkaGroupID = "async-test";
        String kafkaAutoOffsetReset= "latest";
        String kafkaTopic = "asyncio";
        int kafkaParallelism =2;

        String esHost= "localhost";
        Integer esPort= 9200;
        String esUser = "";
        String esPassword = "";
        String esIndex = "dim_user";
        String esType = "dim_user";

        /** Flink DataStream execution environment */
        Configuration config = new Configuration();
        config.setInteger(RestOptions.PORT,8081);
        config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config);

        /** Add the data source */
        Properties kafkaProperties = new Properties();
        kafkaProperties.put("bootstrap.servers",kafkaBootstrapServers);
        kafkaProperties.put("group.id",kafkaGroupID);
        kafkaProperties.put("auto.offset.reset",kafkaAutoOffsetReset);
        FlinkKafkaConsumer010<String> kafkaConsumer = new FlinkKafkaConsumer010<>(kafkaTopic, new SimpleStringSchema(), kafkaProperties);
        kafkaConsumer.setCommitOffsetsOnCheckpoints(true);
        SingleOutputStreamOperator<String> source = env.addSource(kafkaConsumer).name("KafkaSource").setParallelism(kafkaParallelism);

        // Data transformation
        SingleOutputStreamOperator<Tuple4<String, String, String, Integer>> sourceMap = source.map((MapFunction<String, Tuple4<String, String, String, Integer>>) value -> {
            Tuple4<String, String, String, Integer> output = new Tuple4<>();
            try {
                JSONObject obj = JSON.parseObject(value);
                output.f0 = obj.getString("userID");
                output.f1 = obj.getString("eventTime");
                output.f2 = obj.getString("eventType");
                output.f3 = obj.getInteger("productID");
            } catch (Exception e) {
                e.printStackTrace();
            }
            return output;
        }).returns(new TypeHint<Tuple4<String, String, String, Integer>>(){}).name("Map: ExtractTransform");

        // Filter out malformed records
        SingleOutputStreamOperator<Tuple4<String, String, String, Integer>> sourceFilter = sourceMap.filter((FilterFunction<Tuple4<String, String, String, Integer>>) value -> value.f3 != null).name("Filter: FilterExceptionData");

        // Timeout: by default, a timed-out async I/O request raises an exception and restarts or stops the job; to handle timeouts yourself, override AsyncFunction#timeout.
        // Capacity: the maximum number of in-flight asynchronous requests
        /** Async I/O join between the stream table and the dimension table */
        SingleOutputStreamOperator<Tuple5<String, String, String, Integer, Integer>> result = AsyncDataStream.unorderedWait(sourceFilter, new ElasticsearchAsyncFunction(esHost,esPort,esUser,esPassword,esIndex,esType), 500, TimeUnit.MILLISECONDS, 10).name("Join: JoinWithDim");

        /** Print the result */
        result.print().name("PrintToConsole");
        env.execute();
    }
}

ElasticsearchAsyncFunction:

public class ElasticsearchAsyncFunction extends RichAsyncFunction<Tuple4<String, String, String, Integer>, Tuple5<String, String, String, Integer, Integer>> {
    private String host;
    private Integer port;
    private String user;
    private String password;
    private String index;
    private String type;

    public ElasticsearchAsyncFunction(String host, Integer port, String user, String password, String index, String type) {
        this.host = host;
        this.port = port;
        this.user = user;
        this.password = password;
        this.index = index;
        this.type = type;
    }

    private RestHighLevelClient restHighLevelClient;
    private Cache<String, Integer> cache;
    /**
     * Establish the connection to ES
     *
     * @param parameters
     */
    @Override
    public void open(Configuration parameters) {

        //ES Client
        CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(user, password));
        restHighLevelClient = new RestHighLevelClient(
                RestClient
                        .builder(new HttpHost(host, port))
                        // attach the credentials provider so the configured user/password is actually used
                        .setHttpClientConfigCallback(httpAsyncClientBuilder -> httpAsyncClientBuilder.setDefaultCredentialsProvider(credentialsProvider)));

        // Initialize the cache
        cache = CacheBuilder.newBuilder().maximumSize(2).expireAfterAccess(5, TimeUnit.MINUTES).build();
    }

    /**
     * Close the connection
     *
     * @throws Exception
     */
    @Override
    public void close() throws Exception {
        restHighLevelClient.close();
    }
    /**
     * Asynchronous invocation
     *
     * @param input
     * @param resultFuture
     */
    @Override
    public void asyncInvoke(Tuple4<String, String, String, Integer> input, ResultFuture<Tuple5<String, String, String, Integer, Integer>> resultFuture) {

        // 1. Try the cache first
        Integer cachedValue = cache.getIfPresent(input.f0);
        if (cachedValue != null) {
            System.out.println("從緩存中獲取到維度數據: key=" + input.f0 + ",value=" + cachedValue);
            resultFuture.complete(Collections.singleton(new Tuple5<>(input.f0, input.f1, input.f2, input.f3, cachedValue)));

            // 2. Cache miss: fetch from the external store
        } else {
            searchFromES(input, resultFuture);
        }
    }
    /**
     * Fetch the dimension data from ES when it is not in the cache
     *
     * @param input
     * @param resultFuture
     */
    private void searchFromES(Tuple4<String, String, String, Integer> input, ResultFuture<Tuple5<String, String, String, Integer, Integer>> resultFuture) {

        // 1. Build the output record
        Tuple5<String, String, String, Integer, Integer> output = new Tuple5<>();
        output.f0 = input.f0;
        output.f1 = input.f1;
        output.f2 = input.f2;
        output.f3 = input.f3;

        // 2. The key to look up
        String dimKey = input.f0;

        // 3. Build the ids query
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices(index);
        searchRequest.types(type);
        searchRequest.source(SearchSourceBuilder.searchSource().query(QueryBuilders.idsQuery().addIds(dimKey)));

        // 4. Query with the asynchronous client
        restHighLevelClient.searchAsync(searchRequest, RequestOptions.DEFAULT, new ActionListener<SearchResponse>() {

            // Handle a successful response
            @Override
            public void onResponse(SearchResponse searchResponse) {
                SearchHit[] searchHits = searchResponse.getHits().getHits();
                if (searchHits.length > 0) {
                    JSONObject obj = JSON.parseObject(searchHits[0].getSourceAsString());
                    Integer dimValue = obj.getInteger("age");
                    output.f4 = dimValue;
                    cache.put(dimKey, dimValue);
                    System.out.println("將維度數據放入緩存: key=" + dimKey + ",value=" + dimValue);
                }
                resultFuture.complete(Collections.singleton(output));
            }

            // Handle a failed response
            @Override
            public void onFailure(Exception e) {
                output.f4 = null;
                resultFuture.complete(Collections.singleton(output));
            }
        });

    }

    // Handle timeouts
    @Override
    public void timeout(Tuple4<String, String, String, Integer> input, ResultFuture<Tuple5<String, String, String, Integer, Integer>> resultFuture) {
        searchFromES(input, resultFuture);
    }
}

3.2 Ordered mode

Flink Async I/O actually comes in three variants: the ordered mode, the processing-time unordered mode, and the event-time unordered mode.

The main difference between them is the order in which results are emitted downstream: the ordered mode emits results in the order the input records were received, while the unordered modes emit whichever record finishes first. A diagram of the ordered mode appears further below.
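At the API level, the choice between ordered and unordered is simply which AsyncDataStream method the job calls. The snippet below is a sketch reusing sourceFilter and ElasticsearchAsyncFunction from the example in section 3.1, with the same illustrative timeout and capacity values:

// Unordered: results are emitted as soon as they complete (the variant used in section 3.1).
AsyncDataStream.unorderedWait(
        sourceFilter,
        new ElasticsearchAsyncFunction(esHost, esPort, esUser, esPassword, esIndex, esType),
        500, TimeUnit.MILLISECONDS, 10);

// Ordered: results are buffered so that they leave the operator in arrival order.
AsyncDataStream.orderedWait(
        sourceFilter,
        new ElasticsearchAsyncFunction(esHost, esPort, esUser, esPassword, esIndex, esType),
        500, TimeUnit.MILLISECONDS, 10);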

Whether ordered or unordered, the implementation uses the Future/Promise design pattern and roughly follows the design logic below (a simplified sketch follows the figure):

  1. Producer side: each record is wrapped in a StreamRecordQueueEntry (which internally holds a Future) and put into the StreamElementQueue

  2. Producer side: the interaction with the external system lives in the asyncInvoke method, and its result is written back into the StreamRecordQueueEntry

  3. Consumer side: an Emitter thread reads completed StreamRecordQueueEntry objects from the StreamElementQueue and sends their results to the downstream operator

[Figure: ordered mode]
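Before the source walkthrough, here is a minimal, self-contained sketch of that producer/queue/emitter pattern (simplified names and structure, not the actual Flink classes), showing why the ordered mode preserves arrival order:

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class OrderedQueueSketch {

    // Each record is wrapped in an entry that carries a Future for its async result.
    static class Entry {
        final String record;
        final CompletableFuture<String> future = new CompletableFuture<>();
        Entry(String record) { this.record = record; }
    }

    public static void main(String[] args) {
        Queue<Entry> queue = new ArrayDeque<>();

        // Producer side: wrap each record, enqueue it in arrival order, then fire the
        // asynchronous lookup that will complete its Future some time later.
        for (String record : new String[]{"a", "b", "c"}) {
            Entry entry = new Entry(record);
            queue.add(entry);
            CompletableFuture.runAsync(() -> {
                sleepRandom();                                    // simulate external latency
                entry.future.complete(entry.record + "-enriched");
            });
        }

        // Emitter side: always wait for the *head* entry before emitting, so the output
        // order equals the arrival order even if "c" completes before "a".
        while (!queue.isEmpty()) {
            System.out.println(queue.poll().future.join());
        }
    }

    private static void sleepRandom() {
        try {
            TimeUnit.MILLISECONDS.sleep(ThreadLocalRandom.current().nextLong(100));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}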

Below we analyze the ordered-mode source code, looking at the producer side and the consumer side in turn.

3.2.1 Producer side

AsyncWaitOperator

@Internal
public class AsyncWaitOperator<IN, OUT>
		extends AbstractUdfStreamOperator<OUT, AsyncFunction<IN, OUT>>
		implements OneInputStreamOperator<IN, OUT>, OperatorActions, BoundedOneInput {

	@Override
	public void setup(StreamTask<?, ?> containingTask, StreamConfig config, Output<StreamRecord<OUT>> output) {
		super.setup(containingTask, config, output);

		this.checkpointingLock = getContainingTask().getCheckpointLock();

		this.inStreamElementSerializer = new StreamElementSerializer<>(
			getOperatorConfig().<IN>getTypeSerializerIn1(getUserCodeClassloader()));

		// create the operators executor for the complete operations of the queue entries
		this.executor = Executors.newSingleThreadExecutor();
		// Depending on whether the job used AsyncDataStream.orderedWait or AsyncDataStream.unorderedWait, initialize either the ordered or the unordered queue
		switch (outputMode) {
			case ORDERED:
				queue = new OrderedStreamElementQueue(
					capacity,
					executor,
					this);
				break;
			case UNORDERED:
				queue = new UnorderedStreamElementQueue(
					capacity,
					executor,
					this);
				break;
			default:
				throw new IllegalStateException("Unknown async mode: " + outputMode + '.');
		}
	}

	@Override
	public void open() throws Exception {
		super.open();

		// Start the Emitter thread
		this.emitter = new Emitter<>(checkpointingLock, output, queue, this);
		this.emitterThread = new Thread(emitter, "AsyncIO-Emitter-Thread (" + getOperatorName() + ')');
		emitterThread.setDaemon(true);
		emitterThread.start();

		// process stream elements from state, since the Emit thread will start as soon as all
		// elements from previous state are in the StreamElementQueue, we have to make sure that the
		// order to open all operators in the operator chain proceeds from the tail operator to the
		// head operator.
		if (recoveredStreamElements != null) {
			for (StreamElement element : recoveredStreamElements.get()) {
				if (element.isRecord()) {
					processElement(element.<IN>asRecord());
				}
				else if (element.isWatermark()) {
					processWatermark(element.asWatermark());
				}
				else if (element.isLatencyMarker()) {
					processLatencyMarker(element.asLatencyMarker());
				}
				else {
					throw new IllegalStateException("Unknown record type " + element.getClass() +
						" encountered while opening the operator.");
				}
			}
			recoveredStreamElements = null;
		}

	}

  // processElement handles each incoming record one by one
	@Override
	public void processElement(StreamRecord<IN> element) throws Exception {
    // Wrap the record in a StreamRecordQueueEntry
		final StreamRecordQueueEntry<OUT> streamRecordBufferEntry = new StreamRecordQueueEntry<>(element);

		addAsyncBufferEntry(streamRecordBufferEntry);
		// Call asyncInvoke of the user-defined AsyncFunction implementation (ElasticsearchAsyncFunction in our example); the user code completes the Future held by the StreamRecordQueueEntry through an asynchronous callback
		userFunction.asyncInvoke(element.getValue(), streamRecordBufferEntry);
	}

	private <T> void addAsyncBufferEntry(StreamElementQueueEntry<T> streamElementQueueEntry) throws InterruptedException {
		assert(Thread.holdsLock(checkpointingLock));

		pendingStreamElementQueueEntry = streamElementQueueEntry;
		// Try to put the StreamRecordQueueEntry into the queue
		while (!queue.tryPut(streamElementQueueEntry)) {
			// we wait for the emitter to notify us if the queue has space left again
			checkpointingLock.wait();
		}
		pendingStreamElementQueueEntry = null;
	}
}

OrderedStreamElementQueue

@Internal
public class OrderedStreamElementQueue implements StreamElementQueue {
  // Insert a StreamRecordQueueEntry into the OrderedStreamElementQueue
	@Override
	public <T> boolean tryPut(StreamElementQueueEntry<T> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();

		try {
      // capacity bounds the number of concurrent async requests, i.e. the number of entries in the OrderedStreamElementQueue
			if (queue.size() < capacity) {
				addEntry(streamElementQueueEntry);

				LOG.debug("Put element into ordered stream element queue. New filling degree " +
					"({}/{}).", queue.size(), capacity);

				return true;
			} else {
        // If the insert keeps failing, AsyncWaitOperator#addAsyncBufferEntry retries indefinitely; in the extreme case this triggers Flink's own backpressure, so the user needs no special handling
				LOG.debug("Failed to put element into ordered stream element queue because it " +
					"was full ({}/{}).", queue.size(), capacity);
				return false;
			}
		} finally {
			lock.unlock();
		}
	}
  
  	private <T> void addEntry(StreamElementQueueEntry<T> streamElementQueueEntry) {
		assert(lock.isHeldByCurrentThread());
		// Append the entry to the tail of the queue
		queue.addLast(streamElementQueueEntry);

    // Once the Future inside the entry completes, the following callback is invoked
		streamElementQueueEntry.onComplete(
			(StreamElementQueueEntry<T> value) -> {
				try {
					onCompleteHandler(value);
				} catch (InterruptedException e) {
					// we got interrupted. This indicates a shutdown of the executor
					LOG.debug("AsyncBufferEntry could not be properly completed because the " +
						"executor thread has been interrupted.", e);
				} catch (Throwable t) {
					operatorActions.failOperator(new Exception("Could not complete the " +
						"stream element queue entry: " + value + '.', t));
				}
			},
			executor);
	}
  
  	private void onCompleteHandler(StreamElementQueueEntry<?> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();

		try {
      // If the queue is non-empty and the head entry's Future has completed, signal the Condition to wake up the Emitter thread so it can take the head element
			if (!queue.isEmpty() && queue.peek().isDone()) {
				LOG.debug("Signal ordered stream element queue has completed head element.");
				headIsCompleted.signalAll();
			}
		} finally {
			lock.unlock();
		}
	}
  
  	@Override
	public AsyncResult peekBlockingly() throws InterruptedException {
		lock.lockInterruptibly();

		try {
			// When the Emitter thread fetches an entry, it blocks as long as the queue is empty or the head entry's Future is not done
			while (queue.isEmpty() || !queue.peek().isDone()) {
        // Block on the Condition
				headIsCompleted.await();
			}

			LOG.debug("Peeked head element from ordered stream element queue with filling degree " +
				"({}/{}).", queue.size(), capacity);

			return queue.peek();
		} finally {
			lock.unlock();
		}
	}
}
3.2.2 Consumer side

Emitter

@Internal
public class Emitter<OUT> implements Runnable {
  @Override
	public void run() {
		try {
      // Keep reading the head element; as OrderedStreamElementQueue#peekBlockingly shows, the Emitter thread blocks while the head entry's Future is still pending
			while (running) {
				LOG.debug("Wait for next completed async stream element result.");
				AsyncResult streamElementEntry = streamElementQueue.peekBlockingly();
				// Emit the data to the downstream operator
				output(streamElementEntry);
			}
		} catch (InterruptedException e) {
			if (running) {
				operatorActions.failOperator(e);
			} else {
				// Thread got interrupted which means that it should shut down
				LOG.debug("Emitter thread got interrupted, shutting down.");
			}
		} catch (Throwable t) {
			operatorActions.failOperator(new Exception("AsyncWaitOperator's emitter caught an " +
				"unexpected throwable.", t));
		}
	}
  
  	private void output(AsyncResult asyncResult) throws InterruptedException {
		if (asyncResult.isWatermark()) {
			synchronized (checkpointLock) {
				AsyncWatermarkResult asyncWatermarkResult = asyncResult.asWatermark();

				LOG.debug("Output async watermark.");
				// A watermark is forwarded directly to the downstream operator
				output.emitWatermark(asyncWatermarkResult.getWatermark());

				// Remove the head entry from the queue (note the difference between peek and poll)
				streamElementQueue.poll();

				// notify the main thread that there is again space left in the async collector
				// buffer
				checkpointLock.notifyAll();
			}
		} else {
			AsyncCollectionResult<OUT> streamRecordResult = asyncResult.asResultCollection();

			if (streamRecordResult.hasTimestamp()) {
				timestampedCollector.setAbsoluteTimestamp(streamRecordResult.getTimestamp());
			} else {
				timestampedCollector.eraseTimestamp();
			}

			synchronized (checkpointLock) {
				LOG.debug("Output async stream element collection result.");

				try {
					// Extract the enriched (joined) data from the entry's completed Future
					Collection<OUT> resultCollection = streamRecordResult.get();

					// Emit the data to the downstream operator
					if (resultCollection != null) {
						for (OUT result : resultCollection) {
							timestampedCollector.collect(result);
						}
					}
				} catch (Exception e) {
					operatorActions.failOperator(
						new Exception("An async function call terminated with an exception. " +
							"Failing the AsyncWaitOperator.", e));
				}

				// Remove the head entry from the queue (note the difference between peek and poll)
				streamElementQueue.poll();

				// notify the main thread that there is again space left in the async collector
				// buffer
				checkpointLock.notifyAll();
			}
		}
	}
}

3.3 Processing-time unordered mode

Unlike ordered mode, in unordered mode each StreamRecordQueueEntry is wrapped in an additional Set layer, mainly to deal with watermarks (details in the next subsection). The processing-time unordered mode has no watermarks, but it shares the same logic with the event-time unordered mode, so it also carries this extra Set layer.

In this mode there are no watermark messages, so every record's StreamRecordQueueEntry goes into lastSet (in this mode lastSet and firstSet reference the same object). In the record's onCompleteHandler, the entry is simply taken out of lastSet and put into the completedQueue, from which the Emitter thread sends it to the downstream operator; the result is a completely unordered processing mode.

The architecture diagram of the processing-time unordered mode in 雲邪's blog post 《Flink 原理與實現:Async I/O》 was drawn for Flink 1.3 and no longer applies to Flink 1.9, where this mode no longer needs the uncompletedQueue. The updated architecture is shown below:

[Figure: processing-time unordered mode in Flink 1.9]

Also, data structures such as AsyncCollector that appear in that blog post no longer exist in Flink 1.9; this article is based on Flink 1.9.

3.3.1 Producer side

UnorderedStreamElementQueue

@Internal
public class UnorderedStreamElementQueue implements StreamElementQueue {
  	private <T> void addEntry(StreamElementQueueEntry<T> streamElementQueueEntry) {
		assert(lock.isHeldByCurrentThread());

		if (streamElementQueueEntry.isWatermark()) {
			lastSet = new HashSet<>(capacity);
			
			if (firstSet.isEmpty()) {
				firstSet.add(streamElementQueueEntry);
			} else {
				Set<StreamElementQueueEntry<?>> watermarkSet = new HashSet<>(1);
				watermarkSet.add(streamElementQueueEntry);
				uncompletedQueue.offer(watermarkSet);
			}
			uncompletedQueue.offer(lastSet);
		} else {
      // In processing-time unordered mode only this branch is taken, and lastSet and firstSet reference the same object
			lastSet.add(streamElementQueueEntry);
		}

		streamElementQueueEntry.onComplete(
			(StreamElementQueueEntry<T> value) -> {
				try {
					onCompleteHandler(value);
				} catch (InterruptedException e) {
					// The accept executor thread got interrupted. This is probably cause by
					// the shutdown of the executor.
					LOG.debug("AsyncBufferEntry could not be properly completed because the " +
						"executor thread has been interrupted.", e);
				} catch (Throwable t) {
					operatorActions.failOperator(new Exception("Could not complete the " +
						"stream element queue entry: " + value + '.', t));
				}
			},
			executor);

		numberEntries++;
	}
  
   // Callback invoked when the Future inside the entry completes
  	public void onCompleteHandler(StreamElementQueueEntry<?> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();
      
		try {
      // Move the entry into the completedQueue
      // Putting the entry into lastSet (the same object as firstSet) and then removing it again is admittedly redundant; it only happens because this code path is shared with the event-time unordered mode
			if (firstSet.remove(streamElementQueueEntry)) {
        // Add the entry to the completedQueue
				completedQueue.offer(streamElementQueueEntry);
				// In this mode, the code below is never executed
				while (firstSet.isEmpty() && firstSet != lastSet) {
					firstSet = uncompletedQueue.poll();

					Iterator<StreamElementQueueEntry<?>> it = firstSet.iterator();

					while (it.hasNext()) {
						StreamElementQueueEntry<?> bufferEntry = it.next();

						if (bufferEntry.isDone()) {
							completedQueue.offer(bufferEntry);
							it.remove();
						}
					}
				}

				LOG.debug("Signal unordered stream element queue has completed entries.");
				hasCompletedEntries.signalAll();
			}
		} finally {
			lock.unlock();
		}
	}
  
  	@Override
	public AsyncResult peekBlockingly() throws InterruptedException {
		lock.lockInterruptibly();

		try {
      // The Emitter thread takes entries from the completedQueue; unlike ordered mode, there is no need to check whether the head entry's Future has completed, because only completed entries are ever inserted into the completedQueue
			while (completedQueue.isEmpty()) {
				hasCompletedEntries.await();
			}

			LOG.debug("Peeked head element from unordered stream element queue with filling degree " +
				"({}/{}).", numberEntries, capacity);

			return completedQueue.peek();
		} finally {
			lock.unlock();
		}
	}
  
}
3.3.2 Consumer side

The Emitter thread's consumption logic is the same as in ordered mode.

3.4 Event-time unordered mode

Although records within a given stretch of the stream are unordered in this mode, the presence of watermarks means that the records between watermark1 and watermark2 must still be exactly the original batch, even if their relative order changes. In other words, the records inside a Set may be sent downstream in any order, but the sequence watermark1 → set → watermark2 must not be broken.

If the order between watermarks and record sets were broken, the window triggered by watermark2 could end up containing more or fewer records than it should, compromising the correctness of the computation.

[Figure: event-time unordered mode]

3.4.1 Producer side

AsyncWaitOperator

@Internal
public class AsyncWaitOperator<IN, OUT>
		extends AbstractUdfStreamOperator<OUT, AsyncFunction<IN, OUT>>
		implements OneInputStreamOperator<IN, OUT>, OperatorActions, BoundedOneInput {
  	@Override
	public void processWatermark(Watermark mark) throws Exception {
		WatermarkQueueEntry watermarkBufferEntry = new WatermarkQueueEntry(mark);
    // Handle a watermark
		addAsyncBufferEntry(watermarkBufferEntry);
	}
  
  	@Override
	public void processElement(StreamRecord<IN> element) throws Exception {
		final StreamRecordQueueEntry<OUT> streamRecordBufferEntry = new StreamRecordQueueEntry<>(element);
    // Handle the StreamRecordQueueEntry
		addAsyncBufferEntry(streamRecordBufferEntry);
		userFunction.asyncInvoke(element.getValue(), streamRecordBufferEntry);
	}
  
  	private <T> void addAsyncBufferEntry(StreamElementQueueEntry<T> streamElementQueueEntry) throws InterruptedException {
		assert(Thread.holdsLock(checkpointingLock));

		pendingStreamElementQueueEntry = streamElementQueueEntry;
		// Try to put the StreamRecordQueueEntry or WatermarkQueueEntry into the queue
		while (!queue.tryPut(streamElementQueueEntry)) {
			// we wait for the emitter to notify us if the queue has space left again
			checkpointingLock.wait();
		}
		pendingStreamElementQueueEntry = null;
	}
}

UnorderedStreamElementQueue

@Internal
public class UnorderedStreamElementQueue implements StreamElementQueue {
	@Override
	public <T> boolean tryPut(StreamElementQueueEntry<T> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();
		try {
			if (numberEntries < capacity) {
				addEntry(streamElementQueueEntry);

				LOG.debug("Put element into unordered stream element queue. New filling degree " +
					"({}/{}).", numberEntries, capacity);

				return true;
			} else {
				LOG.debug("Failed to put element into unordered stream element queue because it " +
					"was full ({}/{}).", numberEntries, capacity);

				return false;
			}
		} finally {
			lock.unlock();
		}
	}

	private <T> void addEntry(StreamElementQueueEntry<T> streamElementQueueEntry) {
		assert(lock.isHeldByCurrentThread());

		if (streamElementQueueEntry.isWatermark()) {
      // On a watermark, start a fresh lastSet so the next batch of record entries goes into it
      // Note: firstSet may hold either a WatermarkQueueEntry or StreamRecordQueueEntry objects,
      // but lastSet only ever holds StreamRecordQueueEntry objects
			lastSet = new HashSet<>(capacity); 

			if (firstSet.isEmpty()) {
				firstSet.add(streamElementQueueEntry);
			} else {
				Set<StreamElementQueueEntry<?>> watermarkSet = new HashSet<>(1);
				watermarkSet.add(streamElementQueueEntry);
				uncompletedQueue.offer(watermarkSet);
			}
			uncompletedQueue.offer(lastSet);
		} else {
      // Until the next watermark arrives, record entries keep being added to lastSet
			lastSet.add(streamElementQueueEntry);
		}

		streamElementQueueEntry.onComplete(
			(StreamElementQueueEntry<T> value) -> {
				try {
					onCompleteHandler(value);
				} catch (InterruptedException e) {
					// The accept executor thread got interrupted. This is probably cause by
					// the shutdown of the executor.
					LOG.debug("AsyncBufferEntry could not be properly completed because the " +
						"executor thread has been interrupted.", e);
				} catch (Throwable t) {
					operatorActions.failOperator(new Exception("Could not complete the " +
						"stream element queue entry: " + value + '.', t));
				}
			},
			executor);

		numberEntries++;
	}
		// Callback logic invoked once the Future of a WatermarkQueueEntry or StreamRecordQueueEntry completes
  	public void onCompleteHandler(StreamElementQueueEntry<?> streamElementQueueEntry) throws InterruptedException {
		lock.lockInterruptibly();

		try {
      // Always try to remove the completed entry from firstSet; this if-check enforces the order between watermarks and record sets
			if (firstSet.remove(streamElementQueueEntry)) {
        // Add the removed entry to the completedQueue
				completedQueue.offer(streamElementQueueEntry);

				while (firstSet.isEmpty() && firstSet != lastSet) {
          // Advance firstSet to the next set
					firstSet = uncompletedQueue.poll();

					Iterator<StreamElementQueueEntry<?>> it = firstSet.iterator();
					// Iterate over the entries of the new firstSet; completed ones are moved into the completedQueue and removed from firstSet
					while (it.hasNext()) {
						StreamElementQueueEntry<?> bufferEntry = it.next();
						if (bufferEntry.isDone()) {
							completedQueue.offer(bufferEntry);
							it.remove();
						}
					}
				}

				LOG.debug("Signal unordered stream element queue has completed entries.");
				hasCompletedEntries.signalAll();
			}
		} finally {
			lock.unlock();
		}
	}
}

3.4.2 Consumer side

The Emitter thread's consumption logic is the same as in ordered mode.

4. Summary

  1. Flink Async I/O uses a queue to hold records before enrichment (ordered mode) or after enrichment (processing-time unordered mode), and decouples production from consumption through this queue and the polling Emitter thread.

  2. Ordered mode inserts the not-yet-completed StreamRecordQueueEntry objects into the queue in arrival order and only ever checks whether the head entry has completed, which keeps the emission order identical to the arrival order.

  3. In processing-time unordered mode, the entry is inserted into the completedQueue inside the result callback, so every entry in that queue already holds the enriched asynchronous result.

  4. In event-time unordered mode, the uncompletedQueue stores entries before enrichment (before the async call returns) and the completedQueue stores entries after enrichment; the firstSet mechanism enforces the order between watermarks and record sets.

References:

http://wuchong.me/blog/2017/05/17/flink-internals-async-io/

https://www.cnblogs.com/ljygz/p/11864176.html

https://www.jianshu.com/p/f9bde854627b

https://blog.csdn.net/weixin_44904816/article/details/104305824
