Prerequisites:

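Flink state comes in three kinds. 1) Operator state is bound to an operator and its granularity is the task: even the other parallel tasks of the same operator cannot access each other's state. 2) Keyed state is bound to a keyed stream and its granularity is the key: within one task, data with different keys does not share state; only data with the same key shares state. 3) Broadcast state comes in two flavors: broadcast variables on a batch DataSet, and, for streams, the connection of two streams, one broadcast stream and one fact stream.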

When a Flink job recovers from a checkpoint or is restarted from a savepoint, its state has to be redistributed, especially when the parallelism has changed.

Operator State Redistribute

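When an operator's parallelism changes (rescale), its state is redistributed: the data in the Operator State is reassigned across the operator's task instances. There are three modes, illustrated here by changing the parallelism from 3 to 2.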

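1 Plain ListState (even-split): the elements of all the old states are pooled and divided evenly among the new tasks.

2 UnionListState: every new task receives all the elements of all the old states.

3 BroadcastState: the state is identical on every task, so each new task receives a copy of that state.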
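The three modes can be simulated in plain Java without any Flink dependency. This is only a sketch: the class and method names are made up for illustration, and the round-robin split is an assumption chosen for simplicity; Flink's own repartitioner may place individual elements on different tasks, but each task still ends up with roughly the same number of elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OperatorStateRedistributionSketch {

    // Even-split (plain ListState): pool all elements and deal them out round-robin.
    // Assumption: round-robin is used here only as one simple way to split evenly.
    static <T> List<List<T>> evenSplit(List<List<T>> oldStates, int newParallelism) {
        List<List<T>> result = new ArrayList<>();
        for (int t = 0; t < newParallelism; t++) {
            result.add(new ArrayList<>());
        }
        int i = 0;
        for (List<T> state : oldStates) {
            for (T element : state) {
                result.get(i % newParallelism).add(element);
                i++;
            }
        }
        return result;
    }

    // Union (UnionListState): every new task receives the concatenation of all old states.
    static <T> List<List<T>> union(List<List<T>> oldStates, int newParallelism) {
        List<T> all = new ArrayList<>();
        for (List<T> state : oldStates) {
            all.addAll(state);
        }
        List<List<T>> result = new ArrayList<>();
        for (int t = 0; t < newParallelism; t++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    // Broadcast: all old tasks hold the same state, so each new task copies one of them.
    // Assumes oldStates is non-empty.
    static <T> List<List<T>> broadcast(List<List<T>> oldStates, int newParallelism) {
        List<List<T>> result = new ArrayList<>();
        for (int t = 0; t < newParallelism; t++) {
            result.add(new ArrayList<>(oldStates.get(0)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three old tasks with two elements each, rescaled to parallelism 2.
        List<List<String>> old = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c", "d"), Arrays.asList("e", "f"));
        System.out.println("even-split: " + evenSplit(old, 2));
        System.out.println("union:      " + union(old, 2));
    }
}
```

With even-split each new task holds three of the six elements; with union each new task holds all six and has to decide for itself which ones it actually needs.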

How to choose the mode:

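Because this is operator state, it can only be used inside an operator, and because the state is redistributed from a checkpoint, the class must implement the CheckpointedFunction interface. Used as in the code below, the ListState is initialized in initializeState, so there is no need to initialize it in the open method.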

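import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class BufferingSink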
        implements SinkFunction<Tuple2<String, Integer>>,
                   CheckpointedFunction {

    private final int threshold;

    private transient ListState<Tuple2<String, Integer>> checkpointedState;

    private List<Tuple2<String, Integer>> bufferedElements;

    public BufferingSink(int threshold) {
        this.threshold = threshold;
        this.bufferedElements = new ArrayList<>();
    }

    @Override
    public void invoke(Tuple2<String, Integer> value) throws Exception {
        bufferedElements.add(value);
        // >= rather than ==: after a restore the buffer may already hold more than threshold elements
        if (bufferedElements.size() >= threshold) {
            for (Tuple2<String, Integer> element: bufferedElements) {
                // send it to the sink
            }
            bufferedElements.clear();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        checkpointedState.clear();
        for (Tuple2<String, Integer> element : bufferedElements) {
            checkpointedState.add(element);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        ListStateDescriptor<Tuple2<String, Integer>> descriptor =
            new ListStateDescriptor<>(
                "buffered-elements",
                TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {}));

        checkpointedState = context.getOperatorStateStore().getListState(descriptor);
        // Choose the redistribution mode here: getListState or getUnionListState

        if (context.isRestored()) {
            for (Tuple2<String, Integer> element : checkpointedState.get()) {
                bufferedElements.add(element);
            }
        }
    }
}

Keyed State Redistribute

Keyed State is easier to handle: whichever task a key is redistributed to, the keyed state for that key is redistributed to the same task.

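Keyed state redistribution is based on key groups:

    Keys are divided into groups
    Each key is assigned to exactly one group
    Groups are assigned to task instances
    The number of key groups is determined by the maximum parallelism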

Which task a piece of Keyed State finally ends up on is determined by the following three steps:

    hash = hash(key)
    KeyGroup = hash % numOfKeyGroups
    SubTask = KeyGroup * parallelism / numOfKeyGroups    (integer division)
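The three steps can be written out as a small Java sketch. The class and method names are hypothetical, and the plain hashCode() is a simplifying assumption: Flink additionally runs the key's hashCode through a murmur hash before taking the modulo, so the key group numbers printed here will not match those of a real job. The subtask formula is the one given above.

```java
public class KeyGroupAssignmentSketch {

    // Steps 1 and 2: hash the key and map the hash to a key group.
    // Assumption: uses hashCode() directly; floorMod keeps the result non-negative.
    static int keyGroupFor(Object key, int numOfKeyGroups) {
        return Math.floorMod(key.hashCode(), numOfKeyGroups);
    }

    // Step 3: map the key group to a subtask index using integer division.
    static int subTaskFor(int keyGroup, int parallelism, int numOfKeyGroups) {
        return keyGroup * parallelism / numOfKeyGroups;
    }

    public static void main(String[] args) {
        int numOfKeyGroups = 128; // equal to the maximum parallelism
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            int keyGroup = keyGroupFor(key, numOfKeyGroups);
            System.out.println(key + " -> key group " + keyGroup
                    + ", subtask " + subTaskFor(keyGroup, 3, numOfKeyGroups) + " at parallelism 3"
                    + ", subtask " + subTaskFor(keyGroup, 2, numOfKeyGroups) + " at parallelism 2");
        }
    }
}
```

A key's key group never changes when the job is rescaled; only the mapping from key group to subtask does. Because of the integer division, each subtask owns a contiguous range of key groups: with 128 key groups and parallelism 3, key groups 0 to 42 go to subtask 0, 43 to 85 to subtask 1, and 86 to 127 to subtask 2.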

Summary

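Flink has two basic kinds of state: Keyed State and Operator State (non-keyed state). Keyed State can only be used in functions and operators on a KeyedStream; each operator state is bound to one instance of a parallel operator; Operator State supports redistribution when the parallelism changes.
Both Keyed State and Operator State exist in two forms, managed and raw. Managed state is managed by the Flink runtime, which encodes it and writes it into checkpoints. Raw state is managed by the operator itself; the Flink runtime knows nothing about its data structure and treats it as raw bytes. All DataStream functions can use managed state, whereas raw state is generally only used when implementing your own operators.
A stateful function can use managed operator state through the CheckpointedFunction interface or the ListCheckpointed interface. CheckpointedFunction defines two methods, snapshotState and initializeState. snapshotState is called whenever a checkpoint is taken; initializeState is called every time the user-defined function is initialized (on first initialization or when recovering from an earlier checkpoint), so it serves not only to initialize state but also to hold the state recovery logic.
Managed operator state currently supports only a list-style form: the state must be a List of serializable objects, so that it can be redistributed on rescale. There are currently two redistribution schemes: even-split redistribution (on restore/redistribution each operator gets only a sublist of the whole state) and union redistribution (on restore/redistribution each operator gets the complete list of the whole state).
FunctionSnapshotContext extends the ManagedSnapshotContext interface, which defines getCheckpointId and getCheckpointTimestamp. FunctionInitializationContext extends the ManagedInitializationContext interface, which defines isRestored, getOperatorStateStore and getKeyedStateStore; these can be used to tell whether the function is being restored from a snapshot of a previous execution, and to obtain the OperatorStateStore and KeyedStateStore objects.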
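The call order of initializeState and snapshotState described in the summary can be illustrated with a plain Java simulation. MiniBufferingSink and its methods are hypothetical stand-ins, not Flink classes: a plain List plays the role of the checkpointed ListState, and passing null plays the role of isRestored() returning false.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointLifecycleSketch {

    /** Hypothetical stand-in for a function that keeps managed operator list state. */
    static class MiniBufferingSink {
        final List<String> buffer = new ArrayList<>(); // in-memory working copy

        // Mirrors initializeState: called on first start (restored == null) and on recovery.
        void initializeState(List<String> restored) {
            if (restored != null) {
                buffer.addAll(restored);
            }
        }

        // Mirrors invoke: buffer the incoming element.
        void invoke(String value) {
            buffer.add(value);
        }

        // Mirrors snapshotState: copy the working buffer into the checkpoint.
        List<String> snapshotState() {
            return new ArrayList<>(buffer);
        }
    }

    // Runs one instance, takes a checkpoint, then recovers a new instance from it.
    static List<String> runWithFailure() {
        MiniBufferingSink first = new MiniBufferingSink();
        first.initializeState(null);          // first initialization, nothing to restore
        first.invoke("a");
        first.invoke("b");
        List<String> checkpoint = first.snapshotState();
        first.invoke("c");                    // arrives after the checkpoint, not part of it

        MiniBufferingSink recovered = new MiniBufferingSink();
        recovered.initializeState(checkpoint); // recovery path
        return recovered.buffer;
    }

    public static void main(String[] args) {
        System.out.println(runWithFailure()); // [a, b]
    }
}
```

The recovered instance sees only what was in the buffer when the snapshot was taken. In a real job the element that arrived after the checkpoint would be replayed by the source rather than lost, which this sketch does not model.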

