Preface
Long time no see (bows)
I've been in a transition period lately, swamped every day, so I naturally have less energy for concrete technical details (and I've already broken the weekly-update promise from the last post = =). Last week I helped someone quickly resolve a performance problem caused by misusing Flink state types. Here are some quick notes on it, along with a brief introduction to the basics of Flink state serialization.
The Problem and Its Diagnosis
A colleague from an upstream team reported that a multi-stream-join DataStream API job with fairly simple computation logic kept falling behind on consumption and failing checkpoints (the screenshots from the incident have been lost). The job topology is shown in the figure below.
Tuning the cluster parameters following the usual pattern for large-state jobs did not help.
Via the Flink Web UI we located the problem at the second-to-last operator in the topology: some of its sub-tasks could never complete their checkpoints. The Metrics panel showed a small amount of data skew, while the backpressure metrics both upstream and downstream were all 0.
After sustained observation, the skewed sub-tasks were handling at most 10%–15% more data than the others, which ordinarily should not cause such a severe performance problem. So we went to the corresponding TaskManager pod and captured a flame graph, shown below.
Clearly, RocksDB state reads and writes were taking extremely long, with most of the time spent in Kryo serialization, which means the state was storing objects that Flink's serialization framework does not support natively. We asked the responsible developer to show us the code, and the truth came out:
private transient MapState<String, HashSet<String>> state1;
private transient MapState<String, HashSet<String>> state2;
private transient ValueState<Map<String, String>> state3;
Flink's serialization framework has no serializer for HashSet, so it falls back to Kryo. Even though these sets are not particularly large, the cost of state operations rises sharply. The ValueState&lt;Map&lt;String, String&gt;&gt; usage is also wrong; it should be MapState&lt;String, String&gt;.
The quickest stopgap is simple: replace every HashSet used inside state with a Map&lt;String, Boolean&gt;, which deduplicates just as well. Not elegant, but with native MapSerializer support the efficiency improves dramatically. The rest of this post is a brief introduction to Flink's state serialization.
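Under this fix, the declarations above become MapState&lt;String, Map&lt;String, Boolean&gt;&gt; and MapState&lt;String, String&gt;. In plain JDK terms the trick works because a map's key set already has set semantics (java.util.HashSet is itself implemented on top of a HashMap). A standalone toy sketch, not the job's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class MapAsSetDemo {

    // Deduplicate ids by storing them as map keys with a dummy Boolean value,
    // the same substitution applied to the Flink states above.
    static Map<String, Boolean> dedup(String[] ids) {
        Map<String, Boolean> seen = new HashMap<>();
        for (String id : ids) {
            seen.put(id, Boolean.TRUE); // re-inserting an existing key leaves the key set unchanged
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, Boolean> seen = dedup(new String[] {"a", "b", "a", "c", "b"});
        System.out.println(seen.size());           // 3 distinct ids
        System.out.println(seen.containsKey("a")); // true
    }
}
```

In Flink, such a map is handled by the native MapSerializer instead of falling back to Kryo, which is where the speedup comes from.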
TypeSerializer
When we create the StateDescriptor needed for a state handle, we specify the type of the state's data, for example:
ValueStateDescriptor<Integer> stateDesc = new ValueStateDescriptor<>("myState", Integer.class);
ValueState<Integer> state = this.getRuntimeContext().getState(stateDesc);
At the same time, this determines the Serializer for that data type. As we know, TypeSerializer is the low-level abstraction of the Flink Runtime serialization mechanism, and state data serialization is no exception. Take MapSerializer, which handles Map types, as an example; the code below is fairly self-explanatory.
@Internal
public final class MapSerializer<K, V> extends TypeSerializer<Map<K, V>> {
private static final long serialVersionUID = -6885593032367050078L;
/** The serializer for the keys in the map */
private final TypeSerializer<K> keySerializer;
/** The serializer for the values in the map */
private final TypeSerializer<V> valueSerializer;
/**
* Creates a map serializer that uses the given serializers to serialize the key-value pairs in
* the map.
*
* @param keySerializer The serializer for the keys in the map
* @param valueSerializer The serializer for the values in the map
*/
public MapSerializer(TypeSerializer<K> keySerializer, TypeSerializer<V> valueSerializer) {
this.keySerializer =
Preconditions.checkNotNull(keySerializer, "The key serializer cannot be null");
this.valueSerializer =
Preconditions.checkNotNull(valueSerializer, "The value serializer cannot be null.");
}
// ------------------------------------------------------------------------
// MapSerializer specific properties
// ------------------------------------------------------------------------
public TypeSerializer<K> getKeySerializer() {
return keySerializer;
}
public TypeSerializer<V> getValueSerializer() {
return valueSerializer;
}
// ------------------------------------------------------------------------
// Type Serializer implementation
// ------------------------------------------------------------------------
@Override
public boolean isImmutableType() {
return false;
}
@Override
public TypeSerializer<Map<K, V>> duplicate() {
TypeSerializer<K> duplicateKeySerializer = keySerializer.duplicate();
TypeSerializer<V> duplicateValueSerializer = valueSerializer.duplicate();
return (duplicateKeySerializer == keySerializer)
&& (duplicateValueSerializer == valueSerializer)
? this
: new MapSerializer<>(duplicateKeySerializer, duplicateValueSerializer);
}
@Override
public Map<K, V> createInstance() {
return new HashMap<>();
}
@Override
public Map<K, V> copy(Map<K, V> from) {
Map<K, V> newMap = new HashMap<>(from.size());
for (Map.Entry<K, V> entry : from.entrySet()) {
K newKey = keySerializer.copy(entry.getKey());
V newValue = entry.getValue() == null ? null : valueSerializer.copy(entry.getValue());
newMap.put(newKey, newValue);
}
return newMap;
}
@Override
public Map<K, V> copy(Map<K, V> from, Map<K, V> reuse) {
return copy(from);
}
@Override
public int getLength() {
return -1; // var length
}
@Override
public void serialize(Map<K, V> map, DataOutputView target) throws IOException {
final int size = map.size();
target.writeInt(size);
for (Map.Entry<K, V> entry : map.entrySet()) {
keySerializer.serialize(entry.getKey(), target);
if (entry.getValue() == null) {
target.writeBoolean(true);
} else {
target.writeBoolean(false);
valueSerializer.serialize(entry.getValue(), target);
}
}
}
@Override
public Map<K, V> deserialize(DataInputView source) throws IOException {
final int size = source.readInt();
final Map<K, V> map = new HashMap<>(size);
for (int i = 0; i < size; ++i) {
K key = keySerializer.deserialize(source);
boolean isNull = source.readBoolean();
V value = isNull ? null : valueSerializer.deserialize(source);
map.put(key, value);
}
return map;
}
@Override
public Map<K, V> deserialize(Map<K, V> reuse, DataInputView source) throws IOException {
return deserialize(source);
}
@Override
public void copy(DataInputView source, DataOutputView target) throws IOException {
final int size = source.readInt();
target.writeInt(size);
for (int i = 0; i < size; ++i) {
keySerializer.copy(source, target);
boolean isNull = source.readBoolean();
target.writeBoolean(isNull);
if (!isNull) {
valueSerializer.copy(source, target);
}
}
}
@Override
public boolean equals(Object obj) {
return obj == this
|| (obj != null
&& obj.getClass() == getClass()
&& keySerializer.equals(((MapSerializer<?, ?>) obj).getKeySerializer())
&& valueSerializer.equals(
((MapSerializer<?, ?>) obj).getValueSerializer()));
}
@Override
public int hashCode() {
return keySerializer.hashCode() * 31 + valueSerializer.hashCode();
}
// --------------------------------------------------------------------------------------------
// Serializer configuration snapshotting
// --------------------------------------------------------------------------------------------
@Override
public TypeSerializerSnapshot<Map<K, V>> snapshotConfiguration() {
return new MapSerializerSnapshot<>(this);
}
}
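MapSerializer's wire format is easy to reproduce with plain JDK streams: a 4-byte size header, then for each entry the key, a one-byte null flag, and, if non-null, the value. The following sketch mimics that layout with String keys and Integer values standing in for the nested serializers; it is a toy for illustration, not Flink's implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class MapWireFormatDemo {

    // Mirrors MapSerializer.serialize: a size header, then per entry
    // the key, a null flag, and (when non-null) the value.
    static byte[] serialize(Map<String, Integer> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(map.size());
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                out.writeUTF(e.getKey());       // stands in for the nested key serializer
                if (e.getValue() == null) {
                    out.writeBoolean(true);     // null flag, as in MapSerializer
                } else {
                    out.writeBoolean(false);
                    out.writeInt(e.getValue()); // stands in for the nested value serializer
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // cannot happen on in-memory streams
        }
    }

    // Mirrors MapSerializer.deserialize.
    static Map<String, Integer> deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int size = in.readInt();
            Map<String, Integer> map = new HashMap<>();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                boolean isNull = in.readBoolean();
                map.put(key, isNull ? null : in.readInt());
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> original = new HashMap<>();
        original.put("clicks", 42);
        original.put("missing", null);
        System.out.println(deserialize(serialize(original)).equals(original)); // true
    }
}
```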
To summarize:

- Serialization and deserialization are, at bottom, operations on MemorySegments: binary data is written out through a DataOutputView and read back in through a DataInputView.
- For composite data types, the TypeSerializers of the nested element types are likewise defined and invoked recursively.
- There must be a matching TypeSerializerSnapshot. It defines how the TypeSerializer itself and the metadata it carries (i.e., the state schema) are serialized, and this information is stored in snapshots. The TypeSerializerSnapshot is thus what allows Flink to judge data compatibility when restoring state, and it is the key to the state schema evolution feature.
TypeSerializerSnapshot
The TypeSerializerSnapshot interface has a handful of important methods. The Javadoc is clear enough, so I won't belabor it (really because I'm lazy and tired = =).
/**
* Returns the version of the current snapshot's written binary format.
*
* @return the version of the current snapshot's written binary format.
*/
int getCurrentVersion();
/**
* Writes the serializer snapshot to the provided {@link DataOutputView}. The current version of
* the written serializer snapshot's binary format is specified by the {@link
* #getCurrentVersion()} method.
*
* @param out the {@link DataOutputView} to write the snapshot to.
* @throws IOException Thrown if the snapshot data could not be written.
* @see #writeVersionedSnapshot(DataOutputView, TypeSerializerSnapshot)
*/
void writeSnapshot(DataOutputView out) throws IOException;
/**
* Reads the serializer snapshot from the provided {@link DataInputView}. The version of the
* binary format that the serializer snapshot was written with is provided. This version can be
* used to determine how the serializer snapshot should be read.
*
* @param readVersion version of the serializer snapshot's written binary format
* @param in the {@link DataInputView} to read the snapshot from.
* @param userCodeClassLoader the user code classloader
* @throws IOException Thrown if the snapshot data could not be read or parsed.
* @see #readVersionedSnapshot(DataInputView, ClassLoader)
*/
void readSnapshot(int readVersion, DataInputView in, ClassLoader userCodeClassLoader)
throws IOException;
/**
* Recreates a serializer instance from this snapshot. The returned serializer can be safely
* used to read data written by the prior serializer (i.e., the serializer that created this
* snapshot).
*
* @return a serializer instance restored from this serializer snapshot.
*/
TypeSerializer<T> restoreSerializer();
/**
* Checks a new serializer's compatibility to read data written by the prior serializer.
*
* <p>When a checkpoint/savepoint is restored, this method checks whether the serialization
* format of the data in the checkpoint/savepoint is compatible for the format of the serializer
* used by the program that restores the checkpoint/savepoint. The outcome can be that the
* serialization format is compatible, that the program's serializer needs to reconfigure itself
* (meaning to incorporate some information from the TypeSerializerSnapshot to be compatible),
* that the format is outright incompatible, or that a migration is needed. In the latter case, the
* TypeSerializerSnapshot produces a serializer to deserialize the data, and the restoring
* program's serializer re-serializes the data, thus converting the format during the restore
* operation.
*
* @param newSerializer the new serializer to check.
* @return the serializer compatibility result.
*/
TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility(
TypeSerializer<T> newSerializer);
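To make the version handshake among getCurrentVersion(), writeSnapshot(), and readSnapshot() concrete, here is a JDK-only toy snapshot; the format and the nullable flag are invented for illustration and are not Flink's actual classes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SnapshotVersionDemo {

    // Hypothetical snapshot format: version 2 added a "nullable" flag
    // that version 1 snapshots did not carry.
    static final int CURRENT_VERSION = 2;

    // The framework first writes getCurrentVersion(), then the
    // writeSnapshot() body (here: an element type name plus the flag).
    static byte[] writeVersionedSnapshot(String elementType, boolean nullable) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(CURRENT_VERSION);
            out.writeUTF(elementType);
            out.writeBoolean(nullable);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen on in-memory streams
        }
    }

    // readSnapshot() receives the stored version and branches on it,
    // which is how old snapshots stay readable after the format evolves.
    static String readVersionedSnapshot(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int readVersion = in.readInt();
            String elementType = in.readUTF();
            boolean nullable = readVersion >= 2 && in.readBoolean();
            return elementType + "/nullable=" + nullable;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readVersionedSnapshot(writeVersionedSnapshot("String", true)));
        // String/nullable=true
    }
}
```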
Note in particular that when state is restored, the schema-compatibility check can yield four TypeSerializerSchemaCompatibility results:

- COMPATIBLE_AS_IS: compatible; the new Serializer can be used directly.
- COMPATIBLE_AFTER_MIGRATION: compatible, but the data must first be deserialized with the old Serializer from the snapshot and then re-serialized with the new one. The most common case is adding or removing a field in a state POJO; see the relevant code in the PojoSerializerSnapshot class for details.
- COMPATIBLE_WITH_RECONFIGURED_SERIALIZER: compatible, but the new Serializer must be reconfigured before use. This case is less common; one example is a change in a state POJO's class inheritance hierarchy.
- INCOMPATIBLE: not compatible; the state cannot be restored. For example, changing a simple-typed field in a POJO to another type (e.g. String → Integer): SimpleTypeSerializerSnapshot, which handles simple data types, does not support such changes, so the restore fails with an exception:
@Override
public TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility(
TypeSerializer<T> newSerializer) {
return newSerializer.getClass() == serializerSupplier.get().getClass()
? TypeSerializerSchemaCompatibility.compatibleAsIs()
: TypeSerializerSchemaCompatibility.incompatible();
}
Obviously, for composite types (such as List and Map), the compatibility of the outer container's Serializer must be checked first, followed by that of the nested Serializers. For details, see the CompositeTypeSerializerSnapshot abstract class that Flink defines for exactly this purpose; it is rather involved, so I will not go into it here.
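Conceptually, though, the composite resolution boils down to combining verdicts: a composite type is only as compatible as its least compatible part. A toy sketch under that assumption (omitting the reconfigured-serializer outcome for brevity; this is not CompositeTypeSerializerSnapshot's actual logic):

```java
public class CompositeCompatibilityDemo {

    // Ordered from best to worst, mirroring three of
    // TypeSerializerSchemaCompatibility's possible outcomes.
    enum Compatibility { COMPATIBLE_AS_IS, COMPATIBLE_AFTER_MIGRATION, INCOMPATIBLE }

    // Combine the outer container's verdict with each nested
    // serializer's verdict by keeping the worst one.
    static Compatibility resolve(Compatibility outer, Compatibility... nested) {
        Compatibility worst = outer;
        for (Compatibility c : nested) {
            if (c.ordinal() > worst.ordinal()) {
                worst = c;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // e.g. Map<String, Pojo> where the POJO value gained a field: the map and the
        // key are fine, but the value serializer needs migration, so the whole state does.
        System.out.println(resolve(
                Compatibility.COMPATIBLE_AS_IS,
                Compatibility.COMPATIBLE_AS_IS,
                Compatibility.COMPATIBLE_AFTER_MIGRATION));
        // COMPATIBLE_AFTER_MIGRATION
    }
}
```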
The End
In some special scenarios we need custom Serializers for better state serialization (for example, using RoaringBitmap instead of a Set for efficient deduplication in state). It's already quite late today, so I won't give a concrete implementation for now. For more details on custom state serializers, see the chapter "Custom Serialization for Managed State" in the official documentation.
Good night, good night.