CS144-Lab1-StreamReassembler

Original

2023-01-24 13:20

Lab handout: lab1-doc
Code: lab1-code

1. Goal

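A key property of TCP is that it delivers a byte stream in order, without errors, without duplicates, and without loss. In lab0 we implemented a byte stream, ByteStream; in lab1 we must make sure the bytes fed into ByteStream are ordered, reliable, and free of duplicates, by wrapping it in a StreamReassembler.
To determine the order of the substrings that are pushed in, each one carries an index. If the complete string is
"abcdefg" and it is split into the three substrings abc, bcd, and efg, their indices are 0, 1, and 4: the index is the position of the substring's first byte in the complete stream, counting from 0.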

StreamReassembler must also deduplicate. For example, the substrings abc and bcd overlap on bc, so they have to be merged, and the resulting string is abcd.
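
To make the index convention concrete, here is a minimal sketch that lays each substring at its index in a plain buffer. The helper name assemble_all and the '?' placeholder for missing bytes are assumptions made for illustration; they are not part of the lab code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the lab): place every substring at its
// index. Overlapping bytes land on the same positions, so duplicates vanish.
std::string assemble_all(const std::vector<std::pair<std::size_t, std::string>> &pieces) {
    std::string out;
    for (const auto &piece : pieces) {
        const std::size_t end = piece.first + piece.second.size();
        if (out.size() < end) {
            out.resize(end, '?');  // '?' marks bytes that have not arrived yet
        }
        for (std::size_t i = 0; i < piece.second.size(); ++i) {
            char &slot = out[piece.first + i];
            // Overlapping substrings of one stream must agree byte for byte.
            assert(slot == '?' || slot == piece.second[i]);
            slot = piece.second[i];
        }
    }
    return out;
}
```

Calling assemble_all with abc at 0, bcd at 1, and efg at 4 yields "abcdefg" regardless of the order in which the three pieces are listed.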

2. Implementation

StreamReassembler has only one core interface:

//! \brief Receive a substring and write any newly contiguous bytes into the stream.
//!
//! The StreamReassembler will stay within the memory limits of the `capacity`.
//! Bytes that would exceed the capacity are silently discarded.
//!
//! \param data the substring
//! \param index indicates the index (place in sequence) of the first byte in `data`
//! \param eof the last byte of `data` will be the last byte in the entire stream
void push_substring(const std::string &data, const uint64_t index, const bool eof);

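Because substrings may be pushed out of order, any string that cannot yet be written to ByteStream has to be cached.
The rough flow is:

Check whether the substring is the expected one. We maintain _assemble_idx, the index of the next byte we expect; a substring is expected only when its index <= _assemble_idx, that is, when it starts at or before the next expected byte.
If it is expected, write it straight into ByteStream, update _assemble_idx, and then flush any cached strings that have become contiguous.
If it is not expected, cache it and check whether it can be merged with the strings already cached.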
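
As a reference point for this flow, the following is a deliberately simplified toy model. It is not the lab's StreamReassembler: it ignores the capacity limit, caches single bytes rather than whole segments, and uses a std::string in place of ByteStream. The class name ToyReassembler and its accessors are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy model, NOT the lab's StreamReassembler: no capacity limit, and it
// caches single bytes instead of whole segments to keep the logic obvious.
class ToyReassembler {
  public:
    void push_substring(const std::string &data, std::size_t index, bool eof) {
        if (eof) {
            _eof_idx = index + data.size();
            _eof_seen = true;
        }
        // Cache every byte that has not been assembled yet.
        for (std::size_t i = 0; i < data.size(); ++i) {
            if (index + i >= _assemble_idx) {
                _pending[index + i] = data[i];
            }
        }
        // Drain the cache while the next expected byte is available.
        auto it = _pending.find(_assemble_idx);
        while (it != _pending.end()) {
            _output.push_back(it->second);
            _pending.erase(it);
            ++_assemble_idx;
            it = _pending.find(_assemble_idx);
        }
        if (_eof_seen && _assemble_idx == _eof_idx) {
            _ended = true;
        }
        assert(_output.size() == _assemble_idx);
    }

    const std::string &output() const { return _output; }
    bool ended() const { return _ended; }
    std::size_t unassembled_bytes() const { return _pending.size(); }

  private:
    std::map<std::size_t, char> _pending{};  // out-of-order bytes, keyed by stream index
    std::string _output{};                   // stands in for the ByteStream
    std::size_t _assemble_idx = 0;           // index of the next byte we expect
    std::size_t _eof_idx = 0;                // one past the last byte, valid once _eof_seen
    bool _eof_seen = false;
    bool _ended = false;
};
```

Pushing efg at index 4 with eof set, then abc at 0, then bcd at 1 leaves output() equal to "abcdefg" with ended() true. A byte-granular model like this is slow, but it can be handy as an oracle when comparing against a segment-based implementation.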

The code is below (it could be made much more concise); for the complete code see lab1-code:

void StreamReassembler::push_substring(const string &data, const size_t index, const bool eof) {
	if (eof) {
		_eof_idx = data.size() + index;
	}
	
	// not the expected segment yet: cache it
	if (index > _assemble_idx) {
		_merge_segment(index, data);
		return;
	}
	
	// expected segment: skip the prefix that was already written to the ByteStream
	size_t start_pos = _assemble_idx - index;
	// the whole segment is already assembled, nothing new to write
	if (start_pos > data.size()) {
		return;
	}
	size_t write_cnt = data.size() - start_pos;
	
	_assemble_idx += _output.write(data.substr(start_pos, write_cnt));
	// search the next segment
	std::vector<size_t> pop_list;
	for (const auto &segment : _segments) {
		// already processed, or an empty string
		if (segment.first + segment.second.size() <= _assemble_idx || segment.second.size() == 0) {
			pop_list.push_back(segment.first);
			continue;
		}
		// not yet
		if (_assemble_idx < segment.first) {
			continue;
		}
		start_pos = _assemble_idx - segment.first;
		write_cnt = segment.second.size() - start_pos;
		_assemble_idx += _output.write(segment.second.substr(start_pos, write_cnt));
		pop_list.push_back(segment.first);
	}
	// remove the useless segment
	for (auto segment_id : pop_list) {
		_segments.erase(segment_id);
	}
	
	if (empty() && _assemble_idx == _eof_idx) {
		_output.end_input();
	}
}

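In the normal case an expected substring can be pushed straight into ByteStream; if part of it overlaps bytes that were already assembled, we push only what comes after the overlap. In theory these are the only two cases (the "expected" precondition rules out the troublesome ones).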
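
The two cases can be captured by one small computation: how many leading bytes to skip. The sketch below is a standalone illustration; the function name unassembled_part is an assumption and does not appear in the lab code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: return the bytes of `data` (whose first byte sits at
// `index`) that lie at or beyond `assemble_idx`, i.e. the part still unwritten.
std::string unassembled_part(const std::string &data, std::size_t index, std::size_t assemble_idx) {
    assert(index <= assemble_idx);  // only meaningful for "expected" segments
    const std::size_t skip = assemble_idx - index;  // bytes already assembled
    if (skip >= data.size()) {
        return "";  // the whole segment is old data
    }
    return data.substr(skip);
}
```

With nothing assembled yet, abc at index 0 is returned whole; after abc has been written (so the next expected index is 3), bcd at index 1 contributes only "d".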

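The troublesome part is the unexpected strings: we have to cache them and merge them with the strings already in the cache. It is enough to sort out which cases can occur, and there are two main ones:
The cached string lies to the left of the target string

The cached string lies to the right of the target string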

The code:

void StreamReassembler::_merge_segment(size_t index, const std::string& data) {
	size_t data_left = index;
	size_t data_right = index + data.size();
	std::string data_copy = data;
	std::vector<size_t> remove_list;
	bool should_cache = true;
	
	for (const auto &segment : _segments)
	{
		size_t seg_left = segment.first;
		size_t seg_right = segment.first + segment.second.size();
		// case 1: the cached segment starts inside the new data, or right at its end
		if (data_left <= seg_left && data_right >= seg_left) {
			if (data_right >= seg_right) {
				remove_list.push_back(segment.first);
				continue;
			}
		
			if (data_right < seg_right) {
				data_copy = data_copy.substr(0, seg_left - data_left) + segment.second;
				data_right = data_left + data_copy.size();
				remove_list.push_back(segment.first);
			}
		}
		
		if (data_left > seg_left && data_left <= seg_right) {
			if (data_right <= seg_right) {
				should_cache = false;
			}
	
			if (data_right > seg_right) {
				data_copy = segment.second.substr(0, data_left - seg_left) + data_copy;
				data_left = seg_left;
				data_right = data_left + data_copy.size();
				remove_list.push_back(segment.first);
			}
		}
	}
	
	// remove overlap data
	for (auto remove_idx : remove_list) {
		_segments.erase(remove_idx);
	}
	
	if (should_cache) {
		_segments[data_left] = data_copy;
	}
}
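
For comparison, here is a compact standalone sketch of the same merge idea written as a free function over a std::map, assuming the cache maps a start index to its bytes. The names SegmentMap and merge_into are assumptions for illustration, and this is an alternative formulation rather than the code from lab1-code. Like the listing above, it also joins segments that merely touch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

using SegmentMap = std::map<std::size_t, std::string>;  // start index -> bytes

// Hypothetical standalone helper: insert [index, index + data.size()) into
// `segs`, absorbing every cached segment that overlaps or touches it.
void merge_into(SegmentMap &segs, std::size_t index, std::string data) {
    if (data.empty()) {
        return;
    }
    std::size_t left = index;
    std::size_t right = index + data.size();

    auto it = segs.begin();
    while (it != segs.end()) {
        const std::size_t seg_left = it->first;
        const std::size_t seg_right = seg_left + it->second.size();
        if (seg_right < left || seg_left > right) {  // disjoint and not adjacent
            ++it;
            continue;
        }
        if (seg_left < left) {  // cached segment sticks out on the left
            data = it->second.substr(0, left - seg_left) + data;
            left = seg_left;
        }
        if (seg_right > right) {  // cached segment sticks out on the right
            data += it->second.substr(it->second.size() - (seg_right - right));
            right = seg_right;
        }
        it = segs.erase(it);  // erase returns the next valid iterator
    }
    assert(data.size() == right - left);
    segs[left] = std::move(data);
}
```

Starting from cached segments bc at 1 and ef at 4, merging cde at 2 collapses the cache into a single entry bcdef at 1; merging a string that is already fully covered leaves the cache unchanged.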

3. Testing
