ZIP 也能邊下載邊解壓?優酷流式解壓技術揭祕

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"導讀:"},{"type":"text","marks":[{"type":"italic"}],"text":"對於一個 ZIP 文件,由於標準的解壓方式總是從讀取文件的末尾開始的,因此必須下載完整個 ZIP 解壓後才能訪問。當用戶通過網絡訪問 ZIP 文件時,下載解壓所帶來的耗時將大大降低用戶體驗。那麼能不能邊下載邊解壓呢?阿里巴巴文娛技術 喻遠將介紹 ZIP 流式解壓的原理和技術實現路徑。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"打開網絡上的 ZIP 文件需要幾步?下載,解壓,拿到所有文件。面對一個 ZIP,能不能「邊下邊播」、「按需下載」?今年 6 月,優酷繪本技術團隊開發出新的解壓方式——ZIP 流式解壓技術,併成功應用在優酷繪本秒開項目中,30M+ 繪本平均加載時長只需 0.91s,加載耗時比傳統的解壓方式降低了 88.3%,讓用戶的閱讀體驗直線提升。實際對比效果如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3f/3fb3addcad41a2596281cdfb298d15d1.gif","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"優化前"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a8/a8fa6cf53d350982ed5e882eb3487332.gif","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"優化後"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"ty
pe":"text","text":"本文將介紹 ZIP 流式解壓的原理和技術實現路徑,希望爲大家帶來啓發,將 ZIP 流式解壓技術更多的應用到業務中。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、什麼是ZIP ?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZIP 是一種文件格式,定義瞭如何將多個文件、數據塊組織在一起形成一個完整的文件。例如我們常見的 .apk,.ipa,.sketch,都是ZIP文件。通常程序是這樣創建 ZIP 文件的:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"壓縮單個文件形成單文件數據塊;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在數據塊前後添加文件描述信息;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對每個待壓縮的文件重複以上步驟後,拼接所有數據形成更大的數據塊;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提取所有文件描述信息,生成一份「文件目錄」,附在最後一個數據塊的尾部。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們將文件前部描述信息稱爲 Local File Header,文件後部描述信息稱爲 Data Descriptor, 被壓縮的文件本身稱爲 File Data,將最後的文件目錄稱爲 Central Directory。以上所有合在一起,就是一個標準的 ZIP 
文件。如下圖:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5a/5a4007424278b19661fb4f84a0399960.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"一個標準的解壓方式總是從讀取 ZIP 文件末尾開始的,我們以解壓上圖的 File Data 1 爲例:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5f/5f14fe177cca8112f06c2ce7e28f343b.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先在 ZIP 文件末尾找到 Central Directory 數據塊;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Central Directory 數據塊中找到 File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從 File Header 1 中讀取 Local File Header 1 的偏移量和 File Data 1 
的相關信息;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據偏移量找到 Local File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讀取 Local File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解密 File Data 1(如果需要);"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解壓 File Data 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讀取 Data Descriptor 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 File Header 1 中保存的 CRC-32 做校驗步驟 7 中計算的 
CRC-32,以確保解壓後的數據完整性。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"標準解壓方式存在的不足"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以發現,標準的解壓強依賴尾部的 Central Directory。當 ZIP 文件存儲在 CDN 上時,哪怕我們只想訪問其中的一個文件,也必須下載整個 ZIP 解壓後纔可訪問。假如 ZIP 文件有 100 MB,但是我們只需要訪問其中的某一個 10 KB 的文件,那麼下載整個 ZIP 將是對流量的巨大浪費。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、優酷技術方案:ZIP 流式解壓"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的一個初步的想法是能不能邊下載邊解壓?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"要實現這點,首先需要改變解壓方式,使其不再依賴尾部的 Central Directory。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據 ZIP 文件格式標準可知,除了 Central Directory,每個 File Data 頭部的 Local File Header 部分也包含了該文件的相關信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假如 Local File Header 中包含了充分的信息,我們也許可以基於 Local File Header 
去解壓文件數據,其解壓流程就可以變爲:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/98/987674c8a55a4713310334bd55d73497.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從頭開始,搜索到 Local File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讀取 Local File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解密 File Data 1(如果需要);"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解壓 File Data 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讀取 Data Descriptor 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CRC32 
的校驗。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼 Local File Header 裏到底存儲了什麼?是否滿足解密解壓所需?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"瞭解 Local File Header"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"我們根據文檔對 Local File Header 的描述,畫出其在二進制文件中的排列:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fd/fda58f694b24ebde36e85ca9f0655fc4.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中的關鍵信息爲:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/95/95e3482c1bdb07a42924fba5d06dd1af.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"元數據簽名是一個 Magic Number,用來標記接下來數據是什麼內容。例如 Local File Header 的簽名是 0x04034b50,按文件中的字節順序(小端)表示也就是 { 'P', 'K', 0x03, 0x04 
}。當讀取到對應數據簽名時,則意味着接下來的數據結構符合對應元數據的定義,需要使用對應規則解析。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compress Method 指明數據塊用何種算法壓縮,解壓需要使用對應的算法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compressed Size 和 Uncompressed Size 可以幫助確定文件的結尾地址和 Data Descriptor 的偏移量。這兩個 Size 也是文件解密時 HMAC 計算的關鍵。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了 Magic Number 作爲元數據簽名,我們只需要逐字節遍歷去匹配這個 Number,就可以找到 Local File Header,而不再需要依賴尾部的定位信息。而且 Local File Header 中存儲的元數據足夠我們決定解壓算法、計算大小、校驗 CRC-32 了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"還有一個問題是,解壓縮算法是否支持流式解壓縮?是否有特定的上下文依賴?通過了解壓縮算法的原理[1],我們知道,ZIP 常用的壓縮算法(如 Deflate)都支持從頭部開始流式解壓。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而下載方面,文件是以從頭到尾連續的方式下載,這又天然地和從頭解壓的方式配合,便可以初步實現邊下邊解!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"加密 ZIP 文件的問題"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一切都相當順利,直到遇到了加密後的 ZIP 文件。加密後的 ZIP 文件的 Local File Header 中的關鍵信息除了簽名和文件名以外,其他信息都被隱去,需要去 Central Directory 
中讀取。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"再一次,我們回到了依賴 Central Directory 的狀態。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在失去如此多關鍵信息的情況下能否繼續做到流式解壓?我們需要先挖掘一下 ZIP 的加密方式。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ZIP 的加密方式"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZIP 文件支持多種加密方式,最常見的是 Traditional PKWARE Encryption 和 AES Encryption 。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Traditional PKWARE Encryption 是 ZIP 自定義的一種基於密碼的對稱加密方式,每個字節的加密僅和密碼有關,加密前後的數據長度不變。這種不依賴上下文的加密方式可以實現我們需要的流式解密。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AES 加密採用的是 CTR 模式。CTR 模式將明文分組,並生成一個計數器。使用密鑰對計數器進行加密生成二進制字節流。利用這個字節流和明文進行 XOR 
操作進行加密。其解密方式也是一樣的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"這種方式也支持流式解密。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/77/7710be5b8c9403ceb7ef25bdf1abbe4c.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"兩種常用的加密方式都支持流式解密,那麼加解密需要的關鍵信息,在 Local File Header 中是否有存儲就成了能否流式解密的關鍵。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"流式解密的關鍵信息"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"無論是 Traditional PKWARE Encryption 還是 AES Encryption,在解密時都需要一些除密碼之外的關鍵信息,例如鹽值,加密算法的強度等。此外,在 AES 加密的 ZIP 文件中, Local File Header 中的 Compress Method 字段被抹去,這樣我們便無法知曉壓縮算法,因此無法解壓。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"至此,問題集中爲:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Local File Header 
中是否有足夠的解密所需信息。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"加密的 ZIP 文件,是否能在除 Central Directory 以外的位置找到 Compress Method 字段。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Local File Header 中加密相關的信息"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZIP 格式的設計者在設計 ZIP 文件格式的初期就提供了文件拓展能力,一些額外的拓展數據可以存放在 Local File Header 的 Extra Field 中。ZIP AES 加密說明書[2]告訴我們 AES 的相關信息就存放在這裏。其關鍵信息如下:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b4/b43e65e6b535e95db91d8f52124a9604.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原來壓縮算法被藏到了 Extra Field 中。那麼鹽值被存放在哪裏了?答案是存放在 File Data 
的頭尾。"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/0614c7073ddaa85aea3c9dfbfedaead7.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"綜上,我們找到"},{"type":"text","text":"解密所"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"需的所有關鍵信息,整個流式解密解壓的所有技術點都被我們探索完。剩下的便是按原理實現,以及細節的打磨。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三、總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"說了那麼多,流式解壓究竟有什麼價值呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"由於流式解壓實現了邊下載邊解壓,將整個操作的時長從下載 + 解壓縮變成了約等於純下載的時長,直接抹掉了解壓的耗時。在 39.1 MB 大小的 ZIP 包下載解壓測試中,耗時從 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}},{"type":"strong"}],"text":"9.08 秒"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"降低至 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}},{"type":"strong"}],"text":"4.17 秒"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":",耗時減少約 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}},{"type":"strong"}],"text":"54% 
"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"(速度約爲原來的 2.2 倍)!同時,你可以"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}},{"type":"strong"}],"text":"不必等待整個 ZIP 下載解壓完"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":",而是在解壓完一小部分數據的時候,就直接展示 UI。用戶側看起來就好像一瞬間就解壓完了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"因此,流式解壓可以應用在許多時間敏感的操作裏,也可以用來優化基於 ZIP 文件的相關業務。例如基於 ZIP 的全局換膚加速、基於 ZIP 的 Web 資源緩存加載的加速等等。前言中的優酷繪本秒開就是基於這一技術實現。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"參考"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[1]https://houbb.github.io/2018/11/09/althgorim-compress-althgorim-12-zip-02"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[2]AES Encryption Information: Encryption Specification AE-1 and AE-2"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https://www.winzip.com/win/en/aes_info.html"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[3]ZIP File Format 
Specification"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.2.1.TXT"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[4]AES Coding Tips for Developers"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https://www.winzip.com/win/en/aes_tips.html"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}