ZIP 也能边下载边解压?优酷流式解压技术揭秘

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"导读:"},{"type":"text","marks":[{"type":"italic"}],"text":"对于一个 ZIP 文件,由于标准的解压方式总是从读取文件的末尾开始的,因此必须下载完整个 ZIP 解压后才能访问。当用户通过网络访问 ZIP 文件时,下载解压所带来的耗时将大大降低用户体验。那么能不能边下载边解压呢?阿里巴巴文娱技术 喻远将介绍 ZIP 流式解压的原理和技术实现路径。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"打开网络上的 ZIP 文件需要几步?下载,解压,拿到所有文件。面对一个 ZIP,能不能「边下边播」、「按需下载」?今年 6 月,优酷绘本技术团队开发出新的解压方式——ZIP 流式解压技术,并成功应用在优酷绘本秒开项目中,30M+ 绘本平均加载时长只需 0.91s,加载耗时比传统的解压方式降低了 88.3%,让用户的阅读体验直线提升。实际对比效果如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3f/3fb3addcad41a2596281cdfb298d15d1.gif","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"优化前"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a8/a8fa6cf53d350982ed5e882eb3487332.gif","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"优化后"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文将介绍 ZIP 流式解压的原理和技术实现路径,希望为大家带来启发,将 ZIP 流式解压技术更多的应用到业务中。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、什么是ZIP ?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZIP 是一种文件格式,定义了如何将多个文件、数据块组织在一起形成一个完整的文件。例如我们常见的 .apk,.ipa,.sketch,都是ZIP文件。通常程序是这样创建 ZIP 文件的:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"压缩单个文件形成单文件数据块;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在数据块前后添加文件描述信息;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"对每个待压缩的文件重复以上步骤后,拼接所有数据形成更大的数据块;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提取所有文件描述信息,生成一份「文件目录」,附在最后一个数据块的尾部。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们将文件前部描述信息称为 Local File Header,文件后部描述信息称为 Data Descriptor, 被压缩的文件本身称为 File Data,将最后的文件目录称为 Central Directory。以上所有合在一起,就是一个标准的 ZIP 文件。如下图:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5a/5a4007424278b19661fb4f84a0399960.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"一个标准的解压方式总是从读取 ZIP 文件末尾开始的,我们以解压上图的 File Data 1 为例:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5f/5f14fe177cca8112f06c2ce7e28f343b.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先在 ZIP 文件末尾找到 Central Directory 数据块;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Central Directory 数据块中找到 File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从 File Header 1 中读取 Local File Header 1 的偏移量和 File Data 1 的相关信息;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根据偏移量找到 Local File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"读取 Local File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解密 File Data 1(如果需要);"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解压 File Data 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"读取 Data Descriptor 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 File Header 1 中保存的 CRC-32 做校验步骤 7 中计算的 CRC-32,以确保解压后的数据完整性。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"标准解压方式存在的不足"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以发现,标准的解压强依赖尾部的 Central Directory。当 ZIP 文件存储在 cdn 上时,哪怕我们只想访问其中的一个文件,也必须下载整个 ZIP 解压后才可访问。假如 ZIP 文件有 100 MB,但是我们只需要访问其中的某一个 10 KB 的文件,那么下载整个 ZIP 将是对流量的巨大浪费。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、优酷技术方案:ZIP流式解压"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们的一个初步的想法是能不能边下载边解压?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"要实现这点,首先需要改变解压方式,使其不能再依赖尾部的 Central Directory。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根据 ZIP 文件格式标准可知,除了 Central Directory,每个 File Data 头部的 Loca File Header 部分也包含了该文件的相关信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假如 Local File Header 中包含了充分的信息,我们也许可以基于 Local File Header 去解压文件数据,其解压流程就可以变为:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/98/987674c8a55a4713310334bd55d73497.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从头开始,搜索到 Local File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"读取 Local File Header 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解密 File Data 1(如果需要);"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解压 File Data 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"读取 Data Descriptor 1;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CRC32 的校验。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那么 Local File Header 里到底存储了什么?是否满足解密解压所需?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"了解 Local File Header"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"我们根据文档对 Local File Header 的描述,画出其二进制文件中的排列:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fd/fda58f694b24ebde36e85ca9f0655fc4.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中的关键信息为:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/95/95e3482c1bdb07a42924fba5d06dd1af.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"元数据签名是一个 Magic Number,用来标记接下来数据是什么内容。例如 Local File Header 的签名是 0x04034b50,用 char 表示也就是 { 'P', 'K', '3', '4' }。当读取到对应数据签名时,则意味着接下来的数据结构符合对应元数据的定义,需要使用对应规则解析。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compress Method 指明数据块用何种算法压缩,解压需要使用对应的算法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compressed Size 和 UnCompressed Size可以帮助确定文件的结尾地址和 Data Descriptor 的偏移量。这两个 Size 也是文件解密时 HMAC 计算的关键。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了 Magic Number 作为元数据签名,我们只需要逐字节遍历去匹配这个 Number,就可以找到 Loca File Header,而不再需要依赖尾部的定位信息。而且 Local File Header 中存储的元数据足够我们决定解压算法、计算大小、校验 CRC-32 了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"还有一个问题是,解压缩算法是否支持流式解压缩?是否有特定的上下文依赖?通过了解压缩算法的原理[1],我们知道,所有的压缩算法都是支持从头部开始流式解压的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而下载方面,文件是以从头到尾连续的方式下载,这又天然地和和从头解压的方式配合,便可以初步实现边下边解!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"加密 ZIP 文件的问题"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一切都相当顺利,直到遇到了加密后的 ZIP 文件。加密后的 ZIP 文件的 Local File Header 中的关键信息除了签名和文件名以外,其他信息都被隐去,需要去 Central Directory 中读取。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"再一次,我们回到了依赖 Central Directory 的状态。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在失去如此多关键信息的情况下能否继续做到流式解压?我们需要先挖掘一下 ZIP 的加密方式。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ZIP 的加密方式"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZIP 文件支持多种加密方式,最常见的是 Traditional PKWARE Encryption 和 AES Encryption 。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Traditional PKWARE Encryption 是 ZIP 自定义的一种基于密码的对称加密方式,每个字节的加密仅和密码有关,加密前后的数据长度不变。这种不依赖上下文的加密方式可以实现我们需要的流式解密。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AES 加密采用的是 CTR 模式。CTR 模式将明文分组,并生成一个计数器。使用密钥对计数器进行加密生成二进制字节流。利用这个字节流和明文进行 XOR 操作进行加密。其解密方式也是一样的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"这种方式也支持流式解密。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/77/7710be5b8c9403ceb7ef25bdf1abbe4c.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"两种常用的加密方式都支持流式解密,那么加解密需要的关键信息,在 Local File Header 中是否有存储就成了能否流式解密的关键。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"流式解密的关键信息"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"无论是 Traditional PKWARE Encryption 还是 AES Encryption,在解密时都需要一些除密码之外的关键信息,例如盐值,加密算法的强度等。此外,在 AES 加密的 ZIP 文件中, Local File Header 中的 Compress Method 字段被抹去,这样我们便无法知晓压缩算法,因此无法解压。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"至此,问题集中为:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Local File Header 中是否有足够的加密所需信息。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"加密的 ZIP 文件,是否能在除 Central Directory 以外的位置找到 Compress Method 字段。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Local File Header 中加密相关的信息"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ZIP 格式的设计者在设计 ZIP 文件格式的初期就提供了文件拓展能力,一些额外的拓展数据可以存放在 Local File Header 的 Extra Field 中。ZIP AES 加密说明书[2]告诉我们 AES 的相关信息就存放在这里。其关键信息如下:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b4/b43e65e6b535e95db91d8f52124a9604.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原来压缩算法被藏到了 Extra Data 中。那么盐值被存放在哪里了?答案是存放在 File Data 的头尾。"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/0614c7073ddaa85aea3c9dfbfedaead7.jpeg","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"综上,我们找到"},{"type":"text","text":"解密所"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"需的所有关键信息,整个流式解密解压的所有技术点都被我们探索完。剩下的便是按原理实现,以及细节的打磨。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三、总结"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"说了那么多,流式解压究竟有什么价值呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"由于流式解压实现了边下载边解压,将整个操作的时长从下载 + 解压缩变成了约等于纯下载的时长,直接抹掉了解压的耗时。在 39.1 MB 大小的 ZIP 包下载解压测试中,耗时从 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}},{"type":"strong"}],"text":"9.08 秒"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"降低至 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}},{"type":"strong"}],"text":"4.17 秒"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":",有将近 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}},{"type":"strong"}],"text":"100% "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"的提速!同时,你可以"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}},{"type":"strong"}],"text":"不必等待整个 ZIP 下载解压完"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":",而是在解压完一小部分数据的时候,就直接展示 UI。用户侧看起来就好像一瞬间就解压完了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"因此,流式解压可以应用在许多时间敏感的操作里,也可以用来优化基于 ZIP 文件的相关业务。例如基于 ZIP 的全局换肤加速、基于 ZIP 的 Web 资源缓存加载的加速等等。前言中的优酷绘本秒开就是基于这一技术实现。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"参考"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[1]https://houbb.github.io/2018/11/09/althgorim-compress-althgorim-12-zip-02"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[2]AES Encryption Information: Encryption Specification AE-1 and AE-2"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https://www.winzip.com/win/en/aes_info.html"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[3]ZIP File Format Specification"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.2.1.TXT"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[4]AES Coding Tips for Developers"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https://www.winzip.com/win/en/aes_tips.html"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章