高并发之存储篇:关注下索引原理和优化吧!躲得过实践,躲不过面试官!

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"高并发系列历史文章微信链接文档","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"垂直性能提升","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.1. ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/FEbR0JyKfYMY-8IypmXVig","title":null,"type":null},"content":[{"type":"text","text":"架构优化:集群部署,负载均衡","attrs":{}}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.2. ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/KM686VV1KJgmsRoaTNgJ6w","title":null,"type":null},"content":[{"type":"text","text":"万亿流量下负载均衡的实现","attrs":{}}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.3. ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/ZkSNv7whVlm-Lxv6EIVctg","title":null,"type":null},"content":[{"type":"text","text":"架构优化:消息中间件的妙用","attrs":{}}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.4. ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/yawrRw0_RTAb-GBe6Q8wlg","title":null,"type":null},"content":[{"type":"text","text":"存储优化:mysql的索引原理和优化","attrs":{}}],"marks":[{"type":"strong"}]}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4e/4e375e843a31b6f08434a5327e576d21.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#FF7021","name":"orange"}}],"text":"公号后台提供高并发系列文章的离线 PDF 文档整理版的下载,欢迎关注,欢迎讨论","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不管是啥业务,最终数据都要落地,数据库这一环是肯定少不了的。随着业务发展,并发越来越高,数据库很容易成为整个链路的短板。这也是大厂面试中比较常被问到的。而调优的第一步,都是从sql语句、索引入手。先得保证单个数据库执行没问题,才会有更高层次的分库分表、弹性、容灾等等。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Part1 为什么Kafka不需要我们关心索引,而Mysql却需要?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 和 MySQL 虽然最终数据都是落磁盘,但是两者在用途和数据查询方式上有着很大的差异,所以决定了数据的存储结构不同,进而决定了索引的复杂程度。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们先看下kafka的存储结构:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d4/d4d650a5f94fd02b71287a9c5d25cbf1.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于 kafka 的定位是进行稳定的高性能数据读写。所以对磁盘来说,是采用顺序读写的方式,落在了一些 .log 文件中,并以基准偏移量补0命名。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了实现高速查找 kafka 创建了稀疏索引文件(隔一段数据创建一条,而非全量),即index文件。其中维护消息的 offset 和 .log文件的物理位置。通过二分查找快速定位log文件并顺序扫描找到目标。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以,kafka的索引组织方式是相对简单、方案相对固定,但MySQL却不行。Mysql是关系型数据库,是为了支持复杂的业务数据查询而创建的,","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"查询方式、数据获取需求多种多样,要求MySQL具备更加复杂的索引机制来加速复杂业务查询场景","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Part2 MySQL数据怎么被组织","attrs":{}},{"type":"sup","content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"[1]","attrs":{}}],"attrs":{}},{"type":"sup","content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"[2]","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以InnoDB存储引擎来看mysql数据存储:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1f/1ffb04441ffbdb9c272b1c7c29a0b0c8.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"参考了三本资料,基本把最重要的部分都概括了","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"数据被分了多个逻辑层:行->页->区块->段->表空间","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们知道,InnoDB存储引擎表是Index organized的(数据即索引,索引即数据),他们都维护在一个B+树上,数据段就是叶子节点,索引段就是非叶子节点;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而我们划分的段、区块 其实都是为了利用操作系统的资源(比如每次从磁盘加载到内存的数据大小按区块来约定等等`)来达到更高效读写的目的,逻辑划分的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中页是MySQL和磁盘交互的最小单位,怎么从页找到行,怎么聚合到块、到段再到空间呢。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1 数据记录最小单位-- 行","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从上面总图中摘出一条记录的结构如下图:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0e/0e10bc3b711fe36db380742b70652d86.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们可以看到,记录头中除了行号,还有下一条记录的标识","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"next_record","attrs":{}},{"type":"text","text":",所以,我们可以通过","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"next_record","attrs":{}},{"type":"text","text":"将记录连接起来,以","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"单向链表","attrs":{}}],"attrs":{}},{"type":"text","text":"的形式,所以这就决定了,当我们在记录链中寻找某记录时,只能顺序遍历,这也决定了一条数据链不会太长。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但一个页默认是16K,加上行溢出等处理,一页最多存放","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"7992","attrs":{}}],"attrs":{}},{"type":"text","text":"行记录,这么多的记录,必须顺序遍历么?当然不需要,让我看看页是怎么组织记录行的。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2 与磁盘最小交互单位-- 页","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作为与磁盘交互的最小单位,是用来存放实际数据的(页类型是b-tree Node存真实数据,还有其他类型如索引目录页等用来加速查询)从上面的大图中可以大致看到一个页的整体结构:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f2/f25e9f305dbb002c3f0a7642645c639d.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"让我们来看几个关键的字段参数:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Page Directory 决定着记录项在页内的查询效率","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了更快速的查询,页目录存储的本页的数据目录(","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"槽","attrs":{}},{"type":"text","text":"),包含","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"最大最小记录","attrs":{}}],"attrs":{}},{"type":"text","text":"和 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"分组数据链的最大记录","attrs":{}}],"attrs":{}},{"type":"text","text":"的偏移量。方便使用","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"二分法","attrs":{}}],"attrs":{}},{"type":"text","text":"快速查找数据,不需要再从最小值开始遍历,如下图:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/cd/cd2e0fffd1df556ddb0983766d6f5289.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"图片来自《从根儿上理解 MySQL》","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"File Header决定页和页之间怎样关联","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"记录本页的一些通用信息,主要包含。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通过页号来找到本页、通过上下页进行双向链表串联、通过类型判断是索引页还是数据页。。。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/72/7232a901ee01300099d00384d8bcf954.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"图片来自《从根儿上理解 MySQL》","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此字段决定了页和页之间可以很方便的通过上述属性进行关联。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Page Header决定页的层级","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存储的本页的数据信息,主要包含**","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了页面的数据组织概念,那么,怎么利用这些结构来实现的数据快速查询呢?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Part3 索引的演进思路","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从上面的数据组织的知识里可以看到,行记录之间串联成单向链表,在每页中都按分组方式分布在此页的最小记录和最大记录之间。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"页面之间通过上一页、下一页的指针,串联成双向链表,在磁盘中进行存储,如下图:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/62/620ad96c663a44b40e24d4611b5b7982.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那么,要查询一条记录,可以怎么做?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3 原始:顺序方式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上图所示的数据串联方式,自然的提供了一种查询方式:即按主键顺序遍历每页和页中的记录行。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是,这样的查询方式,除了在页内有二分优化,再无效率可言。怎么办?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"寻求改进","attrs":{}},{"type":"text","text":":既然页内的行记录可以分组入槽,那数据页之间为什么不行呢?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4 改进:目录方式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们将页向上聚蔟,构建一个页号目录,先在目录中查找,再到对应页中查找,就比顺序查找要快很多了。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ff/ff1f4bbaf48d2aaadae0fc83532f3898.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"寻求改进","attrs":{}},{"type":"text","text":":这样的方式所需大量连续空间 + 目录会随数据变动而频繁变动,怎么办?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5 演进:主键B+树方式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其实,在叙述行记录结构的时候,我们就看到,数据行的结构中,除了实际业务数据外,还有很多额外空间。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"record_type","attrs":{}}],"attrs":{}},{"type":"text","text":"用来表示该记录的类型是数据还是索引。正是这些额外的空间的设计,给InnoDB以更加适合的方式组织索引提供了支持:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8e/8e901f9c2023f606d5968baa0dc60d36.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"图片来自《从根儿上理解 MySQL》","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这就是一棵B+树,页节点有层级区分,页中的行记录有类型区分。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"业务数据都包含在叶子节点中,目录数据都包含在其他非叶节点中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这样组织方式的优势,是允许足够少的层级容纳足够多的数据项(可以简单的假设每一页的数据项大小来预估)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而这个索引方式就是我们常说的","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"聚蔟索引","attrs":{}},{"type":"text","text":"。即使用主键值进行记录和页的排序,且叶子节点含有全部用户数据。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"寻求改进","attrs":{}},{"type":"text","text":":如果我想用其他列来查询,怎么办?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"6 扩展:二级索引、联合索引","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"二级索引","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如用户需要根据某一列(a列)的值来查询,那就再重新创建一个B+树。此索引树和聚蔟索引树的差别在于,索引节点是以a列的值为目录,且","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"叶子节点只包含a列的值和主键两个值","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果用户需要查询除c列以外的更多信息,则需要拿主键ID再去聚蔟索引查一次,也叫","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"回表","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"联合索引","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"二级索引是除主键外的单列索引,而联合索引则是多个列共同排序。假设用户需要用a 、b 两个列进行有序查询,那内在含义是,在a列值相同的情况下,再判断b的值。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同二级索引一样,InnoDB也需要再创建一棵B+树,且目录项的排序按先a,后b进行排序串联,叶子节点的数据项只包含 a 、b、主键三个值。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Part4 生产实践之触类旁通","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"7 美团定时任务索引优化","attrs":{}},{"type":"sup","content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"[3]","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系统需要定时的捞取特定时间段内特定状态、特定类型、特定操作者的任务进行定时处理。","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"select * from task \nwhere\n   status=x\n   and operator_id=xxxx \n   and operate_time>xxxxxxxx01\n   and operate_time
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章