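Ceph Advanced Series (3): Ceph Cache Tier (Cache Pool) Configuration, Principles, and Source Code Analysis

I cloned the Ceph project from GitHub; this analysis is based on the code of ceph version 12.2.11 (luminous).

1. What is a Cache Tier (Cache Pool)?

When creating pools in Ceph, one pool can be set as the cache layer of another pool. The pool acting as the cache layer is called the cache pool (that is, the cache tier), and the pool that actually stores the data is the ordinary data pool (called the base pool in the code). Use the following command to create a cache tier: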

ceph osd tier add {data_pool} {cache_pool}

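The command-line program sends the request to the Monitor, which sets the corresponding attributes on the pools involved and persists the pool information. Note: one data pool can have multiple cache tiers (cache pools).

For how to create a Ceph pool on specific OSDs, see Ceph Advanced Series (2): How to create a pool on specific OSD devices.

Cache tier related commands (these are monitor commands):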

Command	Description
ceph osd tier add <data_pool> <cache_pool> {--force-nonempty}	add the tier <tierpool> (the second one) to base pool <pool> (the first one)
ceph osd tier add-cache <data_pool> <cache_pool> <int[0-]>	add a cache <tierpool> (the second one) of size <size> to existing pool <pool> (the first one)
ceph osd tier cache-mode <cache_pool> none|writeback|forward|readonly|readforward|proxy|readproxy {--yes-i-really-mean-it}	specify the caching mode for cache tier <pool>
ceph osd tier remove <data_pool> <cache_pool>	remove the tier <tierpool> (the second one) from base pool <pool> (the first one)
ceph osd tier remove-overlay <data_pool>	remove the overlay pool for base pool <pool>
ceph osd tier rm <data_pool> <cache_pool>	remove the tier <tierpool> (the second one) from base pool <pool> (the first one)
ceph osd tier rm-overlay <data_pool>	remove the overlay pool for base pool <pool>
ceph osd tier set-overlay <data_pool> <overlaypool>	set the overlay pool for base pool <pool> to be <overlaypool>

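2. Why have a Cache Tier?

The goal of Cache Tier is to place frequently accessed hot data on high-performance, small-capacity media (such as NVMe SSDs) and to place the large volume of cold data on high-capacity media (such as HDDs). The value to users is better access performance for hot data together with lower storage cost. It achieves this in two ways: cold data can migrate freely and safely down to the lower storage layer (the data pool), which saves storage cost, and hot data is automatically migrated from the lower layer (the data pool) up to the higher storage layer (the cache tier), which improves access performance for that data.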

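3. How Cache Tier is implemented

Tracking, counting, and analyzing data access: the access frequency of each data block is continuously tracked and counted, and periodic analysis identifies the frequently accessed "hot" blocks and the rarely accessed "cold" blocks.
Data migration: based on access frequency, migration runs periodically, moving hot blocks to the fast storage layer and less active cold blocks to the slow storage layer. Data is migrated in units of objects (4 MB by default).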
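
The tracking and analysis steps above can be sketched as a small counter table. This is only a toy illustration under stated assumptions: AccessTracker, record_access, is_hot, collect_hot_and_reset, and the fixed hot threshold are hypothetical names and choices made for this example, and they are not Ceph's HitSet implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy access tracker: counts accesses per object within one analysis period.
// Hypothetical illustration only; this is NOT how Ceph stores hit information.
class AccessTracker {
public:
  explicit AccessTracker(uint32_t hot_threshold)
    : hot_threshold_(hot_threshold) {
    assert(hot_threshold > 0);  // a threshold of 0 would make every object hot
  }

  // Record one read or write of the given object.
  void record_access(const std::string& oid) { ++counts_[oid]; }

  // An object counts as "hot" once its access count reaches the threshold.
  bool is_hot(const std::string& oid) const {
    auto it = counts_.find(oid);
    return it != counts_.end() && it->second >= hot_threshold_;
  }

  // Periodic analysis: list the objects that should move to the fast tier,
  // then start a new period with fresh counters.
  std::vector<std::string> collect_hot_and_reset() {
    std::vector<std::string> hot;
    for (const auto& kv : counts_) {
      if (kv.second >= hot_threshold_) hot.push_back(kv.first);
    }
    counts_.clear();
    return hot;
  }

private:
  uint32_t hot_threshold_;
  std::unordered_map<std::string, uint32_t> counts_;
};
```

A caller would invoke record_access on every I/O and call collect_hot_and_reset once per analysis period; objects that are not returned are the candidates to stay in, or move down to, the slow tier.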

4. Where Cache Tier sits in the Ceph architecture

(Figure: read proxy mode)

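5. Analysis of the key Cache Tier code

Cache Tier code is spread across several modules of the Ceph source tree; its core lies on the object read/write path.

1. Related data structures:

The pool data structure pg_pool_t has two member variables that record the cache pool and base pool configuration: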

//src/osd/osd_types.h

/*
 * pg_pool
 */
struct pg_pool_t {
...
  set<uint64_t> tiers;      ///< pools that are tiers of us
  int64_t tier_of;          ///< pool for which we are a tier
...
};

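These two fields describe the pool's tiering attributes:
·If the current pool is a cache pool, tier_of records the base pool that this cache pool fronts.
·If the current pool is a base pool, tiers records the cache pools layered on it; one base pool can have multiple cache pools.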
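
The relationship between the two fields can be illustrated with a simplified stand-in. In this sketch PoolInfo and add_tier are hypothetical names, not Ceph types, and using -1 as the "not a tier" value of tier_of is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Simplified stand-in for the two pg_pool_t fields discussed above.
// PoolInfo is a hypothetical type for illustration, not the real pg_pool_t.
struct PoolInfo {
  std::set<uint64_t> tiers;  // cache pools layered on top of this pool
  int64_t tier_of = -1;      // base pool this pool caches; -1 (assumed) means none

  bool is_tier() const { return tier_of >= 0; }
  bool has_tiers() const { return !tiers.empty(); }
};

// Hypothetical helper showing the effect that
// "ceph osd tier add {data_pool} {cache_pool}" has on both pools.
void add_tier(std::map<int64_t, PoolInfo>& pools,
              int64_t base_id, int64_t cache_id) {
  assert(base_id >= 0 && cache_id >= 0 && base_id != cache_id);
  pools[base_id].tiers.insert(static_cast<uint64_t>(cache_id));  // base side
  pools[cache_id].tier_of = base_id;                             // cache side
}
```

Calling add_tier twice with the same base pool and two different cache pools leaves the base pool with two entries in tiers, while each cache pool points back at the base pool through tier_of.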

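2. Cache Tier initialization

Cache Tier initialization has two entry points:
·on_activate: if the pool is already configured as a cache pool, initialization happens once the PGs of that cache pool become active.
·on_pool_change: if the pool is set as a cache pool only after all of its PGs are already active, the OSD waits for the Monitor to announce the osd map change and initializes in on_pool_change.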
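
The reason for having two entry points can be shown with a toy model: whichever event arrives last (the PG becoming active, or the pool becoming a cache pool) triggers the setup, and the setup runs only once. ToyPG, cache_setup, and all of its members are hypothetical names for this sketch; the real PrimaryLogPG code does considerably more.

```cpp
#include <cassert>

// Toy model of a PG with two initialization entry points.
// All names are hypothetical and chosen only for this illustration.
struct ToyPG {
  bool active = false;         // PG has become active
  bool pool_is_cache = false;  // pool is configured as a cache tier
  bool cache_ready = false;    // cache tier state has been initialized
  int setup_calls = 0;         // how many times setup actually ran

  // Shared, idempotent setup used by both entry points.
  void cache_setup() {
    if (!pool_is_cache) {      // tier removed (or never set): nothing to keep
      cache_ready = false;
      return;
    }
    if (!active || cache_ready) return;  // not ready yet, or already done
    cache_ready = true;
    ++setup_calls;
  }

  // Entry point 1: the pool was already a cache pool when the PG activated.
  void on_activate() {
    active = true;
    cache_setup();
  }

  // Entry point 2: the pool became a cache pool after the PG was active.
  void on_pool_change(bool is_cache_now) {
    pool_is_cache = is_cache_now;
    cache_setup();
  }
};

// Usage: both orderings end with exactly one initialization.
inline void demo_both_orders() {
  ToyPG a;
  a.on_pool_change(true);  // configured first ...
  a.on_activate();         // ... initialized on activation
  assert(a.cache_ready && a.setup_calls == 1);

  ToyPG b;
  b.on_activate();         // active first ...
  b.on_pool_change(true);  // ... initialized on the pool change
  assert(b.cache_ready && b.setup_calls == 1);
}
```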

3. Cache Tier handling on the read/write path

On the OSD's normal read/write path, the handling logic changes when the pool has a Cache Tier configured, as shown below:

// src/osd/PrimaryLogPG.h
// hot/cold tracking
  HitSetRef hit_set;        ///< currently accumulating HitSet

// src/osd/PrimaryLogPG.cc
/** do_op - do an op
 * pg lock will be held (if multithreaded)
 * osd_lock NOT held.
 */
void PrimaryLogPG::do_op(OpRequestRef& op)
{
......
  // Check whether the object was already recorded in the current HitSet,
  // then record this access.
  bool in_hit_set = false;
  if (hit_set) {
    if (obc.get()) {
      if (obc->obs.oi.soid != hobject_t() && hit_set->contains(obc->obs.oi.soid))
        in_hit_set = true;
    } else {
      if (missing_oid != hobject_t() && hit_set->contains(missing_oid))
        in_hit_set = true;
    }
    if (!op->hitset_inserted) {
      hit_set->insert(oid);
      op->hitset_inserted = true;
      // Persist the HitSet when it is full or its period has elapsed.
      if (hit_set->is_full() ||
          hit_set_start_stamp + pool.info.hit_set_period <= m->get_recv_stamp()) {
        hit_set_persist();
      }
    }
  }

  // Let the tiering agent re-evaluate its mode; stop here if it took over the op.
  if (agent_state) {
    if (agent_choose_mode(false, op))
      return;
  }

  if (obc.get() && obc->obs.exists && obc->obs.oi.has_manifest()) {
    if (maybe_handle_manifest(op,
                              write_ordered,
                              obc))
      return;
  }

  // Cache tier handling: stop here if the cache logic took over the op.
  if (maybe_handle_cache(op,
                         write_ordered,
                         obc,
                         r,
                         missing_oid,
                         false,
                         in_hit_set))
    return;
......
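
The hit-set bookkeeping in the excerpt above (check membership, insert, then persist when the set is full or its period has elapsed) can be modeled in isolation. The sketch below is a simplified illustration under stated assumptions: ToyHitSetTracker and its members are hypothetical, a plain std::unordered_set stands in for Ceph's HitSet, and timestamps are plain seconds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for the hit-set bookkeeping seen in do_op.
// Hypothetical names; not the real HitSet or PrimaryLogPG code.
class ToyHitSetTracker {
public:
  ToyHitSetTracker(std::size_t capacity, uint64_t period_sec, uint64_t now_sec)
    : capacity_(capacity), period_sec_(period_sec), start_stamp_(now_sec) {
    assert(capacity > 0 && period_sec > 0);
  }

  // Records one access. Returns whether the object was already in the
  // current set before this access (the analogue of in_hit_set).
  bool record(const std::string& oid, uint64_t recv_stamp_sec) {
    bool in_hit_set = current_.count(oid) > 0;
    current_.insert(oid);
    // Same shape as the condition in the excerpt: full, or period elapsed.
    if (current_.size() >= capacity_ ||
        start_stamp_ + period_sec_ <= recv_stamp_sec) {
      persist(recv_stamp_sec);
    }
    return in_hit_set;
  }

  std::size_t archived_sets() const { return archive_.size(); }
  std::size_t current_size() const { return current_.size(); }

private:
  // Analogue of hit_set_persist(): archive the current set, start a new one.
  void persist(uint64_t now_sec) {
    archive_.push_back(std::move(current_));
    current_.clear();  // moved-from set is valid; make it explicitly empty
    start_stamp_ = now_sec;
  }

  std::size_t capacity_;
  uint64_t period_sec_;
  uint64_t start_stamp_;
  std::unordered_set<std::string> current_;
  std::vector<std::unordered_set<std::string>> archive_;
};
```

The value returned by record plays the role of in_hit_set, which the excerpt passes on to maybe_handle_cache as one of the inputs to the cache decision.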

Reference: 《Ceph源码分析》 (Ceph Source Code Analysis)
