1. Cache analysis of nfs-ganesha 2.3.3 and 2.4.5

https://github.com/zanglinjie/nfs-ganesha

Since 2.4.0, mdcache has lived in the FSAL layer; the corresponding directory is src\FSAL\Stackable_FSALs\FSAL_MDCACHE.

The cache configuration block in ganesha 2.4:

struct config_block mdcache_param_blk = {
	.dbus_interface_name = "org.ganesha.nfsd.config.cache_inode",
	.blk_desc.name = "CacheInode",
	.blk_desc.type = CONFIG_BLOCK,
	.blk_desc.u.blk.init = mdcache_param_init,
	.blk_desc.u.blk.params = mdcache_params,
	.blk_desc.u.blk.commit = noop_conf_commit
};

Ganesha architecture diagram (figure): the modular design is harder to implement but easier to maintain.

2. cache_inode_lookup and mdcache_lookup

2.1 cache_inode_lookup

2.2 mdcache_lookup

Function definition:

fsal_status_t mdc_lookup(mdcache_entry_t *mdc_parent, const char *name,
			 bool uncached, mdcache_entry_t **new_entry,
			 struct attrlist *attrs_out)

Flowchart: (figure)

mdcache_entry_t is defined as follows:

typedef struct mdcache_fsal_obj_handle mdcache_entry_t;

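An mdcache_entry_t is the cached instance of one object (file, directory, and so on); mdc_parent is the cached instance of the parent directory.

For a directory, mdcache_entry->fsobj.fsdir.parent stores the parent directory's host handle. When the lookup is for the parent directory (".."), mdc_lookup simply calls mdcache_locate_host to convert mdcache_entry->fsobj.fsdir.parent into the parent's mdcache_entry_t and returns it.

Otherwise it calls mdc_try_get_cached to search the cache by mdc_parent and name; if the entry is not cached, it calls mdc_lookup_uncached.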
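The lookup order described above (the ".." shortcut, then the cache probe, then the uncached path) can be sketched with a toy model. This is only an illustration under stated assumptions: toy_entry, toy_try_get_cached, toy_lookup_uncached and toy_lookup are hypothetical names, a small array stands in for the directory's AVL tree, and the backend array stands in for whatever the sub-FSAL would return.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

/* Toy stand-in for mdcache_entry_t; NOT the real ganesha type. */
struct toy_entry {
	const char *name;
	struct toy_entry *parent;                /* role of fsobj.fsdir.parent */
	struct toy_entry *backend[MAX_CHILDREN]; /* what the sub-FSAL would return */
	size_t nbackend;
	struct toy_entry *cached[MAX_CHILDREN];  /* role of the dirent AVL tree */
	size_t ncached;
	int backend_calls;                       /* number of cache misses */
};

/* Cache probe, analogous in spirit to mdc_try_get_cached. */
static struct toy_entry *toy_try_get_cached(struct toy_entry *dir,
					    const char *name)
{
	for (size_t i = 0; i < dir->ncached; i++)
		if (strcmp(dir->cached[i]->name, name) == 0)
			return dir->cached[i];
	return NULL;
}

/* Miss path, analogous in spirit to mdc_lookup_uncached: ask the
 * backend, then remember the answer in the directory's cache. */
static struct toy_entry *toy_lookup_uncached(struct toy_entry *dir,
					     const char *name)
{
	dir->backend_calls++;
	for (size_t i = 0; i < dir->nbackend; i++) {
		if (strcmp(dir->backend[i]->name, name) == 0) {
			assert(dir->ncached < MAX_CHILDREN);
			dir->cached[dir->ncached++] = dir->backend[i];
			return dir->backend[i];
		}
	}
	return NULL;
}

static struct toy_entry *toy_lookup(struct toy_entry *dir, const char *name)
{
	struct toy_entry *entry;

	assert(dir != NULL && name != NULL);
	if (strcmp(name, "..") == 0)
		return dir->parent;          /* no cache search needed */
	entry = toy_try_get_cached(dir, name);
	if (entry != NULL)
		return entry;                /* cache hit */
	return toy_lookup_uncached(dir, name);
}
```

Looking the same name up twice should reach the backend only once; the second call is served from the directory's cached list.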

struct mdcache_fsal_obj_handle {
	/** Reader-writer lock for attributes */
	pthread_rwlock_t attr_lock;
	/** MDCache FSAL Handle */
	struct fsal_obj_handle obj_handle;
	/** Sub-FSAL handle */
	struct fsal_obj_handle *sub_handle;
	/** Cached attributes */
	struct attrlist attrs;
	/** FH hash linkage */
	struct {
		struct avltree_node node_k;	/*< AVL node in tree */
		mdcache_key_t key;	/*< Key of this entry */
		bool inavl;
	} fh_hk;
	/** Flags for this entry */
	uint32_t mde_flags;
	/** Time at which we last refreshed attributes. */
	time_t attr_time;
	/** Time at which we last refreshed acl. */
	time_t acl_time;
	/** New style LRU link */
	mdcache_lru_t lru;
	/** Exports per entry (protected by attr_lock) */
	struct glist_head export_list;
	/** ID of the first mapped export for fast path
	 *  This is an int32_t because we need it to be -1 to indicate
	 *  no mapped export.
	 */
	int32_t first_export_id;
	/** Lock on type-specific cached content.  See locking
	    discipline for details. */
	pthread_rwlock_t content_lock;
	/** Filetype specific data, discriminated by the type field.
	    Note that data for special files is in
	    attributes.rawdev */
	union mdcache_fsobj {
		struct state_hdl hdl;
		struct {
			/** List of chunks in this directory, not ordered */
			struct glist_head chunks;
			/** List of detached directory entries. */
			struct glist_head detached;
			/** Spin lock to protect the detached list. */
			pthread_spinlock_t spin;
			/** Count of detached directory entries. */
			int detached_count;
			/** @todo FSF
			 *
			 * This is somewhat fragile, however, a reorganization
			 * is possible. If state_lock was to be moved into
			 * state_file and state_dir, and the state code was
			 * made clear which it was working with, dhdl could
			 * be replaced with a state_dir which would be
			 * smaller than state_file, and then the additional
			 * members of fsdir would basically overlay
			 * the larger state_file that hdl is.
			 *
			 * Such a reorg could save memory AND make for a
			 * crisper interface.
			 */
			struct state_hdl dhdl; /**< Storage for dir state */
			/** The parent host-handle of this directory ('..') */
			struct gsh_buffdesc parent;
			/** The first dirent cookie in this directory.
			 *  0 if not known.
			 */
			fsal_cookie_t first_ck;
			struct {
				/** Children by name hash */
				struct avltree t;
				/** Table of dirents by FSAL cookie */
				struct avltree ck;
				/** Table of dirents in sorted order. */
				struct avltree sorted;
				/** Heuristic. Expect 0. */
				uint32_t collisions;
			} avl;
		} fsdir;		/**< DIRECTORY data */
	} fsobj;
};

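2.2.1 The mdc_lookup_uncached function: once it is clear how metadata gets into the cache, the lookup path is easy to follow.

After calling the lookup of the underlying FSAL (Ceph here), the function creates the cache entry. The part to focus on is the mdcache_alloc_and_check_handle flow: (figure)

The flow is fairly clear and does three things:

1. Create the new entry.

2. Add it to the parent directory's AVL tree.

3. For a directory, save the parent directory's key.

Each of the three is described in detail below.

2.2.2 Creating the new entry

This is the most complicated part. As before, start with the flowchart: (figure)

The discussion here concentrates on the creation path: initializing a directory's AVL trees and inserting the entry into the global AVL tree.

For a directory, the AVL trees for its children are initialized so that all child directories and files can be cached. The definition of mdcache_entry->fsobj.fsdir.avl is: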

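struct {
	/** Children by name hash: AVL tree of the entries in this directory */
	struct avltree t;
	/** AVL tree of deleted entries */
	struct avltree c;
	/** AVL tree keyed by FSAL cookie */
	struct avltree ck;
	/** Entries in sorted order; needed to support
	 *  fso_compute_readdir_cookie. Not analyzed here. */
	struct avltree sorted;
	/** Collision counter. Heuristic, expected to be 0. */
	uint32_t collisions;
} avl;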

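The two trees that matter most are t and ck. t is ordered by the hash of the file name and is used to look up a particular file. ck is ordered by each child's offset (cookie) within the directory and is mainly used to list all or part of the directory's entries.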
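To make the two orderings concrete, here is a minimal sketch that indexes the same dirents twice: once by name hash (the role of t) and once by cookie (the role of ck). Everything in it is an assumption made for illustration: the toy_ names are hypothetical, sorted pointer arrays with qsort and bsearch stand in for the AVL trees, and FNV-1a stands in for ganesha's own name hash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX 16

struct toy_dirent {
	const char *name;
	uint64_t hk; /* name hash: ordering key of "t" */
	uint64_t ck; /* directory cookie: ordering key of "ck" */
};

struct toy_dir {
	struct toy_dirent *t[TOY_MAX];  /* sorted by hk */
	struct toy_dirent *ck[TOY_MAX]; /* sorted by ck */
	size_t n;
};

/* FNV-1a, chosen only for the demo. */
static uint64_t toy_hash(const char *s)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

static int cmp_hk(const void *a, const void *b)
{
	const struct toy_dirent *x = *(const struct toy_dirent *const *)a;
	const struct toy_dirent *y = *(const struct toy_dirent *const *)b;

	return (x->hk > y->hk) - (x->hk < y->hk);
}

static int cmp_ck(const void *a, const void *b)
{
	const struct toy_dirent *x = *(const struct toy_dirent *const *)a;
	const struct toy_dirent *y = *(const struct toy_dirent *const *)b;

	return (x->ck > y->ck) - (x->ck < y->ck);
}

static void toy_dir_build(struct toy_dir *dir, struct toy_dirent *ents,
			  size_t n)
{
	assert(n <= TOY_MAX);
	for (size_t i = 0; i < n; i++) {
		ents[i].hk = toy_hash(ents[i].name);
		dir->t[i] = dir->ck[i] = &ents[i];
	}
	dir->n = n;
	qsort(dir->t, n, sizeof(dir->t[0]), cmp_hk);
	qsort(dir->ck, n, sizeof(dir->ck[0]), cmp_ck);
}

/* Lookup by name: search the hash-ordered index, then confirm the name. */
static struct toy_dirent *toy_find_by_name(const struct toy_dir *dir,
					   const char *name)
{
	struct toy_dirent key = { .name = name, .hk = toy_hash(name) };
	const struct toy_dirent *kp = &key;
	struct toy_dirent **hit =
		bsearch(&kp, dir->t, dir->n, sizeof(dir->t[0]), cmp_hk);

	if (hit == NULL || strcmp((*hit)->name, name) != 0)
		return NULL;
	return *hit;
}

/* Listing: index in the cookie-ordered array of the first entry whose
 * cookie is >= whence; returns dir->n if there is none. */
static size_t toy_first_at_or_after(const struct toy_dir *dir,
				    uint64_t whence)
{
	size_t lo = 0, hi = dir->n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (dir->ck[mid]->ck < whence)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

The name index answers "does this directory contain X", while the cookie index answers "what comes next after position N", which is why a lookup and a listing need different orderings over the same entries.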

Creation mainly initializes these trees.

The entry is then inserted into the global AVL tree, which is ordered by the hash of the entry's key.

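struct cih_lookup_table holds the global cache. By default it has 7 partitions (set by Nparts in the configuration file). Each partition has its own AVL tree plus a cache array that directly caches 32633 entries. A global lookup first probes the cache array using the hash of the key; only on a miss does it search the AVL tree, and the entry found there then replaces the cache slot.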
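The lookup policy just described (probe a direct-mapped slot, fall back to the tree, then overwrite the slot) can be sketched as follows. This is a simplified model and not the cih code: all toy_ names are hypothetical, a linked list stands in for each partition's AVL tree, and the slot array is deliberately tiny instead of the 32633 entries quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NPARTS 7    /* the default partition count quoted above */
#define TOY_CACHE_SZ 31 /* tiny on purpose; the text quotes 32633 */

struct toy_node {
	uint64_t hk;           /* hash of the entry's key */
	struct toy_node *next;
};

struct toy_partition {
	struct toy_node *tree;                /* stand-in for the AVL tree */
	struct toy_node *cache[TOY_CACHE_SZ]; /* direct-mapped slots */
	unsigned tree_walks;                  /* slot misses */
};

struct toy_table {
	struct toy_partition part[TOY_NPARTS];
};

static struct toy_partition *toy_partition_of(struct toy_table *t,
					      uint64_t hk)
{
	return &t->part[hk % TOY_NPARTS];
}

static void toy_table_insert(struct toy_table *t, struct toy_node *n)
{
	struct toy_partition *p;

	assert(t != NULL && n != NULL);
	p = toy_partition_of(t, n->hk);
	n->next = p->tree;
	p->tree = n;
}

static struct toy_node *toy_table_lookup(struct toy_table *t, uint64_t hk)
{
	struct toy_partition *p = toy_partition_of(t, hk);
	struct toy_node **slot = &p->cache[hk % TOY_CACHE_SZ];

	if (*slot != NULL && (*slot)->hk == hk)
		return *slot;           /* fast path: slot hit */

	p->tree_walks++;                /* slow path: search the "tree" */
	for (struct toy_node *n = p->tree; n != NULL; n = n->next) {
		if (n->hk == hk) {
			*slot = n;      /* replace whatever was cached */
			return n;
		}
	}
	return NULL;
}
```

Two keys that map to the same partition and slot evict each other from the slot, but both remain reachable through the slower tree search, so the slot array is only an accelerator and never the source of truth.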

struct cih_lookup_table {
	GSH_CACHE_PAD(0);
	cih_partition_t *partition;
	uint32_t npart;
	uint32_t cache_sz;
};

/* Support inline lookups */
extern struct cih_lookup_table cih_fhcache;

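Adding to the parent directory's AVL tree

The main work is to create a new mdcache_dir_entry_t and insert it into the parent directory's AVL trees (t and ck).

2.2.3 Saving the parent directory's key for a directory

As mentioned above, if the object found is a directory, the parent directory's key is assigned to the fsobj.fsdir.parent field of the new directory's cache entry, which makes a lookup of ".." easy.

The mdc_try_get_cached flow when the entry is cached

It mainly searches the parent's AVL tree t; on a hit it confirms that the entry also exists in the global AVL tree and returns it.

A short summary of what each AVL tree in the cache is for

The global AVL tree

Finds an mdcache_entry_t quickly by key.

t in a directory's AVL trees

Used to look up a child file or directory.

ck in a directory's AVL trees

Used mainly by readdir.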

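3. cache_inode_readdir and mdcache_readdir

mdcache_readdir

Versions from 2.5.0 onward add readdir chunks, so a directory no longer has to be cached in full. By default each chunk holds 128 entries (configurable with Dir_Chunk) and the system-wide chunk high-water mark is 10000 (configurable with Chunks_HWMark). Chunks have their own LRU list; once the number of chunks exceeds the high-water mark, chunks are evicted.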

	if (test_mde_flags(directory, MDCACHE_BYPASS_DIRCACHE)) {
		/* Not caching dirents; pass through directly to FSAL */
		return mdcache_readdir_uncached(directory, whence, dir_state,
						cb, attrmask, eod_met);
	}

	if (mdcache_param.dir.avl_chunk > 0) {
		/* Dirent chunking is enabled. */
		LogDebugAlt(COMPONENT_NFS_READDIR, COMPONENT_CACHE_INODE,
			    "Calling mdcache_readdir_chunked whence=%"PRIx64,
			    whence ? *whence : (uint64_t) 0);

		return mdcache_readdir_chunked(directory,
					       whence ? *whence : (uint64_t) 0,
					       dir_state, cb, attrmask,
					       eod_met);
	}

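At present the MDCACHE_BYPASS_DIRCACHE flag is only set when chunking is disabled and a directory is too large; such a directory is not cached. With chunking enabled the flag never takes effect. Everything after these two branches is the non-chunked path, which is not analyzed here. So in practice mdcache_readdir runs mdcache_readdir_chunked.

The flow of mdcache_readdir_chunked: (figure)

As with lookup, start with the path taken when nothing is found in the cache.

3.1 mdcache_populate_dir_chunk

Flow: (figure)

It does three things:

1. Calls mdcache_get_chunk.

2. Calls the sub-FSAL's readdir.

3. The sub-FSAL's readdir invokes the callback mdc_readdir_chunk_object.

3.1.1 mdcache_get_chunk

This function obtains a usable chunk. If the global chunk count has reached the threshold, one chunk is evicted from the LRU and reused.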
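The "allocate until the high-water mark, then recycle the least recently used chunk" behaviour can be sketched as below. This is a minimal model and not the mdcache LRU code: the toy_ names, the doubly linked list, and the recycled counter are assumptions for illustration, and locking and reference counting are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_DIR_CHUNK 128 /* entries per chunk, the default quoted above */

struct toy_chunk {
	uint64_t owner_dir; /* which directory currently uses the chunk */
	int num_entries;
	struct toy_chunk *prev, *next;
};

struct toy_lru {
	struct toy_chunk *head; /* most recently used */
	struct toy_chunk *tail; /* least recently used */
	unsigned count;         /* chunks allocated so far */
	unsigned hwmark;        /* role of Chunks_HWMark */
	unsigned recycled;      /* how many times a chunk was reused */
};

static void lru_unlink(struct toy_lru *l, struct toy_chunk *c)
{
	if (c->prev)
		c->prev->next = c->next;
	else
		l->head = c->next;
	if (c->next)
		c->next->prev = c->prev;
	else
		l->tail = c->prev;
	c->prev = c->next = NULL;
}

static void lru_push_front(struct toy_lru *l, struct toy_chunk *c)
{
	c->prev = NULL;
	c->next = l->head;
	if (l->head)
		l->head->prev = c;
	else
		l->tail = c;
	l->head = c;
}

/* Return a chunk for owner_dir: a fresh one below the high-water mark,
 * otherwise the least recently used chunk, emptied and re-assigned. */
static struct toy_chunk *toy_get_chunk(struct toy_lru *l, uint64_t owner_dir)
{
	struct toy_chunk *c;

	assert(l != NULL);
	if (l->count >= l->hwmark && l->tail != NULL) {
		c = l->tail;
		lru_unlink(l, c);
		l->recycled++;
	} else {
		c = calloc(1, sizeof(*c));
		if (c == NULL)
			return NULL;
		l->count++;
	}
	c->owner_dir = owner_dir;
	c->num_entries = 0;
	lru_push_front(l, c);
	return c;
}

/* Mark a chunk as recently used. */
static void toy_touch(struct toy_lru *l, struct toy_chunk *c)
{
	lru_unlink(l, c);
	lru_push_front(l, c);
}

/* Returns 1 if a dirent fits, 0 if the chunk is full and a new chunk
 * would be needed. */
static int toy_chunk_add(struct toy_chunk *c)
{
	if (c->num_entries >= TOY_DIR_CHUNK)
		return 0;
	c->num_entries++;
	return 1;
}
```

With a mark of 3, requesting a fourth chunk hands back the memory of whichever chunk was touched least recently, so the directory that owned it would have to repopulate that range on its next readdir.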

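3.1.2 Calling the sub-FSAL's readdir

Reads the entries from the underlying file system.

3.1.3 For every entry it reads, the sub-FSAL's readdir calls the mdc_readdir_chunk_object callback once. As before, look at the flow of mdc_readdir_chunk_object: (figure)

Much of this is similar to lookup. The difference is that when lookup misses the cache, it marks the chunk invalid (when compute_readdir_cookie is not implemented; when it is, the new entry is added to the chunk). Here, the chunk structure has a dirents member, which holds the list of all entries in this chunk.

3.2 mdcache_avl_lookup_ck

Now back to the function that searches the cache directly, which amounts to walking ck. mdcache_entry_t has a first_ck field that gives a directory's starting cookie, and each chunk stores a next_ck field, so the cookies can be traversed chunk by chunk.

3.3 Looping over chunk->dirents

For each chunk found through ck, loop over chunk->dirents and call the upper layer's callback for every cached entry, which completes the readdir.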
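Sections 3.2 and 3.3 together describe a walk: start at first_ck, find the chunk holding that cookie, hand each dirent to the callback, then follow next_ck. A minimal sketch of that walk is below. The toy_ names are hypothetical, a linear scan stands in for mdcache_avl_lookup_ck, and the start cookie is simplified to mean "the first entry to return", which is an assumption and not necessarily the exact whence semantics of the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CHUNK_MAX 4
#define TOY_COLLECT_MAX 16

struct toy_dirent {
	const char *name;
	uint64_t ck;
};

struct toy_chunk {
	struct toy_dirent dirents[TOY_CHUNK_MAX];
	size_t num;
	uint64_t next_ck; /* first cookie of the next chunk, 0 at the end */
};

struct toy_dir {
	struct toy_chunk *chunks;
	size_t nchunks;
	uint64_t first_ck; /* 0 if not known */
};

/* Upper-layer callback; return 0 to stop the listing early. */
typedef int (*toy_readdir_cb)(const struct toy_dirent *d, void *arg);

/* Stand-in for the cookie lookup: find the chunk and index of cookie ck. */
static struct toy_chunk *toy_lookup_ck(struct toy_dir *dir, uint64_t ck,
				       size_t *idx)
{
	for (size_t c = 0; c < dir->nchunks; c++) {
		for (size_t i = 0; i < dir->chunks[c].num; i++) {
			if (dir->chunks[c].dirents[i].ck == ck) {
				*idx = i;
				return &dir->chunks[c];
			}
		}
	}
	return NULL;
}

/* Returns the number of dirents handed to the callback. */
static size_t toy_readdir(struct toy_dir *dir, uint64_t start_ck,
			  toy_readdir_cb cb, void *arg)
{
	size_t seen = 0;
	uint64_t ck = start_ck != 0 ? start_ck : dir->first_ck;

	while (ck != 0) {
		size_t idx;
		struct toy_chunk *chunk = toy_lookup_ck(dir, ck, &idx);

		if (chunk == NULL)
			break; /* a real cache would populate a chunk here */
		for (; idx < chunk->num; idx++) {
			seen++;
			if (!cb(&chunk->dirents[idx], arg))
				return seen;
		}
		ck = chunk->next_ck;
	}
	return seen;
}

/* Example callback: collect names until a caller-chosen limit. */
struct toy_collect {
	const char *names[TOY_COLLECT_MAX];
	size_t n;
	size_t limit;
};

static int toy_collect_cb(const struct toy_dirent *d, void *arg)
{
	struct toy_collect *out = arg;

	assert(out->n < TOY_COLLECT_MAX);
	out->names[out->n++] = d->name;
	return out->n < out->limit;
}
```

The callback's return value is what lets the upper layer stop early, for example when its reply buffer is full, and a later call can resume from a cookie in the middle of a chunk.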
