Linux Driver Porting: the NAND Flash ONFI Standard and the MTD Subsystem [Repost]

Reposted from: https://www.cnblogs.com/zyly/p/16756273.html#_label0

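I. The ONFI Standard

NAND flash is a common storage device in the embedded world. For embedded development it falls into two broad classes, Serial NAND and Raw NAND, and the two differ considerably.

Raw NAND is defined in contrast to Serial NAND. Serial NAND is NAND flash with a serial interface, for example one that uses the SPI protocol, whereas Raw NAND is NAND flash with a parallel interface.

We start with the ONFI protocol because it comes up when analysing the NAND flash driver source. The K9F2G08U0C chip we use does not support ONFI, which becomes apparent once its command set is compared with the commands defined by ONFI 1.0.

1.1 The ONFI Standard

Early Raw NAND had no common standard. Toshiba published the NAND flash structure as early as 1989, but each vendor designed its Raw NAND chips freely, so package dimensions were inconsistent, storage organisation varied widely, and interface commands were not interchangeable, all of which made the parts hard for customers to use.

To change this, in 2006 several major Raw NAND vendors (Hynix, Intel, Micron, Phison, Sony, ST) joined forces to define a Raw NAND standard called the Open NAND Flash Interface, or ONFI. ONFI 1.0 was released in December 2006. Since then most Raw NAND vendors have designed and manufactured parts to the ONFI standard, so Raw NAND from different vendors looks almost identical to the embedded designer, at least at the driver code level. Non-ONFI parts such as the K9F2G08U0C still exist, however.

ONFI website: http://www.onfi.org/, where the ONFI specification can be downloaded.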

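1.2 Raw NAND Classification

1.2.1 Cell Levels

NAND flash memory cells are classified by the number of bits each cell stores:

Single-Level Cell (SLC): offers the most accurate reads and writes and the longest endurance, roughly 90,000 to 100,000 program/erase cycles. Its endurance, accuracy, and overall performance make it popular in the enterprise market, but high cost per bit and relatively small capacity make it less attractive for consumer products.
Multi-Level Cell (MLC): named for moving from SLC's 1 bit per cell to 2 bits per cell. The main benefit is a much lower cost for high-capacity flash; endurance is roughly 3,000 to 10,000 program/erase cycles.
Triple-Level Cell (TLC): stores 3 bits per cell and is the cheapest of the three grades above to produce. The high density allows inexpensive large capacities, but endurance drops sharply to about 500 to 1,000 program/erase cycles and read/write speed is poorer, so it suits ordinary consumers rather than industrial use.
Quad-Level Cell (QLC): stores 4 bits per cell, a 33% increase in density over TLC. QLC can withstand about 1,000 program/erase cycles (comparable to TLC, or better) while offering more capacity at lower cost.

Conclusion: in endurance and performance, SLC > MLC > TLC.

Most USB flash drives currently use TLC chips, which are cheap but average in speed and relatively short-lived.

In SSDs, MLC is currently mainstream, with a moderate price and comparatively good speed and endurance, while low-cost SSDs generally use TLC. The product specifications state which type a given drive uses.

SLC SSDs currently appear mainly in some high-end products, most of them priced above a thousand yuan.

In smartphones, most storage is also TLC. Some iPhone 6 units use TLC chips and others use MLC. Overall, MLC is the mainstream choice at the time of writing: it is balanced in speed, endurance, and price, and is the one to recommend.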

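1.2.2 Data Bus Width

The data bus is either x8 or x16.

1.2.3 Data Sampling Mode

Data is sampled in either SDR or DDR mode.

1.2.4 Interface Command Standard

The interface command set is either non-standard (vendor-specific) or ONFI.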

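1.3 Raw NAND Memory Model

ONFI organises Raw NAND memory, from largest to smallest, into at most the following levels: Device, LUN (Die), Plane, Block, Page, Cell.

Device: a single NAND flash part, that is, the packaged chip; one Device contains one or more LUNs.
LUN (Die): the basic unit that accepts and executes flash commands; one LUN contains one or more Planes. Strictly speaking, ONFI also defines a Target as the set of LUNs that share one chip-enable signal; this article treats LUN, Die, and Target loosely as the same thing.
Plane: one Plane contains multiple Blocks.
Block: the smallest unit that can be erased; it normally consists of multiple Pages.
Page: the smallest unit that can be programmed or read; a typical size is 2 KB.
Cell: the smallest storage element within a Page, corresponding to one floating-gate transistor; it stores one or more bits.

Page and Block are always present, because the Page is the smallest read/write unit and the Block is the smallest erase unit. LUN and Plane are optional (if absent, treat them as LUN = 1, Plane = 1) and generally appear only on large Raw NAND parts (8 Gb and up).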
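To make the hierarchy concrete, the following minimal sketch multiplies the levels out into a capacity. The struct and function names are hypothetical (they are not kernel types), and the geometry is an assumed, illustrative layout for a 2 Gb part rather than figures taken from a specific datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a raw NAND geometry (not a kernel type). */
struct nand_geometry {
    uint32_t page_size;        /* main-area bytes per page, OOB excluded */
    uint32_t pages_per_block;
    uint32_t blocks_per_plane;
    uint32_t planes_per_lun;   /* 1 if the part has no plane concept */
    uint32_t luns_per_device;  /* 1 if the part has a single die */
};

/* Bytes erased by one Block Erase. */
static uint64_t nand_block_bytes(const struct nand_geometry *g)
{
    return (uint64_t)g->page_size * g->pages_per_block;
}

/* Total main-area capacity of the device in bytes. */
static uint64_t nand_device_bytes(const struct nand_geometry *g)
{
    return nand_block_bytes(g) * g->blocks_per_plane *
           g->planes_per_lun * g->luns_per_device;
}

/* Assumed geometry of a 2 Gb (256 MiB) part; values are illustrative. */
static const struct nand_geometry example_2gbit = {
    .page_size = 2048,
    .pages_per_block = 64,
    .blocks_per_plane = 2048,
    .planes_per_lun = 1,
    .luns_per_device = 1,
};
```

With these assumed numbers one block is 128 KiB and the device holds 256 MiB, which is the 2 Gb figure in the part name. Setting planes_per_lun and luns_per_device to 1 is how the "LUN = 1, Plane = 1" convention above shows up in code.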

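A typical NAND flash has a single chip (LUN) with a single plane. More complex, higher-capacity parts contain several chips, each with several planes. Such a part is essentially several flash dies stacked behind one controller, as the figure in the original post shows.

Note: "chip" here means the LUN described above. Any NAND flash part number can be called a chip, but in this context the term refers to the internal dies, that is, how many chips a given part contains. For example:

Samsung's 2 GB K9WAG08U1A (the external part number) contains two 1 GB K9K8G08U0A dies, so the K9WAG08U1A is said to have 2 chips;
A single chip may in turn contain several planes; the K9K8G08U0A above contains four 2 Gb planes.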

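1.4 Raw NAND Signals and Packaging

ONFI specifies the Raw NAND signals and packages. The original post shows the internal block diagram of a typical x8 Raw NAND at this point.

Besides the memory array there are two other major blocks, the I/O control unit and the control logic unit, and the signal lines attach mainly to these two. An x8 Raw NAND has 15 main signal lines (13 of them mandatory):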

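Pin	Description
CLE	Command latch enable; while CLE is high, the byte on the I/O bus is latched as a command
ALE	Address latch enable; while ALE is high, the byte on the I/O bus is latched as an address
CE#	Chip enable, active low
RE#	Read enable, active low
WE#	Write enable, active low
WP#	Write protect
R/B#	Ready/busy output (low means an operation is still in progress, high means it has completed)
VCC	Power supply
VSS	Ground
NC	No connect
I/O0 ~ I/O7	Data input/output (commands, addresses, and data share this bus)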

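ONFI defines many package options, such as TSOP48, LGA52, and BGA63/100/132/152/272/316. For embedded development the most common is the flat TSOP-48 package (pictured in the original post), which is used for lower-capacity Raw NAND (1/2/4/8/16/32 Gb). Capacities of 1 to 32 Gb are usually enough for embedded designs, and TSOP-48 is easy to route on a PCB, hence its popularity.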

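1.5 Raw NAND Interface Commands

ONFI 1.0 defines the Raw NAND interface commands (tabulated in the original post). Some are mandatory (M) and others optional (O). The most frequently used mandatory commands are Read (Read Page), Page Program, and Block Erase, which cover the three basic operations: read, write, and erase.

Two others are also important:

Read Status: returns the status and result of command execution.
Read Parameter Page: returns the factory information stored in the chip (memory organisation, features, timing, other behavioural parameters, and so on). ONFI fixes the layout of this page, so a NAND driver can read the Parameter Page and stay generic across parts.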
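As an illustration of how a driver might sanity-check a Parameter Page before trusting it, here is a minimal sketch. It relies on three properties of the ONFI layout: the page begins with the ASCII signature "ONFI", bytes 254 and 255 hold a little-endian CRC-16 over bytes 0 to 253, and the CRC uses polynomial 0x8005 with initial value 0x4F4E. All function names are hypothetical, and onfi_page_seal exists only so the example can fabricate a valid page without real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONFI_PARAM_PAGE_LEN 256     /* one copy of the parameter page */
#define ONFI_CRC_INIT       0x4F4E  /* initial CRC value */
#define ONFI_CRC_POLY       0x8005  /* x^16 + x^15 + x^2 + 1 */

/* Bitwise CRC-16, MSB first, no reflection, no final XOR. */
static uint16_t onfi_crc16_sketch(const uint8_t *p, size_t len)
{
    uint16_t crc = ONFI_CRC_INIT;

    while (len--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ ONFI_CRC_POLY)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Test helper: store the CRC of bytes 0..253 into bytes 254..255. */
static void onfi_page_seal(uint8_t page[ONFI_PARAM_PAGE_LEN])
{
    uint16_t crc = onfi_crc16_sketch(page, ONFI_PARAM_PAGE_LEN - 2);

    page[254] = (uint8_t)(crc & 0xff);
    page[255] = (uint8_t)(crc >> 8);
}

/* Return 1 if the page starts with "ONFI" and its stored CRC matches. */
static int onfi_page_valid(const uint8_t page[ONFI_PARAM_PAGE_LEN])
{
    uint16_t stored = (uint16_t)(page[254] | (page[255] << 8));

    if (memcmp(page, "ONFI", 4) != 0)
        return 0;
    return onfi_crc16_sketch(page, ONFI_PARAM_PAGE_LEN - 2) == stored;
}
```

A chip that does not implement ONFI, such as the K9F2G08U0C discussed earlier, would fail the signature check, and a driver would then fall back to identifying the part from its ID bytes.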


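II. MTD Device Drivers

MTD (Memory Technology Device) is the Linux subsystem for accessing memory devices such as ROM and flash. Its main purpose is to make drivers for new memory devices simpler, which it does by providing an abstract interface between the hardware and the upper layers.

2.1 MTD Subsystem Overview

Before introducing MTD, consider a question: why does the Linux kernel abstract out an MTD subsystem at all?

Recall the steps for writing a block device driver from the previous article: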

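Call register_blkdev to register the block device major number;
Call alloc_disk to allocate a generic disk object, gendisk;
Call blk_mq_init_sq_queue to initialise a request queue;
Set the members of the gendisk structure:
- set major, first_minor, disk_name, and fops;
- set queue to the request queue initialised above;
Call add_disk to register the gendisk;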

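For every model of flash device we would have to repeat these steps when writing a block device driver. So what actually differs between flash models? Taking NAND flash as an example, the differences lie mainly in the memory model (page size, block size, pages per block, OOB size, and so on) and in slight variations of the timing parameters. The parts tightly coupled to the NAND flash can therefore be split out and supplied by a NAND flash driver layer, while everything common is factored out separately. That is exactly what the MTD subsystem does.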

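2.2 MTD Subsystem Framework

As the framework diagram in the original post shows, the MTD stack is usually divided into four layers, from top to bottom: device nodes, the MTD device layer, the MTD raw device layer, and the flash driver layer.

Device nodes: mknod creates the MTD block device nodes (major number 31) and MTD character device nodes (major number 90) under /dev; the MTD block and character devices are accessed through these nodes.
MTD device layer: on top of the MTD raw devices, Linux defines the MTD block device (major number 31) and the MTD character device (major number 90). Here:
- mtdchar.c: implements the MTD character device interface;
- mtdblock.c: implements the MTD block device interface; it handles device creation, data reads and writes, and optimisations. This resembles a traditional block device driver, except that the kernel takes care of requesting the block major number, allocating and setting up the gendisk, and initialising the queue.
MTD raw device layer: the data structure that describes an MTD raw device is mtd_info, which defines a large set of MTD data fields and operations. Here:
- mtdcore.c: implements the MTD raw device interface;
- mtdpart.c: implements the MTD partition interface;
Flash driver layer: performs the actual read, write, and erase operations on the flash hardware. NAND flash and NOR flash have different protocols and hardware details; this layer knows what to send (which commands identify, read, write, or erase the device) and how the hardware sends it. Each protocol has its own functions, and the operating environment is built from the corresponding structures and functions. The driver author only has to allocate, fill in, and register the flash driver layer structures and establish the mapping from the concrete device to the MTD raw device.

NAND flash chip drivers live under drivers/mtd/nand/ and use the nand_chip structure;
NOR flash chip drivers live under drivers/mtd/chips/ and use the map_info structure;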

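2.2.1 Flash Driver Layer

(1) NOR Flash Driver

The Linux kernel provides generic NOR flash drivers for interface standards such as CFI and JEDEC. Building on them, a chip-level driver is simple: define the memory mapping structure map_info, then call do_map_probe with the interface type to probe.

Taking scb2_flash.c (in drivers/mtd/maps/) as an example:

Define a map_info structure and initialise its name, size, phys, and bankwidth members;
Map the virt member (the virtual address) with ioremap;
Call simple_map_init to initialise the map_info function members read, write, copy_from, and copy_to;
Call do_map_probe to probe for a CFI interface; it returns an mtd_info structure;
Call parse_mtd_partitions and add_mtd_partitions to register the MTD raw device;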

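(2) NAND Flash Driver

The Linux kernel provides a generic NAND flash driver (drivers/mtd/nand/raw/nand_base.c); a chip-level driver has to fill in a nand_chip structure.

MTD uses nand_chip to represent one NAND flash chip. The structure holds the chip's memory model, read/write methods, ECC mode, hardware control hooks, and other low-level mechanisms.

Taking s3c2410.c (in drivers/mtd/nand/raw) as an example:

Allocate memory for the nand_chip;
Initialise the nand_chip members for the SoC's NAND controller, for example chip->legacy (members write_buf, read_buf, select_chip, cmd_ctrl, dev_ready, IO_ADDR_R, IO_ADDR_W) and chip->controller;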
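Set chip->priv to the driver's private data (s3c2410.c does this through nand_set_controller_data());
Call nand_scan() to probe the NAND flash (older kernels pass the mtd_info; recent kernels pass the nand_chip instead). nand_scan() reads the NAND chip ID and then:
- initialises chip->base.mtd (members writesize, oobsize, erasesize, and so on);
- initialises chip->base.memorg (members bits_per_cell, pagesize, oobsize, pages_per_eraseblock, planes_per_lun, luns_per_target, ntargets, and so on);
- initialises chip->options and chip->base.eccreq;
- initialises the members of chip->ecc (the ECC mode and its handler functions);
- fills every function pointer in chip that is still unset with the default from nand_base.c;
Call mtd_device_register() with the mtd_info and the mtd_partition array to register the MTD device;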

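2.3 Core Structures

2.3.1 struct mtd_info

The Linux kernel uses the mtd_info structure to represent an MTD raw device: either a whole device or one partition of a multi-partition device. It defines a large set of MTD data fields and operations. Every registered mtd_info is recorded in a kernel-wide table (the mtd_table array in older kernels).

mtd_info is defined in include/linux/mtd/mtd.h: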

struct mtd_info {
        u_char type;     // MTD device type, e.g. MTD_NORFLASH, MTD_NANDFLASH
        uint32_t flags;  // flags, e.g. MTD_WRITEABLE, MTD_NO_ERASE
        uint32_t orig_flags; /* Flags as before running mtd checks */
        uint64_t size;   // Total size of the MTD

        /* "Major" erase size for the device. Naïve users may take this
         * to be the only erase size available, or may use the more detailed
         * information below if they desire
         */
        uint32_t erasesize;   // erase unit size; for NAND flash this is the block size
        /* Minimal writable flash unit size. In case of NOR flash it is 1 (even
         * though individual bits can be cleared), in case of NAND flash it is
         * one NAND page (or half, or one-fourths of it), in case of ECC-ed NOR
         * it is of ECC block size, etc. It is illegal to have writesize = 0.
         * Any driver registering a struct mtd_info must ensure a writesize of
         * 1 or larger.
         */
        uint32_t writesize;  // minimum write size in bytes: 1 for NOR flash, one page for NAND flash

        /*
         * Size of the write buffer used by the MTD. MTD devices having a write
         * buffer can write multiple writesize chunks at a time. E.g. while
         * writing 4 * writesize bytes to a device with 2 * writesize bytes
         * buffer the MTD driver can (but doesn't have to) do 2 writesize
         * operations, but not 4. Currently, all NANDs have writebufsize
         * equivalent to writesize (NAND page size). Some NOR flashes do have
         * writebufsize greater than writesize.
         */
        uint32_t writebufsize;

        uint32_t oobsize;   // Amount of OOB data per block (e.g. 16)
        uint32_t oobavail;  // Available OOB bytes per block

        /*
         * If erasesize is a power of 2 then the shift is stored in
         * erasesize_shift otherwise erasesize_shift is zero. Ditto writesize.
         */
        unsigned int erasesize_shift;   // log2(erasesize), or 0 if erasesize is not a power of 2
        unsigned int writesize_shift;   // log2(writesize), or 0 if writesize is not a power of 2
        /* Masks based on erasesize_shift and writesize_shift */
        unsigned int erasesize_mask;    // (1 << erasesize_shift) - 1
        unsigned int writesize_mask;    // (1 << writesize_shift) - 1

        /*
         * read ops return -EUCLEAN if max number of bitflips corrected on any
         * one region comprising an ecc step equals or exceeds this value.
         * Settable by driver, else defaults to ecc_strength.  User can override
         * in sysfs.  N.B. The meaning of the -EUCLEAN return code has changed;
         * see Documentation/ABI/testing/sysfs-class-mtd for more detail.
         */
        unsigned int bitflip_threshold;

        /* Kernel-only stuff starts here. */
        const char *name;  // MTD device name
        int index;         // device index

        /* OOB layout description */
        const struct mtd_ooblayout_ops *ooblayout;  // OOB layout description

        /* NAND pairing scheme, only provided for MLC/TLC NANDs */
        const struct mtd_pairing_scheme *pairing;

        /* the ecc step size. */
        unsigned int ecc_step_size;

        /* max number of correctible bit errors per ecc step */
        unsigned int ecc_strength;

        /* Data for variable erase regions. If numeraseregions is zero,
         * it means that the whole device has erasesize as given above.
         */
        int numeraseregions;  // number of variable erase regions; 0 means one uniform erasesize
        struct mtd_erase_region_info *eraseregions;  // variable erase regions
        /*
         * Do not call via these pointers, use corresponding mtd_*()
         * wrappers instead.
         */
        int (*_erase) (struct mtd_info *mtd, struct erase_info *instr);  // erase
        int (*_point) (struct mtd_info *mtd, loff_t from, size_t len,
                       size_t *retlen, void **virt, resource_size_t *phys);
        int (*_unpoint) (struct mtd_info *mtd, loff_t from, size_t len);
        int (*_read) (struct mtd_info *mtd, loff_t from, size_t len,  // read
                      size_t *retlen, u_char *buf);
        int (*_write) (struct mtd_info *mtd, loff_t to, size_t len,    // write
                       size_t *retlen, const u_char *buf);
        int (*_panic_write) (struct mtd_info *mtd, loff_t to, size_t len,
                             size_t *retlen, const u_char *buf);
        int (*_read_oob) (struct mtd_info *mtd, loff_t from,
                          struct mtd_oob_ops *ops);
        int (*_write_oob) (struct mtd_info *mtd, loff_t to,
                           struct mtd_oob_ops *ops);
        int (*_get_fact_prot_info) (struct mtd_info *mtd, size_t len,
                                    size_t *retlen, struct otp_info *buf);
        int (*_read_fact_prot_reg) (struct mtd_info *mtd, loff_t from,
                                    size_t len, size_t *retlen, u_char *buf);
        int (*_get_user_prot_info) (struct mtd_info *mtd, size_t len,
                                    size_t *retlen, struct otp_info *buf);
        int (*_read_user_prot_reg) (struct mtd_info *mtd, loff_t from,
                                    size_t len, size_t *retlen, u_char *buf);
        int (*_write_user_prot_reg) (struct mtd_info *mtd, loff_t to,
                                     size_t len, size_t *retlen, u_char *buf);
        int (*_lock_user_prot_reg) (struct mtd_info *mtd, loff_t from,
                                    size_t len);
        int (*_writev) (struct mtd_info *mtd, const struct kvec *vecs,
                        unsigned long count, loff_t to, size_t *retlen);
        void (*_sync) (struct mtd_info *mtd);
        int (*_lock) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
        int (*_unlock) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
        int (*_is_locked) (struct mtd_info *mtd, loff_t ofs, uint64_t len);
        int (*_block_isreserved) (struct mtd_info *mtd, loff_t ofs);
        int (*_block_isbad) (struct mtd_info *mtd, loff_t ofs);  
        int (*_block_markbad) (struct mtd_info *mtd, loff_t ofs);
        int (*_max_bad_blocks) (struct mtd_info *mtd, loff_t ofs, size_t len);
        int (*_suspend) (struct mtd_info *mtd);
        void (*_resume) (struct mtd_info *mtd);
        void (*_reboot) (struct mtd_info *mtd);
        /*
         * If the driver is something smart, like UBI, it may need to maintain
         * its own reference counting. The below functions are only for driver.
         */
        int (*_get_device) (struct mtd_info *mtd);
        void (*_put_device) (struct mtd_info *mtd);

        struct notifier_block reboot_notifier;  /* default mode before reboot */

        /* ECC status information */
        struct mtd_ecc_stats ecc_stats;
        /* Subpage shift (NAND) */
        int subpage_sft;

        void *priv;

        struct module *owner;
        struct device dev;
        int usecount;
        struct mtd_debug_info dbg;
        struct nvmem_device *nvmem;
};

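The _read(), _write(), _read_oob(), _write_oob(), and _erase() members of mtd_info are the main functions an MTD device driver has to implement; they form the interface between the MTD raw device layer and the flash driver layer. Linux already ships a set of mtd_info member functions that suits most flash devices.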
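The relationship between erasesize, erasesize_shift, and erasesize_mask described in the structure comments can be sketched in user space as follows. The helper names are hypothetical; the rule itself (shift is log2 of the size when the size is a power of two, otherwise zero, and the mask is derived from the shift) comes from the comments in mtd_info.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if v is a non-zero power of two. */
static int is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* log2(v) for a power of two, otherwise 0 (the rule stated in mtd_info). */
static unsigned int size_to_shift(uint32_t v)
{
    unsigned int shift = 0;

    if (!is_pow2(v))
        return 0;
    while ((v >> shift) != 1)
        shift++;
    return shift;
}

/* Mask that extracts the offset inside one unit of size (1 << shift). */
static unsigned int shift_to_mask(unsigned int shift)
{
    return (1u << shift) - 1;
}

/* Index of the erase block that contains byte offset ofs. */
static uint64_t offset_to_block(uint64_t ofs, unsigned int erasesize_shift)
{
    return ofs >> erasesize_shift;
}
```

For a 128 KiB erase block the shift is 17 and the mask is 0x1FFFF, so locating the block for an offset becomes a shift, and finding the position inside the block becomes a bitwise AND, with no division needed.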

2.3.2 mtd_part

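MTD uses mtd_part to represent a partition. It embeds an mtd_info, and every partition is registered in mtd_table as an MTD raw device in its own right. Most of the data in the partition's mtd_info is derived from its parent device (mtd_part->parent, called master in older kernels). By default the master itself is not added to mtd_table as an MTD raw device.

mtd_part is defined in drivers/mtd/mtdpart.c: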

/**
 * struct mtd_part - our partition node structure
 *
 * @mtd: struct holding partition details
 * @parent: parent mtd - flash device or another partition
 * @offset: partition offset relative to the *flash device*
 */
struct mtd_part {
        struct mtd_info mtd;     // the partition's own mtd_info
        struct mtd_info *parent; // parent of this partition
        uint64_t offset;         // partition offset
        struct list_head list;   // list node linking the mtd_part entries together
};

2.3.3 struct mtd_partition

MTD uses mtd_partition to describe a partition; it is defined in include/linux/mtd/partitions.h:

/*
 * Partition definition structure:
 *
 * An array of struct partition is passed along with a MTD object to
 * mtd_device_register() to create them.
 *
 * For each partition, these fields are available:
 * name: string that will be used to label the partition's MTD device.
 * types: some partitions can be containers using specific format to describe
 *      embedded subpartitions / volumes. E.g. many home routers use "firmware"
 *      partition that contains at least kernel and rootfs. In such case an
 *      extra parser is needed that will detect these dynamic partitions and
 *      report them to the MTD subsystem. If set this property stores an array
 *      of parser names to use when looking for subpartitions.
 * size: the partition size; if defined as MTDPART_SIZ_FULL, the partition
 *      will extend to the end of the master MTD device.
 * offset: absolute starting position within the master MTD device; if
 *      defined as MTDPART_OFS_APPEND, the partition will start where the
 *      previous one ended; if MTDPART_OFS_NXTBLK, at the next erase block;
 *      if MTDPART_OFS_RETAIN, consume as much as possible, leaving size
 *      after the end of partition.
 * mask_flags: contains flags that have to be masked (removed) from the
 *      master MTD flag set for the corresponding MTD partition.
 *      For example, to force a read-only partition, simply adding
 *      MTD_WRITEABLE to the mask_flags will do the trick.
 *
 * Note: writeable partitions require their size and offset be
 * erasesize aligned (e.g. use MTDPART_OFS_NEXTBLK).
 */

struct mtd_partition {
        const char *name;               /* identifier string (partition name) */
        const char *const *types;       /* names of parsers to use if any */
        uint64_t size;                  /* partition size */
        uint64_t offset;                /* offset within the master MTD space */
        uint32_t mask_flags;            /* master MTD flags to mask out for this partition */
        struct device_node *of_node; 
};
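The placeholder values described in the comment block above (append after the previous partition, extend to the end of the device) can be illustrated with a small user-space sketch. Everything here is a stand-in: part_desc is a reduced copy of mtd_partition, the two PART_* macros mimic MTDPART_OFS_APPEND and MTDPART_SIZ_FULL with assumed values, and the layout is hypothetical. The sketch ignores MTDPART_OFS_NXTBLK, MTDPART_OFS_RETAIN, and erase-block alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's MTDPART_* constants (values assumed). */
#define PART_OFS_APPEND ((uint64_t)-1)  /* start where the previous one ended */
#define PART_SIZ_FULL   ((uint64_t)0)   /* extend to the end of the device */

/* Reduced, user-space copy of the fields this sketch needs. */
struct part_desc {
    const char *name;
    uint64_t size;
    uint64_t offset;
};

/*
 * Resolve the APPEND and FULL placeholders in place.
 * Returns 0 on success, -1 if a partition starts past the end of the
 * device or does not fit inside it.
 */
static int resolve_parts(struct part_desc *parts, int n, uint64_t dev_size)
{
    uint64_t cur = 0;  /* end of the previous partition */

    for (int i = 0; i < n; i++) {
        if (parts[i].offset == PART_OFS_APPEND)
            parts[i].offset = cur;
        if (parts[i].offset >= dev_size)
            return -1;
        if (parts[i].size == PART_SIZ_FULL)
            parts[i].size = dev_size - parts[i].offset;
        if (parts[i].size > dev_size - parts[i].offset)
            return -1;
        cur = parts[i].offset + parts[i].size;
    }
    return 0;
}

/* Hypothetical layout for a 256 MiB device. */
static struct part_desc example_parts[] = {
    { "bootloader", 0x00040000, 0 },
    { "kernel",     0x00400000, PART_OFS_APPEND },
    { "rootfs",     PART_SIZ_FULL, PART_OFS_APPEND },
};
```

The appeal of the placeholders is that a board file only has to state sizes: moving or resizing the kernel partition does not require recomputing the offset of every partition after it.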

2.3.4 struct nand_chip

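nand_chip is one of the more important data structures. MTD uses it to represent one NAND flash chip; it holds the chip's memory model, read/write methods, ECC mode, hardware control hooks, and other low-level mechanisms. It is defined in include/linux/mtd/rawnand.h: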

/**
 * struct nand_chip - NAND Private Flash Chip Data
 * @base:               Inherit from the generic NAND device
 * @legacy:             All legacy fields/hooks. If you develop a new driver,
 *                      don't even try to use any of these fields/hooks, and if
 *                      you're modifying an existing driver that is using those
 *                      fields/hooks, you should consider reworking the driver
 *                      avoid using them.
 * @setup_read_retry:   [FLASHSPECIFIC] flash (vendor) specific function for
 *                      setting the read-retry mode. Mostly needed for MLC NAND.
 * @ecc:                [BOARDSPECIFIC] ECC control structure
 * @buf_align:          minimum buffer alignment required by a platform
 * @oob_poi:            "poison value buffer," used for laying out OOB data
 *                      before writing
 * @page_shift:         [INTERN] number of address bits in a page (column
 *                      address bits).
 * @phys_erase_shift:   [INTERN] number of address bits in a physical eraseblock
 * @bbt_erase_shift:    [INTERN] number of address bits in a bbt entry
 * @chip_shift:         [INTERN] number of address bits in one chip
 * @options:            [BOARDSPECIFIC] various chip options. They can partly
 *                      be set to inform nand_scan about special functionality.
 *                      See the defines for further explanation.
 * @bbt_options:        [INTERN] bad block specific options. All options used
 *                      here must come from bbm.h. By default, these options
 *                      will be copied to the appropriate nand_bbt_descr's.
 * @badblockpos:        [INTERN] position of the bad block marker in the oob
 *                      area.
 * @badblockbits:       [INTERN] minimum number of set bits in a good block's
 *                      bad block marker position; i.e., BBM == 11110111b is
 *                      not bad when badblockbits == 7
 * @onfi_timing_mode_default: [INTERN] default ONFI timing mode. This field is
 *                            set to the actually used ONFI mode if the chip is
 *                            ONFI compliant or deduced from the datasheet if
 *                            the NAND chip is not ONFI compliant.
 * @pagemask:           [INTERN] page number mask = number of (pages / chip) - 1
 * @data_buf:           [INTERN] buffer for data, size is (page size + oobsize).
 * @pagecache:          Structure containing page cache related fields
 * @pagecache.bitflips: Number of bitflips of the cached page
 * @pagecache.page:     Page number currently in the cache. -1 means no page is
 *                      currently cached
 * @subpagesize:        [INTERN] holds the subpagesize
 * @id:                 [INTERN] holds NAND ID
 * @parameters:         [INTERN] holds generic parameters under an easily
 *                      readable form.
 * @data_interface:     [INTERN] NAND interface timing information
 * @cur_cs:             currently selected target. -1 means no target selected,
 *                      otherwise we should always have cur_cs >= 0 &&
 *                      cur_cs < nanddev_ntargets(). NAND Controller drivers
 *                      should not modify this value, but they're allowed to
 *                      read it.
 * @read_retries:       [INTERN] the number of read retry modes supported
 * @lock:               lock protecting the suspended field. Also used to
 *                      serialize accesses to the NAND device.
 * @suspended:          set to 1 when the device is suspended, 0 when it's not.
 * @bbt:                [INTERN] bad block table pointer
 * @bbt_td:             [REPLACEABLE] bad block table descriptor for flash
 *                      lookup.
 * @bbt_md:             [REPLACEABLE] bad block table mirror descriptor
 * @badblock_pattern:   [REPLACEABLE] bad block scan pattern used for initial
 *                      bad block scan.
 * @controller:         [REPLACEABLE] a pointer to a hardware controller
 *                      structure which is shared among multiple independent
 *                      devices.
 * @priv:               [OPTIONAL] pointer to private chip data
 * @manufacturer:       [INTERN] Contains manufacturer information
 * @manufacturer.desc:  [INTERN] Contains manufacturer's description
 * @manufacturer.priv:  [INTERN] Contains manufacturer private information
 */
struct nand_chip {
        struct nand_device base;    // generic NAND device; can be viewed as a subclass of mtd_info

        struct nand_legacy legacy;  // hardware operation hooks

        int (*setup_read_retry)(struct nand_chip *chip, int retry_mode);

        unsigned int options;  // chip-specific options such as NAND_BUSWIDTH_16
        unsigned int bbt_options;

        int page_shift;       // log2 of the page size; e.g. 9 for a 512-byte page
        int phys_erase_shift; // log2 of the erase size (normally one block); e.g. 14 for a 16 KiB block
        int bbt_erase_shift;  // log2 of the bad block table size; the BBT usually occupies one block, so this usually equals phys_erase_shift
        int chip_shift;       // log2 of the chip capacity in bytes
        int pagemask;         // (chip capacity / page size) - 1, the page number mask
        u8 *data_buf;

        struct {
                unsigned int bitflips;
                int page;
        } pagecache;

        int subpagesize;
        int onfi_timing_mode_default;
        unsigned int badblockpos;
        int badblockbits;

        struct nand_id id;  // ID bytes read from the chip (manufacturer ID, device ID, and so on)
        struct nand_parameters parameters;

        struct nand_data_interface data_interface;

        int cur_cs;       // currently selected target

        int read_retries;

        struct mutex lock;
        unsigned int suspended : 1;

        uint8_t *oob_poi;
        struct nand_controller *controller; // NAND controller

        struct nand_ecc_ctrl ecc; // ECC control structure holding the ECC-related hooks
        unsigned long buf_align;

        uint8_t *bbt;
        struct nand_bbt_descr *bbt_td;
        struct nand_bbt_descr *bbt_md;

        struct nand_bbt_descr *badblock_pattern;

        void *priv;

        struct {
                const struct nand_manufacturer *desc;
                void *priv;
        } manufacturer;   // manufacturer information
};

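The ecc member of nand_chip carries the ECC-related operations, such as read_page_raw and write_page_raw, along with the hooks that calculate and correct ECC.

The read/write hooks in the legacy member of nand_chip, such as read_buf and cmdfunc, depend on the specific NAND controller. They talk to the hardware directly and usually have to be implemented by the driver author for the SoC's NAND controller.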
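To show how the shift and mask members of nand_chip are used together, the sketch below derives them from byte sizes and then splits a byte offset into a page number and a column. The struct and helpers are hypothetical user-space stand-ins, and the sizes used in the example (2 KiB page, 128 KiB block, 256 MiB chip) are assumed values.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-in for the shift/mask members of nand_chip. */
struct chip_shifts {
    int page_shift;        /* log2(page size) */
    int phys_erase_shift;  /* log2(block size) */
    int chip_shift;        /* log2(chip size) */
    int pagemask;          /* pages per chip - 1 */
};

/* Integer log2 for a power-of-two value. */
static int ilog2_u64(uint64_t v)
{
    int n = 0;

    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Derive the members from sizes given in bytes (all powers of two). */
static struct chip_shifts derive_shifts(uint32_t page_size,
                                        uint32_t block_size,
                                        uint64_t chip_size)
{
    struct chip_shifts s;

    s.page_shift = ilog2_u64(page_size);
    s.phys_erase_shift = ilog2_u64(block_size);
    s.chip_shift = ilog2_u64(chip_size);
    s.pagemask = (int)(chip_size >> s.page_shift) - 1;
    return s;
}

/* Page number within the chip for a byte offset. */
static int offset_to_page(const struct chip_shifts *s, uint64_t ofs)
{
    return (int)(ofs >> s->page_shift) & s->pagemask;
}

/* Column (byte position inside the page) for a byte offset. */
static int offset_to_column(const struct chip_shifts *s, uint64_t ofs)
{
    return (int)(ofs & ((1u << s->page_shift) - 1));
}
```

The page number and column produced this way are the two values that a cmdfunc hook receives as page_addr and column.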

2.3.5 struct nand_legacy

nand_legacy holds the functions tied to the SoC's NAND controller hardware:

/**
 * struct nand_legacy - NAND chip legacy fields/hooks
 * @IO_ADDR_R: address to read the 8 I/O lines of the flash device
 * @IO_ADDR_W: address to write the 8 I/O lines of the flash device
 * @select_chip: select/deselect a specific target/die
 * @read_byte: read one byte from the chip
 * @write_byte: write a single byte to the chip on the low 8 I/O lines
 * @write_buf: write data from the buffer to the chip
 * @read_buf: read data from the chip into the buffer
 * @cmd_ctrl: hardware specific function for controlling ALE/CLE/nCE. Also used
 *            to write command and address
 * @cmdfunc: hardware specific function for writing commands to the chip.
 * @dev_ready: hardware specific function for accessing device ready/busy line.
 *             If set to NULL no access to ready/busy is available and the
 *             ready/busy information is read from the chip status register.
 * @waitfunc: hardware specific function for wait on ready.
 * @block_bad: check if a block is bad, using OOB markers
 * @block_markbad: mark a block bad
 * @set_features: set the NAND chip features
 * @get_features: get the NAND chip features
 * @chip_delay: chip dependent delay for transferring data from array to read
 *              regs (tR).
 * @dummy_controller: dummy controller implementation for drivers that can
 *                    only control a single chip
 *
 * If you look at this structure you're already wrong. These fields/hooks are
 * all deprecated.
 */
struct nand_legacy {
        void __iomem *IO_ADDR_R;                              // address for reading the 8 I/O lines; on the S3C2440, the NFDATA data register
        void __iomem *IO_ADDR_W;                              // address for writing the 8 I/O lines; on the S3C2440, the NFDATA data register
        void (*select_chip)(struct nand_chip *chip, int cs);  // select/deselect the chip
        u8 (*read_byte)(struct nand_chip *chip);              // read one byte
        void (*write_byte)(struct nand_chip *chip, u8 byte);   // write one byte
        void (*write_buf)(struct nand_chip *chip, const u8 *buf, int len);  // write len bytes
        void (*read_buf)(struct nand_chip *chip, u8 *buf, int len);         // read len bytes
        void (*cmd_ctrl)(struct nand_chip *chip, int dat, unsigned int ctrl);  // hardware control hook: write a command or address
        void (*cmdfunc)(struct nand_chip *chip, unsigned command, int column,  // send a command with its column and page address
                        int page_addr);
        int (*dev_ready)(struct nand_chip *chip); // read the ready/busy state
        int (*waitfunc)(struct nand_chip *chip);  // wait until the chip is ready
        int (*block_bad)(struct nand_chip *chip, loff_t ofs);      // check whether a block is bad
        int (*block_markbad)(struct nand_chip *chip, loff_t ofs);  // mark a block bad
        int (*set_features)(struct nand_chip *chip, int feature_addr,
                            u8 *subfeature_para);
        int (*get_features)(struct nand_chip *chip, int feature_addr,
                            u8 *subfeature_para);
        int chip_delay;           // chip-dependent delay (tR)
        struct nand_controller dummy_controller;
};
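The division of labour between cmd_ctrl and cmdfunc can be sketched without hardware by replacing the controller registers with a mock that records what was written. The CTRL_* and CMD_NONE macros are local stand-ins for the kernel's control bits with assumed values, mock_nfc is hypothetical, and the five-cycle address sequence (two column cycles, three row cycles) is an assumption that fits a large-page chip; smaller parts use fewer cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's control bits (values assumed). */
#define CTRL_CLE  0x02   /* drive the byte as a command */
#define CTRL_ALE  0x04   /* drive the byte as an address */
#define CMD_NONE  (-1)   /* no byte to send, only a control change */

/* Mock controller: each "register" just remembers what was written. */
struct mock_nfc {
    uint8_t cmd_log[8];
    uint8_t addr_log[8];
    int ncmd;
    int naddr;
};

/* Same shape as a legacy cmd_ctrl hook, with the mock in place of nand_chip. */
static void mock_cmd_ctrl(struct mock_nfc *nfc, int dat, unsigned int ctrl)
{
    if (dat == CMD_NONE)
        return;
    if (ctrl & CTRL_CLE) {
        if (nfc->ncmd < 8)
            nfc->cmd_log[nfc->ncmd++] = (uint8_t)dat;
    } else if (ctrl & CTRL_ALE) {
        if (nfc->naddr < 8)
            nfc->addr_log[nfc->naddr++] = (uint8_t)dat;
    }
}

/* Issue a large-page read: 00h, 2 column cycles, 3 row cycles, 30h. */
static void mock_read_page_cmd(struct mock_nfc *nfc, int column, int page)
{
    mock_cmd_ctrl(nfc, 0x00, CTRL_CLE);
    mock_cmd_ctrl(nfc, column & 0xff, CTRL_ALE);
    mock_cmd_ctrl(nfc, (column >> 8) & 0xff, CTRL_ALE);
    mock_cmd_ctrl(nfc, page & 0xff, CTRL_ALE);
    mock_cmd_ctrl(nfc, (page >> 8) & 0xff, CTRL_ALE);
    mock_cmd_ctrl(nfc, (page >> 16) & 0xff, CTRL_ALE);
    mock_cmd_ctrl(nfc, 0x30, CTRL_CLE);
    mock_cmd_ctrl(nfc, CMD_NONE, 0);
}
```

In a real driver the two branches of mock_cmd_ctrl would write to the controller's command and address registers, and the data phase that follows would go through read_buf.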

2.3.6 struct nand_ecc_ctrl

The read/write hooks in nand_ecc_ctrl, such as read_page_raw and write_page_raw, implement the ECC-related operations:

/**
 * struct nand_ecc_ctrl - Control structure for ECC
 * @mode:       ECC mode
 * @algo:       ECC algorithm
 * @steps:      number of ECC steps per page
 * @size:       data bytes per ECC step
 * @bytes:      ECC bytes per step
 * @strength:   max number of correctible bits per ECC step
 * @total:      total number of ECC bytes per page
 * @prepad:     padding information for syndrome based ECC generators
 * @postpad:    padding information for syndrome based ECC generators
 * @options:    ECC specific options (see NAND_ECC_XXX flags defined above)
 * @priv:       pointer to private ECC control data
 * @calc_buf:   buffer for calculated ECC, size is oobsize.
 * @code_buf:   buffer for ECC read from flash, size is oobsize.
 * @hwctl:      function to control hardware ECC generator. Must only
 *              be provided if an hardware ECC is available
 * @calculate:  function for ECC calculation or readback from ECC hardware
 * @correct:    function for ECC correction, matching to ECC generator (sw/hw).
 *              Should return a positive number representing the number of
 *              corrected bitflips, -EBADMSG if the number of bitflips exceed
 *              ECC strength, or any other error code if the error is not
 *              directly related to correction.
 *              If -EBADMSG is returned the input buffers should be left
 *              untouched.
 * @read_page_raw:      function to read a raw page without ECC. This function
 *                      should hide the specific layout used by the ECC
 *                      controller and always return contiguous in-band and
 *                      out-of-band data even if they're not stored
 *                      contiguously on the NAND chip (e.g.
 *                      NAND_ECC_HW_SYNDROME interleaves in-band and
 *                      out-of-band data).
 * @write_page_raw:     function to write a raw page without ECC. This function
 *                      should hide the specific layout used by the ECC
 *                      controller and consider the passed data as contiguous
 *                      in-band and out-of-band data. ECC controller is
 *                      responsible for doing the appropriate transformations
 *                      to adapt to its specific layout (e.g.
 *                      NAND_ECC_HW_SYNDROME interleaves in-band and
 *                      out-of-band data).
 * @read_page:  function to read a page according to the ECC generator
 *              requirements; returns maximum number of bitflips corrected in
 *              any single ECC step, -EIO hw error
 * @read_subpage:       function to read parts of the page covered by ECC;
 *                      returns same as read_page()
 * @write_subpage:      function to write parts of the page covered by ECC.
 * @write_page: function to write a page according to the ECC generator
 *              requirements.
 * @write_oob_raw:      function to write chip OOB data without ECC
 * @read_oob_raw:       function to read chip OOB data without ECC
 * @read_oob:   function to read chip OOB data
 * @write_oob:  function to write chip OOB data
 */
struct nand_ecc_ctrl {
        nand_ecc_modes_t mode;
        enum nand_ecc_algo algo;
        int steps;
        int size;
        int bytes;
        int total;
        int strength;
        int prepad;
        int postpad;
        unsigned int options;
        void *priv;
        u8 *calc_buf;
        u8 *code_buf;
        void (*hwctl)(struct nand_chip *chip, int mode);
        int (*calculate)(struct nand_chip *chip, const uint8_t *dat,
                         uint8_t *ecc_code);
        int (*correct)(struct nand_chip *chip, uint8_t *dat, uint8_t *read_ecc,
                       uint8_t *calc_ecc);
        int (*read_page_raw)(struct nand_chip *chip, uint8_t *buf,
                             int oob_required, int page);
        int (*write_page_raw)(struct nand_chip *chip, const uint8_t *buf,
                              int oob_required, int page);
        int (*read_page)(struct nand_chip *chip, uint8_t *buf,
                         int oob_required, int page);
        int (*read_subpage)(struct nand_chip *chip, uint32_t offs,
                            uint32_t len, uint8_t *buf, int page);
        int (*write_subpage)(struct nand_chip *chip, uint32_t offset,
                             uint32_t data_len, const uint8_t *data_buf,
                             int oob_required, int page);
        int (*write_page)(struct nand_chip *chip, const uint8_t *buf,
                          int oob_required, int page);
        int (*write_oob_raw)(struct nand_chip *chip, int page);
        int (*read_oob_raw)(struct nand_chip *chip, int page);
        int (*read_oob)(struct nand_chip *chip, int page);
        int (*write_oob)(struct nand_chip *chip, int page);
};
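The sizing members documented above are related by simple arithmetic: steps is the page size divided by size, and total is steps times bytes. The following sketch shows that relationship with a reduced, hypothetical stand-in for nand_ecc_ctrl; the example figures (512 data bytes and 3 ECC bytes per step, 1-bit strength, 2048-byte page, 64-byte OOB) are assumptions chosen for illustration.

```c
#include <assert.h>

/* Reduced stand-in for the sizing members of nand_ecc_ctrl. */
struct ecc_layout {
    int size;      /* data bytes per ECC step */
    int bytes;     /* ECC bytes per step */
    int strength;  /* correctable bits per step */
    int steps;     /* derived: ECC steps per page */
    int total;     /* derived: ECC bytes per page */
};

/*
 * Fill in steps and total for a page of writesize bytes with oobsize
 * spare bytes. Returns 0 on success, -1 if the page is not a multiple
 * of the step size or the ECC bytes do not fit in the OOB area.
 */
static int ecc_layout_init(struct ecc_layout *e, int writesize, int oobsize)
{
    if (e->size <= 0 || writesize % e->size != 0)
        return -1;
    e->steps = writesize / e->size;
    e->total = e->steps * e->bytes;
    if (e->total > oobsize)
        return -1;
    return 0;
}

/* Upper bound on corrected bits per page, reached only when the
 * errors are spread evenly across the steps. */
static int ecc_max_correctable(const struct ecc_layout *e)
{
    return e->steps * e->strength;
}
```

Because correction works per step, two bit errors inside the same 512-byte step would exceed a 1-bit strength even though the page as a whole could in principle tolerate four.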

2.3.7 struct nand_manufacturer

nand_manufacturer holds the manufacturer information; it is defined in drivers/mtd/nand/raw/internals.h:

/*
 * NAND Flash Manufacturer ID Codes
 */
#define NAND_MFR_AMD            0x01
#define NAND_MFR_ATO            0x9b
#define NAND_MFR_EON            0x92
#define NAND_MFR_ESMT           0xc8
#define NAND_MFR_FUJITSU        0x04
#define NAND_MFR_HYNIX          0xad
#define NAND_MFR_INTEL          0x89
#define NAND_MFR_MACRONIX       0xc2
#define NAND_MFR_MICRON         0x2c
#define NAND_MFR_NATIONAL       0x8f
#define NAND_MFR_RENESAS        0x07
#define NAND_MFR_SAMSUNG        0xec   // Samsung
#define NAND_MFR_SANDISK        0x45
#define NAND_MFR_STMICRO        0x20
#define NAND_MFR_TOSHIBA        0x98
#define NAND_MFR_WINBOND        0xef

/**
 * struct nand_manufacturer_ops - NAND Manufacturer operations
 * @detect: detect the NAND memory organization and capabilities
 * @init: initialize all vendor specific fields (like the ->read_retry()
 *        implementation) if any.
 * @cleanup: the ->init() function may have allocated resources, ->cleanup()
 *           is here to let vendor specific code release those resources.
 * @fixup_onfi_param_page: apply vendor specific fixups to the ONFI parameter
 *                         page. This is called after the checksum is verified.
 */
struct nand_manufacturer_ops {
        void (*detect)(struct nand_chip *chip);
        int (*init)(struct nand_chip *chip);
        void (*cleanup)(struct nand_chip *chip);
        void (*fixup_onfi_param_page)(struct nand_chip *chip,
                                      struct nand_onfi_params *p);
};

/**
 * struct nand_manufacturer - NAND Flash Manufacturer structure
 * @name: Manufacturer name
 * @id: manufacturer ID code of device.
 * @ops: manufacturer operations
 */
struct nand_manufacturer {
        int id;   // manufacturer ID
        char *name;  // manufacturer name
        const struct nand_manufacturer_ops *ops; // manufacturer operations
};

2.3.8 struct nand_device

struct nand_device is defined in include/linux/mtd/nand.h:

/**
 * struct nand_device - NAND device
 * @mtd: MTD instance attached to the NAND device
 * @memorg: memory layout
 * @eccreq: ECC requirements
 * @rowconv: position to row address converter
 * @bbt: bad block table info
 * @ops: NAND operations attached to the NAND device
 *
 * Generic NAND object. Specialized NAND layers (raw NAND, SPI NAND, OneNAND)
 * should declare their own NAND object embedding a nand_device struct (that's
 * how inheritance is done).
 * struct_nand_device->memorg and struct_nand_device->eccreq should be filled
 * at device detection time to reflect the NAND device
 * capabilities/requirements. Once this is done nanddev_init() can be called.
 * It will take care of converting NAND information into MTD ones, which means
 * the specialized NAND layers should never manually tweak
 * struct_nand_device->mtd except for the ->_read/write() hooks.
 */
struct nand_device {
        struct mtd_info mtd;
        struct nand_memory_organization memorg;
        struct nand_ecc_req eccreq;
        struct nand_row_converter rowconv;
        struct nand_bbt bbt;
        const struct nand_ops *ops;
};

2.3.9 Structure relationship diagram

2.4 Core functions

If the MTD device has only one partition (the whole device), use the following two functions to register and unregister it:

int add_mtd_device(struct mtd_info *mtd)  
int del_mtd_device (struct mtd_info *mtd)

If the MTD device is split into several partitions, use the following two functions to register and unregister it:

int add_mtd_partitions(struct mtd_info *master,const struct mtd_partition *parts,int nbparts)  
int del_mtd_partitions(struct mtd_info *master)


3. MTD device registration

3.1 add_mtd_device

add_mtd_device is defined in drivers/mtd/mtdcore.c:

/**
 *      add_mtd_device - register an MTD device
 *      @mtd: pointer to new MTD device info structure
 *
 *      Add a device to the list of MTD devices present in the system, and
 *      notify each currently active MTD 'user' of its arrival. Returns
 *      zero on success or non-zero on failure.
 */

int add_mtd_device(struct mtd_info *mtd)
{
        struct mtd_notifier *not;
        int i, error;

        /*
         * May occur, for instance, on buggy drivers which call
         * mtd_device_parse_register() multiple times on the same master MTD,
         * especially with CONFIG_MTD_PARTITIONED_MASTER=y.
         */
        if (WARN_ONCE(mtd->dev.type, "MTD already registered\n"))
                return -EEXIST;

        BUG_ON(mtd->writesize == 0);

        /*
         * MTD drivers should implement ->_{write,read}() or
         * ->_{write,read}_oob(), but not both.
         */
        if (WARN_ON((mtd->_write && mtd->_write_oob) ||  // validate the function pointers
                    (mtd->_read && mtd->_read_oob)))
                return -EINVAL;

        if (WARN_ON((!mtd->erasesize || !mtd->_erase) &&
                    !(mtd->flags & MTD_NO_ERASE)))
                return -EINVAL;

        mutex_lock(&mtd_table_mutex);  // take the MTD table mutex

        i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL); // allocate an index for the MTD device
        if (i < 0) {
                error = i;
                goto fail_locked;
        }

        mtd->index = i;
        mtd->usecount = 0;

        /* default value if not set by driver */
        if (mtd->bitflip_threshold == 0)    // default bitflip threshold: the ECC strength
                mtd->bitflip_threshold = mtd->ecc_strength;
        if (is_power_of_2(mtd->erasesize))  // compute the erase-size shift
                mtd->erasesize_shift = ffs(mtd->erasesize) - 1;
        else
                mtd->erasesize_shift = 0;

        if (is_power_of_2(mtd->writesize))    // compute the write-size shift
                mtd->writesize_shift = ffs(mtd->writesize) - 1;
        else
                mtd->writesize_shift = 0;

        mtd->erasesize_mask = (1 << mtd->erasesize_shift) - 1;  // erase-size mask
        mtd->writesize_mask = (1 << mtd->writesize_shift) - 1;  // write-size mask

        /* Some chips always power up locked. Unlock them now */
        if ((mtd->flags & MTD_WRITEABLE) && (mtd->flags & MTD_POWERUP_LOCK)) { // some chips always power up locked, so unlock them now (most flash chips support locking, but drivers rarely use it)
                error = mtd_unlock(mtd, 0, mtd->size);
                if (error && error != -EOPNOTSUPP)
                        printk(KERN_WARNING
                               "%s: unlock failed, writes may not work\n",
                               mtd->name);
                /* Ignore unlock failures? */
                error = 0;
        }

        /* Caller should have set dev.parent to match the
         * physical device, if appropriate.
         */
        mtd->dev.type = &mtd_devtype;  // set the device type
        mtd->dev.class = &mtd_class;   // set the device class; the mtd class appears under /sys/class
        mtd->dev.devt = MTD_DEVT(i);   // set the device number; the character major itself is registered in the mtdchar.c module entry function
        dev_set_name(&mtd->dev, "mtd%d", i);  // set the device name to mtd%d
        dev_set_drvdata(&mtd->dev, mtd);      // mtd->dev.driver_data = mtd;
        of_node_get(mtd_get_of_node(mtd));
        error = device_register(&mtd->dev);   // register the MTD character device: creates mtd%d under /sys/class/mtd, from which mdev creates the /dev/mtd%d node
        if (error)
                goto fail_added;

        /* Add the nvmem provider */
        error = mtd_nvmem_add(mtd);
        if (error)
                goto fail_nvmem_add;

        if (!IS_ERR_OR_NULL(dfs_dir_mtd)) {
                mtd->dbg.dfs_dir = debugfs_create_dir(dev_name(&mtd->dev), dfs_dir_mtd);
                if (IS_ERR_OR_NULL(mtd->dbg.dfs_dir)) {
                        pr_debug("mtd device %s won't show data in debugfs\n",
                                 dev_name(&mtd->dev));
                }
        }

        device_create(&mtd_class, mtd->dev.parent, MTD_DEVT(i) + 1, NULL,   // create the read-only MTD character device (device_create calls device_register internally): mtd%dro appears under /sys/class/mtd and mdev creates /dev/mtd%dro
                      "mtd%dro", i);

        pr_debug("mtd: Giving out device %d to %s\n", i, mtd->name);
        /* No need to get a refcount on the module containing
           the notifier, since we hold the mtd_table_mutex */
        list_for_each_entry(not, &mtd_notifiers, list)  // notify every registered MTD user (notifier) that a device was added
                not->add(mtd);

        mutex_unlock(&mtd_table_mutex);                 // release the mutex
        /* We _know_ we aren't being removed, because
           our caller is still holding us here. So none
           of this try_ nonsense, and no bitching about it
           either. :) */
        __module_get(THIS_MODULE);
        return 0;

fail_nvmem_add:
        device_unregister(&mtd->dev);
fail_added:
        of_node_put(mtd_get_of_node(mtd));
        idr_remove(&mtd_idr, i);
fail_locked:
        mutex_unlock(&mtd_table_mutex);
        return error;
}

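This function mainly does the following:

(1) Validates the required fields and function pointers of the raw MTD device;

(2) Allocates a node for the raw MTD device in the mtd_idr tree and returns the allocated ID:

 i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL); // allocate an ID; mtd_idr is a radix tree, and mtd is bound to the new ID

idr_alloc adds a node to the mtd_idr tree. The node gets an ID that is unique within the tree and is associated with mtd, so mtd can later be looked up by that ID.

The third and fourth arguments are the start and the end of the allowed ID range. An end value of 0 means there is no upper bound other than the largest ID that the IDR supports.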

The global variable mtd_idr is defined in drivers/mtd/mtdcore.c:

static DEFINE_IDR(mtd_idr);

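The definition of IDR is not covered here. Its main job is to bind an ID to a data structure; for details see the article linux內核IDR機制詳解（一）.

The ID is needed later when the character and block devices are registered. For example, the device number of the struct device embedded in the MTD device is set to MTD_DEVT(i):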

#define  MTD_DEVT(index)  MKDEV(MTD_CHAR_MAJOR, (index)*2)

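The major number is MTD_CHAR_MAJOR (90) and the minor number is index*2. The odd minor index*2 + 1 is used by the read-only mtd%dro node, which is created later with MTD_DEVT(i) + 1;

(3) Sets erasesize_shift, writesize_shift, erasesize_mask and writesize_mask of the raw MTD device;

(4) For chips that are writable and power up locked, calls the unlock interface (most Flash chips support a lock mechanism, but drivers rarely use it);

(5) Sets the class of the struct device embedded in the raw MTD device to mtd_class, and sets its device number, type, name and driver_data;

mtd_class is defined as: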

static struct class mtd_class = {
        .name = "mtd",
        .owner = THIS_MODULE,
        .pm = MTD_CLS_PM_OPS,
};

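(6) Calls device_register to register the MTD character device named mtd%d;

(7) Calls device_create to create, initialize and register the MTD character device named mtd%dro;

(8) Walks the MTD notifier list and calls the add hook of every registered notifier, so that each MTD user learns about the new device: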

list_for_each_entry(not, &mtd_notifiers, list) 
      not->add(mtd);

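list_for_each_entry takes three arguments, in order pos, head and member. It is really a for loop that uses pos as the loop cursor, starts at the list head, and advances pos entry by entry (in the next direction) until it arrives back at head.

The list mtd_notifiers is defined as: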

static LIST_HEAD(mtd_notifiers);

So the loop visits each element not, of type struct mtd_notifier, and calls not->add(mtd). It is inside this callback that the MTD block device named mtdblock%d gets registered.

3.2 add_mtd_partitions

add_mtd_partitions is defined in drivers/mtd/mtdpart.c:

/*
 * This function, given a master MTD object and a partition table, creates
 * and registers slave MTD objects which are bound to the master according to
 * the partition definitions.
 *
 * For historical reasons, this function's caller only registers the master
 * if the MTD_PARTITIONED_MASTER config option is set.
 */

int add_mtd_partitions(struct mtd_info *master,  // the master MTD device
                       const struct mtd_partition *parts,  // partition table
                       int nbparts) // number of partitions
{
        struct mtd_part *slave;
        uint64_t cur_offset = 0;
        int i, ret;

        printk(KERN_NOTICE "Creating %d MTD partitions on \"%s\":\n", nbparts, master->name);

        for (i = 0; i < nbparts; i++) {   // walk the partition table
                slave = allocate_partition(master, parts + i, i, cur_offset);   // allocate and fill a struct mtd_part
                if (IS_ERR(slave)) {
                        ret = PTR_ERR(slave);
                        goto err_del_partitions;
                }

                mutex_lock(&mtd_partitions_mutex);
                list_add(&slave->list, &mtd_partitions);  // add slave to the mtd_partitions list
                mutex_unlock(&mtd_partitions_mutex);

                ret = add_mtd_device(&slave->mtd);  // register an MTD device for each partition; the mtdblock notifier then creates the mtdblock%d block device node under /dev
                if (ret) {
                        mutex_lock(&mtd_partitions_mutex);
                        list_del(&slave->list);
                        mutex_unlock(&mtd_partitions_mutex);

                        free_partition(slave);
                        goto err_del_partitions;
                }

                mtd_add_partition_attrs(slave);
                /* Look for subpartitions */
                parse_mtd_partitions(&slave->mtd, parts[i].types, NULL);

                cur_offset = slave->offset + slave->mtd.size;
        }

        return 0;

err_del_partitions:
        del_mtd_partitions(master);

        return ret;
}

3.2.1 allocate_partition

allocate_partition is defined in drivers/mtd/mtdpart.c:

static struct mtd_part *allocate_partition(struct mtd_info *parent,
                        const struct mtd_partition *part, int partno,
                        uint64_t cur_offset)
{
        int wr_alignment = (parent->flags & MTD_NO_ERASE) ? parent->writesize :
                                                            parent->erasesize;
        struct mtd_part *slave;
        u32 remainder;
        char *name;
        u64 tmp;

        /* allocate the partition structure */
        slave = kzalloc(sizeof(*slave), GFP_KERNEL);
        name = kstrdup(part->name, GFP_KERNEL);
        if (!name || !slave) {
                printk(KERN_ERR"memory allocation error while creating partitions for \"%s\"\n",
                       parent->name);
                kfree(name);
                kfree(slave);
                return ERR_PTR(-ENOMEM);
        }

        /* set up the MTD object for this partition */
        slave->mtd.type = parent->type;
        slave->mtd.flags = parent->orig_flags & ~part->mask_flags;
        slave->mtd.orig_flags = slave->mtd.flags;
        slave->mtd.size = part->size;
        slave->mtd.writesize = parent->writesize;
        slave->mtd.writebufsize = parent->writebufsize;
        slave->mtd.oobsize = parent->oobsize;
        slave->mtd.oobavail = parent->oobavail;
        slave->mtd.subpage_sft = parent->subpage_sft;
        slave->mtd.pairing = parent->pairing;

        slave->mtd.name = name;
        slave->mtd.owner = parent->owner;

        /* NOTE: Historically, we didn't arrange MTDs as a tree out of
         * concern for showing the same data in multiple partitions.
         * However, it is very useful to have the master node present,
         * so the MTD_PARTITIONED_MASTER option allows that. The master
         * will have device nodes etc only if this is set, so make the
         * parent conditional on that option. Note, this is a way to
         * distinguish between the master and the partition in sysfs.
         */
        slave->mtd.dev.parent = IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER) || mtd_is_partition(parent) ?
                                &parent->dev :
                                parent->dev.parent;
        slave->mtd.dev.of_node = part->of_node;

        if (parent->_read)
                slave->mtd._read = part_read;
        if (parent->_write)
                slave->mtd._write = part_write;
        if (parent->_panic_write)
                slave->mtd._panic_write = part_panic_write;

        if (parent->_point && parent->_unpoint) {
                slave->mtd._point = part_point;
                slave->mtd._unpoint = part_unpoint;
        }

        if (parent->_read_oob)
                slave->mtd._read_oob = part_read_oob;
        if (parent->_write_oob)
                slave->mtd._write_oob = part_write_oob;
        if (parent->_read_user_prot_reg)
                slave->mtd._read_user_prot_reg = part_read_user_prot_reg;
        if (parent->_read_fact_prot_reg)
                slave->mtd._read_fact_prot_reg = part_read_fact_prot_reg;
        if (parent->_write_user_prot_reg)
                slave->mtd._write_user_prot_reg = part_write_user_prot_reg;
        if (parent->_lock_user_prot_reg)
                slave->mtd._lock_user_prot_reg = part_lock_user_prot_reg;
        if (parent->_get_user_prot_info)
                slave->mtd._get_user_prot_info = part_get_user_prot_info;
        if (parent->_get_fact_prot_info)
                slave->mtd._get_fact_prot_info = part_get_fact_prot_info;
        if (parent->_sync)
                slave->mtd._sync = part_sync;
        if (!partno && !parent->dev.class && parent->_suspend &&
            parent->_resume) {
                slave->mtd._suspend = part_suspend;
                slave->mtd._resume = part_resume;
        }
        if (parent->_writev)
                slave->mtd._writev = part_writev;
        if (parent->_lock)
                slave->mtd._lock = part_lock;
        if (parent->_unlock)
                slave->mtd._unlock = part_unlock;
        if (parent->_is_locked)
                slave->mtd._is_locked = part_is_locked;
        if (parent->_block_isreserved)
                slave->mtd._block_isreserved = part_block_isreserved;
        if (parent->_block_isbad)
                slave->mtd._block_isbad = part_block_isbad;
        if (parent->_block_markbad)
                slave->mtd._block_markbad = part_block_markbad;
        if (parent->_max_bad_blocks)
                slave->mtd._max_bad_blocks = part_max_bad_blocks;

        if (parent->_get_device)
                slave->mtd._get_device = part_get_device;
        if (parent->_put_device)
                slave->mtd._put_device = part_put_device;

        slave->mtd._erase = part_erase;
        slave->parent = parent;
        slave->offset = part->offset;

        if (slave->offset == MTDPART_OFS_APPEND)
                slave->offset = cur_offset;
        if (slave->offset == MTDPART_OFS_NXTBLK) {
                tmp = cur_offset;
                slave->offset = cur_offset;
                remainder = do_div(tmp, wr_alignment);
                if (remainder) {
                        slave->offset += wr_alignment - remainder;
                        printk(KERN_NOTICE "Moving partition %d: "
                               "0x%012llx -> 0x%012llx\n", partno,
                               (unsigned long long)cur_offset, (unsigned long long)slave->offset);
                }
        }
        if (slave->offset == MTDPART_OFS_RETAIN) {
                slave->offset = cur_offset;
                if (parent->size - slave->offset >= slave->mtd.size) {
                        slave->mtd.size = parent->size - slave->offset
                                                        - slave->mtd.size;
                } else {
                        printk(KERN_ERR "mtd partition \"%s\" doesn't have enough space: %#llx < %#llx, disabled\n",
                                part->name, parent->size - slave->offset,
                                slave->mtd.size);
                        /* register to preserve ordering */
                        goto out_register;
                }
        }
        if (slave->mtd.size == MTDPART_SIZ_FULL)
                slave->mtd.size = parent->size - slave->offset;

        printk(KERN_NOTICE "0x%012llx-0x%012llx : \"%s\"\n", (unsigned long long)slave->offset,
                (unsigned long long)(slave->offset + slave->mtd.size), slave->mtd.name);

        /* let's do some sanity checks */
        if (slave->offset >= parent->size) {
                /* let's register it anyway to preserve ordering */
                slave->offset = 0;
                slave->mtd.size = 0;

                /* Initialize ->erasesize to make add_mtd_device() happy. */
                slave->mtd.erasesize = parent->erasesize;

                printk(KERN_ERR"mtd: partition \"%s\" is out of reach -- disabled\n",
                        part->name);
                goto out_register;
        }
        if (slave->offset + slave->mtd.size > parent->size) {
                slave->mtd.size = parent->size - slave->offset;
                printk(KERN_WARNING"mtd: partition \"%s\" extends beyond the end of device \"%s\" -- size truncated to %#llx\n",
                        part->name, parent->name, (unsigned long long)slave->mtd.size);
        }
        if (parent->numeraseregions > 1) {
                /* Deal with variable erase size stuff */
                int i, max = parent->numeraseregions;
                u64 end = slave->offset + slave->mtd.size;
                struct mtd_erase_region_info *regions = parent->eraseregions;

                /* Find the first erase regions which is part of this
                 * partition. */
                for (i = 0; i < max && regions[i].offset <= slave->offset; i++)
                        ;
                /* The loop searched for the region _behind_ the first one */
                if (i > 0)
                        i--;

                /* Pick biggest erasesize */
                for (; i < max && regions[i].offset < end; i++) {
                        if (slave->mtd.erasesize < regions[i].erasesize) {
                                slave->mtd.erasesize = regions[i].erasesize;
                        }
                }
                BUG_ON(slave->mtd.erasesize == 0);
        } else {
                /* Single erase size */
                slave->mtd.erasesize = parent->erasesize;
        }

        /*
         * Slave erasesize might differ from the master one if the master
         * exposes several regions with different erasesize. Adjust
         * wr_alignment accordingly.
         */
        if (!(slave->mtd.flags & MTD_NO_ERASE))
                wr_alignment = slave->mtd.erasesize;

        tmp = part_absolute_offset(parent) + slave->offset;
        remainder = do_div(tmp, wr_alignment);
        if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) {
                /* Doesn't start on a boundary of major erase size */
                /* FIXME: Let it be writable if it is on a boundary of
                 * _minor_ erase size though */
                slave->mtd.flags &= ~MTD_WRITEABLE;
                printk(KERN_WARNING"mtd: partition \"%s\" doesn't start on an erase/write block boundary -- force read-only\n",
                        part->name);
        }

        tmp = part_absolute_offset(parent) + slave->mtd.size;
        remainder = do_div(tmp, wr_alignment);
        if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) {
                slave->mtd.flags &= ~MTD_WRITEABLE;
                printk(KERN_WARNING"mtd: partition \"%s\" doesn't end on an erase/write block -- force read-only\n",
                        part->name);
        }

        mtd_set_ooblayout(&slave->mtd, &part_ooblayout_ops);
        slave->mtd.ecc_step_size = parent->ecc_step_size;
        slave->mtd.ecc_strength = parent->ecc_strength;
        slave->mtd.bitflip_threshold = parent->bitflip_threshold;
        if (parent->_block_isbad) {
                uint64_t offs = 0;

                while (offs < slave->mtd.size) {
                        if (mtd_block_isreserved(parent, offs + slave->offset))
                                slave->mtd.ecc_stats.bbtblocks++;
                        else if (mtd_block_isbad(parent, offs + slave->offset))
                                slave->mtd.ecc_stats.badblocks++;
                        offs += slave->mtd.erasesize;
                }
        }

out_register:
        return slave;
}

3.2.2 mtd_partitions

The list mtd_partitions is defined in drivers/mtd/mtdpart.c:

static LIST_HEAD(mtd_partitions);

3.3 mtd_device_register

The macro mtd_device_register is defined in include/linux/mtd/mtd.h:

#define mtd_device_register(master, parts, nr_parts)    \
        mtd_device_parse_register(master, NULL, NULL, parts, nr_parts)

The function mtd_device_parse_register is defined in drivers/mtd/mtdcore.c:

/**
 * mtd_device_parse_register - parse partitions and register an MTD device.
 *
 * @mtd: the MTD device to register
 * @types: the list of MTD partition probes to try, see
 *         'parse_mtd_partitions()' for more information
 * @parser_data: MTD partition parser-specific data
 * @parts: fallback partition information to register, if parsing fails;
 *         only valid if %nr_parts > %0
 * @nr_parts: the number of partitions in parts, if zero then the full
 *            MTD device is registered if no partition info is found
 *
 * This function aggregates MTD partitions parsing (done by
 * 'parse_mtd_partitions()') and MTD device and partitions registering. It
 * basically follows the most common pattern found in many MTD drivers:
 *
 * * If the MTD_PARTITIONED_MASTER option is set, then the device as a whole is
 *   registered first.
 * * Then It tries to probe partitions on MTD device @mtd using parsers
 *   specified in @types (if @types is %NULL, then the default list of parsers
 *   is used, see 'parse_mtd_partitions()' for more information). If none are
 *   found this functions tries to fallback to information specified in
 *   @parts/@nr_parts.
 * * If no partitions were found this function just registers the MTD device
 *   @mtd and exits.
 *
 * Returns zero in case of success and a negative error code in case of failure.
 */
int mtd_device_parse_register(struct mtd_info *mtd, const char * const *types,
                              struct mtd_part_parser_data *parser_data,
                              const struct mtd_partition *parts, // partition table
                              int nr_parts)  // number of partitions
{
        int ret;

        mtd_set_dev_defaults(mtd);

        if (IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER)) {  // also register the whole Nand Flash as a single MTD device
                ret = add_mtd_device(mtd);   // register the MTD device
                if (ret)
                        return ret;
        }

        /* Prefer parsed partitions over driver-provided fallback */
        ret = parse_mtd_partitions(mtd, types, parser_data);
        if (ret > 0)
                ret = 0;
        else if (nr_parts)  // fall back to the driver-provided partition table
                ret = add_mtd_partitions(mtd, parts, nr_parts);
        else if (!device_is_registered(&mtd->dev))
                ret = add_mtd_device(mtd);
        else
                ret = 0;

        if (ret)
                goto out;
        /*
         * FIXME: some drivers unfortunately call this function more than once.
         * So we have to check if we've already assigned the reboot notifier.
         *
         * Generally, we can make multiple calls work for most cases, but it
         * does cause problems with parse_mtd_partitions() above (e.g.,
         * cmdlineparts will register partitions more than once).
         */
        WARN_ONCE(mtd->_reboot && mtd->reboot_notifier.notifier_call,
                  "MTD already registered\n");
        if (mtd->_reboot && !mtd->reboot_notifier.notifier_call) {
                mtd->reboot_notifier.notifier_call = mtd_reboot_notifier;
                register_reboot_notifier(&mtd->reboot_notifier);
        }

out:
        if (ret && device_is_registered(&mtd->dev))
                del_mtd_device(mtd);  // unregister the MTD device

        return ret;
}


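4. mtdblock.c

We introduced mtdblock.c earlier; it implements the MTD block device interface. Let's go straight to drivers/mtd/mtdblock.c and walk through the source.

4.1 Module entry function

We start with the entry function of the MTD block device module: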

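static struct mtd_blktrans_ops mtdblock_tr = {  // MTD block device information and operations
        .name           = "mtdblock",
        .major          = MTD_BLOCK_MAJOR,   // MTD block device major number, 31
        .part_bits      = 0,                 // number of minor bits reserved for partitions: 0 means no partitions, 1 means 2 minors per disk, 2 means 4...
        .blksize        = 512,               // sector size
        .open           = mtdblock_open,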
        .flush          = mtdblock_flush,
        .release        = mtdblock_release,
        .readsect       = mtdblock_readsect,
        .writesect      = mtdblock_writesect,
        .add_mtd        = mtdblock_add_mtd,
        .remove_dev     = mtdblock_remove_dev,
        .owner          = THIS_MODULE,
};

static int __init init_mtdblock(void)
{
        return register_mtd_blktrans(&mtdblock_tr);
}

4.2 register_mtd_blktrans

Next, register_mtd_blktrans, which lives in drivers/mtd/mtd_blkdevs.c:

int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
{
        struct mtd_info *mtd;
        int ret;

        /* Register the notifier if/when the first device type is
           registered, to prevent the link/init ordering from fucking
           us over. */
        if (!blktrans_notifier.list.next)  // next is NULL on the first call, so the branch is taken
                register_mtd_user(&blktrans_notifier);  // add blktrans_notifier to the mtd_notifiers list

        mutex_lock(&mtd_table_mutex);

        ret = register_blkdev(tr->major, tr->name);   // register the block device; the major number is MTD_BLOCK_MAJOR (31)
        if (ret < 0) {
                printk(KERN_WARNING "Unable to register %s block device on major %d: %d\n",
                       tr->name, tr->major, ret);
                mutex_unlock(&mtd_table_mutex);
                return ret;
        }

        if (ret)
                tr->major = ret;

        tr->blkshift = ffs(tr->blksize) - 1;

        INIT_LIST_HEAD(&tr->devs);
        list_add(&tr->list, &blktrans_majors);  // add tr to the blktrans_majors list

        mtd_for_each_device(mtd)
                if (mtd->type != MTD_ABSENT)
                        tr->add_mtd(tr, mtd);

        mutex_unlock(&mtd_table_mutex);
        return 0;
}

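The function has three main parts:

It calls register_mtd_user, which adds blktrans_notifier to the mtd_notifiers list and then walks the global mtd_idr, calling blktrans_notify_add(mtd) for every MTD device found;
It calls register_blkdev to register the block device, with major number 31 and name mtdblock;
It adds mtdblock_tr to the blktrans_majors list (defined as static LIST_HEAD(blktrans_majors);), then walks the global mtd_idr again and calls mtdblock_add_mtd(&mtdblock_tr, mtd) for every MTD device.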

4.2.1 mtd_notifier

struct mtd_notifier is defined in include/linux/mtd/mtd.h:

struct mtd_notifier {
        void (*add)(struct mtd_info *mtd);
        void (*remove)(struct mtd_info *mtd);
        struct list_head list;
};

4.2.2 blktrans_notifier

Now look at the call register_mtd_user(&blktrans_notifier). The variable blktrans_notifier is defined in drivers/mtd/mtd_blkdevs.c:

static struct mtd_notifier blktrans_notifier = {
        .add = blktrans_notify_add,
        .remove = blktrans_notify_remove,
};

4.2.3 register_mtd_user

register_mtd_user adds new->list to the mtd_notifiers list:

/**
 *      register_mtd_user - register a 'user' of MTD devices.
 *      @new: pointer to notifier info structure
 *
 *      Registers a pair of callbacks function to be called upon addition
 *      or removal of MTD devices. Causes the 'add' callback to be immediately
 *      invoked for each MTD device currently present in the system.
 */
void register_mtd_user (struct mtd_notifier *new)
{
        struct mtd_info *mtd;

        mutex_lock(&mtd_table_mutex);           // take the mutex

        list_add(&new->list, &mtd_notifiers);   // add to the list

        __module_get(THIS_MODULE);

        mtd_for_each_device(mtd)       // walk mtd_idr to get each mtd
                new->add(mtd);     // for blktrans_notifier this ends up in blktrans_notify_add(mtd)

        mutex_unlock(&mtd_table_mutex);       // release the mutex
}

4.2.4 mtd_for_each_device

The macro mtd_for_each_device is defined in drivers/mtd/mtdcore.h:

#define mtd_for_each_device(mtd)                        \
        for ((mtd) = __mtd_next_device(0);              \
             (mtd) != NULL;                             \
             (mtd) = __mtd_next_device(mtd->index + 1))

__mtd_next_device is defined in drivers/mtd/mtdcore.c:

struct mtd_info *__mtd_next_device(int i)
{
        return idr_get_next(&mtd_idr, &i);
}

This simply walks every node of the mtd_idr radix tree and yields the mtd associated with each node.

4.2.5 blktrans_notify_add

Control then reaches blktrans_notify_add(), the add callback of blktrans_notifier:

static void blktrans_notify_add(struct mtd_info *mtd)
{
        struct mtd_blktrans_ops *tr;

        if (mtd->type == MTD_ABSENT)
                return;

        list_for_each_entry(tr, &blktrans_majors, list)   // walk the blktrans_majors list
                tr->add_mtd(tr, mtd);  // call add_mtd of each struct mtd_blktrans_ops
}

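The entry function of the MTD block driver adds mtdblock_tr to the blktrans_majors list, so the tr found by this loop is mtdblock_tr (assuming no other translation layer has been registered), and the call becomes mtdblock_tr.add_mtd(&mtdblock_tr, mtd).

The add_mtd member of mtdblock_tr is mtdblock_add_mtd.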

4.2.6 mtdblock_add_mtd

static void mtdblock_add_mtd(struct mtd_blktrans_ops *tr, struct mtd_info *mtd)
{
        struct mtdblk_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

        if (!dev)
                return;

        dev->mbd.mtd = mtd;             // the underlying raw MTD device
        dev->mbd.devnum = mtd->index;   // device number, used to derive the first minor number

        dev->mbd.size = mtd->size >> 9;  // total number of 512-byte sectors
        dev->mbd.tr = tr;

        if (!(mtd->flags & MTD_WRITEABLE))
                dev->mbd.readonly = 1;

        if (add_mtd_blktrans_dev(&dev->mbd))
                kfree(dev);
}

mtdblock_add_mtd does the following:

Allocates a struct mtdblk_dev variable dev;
Initializes the members of dev;
Calls add_mtd_blktrans_dev(&dev->mbd);

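struct mtdblk_dev describes one MTD block device and contains the raw MTD device. It is defined in drivers/mtd/mtdblock.c; the struct mtd_blktrans_dev that it embeds, shown after it, comes from include/linux/mtd/blktrans.h: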

struct mtdblk_dev {
        struct mtd_blktrans_dev mbd;
        int count;
        struct mutex cache_mutex;
        unsigned char *cache_data;
        unsigned long cache_offset;
        unsigned int cache_size;
        enum { STATE_EMPTY, STATE_CLEAN, STATE_DIRTY } cache_state;
};

struct mtd_blktrans_dev {
        struct mtd_blktrans_ops *tr;    // MTD block device information and operations
        struct list_head list;
        struct mtd_info *mtd;     // the raw MTD device
        struct mutex lock;
        int devnum;                // used to compute the first minor number (devnum << tr->part_bits; with part_bits = 0 that is devnum itself). With partition bits reserved, the first minor denotes the whole disk and its partitions take first minor + 1, first minor + 2, ...
        bool bg_stop;
        unsigned long size;         // number of sectors
        int readonly;
        int open;
        struct kref ref;
        struct gendisk *disk;          // the disk device
        struct attribute_group *disk_attributes;
        struct request_queue *rq;       // request queue
        struct list_head rq_list;
        struct blk_mq_tag_set *tag_set;  // blk-mq tag set
        spinlock_t queue_lock;
        void *priv;
        fmode_t file_mode;
};

4.2.7 add_mtd_blktrans_dev

add_mtd_blktrans_dev is defined in drivers/mtd/mtd_blkdevs.c:

int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
{
        struct mtd_blktrans_ops *tr = new->tr;
        struct mtd_blktrans_dev *d;
        int last_devnum = -1;
        struct gendisk *gd;
        int ret;

        if (mutex_trylock(&mtd_table_mutex)) {
                mutex_unlock(&mtd_table_mutex);
                BUG();
        }

        mutex_lock(&blktrans_ref_mutex);
        list_for_each_entry(d, &tr->devs, list) {   // tr->devs is a list of mtd_blktrans_dev, kept sorted by devnum
                if (new->devnum == -1) {            // new has no devnum yet: pick the first free one, counting up from 0
                        /* Use first free number */
                        if (d->devnum != last_devnum+1) {
                                /* Found a free devnum. Plug it in here */
                                new->devnum = last_devnum+1;          // the new devnum
                                list_add_tail(&new->list, &d->list);  // insert new just before d, keeping the list sorted
                                goto added;
                        }
                } else if (d->devnum == new->devnum) {   // the devnum requested by new is already taken
                        /* Required number taken */
                        mutex_unlock(&blktrans_ref_mutex);
                        return -EBUSY;
                } else if (d->devnum > new->devnum) {
                        /* Required number was free */
                        list_add_tail(&new->list, &d->list);
                        goto added;
                }
                last_devnum = d->devnum;  // 更新最新設備分配的次設備號
        }

        ret = -EBUSY;
        if (new->devnum == -1)
                new->devnum = last_devnum+1;

        /* Check that the device and any partitions will get valid
         * minor numbers and that the disk naming code below can cope
         * with this number. */
        if (new->devnum > (MINORMASK >> tr->part_bits) ||
            (tr->part_bits && new->devnum >= 27 * 26)) {
                mutex_unlock(&blktrans_ref_mutex);
                goto error1;
        }

        list_add_tail(&new->list, &tr->devs);
 added:
        mutex_unlock(&blktrans_ref_mutex);

        mutex_init(&new->lock);
        kref_init(&new->ref);
        if (!tr->writesect)
                new->readonly = 1;

        /* Create gendisk */
        ret = -ENOMEM;
        gd = alloc_disk(1 << tr->part_bits);  // allocate a gendisk with 1 << part_bits minors (whole disk plus partitions)

        if (!gd)
                goto error2;

        new->disk = gd;
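        gd->private_data = new;  // private data
        gd->major = tr->major;   // major number
        gd->first_minor = (new->devnum) << tr->part_bits;  // first minor number
        gd->fops = &mtd_block_ops;  // block device operations

        if (tr->part_bits)   // part_bits is 0 for mtdblock, so the final else branch is taken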
                if (new->devnum < 26)
                        snprintf(gd->disk_name, sizeof(gd->disk_name),
                                 "%s%c", tr->name, 'a' + new->devnum);
                else
                        snprintf(gd->disk_name, sizeof(gd->disk_name),
                                 "%s%c%c", tr->name,
                                 'a' - 1 + new->devnum / 26,
                                 'a' + new->devnum % 26);
        else     // disk name, i.e. /dev/mtdblock%d
                snprintf(gd->disk_name, sizeof(gd->disk_name),
                         "%s%d", tr->name, new->devnum);

        set_capacity(gd, ((u64)new->size * tr->blksize) >> 9);  // capacity in 512-byte sectors

        /* Create the request queue */
        spin_lock_init(&new->queue_lock);
        INIT_LIST_HEAD(&new->rq_list);

        new->tag_set = kzalloc(sizeof(*new->tag_set), GFP_KERNEL);
        if (!new->tag_set)
                goto error3;

        new->rq = blk_mq_init_sq_queue(new->tag_set, &mtd_mq_ops, 2,
                                BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING);  // create the request queue; mtd_mq_ops supplies the driver callbacks
        if (IS_ERR(new->rq)) {
                ret = PTR_ERR(new->rq);
                new->rq = NULL;
                goto error4;
        }

        if (tr->flush)
                blk_queue_write_cache(new->rq, true, false);

        new->rq->queuedata = new;
        blk_queue_logical_block_size(new->rq, tr->blksize);

        blk_queue_flag_set(QUEUE_FLAG_NONROT, new->rq);
        blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq);

        if (tr->discard) {
                blk_queue_flag_set(QUEUE_FLAG_DISCARD, new->rq);
                blk_queue_max_discard_sectors(new->rq, UINT_MAX);
        }

        gd->queue = new->rq;  // attach the request queue

        if (new->readonly)
                set_disk_ro(gd, 1);

        device_add_disk(&new->mtd->dev, gd, NULL);  // register the gendisk with the kernel

        if (new->disk_attributes) {
                ret = sysfs_create_group(&disk_to_dev(gd)->kobj,
                                        new->disk_attributes);
                WARN_ON(ret);
        }
        return 0;
error4:
        kfree(new->tag_set);
error3:
        put_disk(new->disk);
error2:
        list_del(&new->list);
error1:
        return ret;
}

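From this function we can see that gd->major is always taken from tr->major, so however many MTD block devices are registered, they all share major number 31 and differ only in their minor numbers. The major number identifies a particular driver; the minor number identifies the individual devices handled by that driver.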
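
The two pieces of bookkeeping in add_mtd_blktrans_dev that are easiest to misread are the "first free devnum" search and the disk naming. Below is a simplified user-space sketch of both, assuming the list of existing devices is represented as a sorted array. The function names first_free_devnum and blktrans_disk_name are hypothetical, and locking and error handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return the lowest devnum not present in `used`, which must be sorted
 * in ascending order (add_mtd_blktrans_dev keeps tr->devs sorted too). */
static int first_free_devnum(const int *used, size_t n)
{
        int last = -1;

        for (size_t i = 0; i < n; i++) {
                if (used[i] != last + 1)
                        break;          /* gap found just before used[i] */
                last = used[i];
        }
        return last + 1;
}

/* Build a disk name with the same rules as the kernel function:
 * "<name><number>" when part_bits is 0, "<name><letters>" otherwise. */
static void blktrans_disk_name(char *buf, size_t len, const char *name,
                               int part_bits, int devnum)
{
        assert(buf != NULL && len > 0);

        if (!part_bits)
                snprintf(buf, len, "%s%d", name, devnum);
        else if (devnum < 26)
                snprintf(buf, len, "%s%c", name, 'a' + devnum);
        else
                snprintf(buf, len, "%s%c%c", name,
                         'a' - 1 + devnum / 26, 'a' + devnum % 26);
}
```

For devices 0, 1 and 3 already present, first_free_devnum returns 2, which fills the gap instead of appending 4. For mtdblock (part_bits = 0) the name of device 3 is mtdblock3, while a translation layer with partitions would use letters, much like sda and sdb.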

4.2.8 mtd_block_ops

Here we look at mtd_block_ops, the MTD block device operation set, defined in drivers/mtd/mtd_blkdevs.c.

static const struct block_device_operations mtd_block_ops = {
        .owner          = THIS_MODULE,
        .open           = blktrans_open,
        .release        = blktrans_release,
        .ioctl          = blktrans_ioctl,
        .getgeo         = blktrans_getgeo,
};

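The meaning of some of these function pointers:

open: called when an MTD block device is opened;
release: called when an MTD block device is closed;
getgeo: returns the geometry of the drive; the result is filled into an hd_geometry structure;
ioctl: called to perform special operations on an MTD block device;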

4.2.9 blktrans_open

static int blktrans_open(struct block_device *bdev, fmode_t mode)
{
        struct mtd_blktrans_dev *dev = blktrans_dev_get(bdev->bd_disk);
        int ret = 0;

        if (!dev)
                return -ERESTARTSYS; /* FIXME: busy loop! -arnd*/

        mutex_lock(&mtd_table_mutex);
        mutex_lock(&dev->lock);

        if (dev->open)
                goto unlock;

        kref_get(&dev->ref);
        __module_get(dev->tr->owner);

        if (!dev->mtd)
                goto unlock;

        if (dev->tr->open) {
                ret = dev->tr->open(dev);  // this calls the open function of the mtd_blktrans_ops
                if (ret)
                        goto error_put;
        }

        ret = __get_mtd_device(dev->mtd);
        if (ret)
                goto error_release;
        dev->file_mode = mode;

unlock:
        dev->open++;
        mutex_unlock(&dev->lock);
        mutex_unlock(&mtd_table_mutex);
        blktrans_dev_put(dev);
        return ret;

error_release:
        if (dev->tr->release)
                dev->tr->release(dev);
error_put:
        module_put(dev->tr->owner);
        kref_put(&dev->ref, blktrans_dev_release);
        mutex_unlock(&dev->lock);
        mutex_unlock(&mtd_table_mutex);
        blktrans_dev_put(dev);
        return ret;
}

4.2.10 blktrans_ioctl

static int blktrans_ioctl(struct block_device *bdev, fmode_t mode,
                              unsigned int cmd, unsigned long arg)
{
        struct mtd_blktrans_dev *dev = blktrans_dev_get(bdev->bd_disk);
        int ret = -ENXIO;

        if (!dev)
                return ret;

        mutex_lock(&dev->lock);

        if (!dev->mtd)
                goto unlock;

        switch (cmd) {
        case BLKFLSBUF:
                ret = dev->tr->flush ? dev->tr->flush(dev) : 0;
                break;
        default:
                ret = -ENOTTY;
        }
unlock:
        mutex_unlock(&dev->lock);
        blktrans_dev_put(dev);
        return ret;
}
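
From user space, the BLKFLSBUF case above is reached through an ordinary ioctl on the block device node. The following is a minimal sketch, assuming a Linux system with the kernel UAPI headers installed. The function name flush_block_device is hypothetical, and the call normally needs root privileges.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>     /* BLKFLSBUF */
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the kernel to flush the buffers of the block device at `path`,
 * for example "/dev/mtdblock0".
 * Returns 0 on success or a negative errno value on failure. */
static int flush_block_device(const char *path)
{
        int fd;
        int ret = 0;

        assert(path != NULL);

        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -errno;

        if (ioctl(fd, BLKFLSBUF, 0) < 0)
                ret = -errno;   /* save errno before close() can change it */

        close(fd);
        return ret;
}
```

On a board like the one used in this article, flush_block_device("/dev/mtdblock0") would be expected to end up in blktrans_ioctl, which in turn calls tr->flush if the translation layer provides one. Any other command falls through to the default branch and returns -ENOTTY.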

4.2.11 mtd_mq_ops

Here we look at the blk-mq operation set of the MTD block device driver, defined in drivers/mtd/mtd_blkdevs.c.

static const struct blk_mq_ops mtd_mq_ops = {
        .queue_rq       = mtd_queue_rq,
};

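As the analysis in the previous section showed, queue_rq is called when a request is dispatched to the block device driver. Its job is to move data between the disk and memory, for example writing data from memory to the disk or reading data from the disk into memory.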

static blk_status_t mtd_queue_rq(struct blk_mq_hw_ctx *hctx,
                                 const struct blk_mq_queue_data *bd)
{
        struct mtd_blktrans_dev *dev;

        dev = hctx->queue->queuedata;
        if (!dev) {
                blk_mq_start_request(bd->rq);
                return BLK_STS_IOERR;
        }

        spin_lock_irq(&dev->queue_lock);
        list_add_tail(&bd->rq->queuelist, &dev->rq_list);
        mtd_blktrans_work(dev);   // not examined in detail here: reads end up in mtdblock_tr.readsect and writes in mtdblock_tr.writesect
        spin_unlock_irq(&dev->queue_lock);

        return BLK_STS_OK;
}
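
The comment on mtd_blktrans_work says that reads are served by readsect and writes by writesect, one sector at a time. The sketch below illustrates that idea in user space with a RAM-backed stand-in for the device. It is an assumption-laden model, not the kernel implementation: struct fake_dev, the fake_* functions, BLKSIZE and NBLOCKS are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <string.h>

#define BLKSIZE 512u    /* assumed sector size, playing the role of tr->blksize */
#define NBLOCKS 8u      /* assumed device size in sectors */

struct fake_dev {       /* hypothetical stand-in for an MTD block device */
        unsigned char media[NBLOCKS * BLKSIZE];
};

/* Per-sector callbacks, playing the role of tr->readsect / tr->writesect. */
static int fake_readsect(struct fake_dev *d, unsigned long block, char *buf)
{
        memcpy(buf, d->media + block * BLKSIZE, BLKSIZE);
        return 0;
}

static int fake_writesect(struct fake_dev *d, unsigned long block, const char *buf)
{
        memcpy(d->media + block * BLKSIZE, buf, BLKSIZE);
        return 0;
}

enum fake_op { FAKE_READ, FAKE_WRITE };

/* Split one request (start sector plus byte count) into per-sector calls.
 * Returns 0 on success, -1 on a bad range or a failed sector. */
static int fake_do_request(struct fake_dev *d, enum fake_op op,
                           unsigned long block, unsigned long bytes, char *buf)
{
        unsigned long nsect = bytes / BLKSIZE;

        assert(d != NULL && buf != NULL);

        if (bytes % BLKSIZE || block > NBLOCKS || nsect > NBLOCKS - block)
                return -1;

        for (; nsect > 0; nsect--, block++, buf += BLKSIZE) {
                int err = (op == FAKE_READ) ? fake_readsect(d, block, buf)
                                            : fake_writesect(d, block, buf);
                if (err)
                        return -1;
        }
        return 0;
}
```

A two-sector write starting at sector 3 therefore becomes two fake_writesect calls, for sectors 3 and 4, and a request that would run past the end of the device is rejected before any sector is touched.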

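4.3 MTD block device flow

The execution flow of register_mtd_blktrans is shown in the figure:

In the MTD block device entry function:

blktrans_notifier is added to the mtd_notifiers list;
in the first loop in the figure, the mtd_idr tree has only its root node at this point, so the loop is not entered and its body does not run;
next the block device major number is registered: the major number is 31 and the block device name is mtdblock;
execution then reaches the second loop, which for the same reason is not entered either.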

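Then, in add_mtd_device(mtd):

a node is allocated for the raw MTD device;
erasesize_shift, writesize_shift, erasesize_mask, writesize_mask and related fields of the raw MTD device are set;
the class of the raw MTD device's device variable is set to mtd_class, together with its device number, type, name and driver_data, and device_register is called to register the MTD character device named mtd%d;
device_create is called to create, initialize and register the MTD character device named mtd%dro;
the mtd_notifiers list is walked, and when blktrans_notifier is found, blktrans_notifier->add(mtd) is called:

a gendisk structure is allocated and its members are set:
- private_data;
- the major number major (MTD_BLOCK_MAJOR, value 31);
- the first minor number first_minor (it increases as more MTD devices are registered);
- the disk name disk_name, set to mtdblock%d; a file of that name is created under /dev;
- the block device operation set fops;
the request queue is initialized;
finally the gendisk is registered.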

For example, after the development board boots and the Nand Flash driver is loaded, we can see the following:

[root@zy:/]# ls /sys/class/mtd/ -l
total 0
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd0 -> ../../devices/virtual/mtd/mtd0
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd0ro -> ../../devices/virtual/mtd/mtd0ro
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd1 -> ../../devices/virtual/mtd/mtd1
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd1ro -> ../../devices/virtual/mtd/mtd1ro
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd2 -> ../../devices/virtual/mtd/mtd2
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd2ro -> ../../devices/virtual/mtd/mtd2ro
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd3 -> ../../devices/virtual/mtd/mtd3
lrwxrwxrwx    1 0        0                0 Jan  1 01:19 mtd3ro -> ../../devices/virtual/mtd/mtd3ro
[root@zy:/]# ls -l /dev/mtd*
crw-rw----    1 0        0          90,   0 Jan  1 00:00 /dev/mtd0
crw-rw----    1 0        0          90,   1 Jan  1 00:00 /dev/mtd0ro
crw-rw----    1 0        0          90,   2 Jan  1 00:00 /dev/mtd1
crw-rw----    1 0        0          90,   3 Jan  1 00:00 /dev/mtd1ro
crw-rw----    1 0        0          90,   4 Jan  1 00:00 /dev/mtd2
crw-rw----    1 0        0          90,   5 Jan  1 00:00 /dev/mtd2ro
crw-rw----    1 0        0          90,   6 Jan  1 00:00 /dev/mtd3
crw-rw----    1 0        0          90,   7 Jan  1 00:00 /dev/mtd3ro
brw-rw----    1 0        0          31,   0 Jan  1 00:00 /dev/mtdblock0
brw-rw----    1 0        0          31,   1 Jan  1 00:00 /dev/mtdblock1
brw-rw----    1 0        0          31,   2 Jan  1 00:00 /dev/mtdblock2
brw-rw----    1 0        0          31,   3 Jan  1 00:00 /dev/mtdblock3
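
The listing shows a regular pattern: MTD device N gets character minors 2N (mtdN) and 2N+1 (mtdNro) under major 90, and block minor N under major 31. The sketch below encodes that pattern with the glibc makedev macro. The function names are hypothetical, and the rule is read off the listing above rather than taken from a formal interface.

```c
#include <assert.h>
#include <sys/sysmacros.h>   /* makedev(), major(), minor() */
#include <sys/types.h>       /* dev_t */

#define MTD_CHAR_MAJOR  90   /* major of /dev/mtdN and /dev/mtdNro */
#define MTD_BLOCK_MAJOR 31   /* major of /dev/mtdblockN */

/* Device number of the read-write character node /dev/mtd<index>. */
static dev_t mtd_char_rw(unsigned int index)
{
        assert(index < (1u << 19));   /* 2 * index + 1 must fit in 20 minor bits */
        return makedev(MTD_CHAR_MAJOR, index * 2);
}

/* Device number of the read-only character node /dev/mtd<index>ro. */
static dev_t mtd_char_ro(unsigned int index)
{
        assert(index < (1u << 19));
        return makedev(MTD_CHAR_MAJOR, index * 2 + 1);
}

/* Device number of the block node /dev/mtdblock<index>. */
static dev_t mtd_block(unsigned int index)
{
        return makedev(MTD_BLOCK_MAJOR, index);
}
```

For index 3 this gives 90,6 for /dev/mtd3, 90,7 for /dev/mtd3ro and 31,3 for /dev/mtdblock3, which agrees with the last lines of the listing.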

5. mtdchar.c

We introduced mtdchar.c earlier: it implements the MTD character device interface. Here we go straight to drivers/mtd/mtdchar.c and walk through the source.

5.1 Module entry function

static const struct file_operations mtd_fops = {  // character device operation set
        .owner          = THIS_MODULE,
        .llseek         = mtdchar_lseek,
        .read           = mtdchar_read,
        .write          = mtdchar_write,
        .unlocked_ioctl = mtdchar_unlocked_ioctl,
#ifdef CONFIG_COMPAT
        .compat_ioctl   = mtdchar_compat_ioctl,
#endif
        .open           = mtdchar_open,
        .release        = mtdchar_close,
        .mmap           = mtdchar_mmap,
#ifndef CONFIG_MMU
        .get_unmapped_area = mtdchar_get_unmapped_area,
        .mmap_capabilities = mtdchar_mmap_capabilities,
#endif
};

int __init init_mtdchar(void)
{
        int ret;

        ret = __register_chrdev(MTD_CHAR_MAJOR, 0, 1 << MINORBITS,    // MTD character device major 90; MINORBITS = 20
                                   "mtd", &mtd_fops);  // the region is named "mtd"; the mtd%d nodes are created later by add_mtd_device
        if (ret < 0) {
                pr_err("Can't allocate major number %d for MTD\n",
                       MTD_CHAR_MAJOR);
                return ret;
        }

        return ret;
}
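
The count argument 1 << MINORBITS follows from how the kernel packs a device number: the minor occupies the low 20 bits and the major sits above it. Here is a small user-space sketch of that packing. The k_mkdev, k_major and k_minor helpers are hypothetical names that imitate the kernel's MKDEV, MAJOR and MINOR macros, and this is the in-kernel layout, not the user-space dev_t encoding used by glibc.

```c
#include <assert.h>
#include <stdint.h>

#define MINORBITS 20
#define MINORMASK ((1u << MINORBITS) - 1)

/* Pack major and minor the way the kernel's MKDEV macro does. */
static uint32_t k_mkdev(uint32_t major, uint32_t minor)
{
        assert(major < (1u << 12) && minor <= MINORMASK);
        return (major << MINORBITS) | minor;
}

/* Counterparts of the kernel's MAJOR and MINOR macros. */
static uint32_t k_major(uint32_t dev)
{
        return dev >> MINORBITS;
}

static uint32_t k_minor(uint32_t dev)
{
        return dev & MINORMASK;
}
```

Registering 1 << MINORBITS minors starting at 0 therefore claims every minor that major 90 can have, so any mtdN or mtdNro node is routed to mtd_fops.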

5.2 __register_chrdev

Next, the __register_chrdev function, located in fs/char_dev.c:

/**
 * __register_chrdev() - create and register a cdev occupying a range of minors
 * @major: major device number or 0 for dynamic allocation
 * @baseminor: first of the requested range of minor numbers
 * @count: the number of minor numbers required
 * @name: name of this range of devices
 * @fops: file operations associated with this devices
 *
 * If @major == 0 this functions will dynamically allocate a major and return
 * its number.
 *
 * If @major > 0 this function will attempt to reserve a device with the given
 * major number and will return zero on success.
 *
 * Returns a -ve errno on failure.
 *
 * The name of this device has nothing to do with the name of the device in
 * /dev. It only helps to keep track of the different owners of devices. If
 * your module name has only one type of devices it's ok to use e.g. the name
 * of the module here.
 */
int __register_chrdev(unsigned int major, unsigned int baseminor,
                      unsigned int count, const char *name,
                      const struct file_operations *fops)
{
        struct char_device_struct *cd;
        struct cdev *cdev;
        int err = -ENOMEM;

        cd = __register_chrdev_region(major, baseminor, count, name); // reserve a range of character device numbers
        if (IS_ERR(cd))
                return PTR_ERR(cd);

        cdev = cdev_alloc();  // dynamically allocate a cdev
        if (!cdev)
                goto out2;

        cdev->owner = fops->owner;  // initialize the cdev
        cdev->ops = fops;
        kobject_set_name(&cdev->kobj, "%s", name);

        err = cdev_add(cdev, MKDEV(cd->major, baseminor), count);  // register the cdev with the system
        if (err)
                goto out;

        cd->cdev = cdev;

        return major ? 0 : cd->major;
out:
        kobject_put(&cdev->kobj);
out2:
        kfree(__unregister_chrdev_region(cd->major, baseminor, count));
        return err;
}

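So the module entry function mainly does the following:

reserves the character device numbers: major 90, with 1<<20 minors;
dynamically allocates the character device;
registers the character device;

It does not, however, create the class or the device entries under it. That part is done in add_mtd_device:

device_register and device_create are called to create the device entries under the mtd class in /sys/class, which the mdev program scans to create the nodes under /dev;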
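
Once the /dev/mtdN nodes exist, the file operations registered in mtd_fops can be exercised from user space. The following is a minimal sketch, assuming a Linux system with the MTD UAPI header mtd/mtd-user.h installed. The function name mtd_get_info is hypothetical, while MEMGETINFO and struct mtd_info_user come from that header.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <mtd/mtd-user.h>   /* struct mtd_info_user, MEMGETINFO */
#include <sys/ioctl.h>
#include <unistd.h>

/* Query basic information about the MTD character device at `path`,
 * for example "/dev/mtd0".
 * Returns 0 on success or a negative errno value on failure. */
static int mtd_get_info(const char *path, struct mtd_info_user *info)
{
        int fd;
        int ret = 0;

        assert(path != NULL && info != NULL);

        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -errno;

        if (ioctl(fd, MEMGETINFO, info) < 0)
                ret = -errno;   /* save errno before close() can change it */

        close(fd);
        return ret;
}
```

On success, info->size, info->erasesize and info->writesize describe the partition behind the node. The open call goes through mtdchar_open and the ioctl through mtdchar_unlocked_ioctl, both of which are listed in mtd_fops.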
