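UBI is widely used on Android. According to the official UBI documentation (http://www.linux-mtd.infradead.org/doc/ubi.html#L_ubi_operations), UBI has many advantages over a traditional MTD driver. It hides bad-block management and spreads wear evenly across eraseblocks. Volumes can be updated, and when ECC errors occur, data is automatically moved to good blocks. All of this reduces the work that upper-layer software has to do: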
UBI volumes have no eraseblock wear-leveling constraints, so users do not have to care about this at all, which means the upper level software may be simpler;
UBI volumes have no bad eraseblocks, which also leads to simpler upper level software;
UBI volumes are dynamic in a sense that they may be created, removed or re-sized dynamically, while MTD partitions are static;
UBI handles bit-flips which again makes the upper level software simpler;
UBI provides a volume update operation which makes it easy to detect interrupted software updates and recover;
UBI provides an atomic logical eraseblock change operation which allows changing the contents of a logical eraseblock without losing the data if an unclean reboot happens during the operation; this might be very useful for the upper-level software (e.g., for a file-system);
UBI has an un-map operation, which just un-maps a logical eraseblock from the physical eraseblock, schedules the physical eraseblock for erasure and returns; this is very quick and frees upper level software from implementing their own mechanisms to defer erasures (e.g., JFFS2 has to implement such mechanisms).
The user-space tools live in the mtd-utils repository at git://git.infradead.org/mtd-utils.git:
ubinfo - provides information about UBI devices and volumes found in the system;
ubiattach - attaches MTD devices (which describe raw flash) to UBI and creates corresponding UBI devices;
ubidetach - detaches MTD devices from UBI devices (the opposite to what ubiattach does);
ubimkvol - creates UBI volumes on UBI devices;
ubirmvol - removes UBI volumes from UBI devices;
ubiupdatevol - updates UBI volumes; this tool uses the UBI volume update feature which leaves the volume in "corrupted" state if the update was interrupted; additionally, this tool may be used to wipe out UBI volumes;
ubicrc32 - calculates CRC-32 checksum of a file with the same initial seed as UBI would use;
ubinize - generates UBI images;
ubiformat - formats empty flash, erases flash and preserves erase counters, flashes UBI images to MTD devices;
mtdinfo - reports information about MTD devices found in the system.
Now let's look at how UBI is laid out on flash.

UBI headers
On every physical eraseblock, UBI stores two 64-byte headers:
erase counter header (or EC header) which contains the erase counter of the physical eraseblock (PEB) plus some other not so important information;
volume identifier header (or VID header) which stores volume ID and logical eraseblock (LEB) number this PEB belongs to (plus some other not so important information).
These headers are why a logical eraseblock is smaller than a physical one. Both headers are protected by a CRC-32 checksum. See drivers/mtd/ubi/ubi-media.h for the full header definitions.
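The following is a minimal Python sketch that builds and checks a 64-byte EC header, to make the format concrete. The field layout is my reading of struct ubi_ec_hdr in drivers/mtd/ubi/ubi-media.h and should be verified against your kernel version. That layout is: magic "UBI#", version, 64-bit erase counter, VID header offset, data offset, image sequence number, padding, and CRC-32. The helper names are hypothetical. The CRC is assumed to be the kernel's crc32 with seed 0xFFFFFFFF and no final inversion. That value equals Python's zlib.crc32 result XORed with 0xFFFFFFFF.

```python
import struct
import zlib

UBI_EC_HDR_MAGIC = 0x55424923  # ASCII "UBI#"
UBI_VERSION = 1
# Assumed big-endian layout of struct ubi_ec_hdr: magic, version, 3 padding
# bytes, erase counter (u64), VID header offset, data offset, image sequence
# number, 32 padding bytes, header CRC.
EC_HDR_FMT = ">IB3xQIII32xI"
EC_HDR_SIZE = struct.calcsize(EC_HDR_FMT)  # 64 bytes
CRC_COVERED = EC_HDR_SIZE - 4              # the CRC protects everything before it


def ubi_crc32(data: bytes) -> int:
    """CRC-32 with seed 0xFFFFFFFF and no final XOR (assumed to match UBI)."""
    return zlib.crc32(data) ^ 0xFFFFFFFF


def pack_ec_hdr(ec: int, vid_hdr_offset: int, data_offset: int, image_seq: int = 0) -> bytes:
    """Build an EC header with a valid CRC."""
    raw = struct.pack(EC_HDR_FMT, UBI_EC_HDR_MAGIC, UBI_VERSION, ec,
                      vid_hdr_offset, data_offset, image_seq, 0)
    crc = ubi_crc32(raw[:CRC_COVERED])
    return raw[:CRC_COVERED] + struct.pack(">I", crc)


def read_erase_counter(raw: bytes):
    """Return the erase counter, or None if the magic or the CRC is wrong."""
    magic, _ver, ec, _vid, _data, _seq, crc = struct.unpack(EC_HDR_FMT, raw[:EC_HDR_SIZE])
    if magic != UBI_EC_HDR_MAGIC or crc != ubi_crc32(raw[:CRC_COVERED]):
        return None
    return ec


print(read_erase_counter(pack_ec_hdr(5, 2048, 4096)))  # 5
```

A single corrupted byte in the first 60 bytes makes read_erase_counter() return None. This is how a torn or damaged header shows up during scanning.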
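When UBI attaches an MTD device, it first reads and validates the headers. It then loads the erase counters and the LEB-to-PEB mapping into RAM.
Each time UBI erases a PEB, it increments that PEB's erase counter. As a result, the EC header is always present on the PEB. The one exception is a power loss after the old EC header has been erased but before the new one is written. In that case, the next time UBI scans this PEB, it assigns the PEB the average erase counter.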
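The attach-time scan can be sketched in Python as two passes over per-PEB header data that has already been read and CRC-checked. This is a simplified model, not the kernel's attach code. The input shapes and function names are hypothetical, and the integer average is an assumption. Resolving a doubly-mapped LEB purely by the VID header's sequence number (sqnum) also leaves out any extra checks the real implementation may perform.

```python
def fill_missing_ecs(ecs):
    """ecs[pnum] is the erase counter read from PEB pnum, or None when its EC
    header was missing or failed the CRC check (e.g. power cut after erase).
    Gaps get the integer average of the valid counters."""
    valid = [ec for ec in ecs if ec is not None]
    mean_ec = sum(valid) // len(valid) if valid else 0
    return [mean_ec if ec is None else ec for ec in ecs]


def build_eba(vid_hdrs):
    """vid_hdrs[pnum] is (vol_id, lnum, sqnum) from PEB pnum's VID header, or
    None for a PEB that is not mapped. Returns {(vol_id, lnum): pnum}.
    If two PEBs claim the same LEB, the higher sequence number wins."""
    eba, best_sqnum = {}, {}
    for pnum, vid in enumerate(vid_hdrs):
        if vid is None:
            continue
        vol_id, lnum, sqnum = vid
        key = (vol_id, lnum)
        if key not in eba or sqnum > best_sqnum[key]:
            eba[key] = pnum
            best_sqnum[key] = sqnum
    return eba


print(fill_missing_ecs([10, None, 20, 30]))                       # [10, 20, 20, 30]
print(build_eba([(0, 0, 5), None, (0, 1, 6), (0, 0, 9)]))         # {(0, 0): 3, (0, 1): 2}
```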
The VID header is written only when the PEB is associated with an LEB:

The LEB un-map operation just un-maps the LEB from the PEB and schedules the PEB for erasure. When the PEB is erased, the EC header is written straight away. The VID header is not written.
The LEB map operation or a write operation to an un-mapped LEB makes UBI find an appropriate PEB and write the VID header to it (the EC header must already be there). Note, the write operation to an already mapped LEB just writes the data straight to PEB and does not change the UBI headers.
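A toy Python model can illustrate these two rules. The class and method names are hypothetical, and this is not UBI's real interface. The model only tracks which PEB backs each LEB and which PEBs carry a VID header. It treats erasure as a queue that a background worker drains later.

```python
from collections import deque


class ToyEba:
    """Models LEB map/un-map semantics for one volume (not UBI's real API)."""

    def __init__(self, num_pebs):
        self.eba = {}                        # LEB number -> PEB number
        self.free = deque(range(num_pebs))   # erased PEBs: EC header only
        self.erase_queue = deque()           # PEBs waiting for erasure
        self.has_vid_hdr = set()             # PEBs currently holding a VID header

    def unmap(self, lnum):
        """Drop the mapping and schedule the PEB for erasure; returns at once."""
        pnum = self.eba.pop(lnum, None)
        if pnum is not None:
            self.erase_queue.append(pnum)

    def write(self, lnum):
        """Write to an LEB and return the PEB that receives the data."""
        if lnum not in self.eba:
            # Un-mapped LEB: pick a free PEB and write its VID header first.
            pnum = self.free.popleft()
            self.has_vid_hdr.add(pnum)
            self.eba[lnum] = pnum
        # Mapped LEB: data goes straight to the PEB, headers stay untouched.
        return self.eba[lnum]

    def erase_worker(self):
        """Background erasure: the VID header disappears, the EC header is rewritten."""
        while self.erase_queue:
            pnum = self.erase_queue.popleft()
            self.has_vid_hdr.discard(pnum)
            self.free.append(pnum)


t = ToyEba(3)
print(t.write(7), t.write(7))  # 0 0 (the second write reuses the mapping)
t.unmap(7)
t.erase_worker()
print(t.write(7))              # 1 (a fresh PEB gets a new VID header)
```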
UBI uses two separate headers because the EC header and the VID header are written at different times:
after a PEB is erased, the EC header is written straight away, which minimizes the probability of losing the erase counter due to unclean reboots;
when UBI associates a PEB with an LEB, the VID header is written to the PEB.
When the EC header is written to a PEB, UBI does not yet know which volume and LEB the PEB will be associated with.
UBI volume table
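The volume table is an on-flash data structure that contains information about each volume on the UBI device. It can be viewed as an array of volume table records, each with the following fields: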
volume size;
volume name;
volume type (dynamic or static);
volume alignment;
update marker (set for volumes which had interrupted updates);
auto-resize flag;
CRC-32 checksum for this record.
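The array index corresponds to the volume ID. The number of records is limited by the LEB size and can never exceed 128, so a UBI device can hold at most 128 volumes.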
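The LEB-size limit can be expressed as a small calculation. The constants below are assumptions taken from my reading of the kernel headers. Each record (struct ubi_vtbl_record) is taken to be 172 bytes, and UBI_MAX_VOLUMES is taken to be 128. The function name is hypothetical.

```python
UBI_VTBL_RECORD_SIZE = 172  # assumed sizeof(struct ubi_vtbl_record)
UBI_MAX_VOLUMES = 128


def vtbl_slots(leb_size: int) -> int:
    """Number of volume table records that fit in one LEB, capped at 128."""
    return min(leb_size // UBI_VTBL_RECORD_SIZE, UBI_MAX_VOLUMES)


# A 124KiB LEB has room for far more than 128 records, so the cap applies;
# a 15KiB LEB (e.g. 16KiB PEBs minus two 512-byte pages of headers) holds 89.
print(vtbl_slots(124 * 1024), vtbl_slots(15 * 1024))  # 128 89
```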
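Whenever a volume is created, removed, re-sized, renamed or updated, the corresponding volume table record is updated. UBI keeps two copies of the volume table, so the table can be recovered even if power is lost during an update. Internally, the volume table is stored in a special volume called the layout volume. This volume occupies two LEBs, one for each copy. It is invisible to users and maintained by UBI itself, and it is updated with the same mechanism as any other volume: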
Prepare in-memory buffer with the new volume table contents.
Un-map LEB0 of the layout volume.
Write the new volume table to LEB0.
Un-map LEB1 of the layout volume.
Write the new volume table to LEB1.
Flush the UBI work queue to make sure the PEBs corresponding to the un-mapped LEBs are erased.
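When UBI attaches an MTD device, it first checks whether the two copies of the volume table are identical. If they differ, LEB0 is copied to LEB1. If one of the copies is corrupted, it is restored from the other.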
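The update order and the attach-time check can be simulated in a few lines of Python. This sketch makes several simplifying assumptions. Each copy is modelled as one byte string guarded by a single trailing CRC-32, whereas the real table has a CRC per record. An un-mapped LEB is represented as None. The class and method names are hypothetical.

```python
import zlib


def seal(table: bytes) -> bytes:
    """Append a CRC-32 so that a torn or corrupted copy can be detected."""
    return table + zlib.crc32(table).to_bytes(4, "big")


def unseal(blob):
    """Return the table if the copy exists and its CRC matches, else None."""
    if blob is None or len(blob) < 4:
        return None
    table, crc = blob[:-4], int.from_bytes(blob[-4:], "big")
    return table if zlib.crc32(table) == crc else None


class LayoutVolume:
    def __init__(self):
        self.leb = [None, None]  # LEB0 and LEB1; None means un-mapped

    def update(self, table: bytes, crash_before=None):
        """Follow the documented steps; crash_before=N simulates power loss
        just before step N (2..5)."""
        buf = seal(table)                                     # step 1
        steps = ((2, 0, None), (3, 0, buf), (4, 1, None), (5, 1, buf))
        for step, lnum, data in steps:
            if step == crash_before:
                return
            self.leb[lnum] = data                             # un-map (None) or write
        # Step 6 (flushing the erase work queue) has no effect in this model.

    def attach(self) -> bytes:
        """Make both copies consistent and return the current table."""
        t0, t1 = unseal(self.leb[0]), unseal(self.leb[1])
        if t0 is None and t1 is None:
            raise RuntimeError("both volume table copies are corrupted")
        if t0 is None:            # LEB0 lost or torn: restore it from LEB1
            self.leb[0] = self.leb[1]
            return t1
        if t1 != t0:              # LEB1 missing, torn or stale: LEB0 is newer
            self.leb[1] = self.leb[0]
        return t0
```

In this model, a power loss before step 3 leaves only the old copy in LEB1, so attach() falls back to the old table. A power loss any time after step 3 completes keeps the new table.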
Minimum flash input/output unit
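UBI assumes that a flash device (MTD device) consists of eraseblocks, which may be good or bad. Every good eraseblock can be read, written and erased, and a good eraseblock may also be marked as bad.
The minimum input/output unit size depends on the flash type: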
NOR flashes usually have min. I/O unit size of 1 byte, because NOR flashes usually allow reading and writing single bytes (in fact, it is even possible to change individual bits).
Some NOR flashes may have other min. I/O unit sizes, e.g. 16 or 32 bytes in case of ECC'd NOR flashes.
NAND flashes usually have 512, 2048 or 4096 byte min. I/O unit size, which corresponds to NAND page size. NAND flashes store per-NAND page ECC codes in the OOB area, which means that the whole NAND page has to be written at once to calculate the ECC code, and the whole NAND page has to be read at once to check the ECC code.
The min. I/O unit is a very important property of an MTD device:
The position of the VID header depends on it, and so does the LEB size. The larger the min. I/O unit, the smaller the LEB and the larger the UBI overhead.
All writes to LEBs must be aligned to the min. I/O unit size. Reads do not appear to have this restriction, but in practice the MTD layer also reads whole min. I/O units. It then copies only the number of bytes the user requested back from its buffer.
NAND flash sub-pages
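As mentioned above, all writes must be aligned to the min. I/O unit, which for NAND is the page size. However, some SLC NAND flashes allow writing smaller units, which the MTD layer calls sub-pages. Not all NAND flashes have sub-pages: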
MLC NANDs do not have sub-pages, at least to the date of writing of this piece of documentation (April 2009).
SLC NANDs usually do have sub-pages. E.g., 512-byte NAND pages usually consist of 2x256-byte sub-pages, and 2048-byte NAND pages consist of 4x512-byte sub-pages.
SLC OneNAND chips with 2048 bytes NAND page size have 4x512-byte sub-pages.
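Consider, for example, a flash with 128KiB eraseblocks and 2048-byte pages. Without sub-pages, the EC header is stored in the first page and the VID header at offset 2048, so the LEB size is 128KiB - 2048 - 2048 = 124KiB. With 512-byte sub-pages, the EC header is stored in the first sub-page and the VID header at offset 512 (the second sub-page). Both headers then fit in the first page, so the LEB size becomes 128KiB - 2048 = 126KiB.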
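Sub-pages are used internally by UBI only to store its headers. The UBI API does not let users write sub-pages, because writing one sub-page requires the driver to perform a full page write. For example, writing 4 sub-pages takes 4 times as long as writing one page.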
UBI headers position
EC header always resides at offset 0 and takes 64 bytes, the VID header resides at the next available min. I/O unit or sub-page, and also takes 64 bytes. For example:
in case of NOR flash which has 1 byte min. I/O unit, the VID header resides at offset 64;
in case of NAND flash which does not have sub-pages, the VID header resides at the second NAND page;
in case of NAND flash which has sub-pages, the VID header resides at the second sub-page.
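The placement rule can be written as a short Python function. It assumes that the data area starts at the first min. I/O unit boundary after the VID header. That assumption reproduces the 124KiB and 126KiB LEB sizes from the 128KiB/2048-byte example in the sub-pages section. The function and parameter names are hypothetical.

```python
UBI_EC_HDR_SIZE = 64
UBI_VID_HDR_SIZE = 64


def align_up(value: int, unit: int) -> int:
    """Round value up to a multiple of unit."""
    return (value + unit - 1) // unit * unit


def ubi_layout(peb_size: int, min_io: int, subpage: int = 0):
    """Return (vid_hdr_offset, data_offset, leb_size) for one PEB.

    subpage is the NAND sub-page size, or 0 if headers must use whole
    min. I/O units (NOR, or NAND without sub-pages).
    """
    hdr_unit = subpage or min_io
    vid_hdr_offset = align_up(UBI_EC_HDR_SIZE, hdr_unit)
    data_offset = align_up(vid_hdr_offset + UBI_VID_HDR_SIZE, min_io)
    return vid_hdr_offset, data_offset, peb_size - data_offset


print(ubi_layout(64 * 1024, 1))           # NOR: (64, 128, 65408)
print(ubi_layout(128 * 1024, 2048))       # NAND, no sub-pages: (2048, 4096, 126976)
print(ubi_layout(128 * 1024, 2048, 512))  # NAND with sub-pages: (512, 2048, 129024)
```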
Flash space overhead
UBI itself uses some flash space, which reduces the space available to users:
2 PEBs are used to store the volume table;
1 PEB is reserved for wear-leveling purposes;
1 PEB is reserved for the atomic LEB change operation;
some amount of PEBs is reserved for bad PEB handling; this is applicable for NAND flash, but not for NOR flash; the percentage of reserved PEBs is configurable and is 1% by default;
UBI stores the EC and VID headers at the beginning of each PEB; the amount of bytes used for these purposes depends on the flash type and is explained below.
Let's introduce some symbols:
P - total number of physical eraseblocks on the MTD device;
SP - physical eraseblock size;
SL - logical eraseblock size;
B - number of PEBs reserved for bad PEB handling; it is 1% of P for NAND by default, and 0 for NOR and other flash types which do not have bad PEBs;
O - the overhead related to storing EC and VID headers in bytes, i.e. O = SP - SL.
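The UBI overhead is (B + 4) * SP + O * (P - B - 4) bytes, i.e., this many bytes will not be accessible to users. The first term covers the 4 PEBs UBI reserves for itself plus the B PEBs reserved for bad PEB handling. The second term covers the header overhead O in each of the remaining PEBs. O is different for different flashes: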
in case of NOR flash which has 1 byte minimum input/output unit, O is 128 bytes;
in case of NAND flash which does not have sub-pages (e.g., MLC NAND), O is 2 NAND pages, i.e. 4KiB in case of 2KiB NAND page and 1KiB in case of 512 bytes NAND page;
in case of NAND flash which has sub-pages, UBI optimizes its on-flash layout and puts the EC and VID headers at the same NAND page, but different sub-pages; in this case O is only one NAND page;
for other flashes the overhead should be 2 min. I/O units if the min. I/O unit size is greater than or equal to 64 bytes, and 2 times 64 bytes aligned to the min. I/O unit size if the min. I/O unit size is less than 64 bytes.
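The overhead formula translates directly into Python. The example numbers are illustrative only: 1024 PEBs of 128KiB on NAND with 2KiB pages and no sub-pages, so SL = 124KiB, with B = 10, which is roughly 1% of P. The function name is hypothetical.

```python
def ubi_overhead(P: int, SP: int, SL: int, B: int) -> int:
    """Bytes unavailable to users: (B + 4) * SP + O * (P - B - 4), O = SP - SL."""
    O = SP - SL
    return (B + 4) * SP + O * (P - B - 4)


# 14 whole PEBs (1835008 bytes) plus 4KiB of headers in each of the
# other 1010 PEBs (4136960 bytes): about 5.7MiB in total.
print(ubi_overhead(1024, 128 * 1024, 124 * 1024, 10))  # 5971968
```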
Saving erase counters
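The most important thing to understand about UBI is that it stores an EC header in every physical eraseblock (PEB) to record how many times that PEB has been erased. This information must not be lost when the flash is erased or reflashed.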

How UBI flasher should work
Here is how a UBI flasher should erase the flash and write an image:

First of all, scan the flash and collect the erase counters. Namely, read the EC header from each PEB, check the CRC-32 checksum of the header, and save the erase counter in RAM. It is not necessary to read VID headers. Bad PEBs should be skipped.
Calculate average erase counter. It should be used for PEBs with corrupted or missing EC headers. Such PEBs may be there because of unclean reboots, but there shouldn't be too many of them.
If the intention is to just erase the flash, then each PEB has to be erased and a proper EC header has to be written at the beginning of the PEB. The EC header should contain the incremented erase counter. Bad PEBs should just be skipped. For NAND flashes, in case of I/O errors while erasing or writing, the PEB should be marked as bad (see "Marking eraseblocks as bad" below for how UBI marks PEBs as bad).
If the intention is to flash a UBI image, then the flasher should do the following for each non-bad PEB.
Read the contents of this PEB from the UBI image (PEB size bytes) into a buffer.
Strip min. I/O units full of 0xFF bytes from the end of the buffer (the details are given below in this section).
Erase the PEB.
Change the EC header in the buffer - put the new erase counter value there and re-calculate the CRC-32 checksum.
Write the buffer to the physical eraseblock.
As usual, bad PEBs should just be skipped. And for NAND flashes, in case of I/O errors while erasing or writing, the PEB should be marked as bad.
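The per-PEB flashing loop can be sketched in Python against a simulated MTD device, modelled as a list of bytearrays. Everything here is hypothetical: the function names, the in-memory flash, and the EC header offsets. The offsets assume the struct ubi_ec_hdr layout from ubi-media.h: a 64-bit erase counter at byte 8, and a CRC-32 over the first 60 bytes stored in the last 4. Bad PEBs are simply skipped. As a simplification, good PEBs not covered by the image are erased and given a copy of the image's first EC header carrying their own counter.

```python
import zlib

EC_HDR_SIZE = 64
EC_FIELD = slice(8, 16)  # assumed position of the 64-bit erase counter
CRC_COVERED = 60         # assumed: CRC-32 covers bytes 0..59, stored in 60..63


def strip_trailing_ff(buf: bytes, min_io: int) -> bytes:
    """Drop whole min. I/O units that contain only 0xFF from the end of buf."""
    end = len(buf)
    while end >= min_io and buf[end - min_io:end] == b"\xff" * min_io:
        end -= min_io
    return buf[:end]


def set_ec(buf: bytes, ec: int) -> bytes:
    """Return buf with a new erase counter and a recomputed header CRC."""
    hdr = bytearray(buf[:EC_HDR_SIZE])
    hdr[EC_FIELD] = ec.to_bytes(8, "big")
    crc = zlib.crc32(bytes(hdr[:CRC_COVERED])) ^ 0xFFFFFFFF
    hdr[CRC_COVERED:EC_HDR_SIZE] = crc.to_bytes(4, "big")
    return bytes(hdr) + buf[EC_HDR_SIZE:]


def flash_ubi_image(flash, image: bytes, ecs, bad, min_io: int) -> None:
    """Write image to the simulated flash; ecs (from the scan) is updated in place."""
    peb_size = len(flash[0])
    chunks = [image[i:i + peb_size] for i in range(0, len(image), peb_size)]
    good = [pnum for pnum in range(len(flash)) if pnum not in bad]
    if len(chunks) > len(good):
        raise ValueError("image does not fit into the good PEBs")
    template = image[:EC_HDR_SIZE]  # simplification for PEBs beyond the image
    for i, pnum in enumerate(good):
        data = strip_trailing_ff(chunks[i], min_io) if i < len(chunks) else template
        flash[pnum][:] = b"\xff" * peb_size  # erase
        ecs[pnum] += 1                       # an erase bumps the counter
        data = set_ec(data, ecs[pnum])       # new counter, new CRC-32
        flash[pnum][:len(data)] = data       # trailing 0xFF units are never written
```

In this model, the skipped units look the same as erased flash. On real NAND the difference matters, because writing a page of 0xFF bytes still stores an ECC code for that page in the OOB area.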
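The image is usually smaller than the flash. The flasher therefore has to write the PEBs used by the image and then erase the remaining PEBs, as described above.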

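Note that a UBI flasher does not necessarily write the image PEBs to the flash in the same order. The first PEB of the input image is not necessarily written to the first physical eraseblock; it may end up in the second or even the last one (for example, when bad PEBs have to be skipped).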

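If you are writing a flasher for a production line, you do not need to change the EC values in the input image. On brand-new flash, every PEB has an erase counter of 0, so the flasher logic is relatively simple.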

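If your image contains UBI and you are using NAND, the flasher must drop (that is, not write) the min. I/O units full of 0xFF at the end of every PEB that is not completely used. Not all NAND flashes require this, but skipping this step can cause problems later that are very hard to debug.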

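It also makes sense to write only as much as the image actually occupies, rather than the whole flash every time. That is, the flasher should drop all empty PEBs from the end of the image, which also shortens the flashing time. This applies not only to UBI but also to JFFS2. If you do not do this, you can run into ECC errors.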

When generating a UBIFS image with mkfs.ubifs, the --space-fixup option can be used to avoid this problem.

Marking eraseblocks as bad
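UBI marks a PEB as bad in two cases:
1. When a write to a PEB fails. UBI moves the data destined for this PEB to another PEB and schedules this PEB for torture testing.
2. When an erase operation fails with an EIO error. The PEB is then marked as bad immediately.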

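Torture testing runs in the background. Its purpose is to check whether the PEB is really bad, because a failed operation may have other causes. Examples are a driver bug or incorrect use by the file system, such as writing to the same area twice. The test consists of the following steps: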
erase the eraseblock;
read it back and make sure it contains only 0xFF bytes;
write test pattern bytes;
read the eraseblock back and check the pattern;
and so on for several patterns (0xA5, 0x5A, 0x00).
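If the PEB survives the torture test, it is not marked as bad. Note that a bit-flip detected during the test counts as a failure, and the PEB is then marked as bad. See the torture_peb() function.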
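The torture sequence can be sketched in Python against a simulated eraseblock. The PEB interface, the SimPeb class and the function name are all hypothetical. The interface offers erase, write, and a read that also reports bit-flips. The sketch only mirrors the steps listed above and the rule that bit-flips count as failure; it does not reproduce the kernel's torture_peb() in detail.

```python
PATTERNS = (0xA5, 0x5A, 0x00)


def torture_peb(peb) -> bool:
    """Return True if the PEB passes; False means it should be marked bad.

    peb is any object with size, erase(), write(data) and
    read() -> (data, bitflips). A reported bit-flip counts as a failure.
    """
    for pattern in PATTERNS:
        peb.erase()
        data, bitflips = peb.read()
        if bitflips or data != b"\xff" * peb.size:
            return False
        expected = bytes([pattern]) * peb.size
        peb.write(expected)
        data, bitflips = peb.read()
        if bitflips or data != expected:
            return False
    return True


class SimPeb:
    """Simulated PEB; stuck_bit=(byte, bit) models a cell stuck at 0."""

    def __init__(self, size, stuck_bit=None):
        self.size = size
        self.cells = bytearray(b"\xff" * size)
        self.stuck_bit = stuck_bit

    def _apply_defect(self):
        if self.stuck_bit is not None:
            byte, bit = self.stuck_bit
            self.cells[byte] &= ~(1 << bit) & 0xFF

    def erase(self):
        self.cells[:] = b"\xff" * self.size
        self._apply_defect()

    def write(self, data):
        # Programming can only clear bits, so AND the new data into the cells.
        self.cells[:] = bytes(old & new for old, new in zip(self.cells, data))
        self._apply_defect()

    def read(self):
        return bytes(self.cells), False


print(torture_peb(SimPeb(16)), torture_peb(SimPeb(16, stuck_bit=(3, 0))))  # True False
```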

Scalability issues

During initialization UBI has to read the headers of every PEB, so the larger the flash, the longer attaching takes.
UBI scans the MTD device when attaching - it reads the EC and VID headers from every single PEB; the headers are small (64 bytes each), so this means reading 128 bytes from each PEB in case of NOR flash or one or two NAND pages in case of NAND flash (this depends on whether the NAND flash supports sub-pages or not); this is anyway much less than JFFS2 needs to read when it mounts MTD devices, so UBI attaches MTD devices many times faster than JFFS2 would mount a file system on the same MTD device;
UBI calculates the CRC-32 checksum of each EC and VID header, which consumes CPU time, although this is usually minor compared to the flash I/O overhead.
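The following is a back-of-the-envelope estimate of how much data the attach-time scan reads. It follows the per-PEB amounts above: 128 bytes on NOR, one NAND page with sub-pages, and two pages without. The function names are hypothetical. The 128KiB PEB size in the example is an assumption, and the estimate ignores bad PEBs and controller overhead.

```python
def scan_bytes_per_peb(min_io: int, subpages: bool) -> int:
    """Bytes UBI reads from each PEB while scanning (per the rules above)."""
    if min_io == 1:  # NOR: two 64-byte headers
        return 128
    return min_io if subpages else 2 * min_io


def attach_scan_bytes(flash_size: int, peb_size: int, min_io: int, subpages: bool) -> int:
    """Total header bytes read to attach the whole device."""
    return (flash_size // peb_size) * scan_bytes_per_peb(min_io, subpages)


# 1GiB NAND, assumed 128KiB PEBs, 2KiB pages, no usable sub-page writes:
# 8192 PEBs x 4KiB = 32MiB of reads before the EBA and EC tables exist.
print(attach_scan_bytes(1 << 30, 128 << 10, 2048, False))  # 33554432
```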

Some real-world numbers:
a 256MiB OneNAND flash found in Nokia N800 devices is attached in less than 1 second; the flash does support sub-pages, so UBI has to read only the first 2KiB NAND page of each PEB while scanning;
a 1GiB NAND flash found in OLPC XO-1 devices is attached in about 2 seconds; the flash is an SLC NAND and supports sub-pages, but the Cafe controller used in the laptop does not allow sub-page writes, so UBI has to read two 2KiB NAND pages from each PEB.
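Implementation details
UBI needs three tables to operate:
volume table which contains per-volume information, like volume size, type, etc;
eraseblock association (EBA) table which contains the logical-to-physical eraseblock mapping information; for example, when reading an LEB, UBI first looks up the table to find the corresponding PEB number, then reads from this PEB;
erase counters (EC) table which contains the erase counter value for each physical eraseblock; UBI wear-leveling sub-system uses this table when it needs to find, for example, a highly worn-out PEB;
The volume table is kept on flash and changes only when a volume is created, removed or re-sized. These operations are rare and not latency-critical, so a simple management scheme is enough.
The EBA and EC tables change every time an LEB is mapped to a PEB or a PEB is erased. This happens very often, so these tables must be managed quickly and efficiently.
Keeping the EBA and EC tables on flash would not meet these latency and efficiency requirements, so UBI builds both tables in RAM each time it attaches an MTD device. This means UBI has to scan the whole flash and read the EC and VID headers of every PEB to construct the EBA and EC tables.

Volume auto-resize
NAND chips leave the factory with some PEBs already marked as bad, and the number and location of bad PEBs differ from chip to chip. Manufacturers usually guarantee that the first few PEBs are good and that the number of bad PEBs does not exceed a certain limit. For example, a 256MiB Samsung OneNAND has no more than 40 bad 128KiB PEBs, about 2% of its capacity, although bad PEBs may accumulate with use.

When you create a UBI image to be flashed, you have to decide the size of each volume (the sizes are stored in the UBI volume table). However, because the number of bad PEBs varies, the actual capacity differs from one flash chip to another.

One solution is to plan for the worst case, assuming every chip has the maximum number of bad PEBs. In reality most chips have far fewer bad PEBs, so some capacity goes unused. This does improve reliability somewhat, because UBI always uses all PEBs of the flash. On top of that, UBI already reserves about 1% of the PEBs for bad PEB handling. For the Samsung NAND above, this means 1% of the flash is reserved and a further 0-2% goes unused.

Another solution is auto-resize. On its first run, UBI enlarges the volume that carries the auto-resize flag and then clears that flag in the volume table. Only one volume may have the auto-resize flag.

For the flash in the example above, a volume with auto-resize gains 0-2% of capacity, while UBI still keeps its 1% reserve.
Auto-resize is supported by Linux kernel 2.6.25 and later.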
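The auto-resize gain can be illustrated with the Samsung numbers using a simplified capacity model. The model assumes 2048 PEBs of 128KiB and a fixed reserve of 20 PEBs (about 1%). It subtracts the 4 PEBs UBI keeps for itself, as listed in the flash space overhead section. The chip in the example has 5 actual bad PEBs. All names and the per-chip bad-PEB count are hypothetical.

```python
UBI_INTERNAL_PEBS = 4  # 2 volume table + 1 wear-leveling + 1 atomic LEB change


def volume_pebs(total_pebs: int, bad_pebs: int, beb_reserve: int) -> int:
    """PEBs left for volumes once bad PEBs, the bad-PEB reserve and UBI's
    own PEBs are taken out (simplified model)."""
    return total_pebs - bad_pebs - beb_reserve - UBI_INTERNAL_PEBS


total, reserve = 2048, 20                 # 256MiB / 128KiB PEBs, ~1% reserve (assumed)
fixed = volume_pebs(total, 40, reserve)   # volume sized for the worst case of 40 bad PEBs
resized = volume_pebs(total, 5, reserve)  # auto-resize on a chip with only 5 bad PEBs
print(fixed, resized, (resized - fixed) * 128)  # 1984 2019 4480 (KiB gained)
```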
