
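Data consistency of UBI & UBIFS in power-cut scenarios
Author: Yuan Jianpeng  Email: yuanjp89@163.com
Published: 2022-4-24  Site: Inside Linux Development

The documents in the attachment cover UBI and UBIFS in full. This article focuses on what those documents say about data consistency of UBI and UBIFS after an unclean reboot or a power cut.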


UBI Power-cuts tolerance

Both UBI and UBIFS are designed with tolerance to power-cuts in mind.


UBI has an internal debugging infrastructure that can emulate power failures for testing. The advantage of the emulation is that it emulates power failures at the critical points where control data structures are written to the device, whereas the probability of interrupting the system at those precise moments with physical power-cut testing is rather low.


UBI Power-cut recovery testing

UBI supports power-cut emulation for testing, which emulates power cuts after a random number of writes. When a power cut is emulated, UBI switches to read-only mode and disallows any further writes to the UBI volume. The main idea of this mode is to emulate power cuts at interesting points, e.g. while the VID header is being written.


Three debugfs files can be used to emulate an abnormal power cut in UBI:

/sys/kernel/debug/ubi/ubi0/

tst_emulate_power_cut

tst_emulate_power_cut_max

tst_emulate_power_cut_min


Emulation type                                            Flag value
Allow power-cut to be emulated during EC header write     1
Allow power-cut to be emulated during VID header write    2


min and max bound the number of writes that succeed before the power cut is emulated.
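
As a rough illustration of how the three files fit together, the sketch below writes a flag value and a min/max range into a debugfs directory. The helper name `configure_power_cut` is hypothetical, and OR-ing the two flag values into a single mask is an assumption; only the file names and the flag values 1 and 2 come from the text above. Writing to the real directory requires root and a mounted debugfs.

```python
import os

# Flag values taken from the table above.
POWER_CUT_EC_HDR = 1   # allow emulation during EC header write
POWER_CUT_VID_HDR = 2  # allow emulation during VID header write


def configure_power_cut(debug_dir, flags, min_writes, max_writes):
    """Write the emulation settings into the three debugfs files.

    debug_dir is e.g. /sys/kernel/debug/ubi/ubi0. This helper is
    hypothetical; combining flags with | is an assumption.
    """
    if min_writes > max_writes:
        raise ValueError("min_writes must not exceed max_writes")
    settings = {
        "tst_emulate_power_cut_min": min_writes,
        "tst_emulate_power_cut_max": max_writes,
        "tst_emulate_power_cut": flags,
    }
    for name, value in settings.items():
        with open(os.path.join(debug_dir, name), "w") as f:
            f.write(str(value))
```

A call such as `configure_power_cut("/sys/kernel/debug/ubi/ubi0", POWER_CUT_VID_HDR, 10, 100)` would then ask for a power cut during a VID header write after somewhere between 10 and 100 successful writes.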


UBIFS

Tolerance to unclean reboots: UBIFS is a journaling file system and tolerates sudden crashes and unclean reboots. It simply replays the journal to recover. Mounting is a little slower in this case because the journal has to be replayed, but UBIFS does not need to scan the whole media, so mounting still takes only fractions of a second. The authors paid special attention to this aspect of UBIFS.


UBIFS Power-cuts tolerance

UBIFS has an internal debugging infrastructure to emulate power failures, and the authors used it for extensive testing over a long time. The advantage of the emulation is that it triggers power failures even in situations that occur rarely, for example when the master node is updated or the log is changed. The probability of interrupting the system at those moments is very low in real life.


There is also a powerful user-space test program called integck which performs a lot of random I/O operations and checks the integrity of the FS after remount. This test can also handle emulated power-cuts and check the FS integrity.


UBIFS Write-back support

UBIFS supports write-back, which means that file changes do not go to the flash media straight away, but they are cached and go to the flash later, when it is absolutely necessary. This helps to greatly reduce the amount of I/O which results in better performance. Write-back caching is a standard technique which is used by most file systems like ext3 or XFS.


UBIFS write-buffer

Write-buffer is an additional UBIFS buffer, which is implemented inside UBIFS, and it sits between the page cache and the flash. This means that write-back actually writes to the write-buffer, not directly to the flash.


The write-buffer size equals the NAND page size (so it is tiny compared to the page cache). Its purpose is to accumulate small writes and write full NAND pages instead of partially filled ones.


The write-buffer implementation is a little more complex, and we actually have several of them - one for each journal head. But this does not change the basic idea behind the write-buffer.
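
The accumulation idea can be sketched with a toy model. This is not the UBIFS implementation: the class name `WriteBuffer`, the in-memory `flash` list, and padding a partial page with 0xFF bytes on sync are all assumptions made for illustration.

```python
class WriteBuffer:
    """Toy model: collect small writes and emit only full pages."""

    def __init__(self, page_size):
        self.page_size = page_size
        self.pending = bytearray()   # data not yet on the simulated flash
        self.flash = []              # full pages written so far

    def write(self, data):
        self.pending += data
        # Flush every complete page; keep the remainder buffered.
        while len(self.pending) >= self.page_size:
            self.flash.append(bytes(self.pending[:self.page_size]))
            del self.pending[:self.page_size]

    def sync(self, pad=b"\xff"):
        """Force out a partially filled page, padding the rest (assumed)."""
        if self.pending:
            missing = self.page_size - len(self.pending)
            self.flash.append(bytes(self.pending) + pad * missing)
            self.pending.clear()
```

Anything still in `pending` when power is lost never reaches the flash, which is why the small size of the buffer limits how much data can be affected.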



A few notes regarding synchronization:


Take into account that write-buffers add 3-5 seconds to the data synchronization timeout defined by "dirty_expire_centisecs". However, since write-buffers are small, only a little data is delayed.
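
To get a feel for the numbers, the sketch below reads the timeout and adds the upper end of the 3-5 second range quoted above. The function name `expire_plus_wbuf_delay` is hypothetical, and the result is only a rough estimate, since other write-back settings also influence when data actually reaches the flash.

```python
def expire_plus_wbuf_delay(path="/proc/sys/vm/dirty_expire_centisecs",
                           wbuf_delay=5.0):
    """Return the expiry timeout plus the write-buffer delay, in seconds."""
    with open(path) as f:
        centisecs = int(f.read().strip())
    # The kernel value is in hundredths of a second.
    return centisecs / 100.0 + wbuf_delay
```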


UBIFS in synchronous mode vs JFFS2

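JFFS2 stores the meta-data in the data node headers, so once JFFS2 has scanned to the latest node it knows the meta-data. If power is cut during a sequential write, only some data at the end of the file is lost.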

In JFFS2 all the meta-data (like inode atime/mtime/ctime, inode size, UID/GID, etc) are stored in the data node headers. Data nodes carry 4KiB of (compressed) data. This means that the meta-data information is duplicated in many places, but this also means that every time JFFS2 writes a data node to the flash media, it updates inode size as well. So when JFFS2 mounts it scans the flash media, finds the latest data node, and fetches the inode size from there.

In practice this means that if you write, say, a 10MiB file f.dat, JFFS2 writes the data sequentially, from the beginning to the end. If a power cut happens, you just lose some amount of data at the end of the inode. For example, if JFFS2 starts writing those 10MiB of data, writes 5MiB, and a power cut happens, you end up with a 5MiB f.dat file. You lose only the last 5MiB.


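The UBIFS case is a bit more complicated, because UBIFS stores the meta-data in separate inode nodes. The UBIFS strategy is as follows:

Data nodes that are written must not extend beyond the size recorded in the on-flash inode, although they may extend beyond the size of the in-memory inode. If they would, UBIFS first updates the inode node and then writes the data nodes. If the data node writes are then lost, the file ends up with holes at its end.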
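
The difference can be made concrete with a toy model of a sequential write interrupted by a power cut. Both function names are hypothetical and the model is deliberately simplified: it assumes fixed 4KiB data nodes and, for UBIFS, that the inode node with the new size reached the flash before the cut.

```python
NODE = 4096  # bytes of data per data node, as in the JFFS2 description


def jffs2_size_after_cut(nodes_written):
    # Each data node carries the inode size, so the newest node decides.
    return nodes_written * NODE


def ubifs_contents_after_cut(inode_size, nodes_written):
    # The inode node (with the new size) reached flash first; data nodes
    # that never made it read back as zeros, i.e. a hole at the end.
    data = b"D" * min(nodes_written * NODE, inode_size)
    return data + b"\0" * (inode_size - len(data))
```

With four nodes intended and three written, the JFFS2 model yields a shorter file, while the UBIFS model yields a full-length file whose last 4KiB is a hole.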


UBIFS Checksumming

Every piece of information UBIFS writes to the media has a CRC-32 checksum. UBIFS protects both data and meta-data with CRC. Every time the meta-data is read, the CRC checksum is verified.


The data CRC is not verified by default. This is done to improve the default file-system read speed.

However, UBIFS allows data verification to be switched on with the chk_data_crc mount option.


Note, currently UBIFS cannot disable CRC-32 calculations on write, because the UBIFS recovery process depends on it. When recovering from an unclean reboot and replaying the journal, UBIFS has to be able to detect broken and half-written UBIFS nodes and drop them, and UBIFS depends on the CRC-32 checksum here.


In other words, if you use UBIFS with data CRC-32 checking disabled, you still have the CRC-32 checksum attached to each piece of data, and you may mount UBIFS with the chk_data_crc option to enable CRC-32 checking at any time.


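A CRC is always written for both meta-data and data nodes, but only the meta-data CRC is verified on read. Data nodes are not CRC-checked by default, although the check can be enabled with a mount option.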
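
How a checksum lets recovery drop broken or half-written nodes can be sketched as follows. The layout here (a 4-byte little-endian CRC-32 followed by the payload) and both function names are assumptions for illustration; the real UBIFS node format differs.

```python
import struct
import zlib


def make_node(payload):
    """Prefix the payload with its CRC-32 (toy layout, not UBIFS's)."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def node_is_intact(node):
    """Return True if the stored CRC-32 matches the payload."""
    if len(node) < 4:
        return False  # too short to even hold the checksum
    (stored,) = struct.unpack("<I", node[:4])
    return stored == zlib.crc32(node[4:])
```

A node whose payload was altered or cut short fails the comparison, so a replay loop can skip it instead of treating it as valid data.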


UBIFS FAQ: How do I change a file atomically?

Changing a file atomically means changing its contents in a way that unclean reboots could not lead to any corruption or inconsistency in the file.


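The only reliable way to do this in UBIFS (and in most other file systems, e.g. JFFS2 or ext3) is the following:

make a copy of the file;
change the copy;
synchronize the copy, e.g. with fsync();
re-name the copy to the file, using the rename() libc function or the mv utility.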

Note, if a power-cut happens during the re-naming, the original file will be intact because the re-name operation is atomic. This is a POSIX requirement and UBIFS satisfies it.


OpenWrt UCI uses this method: on uci commit it first writes to a separate file and then renames it.
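
A minimal sketch of the write-then-rename approach is shown below. The function name `atomic_write` and the ".tmp" suffix are hypothetical, and the final directory fsync is an extra precaution that the text above does not mention.

```python
import os


def atomic_write(path, data):
    """Replace path with data so that a crash leaves old or new contents."""
    tmp = path + ".tmp"  # hypothetical temp-file naming
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()               # push Python's buffer to the kernel
        os.fsync(f.fileno())    # push the kernel's cached data to the media
    os.rename(tmp, path)        # atomic replacement on POSIX file systems
    # Sync the directory so the rename itself is durable (extra precaution,
    # not mentioned in the text above).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The temporary file should live in the same directory as the target, because rename() is only atomic within a single file system.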


UBIFS FAQ: Why is my file empty after an unclean reboot?

Zero-length files are a special case of corruption which happens when an application first truncates a file, then updates it. The truncation is synchronous in UBIFS, so it is written to the media straight away. But when the data are written, they go to the page cache, not to the flash media. So when an unclean reboot happens, the file becomes empty (truncated) because the data are lost.


Zero-length files also appear when an application creates a new file, then writes to the file, and a power cut happens. The reason is similar - file creation is a synchronous operation, data writing is not.


Well, the description is a bit simplified. Actually, when a file is created or truncated, the creation/truncation UBIFS information is written to the write-buffer, not straight to the media. So if a power cut happens before the write-buffer is synchronized, the file will disappear (creation case) or stay intact (truncation case). But since the write-buffer is small and all UBIFS writes go there, it is usually synchronized very soon. After this point the file is created/truncated for real.


References

【1】http://www.linux-mtd.infradead.org/doc/ubi.html


【2】Thomas Gleixner, Frank Haverkamp, Artem Bityutskiy. UBI - Unsorted Block Images.

http://www.linux-mtd.infradead.org/doc/ubidesign/ubidesign.pdf


【3】https://www.linux-mtd.infradead.org/doc/ubifs.html


【4】Adrian Hunter, Artem Bityutskiy. UBIFS file system, NOKIA.

http://www.linux-mtd.infradead.org/doc/ubifs.pdf


【5】Adrian Hunter. A Brief Introduction to the Design of UBIFS. 2008.

http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf


【6】UBI FAQ and HOWTO

http://www.linux-mtd.infradead.org/faq/ubi.html


【7】UBIFS FAQ and HOWTO

http://www.linux-mtd.infradead.org/faq/ubifs.html


【8】https://www.kernel.org/doc/html/latest/filesystems/ubifs.html


【9】Katsuki. Evaluation of UBI and UBIFS. TOSHIBA. 2009

https://elinux.org/images/f/f8/CELFJamboree30-UBIFS_update.pdf


【10】Theodore Ts'o. Delayed allocation and the zero-length file problem. 2009

https://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/


【11】Theodore Ts'o. Don’t fear the fsync! 2009

https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/

