- dm-log-writes
- =============
- This target takes 2 devices, one to pass all IO to normally, and one to log all
- of the write operations to. This is intended for file system developers wishing
- to verify the integrity of metadata or data as the file system is written to.
- There is a log_write_entry written for every WRITE request and the target is
- able to take arbitrary data from userspace to insert into the log. The data
- that is in the WRITE requests is copied into the log to make the replay happen
- exactly as it happened originally.
- Log Ordering
- ============
- We log things in order of completion once we are sure the write is no longer in
- cache. This means that normal WRITE requests are not actually logged until the
- next REQ_PREFLUSH request. This makes it easier for userspace to replay the
- log in a way that correlates to what is on disk rather than what is in cache,
- which in turn makes it easier to detect improper waiting/flushing.
- This works by attaching all WRITE requests to a list once the write completes.
- Once we see a REQ_PREFLUSH request we splice this list onto the request and once
- the FLUSH request completes we log all of the WRITEs and then the FLUSH. Only
- completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
- simulate the worst case scenario with regard to power failures. Consider the
- following example (W means write, C means complete):
- W1,W2,W3,C3,C2,Wflush,C1,Cflush
- The log would show the following:
- W3,W2,flush,W1....
- Again, this simulates what is actually on disk, which allows us to detect
- cases where a power failure at a particular point in time would create an
- inconsistent file system.
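The ordering rule above can be sketched as a small shell simulation. This is only an illustration, not the kernel implementation: the function name simulate_log and the event tokens (W<n> for a submitted write, C<n> for a completed write, Wflush and Cflush for the flush) are assumptions made here to mirror the notation of the example.

```shell
# Hypothetical sketch: simulate_log and its event tokens are illustrative only.
# Plain POSIX sh; the variables below are global because sh has no "local".
simulate_log() {
    completed=""  # writes completed since the last flush was issued
    attached=""   # writes spliced onto the in-flight flush
    log=""        # resulting log order, comma-terminated entries
    for ev in "$@"; do
        case "$ev" in
            Wflush)  # flush issued: take only the writes completed so far
                attached=$completed
                completed=""
                ;;
            Cflush)  # flush completed: log the attached writes, then the flush
                log="${log}${attached}flush,"
                attached=""
                ;;
            C*)      # write completed: queue it for the next flush
                completed="${completed}W${ev#C},"
                ;;
            W*)      # write submitted: nothing is logged yet
                ;;
        esac
    done
    # Strip the trailing comma before printing.
    printf '%s\n' "${log%,}"
}

# The sequence from the example above.
simulate_log W1 W2 W3 C3 C2 Wflush C1 Cflush
```

In this sketch W1 completes after the flush was issued, so it stays queued and would only be logged once a later flush completes.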
- Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
- they complete as those requests will obviously bypass the device cache.
- Any REQ_DISCARD requests are treated like WRITE requests. Otherwise we would
- have all the DISCARD requests, and then the WRITE requests and then the FLUSH
- request. Consider the following example:
- WRITE block 1, DISCARD block 1, FLUSH
- If we logged DISCARD when it completed, the replay would look like this:
- DISCARD 1, WRITE 1, FLUSH
- which isn't quite what happened and wouldn't be caught during the log replay.
- Target interface
- ================
- i) Constructor
- log-writes <dev_path> <log_dev_path>
- dev_path : Device that all of the IO will go to normally.
- log_dev_path : Device where the log entries are written to.
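As a minimal sketch, the constructor line can be assembled into a full device-mapper table entry, following the form used in the "Example usage" section. The helper name log_writes_table is hypothetical and the sector count in the call is a made-up value.

```shell
# Hypothetical helper: log_writes_table is not part of dmsetup.
# Arguments: <size in 512-byte sectors> <dev_path> <log_dev_path>
log_writes_table() {
    printf '0 %s log-writes %s %s\n' "$1" "$2" "$3"
}

# Made-up size; on a real system it would come from
# "blockdev --getsz <dev_path>".
log_writes_table 2097152 /dev/sdb /dev/sdc
```

The resulting string is what would be passed to dmsetup create via --table.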
- ii) Status
- <#logged entries> <highest allocated sector>
- #logged entries : Number of logged entries
- highest allocated sector : Highest allocated sector
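The two status fields could be pulled out of a status line with a small parser like the one below. It assumes the usual dmsetup status layout, in which the target's own fields follow "<start> <length> <target type>". The helper name parse_log_writes_status and the sample line are hypothetical, not captured output.

```shell
# Hypothetical helper: parse_log_writes_status is illustrative only.
# Assumes a line of the form:
#   <start> <length> log-writes <#logged entries> <highest allocated sector>
parse_log_writes_status() {
    # Split the line into positional parameters on whitespace.
    set -- $1
    if [ "$#" -lt 5 ] || [ "$3" != "log-writes" ]; then
        echo "unexpected status line" >&2
        return 1
    fi
    printf 'entries=%s highest_sector=%s\n' "$4" "$5"
}

# Made-up sample line, not real output.
parse_log_writes_status "0 2097152 log-writes 42 1048576"
```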
- iii) Messages
- mark <description>
- You can use a dmsetup message to set an arbitrary mark in a log.
- For example, say you want to fsck a file system after every
- write, but first you need to replay up to the mkfs to make sure
- we're fsck'ing something reasonable. You would do something like
- this:
- mkfs.btrfs -f /dev/mapper/log
- dmsetup message log 0 mark mkfs
- <run test>
- This would allow you to replay the log up to the mkfs mark and
- then replay from that point on doing the fsck check in the
- interval that you want.
- Every log has a mark at the end labeled "dm-log-writes-end".
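When a test script sets many marks, the message call can be wrapped in a small function. This is a sketch only: the name mark_log and the DMSETUP override used for dry runs are assumptions, not part of dmsetup.

```shell
# Hypothetical wrapper: mark_log and the DMSETUP override are illustrative only.
# DMSETUP defaults to the real tool; set DMSETUP=echo for a dry run.
mark_log() {
    # $1 = device-mapper name, $2 = mark description
    ${DMSETUP:-dmsetup} message "$1" 0 mark "$2"
}

# Dry run: prints the arguments instead of touching a device.
DMSETUP=echo mark_log log mkfs
```

With DMSETUP unset, the same call runs "dmsetup message log 0 mark mkfs", which matches the command shown above.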
- Userspace component
- ===================
- There is a userspace tool that will replay the log for you in various ways.
- It can be found here: https://github.com/josefbacik/log-writes
- Example usage
- =============
- Say you want to test fsync on your file system. You would do something like
- this:
- TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
- dmsetup create log --table "$TABLE"
- mkfs.btrfs -f /dev/mapper/log
- dmsetup message log 0 mark mkfs
- mount /dev/mapper/log /mnt/btrfs-test
- <some test that does fsync at the end>
- dmsetup message log 0 mark fsync
- md5sum /mnt/btrfs-test/foo
- umount /mnt/btrfs-test
- dmsetup remove log
- replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
- mount /dev/sdb /mnt/btrfs-test
- md5sum /mnt/btrfs-test/foo
- <verify md5sums are correct>
- Another option is to do a complicated file system operation and verify the file
- system is consistent during the entire operation. You could do this with:
- TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
- dmsetup create log --table "$TABLE"
- mkfs.btrfs -f /dev/mapper/log
- dmsetup message log 0 mark mkfs
- mount /dev/mapper/log /mnt/btrfs-test
- <fsstress to dirty the fs>
- btrfs filesystem balance /mnt/btrfs-test
- umount /mnt/btrfs-test
- dmsetup remove log
- replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
- btrfsck /dev/sdb
- replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
- --fsck "btrfsck /dev/sdb" --check fua
- This will replay the log until it sees a FUA request, run the fsck command
- and, if the fsck passes, replay to the next FUA, until the replay is completed
- or the fsck command exits abnormally.