dm-raid
=======

The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
It allows the MD RAID drivers to be accessed using a device-mapper
interface.

Mapping Table Interface
-----------------------
The target is named "raid" and it accepts the following parameters:

  <raid_type> <#raid_params> <raid_params> \
    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]

<raid_type>:
  raid0       RAID0 striping (no resilience)
  raid1       RAID1 mirroring
  raid4       RAID4 with dedicated last parity disk
  raid5_n     RAID5 with dedicated last parity disk supporting takeover
              Same as raid4
              - transitory layout
  raid5_la    RAID5 left asymmetric
              - rotating parity 0 with data continuation
  raid5_ra    RAID5 right asymmetric
              - rotating parity N with data continuation
  raid5_ls    RAID5 left symmetric
              - rotating parity 0 with data restart
  raid5_rs    RAID5 right symmetric
              - rotating parity N with data restart
  raid6_zr    RAID6 zero restart
              - rotating parity zero (left-to-right) with data restart
  raid6_nr    RAID6 N restart
              - rotating parity N (right-to-left) with data restart
  raid6_nc    RAID6 N continue
              - rotating parity N (right-to-left) with data continuation
  raid6_n_6   RAID6 with dedicated parity disks
              - parity and Q-syndrome on the last 2 disks;
                layout for takeover from/to raid4/raid5_n
  raid6_la_6  Same as "raid5_la" plus dedicated last Q-syndrome disk
              - layout for takeover from raid5_la from/to raid6
  raid6_ra_6  Same as "raid5_ra" plus dedicated last Q-syndrome disk
              - layout for takeover from raid5_ra from/to raid6
  raid6_ls_6  Same as "raid5_ls" plus dedicated last Q-syndrome disk
              - layout for takeover from raid5_ls from/to raid6
  raid6_rs_6  Same as "raid5_rs" plus dedicated last Q-syndrome disk
              - layout for takeover from raid5_rs from/to raid6
  raid10      Various RAID10-inspired algorithms chosen by additional params
              (see raid10_format and raid10_copies below)
              - RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
              - RAID1E: Integrated Adjacent Stripe Mirroring
              - RAID1E: Integrated Offset Stripe Mirroring
              - and other similar RAID10 variants

  Reference: Chapter 4 of
  http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf

<#raid_params>: The number of parameters that follow.

<raid_params> consists of

  Mandatory parameters:
    <chunk_size>  Chunk size in sectors.  This parameter is often known as
                  "stripe size".  It is the only mandatory parameter and
                  is placed first.

  followed by optional parameters (in any order):
    [sync|nosync]         Force or prevent RAID initialization.

    [rebuild <idx>]       Rebuild drive number 'idx' (first drive is 0).

    [daemon_sleep <ms>]
        Interval between runs of the bitmap daemon that
        clear bits.  A longer interval means less bitmap I/O but
        resyncing after a failure is likely to take longer.

    [min_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
    [max_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
    [write_mostly <idx>]               Mark drive index 'idx' write-mostly.
    [max_write_behind <sectors>]       See '--write-behind=' (man mdadm)
    [stripe_cache <sectors>]           Stripe cache size (RAID 4/5/6 only)

    [region_size <sectors>]
        The region_size multiplied by the number of regions is the
        logical size of the array.  The bitmap records the device
        synchronisation state for each region.

    [raid10_copies <# copies>]
    [raid10_format <near|far|offset>]
        These two options are used to alter the default layout of
        a RAID10 configuration.  The number of copies can be
        specified, but the default is 2.  There are also three
        variations to how the copies are laid down - the default
        is "near".  Near copies are what most people think of with
        respect to mirroring.  If these options are left unspecified,
        or 'raid10_copies 2' and/or 'raid10_format near' are given,
        then the layouts for 2, 3 and 4 devices are:

        2 drives         3 drives          4 drives
        --------         ----------        --------------
        A1  A1           A1  A1  A2        A1  A1  A2  A2
        A2  A2           A2  A3  A3        A3  A3  A4  A4
        A3  A3           A4  A4  A5        A5  A5  A6  A6
        A4  A4           A5  A6  A6        A7  A7  A8  A8
        ..  ..           ..  ..  ..        ..  ..  ..  ..

        The 2-device layout is equivalent to 2-way RAID1.  The 4-device
        layout is what a traditional RAID10 would look like.  The
        3-device layout is what might be called a 'RAID1E - Integrated
        Adjacent Stripe Mirroring'.

        If 'raid10_copies 2' and 'raid10_format far' are given, then the
        layouts for 2, 3 and 4 devices are:

        2 drives         3 drives            4 drives
        --------         --------------      --------------------
        A1  A2           A1   A2   A3        A1   A2   A3   A4
        A3  A4           A4   A5   A6        A5   A6   A7   A8
        A5  A6           A7   A8   A9        A9   A10  A11  A12
        ..  ..           ..   ..   ..        ..   ..   ..   ..
        A2  A1           A3   A1   A2        A2   A1   A4   A3
        A4  A3           A6   A4   A5        A6   A5   A8   A7
        A6  A5           A9   A7   A8        A10  A9   A12  A11
        ..  ..           ..   ..   ..        ..   ..   ..   ..

        If 'raid10_copies 2' and 'raid10_format offset' are given, then
        the layouts for 2, 3 and 4 devices are:

        2 drives         3 drives            4 drives
        --------         ------------        -----------------
        A1  A2           A1  A2  A3          A1   A2   A3   A4
        A2  A1           A3  A1  A2          A2   A1   A4   A3
        A3  A4           A4  A5  A6          A5   A6   A7   A8
        A4  A3           A6  A4  A5          A6   A5   A8   A7
        A5  A6           A7  A8  A9          A9   A10  A11  A12
        A6  A5           A9  A7  A8          A10  A9   A12  A11
        ..  ..           ..  ..  ..          ..   ..   ..   ..

        Here we see layouts closely akin to 'RAID1E - Integrated
        Offset Stripe Mirroring'.
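
        As an illustration (device numbers and sizes are hypothetical),
        a table line requesting 2 "far" copies across 4 devices without
        metadata devices might look like:

          0 3905945600 raid \
               raid10 5 64 raid10_copies 2 raid10_format far \
               4 - 8:16 - 8:32 - 8:48 - 8:64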

    [delta_disks <N>]
        The delta_disks option value (-251 < N < +251) triggers
        device removal (negative value) or device addition (positive
        value) to any reshape-supporting raid level 4/5/6 and 10.
        RAID levels 4/5/6 allow for addition of devices (metadata
        and data device tuple), whereas raid10_near and raid10_offset
        only allow for device addition.  raid10_far does not support
        any reshaping at all.
        A minimum number of devices has to be kept to enforce resilience,
        which is 3 devices for raid4/5 and 4 devices for raid6.
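
        For example (device numbers are hypothetical), reloading an
        existing 4-device raid5 set with one additional metadata/data
        pair and 'delta_disks 1' could be used to request growth by one
        device:

          0 1960893648 raid \
               raid5_ls 3 128 delta_disks 1 \
               5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82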

    [data_offset <sectors>]
        This option value defines the offset into each data device
        where the data starts.  This is used to provide out-of-place
        reshaping space to avoid writing over data whilst
        changing the layout of stripes, hence an interruption/crash
        may happen at any time without the risk of losing data.
        E.g. when adding devices to an existing raid set during
        forward reshaping, the out-of-place space will be allocated
        at the beginning of each raid device.  The kernel raid4/5/6/10
        MD personalities supporting such device addition will read the
        data from the existing first stripes (those with the smaller
        number of stripes) starting at data_offset to fill up a new
        stripe with the larger number of stripes, calculate the
        redundancy blocks (CRC/Q-syndrome) and write that new stripe to
        offset 0.  The same will be applied to all N-1 other new
        stripes.  This out-of-place scheme is used to change the RAID
        type (i.e. the allocation algorithm) as well, e.g. changing
        from raid5_ls to raid5_n.
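
        As an illustration (values are hypothetical), the grow request
        shown for delta_disks above could additionally reserve 8192
        sectors of out-of-place space at the start of each data device:

          0 1960893648 raid \
               raid5_ls 5 128 delta_disks 1 data_offset 8192 \
               5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82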

<#raid_devs>: The number of devices composing the array.

    Each device consists of two entries.  The first is the device
    containing the metadata (if any); the second is the one containing
    the data.  A maximum of 64 metadata/data device entries is supported
    up to target version 1.8.0; 1.9.0 supports up to 253, which is
    enforced by the MD kernel runtime in use.

    If a drive has failed or is missing at creation time, a '-' can be
    given for both the metadata and data drives for a given position.
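
    For example (device numbers are hypothetical), a raid4 set created
    with its third member failed or missing could list that position as
    '- -':

      0 1960893648 raid \
           raid4 1 2048 \
           5 8:17 8:18 8:33 8:34 - - 8:65 8:66 8:81 8:82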

Example Tables
--------------
# RAID4 - 4 data drives, 1 parity (no metadata devices)
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)

0 1960893648 raid \
     raid4 1 2048 \
     5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

# RAID4 - 4 data drives, 1 parity (with metadata devices)
# Chunk size of 1MiB, force RAID initialization,
#       min recovery rate at 20 kiB/sec/disk

0 1960893648 raid \
     raid4 4 2048 sync min_recovery_rate 20 \
     5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
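
Such a table is typically loaded with dmsetup.  For instance, the first
example above could be activated as a device named "my_raid4" (the name
is chosen here purely for illustration):

  dmsetup create my_raid4 --table \
    "0 1960893648 raid raid4 1 2048 5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81"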

Status Output
-------------
'dmsetup table' displays the table used to construct the mapping.
The optional parameters are always printed in the order listed
above with "sync" or "nosync" always output ahead of the other
arguments, regardless of the order used when originally loading the table.
Arguments that can be repeated are ordered by value.

'dmsetup status' yields information on the state and health of the array.
The output is as follows (normally a single line, but expanded here for
clarity):

  1: <s> <l> raid \
  2:      <raid_type> <#devices> <health_chars> \
  3:      <sync_ratio> <sync_action> <mismatch_cnt>

Line 1 is the standard output produced by device-mapper.
Lines 2 & 3 are produced by the raid target and are best explained by
example:

  0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0

Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with its initial
recovery.  Here is a fuller description of the individual fields:

  <raid_type>     Same as the <raid_type> used to create the array.

  <health_chars>  One char for each device, indicating: 'A' = alive and
                  in-sync, 'a' = alive but not in-sync, 'D' = dead/failed.

  <sync_ratio>    The ratio indicating how much of the array has undergone
                  the process described by 'sync_action'.  If the
                  'sync_action' is "check" or "repair", then the process
                  of "resync" or "recover" can be considered complete.

  <sync_action>   One of the following possible states:
                  idle    - No synchronization action is being performed.
                  frozen  - The current action has been halted.
                  resync  - Array is undergoing its initial synchronization
                            or is resynchronizing after an unclean shutdown
                            (possibly aided by a bitmap).
                  recover - A device in the array is being rebuilt or
                            replaced.
                  check   - A user-initiated full check of the array is
                            being performed.  All blocks are read and
                            checked for consistency.  The number of
                            discrepancies found is recorded in
                            <mismatch_cnt>.  No changes are made to the
                            array by this action.
                  repair  - The same as "check", but discrepancies are
                            corrected.
                  reshape - The array is undergoing a reshape.

  <mismatch_cnt>  The number of discrepancies found between mirror copies
                  in RAID1/10 or wrong parity values found in RAID4/5/6.
                  This value is valid only after a "check" of the array
                  is performed.  A healthy array has a 'mismatch_cnt' of 0.
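
Assuming the field layout shown above, the sync ratio of a mapped device
named "my_raid4" (name chosen for illustration) can be pulled out of the
status line with something like:

  dmsetup status my_raid4 | awk '{ print $7 }'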

Message Interface
-----------------
The dm-raid target will accept certain actions through the 'message' interface.
('man dmsetup' for more information on the message interface.)  These actions
include:

  "idle"    - Halt the current sync action.
  "frozen"  - Freeze the current sync action.
  "resync"  - Initiate/continue a resync.
  "recover" - Initiate/continue a recover process.
  "check"   - Initiate a check (i.e. a "scrub") of the array.
  "repair"  - Initiate a repair of the array.

Discard Support
---------------
The implementation of discard support among hardware vendors varies.
When a block is discarded, some storage devices will return zeroes when
the block is read.  These devices set the 'discard_zeroes_data'
attribute.  Other devices will return random data.  Confusingly, some
devices that advertise 'discard_zeroes_data' will not reliably return
zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
from a number of devices to calculate parity blocks and (for performance
reasons) relies on 'discard_zeroes_data' being reliable, it is important
that the devices be consistent.  Blocks may be discarded in the middle
of a RAID 4/5/6 stripe and if subsequent read results are not
consistent, the parity blocks may be calculated differently at any time,
making the parity blocks useless for redundancy.  It is important to
understand how your hardware behaves with discards if you are going to
enable discards with RAID 4/5/6.

Since the behavior of storage devices is unreliable in this respect,
even when reporting 'discard_zeroes_data', by default RAID 4/5/6
discard support is disabled -- this ensures data integrity at the
expense of losing some performance.

Storage devices that properly support 'discard_zeroes_data' are
increasingly whitelisted in the kernel and can thus be trusted.

For trusted devices, the following dm-raid module parameter can be set
to safely enable discard support for RAID 4/5/6:

    'devices_handle_discard_safely'
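
For example, when dm-raid is built as a module, the parameter could be
set at load time or toggled at runtime through sysfs (the paths and
values shown follow the usual convention for boolean module parameters):

  modprobe dm-raid devices_handle_discard_safely=Y
  # or, on a running system:
  echo Y > /sys/module/dm_raid/parameters/devices_handle_discard_safely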

Version History
---------------
1.0.0  Initial version.  Support for RAID 4/5/6
1.1.0  Added support for RAID 1
1.2.0  Handle creation of arrays that contain failed devices.
1.3.0  Added support for RAID 10
1.3.1  Allow device replacement/rebuild for RAID 10
1.3.2  Fix/improve redundancy checking for RAID10
1.4.0  Non-functional change.  Removes arg from mapping function.
1.4.1  RAID10 fix redundancy validation checks (commit 55ebbb5).
1.4.2  Add RAID10 "far" and "offset" algorithm support.
1.5.0  Add message interface to allow manipulation of the sync_action.
       New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
1.5.1  Add ability to restore transiently failed devices on resume.
1.5.2  'mismatch_cnt' is zero unless [last_]sync_action is "check".
1.6.0  Add discard support (and devices_handle_discard_safely module param).
1.7.0  Add support for MD RAID0 mappings.
1.8.0  Explicitly check for compatible flags in the superblock metadata
       and refuse to start the raid set if any are set by a newer
       target version, thus avoiding data corruption on a raid set
       with a reshape in progress.
1.9.0  Add support for RAID level takeover/reshape/region size
       and set size reduction.
1.9.1  Fix activation of existing RAID 4/10 mapped devices