X-Git-Url: http://ftp.safe.ca/?a=blobdiff_plain;f=Documentation%2Fmd.txt;h=4edd39ec7db91abcbcbac352fc3c5e6d3e3c3eca;hb=0b4b2ad5307c76c7105d6e7c724b1c14b8daf482;hp=69f742dee00fc753c4689aaa9a4da9062f02ac6c;hpb=da943b9912df063322d37b1a1f285460531d481d;p=safe%2Fjmp%2Flinux-2.6

diff --git a/Documentation/md.txt b/Documentation/md.txt
index 69f742d..4edd39e 100644
--- a/Documentation/md.txt
+++ b/Documentation/md.txt
@@ -62,7 +62,7 @@ be reconstructed (due to no parity).
 
 For this reason, md will normally refuse to start such an array. This
 requires the sysadmin to take action to explicitly start the array
-desipite possible corruption. This is normally done with
+despite possible corruption. This is normally done with
    mdadm --assemble --force ....
 
 This option is not really available if the array has the root
@@ -154,29 +154,63 @@ contains further md-specific information about the device.
 
 All md devices contain:
   level
-     a text file indicating the 'raid level'. This may be a standard
-     numerical level prefixed by "RAID-" - e.g. "RAID-5", or some
-     other name such as "linear" or "multipath".
+     a text file indicating the 'raid level'. e.g. raid0, raid1,
+     raid5, linear, multipath, faulty.
      If no raid level has been set yet (array is still being
-     assembled), this file will be empty.
+     assembled), the value will reflect whatever has been written
+     to it, which may be a name like the above, or may be a number
+     such as '0', '5', etc.
 
   raid_disks
      a text file with a simple number indicating the number of devices
      in a fully functional array. If this is not yet known, the file
-     will be empty. If an array is being resized (not currently
-     possible) this will contain the larger of the old and new sizes.
-     Some raid level (RAID1) allow this value to be set while the
-     array is active. This will reconfigure the array. Otherwise
-     it can only be set while assembling an array.
+     will be empty. If an array is being resized this will contain
+     the new number of devices.
+     Some raid levels allow this value to be set while the array is
+     active. This will reconfigure the array. Otherwise it can only
+     be set while assembling an array.
+     A change to this attribute will not be permitted if it would
+     reduce the size of the array. To reduce the number of drives
+     in an e.g. raid5, the array size must first be reduced by
+     setting the 'array_size' attribute.
 
  chunk_size
-     This is the size if bytes for 'chunks' and is only relevant to
-     raid levels that involve striping (1,4,5,6,10). The address space
+     This is the size in bytes for 'chunks' and is only relevant to
+     raid levels that involve striping (0,4,5,6,10). The address space
      of the array is conceptually divided into chunks and consecutive
      chunks are striped onto neighbouring devices.
-     The size should be atleast PAGE_SIZE (4k) and should be a power
+     The size should be at least PAGE_SIZE (4k) and should be a power
      of 2. This can only be set while assembling an array
 
+  layout
+     The "layout" for the array for the particular level. This is
+     simply a number that is interpretted differently by different
+     levels. It can be written while assembling an array.
+
+  array_size
+     This can be used to artificially constrain the available space in
+     the array to be less than is actually available on the combined
+     devices. Writing a number (in Kilobytes) which is less than
+     the available size will set the size. Any reconfiguration of the
+     array (e.g. adding devices) will not cause the size to change.
+     Writing the word 'default' will cause the effective size of the
+     array to be whatever size is actually available based on
+     'level', 'chunk_size' and 'component_size'.
+
+     This can be used to reduce the size of the array before reducing
+     the number of devices in a raid4/5/6, or to support external
+     metadata formats which mandate such clipping.
+
+  reshape_position
+     This is either "none" or a sector number within the devices of
+     the array where "reshape" is up to. If this is set, the three
+     attributes mentioned above (raid_disks, chunk_size, layout) can
+     potentially have 2 values, an old and a new value. If these
+     values differ, reading the attribute returns
+        new (old)
+     and writing will effect the 'new' value, leaving the 'old'
+     unchanged.
+
   component_size
      For arrays with data redundancy (i.e. not raid0, linear, faulty,
      multipath), all components must be the same size - or at least
@@ -191,14 +225,77 @@ All md devices contain:
      about the array. It can be 0.90 (traditional format), 1.0, 1.1,
      1.2 (newer format in varying locations) or "none" indicating that
      the kernel isn't managing metadata at all.
+     Alternately it can be "external:" followed by a string which
+     is set by user-space. This indicates that metadata is managed
+     by a user-space program. Any device failure or other event that
+     requires a metadata update will cause array activity to be
+     suspended until the event is acknowledged.
+
+  resync_start
+     The point at which resync should start. If no resync is needed,
+     this will be a very large number. At array creation it will
+     default to 0, though starting the array as 'clean' will
+     set it much larger.
+
+  new_dev
+     This file can be written but not read. The value written should
+     be a block device number as major:minor. e.g. 8:0
+     This will cause that device to be attached to the array, if it is
+     available. It will then appear at md/dev-XXX (depending on the
+     name of the device) and further configuration is then possible.
+
+  safe_mode_delay
+     When an md array has seen no write requests for a certain period
+     of time, it will be marked as 'clean'. When another write
+     request arrives, the array is marked as 'dirty' before the write
+     commences. This is known as 'safe_mode'.
+     The 'certain period' is controlled by this file which stores the
+     period as a number of seconds. The default is 200msec (0.200).
+     Writing a value of 0 disables safemode.
+
+  array_state
+     This file contains a single word which describes the current
+     state of the array. In many cases, the state can be set by
+     writing the word for the desired state, however some states
+     cannot be explicitly set, and some transitions are not allowed.
+
+     Select/poll works on this file. All changes except between
+     active_idle and active (which can be frequent and are not
+     very interesting) are notified. active->active_idle is
+     reported if the metadata is externally managed.
+
+     clear
+         No devices, no size, no level
+         Writing is equivalent to STOP_ARRAY ioctl
+     inactive
+         May have some settings, but array is not active
+            all IO results in error
+         When written, doesn't tear down array, but just stops it
+     suspended (not supported yet)
+         All IO requests will block. The array can be reconfigured.
+         Writing this, if accepted, will block until array is quiessent
+     readonly
+         no resync can happen. no superblocks get written.
+         write requests fail
+     read-auto
+         like readonly, but behaves like 'clean' on a write request.
+
+     clean - no pending writes, but otherwise active.
+         When written to inactive array, starts without resync
+         If a write request arrives then
+           if metadata is known, mark 'dirty' and switch to 'active'.
+           if not known, block and switch to write-pending
+         If written to an active array that has pending writes, then fails.
+     active
+         fully active: IO and resync can be happening.
+         When written to inactive array, starts with resync
+
+     write-pending
+         clean, but writes are blocked waiting for 'active' to be written.
+
+     active-idle
+         like active, but no writes have been seen for a while (safe_mode_delay).
 
-  level
-     The raid 'level' for this array. The name will often (but not
-     always) be the same as the name of the module that implements the
-     level. To be auto-loaded the module must have an alias
-        md-$LEVEL e.g. md-raid5
-     This can be written only while the array is being assembled, not
-     after it is started.
 
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named
@@ -220,10 +317,28 @@ Each directory contains:
               faulty   - device has been kicked from active use due to
                          a detected fault
               in_sync  - device is a fully in-sync member of the array
+              writemostly - device will only be subject to read
+                         requests if there are no other options.
+                         This applies only to raid1 arrays.
+              blocked  - device has failed, metadata is "external",
+                         and the failure hasn't been acknowledged yet.
+                         Writes that would write to this device if
+                         it were not faulty are blocked.
               spare    - device is working, but not a full member.
                          This includes spares that are in the process
-                         of being recoverred to
-        This list make grow in future.
+                         of being recovered to
+        This list may grow in future.
+        This can be written to.
+        Writing "faulty" simulates a failure on the device.
+        Writing "remove" removes the device from the array.
+        Writing "writemostly" sets the writemostly flag.
+        Writing "-writemostly" clears the writemostly flag.
+        Writing "blocked" sets the "blocked" flag.
+        Writing "-blocked" clear the "blocked" flag and allows writes
+                to complete.
+
+        This file responds to select/poll. Any change to 'faulty'
+        or 'blocked' causes an event.
 
       errors
         An approximate count of read errors that have been detected on
@@ -236,13 +351,34 @@ Each directory contains:
         providing an ongoing count for arrays with metadata managed by
         userspace.
 
+      slot
+        This gives the role that the device has in the array. It will
+        either be 'none' if the device is not active in the array
+        (i.e. is a spare or has failed) or an integer less than the
+        'raid_disks' number for the array indicating which position
+        it currently fills. This can only be set while assembling an
+        array. A device for which this is set is assumed to be working.
+
+      offset
+        This gives the location in the device (in sectors from the
+        start) where data from the array will be stored. Any part of
+        the device before this offset us not touched, unless it is
+        used for storing metadata (Formats 1.1 and 1.2).
+
+      size
+        The amount of the device, after the offset, that can be used
+        for storage of data. This will normally be the same as the
+        component_size. This can be written while assembling an
+        array. If a value less than the current component_size is
+        written, it will be rejected.
+
 
 An active md device will also contain and entry for each active device
 in the array. These are named
 
     rdNN
 
-where 'NN' is the possition in the array, starting from 0.
+where 'NN' is the position in the array, starting from 0.
 So for a 3 drive array there will be rd0, rd1, rd2.
 These are symbolic links to the appropriate 'dev-XXX' entry.
 Thus, for example,
@@ -283,6 +419,19 @@ also have
       'check' and 'repair' will start the appropriate process
       providing the current state is 'idle'.
 
+      This file responds to select/poll. Any important change in the value
+      triggers a poll event. Sometimes the value will briefly be
+      "recover" if a recovery seems to be needed, but cannot be
+      achieved. In that case, the transition to "recover" isn't
+      notified, but the transition away is.
+
+   degraded
+      This contains a count of the number of devices by which the
+      arrays is degraded. So an optimal array with show '0'. A
+      single failed/missing drive will show '1', etc.
+      This file responds to select/poll, any increase or decrease
+      in the count of missing devices will trigger an event.
+
    mismatch_count
       When performing 'check' and 'repair', and possibly when
       performing 'resync', md will count the number of errors that are
@@ -292,6 +441,54 @@ also have
       than sectors, this my be larger than the number of actual errors
       by a factor of the number of sectors in a page.
 
+   bitmap_set_bits
+      If the array has a write-intent bitmap, then writing to this
+      attribute can set bits in the bitmap, indicating that a resync
+      would need to check the corresponding blocks. Either individual
+      numbers or start-end pairs can be written. Multiple numbers
+      can be separated by a space.
+      Note that the numbers are 'bit' numbers, not 'block' numbers.
+      They should be scaled by the bitmap_chunksize.
+
+   sync_speed_min
+   sync_speed_max
+      This are similar to /proc/sys/dev/raid/speed_limit_{min,max}
+      however they only apply to the particular array.
+      If no value has been written to these, of if the word 'system'
+      is written, then the system-wide value is used. If a value,
+      in kibibytes-per-second is written, then it is used.
+      When the files are read, they show the currently active value
+      followed by "(local)" or "(system)" depending on whether it is
+      a locally set or system-wide value.
+
+   sync_completed
+      This shows the number of sectors that have been completed of
+      whatever the current sync_action is, followed by the number of
+      sectors in total that could need to be processed. The two
+      numbers are separated by a '/' thus effectively showing one
+      value, a fraction of the process that is complete.
+      A 'select' on this attribute will return when resync completes,
+      when it reaches the current sync_max (below) and possibly at
+      other times.
+
+   sync_max
+      This is a number of sectors at which point a resync/recovery
+      process will pause. When a resync is active, the value can
+      only ever be increased, never decreased. The value of 'max'
+      effectively disables the limit.
+
+
+   sync_speed
+      This shows the current actual speed, in K/sec, of the current
+      sync_action. It is averaged over the last 30 seconds.
+
+   suspend_lo
+   suspend_hi
+      The two values, given as numbers of sectors, indicate a range
+      within the array where IO will be blocked. This is currently
+      only supported for raid4/5/6.
+
+
 Each active md device may also have attributes specific to the
 personality module that manages it.
 These are specific to the implementation of the module and could
@@ -304,3 +501,9 @@ These currently include
       there are upper and lower limits (32768, 16). Default is 128.
   strip_cache_active (currently raid5 only)
       number of active entries in the stripe cache
+  preread_bypass_threshold (currently raid5 only)
+      number of times a stripe requiring preread will be bypassed by
+      a stripe that does not require preread. For fairness defaults
+      to 1. Setting this to 0 disables bypass accounting and
+      requires preread stripes to wait until all full-width stripe-
+      writes are complete. Valid values are 0 to stripe_cache_size.
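
Illustration (not part of the patch): several attributes documented above
are read back in small text formats, namely "new (old)" while a reshape is
in progress, "done / total" for sync_completed, and "value (local)" or
"value (system)" for sync_speed_min and sync_speed_max. The Python sketch
below shows one way a monitoring script might parse those strings. The
function names are hypothetical, the exact whitespace the kernel emits is
an assumption, and the /sys/block/<array>/md location used by the reader
helper is likewise assumed rather than stated in this patch.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def read_md_attr(array: str, attr: str) -> str:
    """Read one md attribute as text.

    Assumption: attributes live under /sys/block/<array>/md/.
    """
    return (Path("/sys/block") / array / "md" / attr).read_text()


def parse_new_old(text: str) -> Tuple[str, Optional[str]]:
    """Split a 'new (old)' value into (new, old).

    'old' is None when only a single value is present, which is the
    case when the old and new values do not differ.
    """
    m = re.fullmatch(r"\s*(\S+)\s*(?:\((\S+)\))?\s*", text)
    if m is None:
        raise ValueError(f"unrecognised attribute value: {text!r}")
    return m.group(1), m.group(2)


def parse_sync_completed(text: str) -> Optional[float]:
    """Return the completed fraction from 'done / total'.

    Returns None when the text holds no '/'-separated pair of
    integers, or when the total is zero.
    """
    done, sep, total = text.partition("/")
    if not sep:
        return None
    try:
        done_sectors, total_sectors = int(done), int(total)
    except ValueError:
        return None
    if total_sectors == 0:
        return None
    return done_sectors / total_sectors


def parse_speed_limit(text: str) -> Tuple[int, str]:
    """Split 'value (local)' or 'value (system)' into (value, origin)."""
    m = re.fullmatch(r"\s*(\d+)\s*\((local|system)\)\s*", text)
    if m is None:
        raise ValueError(f"unrecognised speed limit: {text!r}")
    return int(m.group(1)), m.group(2)
```

For example, parse_new_old(read_md_attr("md0", "raid_disks")) would return
both device counts during a reshape; "md0" is only a placeholder name. The
parsers are kept separate from the file reader so they can be exercised
without a real array.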