X-Git-Url: http://ftp.safe.ca/?a=blobdiff_plain;f=Documentation%2Fmd.txt;h=4edd39ec7db91abcbcbac352fc3c5e6d3e3c3eca;hb=3b6cee7bc4b2f7858e9202293104acda8826bb68;hp=084ecf4eb2f87d1c89f6943db0c5f5bfbef1f32e;hpb=9b1d1dac181d8c1b9492e05cee660a985d035a06;p=safe%2Fjmp%2Flinux-2.6 diff --git a/Documentation/md.txt b/Documentation/md.txt index 084ecf4..4edd39e 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -62,7 +62,7 @@ be reconstructed (due to no parity). For this reason, md will normally refuse to start such an array. This requires the sysadmin to take action to explicitly start the array -desipite possible corruption. This is normally done with +despite possible corruption. This is normally done with mdadm --assemble --force .... This option is not really available if the array has the root @@ -154,29 +154,63 @@ contains further md-specific information about the device. All md devices contain: level - a text file indicating the 'raid level'. This may be a standard - numerical level prefixed by "RAID-" - e.g. "RAID-5", or some - other name such as "linear" or "multipath". + a text file indicating the 'raid level'. e.g. raid0, raid1, + raid5, linear, multipath, faulty. If no raid level has been set yet (array is still being - assembled), this file will be empty. + assembled), the value will reflect whatever has been written + to it, which may be a name like the above, or may be a number + such as '0', '5', etc. raid_disks a text file with a simple number indicating the number of devices in a fully functional array. If this is not yet known, the file - will be empty. If an array is being resized (not currently - possible) this will contain the larger of the old and new sizes. - Some raid level (RAID1) allow this value to be set while the - array is active. This will reconfigure the array. Otherwise - it can only be set while assembling an array. + will be empty. If an array is being resized this will contain + the new number of devices. + Some raid levels allow this value to be set while the array is + active. This will reconfigure the array. Otherwise it can only + be set while assembling an array. + A change to this attribute will not be permitted if it would + reduce the size of the array. To reduce the number of drives + in an e.g. raid5, the array size must first be reduced by + setting the 'array_size' attribute. chunk_size - This is the size if bytes for 'chunks' and is only relevant to - raid levels that involve striping (1,4,5,6,10). The address space + This is the size in bytes for 'chunks' and is only relevant to + raid levels that involve striping (0,4,5,6,10). The address space of the array is conceptually divided into chunks and consecutive chunks are striped onto neighbouring devices. - The size should be atleast PAGE_SIZE (4k) and should be a power + The size should be at least PAGE_SIZE (4k) and should be a power of 2. This can only be set while assembling an array + layout + The "layout" for the array for the particular level. This is + simply a number that is interpretted differently by different + levels. It can be written while assembling an array. + + array_size + This can be used to artificially constrain the available space in + the array to be less than is actually available on the combined + devices. Writing a number (in Kilobytes) which is less than + the available size will set the size. Any reconfiguration of the + array (e.g. adding devices) will not cause the size to change. + Writing the word 'default' will cause the effective size of the + array to be whatever size is actually available based on + 'level', 'chunk_size' and 'component_size'. + + This can be used to reduce the size of the array before reducing + the number of devices in a raid4/5/6, or to support external + metadata formats which mandate such clipping. + + reshape_position + This is either "none" or a sector number within the devices of + the array where "reshape" is up to. If this is set, the three + attributes mentioned above (raid_disks, chunk_size, layout) can + potentially have 2 values, an old and a new value. If these + values differ, reading the attribute returns + new (old) + and writing will effect the 'new' value, leaving the 'old' + unchanged. + component_size For arrays with data redundancy (i.e. not raid0, linear, faulty, multipath), all components must be the same size - or at least @@ -191,19 +225,11 @@ All md devices contain: about the array. It can be 0.90 (traditional format), 1.0, 1.1, 1.2 (newer format in varying locations) or "none" indicating that the kernel isn't managing metadata at all. - - level - The raid 'level' for this array. The name will often (but not - always) be the same as the name of the module that implements the - level. To be auto-loaded the module must have an alias - md-$LEVEL e.g. md-raid5 - This can be written only while the array is being assembled, not - after it is started. - - layout - The "layout" for the array for the particular level. This is - simply a number that is interpretted differently by different - levels. It can be written while assembling an array. + Alternately it can be "external:" followed by a string which + is set by user-space. This indicates that metadata is managed + by a user-space program. Any device failure or other event that + requires a metadata update will cause array activity to be + suspended until the event is acknowledged. resync_start The point at which resync should start. If no resync is needed, @@ -221,8 +247,8 @@ All md devices contain: safe_mode_delay When an md array has seen no write requests for a certain period of time, it will be marked as 'clean'. When another write - request arrive, the array is marked as 'dirty' before the write - commenses. This is known as 'safe_mode'. + request arrives, the array is marked as 'dirty' before the write + commences. This is known as 'safe_mode'. The 'certain period' is controlled by this file which stores the period as a number of seconds. The default is 200msec (0.200). Writing a value of 0 disables safemode. @@ -233,6 +259,11 @@ All md devices contain: writing the word for the desired state, however some states cannot be explicitly set, and some transitions are not allowed. + Select/poll works on this file. All changes except between + active_idle and active (which can be frequent and are not + very interesting) are notified. active->active_idle is + reported if the metadata is externally managed. + clear No devices, no size, no level Writing is equivalent to STOP_ARRAY ioctl @@ -266,29 +297,6 @@ All md devices contain: like active, but no writes have been seen for a while (safe_mode_delay). - sync_speed_min - sync_speed_max - This are similar to /proc/sys/dev/raid/speed_limit_{min,max} - however they only apply to the particular array. - If no value has been written to these, of if the word 'system' - is written, then the system-wide value is used. If a value, - in kibibytes-per-second is written, then it is used. - When the files are read, they show the currently active value - followed by "(local)" or "(system)" depending on whether it is - a locally set or system-wide value. - - sync_completed - This shows the number of sectors that have been completed of - whatever the current sync_action is, followed by the number of - sectors in total that could need to be processed. The two - numbers are separated by a '/' thus effectively showing one - value, a fraction of the process that is complete. - - sync_speed - This shows the current actual speed, in K/sec, of the current - sync_action. It is averaged over the last 30 seconds. - - As component devices are added to an md array, they appear in the 'md' directory as new directories named dev-XXX @@ -312,15 +320,25 @@ Each directory contains: writemostly - device will only be subject to read requests if there are no other options. This applies only to raid1 arrays. + blocked - device has failed, metadata is "external", + and the failure hasn't been acknowledged yet. + Writes that would write to this device if + it were not faulty are blocked. spare - device is working, but not a full member. This includes spares that are in the process - of being recoverred to - This list make grow in future. + of being recovered to + This list may grow in future. This can be written to. Writing "faulty" simulates a failure on the device. Writing "remove" removes the device from the array. Writing "writemostly" sets the writemostly flag. Writing "-writemostly" clears the writemostly flag. + Writing "blocked" sets the "blocked" flag. + Writing "-blocked" clear the "blocked" flag and allows writes + to complete. + + This file responds to select/poll. Any change to 'faulty' + or 'blocked' causes an event. errors An approximate count of read errors that have been detected on @@ -337,7 +355,7 @@ Each directory contains: This gives the role that the device has in the array. It will either be 'none' if the device is not active in the array (i.e. is a spare or has failed) or an integer less than the - 'raid_disks' number for the array indicating which possition + 'raid_disks' number for the array indicating which position it currently fills. This can only be set while assembling an array. A device for which this is set is assumed to be working. @@ -352,7 +370,7 @@ Each directory contains: for storage of data. This will normally be the same as the component_size. This can be written while assembling an array. If a value less than the current component_size is - written, component_size will be reduced to this value. + written, it will be rejected. An active md device will also contain and entry for each active device @@ -360,7 +378,7 @@ in the array. These are named rdNN -where 'NN' is the possition in the array, starting from 0. +where 'NN' is the position in the array, starting from 0. So for a 3 drive array there will be rd0, rd1, rd2. These are symbolic links to the appropriate 'dev-XXX' entry. Thus, for example, @@ -401,6 +419,19 @@ also have 'check' and 'repair' will start the appropriate process providing the current state is 'idle'. + This file responds to select/poll. Any important change in the value + triggers a poll event. Sometimes the value will briefly be + "recover" if a recovery seems to be needed, but cannot be + achieved. In that case, the transition to "recover" isn't + notified, but the transition away is. + + degraded + This contains a count of the number of devices by which the + arrays is degraded. So an optimal array with show '0'. A + single failed/missing drive will show '1', etc. + This file responds to select/poll, any increase or decrease + in the count of missing devices will trigger an event. + mismatch_count When performing 'check' and 'repair', and possibly when performing 'resync', md will count the number of errors that are @@ -419,6 +450,45 @@ also have Note that the numbers are 'bit' numbers, not 'block' numbers. They should be scaled by the bitmap_chunksize. + sync_speed_min + sync_speed_max + This are similar to /proc/sys/dev/raid/speed_limit_{min,max} + however they only apply to the particular array. + If no value has been written to these, of if the word 'system' + is written, then the system-wide value is used. If a value, + in kibibytes-per-second is written, then it is used. + When the files are read, they show the currently active value + followed by "(local)" or "(system)" depending on whether it is + a locally set or system-wide value. + + sync_completed + This shows the number of sectors that have been completed of + whatever the current sync_action is, followed by the number of + sectors in total that could need to be processed. The two + numbers are separated by a '/' thus effectively showing one + value, a fraction of the process that is complete. + A 'select' on this attribute will return when resync completes, + when it reaches the current sync_max (below) and possibly at + other times. + + sync_max + This is a number of sectors at which point a resync/recovery + process will pause. When a resync is active, the value can + only ever be increased, never decreased. The value of 'max' + effectively disables the limit. + + + sync_speed + This shows the current actual speed, in K/sec, of the current + sync_action. It is averaged over the last 30 seconds. + + suspend_lo + suspend_hi + The two values, given as numbers of sectors, indicate a range + within the array where IO will be blocked. This is currently + only supported for raid4/5/6. + + Each active md device may also have attributes specific to the personality module that manages it. These are specific to the implementation of the module and could @@ -431,3 +501,9 @@ These currently include there are upper and lower limits (32768, 16). Default is 128. strip_cache_active (currently raid5 only) number of active entries in the stripe cache + preread_bypass_threshold (currently raid5 only) + number of times a stripe requiring preread will be bypassed by + a stripe that does not require preread. For fairness defaults + to 1. Setting this to 0 disables bypass accounting and + requires preread stripes to wait until all full-width stripe- + writes are complete. Valid values are 0 to stripe_cache_size.