* extent format more robust in face of on-disk corruption due to magics,
* internal redundancy in tree
* improved file allocation (multi-block alloc)
-* fix 32000 subdirectory limit
+* lift 32000 subdirectory limit imposed by i_links_count[1]
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force
the ordering)
+[1] Filesystems with a block size of 1k may see a limit imposed by the
+directory hash tree having a maximum depth of two.
+
2.2 Candidate features for future inclusion
* Online defrag (patches available but not well tested)
mount options "ro,noload" can be used to prevent
writes to the filesystem.
-extents (*) ext4 will use extents to address file data. The
- file system will no longer be mountable by ext3.
-
-noextents ext4 will not use extents for newly created files
-
journal_checksum Enable checksumming of the journal transactions.
This will allow the recovery code in e2fsck and the
kernel to detect corruption in the kernel. It is a
performance.
barrier=<0|1(*)> This enables/disables the use of write barriers in
- the jbd code. barrier=0 disables, barrier=1 enables.
- This also requires an IO stack which can support
+barrier(*) the jbd code. barrier=0 disables, barrier=1 enables.
+nobarrier This also requires an IO stack which can support
barriers, and if jbd gets an error on a barrier
write, it will disable again with a warning.
Write barriers enforce proper on-disk ordering
safe to use, at some performance penalty. If
your disks are battery-backed in one way or another,
disabling barriers may safely improve performance.
+ The mount options "barrier" and "nobarrier" can
+ also be used to enable or disable barriers, for
+ consistency with other ext4 mount options.
inode_readahead=n This tuning parameter controls the maximum
number of inode table blocks that ext4's inode
debug Extra debugging information is sent to syslog.
+abort Simulate the effects of calling ext4_abort() for
+ debugging purposes. This is normally used while
+ remounting a filesystem which is already mounted.
+
errors=remount-ro Remount the filesystem read-only on an error.
errors=continue Keep going on a filesystem error.
errors=panic Panic and halt the machine if an error occurs.
sb=n Use alternate superblock at this location.
-quota
-noquota
-grpquota
-usrquota
+quota These options are ignored by the filesystem. They
+noquota are used only by quota tools to recognize volumes
+grpquota where quota should be turned on. See documentation
+usrquota in the quota-tools package for more details
+ (http://sourceforge.net/projects/linuxquota).
+
+jqfmt=<quota type> These options tell filesystem details about quota
+usrjquota=<file> so that quota information can be properly updated
+grpjquota=<file> during journal replay. They replace the above
+ quota options. See documentation in the quota-tools
+ package for more details
+ (http://sourceforge.net/projects/linuxquota).
bh (*) ext4 associates buffer heads to data pages to
nobh (a) cache disk block mapping information
to use for allocation size and alignment. For RAID5/6
systems this should be the number of data
disks * RAID chunk size in file system blocks.
-delalloc (*) Deferring block allocation until write-out time.
-nodelalloc Disable delayed allocation. Blocks are allocation
- when data is copied from user to page cache.
+
+delalloc (*) Defer block allocation until just before ext4
+ writes out the block(s) in question. This
+ allows ext4 to better allocation decisions
+ more efficiently.
+nodelalloc Disable delayed allocation. Blocks are allocated
+ when the data is copied from userspace to the
+ page cache, either via the write(2) system call
+ or when an mmap'ed page which was previously
+ unallocated is written for the first time.
max_batch_time=usec Maximum amount of time ext4 should wait for
additional filesystem operations to be batch
amount of time (on average) that it takes to
finish committing a transaction. Call this time
the "commit time". If the time that the
- transactoin has been running is less than the
+ transaction has been running is less than the
commit time, ext4 will try sleeping for the
commit time to see if other operations will join
the transaction. The commit time is capped by
multi-threaded, synchronous workloads on very
fast disks, at the cost of increasing latency.
+journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the
+ highest priorty) which should be used for I/O
+ operations submitted by kjournald2 during a
+ commit operation. This defaults to 3, which is
+ a slightly higher priority than the default I/O
+ priority.
+
+auto_da_alloc(*) Many broken applications don't use fsync() when
+noauto_da_alloc replacing existing files via patterns such as
+ fd = open("foo.new")/write(fd,..)/close(fd)/
+ rename("foo.new", "foo"), or worse yet,
+ fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
+ If auto_da_alloc is enabled, ext4 will detect
+ the replace-via-rename and replace-via-truncate
+ patterns and force that any delayed allocation
+ blocks are allocated such that at the next
+ journal commit, in the default data=ordered
+ mode, the data blocks of the new file are forced
+ to disk before the rename() operation is
+ committed. This provides roughly the same level
+ of guarantees as ext3, and avoids the
+ "zero-length" problem that can happen when a
+ system crashes before the delayed allocation
+ blocks are forced to disk.
+
Data Mode
=========
There are 3 different data modes:
In the event of a crash, the journal can be replayed, bringing both data and
metadata into a consistent state. This mode is the slowest except when data
needs to be read from and written to disk at the same time where it
-outperforms all others modes. Curently ext4 does not have delayed
+outperforms all others modes. Currently ext4 does not have delayed
allocation support if this data journalling mode is selected.
References