-2.1 /proc/sys/fs - File system data
------------------------------------
-
-This subdirectory contains specific file system, file handle, inode, dentry
-and quota information.
-
-Currently, these files are in /proc/sys/fs:
-
-dentry-state
-------------
-
-Status of the directory cache. Since directory entries are dynamically
-allocated and deallocated, this file indicates the current status. It holds
-six values, in which the last two are not used and are always zero. The others
-are listed in table 2-1.
-
-
-Table 2-1: Status files of the directory cache
-..............................................................................
- File Content
- nr_dentry Almost always zero
- nr_unused Number of unused cache entries
- age_limit
- in seconds after the entry may be reclaimed, when memory is short
- want_pages internally
-..............................................................................
-
-dquot-nr and dquot-max
-----------------------
-
-The file dquot-max shows the maximum number of cached disk quota entries.
-
-The file dquot-nr shows the number of allocated disk quota entries and the
-number of free disk quota entries.
-
-If the number of available cached disk quotas is very low and you have a large
-number of simultaneous system users, you might want to raise the limit.
-
-file-nr and file-max
---------------------
-
-The kernel allocates file handles dynamically, but doesn't free them again at
-this time.
-
-The value in file-max denotes the maximum number of file handles that the
-Linux kernel will allocate. When you get a lot of error messages about running
-out of file handles, you might want to raise this limit. The default value is
-10% of RAM in kilobytes. To change it, just write the new number into the
-file:
-
- # cat /proc/sys/fs/file-max
- 4096
- # echo 8192 > /proc/sys/fs/file-max
- # cat /proc/sys/fs/file-max
- 8192
-
-
-This method of revision is useful for all customizable parameters of the
-kernel - simply echo the new value to the corresponding file.
-
-Historically, the three values in file-nr denoted the number of allocated file
-handles, the number of allocated but unused file handles, and the maximum
-number of file handles. Linux 2.6 always reports 0 as the number of free file
-handles -- this is not an error, it just means that the number of allocated
-file handles exactly matches the number of used file handles.
-
-Attempts to allocate more file descriptors than file-max are reported with
-printk, look for "VFS: file-max limit <number> reached".
-
-inode-state and inode-nr
-------------------------
-
-The file inode-nr contains the first two items from inode-state, so we'll skip
-to that file...
-
-inode-state contains two actual numbers and five dummy values. The numbers
-are nr_inodes and nr_free_inodes (in order of appearance).
-
-nr_inodes
-~~~~~~~~~
-
-Denotes the number of inodes the system has allocated. This number will
-grow and shrink dynamically.
-
-nr_free_inodes
---------------
-
-Represents the number of free inodes. Ie. The number of inuse inodes is
-(nr_inodes - nr_free_inodes).
-
-aio-nr and aio-max-nr
----------------------
-
-aio-nr is the running total of the number of events specified on the
-io_setup system call for all currently active aio contexts. If aio-nr
-reaches aio-max-nr then io_setup will fail with EAGAIN. Note that
-raising aio-max-nr does not result in the pre-allocation or re-sizing
-of any kernel data structures.
-
-2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
------------------------------------------------------------
-
-Besides these files, there is the subdirectory /proc/sys/fs/binfmt_misc. This
-handles the kernel support for miscellaneous binary formats.
-
-Binfmt_misc provides the ability to register additional binary formats to the
-Kernel without compiling an additional module/kernel. Therefore, binfmt_misc
-needs to know magic numbers at the beginning or the filename extension of the
-binary.
-
-It works by maintaining a linked list of structs that contain a description of
-a binary format, including a magic with size (or the filename extension),
-offset and mask, and the interpreter name. On request it invokes the given
-interpreter with the original program as argument, as binfmt_java and
-binfmt_em86 and binfmt_mz do. Since binfmt_misc does not define any default
-binary-formats, you have to register an additional binary-format.
-
-There are two general files in binfmt_misc and one file per registered format.
-The two general files are register and status.
-
-Registering a new binary format
--------------------------------
-
-To register a new binary format you have to issue the command
-
- echo :name:type:offset:magic:mask:interpreter: > /proc/sys/fs/binfmt_misc/register
-
-
-
-with appropriate name (the name for the /proc-dir entry), offset (defaults to
-0, if omitted), magic, mask (which can be omitted, defaults to all 0xff) and
-last but not least, the interpreter that is to be invoked (for example and
-testing /bin/echo). Type can be M for usual magic matching or E for filename
-extension matching (give extension in place of magic).
-
-Check or reset the status of the binary format handler
-------------------------------------------------------
-
-If you do a cat on the file /proc/sys/fs/binfmt_misc/status, you will get the
-current status (enabled/disabled) of binfmt_misc. Change the status by echoing
-0 (disables) or 1 (enables) or -1 (caution: this clears all previously
-registered binary formats) to status. For example echo 0 > status to disable
-binfmt_misc (temporarily).
-
-Status of a single handler
---------------------------
-
-Each registered handler has an entry in /proc/sys/fs/binfmt_misc. These files
-perform the same function as status, but their scope is limited to the actual
-binary format. By cating this file, you also receive all related information
-about the interpreter/magic of the binfmt.
-
-Example usage of binfmt_misc (emulate binfmt_java)
---------------------------------------------------
-
- cd /proc/sys/fs/binfmt_misc
- echo ':Java:M::\xca\xfe\xba\xbe::/usr/local/java/bin/javawrapper:' > register
- echo ':HTML:E::html::/usr/local/java/bin/appletviewer:' > register
- echo ':Applet:M::<!--applet::/usr/local/java/bin/appletviewer:' > register
- echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register
-
-
-These four lines add support for Java executables and Java applets (like
-binfmt_java, additionally recognizing the .html extension with no need to put
-<!--applet> to every applet file). You have to install the JDK and the
-shell-script /usr/local/java/bin/javawrapper too. It works around the
-brokenness of the Java filename handling. To add a Java binary, just create a
-link to the class-file somewhere in the path.
-
-2.3 /proc/sys/kernel - general kernel parameters
-------------------------------------------------
-
-This directory reflects general kernel behaviors. As I've said before, the
-contents depend on your configuration. Here you'll find the most important
-files, along with descriptions of what they mean and how to use them.
-
-acct
-----
-
-The file contains three values; highwater, lowwater, and frequency.
-
-It exists only when BSD-style process accounting is enabled. These values
-control its behavior. If the free space on the file system where the log lives
-goes below lowwater percentage, accounting suspends. If it goes above
-highwater percentage, accounting resumes. Frequency determines how often you
-check the amount of free space (value is in seconds). Default settings are: 4,
-2, and 30. That is, suspend accounting if there is less than 2 percent free;
-resume it if we have a value of 3 or more percent; consider information about
-the amount of free space valid for 30 seconds
-
-ctrl-alt-del
-------------
-
-When the value in this file is 0, ctrl-alt-del is trapped and sent to the init
-program to handle a graceful restart. However, when the value is greater that
-zero, Linux's reaction to this key combination will be an immediate reboot,
-without syncing its dirty buffers.
-
-[NOTE]
- When a program (like dosemu) has the keyboard in raw mode, the
- ctrl-alt-del is intercepted by the program before it ever reaches the
- kernel tty layer, and it is up to the program to decide what to do with
- it.
-
-domainname and hostname
------------------------
-
-These files can be controlled to set the NIS domainname and hostname of your
-box. For the classic darkstar.frop.org a simple:
-
- # echo "darkstar" > /proc/sys/kernel/hostname
- # echo "frop.org" > /proc/sys/kernel/domainname
-
-
-would suffice to set your hostname and NIS domainname.
-
-osrelease, ostype and version
------------------------------
-
-The names make it pretty obvious what these fields contain:
-
- > cat /proc/sys/kernel/osrelease
- 2.2.12
-
- > cat /proc/sys/kernel/ostype
- Linux
-
- > cat /proc/sys/kernel/version
- #4 Fri Oct 1 12:41:14 PDT 1999
-
-
-The files osrelease and ostype should be clear enough. Version needs a little
-more clarification. The #4 means that this is the 4th kernel built from this
-source base and the date after it indicates the time the kernel was built. The
-only way to tune these values is to rebuild the kernel.
-
-panic
------
-
-The value in this file represents the number of seconds the kernel waits
-before rebooting on a panic. When you use the software watchdog, the
-recommended setting is 60. If set to 0, the auto reboot after a kernel panic
-is disabled, which is the default setting.
-
-printk
-------
-
-The four values in printk denote
-* console_loglevel,
-* default_message_loglevel,
-* minimum_console_loglevel and
-* default_console_loglevel
-respectively.
-
-These values influence printk() behavior when printing or logging error
-messages, which come from inside the kernel. See syslog(2) for more
-information on the different log levels.
-
-console_loglevel
-----------------
-
-Messages with a higher priority than this will be printed to the console.
-
-default_message_level
----------------------
-
-Messages without an explicit priority will be printed with this priority.
-
-minimum_console_loglevel
-------------------------
-
-Minimum (highest) value to which the console_loglevel can be set.
-
-default_console_loglevel
-------------------------
-
-Default value for console_loglevel.
-
-sg-big-buff
------------
-
-This file shows the size of the generic SCSI (sg) buffer. At this point, you
-can't tune it yet, but you can change it at compile time by editing
-include/scsi/sg.h and changing the value of SG_BIG_BUFF.
-
-If you use a scanner with SANE (Scanner Access Now Easy) you might want to set
-this to a higher value. Refer to the SANE documentation on this issue.
-
-modprobe
---------
-
-The location where the modprobe binary is located. The kernel uses this
-program to load modules on demand.
-
-unknown_nmi_panic
------------------
-
-The value in this file affects behavior of handling NMI. When the value is
-non-zero, unknown NMI is trapped and then panic occurs. At that time, kernel
-debugging information is displayed on console.
-
-NMI switch that most IA32 servers have fires unknown NMI up, for example.
-If a system hangs up, try pressing the NMI switch.
-
-nmi_watchdog
-------------
-
-Enables/Disables the NMI watchdog on x86 systems. When the value is non-zero
-the NMI watchdog is enabled and will continuously test all online cpus to
-determine whether or not they are still functioning properly.
-
-Because the NMI watchdog shares registers with oprofile, by disabling the NMI
-watchdog, oprofile may have more registers to utilize.
-
-maps_protect
-------------
-
-Enables/Disables the protection of the per-process proc entries "maps" and
-"smaps". When enabled, the contents of these files are visible only to
-readers that are allowed to ptrace() the given process.
-
-
-2.4 /proc/sys/vm - The virtual memory subsystem
------------------------------------------------
-
-The files in this directory can be used to tune the operation of the virtual
-memory (VM) subsystem of the Linux kernel.
-
-vfs_cache_pressure
-------------------
-
-Controls the tendency of the kernel to reclaim the memory which is used for
-caching of directory and inode objects.
-
-At the default value of vfs_cache_pressure=100 the kernel will attempt to
-reclaim dentries and inodes at a "fair" rate with respect to pagecache and
-swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
-to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100
-causes the kernel to prefer to reclaim dentries and inodes.
-
-dirty_background_ratio
-----------------------
-
-Contains, as a percentage of total system memory, the number of pages at which
-the pdflush background writeback daemon will start writing out dirty data.
-
-dirty_ratio
------------------
-
-Contains, as a percentage of total system memory, the number of pages at which
-a process which is generating disk writes will itself start writing out dirty
-data.
-
-dirty_writeback_centisecs
--------------------------
-
-The pdflush writeback daemons will periodically wake up and write `old' data
-out to disk. This tunable expresses the interval between those wakeups, in
-100'ths of a second.
-
-Setting this to zero disables periodic writeback altogether.
-
-dirty_expire_centisecs
-----------------------
-
-This tunable is used to define when dirty data is old enough to be eligible
-for writeout by the pdflush daemons. It is expressed in 100'ths of a second.
-Data which has been dirty in-memory for longer than this interval will be
-written out next time a pdflush daemon wakes up.
-
-legacy_va_layout
-----------------
-
-If non-zero, this sysctl disables the new 32-bit mmap mmap layout - the kernel
-will use the legacy (2.4) layout for all processes.
-
-lower_zone_protection
----------------------
-
-For some specialised workloads on highmem machines it is dangerous for
-the kernel to allow process memory to be allocated from the "lowmem"
-zone. This is because that memory could then be pinned via the mlock()
-system call, or by unavailability of swapspace.
-
-And on large highmem machines this lack of reclaimable lowmem memory
-can be fatal.
-
-So the Linux page allocator has a mechanism which prevents allocations
-which _could_ use highmem from using too much lowmem. This means that
-a certain amount of lowmem is defended from the possibility of being
-captured into pinned user memory.
-
-(The same argument applies to the old 16 megabyte ISA DMA region. This
-mechanism will also defend that region from allocations which could use
-highmem or lowmem).
-
-The `lower_zone_protection' tunable determines how aggressive the kernel is
-in defending these lower zones. The default value is zero - no
-protection at all.
-
-If you have a machine which uses highmem or ISA DMA and your
-applications are using mlock(), or if you are running with no swap then
-you probably should increase the lower_zone_protection setting.
-
-The units of this tunable are fairly vague. It is approximately equal
-to "megabytes," so setting lower_zone_protection=100 will protect around 100
-megabytes of the lowmem zone from user allocations. It will also make
-those 100 megabytes unavailable for use by applications and by
-pagecache, so there is a cost.
-
-The effects of this tunable may be observed by monitoring
-/proc/meminfo:LowFree. Write a single huge file and observe the point
-at which LowFree ceases to fall.
-
-A reasonable value for lower_zone_protection is 100.
-
-page-cluster
-------------
-
-page-cluster controls the number of pages which are written to swap in
-a single attempt. The swap I/O size.
-
-It is a logarithmic value - setting it to zero means "1 page", setting
-it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
-
-The default value is three (eight pages at a time). There may be some
-small benefits in tuning this to a different value if your workload is
-swap-intensive.
-
-overcommit_memory
------------------
-
-Controls overcommit of system memory, possibly allowing processes
-to allocate (but not use) more memory than is actually available.
-
-
-0 - Heuristic overcommit handling. Obvious overcommits of
- address space are refused. Used for a typical system. It
- ensures a seriously wild allocation fails while allowing
- overcommit to reduce swap usage. root is allowed to
- allocate slightly more memory in this mode. This is the
- default.
-
-1 - Always overcommit. Appropriate for some scientific
- applications.
-
-2 - Don't overcommit. The total address space commit
- for the system is not permitted to exceed swap plus a
- configurable percentage (default is 50) of physical RAM.
- Depending on the percentage you use, in most situations
- this means a process will not be killed while attempting
- to use already-allocated memory but will receive errors
- on memory allocation as appropriate.
-
-overcommit_ratio
-----------------
-
-Percentage of physical memory size to include in overcommit calculations
-(see above.)
-
-Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)
-
- swapspace = total size of all swap areas
- physmem = size of physical memory in system
-
-nr_hugepages and hugetlb_shm_group
-----------------------------------
-
-nr_hugepages configures number of hugetlb page reserved for the system.
-
-hugetlb_shm_group contains group id that is allowed to create SysV shared
-memory segment using hugetlb page.
-
-hugepages_treat_as_movable
---------------------------
-
-This parameter is only useful when kernelcore= is specified at boot time to
-create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages
-are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero
-value written to hugepages_treat_as_movable allows huge pages to be allocated
-from ZONE_MOVABLE.
-
-Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge
-pages pool can easily grow or shrink within. Assuming that applications are
-not running that mlock() a lot of memory, it is likely the huge pages pool
-can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value
-into nr_hugepages and triggering page reclaim.
-
-laptop_mode
------------
-
-laptop_mode is a knob that controls "laptop mode". All the things that are
-controlled by this knob are discussed in Documentation/laptop-mode.txt.
-
-block_dump
-----------
-
-block_dump enables block I/O debugging when set to a nonzero value. More
-information on block I/O debugging is in Documentation/laptop-mode.txt.
-
-swap_token_timeout
-------------------
-
-This file contains valid hold time of swap out protection token. The Linux
-VM has token based thrashing control mechanism and uses the token to prevent
-unnecessary page faults in thrashing situation. The unit of the value is
-second. The value would be useful to tune thrashing behavior.
-
-drop_caches
------------
-
-Writing to this will cause the kernel to drop clean caches, dentries and
-inodes from memory, causing that memory to become free.
-
-To free pagecache:
- echo 1 > /proc/sys/vm/drop_caches
-To free dentries and inodes:
- echo 2 > /proc/sys/vm/drop_caches
-To free pagecache, dentries and inodes:
- echo 3 > /proc/sys/vm/drop_caches
-
-As this is a non-destructive operation and dirty objects are not freeable, the
-user should run `sync' first.
-
-
-2.5 /proc/sys/dev - Device specific parameters
-----------------------------------------------
-
-Currently there is only support for CDROM drives, and for those, there is only
-one read-only file containing information about the CD-ROM drives attached to
-the system:
-
- >cat /proc/sys/dev/cdrom/info
- CD-ROM information, Id: cdrom.c 2.55 1999/04/25
-
- drive name: sr0 hdb
- drive speed: 32 40
- drive # of slots: 1 0
- Can close tray: 1 1
- Can open tray: 1 1
- Can lock tray: 1 1
- Can change speed: 1 1
- Can select disk: 0 1
- Can read multisession: 1 1
- Can read MCN: 1 1
- Reports media changed: 1 1
- Can play audio: 1 1
-
-
-You see two drives, sr0 and hdb, along with a list of their features.
-
-2.6 /proc/sys/sunrpc - Remote procedure calls
----------------------------------------------
-
-This directory contains four files, which enable or disable debugging for the
-RPC functions NFS, NFS-daemon, RPC and NLM. The default values are 0. They can
-be set to one to turn debugging on. (The default value is 0 for each)
-
-2.7 /proc/sys/net - Networking stuff
-------------------------------------
-
-The interface to the networking parts of the kernel is located in
-/proc/sys/net. Table 2-3 shows all possible subdirectories. You may see only
-some of them, depending on your kernel's configuration.
-
-
-Table 2-3: Subdirectories in /proc/sys/net
-..............................................................................
- Directory Content Directory Content
- core General parameter appletalk Appletalk protocol
- unix Unix domain sockets netrom NET/ROM
- 802 E802 protocol ax25 AX25
- ethernet Ethernet protocol rose X.25 PLP layer
- ipv4 IP version 4 x25 X.25 protocol
- ipx IPX token-ring IBM token ring
- bridge Bridging decnet DEC net
- ipv6 IP version 6
-..............................................................................
-
-We will concentrate on IP networking here. Since AX15, X.25, and DEC Net are
-only minor players in the Linux world, we'll skip them in this chapter. You'll
-find some short info on Appletalk and IPX further on in this chapter. Review
-the online documentation and the kernel source to get a detailed view of the
-parameters for those protocols. In this section we'll discuss the
-subdirectories printed in bold letters in the table above. As default values
-are suitable for most needs, there is no need to change these values.
-
-/proc/sys/net/core - Network core options
------------------------------------------
-
-rmem_default
-------------
-
-The default setting of the socket receive buffer in bytes.
-
-rmem_max
---------
-
-The maximum receive socket buffer size in bytes.
-
-wmem_default
-------------
-
-The default setting (in bytes) of the socket send buffer.
-
-wmem_max
---------
-
-The maximum send socket buffer size in bytes.
-
-message_burst and message_cost
-------------------------------