vmscan: low order lumpy reclaim also should use PAGEOUT_IO_SYNC
author KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tue, 16 Jun 2009 22:31:40 +0000 (15:31 -0700)
committer Linus Torvalds <torvalds@linux-foundation.org>
Wed, 17 Jun 2009 02:47:31 +0000 (19:47 -0700)
Commit 33c120ed2843090e2bd316de1588b8bf8b96cbde ("more aggressively use
lumpy reclaim") increased how aggressive lumpy reclaim was by isolating
both active and inactive pages for asynchronous lumpy reclaim on
costly-high-order pages and for cheap-high-order when memory pressure is
high.  However, if the system is under heavy pressure and there are dirty
pages, asynchronous IO may not be sufficient to reclaim a suitable page in
time.

This patch causes the caller to enter synchronous lumpy reclaim for
costly-high-order pages and for cheap-high-order pages when under memory
pressure.
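
The threshold described above can be sketched in userspace C as follows.
This is only an illustration, not the kernel code: the helper names
use_lumpy_reclaim() and needs_sync_pass() are hypothetical and do not
appear in the patch, and the two constants are redefined locally with the
values mainline used at the time (PAGE_ALLOC_COSTLY_ORDER is 3,
DEF_PRIORITY is 12) so that the fragment compiles on its own.

```c
#include <stdbool.h>

/* Redefined here only so the sketch builds outside the kernel tree. */
#define PAGE_ALLOC_COSTLY_ORDER 3
#define DEF_PRIORITY 12

/*
 * Hypothetical helper: mirrors the condition the patch hoists into the
 * lumpy_reclaim flag. True for costly orders, and for any non-zero
 * order once the scan priority has dropped below DEF_PRIORITY - 2.
 */
static bool use_lumpy_reclaim(int order, int priority)
{
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		return true;
	if (order && priority < DEF_PRIORITY - 2)
		return true;
	return false;
}

/*
 * Hypothetical helper: mirrors the condition guarding the synchronous
 * pass. Direct reclaimers (not kswapd) that freed fewer pages than they
 * isolated retry with synchronous IO when lumpy reclaim is in effect.
 */
static bool needs_sync_pass(unsigned long nr_freed, unsigned long nr_taken,
			    bool is_kswapd, bool lumpy_reclaim)
{
	return nr_freed < nr_taken && !is_kswapd && lumpy_reclaim;
}
```

With these assumptions, an order-1 request at priority 9 now qualifies for
the synchronous pass, whereas before this patch only requests above
PAGE_ALLOC_COSTLY_ORDER did.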

Minchan.kim@gmail.com said:

Andy added synchronous lumpy reclaim with
c661b078fd62abe06fd11fab4ac5e4eeafe26b6d.  At that time, lumpy reclaim was
not aggressive.  His intention was to serve only high-order users (above
PAGE_ALLOC_COSTLY_ORDER).

After some time, Rik added aggressive lumpy reclaim with
33c120ed2843090e2bd316de1588b8bf8b96cbde.  His intention was to do lumpy
reclaim both for high-order users and when there is trouble getting a
small set of contiguous pages.

So we also have to add synchronous pageout for a small set of contiguous
pages.

Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <Minchan.kim@gmail.com>
Reviewed-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/vmscan.c

index 95c08a8..a6b7d14 100644
@@ -1061,6 +1061,19 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
        unsigned long nr_scanned = 0;
        unsigned long nr_reclaimed = 0;
        struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
+       int lumpy_reclaim = 0;
+
+       /*
+        * If we need a large contiguous chunk of memory, or have
+        * trouble getting a small set of contiguous pages, we
+        * will reclaim both active and inactive pages.
+        *
+        * We use the same threshold as pageout congestion_wait below.
+        */
+       if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
+               lumpy_reclaim = 1;
+       else if (sc->order && priority < DEF_PRIORITY - 2)
+               lumpy_reclaim = 1;
 
        pagevec_init(&pvec, 1);
 
@@ -1073,19 +1086,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
                unsigned long nr_freed;
                unsigned long nr_active;
                unsigned int count[NR_LRU_LISTS] = { 0, };
-               int mode = ISOLATE_INACTIVE;
-
-               /*
-                * If we need a large contiguous chunk of memory, or have
-                * trouble getting a small set of contiguous pages, we
-                * will reclaim both active and inactive pages.
-                *
-                * We use the same threshold as pageout congestion_wait below.
-                */
-               if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-                       mode = ISOLATE_BOTH;
-               else if (sc->order && priority < DEF_PRIORITY - 2)
-                       mode = ISOLATE_BOTH;
+               int mode = lumpy_reclaim ? ISOLATE_BOTH : ISOLATE_INACTIVE;
 
                nr_taken = sc->isolate_pages(sc->swap_cluster_max,
                             &page_list, &nr_scan, sc->order, mode,
@@ -1122,7 +1123,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
                 * but that should be acceptable to the caller
                 */
                if (nr_freed < nr_taken && !current_is_kswapd() &&
-                                       sc->order > PAGE_ALLOC_COSTLY_ORDER) {
+                   lumpy_reclaim) {
                        congestion_wait(WRITE, HZ/10);
 
                        /*