block: remove the anticipatory IO scheduler AS is mostly a subset of CFQ, so there's little point in still providing this separate IO scheduler. Hopefully at some point we can get down to one single IO scheduler again, at least this brings us closer by having only one intelligent IO scheduler. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
percpu: clean up percpu variable definitions Percpu variable definition is about to be updated such that all percpu symbols including the static ones must be unique. Update percpu variable definitions accordingly. * as,cfq: rename ioc_count uniquely * cpufreq: rename cpu_dbs_info uniquely * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and rename it * ipv4,6: rename cookie_scratch uniquely * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to pmc_irq_entry and nmi_entry to pmc_nmi_entry * perf_counter: rename disable_count to perf_disable_count * ftrace: rename test_event_disable to ftrace_test_event_disable * kmemleak: rename test_pointer to kmemleak_test_pointer * mce: rename next_interval to mce_next_interval [ Impact: percpu usage cleanups, no duplicate static percpu var names ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm <linux-mm@kvack.org> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andi Kleen <andi@firstfloor.org>
block: convert to pos and nr_sectors accessors With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always equal the !hard fields. Convert all rq->sectors, nr_sectors and current_nr_sectors references to accessors. While at it, drop superflous blk_rq_pos() < 0 test in swim.c. [ Impact: use pos and nr_sectors accessors ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Dario Ballabio <ballabio_dario@emc.com> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: unsik Kim <donari75@gmail.com> Cc: Laurent Vivier <Laurent@lvivier.info> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: kill blk_start_queueing() blk_start_queueing() is identical to __blk_run_queue() except that it doesn't check for recursion. None of the current users depends on blk_start_queueing() running request_fn directly. Replace usages of blk_start_queueing() with [__]blk_run_queue() and kill it. [ Impact: removal of mostly duplicate interface function ] Signed-off-by: Tejun Heo <tj@kernel.org>
as-iosched: get rid of private REQ_SYNC/REQ_ASYNC defines We can just use the block layer BLK_RW_SYNC/ASYNC defines now. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: get rid of elevator_t typedef Just use struct elevator_queue everywhere instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: use cancel_work_sync() instead of kblockd_flush_work() After many improvements on kblockd_flush_work, it is now identical to cancel_work_sync, so a direct call to cancel_work_sync is suggested. The only difference is that cancel_work_sync is a GPL symbol, so no non-GPL modules anymore. Signed-off-by: Cheng Renquan <crquan@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: as/cfq ssd idle check update We really need to know about the hardware tagging support as well, since if the SSD does not do tagging then we still want to idle. Otherwise have the same dependent sync IO vs flooding async IO problem as on rotational media. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: add queue flag for SSD/non-rotational devices We don't want to idle in AS/CFQ if the device doesn't have a seek penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational device, low level drivers should set this flag upon discovery of an SSD or similar device type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: make kblockd_schedule_work() take the queue as parameter Preparatory patch for checking queuing affinity. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Use WARN() in block/ Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
as-iosched: properly protect ioc_gone and ioc count If we have multiple tasks freeing io contexts when as-iosched is being unloaded, we could complete() ioc_gone twice. Fix that by protecting ioc_gone complete() and clearing with a spinlock for just that purpose. Doesn't matter from a performance perspective, since it'll only enter that path when ioc_gone != NULL (when as-iosched is being rmmod'ed). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: Fix the starving writes bug in the anticipatory IO scheduler AS scheduler alternates between issuing read and write batches. It does the batch switch only after all requests from the previous batch are completed. When switching to a write batch, if there is an on-going read request, it waits for its completion and indicates its intention of switching by setting ad->changed_batch and the new direction but does not update the batch_expire_time for the new write batch which it does in the case of no previous pending requests. On completion of the read request, it sees that we were waiting for the switch and schedules work for kblockd right away and resets the ad->changed_data flag. Now when kblockd enters dispatch_request where it is expected to pick up a write request, it in turn ends the write batch because the batch_expire_timer was not updated and shows the expire timestamp for the previous batch. This results in the write starvation for all the cases where there is the intention for switching to a write batch, but there is a previous in-flight read request and the batch gets reverted to a read_batch right away. This also holds true in the reverse case (switching from a write batch to a read batch with an in-flight write request). I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and linux-2.6-block git HEAD. I've tested the fix on x86 platforms with SCSI drives where the driver asks for the next request while a current request is in-flight. This patch is based off linux-2.6-block git HEAD. Bug reproduction: A simple scenario which reproduces this bug is: - dd if=/dev/hda3 of=/dev/null & - lilo The lilo takes forever to complete. This can also be reproduced fairly easily with the earlier dd and another test program doing msync(). The example test program below should print out a message after every iteration but it simply hangs forever. With this bugfix it makes forward progress. ==== Example test program using msync() (thanks to suleiman AT google DOT com) inline uint64_t rdtsc(void) { int64_t tsc; __asm __volatile("rdtsc" : "=A" (tsc)); return (tsc); } int main(int argc, char **argv) { struct stat st; uint64_t e, s, t; char *p, q; long i; int fd; if (argc < 2) { printf("Usage: %s <file>\n", argv[0]); return (1); } if ((fd = open(argv[1], O_RDWR | O_NOATIME)) < 0) err(1, "open"); if (fstat(fd, &st) < 0) err(1, "fstat"); p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); t = 0; for (i = 0; i < 1000; i++) { *p = 0; msync(p, 4096, MS_SYNC); s = rdtsc(); *p = 0; __asm __volatile(""::: "memory"); e = rdtsc(); if (argc > 2) printf("%d: %lld cycles %jd %jd\n", i, e - s, (intmax_t)s, (intmax_t)e); t += e - s; } printf("average time: %lld cycles\n", t / 1000); return (0); } Cc: <stable@kernel.org> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: kill swap_io_context() It blindly copies everything in the io_context, including the lock. That doesn't work so well for either lock ordering or lockdep. There seems zero point in swapping io contexts on a request to request merge, so the best point of action is to just remove it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
as-iosched: fix inconsistent ioc->lock context Since it's acquired from irq context, all locking must be of the irq safe variant. Most are already inside the queue lock (which already disables interrupts), but the io scheduler rmmod path always has irqs enabled and the put_io_context() path may legally be called with irqs enabled (even if it isn't usually). So fixup those two. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
as-iosched: fix double locking bug in as_merged_requests() If the two requests belong to the same io context, we will attempt to lock the same lock twice. But swapping contexts is pointless in that case, so just check for rioc == nioc before doing the double lock and copy. Tested-by: Olof Johansson <olof@lixom.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
io_context sharing - anticipatory changes changes to anticipatory io scheduler for io_context sharing Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: let elv_register() return void elv_register() always returns 0, and there isn't anything it does where it should return an error (the only error condition is so grave that it's handled with a BUG_ON). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
as-iosched: fix write batch start point New write batches currently start from where the last one completed. We have no idea where the head is after switching batches, so this makes little sense. Instead, start the next batch from the request with the earliest deadline in the hope that we avoid a deadline expiry later on. Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
as-iosched: fix incorrect comments Two comments refer to deadlines applying to reads only. This is not the case. Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>