Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Fix null pointer deref with SEND_SIG_FORCED perf: Fix signed comparison in perf_adjust_period() powerpc/oprofile: fix potential buffer overrun in op_model_cell.c perf symbols: Set the DSO long name when using symbol_conf.vmlinux_name
perf: Fix signed comparison in perf_adjust_period() Frederic reported that frequency driven swevents didn't work properly and even caused a division-by-zero error. It turns out there are two bugs, the division-by-zero comes from a failure to deal with that in perf_calculate_period(). The other was more interesting and turned out to be a wrong comparison in perf_adjust_period(). The comparison was between an s64 and u64 and got implicitly converted to an unsigned comparison. The problem is that period_left is typically < 0, so it ended up being always true. Cure this by making the local period variables s64. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: module: fix bne2 "gave up waiting for init of module libcrc32c" module: verify_export_symbols under the lock module: move find_module check to end module: make locking more fine-grained. module: Make module sysfs functions private. module: move sysfs exposure to end of load_module module: fix kdb's illicit use of struct module_use. module: Make the 'usage' lists be two-way
module: fix bne2 "gave up waiting for init of module libcrc32c" Problem: it's hard to avoid an init routine stumbling over a request_module these days. And it's not clear it's always a bad idea: for example, a module like kvm with dynamic dependencies on kvm-intel or kvm-amd would be neater if it could simply request_module the right one. In this particular case, it's libcrc32c: libcrc32c_mod_init crypto_alloc_shash crypto_alloc_tfm crypto_find_alg crypto_alg_mod_lookup crypto_larval_lookup request_module If another module is waiting inside resolve_symbol() for libcrc32c to finish initializing (ie. bne2 depends on libcrc32c) then it does so holding the module lock, and our request_module() can't make progress until that is released. Waiting inside resolve_symbol() without the lock isn't all that hard: we just need to pass the -EBUSY up the call chain so we can sleep where we don't hold the lock. Error reporting is a bit trickier: we need to copy the name of the unfinished module before releasing the lock. Other notes: 1) This also fixes a theoretical issue where a weak dependency would allow symbol version mismatches to be ignored. 2) We rename use_module to ref_module to make life easier for the only external user (the out-of-tree ksplice patches). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tim Abbot <tabbott@ksplice.com> Tested-by: Brandon Philips <bphilips@suse.de>
module: verify_export_symbols under the lock It disabled preempt so it was "safe", but nothing stops another module slipping in before this module is added to the global list now we don't hold the lock the whole time. So we check this just after we check for duplicate modules, and just before we put the module in the global list. (find_symbol finds symbols in coming and going modules, too). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
module: move find_module check to end I think Rusty may have made the lock a bit _too_ finegrained there, and didn't add it to some places that needed it. It looks, for example, like PATCH 1/2 actually drops the lock in places where it's needed ("find_module()" is documented to need it, but now load_module() didn't hold it at all when it did the find_module()). Rather than adding a new "module_loading" list, I think we should be able to just use the existing "modules" list, and just fix up the locking a bit. In fact, maybe we could just move the "look up existing module" a bit later - optimistically assuming that the module doesn't exist, and then just undoing the work if it turns out that we were wrong, just before adding ourselves to the list. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
module: make locking more fine-grained. Kay Sievers <kay.sievers@vrfy.org> reports that we still have some contention over module loading which is slowing boot. Linus also disliked a previous "drop lock and regrab" patch to fix the bne2 "gave up waiting for init of module libcrc32c" message. This is more ambitious: we only grab the lock where we need it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Brandon Philips <brandon@ifup.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>
module: Make module sysfs functions private. These were placed in the header in ef665c1a06 to get the various SYSFS/MODULE config combintations to compile. That may have been necessary then, but it's not now. These functions are all local to module.c. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Randy Dunlap <randy.dunlap@oracle.com>
module: move sysfs exposure to end of load_module This means a little extra work, but is more logical: we don't put anything in sysfs until we're about to put the module into the global list an parse its parameters. This also gives us a logical place to put duplicate module detection in the next patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
module: fix kdb's illicit use of struct module_use. Linus changed the structure, and luckily this didn't compile any more. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Martin Hicks <mort@sgi.com>
module: Make the 'usage' lists be two-way When adding a module that depends on another one, we used to create a one-way list of "modules_which_use_me", so that module unloading could see who needs a module. It's actually quite simple to make that list go both ways: so that we not only can see "who uses me", but also see a list of modules that are "used by me". In fact, we always wanted that list in "module_unload_free()": when we unload a module, we want to also release all the other modules that are used by that module. But because we didn't have that list, we used to first iterate over all modules, and then iterate over each "used by me" list of that module. By making the list two-way, we simplify module_unload_free(), and it allows for some trivial fixes later too. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned & rebased)
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits) block: make blk_init_free_list and elevator_init idempotent block: avoid unconditionally freeing previously allocated request_queue pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface pipe: change the privilege required for growing a pipe beyond system max pipe: adjust minimum pipe size to 1 page block: disable preemption before using sched_clock() cciss: call BUG() earlier Preparing 8.3.8rc2 drbd: Reduce verbosity drbd: use drbd specific ratelimit instead of global printk_ratelimit drbd: fix hang on local read errors while disconnected drbd: Removed the now empty w_io_error() function drbd: removed duplicated #includes drbd: improve usage of MSG_MORE drbd: need to set socket bufsize early to take effect drbd: improve network latency, TCP_QUICKACK drbd: Revert "drbd: Create new current UUID as late as possible" brd: support discard Revert "writeback: fix WB_SYNC_NONE writeback from umount" Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync" ...
kernel/: fix BUG_ON checks for cpu notifier callbacks direct call The commit 80b5184cc537718122e036afe7e62d202b70d077 ("kernel/: convert cpu notifier to return encapsulate errno value") changed the return value of cpu notifier callbacks. Those callbacks don't return NOTIFY_BAD on failures anymore. But there are a few callbacks which are called directly at init time and checking the return value. I forgot to change BUG_ON checking by the direct callers in the commit. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cgroups: alloc_css_id() increments hierarchy depth Child groups should have a greater depth than their parents. Prior to this change, the parent would incorrectly report zero memory usage for child cgroups when use_hierarchy is enabled. test script: mount -t cgroup none /cgroups -o memory cd /cgroups mkdir cg1 echo 1 > cg1/memory.use_hierarchy mkdir cg1/cg11 echo $$ > cg1/cg11/tasks dd if=/dev/zero of=/tmp/foo bs=1M count=1 echo echo CHILD grep cache cg1/cg11/memory.stat echo echo PARENT grep cache cg1/memory.stat echo $$ > tasks rmdir cg1/cg11 cg1 cd / umount /cgroups Using fae9c79, a recent patch that changed alloc_css_id() depth computation, the parent incorrectly reports zero usage: root@ubuntu:~# ./test 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.0151844 s, 69.1 MB/s CHILD cache 1048576 total_cache 1048576 PARENT cache 0 total_cache 0 With this patch, the parent correctly includes child usage: root@ubuntu:~# ./test 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.0136827 s, 76.6 MB/s CHILD cache 1052672 total_cache 1052672 PARENT cache 0 total_cache 1052672 Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Paul Menage <menage@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: <stable@kernel.org> [2.6.34.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sys_personality: change sys_personality() to accept "unsigned int" instead of u_long task_struct->pesonality is "unsigned int", but sys_personality() paths use "unsigned long pesonality". This means that every assignment or comparison is not right. In particular, if this argument does not fit into "unsigned int" __set_personality() changes the caller's personality and then sys_personality() returns -EINVAL. Turn this argument into "unsigned int" and avoid overflows. Obviously, this is the user-visible change, we just ignore the upper bits. But this can't break the sane application. There is another thing which can confuse the poorly written applications. User-space thinks that this syscall returns int, not long. This means that the returned value can be negative and look like the error code. But note that libc won't be confused and thus errno won't be set, and with this patch the user-space can never get -1 unless sys_personality() really fails. And, most importantly, the negative RET != -1 is only possible if that app previously called personality(RET). Pointed-out-by: Wenming Zhang <wezhang@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched, trace: Fix sched_switch() prev_state argument sched: Fix wake_affine() vs RT tasks sched: Make sure timers have migrated before killing the migration_thread
Merge branch 'perf-fixes-for-linus-2' of git://git./linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix crash in swevents perf buildid-list: Fix --with-hits event processing perf scripts python: Give field dict to unhandled callback perf hist: fix objdump output parsing perf-record: Check correct pid when forking perf: Do the comm inheritance per thread in event__process_task perf: Use event__process_task from perf sched perf: Process comm events by tid blktrace: Fix new kernel-doc warnings perf_events: Fix unincremented buffer base on partial copy perf_events: Fix event scheduling issues introduced by transactional API perf_events, trace: Fix perf_trace_destroy(), mutex went missing perf_events, trace: Fix probe unregister race perf_events: Fix races in group composition perf_events: Fix races and clean up perf_event and perf_mmap_data interaction
perf: Fix crash in swevents Frederic reported that because swevents handling doesn't disable IRQs anymore, we can get a recursion of perf_adjust_period(), once from overflow handling and once from the tick. If both call ->disable, we get a double hlist_del_rcu() and trigger a LIST_POISON2 dereference. Since we don't actually need to stop/start a swevent to re-programm the hardware (lack of hardware to program), simply nop out these callbacks for the swevent pmu. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1275557609.27810.35218.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>