safe/jmp/linux-2.6
15 years agoring-buffer: only allocate buffers for online cpus
Steven Rostedt [Thu, 12 Mar 2009 02:00:13 +0000 (22:00 -0400)]
ring-buffer: only allocate buffers for online cpus

Impact: save on memory

Currently, a ring buffer was allocated for each "possible_cpus". On
some systems, this is the same as NR_CPUS. Thus, if a system defined
NR_CPUS = 64 but it only had 1 CPU, we could have possibly 63 useless
ring buffers taking up space. With a default buffer of 3 megs, this
could be quite drastic.

This patch changes the ring buffer code to only allocate ring buffers
for online CPUs.  If a CPU goes off line, we do not free the buffer.
This is because the user may still have trace data in that buffer
that they would like to look at.

Perhaps in the future we could add code to delete a ring buffer if
the CPU is offline and the ring buffer becomes empty.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: fix trace_wait to know to wait on all cpus or just one
Steven Rostedt [Wed, 11 Mar 2009 23:52:30 +0000 (19:52 -0400)]
tracing: fix trace_wait to know to wait on all cpus or just one

Impact: fix to task live locking on reading trace_pipe on one CPU

The same code is used for both trace_pipe (all CPUS) and the per_cpu
trace_pipe file. When there is no data to read, it will check for
signals and wait on the trace wait queue.

The problem happens with the per_cpu wait. The trace_wait code checks
all CPUs. Thus, if there's data in another CPU buffer, then it will
exit the wait, without checking for signals or waiting on the wait queue.

It would then try to read the empty buffer, and since that will just
return nothing, then it will try to wait again. Unfortunately, that will
again fail due to there still being data in the other buffers. This
ends up with a live lock for the task.

This patch fixes the trace_wait to be aware that the iterator may only
be waiting on a single buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: expand the ring buffers when an event is activated
Steven Rostedt [Wed, 11 Mar 2009 18:33:00 +0000 (14:33 -0400)]
tracing: expand the ring buffers when an event is activated

To save memory, the tracer ring buffers are set to a minimum.
The activating of a trace expands the ring buffer size. This patch
adds this expanding, when an event is activated.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: keep ring buffer to minimum size till used
Steven Rostedt [Wed, 11 Mar 2009 17:42:01 +0000 (13:42 -0400)]
tracing: keep ring buffer to minimum size till used

Impact: less memory impact on systems not using tracer

When the kernel boots up that has tracing configured, it allocates
the default size of the ring buffer. This currently happens to be
1.4Megs per possible CPU. This is quite a bit of wasted memory if
the system is never using the tracer.

The current solution is to keep the ring buffers to a minimum size
until the user uses them. Once a tracer is piped into the current_tracer
the ring buffer will be expanded to the default size. If the user
changes the size of the ring buffer, it will take the size given
by the user immediately.

If the user adds a "ftrace=" to the kernel command line, then the ring
buffers will be set to the default size on initialization.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: use raw spinlocks for trace_vprintk
Steven Rostedt [Tue, 10 Mar 2009 21:16:35 +0000 (17:16 -0400)]
tracing: use raw spinlocks for trace_vprintk

Impact: prevent locking up by lockdep tracer

The lockdep tracer uses trace_vprintk and thus trace_vprintk can not
call back into lockdep without locking up.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: remove funky whitespace in the trace code
Steven Rostedt [Tue, 10 Mar 2009 18:10:56 +0000 (14:10 -0400)]
tracing: remove funky whitespace in the trace code

Impact: clean up

There existed a lot of <space><tab>'s in the tracing code. This
patch removes them.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: update comments to match event code macros
Steven Rostedt [Tue, 10 Mar 2009 17:12:58 +0000 (13:12 -0400)]
tracing: update comments to match event code macros

Impact: clean up / comments

The comments that described the ftrace macros to manipulate the
TRACE_EVENT and TRACE_FORMAT macros no longer match the code.
This patch updates them.

Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: document TRACE_EVENT macro in tracepoint.h
Steven Rostedt [Tue, 10 Mar 2009 16:58:51 +0000 (12:58 -0400)]
tracing: document TRACE_EVENT macro in tracepoint.h

Impact: clean up / comments

Kosaki Motohiro asked about an explanation to the TRACE_EVENT macro.
Ingo Molnar replied with a nice description.

This patch takes the description that Ingo wrote (with some slight
modifications) and adds it to the tracepoint.h file.

Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: flip the TP_printk and TP_fast_assign in the TRACE_EVENT macro
Steven Rostedt [Tue, 10 Mar 2009 16:41:38 +0000 (12:41 -0400)]
tracing: flip the TP_printk and TP_fast_assign in the TRACE_EVENT macro

Impact: clean up

In trying to stay consistant with the C style format in the TRACE_EVENT
macro, it makes more sense to do the printk after the assigning of
the variables.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: add back the available_events file
Steven Rostedt [Tue, 10 Mar 2009 16:04:02 +0000 (12:04 -0400)]
tracing: add back the available_events file

The event directory files type and available_types were no longer
needed with the new TRACE_EVENT_FORMAT macros, they were deleted.
But by accident the available_events file was also removed.
This patch brings it back.

Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: do not allow modifying the ftrace events via the event files
Steven Rostedt [Tue, 10 Mar 2009 15:32:40 +0000 (11:32 -0400)]
tracing: do not allow modifying the ftrace events via the event files

Impact: fix to prevent crash on calling NULL function pointer

The ftrace internal records have their format exported via the event
system under the ftrace subsystem. These are only for exporting the
format to allow binary readers to be able to parse them in a binary
output.

The ftrace subsystem events can only be enabled via the ftrace tracers
and do not have a registering function. The event files expect the
event record to have registering function and will call it directly.
Passing in a ftrace subsystem event will cause the kernel to crash
because it will execute a NULL pointer.

This patch prevents the ftrace subsystem from being viewable to the
event enabling files.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: fix printk format specifier
Steven Rostedt [Tue, 10 Mar 2009 14:14:35 +0000 (10:14 -0400)]
tracing: fix printk format specifier

Impact: clean up

The offsetof and sizeof are of type size_t, and instead of typecasting
them to unsigned int for printk formatting, one could just use %zu.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: remove obsolete TRACE_EVENT_FORMAT macro
Steven Rostedt [Tue, 10 Mar 2009 04:15:34 +0000 (00:15 -0400)]
tracing: remove obsolete TRACE_EVENT_FORMAT macro

Impact: clean up

The TRACE_EVENT_FORMAT macro is no longer used by trace points
and only the DECLARE_TRACE, TRACE_FORMAT or TRACE_EVENT macros should
be used by them. Although the TRACE_EVENT_FORMAT macro is still used
by the internal tracing utility, it should not be used in core
kernel code.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: convert irq trace points to new macros
Steven Rostedt [Tue, 10 Mar 2009 03:23:30 +0000 (23:23 -0400)]
tracing: convert irq trace points to new macros

Impact: enhancement

Converted the two irq trace point macros. The entry macro copies
the name of the irq handler, thus it is better to simply use the
TRACE_FORMAT macro which uses the trace_printk.

The return of the handler does not need to record the name, thus
the faster C style handler is more approriate.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: convert the sched trace points to the TRACE_EVENT macros
Steven Rostedt [Tue, 10 Mar 2009 03:03:44 +0000 (23:03 -0400)]
tracing: convert the sched trace points to the TRACE_EVENT macros

Impact: enhancement

This patch converts the rest of the sched trace points to use the new
more powerful TRACE_EVENT macro.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: new format for specialized trace points
Steven Rostedt [Mon, 9 Mar 2009 21:14:30 +0000 (17:14 -0400)]
tracing: new format for specialized trace points

Impact: clean up and enhancement

The TRACE_EVENT_FORMAT macro looks quite ugly and is limited in its
ability to save data as well as to print the record out. Working with
Ingo Molnar, we came up with a new format that is much more pleasing to
the eye of C developers. This new macro is more C style than the old
macro, and is more obvious to what it does.

Here's the example. The only updated macro in this patch is the
sched_switch trace point.

The old method looked like this:

 TRACE_EVENT_FORMAT(sched_switch,
        TP_PROTO(struct rq *rq, struct task_struct *prev,
                struct task_struct *next),
        TP_ARGS(rq, prev, next),
        TP_FMT("task %s:%d ==> %s:%d",
              prev->comm, prev->pid, next->comm, next->pid),
        TRACE_STRUCT(
                TRACE_FIELD(pid_t, prev_pid, prev->pid)
                TRACE_FIELD(int, prev_prio, prev->prio)
                TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN],
                                    next_comm,
                                    TP_CMD(memcpy(TRACE_ENTRY->next_comm,
                                                 next->comm,
                                                 TASK_COMM_LEN)))
                TRACE_FIELD(pid_t, next_pid, next->pid)
                TRACE_FIELD(int, next_prio, next->prio)
        ),
        TP_RAW_FMT("prev %d:%d ==> next %s:%d:%d")
        );

The above method is hard to read and requires two format fields.

The new method:

 /*
  * Tracepoint for task switches, performed by the scheduler:
  *
  * (NOTE: the 'rq' argument is not used by generic trace events,
  *        but used by the latency tracer plugin. )
  */
 TRACE_EVENT(sched_switch,

TP_PROTO(struct rq *rq, struct task_struct *prev,
 struct task_struct *next),

TP_ARGS(rq, prev, next),

TP_STRUCT__entry(
__array( char, prev_comm, TASK_COMM_LEN )
__field( pid_t, prev_pid )
__field( int, prev_prio )
__array( char, next_comm, TASK_COMM_LEN )
__field( pid_t, next_pid )
__field( int, next_prio )
),

TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
__entry->next_comm, __entry->next_pid, __entry->next_prio),

TP_fast_assign(
memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
__entry->prev_pid = prev->pid;
__entry->prev_prio = prev->prio;
memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
__entry->next_pid = next->pid;
__entry->next_prio = next->prio;
)
 );

This macro is called TRACE_EVENT, it is broken up into 5 parts:

 TP_PROTO:        the proto type of the trace point
 TP_ARGS:         the arguments of the trace point
 TP_STRUCT_entry: the structure layout of the entry in the ring buffer
 TP_printk:       the printk format
 TP_fast_assign:  the method used to write the entry into the ring buffer

The structure is the definition of how the event will be saved in the
ring buffer. The printk is used by the internal tracing in case of
an oops, and the kernel needs to print out the format of the record
to the console. This the TP_printk gives a means to show the records
in a human readable format. It is also used to print out the data
from the trace file.

The TP_fast_assign is executed directly. It is basically like a C function,
where the __entry is the handle to the record.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: use generic __stringify
Steven Rostedt [Mon, 9 Mar 2009 20:00:22 +0000 (16:00 -0400)]
tracing: use generic __stringify

Impact: clean up

This removes the custom made STR(x) macros in the tracer and uses
the generic __stringify macro instead.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: replace TP<var> with TP_<var>
Steven Rostedt [Mon, 9 Mar 2009 19:47:18 +0000 (15:47 -0400)]
tracing: replace TP<var> with TP_<var>

Impact: clean up

The macros TPPROTO, TPARGS, TPFMT, TPRAWFMT, and TPCMD all look a bit
ugly. This patch adds an underscore to their names.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: typecast sizeof and offsetof to unsigned int
Steven Rostedt [Fri, 6 Mar 2009 15:50:53 +0000 (10:50 -0500)]
tracing: typecast sizeof and offsetof to unsigned int

Impact: fix compiler warnings

On x86_64 sizeof and offsetof are treated as long, where as on x86_32
they are int. This patch typecasts them to unsigned int to avoid
one arch giving warnings while the other does not.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: optimize trace_printk()
Ingo Molnar [Mon, 9 Mar 2009 09:11:36 +0000 (10:11 +0100)]
tracing: optimize trace_printk()

Impact: micro-optimization

trace_printk() does this unconditionally:

trace_printk_fmt = fmt;

Where trace_printk_fmt is an entry into a global array. This is
very SMP-unfriendly.

So only write it once per bootup.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing: trace_printk() fix, move format array to data section
Ingo Molnar [Mon, 9 Mar 2009 09:09:06 +0000 (10:09 +0100)]
tracing: trace_printk() fix, move format array to data section

Impact: fix kernel crash when using trace_printk()

trace_printk_fmt section is defined into the readonly section.
But we do:

trace_printk_fmt = fmt;

to fill in that table of format strings - which is not read-only.
Under CONFIG_DEBUG_RODATA=y this crashes ...

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing: trace_bprintk() cleanups
Ingo Molnar [Fri, 6 Mar 2009 16:52:03 +0000 (17:52 +0100)]
tracing: trace_bprintk() cleanups

Impact: cleanup

Remove a few leftovers and clean up the code a bit.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing/core: drop the old trace_printk() implementation in favour of trace_bprintk()
Frederic Weisbecker [Fri, 6 Mar 2009 16:21:49 +0000 (17:21 +0100)]
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk()

Impact: faster and lighter tracing

Now that we have trace_bprintk() which is faster and consume lesser
memory than trace_printk() and has the same purpose, we can now drop
the old implementation in favour of the binary one from trace_bprintk(),
which means we move all the implementation of trace_bprintk() to
trace_printk(), so the Api doesn't change except that we must now use
trace_seq_bprintk() to print the TRACE_PRINT entries.

Some changes result of this:

- Previously, trace_bprintk depended of a single tracer and couldn't
  work without. This tracer has been dropped and the whole implementation
  of trace_printk() (like the module formats management) is now integrated
  in the tracing core (comes with CONFIG_TRACING), though we keep the file
  trace_printk (previously trace_bprintk.c) where we can find the module
  management. Thus we don't overflow trace.c

- changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.

- change a bit trace_printk/trace_vprintk macros to support non-builtin formats
  constants, and fix 'const' qualifiers warnings. But this is all transparent for
  developers.

- etc...

V2:

- Rebase against last changes
- Fix mispell on the changelog

V3:

- Rebase against last changes (moving trace_printk() to kernel.h)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing: add trace_bprintk()
Lai Jiangshan [Fri, 6 Mar 2009 16:21:48 +0000 (17:21 +0100)]
tracing: add trace_bprintk()

Impact: add a generic printk() for tracing, like trace_printk()

trace_bprintk() uses the infrastructure to record events on ring_buffer.

[ fweisbec@gmail.com: ported to latest -tip, made it work if
  !CONFIG_MODULES, never free the format strings from modules
  because we can't keep track of them and conditionnaly create
  the ftrace format strings section (reported by Steven Rostedt) ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing: infrastructure for supporting binary record
Lai Jiangshan [Fri, 6 Mar 2009 16:21:47 +0000 (17:21 +0100)]
tracing: infrastructure for supporting binary record

Impact: save on memory for tracing

Current tracers are typically using a struct(like struct ftrace_entry,
struct ctx_switch_entry, struct special_entr etc...)to record a binary
event. These structs can only record a their own kind of events.
A new kind of tracer need a new struct and a lot of code too handle it.

So we need a generic binary record for events. This infrastructure
is for this purpose.

[fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
tracing as reported by Steven Rostedt]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agoMerge branch 'core/printk' into tracing/ftrace
Ingo Molnar [Fri, 6 Mar 2009 16:45:42 +0000 (17:45 +0100)]
Merge branch 'core/printk' into tracing/ftrace

15 years agovsprintf: unify the format decoding layer for its 3 users
Frederic Weisbecker [Fri, 6 Mar 2009 16:21:50 +0000 (17:21 +0100)]
vsprintf: unify the format decoding layer for its 3 users

An new optimization is making its way to ftrace. Its purpose is to
make trace_printk() consuming less memory and be faster.

Written by Lai Jiangshan, the approach is to delay the formatting
job from tracing time to output time.

Currently, a call to trace_printk() will format the whole string and
insert it into the ring buffer. Then you can read it on /debug/tracing/trace
file.

The new implementation stores the address of the format string and
the binary parameters into the ring buffer, making the packet more compact
and faster to insert.
Later, when the user exports the traces, the format string is retrieved
with the binary parameters and the formatting job is eventually done.

The new implementation rewrites a lot of format decoding bits from
vsnprintf() function, making now 3 differents functions to maintain
in their duplicated parts of printf format decoding bits.

Suggested by Ingo Molnar, this patch tries to factorize the most
possible common bits from these functions.
The real common part between them is the format decoding. Although
they do somewhat similar jobs, their way to export or import the parameters
is very different. Thus, only the decoding layer is extracted, unless you see
other parts that could be worth factorized.

Changes in V2:

- Address a suggestion from Linus to group the format_decode() parameters inside
  a structure.

Changes in v3:

- Address other cleanups suggested by Ingo and Linus such as passing the
  printf_spec struct to the format helpers: pointer()/number()/string()
  Note that this struct is passed by copy and not by address. This is to
  avoid side effects because these functions often change these values and the
  changes shoudn't be persistant when a callee helper returns.
  It would be too risky.

- Various cleanups (code alignement, switch/case instead of if/else fountains).

- Fix a bug that printed the first format specifier following a %p

Changes in v4:

- drop unapropriate const qualifier loss while casting fmt to a char *
  (thanks to Vegard Nossum for having pointed this out).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-6-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agovsprintf: add binary printf
Lai Jiangshan [Fri, 6 Mar 2009 16:21:46 +0000 (17:21 +0100)]
vsprintf: add binary printf

Impact: add new APIs for binary trace printk infrastructure

vbin_printf(): write args to binary buffer, string is copied
when "%s" is occurred.

bstr_printf(): read from binary buffer for args and format a string

[fweisbec@gmail.com: rebase]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1236356510-8381-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing, power-trace: make it build even if the power-tracer is turned off
Ingo Molnar [Fri, 6 Mar 2009 11:47:08 +0000 (12:47 +0100)]
tracing, power-trace: make it build even if the power-tracer is turned off

Impact: build fix

The 'struct power_trace' definition is needed (for the event tracer) even if
the power-tracer plugin is turned off in the .config.

Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <20090306104106.GF31042@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing: fix deadlock when setting set_ftrace_pid
KOSAKI Motohiro [Fri, 6 Mar 2009 06:29:04 +0000 (15:29 +0900)]
tracing: fix deadlock when setting set_ftrace_pid

Impact: fix deadlock while using set_ftrace_pid

Reproducer:

# cd /sys/kernel/debug/tracing
# echo $$ > set_ftrace_pid

then, console becomes hung.

Details:

when writing set_ftracepid, kernel callstack is following

ftrace_pid_write()
mutex_lock(&ftrace_lock);
ftrace_update_pid_func()
mutex_lock(&ftrace_lock);
mutex_unlock(&ftrace_lock);
mutex_unlock(&ftrace_lock);

then, system always deadlocks when ftrace_pid_write() is called.

In past days, ftrace_pid_write() used ftrace_start_lock, but
commit e6ea44e9b4c12325337cd1c06103cd515a1c02b2 consolidated
ftrace_start_lock to ftrace_lock.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <20090306151155.0778.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing: current tip/master can't enable ftrace
KOSAKI Motohiro [Fri, 6 Mar 2009 01:40:53 +0000 (10:40 +0900)]
tracing: current tip/master can't enable ftrace

After commit 40ada30f9621fbd831ac2437b9a2a399aad34b00,
"make menuconfig" doesn't display "Tracer" item.

Following modification restores it.

15 years agoMerge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Fri, 6 Mar 2009 10:40:37 +0000 (11:40 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace

15 years agoMerge branches 'tracing/ftrace' and 'tracing/function-graph-tracer' into tracing...
Ingo Molnar [Fri, 6 Mar 2009 10:39:18 +0000 (11:39 +0100)]
Merge branches 'tracing/ftrace' and 'tracing/function-graph-tracer' into tracing/core

15 years agotracing: add format files for ftrace default entries
Steven Rostedt [Fri, 6 Mar 2009 02:35:29 +0000 (21:35 -0500)]
tracing: add format files for ftrace default entries

Impact: allow user apps to read binary format of basic ftrace entries

Currently, only defined raw events export their formats so a binary
reader can parse them. There's no reason that the default ftrace entries
can't export their formats.

This patch adds a subsystem called "ftrace" in the events directory
that includes the ftrace entries for basic ftrace recorded items.

These only have three files in the events directory:

 type             : printf
 available_types  : printf
 format           : format for the event entry

For example:

 # cat /debug/tracing/events/ftrace/wakeup/format
name: wakeup
ID: 3
format:
        field:unsigned char type;       offset:0;       size:1;
        field:unsigned char flags;      offset:1;       size:1;
        field:unsigned char preempt_count;      offset:2;       size:1;
        field:int pid;  offset:4;       size:4;
        field:int tgid; offset:8;       size:4;

        field:unsigned int prev_pid;    offset:12;      size:4;
        field:unsigned char prev_prio;  offset:16;      size:1;
        field:unsigned char prev_state; offset:17;      size:1;
        field:unsigned int next_pid;    offset:20;      size:4;
        field:unsigned char next_prio;  offset:24;      size:1;
        field:unsigned char next_state; offset:25;      size:1;
        field:unsigned int next_cpu;    offset:28;      size:4;

print fmt: "%u:%u:%u  ==+ %u:%u:%u [%03u]"

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: move print of event format to separate file
Steven Rostedt [Thu, 5 Mar 2009 16:45:43 +0000 (11:45 -0500)]
tracing: move print of event format to separate file

Impact: clean up

Move the macro that creates the event format file to a separate header.
This will allow the default ftrace events to use this same macro
to create the formats to read those events.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: make all file_operations const
Steven Rostedt [Fri, 6 Mar 2009 02:44:55 +0000 (21:44 -0500)]
tracing: make all file_operations const

Impact: cleanup

All file_operations structures should be constant. No one is going to
change them.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: clean up menu
Ingo Molnar [Thu, 5 Mar 2009 20:19:55 +0000 (21:19 +0100)]
tracing: clean up menu

Clean up menu structure, introduce TRACING_SUPPORT switch that signals
whether an architecture supports various instrumentation mechanisms.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing: add tracing_on/tracing_off to kernel.h
Steven Rostedt [Thu, 5 Mar 2009 15:35:56 +0000 (10:35 -0500)]
tracing: add tracing_on/tracing_off to kernel.h

Impact: cleanup

The functions tracing_start/tracing_stop have been moved to kernel.h.
These are not the functions a developer most likely wants to use
when they want to insert a place to stop tracing and restart it from
user space.

tracing_start/tracing_stop was created to work with things like
suspend to ram, where even calling smp_processor_id() can crash the
system. The tracing_start/tracing_stop was used to stop the tracer from
doing anything. These are still light weight functions, but add a bit
more overhead to be able to stop the tracers. They also have no interface
back to userland. That is, if the kernel calls tracing_stop, userland
can not start tracing.

What a developer most likely wants to use is tracing_on/tracing_off.
These are very light weight functions (simply sets or clears a bit).
These functions just stop recording into the ring buffer. The tracers
don't even know that this happens except that they would receive NULL
from the ring_buffer_lock_reserve function.

Also, there's a way for the user land to enable or disable this bit.
In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off())
or echo "1" (same as tracing_on()) into this file. This becomes handy when
a kernel developer is debugging and wants tracing to turn off when it
hits an anomaly. Then the developer can examine the trace, and restart
tracing if they want to try again (echo 1 > tracing_on).

This patch moves the prototypes for tracing_on/tracing_off to kernel.h
and comments their use, so that a kernel developer will know how
to use them.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing/function-graph-tracer: use the more lightweight local clock
Frederic Weisbecker [Thu, 5 Mar 2009 00:49:22 +0000 (01:49 +0100)]
tracing/function-graph-tracer: use the more lightweight local clock

Impact: decrease hangs risks with the graph tracer on slow systems

Since the function graph tracer can spend too much time on timer
interrupts, it's better now to use the more lightweight local
clock. Anyway, the function graph traces are more reliable on a
per cpu trace.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49af243d.06e9300a.53ad.ffff840c@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing: move utility functions from ftrace.h to kernel.h
Ingo Molnar [Thu, 5 Mar 2009 09:28:45 +0000 (10:28 +0100)]
tracing: move utility functions from ftrace.h to kernel.h

Make common utility functions such as trace_printk() and
tracing_start()/tracing_stop() generally available to kernel
code.

Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agotracing: rename ftrace_printk() => trace_printk()
Ingo Molnar [Thu, 5 Mar 2009 09:24:48 +0000 (10:24 +0100)]
tracing: rename ftrace_printk() => trace_printk()

Impact: cleanup

Use a more generic name - this also allows the prototype to move
to kernel.h and be generally available to kernel developers who
want to do some quick tracing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agoMerge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Thu, 5 Mar 2009 09:21:49 +0000 (10:21 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace

15 years agoMerge branch 'tracing/ftrace' into tracing/core
Ingo Molnar [Thu, 5 Mar 2009 09:20:47 +0000 (10:20 +0100)]
Merge branch 'tracing/ftrace' into tracing/core

15 years agotracing: have latency tracers set the latency format
Steven Rostedt [Thu, 5 Mar 2009 03:15:30 +0000 (22:15 -0500)]
tracing: have latency tracers set the latency format

The latency tracers (irqsoff, preemptoff, preemptirqsoff, and wakeup)
are pretty useless with the default output format. This patch makes them
automatically enable the latency format when they are selected. They
also record the state of the latency option, and if it was not enabled
when selected, they disable it on reset.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: consolidate print_lat_fmt and print_trace_fmt
Steven Rostedt [Thu, 5 Mar 2009 02:57:29 +0000 (21:57 -0500)]
tracing: consolidate print_lat_fmt and print_trace_fmt

Impact: clean up

Both print_lat_fmt and print_trace_fmt do pretty much the same thing
except for one different function call. This patch consolidates the
two functions and adds an if statement to perform the difference.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: remove extra latency_trace method from trace structure
Steven Rostedt [Thu, 5 Mar 2009 02:42:04 +0000 (21:42 -0500)]
tracing: remove extra latency_trace method from trace structure

Impact: clean up

The trace and latency_trace function pointers are identical for
every tracer but the function tracer. The differences in the function
tracer are trivial (latency output puts paranthesis around parent).

This patch removes the latency_trace pointer and all prints will
now just use the trace output function pointer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: add latency output format option
Steven Rostedt [Thu, 5 Mar 2009 01:34:24 +0000 (20:34 -0500)]
tracing: add latency output format option

With the removal of the latency_trace file, we lost the ability
to see some of the finer details in a trace. Like the state of
interrupts enabled, the preempt count, need resched, and if we
are in an interrupt handler, softirq handler or not.

This patch simply creates an option to bring back the old format.
This also removes the warning about an unused variable that held
the latency_trace file operations.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: fix seq read from trace files
Steven Rostedt [Thu, 5 Mar 2009 01:31:11 +0000 (20:31 -0500)]
tracing: fix seq read from trace files

The buffer used by trace_seq was updated incorrectly. Instead
of consuming what was actually read, it consumed the rest of the
buffer on reads.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: do not return EFAULT if read copied anything
Steven Rostedt [Thu, 5 Mar 2009 00:10:05 +0000 (19:10 -0500)]
tracing: do not return EFAULT if read copied anything

Impact: fix trace read to conform to standards

Andrew Morton, Theodore Tso and H. Peter Anvin brought to my attention
that a userspace read should not return -EFAULT if it succeeded in
copying anything. It should only return -EFAULT if it failed to copy
at all.

This patch modifies the check of copy_from_user and updates the return
code appropriately.

I also used H. Peter Anvin's short cut rule to just test ret == count.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agoring-buffer: fix timestamp in partial ring_buffer_page_read
Steven Rostedt [Wed, 4 Mar 2009 04:52:42 +0000 (23:52 -0500)]
ring-buffer: fix timestamp in partial ring_buffer_page_read

If a partial ring_buffer_page_read happens, then some of the
incremental timestamps may be lost. This patch writes the
recent timestamp into the page that is passed back to the caller.

A partial ring_buffer_page_read is where the full page would not
be written back to the user, and instead, just part of the page
is copied to the user. A full page would be a page swap with the
ring buffer and the timestamps would be correct.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: add cpu_file intialization for ftrace_dump
Steven Rostedt [Wed, 4 Mar 2009 23:20:36 +0000 (18:20 -0500)]
tracing: add cpu_file intialization for ftrace_dump

Impact: fix to ftrace_dump output corruption

The commit: b04cc6b1f6398b0e0b60d37e27ce51b4899672ec
  tracing/core: introduce per cpu tracing files

added a new field to the iterator called cpu_file. This was a handle
to differentiate between the per cpu trace output files and the
all cpu "trace" file. The all cpu "trace" file required setting this
to TRACE_PIPE_ALL_CPU.

The problem is that the ftrace_dump sets up its own iterator but was
not updated to handle this change. The result was only CPU 0 printing
out on crash and a lot of "<0>"'s also being printed.

Reported-by: Thomas Gleixner <tglx@linuxtronix.de>
Tested-by: Darren Hart <dvhtc@us.ibm.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agotracing: add lockdep tracepoints for lock acquire/release
Peter Zijlstra [Wed, 4 Mar 2009 11:32:55 +0000 (12:32 +0100)]
tracing: add lockdep tracepoints for lock acquire/release

Augment the traces with lock names when lockdep is available:

 1)               |  down_read_trylock() {
 1)               |    _spin_lock_irqsave() {
 1)               |      /* lock_acquire: &sem->wait_lock */
 1)   4.201 us    |    }
 1)               |    _spin_unlock_irqrestore() {
 1)               |      /* lock_release: &sem->wait_lock */
 1)   3.523 us    |    }
 1)               |  /* lock_acquire: try read &mm->mmap_sem */
 1) + 13.386 us   |  }
 1)   1.635 us    |  find_vma();
 1)               |  handle_mm_fault() {
 1)               |    __do_fault() {
 1)               |      filemap_fault() {
 1)               |        find_lock_page() {
 1)               |          find_get_page() {
 1)               |            /* lock_acquire: read rcu_read_lock */
 1)               |            /* lock_release: rcu_read_lock */
 1)   5.697 us    |          }
 1)   8.158 us    |        }
 1) + 11.079 us   |      }
 1)               |      _spin_lock() {
 1)               |        /* lock_acquire: __pte_lockptr(page) */
 1)   3.949 us    |      }
 1)   1.460 us    |      page_add_file_rmap();
 1)               |      _spin_unlock() {
 1)               |        /* lock_release: __pte_lockptr(page) */
 1)   3.115 us    |      }
 1)               |      unlock_page() {
 1)   1.421 us    |        page_waitqueue();
 1)   1.220 us    |        __wake_up_bit();
 1)   6.519 us    |      }
 1) + 34.328 us   |    }
 1) + 37.452 us   |  }
 1)               |  up_read() {
 1)               |  /* lock_release: &mm->mmap_sem */
 1)               |    _spin_lock_irqsave() {
 1)               |      /* lock_acquire: &sem->wait_lock */
 1)   3.865 us    |    }
 1)               |    _spin_unlock_irqrestore() {
 1)               |      /* lock_release: &sem->wait_lock */
 1)   8.562 us    |    }
 1) + 17.370 us   |  }

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?T=F6r=F6k?= Edwin <edwintorok@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1236166375.5330.7209.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agoMerge branch 'core/locking' into tracing/ftrace
Ingo Molnar [Wed, 4 Mar 2009 17:49:19 +0000 (18:49 +0100)]
Merge branch 'core/locking' into tracing/ftrace

15 years agolockdep: require framepointers for x86
Peter Zijlstra [Wed, 4 Mar 2009 11:27:27 +0000 (12:27 +0100)]
lockdep: require framepointers for x86

Require framepointers for x86, because otherwise we'll be having
empty stack traces, which is useless.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236167295.5330.7240.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agolockdep: remove extra "irq" string
Peter Zijlstra [Wed, 4 Mar 2009 13:53:24 +0000 (14:53 +0100)]
lockdep: remove extra "irq" string

Impact: clarify lockdep printk text

print_irq_inversion_bug() gets handed state strings of the form

  "HARDIRQ", "SOFTIRQ", "RECLAIM_FS"

and appends "-irq-{un,}safe" to them, which is either redudant for *IRQ or
confusing in the RECLAIM_FS case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236175192.5330.7585.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agolockdep: fix incorrect state name
Peter Zijlstra [Wed, 4 Mar 2009 12:51:13 +0000 (13:51 +0100)]
lockdep: fix incorrect state name

In the recent mark_lock_irq() rework a bug snuck in that would report the
state of write locks causing irq inversion under a read lock as a read
lock.

Fix this by masking the read bit of the state when validating write
dependencies.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236172646.5330.7450.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agoMerge branch 'rfc/splice/tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Wed, 4 Mar 2009 10:15:42 +0000 (11:15 +0100)]
Merge branch 'rfc/splice/tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace

15 years agoMerge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core
Ingo Molnar [Wed, 4 Mar 2009 10:14:47 +0000 (11:14 +0100)]
Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core

15 years agotracing: add binary buffer files for use with splice
Steven Rostedt [Tue, 2 Dec 2008 03:20:19 +0000 (22:20 -0500)]
tracing: add binary buffer files for use with splice

Impact: new feature

This patch creates a directory of files that correspond to the
per CPU ring buffers. These are binary files and are made to
be used with splice. This is the fastest way to extract data from
the ftrace ring buffers.

Thanks to Jiaying Zhang for pushing me to get this code fixed,
 and to Eduard - Gabriel Munteanu for his splice code that helped
 me debug my code.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agoring-buffer: make ring_buffer_read_page read from start on partial page
Steven Rostedt [Wed, 4 Mar 2009 00:51:40 +0000 (19:51 -0500)]
ring-buffer: make ring_buffer_read_page read from start on partial page

Impact: dont leave holes in read buffer page

The ring_buffer_read_page swaps a given page with the reader page
of the ring buffer, if certain conditions are set:

 1) requested length is big enough to hold entire page data

 2) a writer is not currently on the page

 3) the page is not partially consumed.

Instead of swapping with the supplied page. It copies the data to
the supplied page instead. But currently the data is copied in the
same offset as the source page. This causes a hole at the start
of the reader page. This complicates the use of this function.
Instead, it should copy the data at the beginning of the function
and update the index fields accordingly.

Other small clean ups are also done in this patch.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agoring-buffer: replace sizeof of event header with offsetof
Steven Rostedt [Tue, 3 Mar 2009 18:53:07 +0000 (13:53 -0500)]
ring-buffer: replace sizeof of event header with offsetof

Impact: fix to possible alignment problems on some archs.

Some arch compilers include an NULL char array in the sizeof field.
Since the ring_buffer_event type includes one of these, it is better
to use the "offsetof" instead, to avoid strange bugs on these archs.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agoring-buffer: fix ring_buffer_read_page
Steven Rostedt [Tue, 3 Mar 2009 05:27:49 +0000 (00:27 -0500)]
ring-buffer: fix ring_buffer_read_page

The ring_buffer_read_page was broken if it were to only copy part
of the page. This patch fixes that up as well as adds a parameter
to allow a length field, in order to only copy part of the buffer page.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agoring-buffer: reset write field for ring_buffer_read_page
Steven Rostedt [Tue, 3 Mar 2009 01:56:48 +0000 (20:56 -0500)]
ring-buffer: reset write field for ring_buffer_read_page

Impact: fix ring_buffer_read_page

After a page is swapped into the ring buffer, the write field must
also be reset.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agoLinux 2.6.29-rc7 v2.6.29-rc7
Linus Torvalds [Wed, 4 Mar 2009 01:05:22 +0000 (17:05 -0800)]
Linux 2.6.29-rc7

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Wed, 4 Mar 2009 01:05:08 +0000 (17:05 -0800)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ixp4xx - Fix qmgr_request_queue build failure
  crypto: api - Fix module load deadlock with fallback algorithms

15 years agocrypto: ixp4xx - Fix qmgr_request_queue build failure
Krzysztof Hałasa [Wed, 4 Mar 2009 00:01:22 +0000 (08:01 +0800)]
crypto: ixp4xx - Fix qmgr_request_queue build failure

There is another user of IXP4xx queue manager, fix it.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
15 years agoMerge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 3 Mar 2009 22:33:20 +0000 (14:33 -0800)]
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: don't allow setuid to succeed if the user does not have rt bandwidth
  sched_rt: don't start timer when rt bandwidth disabled

15 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 3 Mar 2009 22:32:55 +0000 (14:32 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: oprofile: don't set counter width from cpuid on Core2
  x86: fix init_memory_mapping() to handle small ranges

15 years agoMerge branch 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 3 Mar 2009 22:32:37 +0000 (14:32 -0800)]
Merge branch 'tracing/mmiotrace' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86 mmiotrace: fix race with release_kmmio_fault_page()
  x86 mmiotrace: improve handling of secondary faults
  x86 mmiotrace: split set_page_presence()
  x86 mmiotrace: fix save/restore page table state
  x86 mmiotrace: WARN_ONCE if dis/arming a page fails
  x86: add far read test to testmmiotrace
  x86: count errors in testmmiotrace.ko

15 years agoMerge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 3 Mar 2009 22:32:04 +0000 (14:32 -0800)]
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: Teach RCU that idle task is not quiscent state at boot

15 years agoMerge master.kernel.org:/home/rmk/linux-2.6-arm
Linus Torvalds [Tue, 3 Mar 2009 22:12:41 +0000 (14:12 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] fix lots of ARM __devexit sillyness
  [ARM] 5417/1: Set the correct cacheid for ARMv6 CPUs with ARMv7 style MMU
  [ARM] 5416/1: Use unused address in v6_early_abort
  [ARM] 5411/1: S3C64XX: Fix EINT unmask
  [ARM] at91: fix for Atmel AT91 powersaving
  [ARM] RiscPC: Fix etherh oops

15 years ago[ARM] fix lots of ARM __devexit sillyness
Russell King [Tue, 3 Mar 2009 13:43:47 +0000 (13:43 +0000)]
[ARM] fix lots of ARM __devexit sillyness

`iop_adma_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`mv_xor_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`mv64xxx_i2c_unmap_regs' referenced in section `.devinit.text' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`mv64xxx_i2c_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`orion_nand_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
`pxafb_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o

Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
15 years agoMerge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Tue, 3 Mar 2009 15:02:16 +0000 (16:02 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace

15 years agotracing: fix return value to registering events
Steven Rostedt [Tue, 3 Mar 2009 14:43:50 +0000 (09:43 -0500)]
tracing: fix return value to registering events

The registering of events had the return value check backwards.
A zero returned is success, the check had it as a failure.

This patch also fixes a missing "\n" in the warning that the check
failed.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years ago[ARM] 5417/1: Set the correct cacheid for ARMv6 CPUs with ARMv7 style MMU
Catalin Marinas [Tue, 3 Mar 2009 10:44:12 +0000 (11:44 +0100)]
[ARM] 5417/1: Set the correct cacheid for ARMv6 CPUs with ARMv7 style MMU

The cacheid_init() function assumes that if cpu_architecture() returns
7, the caches are VIPT_NONALIASING. The cpu_architecture() function
returns the version of the supported MMU features (e.g. TEX remapping)
but it doesn't make any assumptions about the cache type. The patch adds
the checking of the Cache Type Register for the ARMv7 format.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
15 years ago[ARM] 5416/1: Use unused address in v6_early_abort
Seth Forshee [Mon, 2 Mar 2009 21:39:36 +0000 (22:39 +0100)]
[ARM] 5416/1: Use unused address in v6_early_abort

The target of the strex instruction to clear the exlusive monitor
is currently the top of the stack.  If the store succeeeds this
corrupts r0 in pt_regs.  Use the next stack location instead of
the current one to prevent any chance of corrupting an in-use
address.

Signed-off-by: Seth Forshee <seth.forshee@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
15 years agox86: oprofile: don't set counter width from cpuid on Core2
Tim Blechmann [Thu, 19 Feb 2009 16:34:03 +0000 (17:34 +0100)]
x86: oprofile: don't set counter width from cpuid on Core2

Impact: fix stuck NMIs and non-working oprofile on certain CPUs

Resetting the counter width of the performance counters on Intel's
Core2 CPUs, breaks the delivery of NMIs, when running in x86_64 mode.

This should fix bug #12395:

  http://bugzilla.kernel.org/show_bug.cgi?id=12395

Signed-off-by: Tim Blechmann <tim@klingt.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <20090303100412.GC10085@erda.amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agox86: fix init_memory_mapping() to handle small ranges
Yinghai Lu [Tue, 3 Mar 2009 07:36:13 +0000 (23:36 -0800)]
x86: fix init_memory_mapping() to handle small ranges

Impact: fix failed EFI bootup in certain circumstances

Ying Huang found init_memory_mapping() has problem with small ranges
less than 2M when he tried to direct map the EFI runtime code out of
max_low_pfn_mapped.

It turns out we never considered that case and didn't check the range...

Reported-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Maly <bmaly@redhat.com>
LKML-Reference: <49ACDDED.1060508@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agoRevert "menu: fix embedded menu snafu"
Linus Torvalds [Tue, 3 Mar 2009 00:23:33 +0000 (16:23 -0800)]
Revert "menu: fix embedded menu snafu"

This reverts commit 155b25bcc28631a5b5230191aa3f56c40dfffa3f, which was
totally wrong - the "embedded" options still exists (very much so) even
on non-embedded platforms.

It's just that we don't bother with actually asking about them when
we're not embedded, we just take their default values (which is usually
'y' - the options add features that may not be worth it in a constrained
environment).

Noticed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Tue, 3 Mar 2009 00:11:36 +0000 (16:11 -0800)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/i915: Fix use-before-null-check in i915_irq_emit().
  drm: Avoid client deadlocks when the master disappears.
  drm: Wake up all lock waiters when the master disappears.
  drm: Don't return ERESTARTSYS to user-space.

15 years agodrm/i915: Fix use-before-null-check in i915_irq_emit().
Eric Anholt [Wed, 25 Feb 2009 06:14:12 +0000 (22:14 -0800)]
drm/i915: Fix use-before-null-check in i915_irq_emit().

This could be triggered by a client asking to emit an irq when the device
wasn't initialized.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
15 years agodrm: Avoid client deadlocks when the master disappears.
Thomas Hellstrom [Mon, 2 Mar 2009 10:10:56 +0000 (11:10 +0100)]
drm: Avoid client deadlocks when the master disappears.

This is done by
1) Wake up lock waiters when we close the master file descriptor.
   Not when the master structure is removed, since the latter
   requires the waiters themselves to release the refcount on the
   master structure -> Deadlock.
2) Send a SIGTERM to all clients waiting for the lock.
   Normally these clients will get a SIGPIPE when the X server dies,
   but clients may also spin trying to grab the DRM lock, without
   getting any sort of notification.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
15 years agodrm: Wake up all lock waiters when the master disappears.
Thomas Hellstrom [Mon, 2 Mar 2009 10:10:55 +0000 (11:10 +0100)]
drm: Wake up all lock waiters when the master disappears.

Currently only one waiter is woken up, leaving other waiters
hanging waiting for the DRM lock.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
15 years agodrm: Don't return ERESTARTSYS to user-space.
Thomas Hellstrom [Mon, 2 Mar 2009 10:10:54 +0000 (11:10 +0100)]
drm: Don't return ERESTARTSYS to user-space.

That return code is for in-kernel use only.
Use EINTR instead.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
15 years agomenu: fix embedded menu snafu
Randy Dunlap [Mon, 2 Mar 2009 22:14:06 +0000 (14:14 -0800)]
menu: fix embedded menu snafu

The COMPAT_BRK kconfig symbol does not depend on EMBEDDED, but it is in
the midst of the EMBEDDED menu symbols, so it mucks up the EMBEDDED
menu.  Fix by moving it to just after all of the EMBEDDED menu symbols.

Also, surround all of the EMBEDDED symbols with "if EMBEDDED"/"endif" so
that this EMBEDDED block is clearer.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
Linus Torvalds [Mon, 2 Mar 2009 23:48:00 +0000 (15:48 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/drzeus/mmc

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
  sdhci: Add NO_BUSY_IRQ quirk for Marvell CAFE host chip
  sdhci: Add quirk for controllers with no end-of-busy IRQ

15 years agoMerge branch 'fix/hda' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Mon, 2 Mar 2009 23:47:19 +0000 (15:47 -0800)]
Merge branch 'fix/hda' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'fix/hda' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Add probe_mask default for Toshiba laptop with ALC268
  ALSA: hda - Add quirk for new HP xw series
  ALSA: hda - Fix digital mic on dell-m4-1 and dell-m4-3

15 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 2 Mar 2009 23:47:01 +0000 (15:47 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  fix warning in io_mapping_map_wc()
  x86: i915 needs pgprot_writecombine() and is_io_mapping_possible()

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Mon, 2 Mar 2009 23:46:09 +0000 (15:46 -0800)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
  zaurus: add usb id for motomagx phones
  usbnet: make usbnet_get_link() fall back to ethtool_op_get_link()
  veth: Fix carrier detect
  cdc_ether: add usb id for Ericsson F3507g
  r8169: read MAC address from EEPROM on init (2nd attempt)
  tcp: fix retrans_out leaks
  net headers: export dcbnl.h
  net headers: cleanup dcbnl.h
  netpoll: Add drop checks to all entry points
  gianfar: Do right check on num_txbdfree
  pkt_sched: sch_drr: Fix oops in drr_change_class.
  b44: Disable device on shutdown
  b44: Unconditionally enable interrupt routing on reset
  net: fix hp-plus build error
  libertas: fix misuse of netdev_priv() and dev->ml_priv
  ipv6: don't use tw net when accounting for recycled tw
  asix: new device ids
  tcp_scalable: Update malformed & dead url
  netfilter: xt_recent: fix proc-file addition/removal of IPv4 addresses
  netxen: handle pci bar 0 mapping failure
  ...

15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Mon, 2 Mar 2009 23:44:08 +0000 (15:44 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  selinux: Fix a panic in selinux_netlbl_inode_permission()

15 years agoChange email address
Karsten Keil [Sun, 1 Mar 2009 17:04:53 +0000 (18:04 +0100)]
Change email address

Since I will loose the old address soon, please change it.

Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Mon, 2 Mar 2009 23:43:03 +0000 (15:43 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elantech - touchpad driver miss-recognising logitech mice
  Input: synaptics - ensure we reset the device on resume
  Input: usbtouchscreen - fix eGalax HID ignoring
  Input: ambakmi - fix timeout handling in amba_kmi_write()
  Input: pxa930_trkball - fix write timeout handling
  Input: struct device - replace bus_id with dev_name(), dev_set_name()
  Input: bf54x-keys - fix debounce time validation
  Input: spitzkbd - mark probe function as __devinit
  Input: omap-keypad - mark probe function as __devinit
  Input: corgi_ts - mark probe function as __devinit
  Input: corgikbd - mark probe function as __devinit
  Input: uvc - the button on the camera is KEY_CAMERA
  Input: psmouse - make MOUSE_PS2_LIFEBOOK depend on X86
  Input: atkbd - make forced_release_keys[] static
  Input: usbtouchscreen - allow reporting calibrated data

15 years agoMerge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Linus Torvalds [Mon, 2 Mar 2009 23:42:26 +0000 (15:42 -0800)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: don't call jbd2_journal_force_commit_nested without journal
  ext4: Reorder fs/Makefile so that ext2 root fs's are mounted using ext2
  ext4: Remove duplicate call to ext4_commit_super() in ext4_freeze()

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
Linus Torvalds [Mon, 2 Mar 2009 23:41:59 +0000 (15:41 -0800)]
Merge git://git./linux/kernel/git/jejb/scsi-rc-fixes-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] mpt: fix disable lsi sas to use msi as default
  [SCSI] fix ABORTED_COMMAND looping forever problem
  [SCSI] sd: revive sd_index_lock
  [SCSI] cxgb3i: update the driver version to 1.0.1
  [SCSI] cxgb3i: Fix spelling errors in documentation
  [SCSI] cxgb3i: added missing include in cxgb3i_ddp.h
  [SCSI] cxgb3i: Outgoing pdus need to observe skb's MAX_SKB_FRAGS
  [SCSI] cxgb3i: added per-task data to track transmit progress
  [SCSI] cxgb3i: transmit work-request fixes
  [SCSI] hptiop: Add new PCI device ID

15 years agox86-64: seccomp: fix 32/64 syscall hole
Roland McGrath [Sat, 28 Feb 2009 07:25:54 +0000 (23:25 -0800)]
x86-64: seccomp: fix 32/64 syscall hole

On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table.  The fix is simple: test TS_COMPAT
instead of TIF_IA32.  Here is an example exploit:

/* test case for seccomp circumvention on x86-64

   There are two failure modes: compile with -m64 or compile with -m32.

   The -m64 case is the worst one, because it does "chmod 777 ." (could
   be any chmod call).  The -m32 case demonstrates it was able to do
   stat(), which can glean information but not harm anything directly.

   A buggy kernel will let the test do something, print, and exit 1; a
   fixed kernel will make it exit with SIGKILL before it does anything.
*/

#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/prctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <asm/unistd.h>

int
main (int argc, char **argv)
{
  char buf[100];
  static const char dot[] = ".";
  long ret;
  unsigned st[24];

  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");

#ifdef __x86_64__
  assert ((uintptr_t) dot < (1UL << 32));
  asm ("int $0x80 # %0 <- %1(%2 %3)"
       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
  ret = snprintf (buf, sizeof buf,
  "result %ld (check mode on .!)\n", ret);
#elif defined __i386__
  asm (".code32\n"
       "pushl %%cs\n"
       "pushl $2f\n"
       "ljmpl $0x33, $1f\n"
       ".code64\n"
       "1: syscall # %0 <- %1(%2 %3)\n"
       "lretl\n"
       ".code32\n"
       "2:"
       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
  if (ret == 0)
    ret = snprintf (buf, sizeof buf,
    "stat . -> st_uid=%u\n", st[7]);
  else
    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
#else
# error "not this one"
#endif

  write (1, buf, ret);

  syscall (__NR_exit, 1);
  return 2;
}

Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in
  at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agox86-64: syscall-audit: fix 32/64 syscall hole
Roland McGrath [Sat, 28 Feb 2009 03:03:24 +0000 (19:03 -0800)]
x86-64: syscall-audit: fix 32/64 syscall hole

On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases, audit_syscall_entry() will use the wrong system
call number table and the wrong system call argument registers.  This
could be used to circumvent a syscall audit configuration that filters
based on the syscall numbers or argument details.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Mon, 2 Mar 2009 22:14:02 +0000 (23:14 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace

15 years agotracing: make CALLER_ADDRx overwriteable
Uwe Kleine-Koenig [Fri, 27 Feb 2009 20:30:03 +0000 (21:30 +0100)]
tracing: make CALLER_ADDRx overwriteable

The current definition of CALLER_ADDRx isn't suitable for all platforms.
E.g. for ARM __builtin_return_address(N) doesn't work for N > 0 and
AFAIK for powerpc there are no frame pointers needed to have a working
__builtin_return_address.  This patch allows defining the CALLER_ADDRx
macros in <asm/ftrace.h> and let these take precedence.

Because now <asm/ftrace.h> is included unconditionally in
<linux/ftrace.h> all archs that don't already had this include get an
empty one for free.

Signed-off-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
15 years agoMerge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Mon, 2 Mar 2009 21:38:31 +0000 (22:38 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace

15 years agoMerge branches 'tracing/ftrace', 'tracing/mmiotrace' and 'linus' into tracing/core
Ingo Molnar [Mon, 2 Mar 2009 21:37:35 +0000 (22:37 +0100)]
Merge branches 'tracing/ftrace', 'tracing/mmiotrace' and 'linus' into tracing/core