safe/jmp/linux-2.6
15 years agoKVM: s390: dont allocate dirty bitmap
Carsten Otte [Fri, 27 Jun 2008 13:05:31 +0000 (15:05 +0200)]
KVM: s390: dont allocate dirty bitmap

This patch #ifdefs the bitmap array for dirty tracking. We don't have dirty
tracking on s390 today, and we'd love to use our storage keys to store the
dirty information for migration. Therefore, we won't need this array at all,
and due to our limited amount of vmalloc space this limits the amount of guests
we can run.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: move slots_lock acquision down to vapic_exit
Marcelo Tosatti [Mon, 23 Jun 2008 15:04:25 +0000 (12:04 -0300)]
KVM: move slots_lock acquision down to vapic_exit

There is no need to grab slots_lock if the vapic_page will not
be touched.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: VMX: Fake emulate Intel perfctr MSRs
Chris Lalancette [Fri, 20 Jun 2008 07:51:30 +0000 (09:51 +0200)]
KVM: VMX: Fake emulate Intel perfctr MSRs

Older linux guests (in this case, 2.6.9) can attempt to
access the performance counter MSRs without a fixup section, and injecting
a GPF kills the guest.  Work around by allowing the guest to write those MSRs.

Tested by me on RHEL-4 i386 and x86_64 guests, as well as F-9 guests.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: VMX: Fix a wrong usage of vmcs_config
Sheng Yang [Wed, 18 Jun 2008 06:43:38 +0000 (14:43 +0800)]
KVM: VMX: Fix a wrong usage of vmcs_config

The function ept_update_paging_mode_cr0() write to
CPU_BASED_VM_EXEC_CONTROL based on vmcs_config.cpu_based_exec_ctrl. That's
wrong because the variable may not consistent with the content in the
CPU_BASE_VM_EXEC_CONTROL MSR.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: MMU: Fix printk format
Avi Kivity [Sun, 22 Jun 2008 13:46:22 +0000 (16:46 +0300)]
KVM: MMU: Fix printk format

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: MMU: When debug is enabled, make it a run-time parameter
Avi Kivity [Sun, 22 Jun 2008 13:45:24 +0000 (16:45 +0300)]
KVM: MMU: When debug is enabled, make it a run-time parameter

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: lazily evaluate segment registers
Avi Kivity [Sun, 22 Jun 2008 13:22:51 +0000 (16:22 +0300)]
KVM: x86 emulator: lazily evaluate segment registers

Instead of prefetching all segment bases before emulation, read them at the
last moment.  Since most of them are unneeded, we save some cycles on
Intel machines where this is a bit expensive.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: avoid segment base adjust for lea
Avi Kivity [Mon, 16 Jun 2008 05:45:54 +0000 (22:45 -0700)]
KVM: x86 emulator: avoid segment base adjust for lea

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: simplify rip relative decoding
Avi Kivity [Mon, 16 Jun 2008 05:09:11 +0000 (22:09 -0700)]
KVM: x86 emulator: simplify rip relative decoding

rip relative decoding is relative to the instruction pointer of the next
instruction; by moving address adjustment until after decoding is complete,
we remove the need to determine the instruction size.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: simplify r/m decoding
Avi Kivity [Mon, 16 Jun 2008 04:53:26 +0000 (21:53 -0700)]
KVM: x86 emulator: simplify r/m decoding

Consolidate the duplicated code when not in any special case.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: simplify sib decoding
Avi Kivity [Mon, 16 Jun 2008 04:23:17 +0000 (21:23 -0700)]
KVM: x86 emulator: simplify sib decoding

Instead of using sparse switches, use simpler if/else sequences.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: handle undecoded rex.b with r/m = 5 in certain cases
Avi Kivity [Mon, 16 Jun 2008 04:13:41 +0000 (21:13 -0700)]
KVM: x86 emulator: handle undecoded rex.b with r/m = 5 in certain cases

x86_64 does not decode rex.b in certain cases, where the r/m field = 5.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: emulate nop and xchg reg, acc (opcodes 0x90 - 0x97)
Mohammed Gamal [Sun, 15 Jun 2008 16:37:38 +0000 (19:37 +0300)]
KVM: x86 emulator: emulate nop and xchg reg, acc (opcodes 0x90 - 0x97)

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Use printk_rlimit() instead of reporting emulation failures just once
Avi Kivity [Fri, 13 Jun 2008 19:45:42 +0000 (22:45 +0300)]
KVM: Use printk_rlimit() instead of reporting emulation failures just once

Emulation failure reports are useful, so allow more than one per the lifetime
of the module.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Support mixed endian machines
Tan, Li [Fri, 23 May 2008 06:54:09 +0000 (14:54 +0800)]
KVM: Support mixed endian machines

Currently kvmtrace is not portable. This will prevent from copying a
trace file from big-endian target to little-endian workstation for analysis.
In the patch, kernel outputs metadata containing a magic number to trace
log, and changes 64-bit words to be u64 instead of a pair of u32s.

Signed-off-by: Tan Li <li.tan@intel.com>
Acked-by: Jerone Young <jyoung5@us.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Do not calculate linear rip in emulation failure report
Glauber Costa [Tue, 10 Jun 2008 13:46:53 +0000 (10:46 -0300)]
KVM: Do not calculate linear rip in emulation failure report

If we're not gonna do anything (case in which failure is already
reported), we do not need to even bother with calculating the linear rip.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: only abort guest entry if timer count goes from 0->1
Marcelo Tosatti [Wed, 11 Jun 2008 22:52:53 +0000 (19:52 -0300)]
KVM: only abort guest entry if timer count goes from 0->1

Only abort guest entry if the timer count went from 0->1, since for 1->2
or larger the bit will either be set already or a timer irq will have
been injected.

Using atomic_inc_and_test() for it also introduces an SMP barrier
to the LAPIC version (thought it was unecessary because of timer
migration, but guest can be scheduled to a different pCPU between exit
and kvm_vcpu_block(), so there is the possibility for a race).

Noticed by Avi.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Add coalesced MMIO support (ia64 part)
Laurent Vivier [Fri, 30 May 2008 14:05:57 +0000 (16:05 +0200)]
KVM: Add coalesced MMIO support (ia64 part)

This patch enables coalesced MMIO for ia64 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.

[akpm: fix compile error on ia64]

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Add coalesced MMIO support (powerpc part)
Laurent Vivier [Fri, 30 May 2008 14:05:56 +0000 (16:05 +0200)]
KVM: Add coalesced MMIO support (powerpc part)

This patch enables coalesced MMIO for powerpc architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Add coalesced MMIO support (x86 part)
Laurent Vivier [Fri, 30 May 2008 14:05:55 +0000 (16:05 +0200)]
KVM: Add coalesced MMIO support (x86 part)

This patch enables coalesced MMIO for x86 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Add coalesced MMIO support (common part)
Laurent Vivier [Fri, 30 May 2008 14:05:54 +0000 (16:05 +0200)]
KVM: Add coalesced MMIO support (common part)

This patch adds all needed structures to coalesce MMIOs.
Until an architecture uses it, it is not compiled.

Coalesced MMIO introduces two ioctl() to define where are the MMIO zones that
can be coalesced:

- KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone.
  It requests one parameter (struct kvm_coalesced_mmio_zone) which defines
  a memory area where MMIOs can be coalesced until the next switch to
  user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX.

- KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside
  the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone).

The userspace client can check kernel coalesced MMIO availability by asking
ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability.
The ioctl() call to KVM_CAP_COALESCED_MMIO will return 0 if not supported,
or the page offset where will be stored the ring buffer.
The page offset depends on the architecture.

After an ioctl(KVM_RUN), the first page of the KVM memory mapped points to
a kvm_run structure. The offset given by KVM_CAP_COALESCED_MMIO is
an offset to the coalesced MMIO ring expressed in PAGE_SIZE relatively
to the address of the start of th kvm_run structure. The MMIO ring buffer
is defined by the structure kvm_coalesced_mmio_ring.

[akio: fix oops during guest shutdown]

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: kvm_io_device: extend in_range() to manage len and write attribute
Laurent Vivier [Fri, 30 May 2008 14:05:53 +0000 (16:05 +0200)]
KVM: kvm_io_device: extend in_range() to manage len and write attribute

Modify member in_range() of structure kvm_io_device to pass length and the type
of the I/O (write or read).

This modification allows to use kvm_io_device with coalesced MMIO.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: MMU: Avoid page prefetch on SVM
Avi Kivity [Thu, 29 May 2008 11:56:28 +0000 (14:56 +0300)]
KVM: MMU: Avoid page prefetch on SVM

SVM cannot benefit from page prefetching since guest page fault bypass
cannot by made to work there.  Avoid accessing the guest page table in
this case.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: MMU: Move nonpaging_prefetch_page()
Avi Kivity [Thu, 29 May 2008 11:55:03 +0000 (14:55 +0300)]
KVM: MMU: Move nonpaging_prefetch_page()

In preparation for next patch. No code change.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: implement 'push imm' (opcode 0x68)
Avi Kivity [Thu, 29 May 2008 11:38:38 +0000 (14:38 +0300)]
KVM: x86 emulator: implement 'push imm' (opcode 0x68)

Encountered in FC6 boot sequence, now that we don't force ss.rpl = 0 during
the protected mode transition.  Not really necessary, but nice to have.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: simplify push imm8 emulation
Avi Kivity [Thu, 29 May 2008 11:26:29 +0000 (14:26 +0300)]
KVM: x86 emulator: simplify push imm8 emulation

Instead of fetching the data explicitly, use SrcImmByte.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: MMU: Optimize prefetch_page()
Avi Kivity [Thu, 29 May 2008 11:20:16 +0000 (14:20 +0300)]
KVM: MMU: Optimize prefetch_page()

Instead of reading each pte individually, read 256 bytes worth of ptes and
batch process them.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: Add support for mov r, sreg (0x8c) instruction
Guillaume Thouvenin [Tue, 27 May 2008 13:13:28 +0000 (15:13 +0200)]
KVM: x86 emulator: Add support for mov r, sreg (0x8c) instruction

Add support for mov r, sreg (0x8c) instruction

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: Add support for mov seg, r (0x8e) instruction
Guillaume Thouvenin [Tue, 27 May 2008 12:49:15 +0000 (14:49 +0200)]
KVM: x86 emulator: Add support for mov seg, r (0x8e) instruction

Add support for mov r, sreg (0x8c) instruction.

[avi: drop the sreg decoding table in favor of 1:1 encoding]

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: adds support to mov r,imm (opcode 0xb8) instruction
Guillaume Thouvenin [Tue, 27 May 2008 08:19:16 +0000 (10:19 +0200)]
KVM: x86 emulator: adds support to mov r,imm (opcode 0xb8) instruction

Add support to mov r, imm (0xb8) instruction.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: add support for jmp far 0xea
Guillaume Thouvenin [Tue, 27 May 2008 08:19:08 +0000 (10:19 +0200)]
KVM: x86 emulator: add support for jmp far 0xea

Add support for jmp far (opcode 0xea) instruction.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: x86 emulator: Update c->dst.bytes in decode instruction
Guillaume Thouvenin [Tue, 27 May 2008 08:22:20 +0000 (10:22 +0200)]
KVM: x86 emulator: Update c->dst.bytes in decode instruction

Update c->dst.bytes in decode instruction instead of instruction
itself.  It's needed because if c->dst.bytes is equal to 0, the
instruction is not emulated.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Prefixes segment functions that will be exported with "kvm_"
Guillaume Thouvenin [Tue, 27 May 2008 08:18:46 +0000 (10:18 +0200)]
KVM: Prefixes segment functions that will be exported with "kvm_"

Prefixes functions that will be exported with kvm_.
We also prefixed set_segment() even if it still static
to be coherent.

signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: MTRR support
Avi Kivity [Mon, 26 May 2008 17:06:35 +0000 (20:06 +0300)]
KVM: MTRR support

Add emulation for the memory type range registers, needed by VMware esx 3.5,
and by pci device assignment.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Order segment register constants in the same way as cpu operand encoding
Avi Kivity [Tue, 27 May 2008 13:26:01 +0000 (16:26 +0300)]
KVM: Order segment register constants in the same way as cpu operand encoding

This can be used to simplify the x86 instruction decoder.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: VMX: Enable NMI with in-kernel irqchip
Sheng Yang [Thu, 15 May 2008 10:23:25 +0000 (18:23 +0800)]
KVM: VMX: Enable NMI with in-kernel irqchip

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: IOAPIC/LAPIC: Enable NMI support
Sheng Yang [Thu, 15 May 2008 01:52:48 +0000 (09:52 +0800)]
KVM: IOAPIC/LAPIC: Enable NMI support

[avi: fix ia64 build breakage]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Remove unnecessary ->decache_regs() call
Avi Kivity [Sun, 25 May 2008 11:38:15 +0000 (14:38 +0300)]
KVM: Remove unnecessary ->decache_regs() call

Since we aren't modifying any register, there's no need to decache
the register state.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Remove decache_vcpus_on_cpu() and related callbacks
Avi Kivity [Tue, 13 May 2008 13:29:20 +0000 (16:29 +0300)]
KVM: Remove decache_vcpus_on_cpu() and related callbacks

Obsoleted by the vmx-specific per-cpu list.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: VMX: Add list of potentially locally cached vcpus
Avi Kivity [Tue, 13 May 2008 13:22:47 +0000 (16:22 +0300)]
KVM: VMX: Add list of potentially locally cached vcpus

VMX hardware can cache the contents of a vcpu's vmcs.  This cache needs
to be flushed when migrating a vcpu to another cpu, or (which is the case
that interests us here) when disabling hardware virtualization on a cpu.

The current implementation of decaching iterates over the list of all vcpus,
picks the ones that are potentially cached on the cpu that is being offlined,
and flushes the cache.  The problem is that it uses mutex_trylock() to gain
exclusive access to the vcpu, which fires off a (benign) warning about using
the mutex in an interrupt context.

To avoid this, and to make things generally nicer, add a new per-cpu list
of potentially cached vcus.  This makes the decaching code much simpler.  The
list is vmx-specific since other hardware doesn't have this issue.

[andrea: fix crash on suspend/resume]

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Handle virtualization instruction #UD faults during reboot
Avi Kivity [Tue, 13 May 2008 10:23:38 +0000 (13:23 +0300)]
KVM: Handle virtualization instruction #UD faults during reboot

KVM turns off hardware virtualization extensions during reboot, in order
to disassociate the memory used by the virtualization extensions from the
processor, and in order to have the system in a consistent state.
Unfortunately virtual machines may still be running while this goes on,
and once virtualization extensions are turned off, any virtulization
instruction will #UD on execution.

Fix by adding an exception handler to virtualization instructions; if we get
an exception during reboot, we simply spin waiting for the reset to complete.
If it's a true exception, BUG() so we can have our stack trace.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: MMU: Fix false flooding when a pte points to page table
Avi Kivity [Thu, 15 May 2008 10:51:35 +0000 (13:51 +0300)]
KVM: MMU: Fix false flooding when a pte points to page table

The KVM MMU tries to detect when a speculative pte update is not actually
used by demand fault, by checking the accessed bit of the shadow pte.  If
the shadow pte has not been accessed, we deem that page table flooded and
remove the shadow page table, allowing further pte updates to proceed
without emulation.

However, if the pte itself points at a page table and only used for write
operations, the accessed bit will never be set since all access will happen
through the emulator.

This is exactly what happens with kscand on old (2.4.x) HIGHMEM kernels.
The kernel points a kmap_atomic() pte at a page table, and then
proceeds with read-modify-write operations to look at the dirty and accessed
bits.  We get a false flood trigger on the kmap ptes, which results in the
mmu spending all its time setting up and tearing down shadows.

Fix by setting the shadow accessed bit on emulated accesses.

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: VMX: Trivial vmcs_write64() code simplification
Avi Kivity [Mon, 12 May 2008 16:25:43 +0000 (19:25 +0300)]
KVM: VMX: Trivial vmcs_write64() code simplification

Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: SVM: Fake MSR_K7 performance counters
Chris Lalancette [Mon, 5 May 2008 17:05:16 +0000 (13:05 -0400)]
KVM: SVM: Fake MSR_K7 performance counters

Attached is a patch that fixes a guest crash when booting older Linux kernels.
The problem stems from the fact that we are currently emulating
MSR_K7_EVNTSEL[0-3], but not emulating MSR_K7_PERFCTR[0-3].  Because of this,
setup_k7_watchdog() in the Linux kernel receives a GPF when it attempts to
write into MSR_K7_PERFCTR, which causes an OOPs.

The patch fixes it by just "fake" emulating the appropriate MSRs, throwing
away the data in the process.  This causes the NMI watchdog to not actually
work, but it's not such a big deal in a virtualized environment.

When we get a write to one of these counters, we printk_ratelimit() a warning.
I decided to print it out for all writes, even if the data is 0; it doesn't
seem to make sense to me to special case when data == 0.

Tested by myself on a RHEL-4 guest, and Joerg Roedel on a Windows XP 64-bit
guest.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: PIT: support mode 3
Aurelien Jarno [Fri, 2 May 2008 15:02:23 +0000 (17:02 +0200)]
KVM: PIT: support mode 3

The in-kernel PIT emulation ignores pending timers if operating
under mode 3, which for example Hurd uses.

This mode should output a square wave, high for (N+1)/2 counts and low
for (N-1)/2 counts. As we only care about the resulting interrupts, the
period is N, and mode 3 is the same as mode 2 with regard to
interrupts.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: Handle vma regions with no backing page
Anthony Liguori [Wed, 30 Apr 2008 20:37:07 +0000 (15:37 -0500)]
KVM: Handle vma regions with no backing page

This patch allows VMAs that contain no backing page to be used for guest
memory.  This is useful for assigning mmio regions to a guest.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: SVM: add tracing support for TDP page faults
Joerg Roedel [Wed, 30 Apr 2008 15:56:04 +0000 (17:56 +0200)]
KVM: SVM: add tracing support for TDP page faults

To distinguish between real page faults and nested page faults they should be
traced as different events. This is implemented by this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: SVM: add missing kvmtrace markers
Joerg Roedel [Wed, 30 Apr 2008 15:56:03 +0000 (17:56 +0200)]
KVM: SVM: add missing kvmtrace markers

This patch adds the missing kvmtrace markers to the svm
module of kvm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: add missing kvmtrace bits
Joerg Roedel [Wed, 30 Apr 2008 15:56:02 +0000 (17:56 +0200)]
KVM: add missing kvmtrace bits

This patch adds some kvmtrace bits to the generic x86 code
where it is instrumented from SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: SVM: implement dedicated INTR exit handler
Joerg Roedel [Wed, 30 Apr 2008 15:56:01 +0000 (17:56 +0200)]
KVM: SVM: implement dedicated INTR exit handler

With an exit handler for INTR intercepts its possible to account them using
kvmtrace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: SVM: implement dedicated NMI exit handler
Joerg Roedel [Wed, 30 Apr 2008 15:56:00 +0000 (17:56 +0200)]
KVM: SVM: implement dedicated NMI exit handler

With an exit handler for NMI intercepts its possible to account them using
kvmtrace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: VMX: move APIC_ACCESS trace entry to generic code
Joerg Roedel [Wed, 30 Apr 2008 15:55:59 +0000 (17:55 +0200)]
KVM: VMX: move APIC_ACCESS trace entry to generic code

This patch moves the trace entry for APIC accesses from the VMX code to the
generic lapic code. This way APIC accesses from SVM will also be traced.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: add statics were possible, function definition in lapic.h
Harvey Harrison [Sun, 27 Apr 2008 19:14:13 +0000 (12:14 -0700)]
KVM: add statics were possible, function definition in lapic.h

Noticed by sparse:
arch/x86/kvm/vmx.c:1583:6: warning: symbol 'vmx_disable_intercept_for_msr' was not declared. Should it be static?
arch/x86/kvm/x86.c:3406:5: warning: symbol 'kvm_task_switch_16' was not declared. Should it be static?
arch/x86/kvm/x86.c:3429:5: warning: symbol 'kvm_task_switch_32' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1968:6: warning: symbol 'kvm_mmu_remove_one_alloc_mmu_page' was not declared. Should it be static?
arch/x86/kvm/mmu.c:2014:6: warning: symbol 'mmu_destroy_caches' was not declared. Should it be static?
arch/x86/kvm/lapic.c:862:5: warning: symbol 'kvm_lapic_get_base' was not declared. Should it be static?
arch/x86/kvm/i8254.c:94:5: warning: symbol 'pit_get_gate' was not declared. Should it be static?
arch/x86/kvm/i8254.c:196:5: warning: symbol '__pit_timer_fn' was not declared. Should it be static?
arch/x86/kvm/i8254.c:561:6: warning: symbol '__inject_pit_timer_intr' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoKVM: remove long -> void *user -> long cast
Christian Borntraeger [Mon, 21 Apr 2008 11:48:24 +0000 (13:48 +0200)]
KVM: remove long -> void *user -> long cast

kvm_dev_ioctl casts the arg value to void __user *, just to recast it
again to long. This seems unnecessary.

According to objdump the binary code on x86 is unchanged by this patch.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
15 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfashe...
Linus Torvalds [Thu, 17 Jul 2008 17:55:51 +0000 (10:55 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  [PATCH] ocfs2: fix oops in mmap_truncate testing
  configfs: call drop_link() to cleanup after create_link() failure
  configfs: Allow ->make_item() and ->make_group() to return detailed errors.
  configfs: Fix failing mkdir() making racing rmdir() fail
  configfs: Fix deadlock with racing rmdir() and rename()
  configfs: Make configfs_new_dirent() return error code instead of NULL
  configfs: Protect configfs_dirent s_links list mutations
  configfs: Introduce configfs_dirent_lock
  ocfs2: Don't snprintf() without a format.
  ocfs2: Fix CONFIG_OCFS2_DEBUG_FS #ifdefs
  ocfs2/net: Silence build warnings on sparc64
  ocfs2: Handle error during journal load
  ocfs2: Silence an error message in ocfs2_file_aio_read()
  ocfs2: use simple_read_from_buffer()
  ocfs2: fix printk format warnings with OCFS2_FS_STATS=n
  [PATCH 2/2] ocfs2: Instrument fs cluster locks
  [PATCH 1/2] ocfs2: Add CONFIG_OCFS2_FS_STATS config option

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-fixes-2.6
Linus Torvalds [Thu, 17 Jul 2008 17:55:07 +0000 (10:55 -0700)]
Merge git://git./linux/kernel/git/brodo/pcmcia-fixes-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-fixes-2.6:
  pcmcia: ide-cs: Remove outdated comment
  pcmcia: fix cisinfo_t removal
  pcmcia: fix return value in cm4000_cs.c

15 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 17 Jul 2008 17:38:59 +0000 (10:38 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix asm/e820.h for userspace inclusion
  x86: fix numaq_tsc_disable
  x86: fix kernel_physical_mapping_init() for large x86 systems

15 years agoMerge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 17 Jul 2008 17:37:10 +0000 (10:37 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ftrace: do not trace library functions
  ftrace: do not trace scheduler functions
  ftrace: fix lockup with MAXSMP
  ftrace: fix merge buglet

15 years agox86: fix asm/e820.h for userspace inclusion
Rusty Russell [Tue, 15 Jul 2008 05:02:27 +0000 (15:02 +1000)]
x86: fix asm/e820.h for userspace inclusion

asm-x86/e820.h is included from userspace.  'x86: make e820.c to have
common functions' (b79cd8f1268bab57ff85b19d131f7f23deab2dee) broke it:

make -C Documentation/lguest
cc -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include
lguest.c  -lz -o lguest
In file included from ../../include/asm-x86/bootparam.h:8,
                 from lguest.c:45:
../../include/asm/e820.h:66: error: expected â€˜)’ before â€˜start’
../../include/asm/e820.h:67: error: expected â€˜)’ before â€˜start’
../../include/asm/e820.h:68: error: expected â€˜)’ before â€˜start’
../../include/asm/e820.h:72: error: expected â€˜=’, â€˜,’, â€˜;’, â€˜asm’
or â€˜__attribute__’ before â€˜e820_update_range’
...

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agox86: fix numaq_tsc_disable
Yinghai Lu [Tue, 15 Jul 2008 06:29:01 +0000 (23:29 -0700)]
x86: fix numaq_tsc_disable

fix:

 arch/x86/kernel/numaq_32.c: In function â€˜numaq_tsc_disable’:
 arch/x86/kernel/numaq_32.c:99: warning: â€˜return’ with a value, in function returning void

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agoMerge branch 'linus' into x86/urgent
Ingo Molnar [Thu, 17 Jul 2008 17:24:56 +0000 (19:24 +0200)]
Merge branch 'linus' into x86/urgent

15 years agofix build error of arch/ia64/kvm/*
Takashi Iwai [Thu, 17 Jul 2008 16:09:12 +0000 (18:09 +0200)]
fix build error of arch/ia64/kvm/*

Fix calls of smp_call_function*() in arch/ia64/kvm for recent API
changes.

    CC [M]  arch/ia64/kvm/kvm-ia64.o
  arch/ia64/kvm/kvm-ia64.c: In function 'handle_global_purge':
  arch/ia64/kvm/kvm-ia64.c:398: error: too many arguments to function 'smp_call_function_single'
  arch/ia64/kvm/kvm-ia64.c: In function 'kvm_vcpu_kick':
  arch/ia64/kvm/kvm-ia64.c:1696: error: too many arguments to function 'smp_call_function_single'

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'ptrace-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/frob...
Linus Torvalds [Thu, 17 Jul 2008 16:15:23 +0000 (09:15 -0700)]
Merge branch 'ptrace-cleanup' of git://git./linux/kernel/git/frob/linux-2.6-utrace

* 'ptrace-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace:
  fix dangling zombie when new parent ignores children
  do_wait: return security_task_wait() error code in place of -ECHILD
  ptrace children revamp
  do_wait reorganization

15 years agoUpdate scripts/Makefile.fwinst to cope with older make
David Woodhouse [Thu, 17 Jul 2008 06:44:32 +0000 (23:44 -0700)]
Update scripts/Makefile.fwinst to cope with older make

Also fix unwanted rebuilds of the firmware/ihex2fw tool by including
the .ihex2fw.cmd file when present.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Reported-and-tested-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
Linus Torvalds [Thu, 17 Jul 2008 16:05:38 +0000 (09:05 -0700)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/linux-2.6

* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] dasd: use -EOPNOTSUPP instead of -ENOTSUPP
  [S390] qdio: new qdio driver.
  [S390] cio: Export chsc_error_from_response().
  [S390] vmur: Fix return code handling.
  [S390] Fix stacktrace compile bug.
  [S390] Increase default warning stacksize.
  [S390] dasd: Fix cleanup in dasd_{fba,diag}_check_characteristics().
  [S390] chsc headers userspace cleanup
  [S390] dasd: fix unsolicited SIM handling.
  [S390] zfcpdump: Make SCSI disk dump tool recognize storage holes

15 years agoFix collateral damage to top level Makefile
Grant Likely [Thu, 17 Jul 2008 07:06:55 +0000 (01:06 -0600)]
Fix collateral damage to top level Makefile

The patch named "powerpc/mpc5121: Add clock driver", also contained
an unrelated and bogus change to the top-level makefile.  This patch
backs out the bad bit.

SHA1 of offending patch: 137e95906e294913fab02162e8a1948ade49acb5)

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Repented-by: John Rigby <jrigby@freescale.com>
[ Heh. Normally I pick these out from the diffstats, but I guess
  I've grown to trust the ppc tree too much ;)   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoftrace: do not trace library functions
Ingo Molnar [Thu, 17 Jul 2008 15:40:48 +0000 (17:40 +0200)]
ftrace: do not trace library functions

make function tracing more robust: do not trace library functions.

We've already got a sizable list of exceptions:

 ifdef CONFIG_FTRACE
 # Do not profile string.o, since it may be used in early boot or vdso
 CFLAGS_REMOVE_string.o = -pg
 # Also do not profile any debug utilities
 CFLAGS_REMOVE_spinlock_debug.o = -pg
 CFLAGS_REMOVE_list_debug.o = -pg
 CFLAGS_REMOVE_debugobjects.o = -pg
 CFLAGS_REMOVE_find_next_bit.o = -pg
 CFLAGS_REMOVE_cpumask.o = -pg
 CFLAGS_REMOVE_bitmap.o = -pg
 endif

... and the pattern has been that random library functionality showed
up in ftrace's critical path (outside of its recursion check), causing
hard to debug lockups.

So be a bit defensive about it and exclude all lib/*.o functions by
default. It's not that they are overly interesting for tracing purposes
anyway. Specific ones can still be traced, in an opt-in manner.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agoftrace: do not trace scheduler functions
Ingo Molnar [Tue, 15 Apr 2008 20:39:31 +0000 (22:39 +0200)]
ftrace: do not trace scheduler functions

do not trace scheduler functions - it's still a bit fragile
and can lock up with:

  http://redhat.com/~mingo/misc/config-Thu_Jul_17_13_34_52_CEST_2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agoftrace: fix lockup with MAXSMP
Ingo Molnar [Thu, 17 Jul 2008 15:38:17 +0000 (17:38 +0200)]
ftrace: fix lockup with MAXSMP

MAXSMP brings in lots of use of various bitops in smp_processor_id()
and friends - causing ftrace to lock up during bootup:

  calling  anon_inode_init+0x0/0x130
  initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs
  calling  acpi_event_init+0x0/0x57
  [ hard hang ]

So exclude the bitops facilities from tracing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago[S390] dasd: use -EOPNOTSUPP instead of -ENOTSUPP
Stefan Haberland [Thu, 17 Jul 2008 15:16:49 +0000 (17:16 +0200)]
[S390] dasd: use -EOPNOTSUPP instead of -ENOTSUPP

return value -ENOTSUPP is not valid in userspace context, use
-EOPNOTSUPP instead

Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
15 years ago[S390] qdio: new qdio driver.
Jan Glauber [Thu, 17 Jul 2008 15:16:48 +0000 (17:16 +0200)]
[S390] qdio: new qdio driver.

List of major changes:
- split qdio driver into several files
- seperation of thin interrupt code
- improved handling for multiple thin interrupt devices
- inbound and outbound processing now always runs in tasklet context
- significant less tasklet schedules per interrupt needed
- merged qebsm with non-qebsm handling
- cleanup qdio interface and added kerneldoc
- coding style

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Reviewed-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
15 years ago[S390] cio: Export chsc_error_from_response().
Cornelia Huck [Thu, 17 Jul 2008 15:16:47 +0000 (17:16 +0200)]
[S390] cio: Export chsc_error_from_response().

Make chsc_error_from_response() available to chsc callers outside
of chsc.c (namely qdio) to avoid duplicating error checking code.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
15 years ago[S390] vmur: Fix return code handling.
Frank Munzert [Thu, 17 Jul 2008 15:16:46 +0000 (17:16 +0200)]
[S390] vmur: Fix return code handling.

Use -EOPNOTSUPP instead of -ENOTSUPP.

Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
15 years ago[S390] Fix stacktrace compile bug.
Heiko Carstens [Thu, 17 Jul 2008 15:16:45 +0000 (17:16 +0200)]
[S390] Fix stacktrace compile bug.

Add missing module.h include to fix this:

  CC      arch/s390/kernel/stacktrace.o
arch/s390/kernel/stacktrace.c:84: warning: data definition has no type or storage class
arch/s390/kernel/stacktrace.c:84: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/s390/kernel/stacktrace.c:84: warning: parameter names (without types) in function declaration
arch/s390/kernel/stacktrace.c:97: warning: data definition has no type or storage class
arch/s390/kernel/stacktrace.c:97: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/s390/kernel/stacktrace.c:97: warning: parameter names (without types) in function declaration

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
15 years ago[S390] Increase default warning stacksize.
Heiko Carstens [Thu, 17 Jul 2008 15:16:44 +0000 (17:16 +0200)]
[S390] Increase default warning stacksize.

Compiling a kernel with allmodconfig or allyesconfig results in tons
of gcc warnings, because the default maximum stacksize from which on
gcc will emit a warning is just 256 bytes.
Increase this to 2048, so these warnings don't distract from the real
warnings that we need to watch at.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
15 years ago[S390] dasd: Fix cleanup in dasd_{fba,diag}_check_characteristics().
Cornelia Huck [Thu, 17 Jul 2008 15:16:43 +0000 (17:16 +0200)]
[S390] dasd: Fix cleanup in dasd_{fba,diag}_check_characteristics().

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
15 years ago[S390] chsc headers userspace cleanup
Adrian Bunk [Thu, 17 Jul 2008 15:16:42 +0000 (17:16 +0200)]
[S390] chsc headers userspace cleanup

Kernel headers shouldn't expose functions to userspace.

Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
15 years ago[S390] dasd: fix unsolicited SIM handling.
Stefan Haberland [Thu, 17 Jul 2008 15:16:41 +0000 (17:16 +0200)]
[S390] dasd: fix unsolicited SIM handling.

Add missing schedule_bh and check that there is 32 bit sense data.

Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
15 years ago[S390] zfcpdump: Make SCSI disk dump tool recognize storage holes
Frank Munzert [Thu, 17 Jul 2008 15:16:40 +0000 (17:16 +0200)]
[S390] zfcpdump: Make SCSI disk dump tool recognize storage holes

The kernel part of zfcpdump establishes a new debugfs file zcore/memmap
which exports information on memory layout (start address and length of each
memory chunk) to its userspace counterpart.

Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
15 years agoftrace: fix merge buglet
Ingo Molnar [Thu, 17 Jul 2008 11:26:50 +0000 (13:26 +0200)]
ftrace: fix merge buglet

-tip testing found a bootup hang here:

  initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs
  calling  acpi_event_init+0x0/0x57

the bootup should have continued with:

  initcall acpi_event_init+0x0/0x57 returned 0 after 45 msecs

but it hung hard there instead.

bisection led to this commit:

| commit 5806b81ac1c0c52665b91723fd4146a4f86e386b
| Merge: d14c8a6... 6712e29...
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Mon Jul 14 16:11:52 2008 +0200
|     Merge branch 'auto-ftrace-next' into tracing/for-linus

turns out that i made this mistake in the merge:

  ifdef CONFIG_FTRACE
  # Do not profile debug utilities
  CFLAGS_REMOVE_tsc_64.o = -pg
  CFLAGS_REMOVE_tsc_32.o = -pg

those two files got unified meanwhile - so the dont-profile annotation
got lost. The proper rule is:

  CFLAGS_REMOVE_tsc.o = -pg

i guess this could have been caught sooner if the CFLAGS_REMOVE* kbuild
rule aborted the build if it met a target that does not exist anymore?

Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agofix dangling zombie when new parent ignores children
Roland McGrath [Wed, 9 Apr 2008 06:12:30 +0000 (23:12 -0700)]
fix dangling zombie when new parent ignores children

This fixes an arcane bug that we think was a regression introduced
by commit b2b2cbc4b2a2f389442549399a993a8306420baf.  When a parent
ignores SIGCHLD (or uses SA_NOCLDWAIT), its children would self-reap
but they don't if it's using ptrace on them.  When the parent thread
later exits and ceases to ptrace a child but leaves other live
threads in the parent's thread group, any zombie children are left
dangling.  The fix makes them self-reap then, as they would have
done earlier if ptrace had not been in use.

Signed-off-by: Roland McGrath <roland@redhat.com>
15 years agodo_wait: return security_task_wait() error code in place of -ECHILD
Roland McGrath [Mon, 31 Mar 2008 01:41:25 +0000 (18:41 -0700)]
do_wait: return security_task_wait() error code in place of -ECHILD

This reverts the effect of commit f2cc3eb133baa2e9dc8efd40f417106b2ee520f3
"do_wait: fix security checks".  That change reverted the effect of commit
73243284463a761e04d69d22c7516b2be7de096c.  The rationale for the original
commit still stands.  The inconsistent treatment of children hidden by
ptrace was an unintended omission in the original change and in no way
invalidates its purpose.

This makes do_wait return the error returned by security_task_wait()
(usually -EACCES) in place of -ECHILD when there are some children the
caller would be able to wait for if not for the permission failure.  A
permission error will give the user a clue to look for security policy
problems, rather than for mysterious wait bugs.

Signed-off-by: Roland McGrath <roland@redhat.com>
15 years agoptrace children revamp
Roland McGrath [Tue, 25 Mar 2008 01:36:23 +0000 (18:36 -0700)]
ptrace children revamp

ptrace no longer fiddles with the children/sibling links, and the
old ptrace_children list is gone.  Now ptrace, whether of one's own
children or another's via PTRACE_ATTACH, just uses the new ptraced
list instead.

There should be no user-visible difference that matters.  The only
change is the order in which do_wait() sees multiple stopped
children and stopped ptrace attachees.  Since wait_task_stopped()
was changed earlier so it no longer reorders the children list, we
already know this won't cause any new problems.

Signed-off-by: Roland McGrath <roland@redhat.com>
15 years agodo_wait reorganization
Roland McGrath [Thu, 20 Mar 2008 02:24:59 +0000 (19:24 -0700)]
do_wait reorganization

This breaks out the guts of do_wait into three subfunctions.
The control flow is less nonobvious without so much goto.
do_wait_thread and ptrace_do_wait contain the main work of the outer loop.
wait_consider_task contains the main work of the inner loop.

Signed-off-by: Roland McGrath <roland@redhat.com>
15 years agoscsi_dh: Verify "dev" is a sdev before accessing it.
Chandra Seetharaman [Thu, 17 Jul 2008 00:35:08 +0000 (17:35 -0700)]
scsi_dh: Verify "dev" is a sdev before accessing it.

Before accessing the device data structure in hardware handlers,
make sure it is a indeed a sdev device.

Yinghai Lu <yhlu.kernel@gmail.com> found the bug on Jul 16, 2008,
and later tested/verified the following fix.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes...
Linus Torvalds [Thu, 17 Jul 2008 00:25:46 +0000 (17:25 -0700)]
Merge branch 'linux-next' of git://git./linux/kernel/git/jbarnes/pci-2.6

* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
  Revert "x86/PCI: ACPI based PCI gap calculation"
  PCI: remove unnecessary volatile in PCIe hotplug struct controller
  x86/PCI: ACPI based PCI gap calculation
  PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
  PCI PM: Fix pci_prepare_to_sleep
  x86/PCI: Fix PCI config space for domains > 0
  Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
  PCI: Simplify PCI device PM code
  PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
  PCI ACPI: Rework PCI handling of wake-up
  ACPI: Introduce new device wakeup flag 'prepared'
  ACPI: Introduce acpi_device_sleep_wake function
  PCI: rework pci_set_power_state function to call platform first
  PCI: Introduce platform_pci_power_manageable function
  ACPI: Introduce acpi_bus_power_manageable function
  PCI: make pci_name use dev_name
  PCI: handle pci_name() being const
  PCI: add stub for pci_set_consistent_dma_mask()
  PCI: remove unused arch pcibios_update_resource() functions
  PCI: fix pci_setup_device()'s sprinting into a const buffer
  ...

Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.

15 years agoRevert "x86/PCI: ACPI based PCI gap calculation"
Jesse Barnes [Wed, 16 Jul 2008 23:21:47 +0000 (16:21 -0700)]
Revert "x86/PCI: ACPI based PCI gap calculation"

This reverts commit 809d9a8f93bd8504dcc34b16bbfdfd1a8c9bb1ed.

This one isn't quite ready for prime time.  It needs more testing and
additional feedback from the ACPI guys.

15 years ago[PATCH] ocfs2: fix oops in mmap_truncate testing
Coly Li [Mon, 30 Jun 2008 10:45:45 +0000 (18:45 +0800)]
[PATCH] ocfs2: fix oops in mmap_truncate testing

This patch fixes a mmap_truncate bug which was found by ocfs2 test suite.

In an ocfs2 cluster more than 1 node, run program mmap_truncate, which races
mmap writes and truncates from multiple processes. While the test is
running, a stat from another node forces writeout, causing an oops in
ocfs2_get_block() because it sees a buffer to write which isn't allocated.

This patch fixed the bug by clear dirty and uptodate bits in buffer, leave
the buffer unmapped and return.

Fix is suggested by Mark Fasheh, and I code up the patch.

Signed-off-by: Coly Li <coyli@suse.de>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
Linus Torvalds [Wed, 16 Jul 2008 22:11:07 +0000 (15:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/drzeus/mmc

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: (68 commits)
  sdio_uart: Fix SDIO break control to now return success or an error
  mmc: host driver for Ricoh Bay1Controllers
  sdio: sdio_io.c Fix sparse warnings
  sdio: fix the use of hard coded timeout value.
  mmc: OLPC: update vdd/powerup quirk comment
  mmc: fix spares errors of sdhci.c
  mmc: remove multiwrite capability
  wbsd: fix bad dma_addr_t conversion
  atmel-mci: Driver for Atmel on-chip MMC controllers
  mmc: fix sdio_io sparse errors
  mmc: wbsd.c fix shadowing of 'dma' variable
  MMC: S3C24XX: Refuse incorrectly aligned transfers
  MMC: S3C24XX: Add maintainer entry
  MMC: S3C24XX: Update error debugging.
  MMC: S3C24XX: Add media presence test to request handling.
  MMC: S3C24XX: Fix use of msecs where jiffies are needed
  MMC: S3C24XX: Add MODULE_ALIAS() entries for the platform devices
  MMC: S3C24XX: Fix s3c2410_dma_request() return code check.
  MMC: S3C24XX: Allow card-detect on non-IRQ capable pin
  MMC: S3C24XX: Ensure host->mrq->data is valid
  ...

Manually fixed up bogus executable bits on drivers/mmc/core/sdio_io.c
and include/linux/mmc/sdio_func.h when merging.

15 years agoMerge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6
Linus Torvalds [Wed, 16 Jul 2008 22:02:57 +0000 (15:02 -0700)]
Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6

* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6:
  UBIFS: include to compilation
  UBIFS: add new flash file system
  UBIFS: add brief documentation
  MAINTAINERS: add UBIFS section
  do_mounts: allow UBI root device name
  VFS: export sync_sb_inodes
  VFS: move inode_lock into sync_sb_inodes

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
Linus Torvalds [Wed, 16 Jul 2008 21:53:54 +0000 (14:53 -0700)]
Merge git://git./linux/kernel/git/bart/ide-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (76 commits)
  IDE: Report errors during drive reset back to user space
  Update documentation of HDIO_DRIVE_RESET ioctl
  IDE: Remove unused code
  IDE: Fix HDIO_DRIVE_RESET handling
  hd.c: remove the #include <linux/mc146818rtc.h>
  update the BLK_DEV_HD help text
  move ide/legacy/hd.c to drivers/block/
  ide/legacy/hd.c: use late_initcall()
  remove BLK_DEV_HD_ONLY
  ide: endian annotations in ide-floppy.c
  ide-floppy: zero out the whole struct ide_atapi_pc on init
  ide-floppy: fold idefloppy_create_test_unit_ready_cmd into idefloppy_open
  ide-cd: move request prep chunk from cdrom_do_newpc_cont to rq issue path
  ide-cd: move request prep from cdrom_start_rw_cont to rq issue path
  ide-cd: move request prep from cdrom_start_seek_continuation to rq issue path
  ide-cd: fold cdrom_start_seek into ide_cd_do_request
  ide-cd: simplify request issuing path
  ide-cd: mv ide_do_rw_cdrom ide_cd_do_request
  ide-cd: cdrom_start_seek: remove unused argument block
  ide-cd: ide_do_rw_cdrom: add the catch-all bad request case to the if-else block
  ...

15 years agoMerge branch 'release-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/ak...
Linus Torvalds [Wed, 16 Jul 2008 21:52:12 +0000 (14:52 -0700)]
Merge branch 'release-2.6.27' of git://git./linux/kernel/git/ak/linux-acpi-merge-2.6

* 'release-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-acpi-merge-2.6: (87 commits)
  Fix FADT parsing
  Add the ability to reset the machine using the RESET_REG in ACPI's FADT table.
  ACPI: use dev_printk when possible
  PNPACPI: add support for HP vendor-specific CCSR descriptors
  PNP: avoid legacy IDE IRQs
  PNP: convert resource options to single linked list
  ISAPNP: handle independent options following dependent ones
  PNP: remove extra 0x100 bit from option priority
  PNP: support optional IRQ resources
  PNP: rename pnp_register_*_resource() local variables
  PNPACPI: ignore _PRS interrupt numbers larger than PNP_IRQ_NR
  PNP: centralize resource option allocations
  PNP: remove redundant pnp_can_configure() check
  PNP: make resource assignment functions return 0 (success) or -EBUSY (failure)
  PNP: in debug resource dump, make empty list obvious
  PNP: improve resource assignment debug
  PNP: increase I/O port & memory option address sizes
  PNP: introduce pnp_irq_mask_t typedef
  PNP: make resource option structures private to PNP subsystem
  PNP: define PNP-specific IORESOURCE_IO_* flags alongside IRQ, DMA, MEM
  ...

15 years agoblock: Trivial fix for blk_integrity_rq()
Martin K. Petersen [Wed, 16 Jul 2008 20:09:06 +0000 (16:09 -0400)]
block: Trivial fix for blk_integrity_rq()

Fail integrity check gracefully when request does not have a bio
attached (BLOCK_PC).

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
Linus Torvalds [Wed, 16 Jul 2008 21:49:49 +0000 (14:49 -0700)]
Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (82 commits)
  NFSv4: Remove BKL from the nfsv4 state recovery
  SUNRPC: Remove the BKL from the callback functions
  NFS: Remove BKL from the readdir code
  NFS: Remove BKL from the symlink code
  NFS: Remove BKL from the sillydelete operations
  NFS: Remove the BKL from the rename, rmdir and unlink operations
  NFS: Remove BKL from NFS lookup code
  NFS: Remove the BKL from nfs_link()
  NFS: Remove the BKL from the inode creation operations
  NFS: Remove BKL usage from open()
  NFS: Remove BKL usage from the write path
  NFS: Remove the BKL from the permission checking code
  NFS: Remove attribute update related BKL references
  NFS: Remove BKL requirement from attribute updates
  NFS: Protect inode->i_nlink updates using inode->i_lock
  nfs: set correct fl_len in nlmclnt_test()
  SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
  SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
  SUNRPC: None of rpcb_create's callers wants a privileged source port
  SUNRPC: Introduce a specific rpcb_create for contacting localhost
  ...

15 years agoFix FADT parsing
Jan Beulich [Wed, 16 Jul 2008 21:27:08 +0000 (23:27 +0200)]
Fix FADT parsing

The (1.0 inherited) separate length fields in the FADT are byte granular.
Further, PM1a/b may have distinct lengths and live in distinct address spaces.
 acpi_tb_convert_fadt() should account for all of these conditions.

Apart from these changes I'm puzzled by the fact that, not just for
acpi_gbl_xpm1{a,b}_enable, acpi_hw_low_level_{read,write}() get an explicit
size passed rather than using the size found in the passed GAS.  What happens
on a platform that defines PM1{a,b} wider than 16 bits?  Of course,
acpi_hw_low_level_{read,write}() at present are entirely un-prepared to deal
with sizes other than 8, 16, or 32, not to speak of a non-zero bit_offset or
access_width...

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 years agoAdd the ability to reset the machine using the RESET_REG in ACPI's FADT table.
Aaron Durbin [Wed, 16 Jul 2008 21:27:08 +0000 (23:27 +0200)]
Add the ability to reset the machine using the RESET_REG in ACPI's FADT table.

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 years agoACPI: use dev_printk when possible
Bjorn Helgaas [Fri, 27 Jun 2008 14:45:39 +0000 (08:45 -0600)]
ACPI: use dev_printk when possible

Convert printks to use dev_printk().  The most obvious change will
be messages like this:

   -ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 31 (level, low) -> IRQ 31
   +cciss 0000:00:04.0: PCI INT A -> GSI 31 (level, low) -> IRQ 31

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
15 years agoPNPACPI: add support for HP vendor-specific CCSR descriptors
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:19 +0000 (16:57 -0600)]
PNPACPI: add support for HP vendor-specific CCSR descriptors

The HP CCSR descriptor describes MMIO address space that should appear
as a MEM resource.  This patch adds support for parsing these descriptors
in the _CRS data.

The visible effect of this is that these MEM resources will appear
in /sys/devices/pnp0/.../resources, which means that "lspnp -v" will
report it, user applications can use this to locate device CSR space,
and kernel drivers can use the normal PNP resource accessors to
locate them.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
15 years agoPNP: avoid legacy IDE IRQs
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:18 +0000 (16:57 -0600)]
PNP: avoid legacy IDE IRQs

If an IDE controller is in compatibility mode, it expects to use
IRQs 14 and 15, so PNP should avoid them.

This patch should resolve this problem report:
  parallel driver grabs IRQ14 preventing legacy SFF ATA controller from working
  https://bugzilla.novell.com/show_bug.cgi?id=375836

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
15 years agoPNP: convert resource options to single linked list
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:17 +0000 (16:57 -0600)]
PNP: convert resource options to single linked list

ISAPNP, PNPBIOS, and ACPI describe the "possible resource settings" of
a device, i.e., the possibilities an OS bus driver has when it assigns
I/O port, MMIO, and other resources to the device.

PNP used to maintain this "possible resource setting" information in
one independent option structure and a list of dependent option
structures for each device.  Each of these option structures had lists
of I/O, memory, IRQ, and DMA resources, for example:

  dev
    independent options
      ind-io0  -> ind-io1  ...
      ind-mem0 -> ind-mem1 ...
      ...
    dependent option set 0
      dep0-io0  -> dep0-io1  ...
      dep0-mem0 -> dep0-mem1 ...
      ...
    dependent option set 1
      dep1-io0  -> dep1-io1  ...
      dep1-mem0 -> dep1-mem1 ...
      ...
    ...

This data structure was designed for ISAPNP, where the OS configures
device resource settings by writing directly to configuration
registers.  The OS can write the registers in arbitrary order much
like it writes PCI BARs.

However, for PNPBIOS and ACPI devices, the OS uses firmware interfaces
that perform device configuration, and it is important to pass the
desired settings to those interfaces in the correct order.  The OS
learns the correct order by using firmware interfaces that return the
"current resource settings" and "possible resource settings," but the
option structures above doesn't store the ordering information.

This patch replaces the independent and dependent lists with a single
list of options.  For example, a device might have possible resource
settings like this:

  dev
    options
      ind-io0 -> dep0-io0 -> dep1->io0 -> ind-io1 ...

All the possible settings are in the same list, in the order they
come from the firmware "possible resource settings" list.  Each entry
is tagged with an independent/dependent flag.  Dependent entries also
have a "set number" and an optional priority value.  All dependent
entries must be assigned from the same set.  For example, the OS can
use all the entries from dependent set 0, or all the entries from
dependent set 1, but it cannot mix entries from set 0 with entries
from set 1.

Prior to this patch PNP didn't keep track of the order of this list,
and it assigned all independent options first, then all dependent
ones.  Using the example above, that resulted in a "desired
configuration" list like this:

  ind->io0 -> ind->io1 -> depN-io0 ...

instead of the list the firmware expects, which looks like this:

  ind->io0 -> depN-io0 -> ind-io1 ...

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>