safe/jmp/linux-2.6
Jens Axboe [Fri, 21 May 2010 18:01:54 +0000 (20:01 +0200)]
writeback: fix mixed up arguments to bdi_start_writeback()

The laptop mode timer had the nr_pages and sb_locked arguments
mixed up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Thu, 20 May 2010 07:18:47 +0000 (09:18 +0200)]
writeback: fix problem with !CONFIG_BLOCK compilation

When CONFIG_BLOCK isn't enabled:

mm/page-writeback.c: In function 'laptop_mode_timer_fn':
mm/page-writeback.c:708: error: dereferencing pointer to incomplete type
mm/page-writeback.c:709: error: dereferencing pointer to incomplete type

Fix this by essentially eliminating the laptop sync handlers when
CONFIG_BLOCK isn't set, as most are only used from the block layer code.
The exception is laptop_sync_completion(), which is used from sys_sync();
make that an empty stub in that case.
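
A minimal sketch of that stubbing pattern, assuming the declaration sits behind an #ifdef CONFIG_BLOCK in a shared header. sys_sync_model() and sync_ran are hypothetical stand-ins so the stub can be exercised outside the kernel; the real header layout may differ.

```c
#include <stdbool.h>

/* CONFIG_BLOCK is deliberately left undefined to model a !CONFIG_BLOCK build. */
#ifdef CONFIG_BLOCK
void laptop_sync_completion(void);   /* real work lives in the block layer */
#else
/* Nothing to sync without the block layer: an empty inline keeps callers
 * such as sys_sync() compiling without #ifdefs of their own. */
static inline void laptop_sync_completion(void) { }
#endif

/* Hypothetical flag recording that the caller ran to completion. */
bool sync_ran;

/* Hypothetical stand-in for sys_sync(), which must build either way. */
void sys_sync_model(void)
{
	/* ... kick off and wait for filesystem writeback ... */
	laptop_sync_completion();
	sync_ran = true;
}
```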

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Sat, 15 May 2010 18:09:31 +0000 (20:09 +0200)]
block: improve automatic native capacity unlocking

Currently, native capacity unlocking is initiated only when a
recognized partition extends beyond the end of the disk.  However,
there are several other unhandled cases where truncated capacity can
lead to misdetection of partitions.

* Partition table is fully beyond EOD.

* Partition table is partially beyond EOD (daisy chained ones).

* Recognized partition starts beyond EOD.

This patch updates generic partition check code such that all the
above three cases are handled too.  For the first two, @state tracks
whether low level partition check code tried to read beyond EOD during
partition scan and triggers native capacity unlocking accordingly.
The third is now handled similarly to the original unlocking case.
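
The detection logic can be sketched as a user-space model under invented names (struct scan_state, scan_read_sector(), scan_wants_unlock()); it is not the actual fs/partitions code. Reads past the current capacity are refused and remembered, which covers the first two cases, and partition bounds are compared against the capacity for the third and the original case.

```c
#include <stdbool.h>

typedef unsigned long long sector_t;

/* Hypothetical stand-in for the scan state. */
struct scan_state {
	sector_t capacity;       /* sectors the disk currently reports */
	bool access_beyond_eod;  /* set if the scanner asked for a sector past EOD */
};

/* Models a sector read during partition scan: refuse and record reads past EOD. */
bool scan_read_sector(struct scan_state *state, sector_t n)
{
	if (n >= state->capacity) {
		state->access_beyond_eod = true;
		return false;    /* the low-level parser sees a failed read */
	}
	return true;          /* real code would hand back the sector data */
}

/* Should native capacity unlocking be attempted for this scan/partition?
 * Tables fully or partially beyond EOD show up via access_beyond_eod;
 * a partition starting or ending beyond EOD is checked directly. */
bool scan_wants_unlock(const struct scan_state *state,
		       sector_t part_start, sector_t part_size)
{
	if (state->access_beyond_eod)
		return true;
	if (part_start >= state->capacity)
		return true;
	return part_start + part_size > state->capacity;
}
```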

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Sat, 15 May 2010 18:09:30 +0000 (20:09 +0200)]
block: use struct parsed_partitions *state universally in partition check code

Make the following changes to partition check code.

* Add ->bdev to struct parsed_partitions.

* Introduce read_part_sector() which is a simple wrapper around
  read_dev_sector() which takes struct parsed_partitions *state
  instead of @bdev.

* For functions which used to take @state and @bdev, drop @bdev.  For
  functions which used to take @bdev, replace it with @state.

* While updating, drop superfluous checks on NULL state/bdev in ldm.c.

This cleans up the API a bit and enables better handling of IO errors
during partition check as the generic partition check code now has
much better visibility into what went wrong in the low level code
paths.
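
A simplified sketch of the wrapper idea, with hypothetical types (struct fake_bdev, struct parsed_state) standing in for the block device and struct parsed_partitions. The read_errors counter is invented to show the kind of visibility the generic code gains when low-level readers go through @state, and check_mbr_signature() is just an example checker that takes only @state.

```c
#include <stddef.h>

typedef unsigned long long sector_t;

/* Hypothetical, heavily simplified block device: an array of 512-byte sectors. */
struct fake_bdev {
	unsigned char (*sectors)[512];
	sector_t nr_sectors;
};

/* Hypothetical scan state, now carrying ->bdev. */
struct parsed_state {
	struct fake_bdev *bdev;
	int read_errors;   /* what the generic code can now observe */
};

/* Models a device-level sector read, keyed on the device. */
static const unsigned char *fake_read_dev_sector(struct fake_bdev *bdev,
						 sector_t n)
{
	return n < bdev->nr_sectors ? bdev->sectors[n] : NULL;
}

/* Models the wrapper: same read, but keyed on @state so failures are seen. */
const unsigned char *read_part_sector_model(struct parsed_state *state,
					    sector_t n)
{
	const unsigned char *p = fake_read_dev_sector(state->bdev, n);

	if (!p)
		state->read_errors++;
	return p;
}

/* Example format checker taking only @state: 1 if sector 0 ends in the
 * 0x55 0xAA MBR signature, 0 if not, -1 if the read failed. */
int check_mbr_signature(struct parsed_state *state)
{
	const unsigned char *sect = read_part_sector_model(state, 0);

	if (!sect)
		return -1;
	return sect[510] == 0x55 && sect[511] == 0xAA;
}
```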

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Sat, 15 May 2010 18:09:29 +0000 (20:09 +0200)]
block,ide: simplify bdops->set_capacity() to ->unlock_native_capacity()

bdops->set_capacity() is unnecessarily generic.  All that's required
is a simple one way notification to lower level driver telling it to
try to unlock native capacity.  There's no reason to pass in target
capacity or return the new capacity.  The former is always the
inherent native capacity and the latter can be handled via the usual
device resize / revalidation path.  In fact, the current API is always
used that way.

Replace ->set_capacity() with ->unlock_native_capacity(), which takes
only @disk and doesn't return anything.  IDE, which is the only current
user of the API, is converted accordingly.
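
A sketch of the resulting callback shape, using hypothetical model types rather than the real struct block_device_operations and struct gendisk. The driver hook only records that unlocking was requested; the larger size becomes visible when the (separately modelled) revalidation path runs.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long long sector_t;

struct gendisk_model;

/* Hypothetical slice of an operations table: a one-way notification that
 * takes only the disk and returns nothing. */
struct bdops_model {
	void (*unlock_native_capacity)(struct gendisk_model *disk);
};

/* Hypothetical disk; capacity is what the partition code currently sees. */
struct gendisk_model {
	const struct bdops_model *fops;
	sector_t capacity;
	sector_t native_capacity;   /* size the drive exposes once unlocked */
	bool unlocked;
};

/* Driver side: just unlock; no target size in, no new size out. */
static void demo_unlock_native_capacity(struct gendisk_model *disk)
{
	disk->unlocked = true;
}

const struct bdops_model demo_fops = {
	.unlock_native_capacity = demo_unlock_native_capacity,
};

/* Stand-in for the usual resize/revalidation path that picks up the size. */
void demo_revalidate(struct gendisk_model *disk)
{
	if (disk->unlocked)
		disk->capacity = disk->native_capacity;
}

/* Generic side: notify the driver if it implements the hook. */
bool try_unlock_native_capacity(struct gendisk_model *disk)
{
	if (!disk->fops || !disk->fops->unlock_native_capacity)
		return false;
	disk->fops->unlock_native_capacity(disk);
	return true;
}
```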

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Sat, 15 May 2010 18:09:28 +0000 (20:09 +0200)]
block: restart partition scan after resizing a device

Device resize via ->set_capacity() can reveal new partitions (e.g. in
chained partition table formats such as dos extended parts).  Restart
partition scan from the beginning after resizing a device.  This
change also makes libata always revalidate the disk after resize which
makes lower layer native capacity unlocking implementation simpler and
more robust as resize can be handled in the usual path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Sat, 15 May 2010 18:09:27 +0000 (20:09 +0200)]
buffer: make invalidate_bdev() drain all percpu LRU add caches

invalidate_bdev() should release all page cache pages which are clean
and not being used; however, if some pages are still in the percpu LRU
add caches on other cpus, those pages are considered in use and don't
get released.  Fix it by calling lru_add_drain_all() before trying to
invalidate pages.
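
A user-space model of why the drain matters. Everything here (struct page_model, the per-CPU array, the batch size) is hypothetical and far simpler than the kernel's pagevecs; the point is that a page sitting in another CPU's add cache carries an extra reference, so a "clean and unused" test fails until all caches are drained, which is the job lru_add_drain_all() does.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4
#define LRU_ADD_BATCH 14   /* arbitrary batch size for the model */

/* Hypothetical page: refcount 1 means only the page cache holds it. */
struct page_model {
	int refcount;
	bool released;
};

/* Hypothetical per-CPU "LRU add" cache; it pins pages until drained. */
struct lru_add_cache {
	struct page_model *pages[LRU_ADD_BATCH];
	int nr;
};

static struct lru_add_cache lru_cache[NR_CPUS_MODEL];

static void drain_cpu(struct lru_add_cache *c)
{
	for (int i = 0; i < c->nr; i++)
		c->pages[i]->refcount--;   /* page moves on to the real LRU */
	c->nr = 0;
}

void lru_cache_add_model(int cpu, struct page_model *page)
{
	struct lru_add_cache *c = &lru_cache[cpu];

	if (c->nr == LRU_ADD_BATCH)
		drain_cpu(c);
	page->refcount++;              /* the cache holds its own reference */
	c->pages[c->nr++] = page;
}

/* Drain every CPU's cache, not just the local one. */
void lru_add_drain_all_model(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		drain_cpu(&lru_cache[cpu]);
}

/* Release pages that nobody but the page cache references. */
int invalidate_pages_model(struct page_model *pages, int n, bool drain_first)
{
	int released = 0;

	if (drain_first)
		lru_add_drain_all_model();
	for (int i = 0; i < n; i++) {
		if (!pages[i].released && pages[i].refcount == 1) {
			pages[i].released = true;
			released++;
		}
	}
	return released;
}
```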

This problem was discovered while testing block automatic native
capacity unlocking.  Null pages which were read before automatic
unlocking didn't get released by invalidate_bdev() and ended up
interfering with partition scan after unlocking.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Paul E. McKenney [Wed, 19 May 2010 06:27:30 +0000 (08:27 +0200)]
block: remove all rcu head initializations

Remove all rcu head inits. We don't care about the RCU head state before passing
it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can
keep track of objects on stack.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Fri, 21 May 2010 18:00:35 +0000 (20:00 +0200)]
writeback: fixups for !dirty_writeback_centisecs

Commit 69b62d01 fixed up most of the places where we would enter
busy schedule() spins when disabling the periodic background
writeback. This fixes up the sb timer so that it doesn't get
hammered on with the delay disabled, and ensures that it gets
rearmed if needed when /proc/sys/vm/dirty_writeback_centisecs
gets modified.

bdi_forker_task() also needs to check for !dirty_writeback_centisecs
and use schedule() appropriately, fix that up too.
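
One way to avoid the spin is to map a zero interval to "sleep until woken" rather than to a zero timeout. The helper below is a hypothetical sketch: HZ_MODEL is an assumed tick rate, and MAX_SCHEDULE_TIMEOUT_MODEL only mirrors the kernel's convention of LONG_MAX meaning no timeout.

```c
#include <limits.h>

#define HZ_MODEL 250                          /* assumed tick rate */
#define MAX_SCHEDULE_TIMEOUT_MODEL LONG_MAX   /* "no timeout" convention */

/* Hypothetical helper: turn dirty_writeback_centisecs into a sleep timeout
 * in jiffies. Zero disables periodic writeback, so the thread should sleep
 * until explicitly woken; a zero timeout would return at once and turn the
 * wait loop into a busy spin. */
long writeback_sleep_jiffies(unsigned int centisecs)
{
	if (centisecs == 0)
		return MAX_SCHEDULE_TIMEOUT_MODEL;
	/* centisecs * 10 = milliseconds; round up so short intervals never hit 0 */
	unsigned long msecs = (unsigned long)centisecs * 10UL;
	return (long)((msecs * HZ_MODEL + 999UL) / 1000UL);
}
```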

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 18 May 2010 12:31:45 +0000 (14:31 +0200)]
writeback: bdi_writeback_task() must set task state before calling schedule()

Calling schedule without setting the task state to non-running will
return immediately, so ensure that we set it properly and check our
sleep conditions after doing so.

This is a fixup for commit 69b62d01.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 18 May 2010 12:29:29 +0000 (14:29 +0200)]
writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync

Even if the writeout itself isn't a data integrity operation, we need
to ensure that the caller doesn't drop the sb umount sem before we
have actually done the writeback.

This is a fixup for commit e913fc82.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Julia Lawall [Thu, 13 May 2010 20:02:21 +0000 (22:02 +0200)]
drivers/block/drbd: Use kzalloc

Use kzalloc rather than the combination of kmalloc and memset.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>
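
For readers outside the kernel, the same transformation looks like this in user space, where calloc() plays the role of kzalloc(). struct drbd_like and both functions are hypothetical; only the before/after shape follows the semantic patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure being allocated. */
struct drbd_like {
	int state;
	char name[32];
};

/* Before: allocate, check, then zero in a separate step. */
struct drbd_like *alloc_old(void)
{
	struct drbd_like *x = malloc(sizeof(*x));

	if (x == NULL)
		return NULL;
	memset(x, 0, sizeof(*x));
	return x;
}

/* After: a single call that returns zeroed memory. */
struct drbd_like *alloc_new(void)
{
	return calloc(1, sizeof(struct drbd_like));
}
```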

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Mon, 17 May 2010 14:10:43 +0000 (16:10 +0200)]
drbd: Create new current UUID as late as possible

The choice was to either delay creation of the new UUID until
IO got thawed or to delay it until the first IO request.

Both are correct; the latter is friendlier to users of
dual-primary setups that actually write on only one side.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Mon, 10 May 2010 14:42:23 +0000 (16:42 +0200)]
drbd: If we detect late that IO got frozen, retry after we thawed.

If we detect late (i.e. after grabbing mdev->req_lock) that IO got frozen, we
return 1 to generic_make_request(), which will simply retry the
request for that bio.

In the subsequent call of generic_make_request() into drbd_make_request_26()
we sleep in inc_ap_bio().
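
The retry contract can be modelled as a caller that resubmits while the make_request function returns nonzero, as the text describes for generic_make_request(). struct fake_dev, fake_make_request() and submit_with_retry() are hypothetical, and the thaw is simulated after a fixed number of attempts instead of sleeping in inc_ap_bio().

```c
#include <stdbool.h>

/* Hypothetical device model: IO is frozen until the situation resolves. */
struct fake_dev {
	bool io_frozen;
	int attempts;
	int thaw_after;   /* simulated: IO thaws once this many attempts were made */
};

/* Stand-in for a make_request function that may notice the freeze only
 * after taking its lock; it then returns nonzero to request a resubmit. */
int fake_make_request(struct fake_dev *dev)
{
	dev->attempts++;
	if (dev->io_frozen) {
		/* drbd would block in inc_ap_bio() on the next call; the model
		 * simply thaws after a fixed number of attempts. */
		if (dev->attempts >= dev->thaw_after)
			dev->io_frozen = false;
		return 1;
	}
	return 0;   /* request accepted */
}

/* Caller side: keep resubmitting the same request while told to retry. */
int submit_with_retry(struct fake_dev *dev)
{
	while (fake_make_request(dev))
		;
	return dev->attempts;
}
```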

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Fri, 14 May 2010 17:16:41 +0000 (19:16 +0200)]
drbd: always use_bmbv, ignore setting

Now that the peer may handle multi-bio EEs,
we can ignore the peer's limit,
and concentrate on the limits of the local IO stack.

This is safe across drbd protocol versions,
as our queue_max_sectors() will be adjusted accordingly.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Fri, 14 May 2010 17:08:55 +0000 (19:08 +0200)]
drbd: allow resync requests to be larger than max_segment_size

This should allow for better background resync performance.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Fri, 14 May 2010 15:10:48 +0000 (17:10 +0200)]
drbd: Allow drbd_epoch_entries to use multiple bios.

This should allow for better performance if the lower level IO stack
of the peers differs in limits exposed either via the queue,
or via some merge_bvec_fn.
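
A sketch of the splitting arithmetic, assuming a request of size bytes and a local per-bio limit of max_bio_size bytes. bios_needed() and split_into_bios() are hypothetical helpers, not drbd functions; they only show how one epoch entry can be carried by several bios sized to the local limit instead of forcing the request down to one bio's worth.

```c
#include <stddef.h>

/* Number of bios needed to carry `size` bytes at `max_bio_size` per bio. */
size_t bios_needed(size_t size, size_t max_bio_size)
{
	if (size == 0 || max_bio_size == 0)
		return 0;
	return (size + max_bio_size - 1) / max_bio_size;
}

/* Fill `lens` with each bio's length; returns the count, or 0 if `cap`
 * slots are not enough (or the input is empty). */
size_t split_into_bios(size_t size, size_t max_bio_size,
		       size_t *lens, size_t cap)
{
	size_t n = bios_needed(size, max_bio_size);

	if (n == 0 || n > cap)
		return 0;
	for (size_t i = 0; i < n; i++) {
		size_t left = size - i * max_bio_size;
		lens[i] = left < max_bio_size ? left : max_bio_size;
	}
	return n;
}
```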

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Mon, 3 May 2010 08:38:57 +0000 (10:38 +0200)]
drbd: reduce sizeof struct drbd_epoch_entry by 8 bytes by aligning members

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 6 May 2010 13:19:30 +0000 (15:19 +0200)]
drbd: Fixes to the new delay_probes code

* Only send delay_probes with protocol 93 or newer
* drbd_send_delay_probes() is called only from worker context,
  no atomic_t needed for delay_seq

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 5 May 2010 18:53:33 +0000 (20:53 +0200)]
drbd: A few fixes to the new resync speed code

* Mention P_DELAY_PROBE in the packet naming array
* Do not corrupt the mdev->data.work list in case the timer goes
  off before delay_probe_work got handled by the worker
* Do not mod_timer() twice for a single delay_probe pair

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 4 May 2010 14:31:03 +0000 (16:31 +0200)]
drbd: Proc bits of new resync speed stuff

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 4 May 2010 14:57:18 +0000 (16:57 +0200)]
drbd: Control the actual resync rate based on the queuing delay of data packets

In a setup with a high-bandwidth, high-latency network, possibly
involving deep queues in routers, it is beneficial to fill those
queues with resync data only up to a limited extent.
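
Purely as an illustration of delay-based throttling, a controller could halve the rate when the measured queuing delay exceeds a target and otherwise raise it by a fixed step. This additive-increase/multiplicative-decrease rule and the struct resync_ctl fields are assumptions for the sketch, not drbd's actual algorithm or its configuration settings.

```c
/* Hypothetical controller parameters; rates in KiB/s, delays in ms. */
struct resync_ctl {
	unsigned int min_rate;
	unsigned int max_rate;
	unsigned int target_delay_ms;  /* acceptable extra queuing delay */
	unsigned int step;             /* additive increase per adjustment */
};

/* Next resync rate: back off when data packets queue up beyond the target,
 * otherwise probe upward, staying within [min_rate, max_rate]. */
unsigned int next_resync_rate(const struct resync_ctl *c,
			      unsigned int cur_rate, unsigned int delay_ms)
{
	unsigned int next;

	if (delay_ms > c->target_delay_ms)
		next = cur_rate / 2;          /* queues are filling: back off */
	else
		next = cur_rate + c->step;    /* headroom: speed up gradually */
	if (next < c->min_rate)
		next = c->min_rate;
	if (next > c->max_rate)
		next = c->max_rate;
	return next;
}
```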

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 4 May 2010 10:33:58 +0000 (12:33 +0200)]
drbd: Actually send delay probes

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 4 May 2010 09:12:00 +0000 (11:12 +0200)]
drbd: Four new configuration settings for resync speed control

To reasonably control resync speed over drbd-proxy connections,
drbd has to measure the current delay of packets transmitted over
the (possibly congested) data socket vs the meta-data socket.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Mon, 3 May 2010 13:10:47 +0000 (15:10 +0200)]
drbd: Sending of delay_probes

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Fri, 30 Apr 2010 13:26:20 +0000 (15:26 +0200)]
drbd: Receiving of delay_probes

Delay probes are new packets in the DRBD protocol that let DRBD
measure the current delay packets experience on the data socket
(relative to the meta-data socket).
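
A sketch of how a receiver might turn one probe pair into a delay estimate, assuming a probe with the same sequence number is sent on both sockets at about the same time. struct probe_pair and its helpers are hypothetical and do not reflect the on-wire P_DELAY_PROBE format.

```c
#include <stdbool.h>

/* Hypothetical receiver-side bookkeeping for one probe pair. */
struct probe_pair {
	unsigned int seq;
	long long meta_arrival_us;   /* arrival on the meta-data socket */
	long long data_arrival_us;   /* arrival on the data socket */
	bool have_meta, have_data;
};

void probe_seen(struct probe_pair *p, bool on_data_socket, long long now_us)
{
	if (on_data_socket) {
		p->data_arrival_us = now_us;
		p->have_data = true;
	} else {
		p->meta_arrival_us = now_us;
		p->have_meta = true;
	}
}

/* Extra delay of the data socket relative to the meta-data socket, or -1 if
 * the pair is incomplete. The lightly used meta-data socket approximates the
 * bare network latency, so the difference approximates queuing delay. */
long long data_socket_delay_us(const struct probe_pair *p)
{
	if (!p->have_meta || !p->have_data)
		return -1;
	long long d = p->data_arrival_us - p->meta_arrival_us;
	return d > 0 ? d : 0;
}
```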

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 28 Apr 2010 12:46:57 +0000 (14:46 +0200)]
drbd: Fixed bitmap in case of online-grow without resync

The "surplus" bits of the old (smaller) bitmap must be clean
in case of online-grow without resync.

Note: This also reverts 67ae8b80d4a116ab3b7094eb3723506b20c06dff,
since the lines added by that commit are redundant: the
bits get set by the bm_set_surplus(b) call before that.
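
A bitmap-level sketch of the growth rule, with hypothetical helper names; it is not drbd's bitmap code. When the bitmap grows, the last word of the old bitmap may hold surplus bits past the old bit count. This model gives those bits and all new bits the same value (set when set_new_bits is true, meaning resync the new area; cleared for an online grow without resync) and always clears padding past new_bits. The flag name only mirrors the set_new_bits parameter of drbd_bm_resize() mentioned in this log.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static size_t words_for(size_t bits)
{
	return (bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

static void assign_bit(unsigned long *bm, size_t bit, bool val)
{
	unsigned long mask = 1UL << (bit % BITS_PER_WORD);

	if (val)
		bm[bit / BITS_PER_WORD] |= mask;
	else
		bm[bit / BITS_PER_WORD] &= ~mask;
}

bool test_bit_model(const unsigned long *bm, size_t bit)
{
	return (bm[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}

/* Grow bm from old_bits to new_bits. On failure NULL is returned and the
 * old bitmap is left untouched (realloc() semantics). */
unsigned long *bm_grow_model(unsigned long *bm, size_t old_bits,
			     size_t new_bits, bool set_new_bits)
{
	size_t old_words = words_for(old_bits);
	size_t new_words = words_for(new_bits);
	unsigned long *nb;

	if (new_bits < old_bits || new_words == 0)
		return NULL;
	nb = realloc(bm, new_words * sizeof(*nb));
	if (!nb)
		return NULL;
	memset(nb + old_words, 0, (new_words - old_words) * sizeof(*nb));
	/* Surplus bits of the old last word and all new bits get the same
	 * value; bits past new_bits (padding in the last word) stay clear. */
	for (size_t bit = old_bits; bit < new_words * BITS_PER_WORD; bit++)
		assign_bit(nb, bit, set_new_bits && bit < new_bits);
	return nb;
}
```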

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Mon, 26 Apr 2010 12:11:45 +0000 (14:11 +0200)]
drbd: Added transmission faults to the fault injection code

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Fri, 26 Mar 2010 12:49:56 +0000 (13:49 +0100)]
drbd: bugfix: Make resize work, if remote's size was limiting and increased in the meantime

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 24 Mar 2010 15:07:04 +0000 (16:07 +0100)]
drbd: Implemented the --assume-clean option for drbdsetup resize

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 1 Apr 2010 07:57:40 +0000 (09:57 +0200)]
drbd: Added some missing statics

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 1 Apr 2010 07:57:40 +0000 (09:57 +0200)]
drbd: Make sure to resync all of the new storage upon online resize

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 24 Mar 2010 16:11:33 +0000 (17:11 +0100)]
drbd: Implemented flags for the resize packet

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 24 Mar 2010 15:23:03 +0000 (16:23 +0100)]
drbd: Implemented the set_new_bits parameter for drbd_bm_resize()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 24 Mar 2010 14:51:26 +0000 (15:51 +0100)]
drbd: made determin_dev_size's parameter a flag enum

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Adam Gandelman [Thu, 8 Apr 2010 23:48:23 +0000 (16:48 -0700)]
drbd: New handler: initial-split-brain

Some wish to be notified of all instances of split brain, not just those that
go unresolved.  The initial-split-brain handler is called upon detection
of every split-brain condition, even if auto-recovery policies
are configured.

Signed-off-by: Adam Gandelman <adam.gandelman@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Tue, 6 Apr 2010 12:15:06 +0000 (14:15 +0200)]
drbd: fail_requests_early: remove incorrect and unnecessary optimization

The condition does not fit the comment (I may well be Primary,
even if I lost the disk earlier and now the connection).

And this is caught below anyway, where it also gets logged.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Tue, 6 Apr 2010 10:15:04 +0000 (12:15 +0200)]
drbd: check for corrupt or malicious sector addresses when receiving data

Even if it should never happen if the peer does behave, we need to
double check, and not even attempt access beyond end of device.
It usually would be caught by lower layers, resulting in "IO error",
but may also end up in the internal meta data area.
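
A minimal sketch of such a check, assuming sector is in 512-byte units, size is in bytes and capacity is the usable data area in sectors. peer_request_ok() is hypothetical, and rejecting zero-length or misaligned sizes is an extra assumption; the bounds test is written as nr_sectors <= capacity - sector so that a huge, possibly malicious sector value cannot overflow the addition.

```c
#include <stdbool.h>

typedef unsigned long long sector_t;   /* 512-byte sectors */

/* Hypothetical check before submitting a write received from the peer.
 * capacity: usable data area in sectors; anything beyond it (for example
 * internal meta data) is off limits. */
bool peer_request_ok(sector_t sector, unsigned int size, sector_t capacity)
{
	sector_t nr_sectors;

	/* Extra assumption for the sketch: non-empty, sector-aligned sizes only. */
	if (size == 0 || (size & 511u) != 0)
		return false;
	nr_sectors = size >> 9;
	if (sector > capacity)
		return false;
	/* Equivalent to sector + nr_sectors <= capacity, without overflow. */
	return nr_sectors <= capacity - sector;
}
```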

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 1 Apr 2010 07:57:40 +0000 (09:57 +0200)]
drbd: cleanup: This code path to trigger a resync is no longer needed

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Thu, 1 Apr 2010 14:59:32 +0000 (16:59 +0200)]
drbd: don't start a resync without access to up-to-date Data

If both nodes are "inconsistent", invalidate would
have started a resync anyway, with no chance of ever
succeeding, just filling the logs with warning messages.

Simply disallow that state change,
re-using the SS_NO_UP_TO_DATE_DISK return value.

This also changes the corresponding error string to
"Need access to UpToDate Data" -- I found the
"Refusing to be Primary without at least one UpToDate disk"
answer misleading in some situations anyway.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Thu, 1 Apr 2010 14:57:19 +0000 (16:57 +0200)]
drbd: fix potential protocol error

Don't forget to drain the digest in case we cannot satisfy a
checksum based resync or online-verify request.

Failing to drain it would cause a protocol error,
dropping the connection.
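
A small model of why the drain matters, using an invented framing (1-byte type, 1-byte digest length, digest bytes) rather than the real drbd packet layout. If the handler bailed out without consuming the digest, the next read would treat digest bytes as a packet header, which is exactly the kind of protocol error that drops the connection.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical receive stream: a byte buffer with a read cursor. */
struct rx_stream {
	const unsigned char *buf;
	size_t len, pos;
};

/* Consume n bytes without looking at them. */
static bool rx_skip(struct rx_stream *s, size_t n)
{
	if (s->len - s->pos < n)
		return false;
	s->pos += n;
	return true;
}

/* Returns the next packet type, or -1 if the stream is exhausted. */
int rx_next_type(struct rx_stream *s)
{
	if (s->pos >= s->len)
		return -1;
	return s->buf[s->pos++];
}

/* Handle a checksum-based request. If it cannot be served, the digest is
 * still drained so the stream stays aligned on packet boundaries. */
bool rx_handle_csum_request(struct rx_stream *s, bool can_serve,
			    unsigned char *digest, size_t digest_cap)
{
	size_t digest_len;

	if (s->pos >= s->len)
		return false;
	digest_len = s->buf[s->pos++];
	if (!can_serve || digest_len > digest_cap)
		return rx_skip(s, digest_len);   /* drain, then answer negatively */
	if (s->len - s->pos < digest_len)
		return false;
	memcpy(digest, s->buf + s->pos, digest_len);
	s->pos += digest_len;
	return true;
}
```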

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Thu, 1 Apr 2010 14:55:18 +0000 (16:55 +0200)]
drbd: remove bogus ASSERT

block_id may be ID_SYNCER,
as well as checksum based resync request magic, or online verify magic.

Let's just drop that ASSERT.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Thu, 1 Apr 2010 13:13:19 +0000 (15:13 +0200)]
drbd: fix regression: attach while connected failed

commit e4f925e12ea5daaa9baf2dd5af9c4951721dae95
Author: Philipp Reisner <philipp.reisner@linbit.com>
Date:   Wed Mar 17 14:18:41 2010 +0100

    drbd: Do not upgrade state to Outdated if already Inconsistent

prevented the necessary state transition for attaching while connected
(Diskless -> Consistent respectively Outdated).
This is the fix for the fix.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 17 Mar 2010 13:18:41 +0000 (14:18 +0100)]
drbd: Do not upgrade state to Outdated if already Inconsistent [Bugz 277]

There was a race condition:
  Consider a SyncSource+Primary node and a SyncTarget+Secondary node,
  with a resync dependency on some other device. After both nodes decided
  to do the resync, the other device finishes its resync process.
  At that time the SyncSource has already sent the P_SYNC_UUID packet and
  already updated its peer disk state to Inconsistent.
  The SyncTarget node waits for the P_SYNC_UUID and sends a state packet
  to report the resync dependency change. That packet still carries
  a disk state of Outdated.

Impact:
  If application writes come in on the Primary node during that time,
  they do not get replicated, and the out-of-sync counter gets increased.
  => The completion of the resync is not detected on the primary node.
  => The resync stalls.
  Those blocks get resynced with the next resync, since they are
  marked as out-of-sync in the bitmap.

In order to fix this, we filter out that wrong state change in the
sanitize_state() function.
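
In sketch form, the filter amounts to refusing that one transition of the peer disk state. The enum and function below are hypothetical simplifications, not drbd's sanitize_state(), which handles many more states and conditions.

```c
/* Hypothetical, reduced set of peer disk states for the sketch. */
enum pdsk_model {
	PDSK_DISKLESS,
	PDSK_INCONSISTENT,
	PDSK_OUTDATED,
	PDSK_UP_TO_DATE,
};

/* Filter a peer disk state reported by a state packet: once the local node
 * has marked the peer Inconsistent for a resync it started, a stale report
 * of Outdated must not move the peer back to Outdated. Everything else
 * passes through unchanged. */
enum pdsk_model sanitize_peer_disk(enum pdsk_model cur,
				   enum pdsk_model reported)
{
	if (cur == PDSK_INCONSISTENT && reported == PDSK_OUTDATED)
		return PDSK_INCONSISTENT;
	return reported;
}
```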

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Thu, 11 Mar 2010 15:47:58 +0000 (16:47 +0100)]
drbd: use proc_create_data with explicit NULL argument

To document that we know about deprecation of proc_create,
even though we are not affected, as we don't use the ->data member,
open code proc_create_data(..., NULL);

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Dmitry Monakhov [Fri, 7 May 2010 09:35:44 +0000 (13:35 +0400)]
writeback: Update dirty flags in two steps

Filesystems with delalloc support may dirty the inode during writepages.
As a result, the inode will have dirty metadata flags even after write_inode.
Since we have two dedicated functions for data and metadata
writeback, it is reasonable to separate the flag updates into two stages.

https://bugzilla.kernel.org/show_bug.cgi?id=15906
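
A simplified model of the two orderings, using hypothetical flag names and a writepages stand-in that dirties metadata the way delayed allocation can. In the one-step version, metadata dirtied during writepages is left behind; in the two-step version the metadata flags are sampled after writepages, so that change is written in the same pass. This follows the commit description, not the exact fs-writeback code.

```c
#include <stdbool.h>

/* Hypothetical flag values for the model. */
#define DIRTY_SYNC_M     1u   /* inode metadata dirty */
#define DIRTY_DATASYNC_M 2u   /* metadata needed for data integrity dirty */
#define DIRTY_PAGES_M    4u   /* data pages dirty */
#define DIRTY_META_M     (DIRTY_SYNC_M | DIRTY_DATASYNC_M)
#define DIRTY_ALL_M      (DIRTY_META_M | DIRTY_PAGES_M)

struct inode_model {
	unsigned int state;
	bool writepages_dirties_meta;   /* e.g. delalloc allocating blocks */
	int inode_writes;               /* times write_inode would have run */
};

static void writepages_model(struct inode_model *in)
{
	if (in->writepages_dirties_meta)
		in->state |= DIRTY_SYNC_M;
}

/* One step: sample and clear all dirty flags before writing data. */
void writeback_one_step(struct inode_model *in)
{
	unsigned int dirty = in->state & DIRTY_ALL_M;

	in->state &= ~DIRTY_ALL_M;
	writepages_model(in);
	if (dirty & DIRTY_META_M)
		in->inode_writes++;
}

/* Two steps: clear page flags, write data, then sample and clear the
 * metadata flags, so metadata dirtied by writepages is written now. */
void writeback_two_steps(struct inode_model *in)
{
	unsigned int dirty;

	in->state &= ~DIRTY_PAGES_M;
	writepages_model(in);
	dirty = in->state & DIRTY_ALL_M;
	in->state &= ~DIRTY_META_M;
	if (dirty & DIRTY_META_M)
		in->inode_writes++;
}
```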

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 17 May 2010 10:55:07 +0000 (12:55 +0200)]
writeback: fix WB_SYNC_NONE writeback from umount

When umount calls sync_filesystem(), we first do a WB_SYNC_NONE
writeback to kick off writeback of pending dirty inodes, then follow
that up with a WB_SYNC_ALL to wait for it. Since umount already holds
the sb s_umount semaphore, WB_SYNC_NONE ends up doing nothing and all
writeback happens as WB_SYNC_ALL. This can greatly slow down umount,
since WB_SYNC_ALL writeback is a data integrity operation and thus
a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems
it's a lot slower.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 17 May 2010 10:51:03 +0000 (12:51 +0200)]
writeback: disable periodic old data writeback for !dirty_writeback_centisecs

Prior to 2.6.32, setting /proc/sys/vm/dirty_writeback_centisecs to zero disabled
periodic dirty writeback from kupdate. This got broken and now causes
excessive sys CPU usage if set to zero, as we'll keep beating on
schedule().
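
The intended semantics can be captured by a due-check that treats a zero interval as "never". periodic_flush_due() is a hypothetical helper, with times kept in centiseconds to keep the arithmetic exact.

```c
#include <stdbool.h>

/* Is a periodic flush of old dirty data due? A zero interval means the
 * administrator disabled periodic writeback, so it is never due. */
bool periodic_flush_due(unsigned long now_cs, unsigned long last_flush_cs,
			unsigned int dirty_writeback_centisecs)
{
	if (dirty_writeback_centisecs == 0)
		return false;
	return now_cs - last_flush_cs >= dirty_writeback_centisecs;
}
```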

Cc: stable@kernel.org
Reported-by: Justin Maggard <jmaggard10@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Randy Dunlap [Tue, 11 May 2010 07:02:55 +0000 (09:02 +0200)]
paride: fix menu indentation

Make the PARIDE menu be displayed correctly, with proper/expected
indentation, by moving the GDROM kconfig symbol, which was
splitting the PARIDE kconfig symbol from its dependent symbols.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Mike Snitzer [Tue, 11 May 2010 06:57:42 +0000 (08:57 +0200)]
block: allow initialization of previously allocated request_queue

blk_init_queue() allocates the request_queue structure and then
initializes it as needed (request_fn, elevator, etc).

Split initialization out to blk_init_allocated_queue_node.
Introduce blk_init_allocated_queue wrapper function to model existing
blk_init_queue and blk_init_queue_node interfaces.

Export elv_register_queue to allow a newly added elevator to be
registered with sysfs.  Export elv_unregister_queue for symmetry.

These changes allow DM to initialize a device's request_queue with more
precision.  In particular, DM no longer unconditionally initializes a
full request_queue (elevator et al).  It only does so for a
request-based DM device.
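
The split can be sketched with a hypothetical, stripped-down queue type: allocation and initialization become separate steps, and the old one-shot interface is rebuilt on top of them. A DM-like caller could then allocate early and initialize the full queue only once it knows the device is request-based. None of these names are the block layer's.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef void (*request_fn_model)(void *q);

/* Hypothetical, stripped-down request queue. */
struct rq_model {
	request_fn_model request_fn;
	bool has_elevator;
};

struct rq_model *alloc_queue_model(void)
{
	return calloc(1, sizeof(struct rq_model));
}

/* Initialize a queue the caller already owns. Returns q, or NULL on
 * failure (in which case the caller still owns and must free q). */
struct rq_model *init_allocated_queue_model(struct rq_model *q,
					    request_fn_model fn)
{
	if (!q || !fn)
		return NULL;
	q->request_fn = fn;
	q->has_elevator = true;   /* stands in for elevator setup */
	return q;
}

/* The one-shot interface, rebuilt as allocate + initialize. */
struct rq_model *init_queue_model(request_fn_model fn)
{
	struct rq_model *q = alloc_queue_model();

	if (!init_allocated_queue_model(q, fn)) {
		free(q);
		return NULL;
	}
	return q;
}

static void demo_request_fn(void *q) { (void)q; }
```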

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 3 May 2010 12:28:55 +0000 (14:28 +0200)]
block: kill some useless goto's in blk-cgroup.c

goto has its place, but let's cut back on some of the more
frivolous uses of it.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Thu, 29 Apr 2010 07:36:24 +0000 (09:36 +0200)]
Merge branch 'master' into for-2.6.35

Conflicts:
fs/block_dev.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Stephen Rothwell [Thu, 29 Apr 2010 07:32:00 +0000 (09:32 +0200)]
nilfs: fix breakage caused by barrier flag changes

After merging the block tree, today's linux-next build (powerpc ppc64_defconfig)
failed like this:

fs/nilfs2/the_nilfs.c: In function 'nilfs_discard_segments':
fs/nilfs2/the_nilfs.c:673: error: 'DISCARD_FL_BARRIER' undeclared (first use in this function)

Caused by commit fbd9b09a177a481eda256447c881f014f29034fe ("blkdev:
generalize flags for blkdev_issue_fn functions") interacting with commit
e902ec9906e844f4613fa6190c6fa65f162dc86e ("nilfs2: issue discard request
after cleaning segments") (which entered Linus' tree on about March 4 -
before v2.6.34-rc1).

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Thu, 29 Apr 2010 07:28:21 +0000 (09:28 +0200)]
block: fix bad use of min() on different types

Just cast the page size to sector_t, that will always fit.
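
A sketch of the problem and the fix, assuming sector_t is unsigned long long and the page size constant is unsigned long, which is typical for kernels of this era but not guaranteed. min_checked() imitates the kernel's type-checked min() using GNU C extensions (__typeof__ and statement expressions): comparing pointers to the two temporaries makes the compiler warn when the argument types differ, and the cast removes the mismatch. clamp_to_page() and PAGE_SIZE_MODEL are hypothetical.

```c
typedef unsigned long long sector_t;   /* assumed 64-bit sector type */
#define PAGE_SIZE_MODEL 4096UL         /* assumed page size, unsigned long */

/* Type-checked min() in the style of the kernel macro (GNU C). */
#define min_checked(x, y) ({                    \
	__typeof__(x) _min1 = (x);              \
	__typeof__(y) _min2 = (y);              \
	(void) (&_min1 == &_min2);              \
	_min1 < _min2 ? _min1 : _min2; })

/* Clamp a sector count to one page's worth of 512-byte sectors. Without
 * the cast, _min2 would be unsigned long and the pointer comparison in
 * min_checked() would draw a "distinct pointer types" warning. A page
 * size always fits in sector_t, so the cast is lossless. */
sector_t clamp_to_page(sector_t nr_sects)
{
	return min_checked(nr_sects, (sector_t)PAGE_SIZE_MODEL >> 9);
}
```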

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Linus Torvalds [Thu, 29 Apr 2010 03:40:17 +0000 (20:40 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86/PCI: compute Address Space length rather than using _LEN
  x86/PCI: never allocate PCI MMIO resources below BIOS_END

Al Viro [Thu, 29 Apr 2010 02:10:43 +0000 (03:10 +0100)]
nfs d_revalidate() is too trigger-happy with d_drop()

If a dentry found stale happens to be the root of a disconnected tree, we
can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
would simply hide it from shrink_dcache_for_umount(), leading to
all sorts of fun, including busy inodes on umount and oopsen after
that.

The bug has been there since at least 2006 (commit c636eb already has it),
so it's definitely -stable fodder.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 28 Apr 2010 20:37:31 +0000 (13:37 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/lrg/voltage-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
  regulator: fix enabling regulator issue on max8925

Linus Torvalds [Wed, 28 Apr 2010 20:37:06 +0000 (13:37 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  sfc: Change falcon_probe_board() to fail for unsupported boards
  sfc: Always close net device at the end of a disabling reset
  sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
  sctp: Fix oops when sending queued ASCONF chunks
  sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set
  sctp: per_cpu variables should be in bh_disabled section
  sctp: fix potential reference of a freed pointer
  sctp: avoid irq lock inversion while call sk->sk_data_ready()
  Revert "tcp: bind() fix when many ports are bound"
  net/usb: add sierra_net.c driver
  cdc_ether: fix autosuspend for mbm devices
  bluetooth: handle l2cap_create_connless_pdu() errors
  gianfar: Wait for both RX and TX to stop
  ipheth: potential null dereferences on error path
  smc91c92_cs: spin_unlock_irqrestore before calling smc_interrupt()
  drivers/usb/net/kaweth.c: add device "Allied Telesyn AT-USB10 USB Ethernet Adapter"
  bnx2: Update version to 2.0.9.
  bnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.
  bnx2: Fix lost MSI-X problem on 5709 NICs.
  cxgb3: Wait longer for control packets on initialization
  ...

Ben Hutchings [Wed, 28 Apr 2010 09:01:50 +0000 (09:01 +0000)]
sfc: Change falcon_probe_board() to fail for unsupported boards

The driver needs specific PHY and board support code for each SFC4000
board; there is no point trying to continue if it is missing.
Currently, unsupported boards can trigger an oops.
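
A minimal sketch of the fail-early idea, assuming a hypothetical board table. `board_ops`, `supported_boards` and `probe_board()` are illustrative names, not the sfc driver's code. The probe reports -ENODEV instead of letting the caller continue with a NULL pointer to board support code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-board support code (PHY setup, sensors, ...). */
struct board_ops {
	int type;
	const char *name;
};

static const struct board_ops supported_boards[] = {
	{ 1, "board-a" },
	{ 2, "board-b" },
};

/* Look up support code for @type.  Fail with -ENODEV rather than
 * continuing with *ops == NULL, which would oops on first use. */
int probe_board(int type, const struct board_ops **ops)
{
	size_t i;

	assert(ops != NULL);
	for (i = 0; i < sizeof(supported_boards) / sizeof(supported_boards[0]); i++) {
		if (supported_boards[i].type == type) {
			*ops = &supported_boards[i];
			return 0;
		}
	}
	*ops = NULL;
	return -ENODEV;
}
```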

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosfc: Always close net device at the end of a disabling reset
Ben Hutchings [Wed, 28 Apr 2010 09:01:33 +0000 (09:01 +0000)]
sfc: Always close net device at the end of a disabling reset

This fixes a regression introduced by commit
eb9f6744cbfa97674c13263802259b5aa0034594 "sfc: Implement ethtool
reset operation".

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosfc: Wait at most 10ms for the MC to finish reading out MAC statistics
Ben Hutchings [Wed, 28 Apr 2010 09:00:35 +0000 (09:00 +0000)]
sfc: Wait at most 10ms for the MC to finish reading out MAC statistics

The original code would wait indefinitely if MAC stats DMA failed.
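
A minimal sketch of a bounded wait, assuming a hypothetical completion check passed in as a callback. `wait_for_stats()` and `fake_stats_done()` are illustrative, and the 1 ms poll interval is an assumption. The loop gives up with -ETIMEDOUT after `max_ms` polls instead of spinning forever when the DMA never completes.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical check for "the MC has finished the MAC stats DMA". */
typedef bool (*stats_done_fn)(void *ctx);

/* Poll @done roughly every 1 ms for at most @max_ms milliseconds
 * instead of looping forever; -ETIMEDOUT if it never completes. */
int wait_for_stats(stats_done_fn done, void *ctx, unsigned int max_ms)
{
	const struct timespec one_ms = { 0, 1000000L };
	unsigned int i;

	assert(done != NULL);
	for (i = 0; i < max_ms; i++) {
		if (done(ctx))
			return 0;
		nanosleep(&one_ms, NULL);
	}
	return done(ctx) ? 0 : -ETIMEDOUT;
}

/* Example stub: reports completion once *ctx polls have gone by. */
static bool fake_stats_done(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

With the stub, `wait_for_stats()` returns 0 when completion is reported in time and -ETIMEDOUT otherwise, so a failed DMA costs at most about `max_ms` milliseconds.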

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosctp: Fix oops when sending queued ASCONF chunks
Vlad Yasevich [Wed, 28 Apr 2010 08:47:22 +0000 (08:47 +0000)]
sctp: Fix oops when sending queued ASCONF chunks

When we finish processing an ASCONF_ACK chunk, we try to send
the next queued ASCONF.  This runs the sctp state machine
recursively, and the state machine is not prepared for that.

kernel BUG at kernel/timer.c:790!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/ipv6/initstate
Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0
EIP is at add_timer+0xd/0x1b
EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
Stack:
 c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
<0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14 00000004
<0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14 000000d0
Call Trace:
 [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp]
 [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp]
 [<d1863386>] ? sctp_pname+0x0/0x1d [sctp]
 [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
 [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
 [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
 [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp]
 [<d1863334>] ? sctp_cname+0x0/0x52 [sctp]
 [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
 [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp]
 [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp]
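
The message above does not show the fix itself.  As an illustration only, one general way to avoid re-entering a state machine is to record follow-up work and run it after the current invocation has returned.  Every name below (`struct assoc`, `do_sm()`, `process_event()`, the `EV_*` events) is hypothetical; this is not the sctp code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical association state, only what the example needs. */
struct assoc {
	int depth;          /* current nesting of do_sm() */
	int max_depth;      /* deepest nesting observed */
	int queued_asconf;  /* ASCONF chunks waiting to be sent */
	int sent_asconf;    /* ASCONF chunks sent so far */
	bool send_next;     /* deferred "send next ASCONF" request */
};

enum { EV_ASCONF_ACK, EV_SEND_ASCONF };

/* Run one event through a toy state machine.  Handling an ASCONF_ACK
 * only records that the next queued ASCONF should go out; it does not
 * call do_sm() again from inside itself. */
static void do_sm(struct assoc *a, int event)
{
	if (++a->depth > a->max_depth)
		a->max_depth = a->depth;

	switch (event) {
	case EV_ASCONF_ACK:
		if (a->queued_asconf > 0)
			a->send_next = true;
		break;
	case EV_SEND_ASCONF:
		a->queued_asconf--;
		a->sent_asconf++;
		break;
	}
	a->depth--;
}

/* Top-level entry: process the event, then run deferred work only
 * after the previous do_sm() invocation has fully returned. */
void process_event(struct assoc *a, int event)
{
	assert(a->depth == 0);  /* never entered recursively */
	do_sm(a, event);
	while (a->send_next) {
		a->send_next = false;
		do_sm(a, EV_SEND_ASCONF);
	}
}
```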

Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie>
Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosctp: fix to calc the INIT/INIT-ACK chunk length correctly is set
Wei Yongjun [Wed, 28 Apr 2010 08:47:21 +0000 (08:47 +0000)]
sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set

When calculating the INIT/INIT-ACK chunk length, we should account
not only for the length of the parameters but also for their zero
padding, as needed by e.g. the AUTH HMACS and CHUNKS parameters.
Without the padding length we may get the following oops:

skb_over_panic: text:ce2068d2 len:130 put:6 head:cac3fe00 data:cac3fe00 tail:0xcac3fe82 end:0xcac3fe80 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:127!
invalid opcode: 0000 [#2] SMP
last sysfs file: /sys/module/aes_generic/initstate
Modules linked in: authenc ......

Pid: 4102, comm: sctp_darn Tainted: G      D    2.6.34-rc2 #6
EIP: 0060:[<c0607630>] EFLAGS: 00010282 CPU: 0
EIP is at skb_over_panic+0x37/0x3e
EAX: 00000078 EBX: c07c024b ECX: c07c02b9 EDX: cb607b78
ESI: 00000000 EDI: cac3fe7a EBP: 00000002 ESP: cb607b74
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process sctp_darn (pid: 4102, ti=cb607000 task=cabdc990 task.ti=cb607000)
Stack:
 c07c02b9 ce2068d2 00000082 00000006 cac3fe00 cac3fe00 cac3fe82 cac3fe80
<0> c07c024b cac3fe7c cac3fe7a c0608dec ca986e80 ce2068d2 00000006 0000007a
<0> cb8120ca ca986e80 cb812000 00000003 cb8120c4 ce208a25 cb8120ca cadd9400
Call Trace:
 [<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp]
 [<c0608dec>] ? skb_put+0x2e/0x32
 [<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp]
 [<ce208a25>] ? sctp_make_init+0x279/0x28c [sctp]
 [<c0686a92>] ? apic_timer_interrupt+0x2a/0x30
 [<ce1fdc0b>] ? sctp_sf_do_prm_asoc+0x2b/0x7b [sctp]
 [<ce202823>] ? sctp_do_sm+0xa0/0x14a [sctp]
 [<ce2133b9>] ? sctp_pname+0x0/0x14 [sctp]
 [<ce211d72>] ? sctp_primitive_ASSOCIATE+0x2b/0x31 [sctp]
 [<ce20f3cf>] ? sctp_sendmsg+0x7a0/0x9eb [sctp]
 [<c064eb1e>] ? inet_sendmsg+0x3b/0x43
 [<c04244b7>] ? task_tick_fair+0x2d/0xd9
 [<c06031e1>] ? sock_sendmsg+0xa7/0xc1
 [<c0416afe>] ? smp_apic_timer_interrupt+0x6b/0x75
 [<c0425123>] ? dequeue_task_fair+0x34/0x19b
 [<c0446abb>] ? sched_clock_local+0x17/0x11e
 [<c052ea87>] ? _copy_from_user+0x2b/0x10c
 [<c060ab3a>] ? verify_iovec+0x3c/0x6a
 [<c06035ca>] ? sys_sendmsg+0x186/0x1e2
 [<c042176b>] ? __wake_up_common+0x34/0x5b
 [<c04240c2>] ? __wake_up+0x2c/0x3b
 [<c057e35c>] ? tty_wakeup+0x43/0x47
 [<c04430f2>] ? remove_wait_queue+0x16/0x24
 [<c0580c94>] ? n_tty_read+0x5b8/0x65e
 [<c042be02>] ? default_wake_function+0x0/0x8
 [<c0604e0e>] ? sys_socketcall+0x17f/0x1cd
 [<c040264c>] ? sysenter_do_call+0x12/0x22
Code: 0f 45 de 53 ff b0 98 00 00 00 ff b0 94 ......
EIP: [<c0607630>] skb_over_panic+0x37/0x3e SS:ESP 0068:cb607b74

To reproduce:

# modprobe sctp
# echo 1 > /proc/sys/net/sctp/addip_enable
# echo 1 > /proc/sys/net/sctp/auth_enable
# sctp_test -H 3ffe:501:ffff:100:20c:29ff:fe4d:f37e -P 800 -l
# sctp_darn -H 3ffe:501:ffff:100:20c:29ff:fe4d:f37e -P 900 -h 192.168.0.21 -p 800 -I -s -t
sctp_darn ready to send...
3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.0.21
3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.1.21
3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> snd=10

------------------------------------------------------------------
eth0 has addresses: 3ffe:501:ffff:100:20c:29ff:fe4d:f37e and 192.168.0.21
eth1 has addresses: 192.168.1.21
------------------------------------------------------------------
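
A small sketch of the length arithmetic, assuming the 4-byte parameter alignment rule of RFC 4960 (section 3.2.1).  `pad4()` and `params_space()` are illustrative helpers, not the kernel's own macros.  For example, an AUTH HMACS parameter carrying one HMAC identifier (4-byte header plus 2 bytes) and a CHUNKS parameter listing two chunk types (4 plus 2) sum to 12 bytes unpadded, but the chunk actually needs 16.

```c
#include <assert.h>
#include <stddef.h>

/* Round @len up to the next multiple of 4 bytes. */
static size_t pad4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Space needed for @n parameters with unpadded lengths @param_len[].
 * @with_padding == 0 reproduces the undercount that led to the oops. */
size_t params_space(const size_t *param_len, size_t n, int with_padding)
{
	size_t i, total = 0;

	assert(param_len != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += with_padding ? pad4(param_len[i]) : param_len[i];
	return total;
}
```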

Reported-by: George Cheimonidis <gchimon@gmail.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosctp: per_cpu variables should be in bh_disabled section
Vlad Yasevich [Wed, 28 Apr 2010 08:47:20 +0000 (08:47 +0000)]
sctp: per_cpu variables should be in bh_disabled section

Since the atomics were changed to percpu variables, we now have to
disable BH in process context when touching those percpu variables.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosctp: fix potential reference of a freed pointer
Vlad Yasevich [Wed, 28 Apr 2010 08:47:19 +0000 (08:47 +0000)]
sctp: fix potential reference of a freed pointer

When sctp attempts to update an association, it removes any
addresses that were not in the updated INITs.  However, the loop
may attempt to reference a transport's address after removing it.
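
In kernel code the usual tool for this is `list_for_each_entry_safe()`, which caches the next entry before the loop body runs.  A userspace sketch of the same idea, using a hypothetical singly linked `struct transport` and `prune_transports()` (neither is the sctp code):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical transport list node keyed by a peer address. */
struct transport {
	int addr;
	struct transport *next;
};

/* Remove every transport whose address is not in @keep[].  The next
 * pointer is read before a node is freed, so the loop never touches
 * memory it has just released. */
void prune_transports(struct transport **head, const int *keep, size_t nkeep)
{
	struct transport **link = head;
	struct transport *t, *next;

	assert(head != NULL);
	for (t = *head; t != NULL; t = next) {
		size_t i;
		int found = 0;

		next = t->next;              /* saved before any free() */
		for (i = 0; i < nkeep; i++)
			if (keep[i] == t->addr)
				found = 1;
		if (found) {
			link = &t->next;
		} else {
			*link = next;        /* unlink, then free */
			free(t);
		}
	}
}
```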

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosctp: avoid irq lock inversion while call sk->sk_data_ready()
Wei Yongjun [Wed, 28 Apr 2010 08:47:18 +0000 (08:47 +0000)]
sctp: avoid irq lock inversion while call sk->sk_data_ready()

The sk->sk_data_ready() callback of an sctp socket can be called from
both BH and non-BH contexts, but the default sk->sk_data_ready(),
sock_def_readable(), cannot be used in this case.  Therefore, we add a
new function, sctp_data_ready(), that does the sk->sk_data_ready() work
with BH disabled.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.33-rc6 #129
---------------------------------------------------------
sctp_darn/1517 just changed the state of lock:
 (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80
but this lock took another, SOFTIRQ-unsafe lock in the past:
 (slock-AF_INET){+.-...}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
1 lock held by sctp_darn/1517:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp]

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoRevert "tcp: bind() fix when many ports are bound"
David S. Miller [Wed, 28 Apr 2010 18:25:59 +0000 (11:25 -0700)]
Revert "tcp: bind() fix when many ports are bound"

This reverts two commits:

fda48a0d7a8412cedacda46a9c0bf8ef9cd13559
tcp: bind() fix when many ports are bound

and a follow-on fix for it:

6443bb1fc2050ca2b6585a3fa77f7833b55329ed
ipv6: Fix inet6_csk_bind_conflict()

It causes problems with binding listening sockets when time-wait
sockets from a previous instance are still alive.

It's too late in the -rc series to keep fiddling with this, and
we'll deal with it in net-next-2.6 instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocfq-iosched: fix broken cfq_ref_get_cfqf() for CONFIG_BLK_CGROUP=y && CFQ_GROUP_IOSCHED=n
Dmitry Monakhov [Wed, 28 Apr 2010 17:50:33 +0000 (19:50 +0200)]
cfq-iosched: fix broken cfq_ref_get_cfqf() for CONFIG_BLK_CGROUP=y && CFQ_GROUP_IOSCHED=n

We should return the cfq_group for this case, not NULL.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
14 years agoblkdev: add blkdev_issue_zeroout helper function
Dmitry Monakhov [Wed, 28 Apr 2010 13:55:09 +0000 (17:55 +0400)]
blkdev: add blkdev_issue_zeroout helper function

- Add the bio_batch helper primitive.  This is a rather generic
  primitive for submitting, and waiting on, a complex request that
  consists of several bios.
- blkdev_issue_zeroout() generates a number of zero-filled write bios.
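
A single-threaded sketch of the bookkeeping such a batch helper needs: count the bios issued for one logical request, count completions, and remember the first error.  In the kernel the completions arrive asynchronously and the submitter sleeps until they are all in; here the callback is invoked directly.  `struct bio_batch_sketch` and its functions are hypothetical, not the block layer's code.

```c
#include <assert.h>

/* Hypothetical batch state for one request split into several bios. */
struct bio_batch_sketch {
	int issued;     /* bios submitted so far */
	int done;       /* bios completed so far */
	int error;      /* first error seen, 0 if none */
};

void batch_submit(struct bio_batch_sketch *bb)
{
	bb->issued++;
}

/* Completion callback: called once per bio with its status. */
void batch_end_io(struct bio_batch_sketch *bb, int err)
{
	assert(bb->done < bb->issued);
	if (err && !bb->error)
		bb->error = err;
	bb->done++;
}

/* A waiter would sleep until this is true, then report bb->error. */
int batch_finished(const struct bio_batch_sketch *bb)
{
	return bb->done == bb->issued;
}
```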

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
14 years agoblkdev: move blkdev_issue helper functions to separate file
Dmitry Monakhov [Wed, 28 Apr 2010 13:55:08 +0000 (17:55 +0400)]
blkdev: move blkdev_issue helper functions to separate file

Move blkdev_issue_discard() out of blk-barrier.c because it is
not barrier-related.
The new file will later be populated with other helpers.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
14 years agoblkdev: allow async blkdev_issue_flush requests
Dmitry Monakhov [Wed, 28 Apr 2010 13:55:07 +0000 (17:55 +0400)]
blkdev: allow async blkdev_issue_flush requests

In some places the caller does not want to wait for the request to complete.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
14 years agoblkdev: generalize flags for blkdev_issue_fn functions
Dmitry Monakhov [Wed, 28 Apr 2010 13:55:06 +0000 (17:55 +0400)]
blkdev: generalize flags for blkdev_issue_fn functions

This patch just converts all blkdev_issue_xxx functions to a common
set of flags.  Wait/allocation semantics are preserved.
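
A minimal sketch of what a shared flag word buys: every helper decodes the same bits, and "wait for completion" becomes an ordinary flag, so an asynchronous caller simply leaves it clear.  The `ISSUE_*` names and `plan_issue()` are hypothetical and do not reproduce the block layer's actual flag names or values.

```c
#include <assert.h>

/* Hypothetical flags shared by every blkdev_issue_*-style helper. */
enum issue_flags {
	ISSUE_WAIT    = 1 << 0,  /* block until the request completes */
	ISSUE_BARRIER = 1 << 1,  /* issue as an ordered (barrier) request */
};

/* What a helper would do for a given flag word. */
struct issue_plan {
	int wait;
	int barrier;
};

struct issue_plan plan_issue(unsigned int flags)
{
	struct issue_plan p;

	assert((flags & ~(unsigned int)(ISSUE_WAIT | ISSUE_BARRIER)) == 0);
	p.wait = (flags & ISSUE_WAIT) != 0;
	p.barrier = (flags & ISSUE_BARRIER) != 0;
	return p;
}
```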

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
14 years agox86/PCI: compute Address Space length rather than using _LEN
Bjorn Helgaas [Tue, 27 Apr 2010 20:45:43 +0000 (14:45 -0600)]
x86/PCI: compute Address Space length rather than using _LEN

ACPI _CRS Address Space Descriptors have _MIN, _MAX, and _LEN.  Linux has
been computing Address Spaces as [_MIN to _MIN + _LEN - 1].  Based on the
tests in the bug reports below, Windows apparently uses [_MIN to _MAX].

Per spec (ACPI 4.0, Table 6-40), for _CRS fixed-size, fixed location
descriptors, "_LEN must be (_MAX - _MIN + 1)", and when that's true, it
doesn't matter which way we compute the end.  But of course, there are
BIOSes that don't follow this rule, and we're better off if Linux handles
those exceptions the same way as Windows.

This patch makes Linux use [_MIN to _MAX], as Windows seems to do.  This
effectively reverts d558b483d5 and 03db42adfe and replaces them with
simpler code.
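
A small sketch contrasting the two end-address computations.  The helper names are illustrative, not the kernel's.  When _LEN == _MAX - _MIN + 1 both agree.  With a BIOS that reports _MIN = 0xA0000 and _MAX = 0xBFFFF but a too-small _LEN = 0x1F000 (values invented for illustration), the old computation would end the window at 0xBEFFF rather than 0xBFFFF.

```c
#include <assert.h>
#include <stdint.h>

/* Old Linux behaviour: end = _MIN + _LEN - 1. */
uint64_t end_from_len(uint64_t min, uint64_t len)
{
	assert(len > 0);
	return min + len - 1;
}

/* What Windows appears to use: end = _MAX. */
uint64_t end_from_max(uint64_t min, uint64_t max)
{
	assert(max >= min);
	return max;
}

/* ACPI 4.0, Table 6-40: for fixed-size, fixed-location descriptors,
 * _LEN must equal _MAX - _MIN + 1. */
int len_is_consistent(uint64_t min, uint64_t max, uint64_t len)
{
	return len == max - min + 1;
}
```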

    https://bugzilla.kernel.org/show_bug.cgi?id=14337 (round)
    https://bugzilla.kernel.org/show_bug.cgi?id=15480 (truncate)

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
14 years agoMerge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6
Linus Torvalds [Wed, 28 Apr 2010 14:58:36 +0000 (07:58 -0700)]
Merge branch 'urgent' of git://git./linux/kernel/git/brodo/pcmcia-2.6

* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
  pcmcia: fix matching rules for pseudo-multi-function cards
  pcmcia: pcmcia_dev_present bugfix

14 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Wed, 28 Apr 2010 14:56:05 +0000 (07:56 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  coda: move backing-dev.h kernel include inside __KERNEL__
  mtd: ensure that bdi entries are properly initialized and registered
  Move mtd_bdi_*mappable to mtdcore.c
  btrfs: convert to using bdi_setup_and_register()
  Catch filesystems lacking s_bdi
  drbd: Terminate a connection early if sending the protocol fails
  drbd: fix memory leak
  Fix JFFS2 sync silent failure
  smbfs: add bdi backing to mount session
  ncpfs: add bdi backing to mount session
  exofs: add bdi backing to mount session
  ecryptfs: add bdi backing to mount session
  coda: add bdi backing to mount session
  cifs: add bdi backing to mount session
  afs: add bdi backing to mount session.
  9p: add bdi backing to mount session
  bdi: add helper function for doing init and register of a bdi for a file system
  block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer

14 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
Linus Torvalds [Wed, 28 Apr 2010 14:55:35 +0000 (07:55 -0700)]
Merge git://git./linux/kernel/git/wim/linux-2.6-watchdog

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  watchdog: booke_wdt: fix build - unconstify watchdog_info
  watchdog: sbc_fitpc2_wdt: fixed "scheduling while atomic" bug.
  watchdog: sbc_fitpc2_wdt: fixed I/O operations order
  Watchdog: sb_wdog.c: Fix sibyte watchdog initialization

14 years agoregulator: fix enabling regulator issue on max8925
Haojian Zhuang [Tue, 6 Apr 2010 10:19:15 +0000 (06:19 -0400)]
regulator: fix enabling regulator issue on max8925

Fix a regulator enabling issue caused by a typo in is_enabled().

Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
14 years agocoda: move backing-dev.h kernel include inside __KERNEL__
Jens Axboe [Wed, 28 Apr 2010 07:20:33 +0000 (09:20 +0200)]
coda: move backing-dev.h kernel include inside __KERNEL__

Otherwise we must export backing-dev.h as well, which doesn't make
any sense.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
14 years agonet/usb: add sierra_net.c driver
Elina Pasheva [Wed, 28 Apr 2010 01:06:41 +0000 (18:06 -0700)]
net/usb: add sierra_net.c driver

Re-submitted based on comments from the netdev community.
Summary of the changes:
1. Improved error handling.
2. Added the missing timeout arguments to usb_control_msg().

The following is a new Linux driver which exposes certain models of Sierra
Wireless modems to the operating system as Network Interface Cards (NICs).

To work properly, this driver requires a version of the sierra.c driver
that supports blacklisting.  The blacklist in sierra.c rejects the interfaces
claimed by sierra_net.c.  Likewise, the sierra_net.c driver only accepts
(i.e. whitelists) the interface(s) used for USB-to-WWAN traffic.
For older kernels, the version of sierra.c that supports blacklisting is
available from the Sierra Wireless knowledge base page.  It is also
included in the Linux kernel starting with version 2.6.31.

This driver works with all Sierra Wireless devices configured with PID=68A3,
such as the USB305 and USB306, provided the firmware version is I2.0
(for USB305) or M3.0 (for USB306) or later.
This driver will not work with firmware versions earlier than those shown
above.  In that case the driver issues an error message indicating the
incompatibility and does not serve the device's USB-to-WWAN interface.

Sierra_net.c sits atop a pre-existing Linux driver called usbnet.c.
A series of hook functions are provided in sierra_net.c which are called by
usbnet.c in response to a particular condition such as receipt or transmission
of a data packet. As such, usbnet.c does most of the work of making
a modem appear to the system as a network device and for properly exchanging
traffic between the USB subsystem and the Network card interface.
Sierra_net.c is concerned with managing the data exchanged between the
USB-to-WWAN interface and the upper layers of the operating system.
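
A minimal sketch of the hook-table pattern described above, in which a generic layer owns the plumbing and calls into a minidriver only to adjust packets.  In usbnet this role is played by `struct driver_info`, whose callbacks include `rx_fixup` and `tx_fixup`; the `mini_hooks` table, `generic_rx()` and the 2-byte framing below are hypothetical simplifications, not sierra_net.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table filled in by the minidriver. */
struct mini_hooks {
	/* Strip device framing from a received buffer; return payload length. */
	size_t (*rx_fixup)(unsigned char *buf, size_t len);
};

/* Generic layer: deliver one received buffer, letting the driver fix it up. */
size_t generic_rx(const struct mini_hooks *hooks, unsigned char *buf, size_t len)
{
	assert(hooks != NULL);
	if (hooks->rx_fixup)
		return hooks->rx_fixup(buf, len);
	return len;                     /* no hook: pass through unchanged */
}

/* Example minidriver hook: drop an invented 2-byte header. */
static size_t demo_rx_fixup(unsigned char *buf, size_t len)
{
	if (len < 2)
		return 0;
	memmove(buf, buf + 2, len - 2);
	return len - 2;
}

const struct mini_hooks demo_hooks = { demo_rx_fixup };
```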

Signed-off-by: Elina Pasheva <epasheva@sierrawireless.com>
Signed-off-by: Rory Filer <rfiler@sierrawireless.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocdc_ether: fix autosuspend for mbm devices
Torgny Johansson [Wed, 28 Apr 2010 00:07:40 +0000 (17:07 -0700)]
cdc_ether: fix autosuspend for mbm devices

Autosuspend works until you bring the wwan interface up; after that,
the device no longer enters autosuspend.

The following patch fixes the problem by setting the .manage_power
field in the mbm_info struct to the same value as in the cdc_info struct
(cdc_manage_power).

Signed-off-by: Torgny Johansson <torgny.johansson@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobluetooth: handle l2cap_create_connless_pdu() errors
Dan Carpenter [Wed, 21 Apr 2010 23:52:01 +0000 (23:52 +0000)]
bluetooth: handle l2cap_create_connless_pdu() errors

l2cap_create_connless_pdu() can sometimes return ERR_PTR(-ENOMEM) or
ERR_PTR(-EFAULT).
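
The kernel encodes such errors with `ERR_PTR()`, and callers test them with `IS_ERR()` and `PTR_ERR()` from `<linux/err.h>`; a plain NULL check misses them.  A userspace sketch that re-creates those helpers (assuming the kernel's MAX_ERRNO of 4095) together with a hypothetical `create_pdu()` and `send_pdu()` shows the check a caller must make:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creations of the <linux/err.h> helpers: small negative
 * errno values are encoded in the top MAX_ERRNO pointer values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical builder: returns a buffer or an encoded error. */
void *create_pdu(size_t len)
{
	void *buf;

	if (len == 0)
		return ERR_PTR(-EFAULT);
	buf = malloc(len);
	if (!buf)
		return ERR_PTR(-ENOMEM);
	return buf;
}

/* Caller: must test IS_ERR() before using the pointer. */
int send_pdu(size_t len)
{
	void *pdu = create_pdu(len);

	if (IS_ERR(pdu))
		return (int)PTR_ERR(pdu);
	/* ... queue the PDU for transmission ... */
	free(pdu);
	return 0;
}
```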

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agogianfar: Wait for both RX and TX to stop
Andy Fleming [Tue, 27 Apr 2010 23:43:31 +0000 (16:43 -0700)]
gianfar: Wait for both RX and TX to stop

When gracefully stopping the controller, the driver was continuing if
*either* RX or TX had stopped.  We need to wait for both, or the
controller could get into an invalid state.
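
The difference comes down to a bit-mask test.  A minimal sketch, with hypothetical event-bit names standing in for the controller's "graceful RX stop" and "graceful TX stop" status bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event bits for "RX stopped" and "TX stopped". */
#define EV_RX_STOPPED 0x1u
#define EV_TX_STOPPED 0x2u
#define EV_BOTH       (EV_RX_STOPPED | EV_TX_STOPPED)

/* Buggy test: true as soon as either bit is set. */
int stopped_either(uint32_t events)
{
	return (events & EV_BOTH) != 0;
}

/* Fixed test: true only once both RX and TX have stopped. */
int stopped_both(uint32_t events)
{
	return (events & EV_BOTH) == EV_BOTH;
}
```

A wait loop would keep polling while `!stopped_both(events)`, rather than stopping at the first bit seen.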

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Tue, 27 Apr 2010 23:26:46 +0000 (16:26 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  keys: don't need to use RCU in keyring_read() as semaphore is held

14 years agoMerge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Tue, 27 Apr 2010 23:26:21 +0000 (16:26 -0700)]
Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux:
  nfsd4: bug in read_buf

14 years agokeys: the request_key() syscall should link an existing key to the dest keyring
David Howells [Tue, 27 Apr 2010 20:13:08 +0000 (13:13 -0700)]
keys: the request_key() syscall should link an existing key to the dest keyring

The request_key() system call and request_key_and_link() should make a
link from an existing key to the destination keyring (if supplied), not
just from a new key to the destination keyring.

This can be tested by:

ring=`keyctl newring fred @s`
keyctl request2 user debug:a a
keyctl request user debug:a $ring
keyctl list $ring

If it says:

keyring is empty

then it didn't work.  If it shows something like:

1 key in keyring:
1070462727: --alswrv     0     0 user: debug:a

then it did.

The request_key() system call is meant to recursively search all your keyrings for
the key you desire, and, optionally, if it doesn't exist, call out to userspace
to create one for you.

If request_key() finds or creates a key, it should, optionally, create a link
to that key from the destination keyring specified.

Therefore, if, after a successful call to request_key() with a destination
keyring specified, you see the destination keyring empty, the code didn't work
correctly.

If you see the found key in the keyring, then it did, which is what this patch
is required for.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agogpio: fix pca953x set_type 'scheduling while atomic' bug
Marc Zyngier [Tue, 27 Apr 2010 20:13:07 +0000 (13:13 -0700)]
gpio: fix pca953x set_type 'scheduling while atomic' bug

Bill Gatliff reported the following bug when using the irq_chip facility
of the pca953x driver on a PPC platform:

BUG: scheduling while atomic: insmod/1530/0x00000002

He traced it back to an i2c transaction in pca953x_irq_set_type(), which
can be called with interrupts disabled (from __setup_irq()).  As the i2c
controller can sleep while sending a message, this qualifies as a bad
idea.

This patch moves the i2c transaction to pca953x_irq_bus_sync_unlock(),
where it is actually safe to send an i2c message.
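
A minimal sketch of the split, assuming a hypothetical cached register: the atomic-context callback only updates the cache, and the later sync step, where sleeping is allowed, performs the (simulated) i2c write.  `struct expander` and both functions are illustrative, not the pca953x code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical chip state: cached register value vs. value on the chip. */
struct expander {
	uint16_t cached_mask;   /* updated with interrupts off, no I/O */
	uint16_t hw_mask;       /* what the chip has actually been told */
	int i2c_writes;         /* number of (sleeping) bus transactions */
};

/* May be called with interrupts disabled: must not sleep, so it only
 * records the requested change. */
void expander_set_type(struct expander *e, unsigned int gpio, int edge)
{
	assert(gpio < 16);
	if (edge)
		e->cached_mask |= (uint16_t)(1u << gpio);
	else
		e->cached_mask &= (uint16_t)~(1u << gpio);
}

/* Runs at the bus_sync_unlock stage, where sleeping is allowed: push
 * the cached state to the chip in one transaction if it changed. */
void expander_sync_unlock(struct expander *e)
{
	if (e->hw_mask != e->cached_mask) {
		e->hw_mask = e->cached_mask;   /* stands in for an i2c write */
		e->i2c_writes++;
	}
}
```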

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Marc Zyngier <maz@misterjones.org>
Reported-by: Bill Gatliff <bgat@billgatliff.com>
Cc: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoprocfs: fix tid fdinfo
Jerome Marchand [Tue, 27 Apr 2010 20:13:06 +0000 (13:13 -0700)]
procfs: fix tid fdinfo

Correct the file_operations struct in fdinfo entry of tid_base_stuff[].

Currently, /proc/*/task/*/fdinfo contains symlinks to the opened files,
just like /proc/*/fd/.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoarch/avr32: fix build failure caused by wrong prototype
Peter Huewe [Tue, 27 Apr 2010 20:13:04 +0000 (13:13 -0700)]
arch/avr32: fix build failure caused by wrong prototype

This patch fixes a build failure introduced by 1d8393171 ("avr32: use
generic ptrace_resume code"), which left a stray static keyword behind.

  arch/avr32/kernel/ptrace.c:32: error: static declaration of `user_enable_single_step' follows non-static declaration
  include/linux/ptrace.h:268: error: previous declaration of `user_enable_single_step' was here

References:
[1] http://kisskb.ellerman.id.au/kisskb/buildresult/2448162/

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agokeys: don't need to use RCU in keyring_read() as semaphore is held
David Howells [Tue, 27 Apr 2010 21:05:11 +0000 (14:05 -0700)]
keys: don't need to use RCU in keyring_read() as semaphore is held

keyring_read() doesn't need to use rcu_dereference() to access the keyring
payload as the caller holds the key semaphore to prevent modifications
from happening whilst the data is read out.

This should solve the following warning:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/keyring.c:204 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl/2144:
 #0:  (&key->sem){+++++.}, at: [<ffffffff81177f7c>] keyctl_read_key+0x9c/0xcf

stack backtrace:
Pid: 2144, comm: keyctl Not tainted 2.6.34-rc2-cachefs #113
Call Trace:
 [<ffffffff8105121f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffff811762d5>] keyring_read+0x4d/0xe7
 [<ffffffff81177f8c>] keyctl_read_key+0xac/0xcf
 [<ffffffff811788d4>] sys_keyctl+0x75/0xb9
 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
14 years agoipheth: potential null dereferences on error path
Dan Carpenter [Mon, 26 Apr 2010 23:20:12 +0000 (23:20 +0000)]
ipheth: potential null dereferences on error path

The calls to usb_free_buffer() dereference rx_urb and tx_urb in the
parameter list, but those could be NULL.
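
A minimal sketch of the guarded error path, assuming a hypothetical URB stand-in: since either allocation may have failed, each pointer is checked before its fields are read to build the free call's arguments.  `struct fake_urb`, `free_buffer()` and `free_urbs()` are illustrative, not the ipheth code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical URB stand-in: only the field the cleanup path reads. */
struct fake_urb {
	void *transfer_buffer;
};

static int frees;   /* counts buffers released, for the example */

static void free_buffer(void *buf)
{
	free(buf);
	frees++;
}

/* Error-path cleanup: either URB may be NULL if allocation failed
 * part-way, so check before reading urb->transfer_buffer. */
void free_urbs(struct fake_urb *rx, struct fake_urb *tx)
{
	if (rx) {
		free_buffer(rx->transfer_buffer);
		free(rx);
	}
	if (tx) {
		free_buffer(tx->transfer_buffer);
		free(tx);
	}
}
```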

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: L. Alberto Giménez <agimenez@sysvalve.es>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosmc91c92_cs: spin_unlock_irqrestore before calling smc_interrupt()
Ken Kawasaki [Sat, 24 Apr 2010 10:37:09 +0000 (10:37 +0000)]
smc91c92_cs: spin_unlock_irqrestore before calling smc_interrupt()

smc91c92_cs:
  * call spin_unlock_irqrestore() before calling smc_interrupt() in
    media_check() to avoid a lockup.
  * use spin_lock_irqsave() for the ethtool functions.

Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers/usb/net/kaweth.c: add device "Allied Telesyn AT-USB10 USB Ethernet Adapter"
Andreas Hartmann [Tue, 27 Apr 2010 21:39:33 +0000 (14:39 -0700)]
drivers/usb/net/kaweth.c: add device "Allied Telesyn AT-USB10 USB Ethernet Adapter"

akpm: reluctantly typed in from
https://bugzilla.kernel.org/show_bug.cgi?id=15599

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobnx2: Update version to 2.0.9.
Michael Chan [Tue, 27 Apr 2010 11:28:11 +0000 (11:28 +0000)]
bnx2: Update version to 2.0.9.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.
Michael Chan [Tue, 27 Apr 2010 11:28:10 +0000 (11:28 +0000)]
bnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.

The bonding driver calls ndo_vlan_rx_register() while holding bond->lock.
The bnx2 driver calls bnx2_netif_stop() to stop the rx handling while
changing the vlgrp.  The call also stops the cnic driver, which sleeps
while bond->lock is held and causes the warning.

This code path only needs to stop the NAPI rx handling while we are
changing the vlgrp.  Since no reset is going to occur, there is no need
to stop cnic in this case.  By adding a parameter to bnx2_netif_stop()
to skip stopping cnic, we can avoid the warning.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobnx2: Fix lost MSI-X problem on 5709 NICs.
Michael Chan [Tue, 27 Apr 2010 11:28:09 +0000 (11:28 +0000)]
bnx2: Fix lost MSI-X problem on 5709 NICs.

It has been reported that under certain heavy traffic conditions in MSI-X
mode, the driver can lose an MSI-X vector, causing all packets in the
associated rx/tx ring pair to be dropped.  The problem is caused by
the chip dropping the kernel's write that unmasks the MSI-X vector
(when migrating the IRQ, for example).

This can be prevented by increasing the GRC timeout value for these
register read and write operations.

Thanks to Dell for helping us debug this problem.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocxgb3: Wait longer for control packets on initialization
Andre Detsch [Mon, 26 Apr 2010 05:38:27 +0000 (05:38 +0000)]
cxgb3: Wait longer for control packets on initialization

On some Power7 platforms, when using VIOS (Virtual I/O Server), we
need to wait longer for control packets to finish transferring during
initialization.
Without this change, initialization may fail prematurely.

Signed-off-by: Wen Xiong <wenxiong@us.ibm.com>
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata
Bruce Allan [Tue, 27 Apr 2010 03:33:04 +0000 (03:33 +0000)]
e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata

Prompted by a previous patch submitted by Matthew Garrett <mjg@redhat.com>,
further digging into errata documentation reveals that the current enabling
or disabling of ASPM L0s and L1 states for certain parts supported by this
driver is incorrect.  82571 and 82572 should always disable L1.  For
standard frames, 82573/82574/82583 can enable L1 but L0s must be disabled,
and for jumbo frames 82573/82574 must disable L1.  This allows some
parts to enable L1 in certain configurations, leading to better power
savings.

Also according to the same errata, Early Receive (ERT) should be disabled
on 82573 when using jumbo frames.
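
A sketch that encodes only the errata rules stated above as a lookup, assuming hypothetical enum and struct names (this is not the e1000e code).  Combinations the text does not mention, such as L0s with jumbo frames, are left unconstrained (false) here.

```c
#include <assert.h>
#include <stdbool.h>

enum mac_part { PART_82571, PART_82572, PART_82573, PART_82574, PART_82583 };

/* Constraints from the errata summary above. */
struct link_constraints {
	bool disable_l0s;
	bool disable_l1;
	bool disable_ert;   /* Early Receive */
};

struct link_constraints errata_constraints(enum mac_part part, bool jumbo)
{
	struct link_constraints c = { false, false, false };

	switch (part) {
	case PART_82571:
	case PART_82572:
		c.disable_l1 = true;              /* always */
		break;
	case PART_82573:
		if (jumbo)
			c.disable_ert = true;     /* ERT off with jumbo frames */
		/* fall through */
	case PART_82574:
		if (jumbo)
			c.disable_l1 = true;      /* jumbo frames: no L1 */
		/* fall through */
	case PART_82583:
		if (!jumbo)
			c.disable_l0s = true;     /* standard frames: L1 ok, no L0s */
		break;
	}
	return c;
}
```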

Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: Power down PHY during driver resets
Peter Waskiewicz [Tue, 27 Apr 2010 00:38:15 +0000 (00:38 +0000)]
ixgbe: Power down PHY during driver resets

The PHY laser is still on during driver init.  This allows
garbage to hit our FIFO, which can eventually cause the entire
device to die.  Power down the laser while setting up the device,
and re-enable the laser before getting link.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoRemove redundant check for CONFIG_MMU
Christoph Egger [Mon, 26 Apr 2010 14:56:36 +0000 (15:56 +0100)]
Remove redundant check for CONFIG_MMU

The checks for CONFIG_MMU at this location are redundant, as all the code is
located inside an #ifndef CONFIG_MMU block.  So the first conditional block
will always be included, while the second never will be.
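
A hypothetical function illustrating the redundancy (this is not the patched file).  With CONFIG_MMU undefined, as on a no-MMU build, everything inside the outer block is compiled only when CONFIG_MMU is absent, so the nested `#ifndef` branch is always taken and the `#else` branch is dead code.

```c
#include <assert.h>

/* CONFIG_MMU is deliberately left undefined, as on a no-MMU build. */
#ifndef CONFIG_MMU
int nommu_branch_taken(void)
{
#ifndef CONFIG_MMU
	return 1;       /* always compiled inside the outer block */
#else
	return 0;       /* can never be compiled here */
#endif
}
#endif
```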

Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
Linus Torvalds [Tue, 27 Apr 2010 15:59:38 +0000 (08:59 -0700)]
Merge git://git./linux/kernel/git/pkl/squashfs-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  squashfs: fix potential buffer over-run on 4K block file systems
  squashfs: add missing buffer free
  squashfs: fix warn_on when root inode is corrupted
  squashfs: fix locking bug in zlib wrapper