safe/jmp/linux-2.6
15 years agopkt_sched: Schedule qdiscs instead of netdev_queue.
David S. Miller [Wed, 16 Jul 2008 09:15:04 +0000 (02:15 -0700)]
pkt_sched: Schedule qdiscs instead of netdev_queue.

When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.

Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.

Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopkt_sched: Add and use qdisc_root() and qdisc_root_lock().
David S. Miller [Wed, 16 Jul 2008 08:42:40 +0000 (01:42 -0700)]
pkt_sched: Add and use qdisc_root() and qdisc_root_lock().

When code wants to lock the qdisc tree state, the logic
operation it's doing is locking the top-level qdisc that
sits of the root of the netdev_queue.

Add qdisc_root_lock() to represent this and convert the
easiest cases.

In order for this to work out in all cases, we have to
hook up the noop_qdisc to a dummy netdev_queue.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopkt_sched: Make QDISC_RUNNING a qdisc state.
David S. Miller [Wed, 16 Jul 2008 07:56:32 +0000 (00:56 -0700)]
pkt_sched: Make QDISC_RUNNING a qdisc state.

Currently it is associated with a netdev_queue, but when we have
qdisc sharing that no longer makes any sense.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopkt_sched: Move gso_skb into Qdisc.
David S. Miller [Wed, 16 Jul 2008 03:14:35 +0000 (20:14 -0700)]
pkt_sched: Move gso_skb into Qdisc.

We liberate any dangling gso_skb during qdisc destruction.

It really only matters for the root qdisc.  But when qdiscs
can be shared by multiple netdev_queue objects, we can't
have the gso_skb in the netdev_queue any more.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoniu: Add TX multiqueue support.
David S. Miller [Tue, 15 Jul 2008 10:48:19 +0000 (03:48 -0700)]
niu: Add TX multiqueue support.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: Kill plain netif_schedule()
David S. Miller [Tue, 15 Jul 2008 10:48:01 +0000 (03:48 -0700)]
netdev: Kill plain netif_schedule()

No more users.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: Convert all drivers away from netif_schedule().
David S. Miller [Tue, 15 Jul 2008 10:47:41 +0000 (03:47 -0700)]
netdev: Convert all drivers away from netif_schedule().

They logically all want to trigger a schedule for all device
TX queues.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonet: Implement simple sw TX hashing.
David S. Miller [Tue, 15 Jul 2008 10:47:03 +0000 (03:47 -0700)]
net: Implement simple sw TX hashing.

It just xor hashes over IPv4/IPv6 addresses and ports of transport.

The only assumption it makes is that skb_network_header() is set
correctly.

With bug fixes from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomac80211: Reimplement WME using ->select_queue().
David S. Miller [Tue, 15 Jul 2008 10:34:57 +0000 (03:34 -0700)]
mac80211: Reimplement WME using ->select_queue().

The only behavior change is that we do not drop packets under any
circumstances.  If that is absolutely needed, we could easily add it
back.

With cleanups and help from Johannes Berg.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: Add netdev->select_queue() method.
David S. Miller [Tue, 15 Jul 2008 10:03:33 +0000 (03:03 -0700)]
netdev: Add netdev->select_queue() method.

Devices or device layers can set this to control the queue selection
performed by dev_pick_tx().

This function runs under RCU protection, which allows overriding
functions to have some way of synchronizing with things like dynamic
->real_num_tx_queues adjustments.

This makes the spinlock prefetch in dev_queue_xmit() a little bit
less effective, but that's the price right now for correctness.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: netdev_priv() can now be sane again.
David S. Miller [Tue, 15 Jul 2008 09:58:39 +0000 (02:58 -0700)]
netdev: netdev_priv() can now be sane again.

The private area of a netdev is now at a fixed offset once more.

Unfortunately, some assumptions that netdev_priv() == netdev->priv
crept back into the tree.  In particular this happened in the
loopback driver.  Make it use netdev->ml_priv.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: Kill struct net_device_subqueue and netdev->egress_subqueue*
David S. Miller [Tue, 15 Jul 2008 09:58:10 +0000 (02:58 -0700)]
netdev: Kill struct net_device_subqueue and netdev->egress_subqueue*

No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonet: Use queue aware tests throughout.
David S. Miller [Thu, 17 Jul 2008 08:56:23 +0000 (01:56 -0700)]
net: Use queue aware tests throughout.

This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.

Non-multiqueue drivers need no changes.  The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.

Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.

pktgen and netpoll required a little bit more surgery than the others.

In particular the pktgen changes, whilst functional, could be largely
improved.  The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless.  The thing to do is probably to
invoke fill_packet() earlier.

The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by by the SKB queue mapping.

Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx().  If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.

Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.

With IGB changes from Jeff Kirsher.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomac80211: Temporarily mark QoS support BROKEN.
David S. Miller [Tue, 15 Jul 2008 09:53:04 +0000 (02:53 -0700)]
mac80211: Temporarily mark QoS support BROKEN.

We will undo this after a few changsets.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopkt_sched: Remove RR scheduler.
David S. Miller [Tue, 15 Jul 2008 09:52:19 +0000 (02:52 -0700)]
pkt_sched: Remove RR scheduler.

This actually fixes a bug added by the RR scheduler changes.  The
->bands and ->prio2band parameters were being set outside of the
sch_tree_lock() and thus could result in strange behavior and
inconsistencies.

It might be possible, in the new design (where there will be one qdisc
per device TX queue) to allow similar functionality via a TX hash
algorithm for RR but I really see no reason to export this aspect of
how these multiqueue cards actually implement the scheduling of the
the individual DMA TX rings and the single physical MAC/PHY port.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: Kill NETIF_F_MULTI_QUEUE.
David S. Miller [Thu, 17 Jul 2008 08:52:12 +0000 (01:52 -0700)]
netdev: Kill NETIF_F_MULTI_QUEUE.

There is no need for a feature bit for something that
can be tested by simply checking the TX queue count.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: Allocate multiple queues for TX.
David S. Miller [Thu, 17 Jul 2008 07:34:19 +0000 (00:34 -0700)]
netdev: Allocate multiple queues for TX.

alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces.  This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoigb: Kill CONFIG_NETDEVICES_MULTIQUEUE references, no longer exists.
David S. Miller [Thu, 17 Jul 2008 08:50:11 +0000 (01:50 -0700)]
igb: Kill CONFIG_NETDEVICES_MULTIQUEUE references, no longer exists.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agogarp: retry sending JoinIn messages after allocation failures
Patrick McHardy [Thu, 17 Jul 2008 03:51:47 +0000 (20:51 -0700)]
garp: retry sending JoinIn messages after allocation failures

Increase reliability by retrying to send JoinIn messages after memory
allocation failures on each TRANSMIT_PDU event until it succeeds.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agocore: add stat to track unresolved discards in neighbor cache
Neil Horman [Thu, 17 Jul 2008 03:50:49 +0000 (20:50 -0700)]
core: add stat to track unresolved discards in neighbor cache

in __neigh_event_send, if we have a neighbour entry which is in
NUD_INCOMPLETE state, we enqueue any outbound frames to that neighbour
to the neighbours arp_queue, which is default capped to a length of 3
skbs.  If that queue exceeds its set length, it will drop an skb on
the queue to enqueue the newly arrived skb.  This results in a drop
for which we have no statistics incremented.  This patch adds an
unresolved_discards stat to /proc/net/stat/ndisc_cache to track these
lost frames.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to NET_ADD_STATS_USER
Pavel Emelyanov [Thu, 17 Jul 2008 03:32:45 +0000 (20:32 -0700)]
mib: add net to NET_ADD_STATS_USER

Done with NET_XXX_STATS macros :)

To be continued...

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to NET_ADD_STATS_BH
Pavel Emelyanov [Thu, 17 Jul 2008 03:32:25 +0000 (20:32 -0700)]
mib: add net to NET_ADD_STATS_BH

This one is tricky.

The thing is that this macro is only used when killing tw buckets,
but since this killer is promiscuous wrt to which net each particular
tw belongs to, I have to use it only when NET_NS is off. When the net
namespaces are on, I use the INET_INC_STATS_BH for each bucket.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to NET_INC_STATS_USER
Pavel Emelyanov [Thu, 17 Jul 2008 03:31:39 +0000 (20:31 -0700)]
mib: add net to NET_INC_STATS_USER

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to NET_INC_STATS_BH
Pavel Emelyanov [Thu, 17 Jul 2008 03:31:16 +0000 (20:31 -0700)]
mib: add net to NET_INC_STATS_BH

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to NET_INC_STATS
Pavel Emelyanov [Thu, 17 Jul 2008 03:30:14 +0000 (20:30 -0700)]
mib: add net to NET_INC_STATS

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotcp: replace tcp_sock argument with sock in some places
Pavel Emelyanov [Thu, 17 Jul 2008 03:29:51 +0000 (20:29 -0700)]
tcp: replace tcp_sock argument with sock in some places

These places have a tcp_sock, but we'd prefer the sock itself to
get net from it. Fortunately, tcp_sk macro is just a type cast, so
this replace is really cheap.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoinet: prepare net on the stack for NET accounting macros
Pavel Emelyanov [Thu, 17 Jul 2008 03:28:42 +0000 (20:28 -0700)]
inet: prepare net on the stack for NET accounting macros

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agosock: add net to prot->enter_memory_pressure callback
Pavel Emelyanov [Thu, 17 Jul 2008 03:28:10 +0000 (20:28 -0700)]
sock: add net to prot->enter_memory_pressure callback

The tcp_enter_memory_pressure calls NET_INC_STATS, but doesn't
have where to get the net from.

I decided to add a sk argument, not the net itself, only to factor
all the required sock_net(sk) calls inside the enter_memory_pressure
callback itself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to TCP_ADD_STATS_USER
Pavel Emelyanov [Thu, 17 Jul 2008 03:27:38 +0000 (20:27 -0700)]
mib: add net to TCP_ADD_STATS_USER

Now we're done with the TCP_XXX_STATS macros.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to TCP_DEC_STATS
Pavel Emelyanov [Thu, 17 Jul 2008 03:22:46 +0000 (20:22 -0700)]
mib: add net to TCP_DEC_STATS

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to TCP_INC_STATS_BH
Pavel Emelyanov [Thu, 17 Jul 2008 03:22:25 +0000 (20:22 -0700)]
mib: add net to TCP_INC_STATS_BH

Same as before - the sock is always there to get the net from,
but there are also some places with the net already saved on
the stack.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to TCP_INC_STATS
Pavel Emelyanov [Thu, 17 Jul 2008 03:22:04 +0000 (20:22 -0700)]
mib: add net to TCP_INC_STATS

Fortunately (almost) all the TCP code has a sock to get the net from :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotcp: add net to tcp_mib_init
Pavel Emelyanov [Thu, 17 Jul 2008 03:21:42 +0000 (20:21 -0700)]
tcp: add net to tcp_mib_init

This one sets TCP MIBs after zeroing them, and thus requires
the net.

The existing single caller can use init_net (temporarily).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: drop unused TCP_XXX_STATS macros
Pavel Emelyanov [Thu, 17 Jul 2008 03:21:20 +0000 (20:21 -0700)]
mib: drop unused TCP_XXX_STATS macros

TCP_INC_STATS_USER and TCP_ADD_STATS_BH are currently unused.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoinet: prepare struct net for TCP MIB accounting
Pavel Emelyanov [Thu, 17 Jul 2008 03:20:58 +0000 (20:20 -0700)]
inet: prepare struct net for TCP MIB accounting

This is the same as the first patch in the set, but preparing
the net for TCP_XXX_STATS - save the struct net on the stack
where required and possible.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to IP_ADD_STATS_BH
Pavel Emelyanov [Thu, 17 Jul 2008 03:20:33 +0000 (20:20 -0700)]
mib: add net to IP_ADD_STATS_BH

Very simple - only ip_evictor (fragments) requires such.
This patch ends up the IP_XXX_STATS patching.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to IP_INC_STATS_BH
Pavel Emelyanov [Thu, 17 Jul 2008 03:20:11 +0000 (20:20 -0700)]
mib: add net to IP_INC_STATS_BH

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add net to IP_INC_STATS
Pavel Emelyanov [Thu, 17 Jul 2008 03:19:49 +0000 (20:19 -0700)]
mib: add net to IP_INC_STATS

All the callers already have either the net itself, or the place
where to get it from.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: drop unused IP_INC_STATS_USER
Pavel Emelyanov [Thu, 17 Jul 2008 03:19:26 +0000 (20:19 -0700)]
mib: drop unused IP_INC_STATS_USER

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoipv4: prepare net initialization for IP accounting
Pavel Emelyanov [Thu, 17 Jul 2008 03:19:08 +0000 (20:19 -0700)]
ipv4: prepare net initialization for IP accounting

Some places, that deal with IP statistics already have where to
get a struct net from, but use it directly, without declaring
a separate variable on the stack.

So, save this net on the stack for future IP_XXX_STATS macros.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdrv intel: always enable VLAN filtering except in promiscous mode
Patrick McHardy [Thu, 17 Jul 2008 03:16:14 +0000 (20:16 -0700)]
netdrv intel: always enable VLAN filtering except in promiscous mode

Currently VLAN filtering is enabled when the first VLAN is added.
Obviously before that there's no point in receiving any VLAN packets.
Now that we disable VLAN filtering in promiscous mode, we can keep
the VLAN filters enabled the remaining time.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdrv intel: disable VLAN filtering in promiscous mode
Patrick McHardy [Thu, 17 Jul 2008 03:15:45 +0000 (20:15 -0700)]
netdrv intel: disable VLAN filtering in promiscous mode

As discussed in this thread:

http://www.mail-archive.com/netdev@vger.kernel.org/msg53976.html

promiscous mode means to disable *all* filters. Currently only unicast
and multicast filtering is disabled. This patch changes all Intel
drivers to also disable VLAN filtering.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonet/ipv4/tcp.c: Fix use of PULLHUP instead of POLLHUP in comments.
Will Newton [Thu, 17 Jul 2008 03:13:43 +0000 (20:13 -0700)]
net/ipv4/tcp.c: Fix use of PULLHUP instead of POLLHUP in comments.

Change PULLHUP to POLLHUP in tcp_poll comments and clean up another
comment for grammar and coding style.

Signed-off-by: Will Newton <will.newton@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonet: make __skb_splice_bits static
Harvey Harrison [Thu, 17 Jul 2008 03:12:30 +0000 (20:12 -0700)]
net: make __skb_splice_bits static

net/core/skbuff.c:1335:5: warning: symbol '__skb_splice_bits' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoMerge branch 'stealer/ipvs/sync-daemon-cleanup-for-next' of git://git.stealer.net...
David S. Miller [Thu, 17 Jul 2008 03:07:06 +0000 (20:07 -0700)]
Merge branch 'stealer/ipvs/sync-daemon-cleanup-for-next' of git://git.stealer.net/linux-2.6

15 years agoipvs: More reliable synchronization on connection close
Rumen G. Bogdanovski [Thu, 17 Jul 2008 03:04:23 +0000 (20:04 -0700)]
ipvs: More reliable synchronization on connection close

This patch enhances the synchronization of the closing connections
between the master and the backup director. It prevents the closed
connections to expire with the 15 min timeout of the ESTABLISHED
state on the backup and makes them expire as they would do on the
master with much shorter timeouts.

Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoipvs: Use schedule_timeout_interruptible() instead of msleep_interruptible()
Sven Wegener [Wed, 16 Jul 2008 11:14:03 +0000 (11:14 +0000)]
ipvs: Use schedule_timeout_interruptible() instead of msleep_interruptible()

So that kthread_stop() can wake up the thread and we don't have to wait one
second in the worst case for the daemon to actually stop.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
15 years agoipvs: Put backup thread on mcast socket wait queue
Sven Wegener [Wed, 16 Jul 2008 11:13:56 +0000 (11:13 +0000)]
ipvs: Put backup thread on mcast socket wait queue

Instead of doing an endless loop with sleeping for one second, we now put the
backup thread onto the mcast socket wait queue and it gets woken up as soon as
we have data to process.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
15 years agoipvs: Use kthread_run() instead of doing a double-fork via kernel_thread()
Sven Wegener [Wed, 16 Jul 2008 11:13:50 +0000 (11:13 +0000)]
ipvs: Use kthread_run() instead of doing a double-fork via kernel_thread()

This also moves the setup code out of the daemons, so that we're able to
return proper error codes to user space. The current code will return success
to user space when the daemon is started with an invald mcast interface. With
these changes we get an appropriate "No such device" error.

We longer need our own completion to be sure the daemons are actually running,
because they no longer contain code that can fail and kthread_run() takes care
of the rest.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
15 years agoipvs: Use ERR_PTR for returning errors from make_receive_sock() and make_send_sock()
Sven Wegener [Wed, 16 Jul 2008 11:13:43 +0000 (11:13 +0000)]
ipvs: Use ERR_PTR for returning errors from make_receive_sock() and make_send_sock()

The additional information we now return to the caller is currently not used,
but will be used to return errors to user space.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
15 years agoipvs: Initialize mcast addr at compile time
Sven Wegener [Wed, 16 Jul 2008 11:13:35 +0000 (11:13 +0000)]
ipvs: Initialize mcast addr at compile time

There's no need to do it at runtime, the values are constant.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
15 years agoiucv: fix memory leak in cpu hotplug error path.
Akinobu Mita [Tue, 15 Jul 2008 09:09:53 +0000 (02:09 -0700)]
iucv: fix memory leak in cpu hotplug error path.

Fix memory leak in error path in CPU_UP_PREPARE notifier.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agowireless: fix warnings from QoS patch
Johannes Berg [Tue, 15 Jul 2008 09:08:24 +0000 (02:08 -0700)]
wireless: fix warnings from QoS patch

When I removed the special "default" meaning from the QoS
parameters, I forgot to update drivers and this lead to
warnings because some drivers were checking for the special
values and putting in defaults. This fixes that by removing
the default special-casing completely.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobluetooth/hci_bcsp: fix bitrev Kconfig
Randy Dunlap [Tue, 15 Jul 2008 07:51:45 +0000 (00:51 -0700)]
bluetooth/hci_bcsp: fix bitrev Kconfig

Fix bluetooth hci_bcsp Kconfig to avoid build errors:

drivers/built-in.o: In function `bcsp_prepare_pkt':
hci_bcsp.c:(.text+0x7e9ac): undefined reference to `bitrev16'
drivers/built-in.o: In function `bcsp_recv':
hci_bcsp.c:(.text+0x7f276): undefined reference to `bitrev16'
hci_bcsp.c:(.text+0x7f293): undefined reference to `bitrev16'
make[1]: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Ackey-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonet: refactor tcp splice receive path to improve readability
Octavian Purdila [Tue, 15 Jul 2008 07:49:11 +0000 (00:49 -0700)]
net: refactor tcp splice receive path to improve readability

- move all of the details on offsets, lengths and buffers into a
single function instead of doing these operation from multiple places

- use a bottom up approach: try to avoid details in the high level
functions, introduce them gradually as we go deeper in the function
call stack

With helpful feedback from Jarek Poplawski.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: Do not use TX lock to protect address lists.
David S. Miller [Tue, 15 Jul 2008 07:15:08 +0000 (00:15 -0700)]
netdev: Do not use TX lock to protect address lists.

Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: Add netdev->addr_list_lock protection.
David S. Miller [Tue, 15 Jul 2008 07:13:44 +0000 (00:13 -0700)]
netdev: Add netdev->addr_list_lock protection.

Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: Add addr_list_lock to struct net_device.
David S. Miller [Tue, 15 Jul 2008 07:08:33 +0000 (00:08 -0700)]
netdev: Add addr_list_lock to struct net_device.

This will be used to protect the per-device unicast and multicast
address lists, as well as the callbacks into the drivers which
configure such state such as ->set_rx_mode() and ->set_multicast_list().

Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add struct net to ICMPMSGIN_INC_STATS_BH
Pavel Emelyanov [Tue, 15 Jul 2008 06:04:00 +0000 (23:04 -0700)]
mib: add struct net to ICMPMSGIN_INC_STATS_BH

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add struct net to ICMPMSGOUT_INC_STATS
Pavel Emelyanov [Tue, 15 Jul 2008 06:03:35 +0000 (23:03 -0700)]
mib: add struct net to ICMPMSGOUT_INC_STATS

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add struct net to ICMP_INC_STATS_BH
Pavel Emelyanov [Tue, 15 Jul 2008 06:03:00 +0000 (23:03 -0700)]
mib: add struct net to ICMP_INC_STATS_BH

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomib: add struct net to ICMP_INC_STATS
Pavel Emelyanov [Tue, 15 Jul 2008 06:02:35 +0000 (23:02 -0700)]
mib: add struct net to ICMP_INC_STATS

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoicmp: drop unused MIB accounting wrappers
Pavel Emelyanov [Tue, 15 Jul 2008 06:02:09 +0000 (23:02 -0700)]
icmp: drop unused MIB accounting wrappers

There are ICMP_XXX_STATS that are not used in the kernel, so I remove
them, not to "just patch" them later. But if there's some sense in
keeping them, kick me - I will remake this set keeping them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoinet: toss struct net initialization around
Pavel Emelyanov [Tue, 15 Jul 2008 06:01:40 +0000 (23:01 -0700)]
inet: toss struct net initialization around

Some places, that deal with ICMP statistics already have where
to get a struct net from, but use it directly, without declaring
a separate variable on the stack.

Since I will need this net soon, I declare a struct net on the
stack and use it in the existing places in a separate patch not
to spoil the future ones.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoicmp: add struct net argument to icmp_out_count
Pavel Emelyanov [Tue, 15 Jul 2008 06:00:43 +0000 (23:00 -0700)]
icmp: add struct net argument to icmp_out_count

This routine deals with ICMP statistics, but doesn't have a
struct net at hands, so add one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agovlan: remove unnecessary include statements
Patrick McHardy [Tue, 15 Jul 2008 05:51:55 +0000 (22:51 -0700)]
vlan: remove unnecessary include statements

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agovlan: clean up hard_start_xmit functions
Patrick McHardy [Tue, 15 Jul 2008 05:51:39 +0000 (22:51 -0700)]
vlan: clean up hard_start_xmit functions

Remove excessive comments and debugging, use NETDEV_TX codes,
remove some empty lines.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agovlan: clean up vlan_dev_hard_header()
Patrick McHardy [Tue, 15 Jul 2008 05:51:19 +0000 (22:51 -0700)]
vlan: clean up vlan_dev_hard_header()

Remove some debugging and excessive comments, merge the two
dev_hard_header calls into one.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agovlan: ethtool ->get_flags support
Patrick McHardy [Tue, 15 Jul 2008 05:51:01 +0000 (22:51 -0700)]
vlan: ethtool ->get_flags support

Allow to query LRO settings of underlying device when VLAN RX
acceleration is used.

Suggested by Ben Hutchings <bhutchings@solarflare.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopacket: deliver VLAN TCI to userspace
Patrick McHardy [Tue, 15 Jul 2008 05:50:39 +0000 (22:50 -0700)]
packet: deliver VLAN TCI to userspace

Store the VLAN tag in the auxillary data/tpacket2_hdr so userspace can
properly deal with hardware VLAN tagging/stripping.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopacket: support extensible, 64 bit clean mmaped ring structure
Patrick McHardy [Tue, 15 Jul 2008 05:50:15 +0000 (22:50 -0700)]
packet: support extensible, 64 bit clean mmaped ring structure

The tpacket_hdr is not 64 bit clean due to use of an unsigned long
and can't be extended because the following struct sockaddr_ll needs
to be at a fixed offset.

Add support for a version 2 tpacket protocol that removes these
limitations.

Userspace can query the header size through a new getsockopt option
and change the protocol version through a setsockopt option. The
changes needed to switch to the new protocol version are:

1. replace struct tpacket_hdr by struct tpacket2_hdr
2. query header len and save
3. set protocol version to 2
 - set up ring as usual
4. for getting the sockaddr_ll, use (void *)hdr + TPACKET_ALIGN(hdrlen)
   instead of (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))

Steps 2 and 4 can be omitted if the struct sockaddr_ll isn't needed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agovlan: deliver packets received with VLAN acceleration to network taps
Patrick McHardy [Tue, 15 Jul 2008 05:49:30 +0000 (22:49 -0700)]
vlan: deliver packets received with VLAN acceleration to network taps

When VLAN header stripping is used, packets currently bypass packet
sockets (and other network taps) completely. For locally existing
VLANs, they appear directly on the VLAN device, for unknown VLANs
they are silently dropped.

Add a new function netif_nit_deliver() to deliver incoming packets
to all network interface taps and use it in __vlan_hwaccel_rx() to
make VLAN packets visible on the underlying device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agovlan: Don't store VLAN tag in cb
Patrick McHardy [Tue, 15 Jul 2008 05:49:06 +0000 (22:49 -0700)]
vlan: Don't store VLAN tag in cb

Use a real skb member to store the skb to avoid clashes with qdiscs,
which are allowed to use the cb area themselves. As currently only real
devices that consume the skb set the NETIF_F_HW_VLAN_TX flag, no explicit
invalidation is neccessary.

The new member fills a hole on 64 bit, the skb layout changes from:

        __u32                      mark;                 /*   172     4 */
        sk_buff_data_t             transport_header;     /*   176     4 */
        sk_buff_data_t             network_header;       /*   180     4 */
        sk_buff_data_t             mac_header;           /*   184     4 */
        sk_buff_data_t             tail;                 /*   188     4 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        sk_buff_data_t             end;                  /*   192     4 */

        /* XXX 4 bytes hole, try to pack */

to

        __u32                      mark;                 /*   172     4 */
        __u16                      vlan_tci;             /*   176     2 */

        /* XXX 2 bytes hole, try to pack */

        sk_buff_data_t             transport_header;     /*   180     4 */
        sk_buff_data_t             network_header;       /*   184     4 */

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotipc: Optimization to multicast name lookup algorithm
Allan Stephens [Tue, 15 Jul 2008 05:45:33 +0000 (22:45 -0700)]
tipc: Optimization to multicast name lookup algorithm

This patch simplifies and speeds up TIPC's algorithm for identifying
on-node and off-node destinations that overlap a multicast name
sequence range.  Rather than traversing the list of all known name
publications within the cluster, it now traverses the (potentially
much shorter) list of name publications made by the node itself, and
determines if any off-node destinations exist by comparing the sizes
of the two lists.  (Since the node list must be a subset of the
cluster list, a difference in sizes means that at least one off-node
destination must exist.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotipc: Add missing locks when inspecting node list & link list
Allan Stephens [Tue, 15 Jul 2008 05:44:58 +0000 (22:44 -0700)]
tipc: Add missing locks when inspecting node list & link list

This patch ensures that TIPC configuration commands that display info
about neighboring nodes and their links take the spinlocks that
protect the node list and link lists from changing while the lists
are being traversed.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotipc: Fix bug in scope checking for multicast messages
Allan Stephens [Tue, 15 Jul 2008 05:44:32 +0000 (22:44 -0700)]
tipc: Fix bug in scope checking for multicast messages

This patch ensures that TIPC's multicast message name lookup
algorithm does individualized scope checking for each published
name it examines.  Previously, scope checking was only done for
the first name in a name table publication list, which could
result in incoming multicast messages being delivered to ports
publishing names with "node" scope, or not being delivered to
ports publishing names with "cluster" or "zone" scope.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotipc: Eliminate improper use of TIPC_OK error code
Allan Stephens [Tue, 15 Jul 2008 05:44:01 +0000 (22:44 -0700)]
tipc: Eliminate improper use of TIPC_OK error code

This patch corrects many places where TIPC routines indicated
successful completion by returning TIPC_OK instead of 0.
(The TIPC_OK symbol has the value 0, but it should only be used
in contexts that deal with the error code field of a TIPC
message header.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotipc: Fix race condition that could cause accept() to fail
Allan Stephens [Tue, 15 Jul 2008 05:43:32 +0000 (22:43 -0700)]
tipc: Fix race condition that could cause accept() to fail

This patch ensurs that accept() returns successfully even when
the newly created socket is immediately disconnected by its peer.
Previously, accept() would fail if it was unable to pass back
the optional address info for the socket's peer before the
socket became disconnected; TIPC now allows accept() to gather
peer address information after disconnection.  As a bonus, the
revised code accesses the socket's port more efficiently, without
the overhead incurred by a reference table lookup.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotipc: Optimize pointer dereferencing when receiving stream data
Allan Stephens [Tue, 15 Jul 2008 05:42:51 +0000 (22:42 -0700)]
tipc: Optimize pointer dereferencing when receiving stream data

This patch eliminates an unnecessary pointer dereference when
accessing a stream-based socket's receive queue.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotipc: Remove unneeded parameter to tipc_createport_raw()
Allan Stephens [Tue, 15 Jul 2008 05:42:19 +0000 (22:42 -0700)]
tipc: Remove unneeded parameter to tipc_createport_raw()

This patch eliminates an unneeded parameter when creating a low-level
TIPC port object.  Instead of returning both the pointer to the port
structure and the port's reference ID, it now returns only the pointer
since the port structure contains the reference ID as one of its fields.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobnx2: Update version to 1.7.8.
Michael Chan [Tue, 15 Jul 2008 05:40:21 +0000 (22:40 -0700)]
bnx2: Update version to 1.7.8.

Signed-off-by: Michael Chan <mchan@braodcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobnx2: Support secondary MAC addresses.
Benjamin Li [Tue, 15 Jul 2008 05:39:52 +0000 (22:39 -0700)]
bnx2: Support secondary MAC addresses.

Add support for configuring secondary unicast addresses.  There
are 4 additional perfect match filters which can be used for
secondary unicast address support.

  *  Modified bnx2_set_mac_addr() to be more generic in handling
     the setting of the perfect match filters
  *  Changed bnx2_set_rx_mode() to handle the unicast dev_addr_list

Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobnx2: Allow flexible VLAN tag settings.
Michael Chan [Tue, 15 Jul 2008 05:39:03 +0000 (22:39 -0700)]
bnx2: Allow flexible VLAN tag settings.

Negotiate with boot code and ASF firmware to see if it can
support keeping VLAN tags in the RX packets.  If supported
by firmware, the VLAN tag will be kept in the RX packet
unless VLAN acceleration is registered.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobnx2: Add ack parameter to bnx2_fw_sync().
Michael Chan [Tue, 15 Jul 2008 05:38:23 +0000 (22:38 -0700)]
bnx2: Add ack parameter to bnx2_fw_sync().

ack=1 means wait for firmware acknowledgement, and ack=0
means don't wait.  All current callers will set it to 1.

In the next patch, new calls will set ack=0.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobnx2: Add PCI ID for 5716.
Michael Chan [Tue, 15 Jul 2008 05:37:47 +0000 (22:37 -0700)]
bnx2: Add PCI ID for 5716.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <Benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobnx2: Prevent ethtool -s from crashing when device is down.
Michael Chan [Tue, 15 Jul 2008 05:37:21 +0000 (22:37 -0700)]
bnx2: Prevent ethtool -s from crashing when device is down.

The device may be in D3-hot state and may crash if we try to
configure the speed settings by accessing the registers.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoMerge branch 'davem-next' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
David S. Miller [Tue, 15 Jul 2008 05:30:17 +0000 (22:30 -0700)]
Merge branch 'davem-next' of /linux/kernel/git/jgarzik/netdev-2.6

15 years agonetlabel: return msg overflow error from netlbl_cipsov4_list faster
Denis V. Lunev [Tue, 15 Jul 2008 05:28:25 +0000 (22:28 -0700)]
netlabel: return msg overflow error from netlbl_cipsov4_list faster

Currently, we are trying to place the information from the kernel to
1, 2, 3 and 4 pages sequentially. These pages are allocated via slab.
Though, from the slab point of view steps 3 and 4 are equivalent on
most architectures. So, lets skip 3 pages attempt.

By the way, should we switch from .doit to .dumpit interface here?
The amount of data seems quite big for me.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonet: Remove references to wan-router.txt in Kconfigs
Johann Felix Soden [Tue, 15 Jul 2008 05:22:29 +0000 (22:22 -0700)]
net: Remove references to wan-router.txt in Kconfigs

This patch removes references in drivers/net/wan/Kconfig and
net/wanrouter/Kconfig to Documentation/networking/wan-router.txt
which was removed in commit 99971e70fdc1862e120f3319fc0a4dba8c728acf
("[WANPIPE]: Forgotten bits of Sangoma drivers removal.").

Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotun: Fix/rewrite packet filtering logic
Max Krasnyansky [Tue, 15 Jul 2008 05:18:19 +0000 (22:18 -0700)]
tun: Fix/rewrite packet filtering logic

Please see the following thread to get some context on this
http://marc.info/?l=linux-netdev&m=121564433018903&w=2

Basically the issue is that current multi-cast filtering stuff in
the TUN/TAP driver is seriously broken.
Original patch went in without proper review and ACK. It was broken and
confusing to start with and subsequent patches broke it completely.
To give you an idea of what's broken here are some of the issues:

- Very confusing comments throughout the code that imply that the
character device is a network interface in its own right, and that packets
are passed between the two nics. Which is completely wrong.

- Wrong set of ioctls is used for setting up filters. They look like
shortcuts for manipulating state of the tun/tap network interface but
in reality manipulate the state of the TX filter.

- ioctls that were originally used for setting address of the the TX filter
got "fixed" and now set the address of the network interface itself. Which
made filter totaly useless.

- Filtering is done too late. Instead of filtering early on, to avoid
unnecessary wakeups, filtering is done in the read() call.

The list goes on and on :)

So the patch cleans all that up. It introduces simple and clean interface for
setting up TX filters (TUNSETTXFILTER + tun_filter spec) and does filtering
before enqueuing the packets.

TX filtering is useful in the scenarios where TAP is part of a bridge, in
which case it gets all broadcast, multicast and potentially other packets when
the bridge is learning. So for example Ethernet tunnelling app may want to
setup TX filters to avoid tunnelling multicast traffic. QEMU and other
hypervisors can push RX filtering that is currently done in the guest into the
host context therefore saving wakeups and unnecessary data transfer.

Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years ago8021q: Check return of dev_set_promiscuity/allmulti
Wang Chen [Tue, 15 Jul 2008 03:59:03 +0000 (20:59 -0700)]
8021q: Check return of dev_set_promiscuity/allmulti

dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

Here, we check all positive increment for promiscuity and allmulti
to get error return.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomacvlan: Check return of dev_set_allmulti
Wang Chen [Tue, 15 Jul 2008 03:57:07 +0000 (20:57 -0700)]
macvlan: Check return of dev_set_allmulti

allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

Here, we check the positive increment for allmulti to get error return.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoipv4: Fix ipmr unregister device oops
Wang Chen [Tue, 15 Jul 2008 03:56:34 +0000 (20:56 -0700)]
ipv4: Fix ipmr unregister device oops

An oops happens during device unregister.

The following oops happened when I add two tunnels, which
use a same device, and then delete one tunnel.
Obviously deleting tunnel "A" causes device unregister, which
send a notification, and after receiving notification, ipmr do
unregister again for tunnel "B" which also use same device.
That is wrong.
After receiving notification, ipmr only needs to decrease reference
count and don't do duplicated unregister.
Fortunately, IPv6 side doesn't add tunnel in ip6mr, so it's clean.

This patch fixs:
- unregister device oops
- using after dev_put()

Here is the oops:
===
Jul 11 15:39:29 wangchen kernel: ------------[ cut here ]------------
Jul 11 15:39:29 wangchen kernel: kernel BUG at net/core/dev.c:3651!
Jul 11 15:39:29 wangchen kernel: invalid opcode: 0000 [#1]
Jul 11 15:39:29 wangchen kernel: Modules linked in: ipip tunnel4 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet binfmt_misc button battery ac loop dm_mod usbhid ff_memless pcmcia firmware_class ohci1394 8139too mii ieee1394 yenta_socket rsrc_nonstatic pcmcia_core ide_cd_mod cdrom snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm i2c_i801 snd_timer snd i2c_core soundcore snd_page_alloc rng_core shpchp ehci_hcd uhci_hcd pci_hotplug intel_agp agpgart usbcore ext3 jbd ata_piix ahci libata dock edd fan thermal processor thermal_sys piix sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Jul 11 15:39:29 wangchen kernel:
Jul 11 15:39:29 wangchen kernel: Pid: 4102, comm: mroute Not tainted (2.6.26-rc9-default #69)
Jul 11 15:39:29 wangchen kernel: EIP: 0060:[<c024636b>] EFLAGS: 00010202 CPU: 0
Jul 11 15:39:29 wangchen kernel: EIP is at rollback_registered+0x61/0xe3
Jul 11 15:39:29 wangchen kernel: EAX: 00000001 EBX: ecba6000 ECX: 00000000 EDX: ffffffff
Jul 11 15:39:29 wangchen kernel: ESI: 00000001 EDI: ecba6000 EBP: c03de2e8 ESP: ed8e7c3c
Jul 11 15:39:29 wangchen kernel:  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Jul 11 15:39:29 wangchen kernel: Process mroute (pid: 4102, ti=ed8e6000 task=ed41e830 task.ti=ed8e6000)
Jul 11 15:39:29 wangchen kernel: Stack: ecba6000 c024641c 00000028 c0284e1a 00000001 c03de2e8 ecba6000 eecff360
Jul 11 15:39:29 wangchen kernel:        c0284e4c c03536f4 fffffff8 00000000 c029a819 ecba6000 00000006 ecba6000
Jul 11 15:39:29 wangchen kernel:        00000000 ecba6000 c03de2c0 c012841b ffffffff 00000000 c024639f ecba6000
Jul 11 15:39:29 wangchen kernel: Call Trace:
Jul 11 15:39:29 wangchen kernel:  [<c024641c>] unregister_netdevice+0x2f/0x51
Jul 11 15:39:29 wangchen kernel:  [<c0284e1a>] vif_delete+0xaf/0xc3
Jul 11 15:39:29 wangchen kernel:  [<c0284e4c>] ipmr_device_event+0x1e/0x30
Jul 11 15:39:29 wangchen kernel:  [<c029a819>] notifier_call_chain+0x2a/0x47
Jul 11 15:39:29 wangchen kernel:  [<c012841b>] raw_notifier_call_chain+0x9/0xc
Jul 11 15:39:29 wangchen kernel:  [<c024639f>] rollback_registered+0x95/0xe3
Jul 11 15:39:29 wangchen kernel:  [<c024641c>] unregister_netdevice+0x2f/0x51
Jul 11 15:39:29 wangchen kernel:  [<c0284e1a>] vif_delete+0xaf/0xc3
Jul 11 15:39:29 wangchen kernel:  [<c0285eee>] ip_mroute_setsockopt+0x47a/0x801
Jul 11 15:39:29 wangchen kernel:  [<eea5a70c>] do_get_write_access+0x2df/0x313 [jbd]
Jul 11 15:39:29 wangchen kernel:  [<c01727c4>] __find_get_block_slow+0xda/0xe4
Jul 11 15:39:29 wangchen kernel:  [<c0172a7f>] __find_get_block+0xf8/0x122
Jul 11 15:39:29 wangchen kernel:  [<c0172a7f>] __find_get_block+0xf8/0x122
Jul 11 15:39:29 wangchen kernel:  [<eea5d563>] journal_cancel_revoke+0xda/0x110 [jbd]
Jul 11 15:39:29 wangchen kernel:  [<c0263501>] ip_setsockopt+0xa9/0x9ee
Jul 11 15:39:29 wangchen kernel:  [<eea5d563>] journal_cancel_revoke+0xda/0x110 [jbd]
Jul 11 15:39:29 wangchen kernel:  [<eea5a70c>] do_get_write_access+0x2df/0x313 [jbd]
Jul 11 15:39:29 wangchen kernel:  [<eea69287>] __ext3_get_inode_loc+0xcf/0x271 [ext3]
Jul 11 15:39:29 wangchen kernel:  [<eea743c7>] __ext3_journal_dirty_metadata+0x13/0x32 [ext3]
Jul 11 15:39:29 wangchen kernel:  [<c0116434>] __wake_up+0xf/0x15
Jul 11 15:39:29 wangchen kernel:  [<eea5a424>] journal_stop+0x1bd/0x1c6 [jbd]
Jul 11 15:39:29 wangchen kernel:  [<eea703a7>] __ext3_journal_stop+0x19/0x34 [ext3]
Jul 11 15:39:29 wangchen kernel:  [<c014291e>] get_page_from_freelist+0x94/0x369
Jul 11 15:39:29 wangchen kernel:  [<c01408f2>] filemap_fault+0x1ac/0x2fe
Jul 11 15:39:29 wangchen kernel:  [<c01a605e>] security_sk_alloc+0xd/0xf
Jul 11 15:39:29 wangchen kernel:  [<c023edea>] sk_prot_alloc+0x36/0x78
Jul 11 15:39:29 wangchen kernel:  [<c0240037>] sk_alloc+0x3a/0x40
Jul 11 15:39:29 wangchen kernel:  [<c0276062>] raw_hash_sk+0x46/0x4e
Jul 11 15:39:29 wangchen kernel:  [<c0166aff>] d_alloc+0x1b/0x157
Jul 11 15:39:29 wangchen kernel:  [<c023e4d1>] sock_common_setsockopt+0x12/0x16
Jul 11 15:39:29 wangchen kernel:  [<c023cb1e>] sys_setsockopt+0x6f/0x8e
Jul 11 15:39:29 wangchen kernel:  [<c023e105>] sys_socketcall+0x15c/0x19e
Jul 11 15:39:29 wangchen kernel:  [<c0103611>] sysenter_past_esp+0x6a/0x99
Jul 11 15:39:29 wangchen kernel:  [<c0290000>] unix_poll+0x69/0x78
Jul 11 15:39:29 wangchen kernel:  =======================
Jul 11 15:39:29 wangchen kernel: Code: 83 e0 01 00 00 85 c0 75 1f 53 53 68 12 81 31 c0 e8 3c 30 ed ff ba 3f 0e 00 00 b8 b9 7f 31 c0 83 c4 0c 5b e9 f5 26 ed ff 48 74 04 <0f> 0b eb fe 89 d8 e8 21 ff ff ff 89 d8 e8 62 ea ff ff c7 83 e0
Jul 11 15:39:29 wangchen kernel: EIP: [<c024636b>] rollback_registered+0x61/0xe3 SS:ESP 0068:ed8e7c3c
Jul 11 15:39:29 wangchen kernel: ---[ end trace c311acf85d169786 ]---
===

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoipv4: Check return of dev_set_allmulti
Wang Chen [Tue, 15 Jul 2008 03:55:26 +0000 (20:55 -0700)]
ipv4: Check return of dev_set_allmulti

allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

Here, we check the positive increment for allmulti to get error return.

PS: For unwinding tunnel creating, we let ipip->ioctl() to handle it.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoipv6: Fix using after dev_put()
Wang Chen [Tue, 15 Jul 2008 03:54:54 +0000 (20:54 -0700)]
ipv6: Fix using after dev_put()

Patrick McHardy pointed it out.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoipv6: Check return of dev_set_allmulti
Wang Chen [Tue, 15 Jul 2008 03:54:23 +0000 (20:54 -0700)]
ipv6: Check return of dev_set_allmulti

allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

Here, we check the positive increment for allmulti to get error return.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobridge: Check return of dev_set_promiscuity
Wang Chen [Tue, 15 Jul 2008 03:53:13 +0000 (20:53 -0700)]
bridge: Check return of dev_set_promiscuity

dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

Here, we check the positive increment for promiscuity to get error return.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobonding: Check return of dev_set_promiscuity/allmulti
Wang Chen [Tue, 15 Jul 2008 03:51:36 +0000 (20:51 -0700)]
bonding: Check return of dev_set_promiscuity/allmulti

dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

In bond_alb and bond_main, we check all positive increment for promiscuity
and allmulti to get error return.
But there are still two problems left.
1. Some code path has no mechanism to signal errors upstream.
2. If there are multi slaves, it's hard to tell which slaves increment
   promisc/allmulti successfully and which failed.
So I left these problems to be FIXME.
Fortunately, the overflow is very rare case.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoaf_packet: Check return of dev_set_promiscuity/allmulti
Wang Chen [Tue, 15 Jul 2008 03:49:46 +0000 (20:49 -0700)]
af_packet: Check return of dev_set_promiscuity/allmulti

dev_set_promiscuity/allmulti might overflow.  Commit: "netdevice: Fix
promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

In af_packet, we check all positive increment for promiscuity and
allmulti to get error return.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
David S. Miller [Tue, 15 Jul 2008 03:40:34 +0000 (20:40 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next-2.6