safe/jmp/linux-2.6
14 years agoIXP42x HSS support for setting internal clock rate
Krzysztof Halasa [Sat, 5 Sep 2009 03:59:49 +0000 (03:59 +0000)]
IXP42x HSS support for setting internal clock rate

HSS usually uses external clocks, so it's not a big deal. Internal clock
is used for direct DTE-DTE connections and when the DCE doesn't provide
it's own clock.

This also depends on the oscillator frequency. Intel seems to have
calculated the clock register settings for 33.33 MHz (66.66 MHz timer
base). Their settings seem quite suboptimal both in terms of average
frequency (60 ppm is unacceptable for G.703 applications, their primary
intended usage(?)) and jitter.

Many (most?) platforms use a 33.333 MHz oscillator, a 10 ppm difference
from Intel's base.

Instead of creating static tables, I've created a procedure to program
the HSS clock register. The register consists of 3 parts (A, B, C).
The average frequency (= bit rate) is:
66.66x MHz / (A  + (B + 1) / (C + 1))
The procedure aims at the closest average frequency, possibly at the
cost of increased jitter. Nobody would be able to directly drive an
unbufferred transmitter with a HSS anyway, and the frequency error is
what it really counts.

I've verified the above with an oscilloscope on IXP425. It seems IXP46x
and possibly IXP43x use a bit different clock generation algorithm - it
looks like the avg frequency is:
(on IXP465) 66.66x MHz / (A  + B / (C + 1)).
Also they use much greater precomputed A and B - on IXP425 it would
simply result in more jitter, but I don't know how does it work on
IXP46x (perhaps 3 least significant bits aren't used?).

Anyway it looks that they were aiming for exactly +60 ppm or -60 ppm,
while <1 ppm is typically possible (with a synchronized clock, of
course).

The attached patch makes it possible to set almost any bit rate
(my IXP425 533 MHz quits at > 22 Mb/s if a single port is used, and the
minimum is ca. 65 Kb/s).

This is independent of MVIP (multi-E1/T1 on one HSS) mode.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoWAN: remove deprecated PCI_DEVICE_ID from PCI200SYN driver.
Krzysztof Halasa [Sat, 5 Sep 2009 00:54:30 +0000 (00:54 +0000)]
WAN: remove deprecated PCI_DEVICE_ID from PCI200SYN driver.

PCI200SYN has its own PCI subsystem device ID for 3+ years, now it's
time to remove the generic PLX905[02] ID from the driver. Anyone with
old EEPROM data will have to run the upgrade.

Having the generic PLX905[02] (PCI-local bus bridge) ID is harmful
as the driver tries to handle other devices based on these bridges.

Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobe2net: Code changes in Tx path to use skb_dma_map/skb_dma_unmap
Ajit Khaparde [Fri, 4 Sep 2009 03:12:29 +0000 (03:12 +0000)]
be2net: Code changes in Tx path to use skb_dma_map/skb_dma_unmap

Code changes to
 - In the tx completion processing, there were instances of unmapping a
memory as a page which was originally mapped as single. This patch takes care
of this by using skb_dma_map()/skb_dma_unmap() to map/unmap Tx buffers.
 - set gso_max_size to 65535. This was not done till now.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobe2net: Changes to support flashing of the be2 network adapter
Ajit Khaparde [Fri, 4 Sep 2009 03:12:16 +0000 (03:12 +0000)]
be2net: Changes to support flashing of the be2 network adapter

Changes to support flashing of the be2 network adapter using the
request_firmware() & ethtool infrastructure. The trigger to flash the device
will come from ethtool utility. The driver will invoke request_firmware()
to start the flash process. The file containing the flash image is expected
to be available in /lib/firmware/

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agowan: dlci/sdla transmit return dehacking
Stephen Hemminger [Fri, 4 Sep 2009 05:33:46 +0000 (05:33 +0000)]
wan: dlci/sdla transmit return dehacking

This is a brute force removal of the wierd slave interface done for
DLCI -> SDLA transmit. Before it was using non-standard return values
and freeing skb in caller.  This changes it to using normal return
values, and freeing in the callee.  Luckly only one driver pair was
doing this. Not tested on real hardware, in fact I wonder if this
driver pair is even being used by any users.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetxen: update version to 4.0.50
Dhananjay Phadke [Sat, 5 Sep 2009 17:43:12 +0000 (17:43 +0000)]
netxen: update version to 4.0.50

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetxen: refactor firmware info code
Dhananjay Phadke [Sat, 5 Sep 2009 17:43:11 +0000 (17:43 +0000)]
netxen: refactor firmware info code

o Combine netxen_get_firmware_info(), netxen_check_options()
  so that they are updated every time firmware is reset.
o Set dma mask everytime firmware is reset.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetxen: pre calculate register addresses
Amit Kumar Salecha [Sat, 5 Sep 2009 17:43:10 +0000 (17:43 +0000)]
netxen: pre calculate register addresses

For registers accessed in fast path (interrupt / softirq)
avoid expensive I/O address translation. These registers
are directly mapped in PCI bar 0 and do not require
any window checks.

Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetxen: fix ip addr hashing after firmware reset
Amit Kumar Salecha [Sat, 5 Sep 2009 17:43:09 +0000 (17:43 +0000)]
netxen: fix ip addr hashing after firmware reset

Reprogram local IP addresses after firmware is reset
or after resuming from suspend.

Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetxen: firmware hang detection
Dhananjay Phadke [Sat, 5 Sep 2009 17:43:08 +0000 (17:43 +0000)]
netxen: firmware hang detection

Implement state machine to detect firmware hung state
and recover. Since firmware will be shared by all PCI
functions that have different class drivers (NIC or
FCOE or iSCSI), explicit hardware based serialization
is required for initializing firmware.

o Used global scratchpad register to maintain device
  reference count. Every probed pci function adds to
  ref count.

o Implement timer (delayed work) for each pci func
  that checks firmware heartbit every 5 sec and detaches
  itself if firmware is dead. Last detaching function
  reloads firmware. Other functions wait for firmware
  init, and re-attach themselves.

Heartbit is not supported by NX2031 firmware.

Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetxen: handle firmware load errors
Dhananjay Phadke [Sat, 5 Sep 2009 17:43:07 +0000 (17:43 +0000)]
netxen: handle firmware load errors

Unwind allocations and release file firmware when
when firmware load fails.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet_sched: add classful multiqueue dummy scheduler
David S. Miller [Sun, 6 Sep 2009 08:58:51 +0000 (01:58 -0700)]
net_sched: add classful multiqueue dummy scheduler

This patch adds a classful dummy scheduler which can be used as root qdisc
for multiqueue devices and exposes each device queue as a child class.

This allows to address queues individually and graft them similar to regular
classes. Additionally it presents an accumulated view of the statistics of
all real root qdiscs in the dummy root.

Two new callbacks are added to the qdisc_ops and qdisc_class_ops:

- cl_ops->select_queue selects the tx queue number for new child classes.

- qdisc_ops->attach() overrides root qdisc device grafting to attach
  non-shared qdiscs to the queues.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet_sched: move dev_graft_qdisc() to sch_generic.c
Patrick McHardy [Fri, 4 Sep 2009 06:41:20 +0000 (06:41 +0000)]
net_sched: move dev_graft_qdisc() to sch_generic.c

It will be used in a following patch by the multiqueue qdisc.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet_sched: reintroduce dev->qdisc for use by sch_api
Patrick McHardy [Fri, 4 Sep 2009 06:41:18 +0000 (06:41 +0000)]
net_sched: reintroduce dev->qdisc for use by sch_api

Currently the multiqueue integration with the qdisc API suffers from
a few problems:

- with multiple queues, all root qdiscs use the same handle. This means
  they can't be exposed to userspace in a backwards compatible fashion.

- all API operations always refer to queue number 0. Newly created
  qdiscs are automatically shared between all queues, its not possible
  to address individual queues or restore multiqueue behaviour once a
  shared qdisc has been attached.

- Dumps only contain the root qdisc of queue 0, in case of non-shared
  qdiscs this means the statistics are incomplete.

This patch reintroduces dev->qdisc, which points to the (single) root qdisc
from userspace's point of view. Currently it either points to the first
(non-shared) default qdisc, or a qdisc shared between all queues. The
following patches will introduce a classful dummy qdisc, which will be used
as root qdisc and contain the per-queue qdiscs as children.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet_sched: remove some unnecessary checks in classful schedulers
Patrick McHardy [Fri, 4 Sep 2009 06:41:17 +0000 (06:41 +0000)]
net_sched: remove some unnecessary checks in classful schedulers

The class argument to the ->graft(), ->leaf(), ->dump(), ->dump_stats() all
originate from either ->get() or ->walk() and are always valid.

Remove unnecessary checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet_sched: make cls_ops->change and cls_ops->delete optional
Patrick McHardy [Fri, 4 Sep 2009 06:41:16 +0000 (06:41 +0000)]
net_sched: make cls_ops->change and cls_ops->delete optional

Some schedulers don't support creating, changing or deleting classes.
Make the respective callbacks optionally and consistently return
-EOPNOTSUPP for unsupported operations, instead of currently either
-EOPNOTSUPP, -ENOSYS or no error.

In case of sch_prio and sch_multiq, the removed operations additionally
checked for an invalid class. This is not necessary since the class
argument can only orginate from ->get() or in case of ->change is 0
for creation of new classes, in which case ->change() incorrectly
returned -ENOENT.

As a side-effect, this patch fixes a possible (root-only) NULL pointer
function call in sch_ingress, which didn't implement a so far mandatory
->delete() operation.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet_sched: make cls_ops->tcf_chain() optional
Patrick McHardy [Fri, 4 Sep 2009 06:41:15 +0000 (06:41 +0000)]
net_sched: make cls_ops->tcf_chain() optional

Some qdiscs don't support attaching filters. Handle this centrally in
cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet_sched: fix class grafting errno codes
Patrick McHardy [Fri, 4 Sep 2009 06:41:13 +0000 (06:41 +0000)]
net_sched: fix class grafting errno codes

If the parent qdisc doesn't support classes, use EOPNOTSUPP.
If the parent class doesn't exist, use ENOENT. Currently EINVAL
is returned in both cases.

Additionally check whether grafting is supported and remove a now
unnecessary graft function from sch_ingress.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetlink: silence compiler warning
Brian Haley [Sat, 5 Sep 2009 03:36:52 +0000 (20:36 -0700)]
netlink: silence compiler warning

  CC      net/netlink/genetlink.o
net/netlink/genetlink.c: In function ‘genl_register_mc_group’:
net/netlink/genetlink.c:139: warning: ‘err’ may be used uninitialized in this function

From following the code 'err' is initialized, but set it to zero to
silence the warning.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosctp: Catch bogus stream sequence numbers
Vlad Yasevich [Fri, 4 Sep 2009 22:21:03 +0000 (18:21 -0400)]
sctp: Catch bogus stream sequence numbers

Since our TSN map is capable of holding at most a 4K chunk gap,
there is no way that during this gap, a stream sequence number
(unsigned short) can wrap such that the new number is smaller
then the next expected one.  If such a case is encountered,
this is a protocol violation.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: remove dup code in net/sctp/output.c
Wei Yongjun [Fri, 4 Sep 2009 06:34:06 +0000 (14:34 +0800)]
sctp: remove dup code in net/sctp/output.c

Use sctp_packet_reset() instead of dup code.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: turn flags in 'struct sctp_association' into bit fields
Wei Yongjun [Fri, 4 Sep 2009 06:33:19 +0000 (14:33 +0800)]
sctp: turn flags in 'struct sctp_association' into bit fields

This shrinks the size of struct sctp_association a little.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Sysctl configuration for IPv4 Address Scoping
Bhaskar Dutta [Thu, 3 Sep 2009 11:55:47 +0000 (17:25 +0530)]
sctp: Sysctl configuration for IPv4 Address Scoping

This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.

In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.

This sysctl configuration allows SCTP to function in such
unconventional networking environments.

Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack

Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Get rid of an extra routing lookup when adding a transport.
Vlad Yasevich [Fri, 4 Sep 2009 22:21:01 +0000 (18:21 -0400)]
sctp: Get rid of an extra routing lookup when adding a transport.

We used to perform 2 routing lookups for a new transport: one
just for path mtu detection, and one to actually route to destination
and path mtu update when sending a packet.  There is no point in doing
both of them, especially since the first one just for path mtu doesn't
take into account source address and sometimes gives the wrong route,
causing path mtu updates anyway.

We now do just the one call to do both route to destination and get
path mtu updates.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Turn flags in 'sctp_packet' into bit fields
Vlad Yasevich [Fri, 4 Sep 2009 22:21:01 +0000 (18:21 -0400)]
sctp: Turn flags in 'sctp_packet' into bit fields

This shrinks the size of sctp_packet a little.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Correctly track if AUTH has been bundled.
Vlad Yasevich [Fri, 4 Sep 2009 22:21:00 +0000 (18:21 -0400)]
sctp: Correctly track if AUTH has been bundled.

We currently track if AUTH has been bundled using the 'auth'
pointer to the chunk.  However, AUTH is disallowed after DATA
is already in the packet, so we need to instead use the
'has_auth' field.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: fix to reset packet information after packet transmit
Wei Yongjun [Wed, 2 Sep 2009 05:05:33 +0000 (13:05 +0800)]
sctp: fix to reset packet information after packet transmit

The packet information does not reset after packet transmit, this
may cause some problems such as following DATA chunk be sent without
AUTH chunk, even if the authentication of DATA chunk has been
requested by the peer.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Failover transmitted list on transport delete
Vlad Yasevich [Fri, 4 Sep 2009 22:21:00 +0000 (18:21 -0400)]
sctp: Failover transmitted list on transport delete

Add-IP feature allows users to delete an active transport.  If that
transport has chunks in flight, those chunks need to be moved to another
transport or association may get into unrecoverable state.

Reported-by: Rafael Laufer <rlaufer@cisco.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Fix SCTP_MAXSEG socket option to comply to spec.
Vlad Yasevich [Fri, 4 Sep 2009 22:21:00 +0000 (18:21 -0400)]
sctp: Fix SCTP_MAXSEG socket option to comply to spec.

We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association.  Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit.  Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.

Now, we store the user 'maxseg' value along with the computed
'frag_point'.  We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Don't do NAGLE delay on large writes that were fragmented small
Vlad Yasevich [Fri, 4 Sep 2009 22:20:59 +0000 (18:20 -0400)]
sctp: Don't do NAGLE delay on large writes that were fragmented small

SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller then MTU.  Since we are doing large writes, we might
as well send the last portion now instead of waiting untill the next
large write happens.  The small portion will be sent as is regardless,
so it's better to not delay it.

This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Nagle delay should be based on path mtu
Vlad Yasevich [Fri, 4 Sep 2009 22:20:59 +0000 (18:20 -0400)]
sctp: Nagle delay should be based on path mtu

The decision to delay due to Nagle should be based on the path mtu
and future packet size.  We currently incorrectly base it on
'frag_point' which is the SCTP DATA segment size, and also we do
not count DATA chunk header overhead in the computation.  This
actuall allows situations where a user can set low 'frag_point',
and then send small messages without delay.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.
Vlad Yasevich [Fri, 4 Sep 2009 22:20:59 +0000 (18:20 -0400)]
sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.

We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
This results in an hung association if the remote only uses
SHUTDOWNs (which it's allowed to do) to acknowlege DATA when
closing.  The reason for that is that we simply honor the a_rwnd
from the sack, but since we faked it to be 0, we enter 0-window
probing.  The fix is to use the peers old rwnd and add our flight
size to it.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: drop a_rwnd to 0 when receive buffer overflows.
Vlad Yasevich [Fri, 4 Sep 2009 22:20:59 +0000 (18:20 -0400)]
sctp: drop a_rwnd to 0 when receive buffer overflows.

SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages.  To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion.  When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time.  This worked well in testing and under stress produced
rather even recovery.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Clear fast_recovery on the transport when T3 timer expires.
Vlad Yasevich [Fri, 4 Sep 2009 22:20:58 +0000 (18:20 -0400)]
sctp: Clear fast_recovery on the transport when T3 timer expires.

If T3 timer expires, we are retransmitting data due to timeout any
any fast recovery is null and void.  We can clear the fast recovery
flag.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Fix error count increments that were results of HEARTBEATS
Vlad Yasevich [Wed, 26 Aug 2009 13:36:25 +0000 (09:36 -0400)]
sctp: Fix error count increments that were results of HEARTBEATS

SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
errors agains a given transport or endpoint.  As such, we
should increment the error counts for only for unacknowledged
HB, otherwise we detect failure too soon.  This goes for both
the overall error count and the path error count.

Now, there is a difference in how the detection is done
between the two.  The path error detection is done after
the increment, so to detect it properly, we actually need
to exceed the path threshold.  The overall error detection
is done _BEFORE_ the increment.  Thus to detect the failure,
it's enough for the error count to match the threshold.
This is why all the state functions use '>=' to detect failure,
while path detection uses '>'.

Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first
proposed patches to fix this issue and made me re-read the spec
and the code to figure out how this cruft really works.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: use proc_create()
Alexey Dobriyan [Sun, 23 Aug 2009 19:11:36 +0000 (23:11 +0400)]
sctp: use proc_create()

create_proc_entry() is deprecated (not formally, though).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: fix check the chunk length of received HEARTBEAT-ACK chunk
Wei Yongjun [Sat, 22 Aug 2009 03:27:37 +0000 (11:27 +0800)]
sctp: fix check the chunk length of received HEARTBEAT-ACK chunk

The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK
that contains the Heartbeat Information field copied from the
received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk
must have a length of:
  sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t)

A badly formatted HB-ACK chunk, it is possible that we may access
invalid memory.  We should really make sure that the chunk format
is what we expect, before attempting to touch the data.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: drop SHUTDOWN chunk if the TSN is less than the CTSN
Wei Yongjun [Sat, 22 Aug 2009 03:24:00 +0000 (11:24 +0800)]
sctp: drop SHUTDOWN chunk if the TSN is less than the CTSN

If Cumulative TSN Ack field of SHUTDOWN chunk is less than the
Cumulative TSN Ack Point then drop the SHUTDOWN chunk.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Send user messages to the lower layer as one
Vlad Yasevich [Mon, 10 Aug 2009 17:51:03 +0000 (13:51 -0400)]
sctp: Send user messages to the lower layer as one

Currenlty, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself.  This means
that for each fragment we go all the way down the stack
and back up.  This also discourages bundling of multiple
fragments when they can fit into a sigle packet (ex: due
to user setting a low fragmentation threashold).

We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine.  The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Try to encourage SACK bundling with DATA.
Vlad Yasevich [Fri, 7 Aug 2009 17:23:28 +0000 (13:23 -0400)]
sctp: Try to encourage SACK bundling with DATA.

If the association has a SACK timer pending and now DATA queued
to be send, we'll try to bundle the SACK with the next application send.
As such, try encourage bundling by accounting for SACK in the size
of the first chunk fragment.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Generate SACKs when actually sending outbound DATA
Vlad Yasevich [Fri, 7 Aug 2009 14:43:07 +0000 (10:43 -0400)]
sctp: Generate SACKs when actually sending outbound DATA

We are now trying to bundle SACKs when we have outbound
DATA to send.  However, there are situations where this
outbound DATA will not be sent (due to congestion or
available window).  In such cases it's ok to wait for the
timer to expire.  This patch refactors the sending code
so that betfore attempting to bundle the SACK we check
to see if the DATA will actually be transmitted.

Based on eirlier works for Doug Graham <dgraham@nortel.com> and
Wei Youngjun <yjwei@cn.fujitsu.com>.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Fix data segmentation with small frag_size
Vlad Yasevich [Fri, 4 Sep 2009 22:20:56 +0000 (18:20 -0400)]
sctp: Fix data segmentation with small frag_size

Since an application may specify the maximum SCTP fragment size
that all data should be fragmented to, we need to fix how
we do segmentation.   Right now, if a user specifies a small
fragment size, the segment size can go negative in the presence
of AUTH or COOKIE_ECHO bundling.

What we need to do is track the largest possbile DATA chunk that
can fit into the mtu.  Then if the fragment size specified is
bigger then this maximum length, we'll shrink it down.  Otherwise,
we just use the smaller segment size without changing it further.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Disallow new connection on a closing socket
Vlad Yasevich [Thu, 30 Jul 2009 22:08:28 +0000 (18:08 -0400)]
sctp: Disallow new connection on a closing socket

If a socket has a lot of association that are in the process of
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down.  If this was a result of a close() call, there will be no socket
and will cause a memory leak.  We'll prevent this by setting the
socket state to CLOSING and disallow new associations when in this state.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Fix piggybacked ACKs
Doug Graham [Wed, 29 Jul 2009 16:05:57 +0000 (12:05 -0400)]
sctp: Fix piggybacked ACKs

This patch corrects the conditions under which a SACK will be piggybacked
on a DATA packet.  The previous condition was incorrect due to a
misinterpretation of RFC 4960 and/or RFC 2960.  Specifically, the
following paragraph from section 6.2 had not been implemented correctly:

   Before an endpoint transmits a DATA chunk, if any received DATA
   chunks have not been acknowledged (e.g., due to delayed ack), the
   sender should create a SACK and bundle it with the outbound DATA
   chunk, as long as the size of the final SCTP packet does not exceed
   the current MTU.  See Section 6.2.

When about to send a DATA chunk, the code now checks to see if the SACK
timer is running.  If it is, we know we have a SACK to send to the
peer, so we append the SACK (assuming available space in the packet)
and turn off the timer.  For a simple request-response scenario, this
will result in the SACK being bundled with the response, meaning the
the SACK is received quickly by the client, and also meaning that no
separate SACK packet needs to be sent by the server to acknowledge the
request.  Prior to this patch, a separate SACK packet would have been
sent by the server SCTP only after its delayed-ACK timer had expired
(usually 200ms).  This is wasteful of bandwidth, and can also have a
major negative impact on performance due the interaction of delayed ACKs
with the Nagle algorithm.

Signed-off-by: Doug Graham <dgraham@nortel.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: remove unused union (sctp_cmsg_data_t) definition
Rami Rosen [Thu, 30 Jul 2009 06:38:43 +0000 (09:38 +0300)]
sctp: remove unused union (sctp_cmsg_data_t) definition

This patch removes an unused union definition (sctp_cmsg_data_t)
from include/net/sctp/user.h.

Signed-off-by: Rami Rosen <rosenrami@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: release cached route when the transport goes down.
Vlad Yasevich [Tue, 23 Jun 2009 15:28:05 +0000 (11:28 -0400)]
sctp: release cached route when the transport goes down.

When the sctp transport is marked down, we can release the
cached route and force a new lookup when attempting to use
this transport for anything.  This way, if a better route
or source address is available, we'll try to use it.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: update the route for non-active transports after addresses are added
Wei Yongjun [Tue, 16 Jun 2009 02:07:23 +0000 (10:07 +0800)]
sctp: update the route for non-active transports after addresses are added

Update the route and saddr entries for the non-active transports as some
of the added addresses can be used as better source addresses, or may
be there is a better route.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: check the unrecognized ASCONF parameter before access it
Wei Yongjun [Tue, 16 Jun 2009 06:48:24 +0000 (14:48 +0800)]
sctp: check the unrecognized ASCONF parameter before access it

This patch fix to check the unrecognized ASCONF parameter before
access it.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: avoid overwrite the return value of sctp_process_asconf_ack()
Wei Yongjun [Tue, 16 Jun 2009 06:47:30 +0000 (14:47 +0800)]
sctp: avoid overwrite the return value of sctp_process_asconf_ack()

The return value of sctp_process_asconf_ack() may be
overwritten while process parameters with no error.
This patch fixed the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agonet: Fix a build break because of a typo in drivers/net/3c503.c
Sachin Sant [Fri, 4 Sep 2009 10:41:07 +0000 (03:41 -0700)]
net: Fix a build break because of a typo in drivers/net/3c503.c

Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocan: sja1000: legacy SJA1000 ISA bus driver
Wolfgang Grandegger [Tue, 1 Sep 2009 05:37:33 +0000 (05:37 +0000)]
can: sja1000: legacy SJA1000 ISA bus driver

This patch adds support for legacy SJA1000 CAN controllers on the ISA
or PC-104 bus. The I/O port or memory address and the IRQ number must
be specified via module parameters:

  insmod sja1000_isa.ko port=0x310,0x380 irq=7,11

for ISA devices using I/O ports or:

  insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11

for memory mapped ISA devices.

Indirect access via address and data port is supported as well:

  insmod sja1000_isa.ko port=0x310,0x380 indirect=1 irq=7,11

Here is a full list of the supported module parameters:

  port:I/O port number (array of ulong)
  mem:I/O memory address (array of ulong)
  indirect:Indirect access via address and data port (array of byte)
  irq:IRQ number (array of int)
  clk:External oscillator clock frequency (default=16000000 [16 MHz])
      (array of int)
  cdr:Clock divider register (default=0x48 [CDR_CBP | CDR_CLK_OFF])
      (array of byte)
  ocr:Output clock register (default=0x18 [OCR_TX0_PUSHPULL])
      (array of byte)

Note: for clk, cdr, ocr, the first argument re-defines the default
for all other devices, e.g.:

 insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 clk=24000000

is equivalent to

 insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 \
                       clk=24000000,24000000

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Tested-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocan: sja1000: fix network statistics update
Wolfgang Grandegger [Tue, 1 Sep 2009 05:29:41 +0000 (05:29 +0000)]
can: sja1000: fix network statistics update

The member "tx_bytes" of "struct net_device_stats" should be
incremented when the interrupt is done and an "arbitration
lost error" is a TX error and the statistics should be updated
accordingly.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocan: add can_free_echo_skb() for upcoming drivers
Wolfgang Grandegger [Tue, 1 Sep 2009 05:26:12 +0000 (05:26 +0000)]
can: add can_free_echo_skb() for upcoming drivers

This patch adds the function can_free_echo_skb to the CAN
device interface to allow upcoming drivers to release echo
skb's in case of error.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoWAN: dscc4: Fix warning pointing out a bug.
David S. Miller [Fri, 4 Sep 2009 04:34:39 +0000 (21:34 -0700)]
WAN: dscc4: Fix warning pointing out a bug.

Noticed by Stephen Rothwell:

Today's linux-next build (x86_64 allmodconfig gcc-4.4.0)
produced this warning:

drivers/net/wan/dscc4.c: In function 'dscc4_rx_skb':
drivers/net/wan/dscc4.c:670: warning: suggest parentheses around comparison in operand of '|'

which actually points out a bug, I think.  It is doing
(x & (y | z)) != y | z
when it probably means
(x & (y | z)) != (y | z)

Introduced by commit 5de3fcab91b0e1809eec030355d15801daf25083
("WAN: bit and/or confusion").

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv6: Fix tcp_v6_send_response(): it didn't set skb transport header
Cosmin Ratiu [Fri, 4 Sep 2009 03:44:38 +0000 (20:44 -0700)]
ipv6: Fix tcp_v6_send_response(): it didn't set skb transport header

Here is a patch which fixes an issue observed when using TCP over IPv6
and AH from IPsec.

When a connection gets closed the 4-way method and the last ACK from
the server gets dropped, the subsequent FINs from the client do not
get ACKed because tcp_v6_send_response does not set the transport
header pointer. This causes ah6_output to try to allocate a lot of
memory, which typically fails, so the ACKs never make it out of the
stack.

I have reproduced the problem on kernel 2.6.7, but after looking at
the latest kernel it seems the problem is still there.

Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: organize device initialization/deinit into separate functions
Scott Feldman [Thu, 3 Sep 2009 17:02:45 +0000 (17:02 +0000)]
enic: organize device initialization/deinit into separate functions

To unclutter probe() a little bit, put all device initialization code
in one spot and device deinit code in another spot.  Also remove unused
rq->buf_index variable/func.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: bug fix: check for zero port MTU before posting warning
Scott Feldman [Thu, 3 Sep 2009 17:02:40 +0000 (17:02 +0000)]
enic: bug fix: check for zero port MTU before posting warning

Nic firmware can return zero for port MTU, so check for non-zero value
before checking for change in port MTU.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: changes to driver/firmware interface
Scott Feldman [Thu, 3 Sep 2009 17:02:35 +0000 (17:02 +0000)]
enic: changes to driver/firmware interface

Deprecate some old APIa; change arguments to stats dump all API; add new
interrupt assert API

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: bug fix: enable VLAN filtering
Scott Feldman [Thu, 3 Sep 2009 17:02:29 +0000 (17:02 +0000)]
enic: bug fix: enable VLAN filtering

Bug fix: enable VLAN filtering

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: provision for multiple Rx/Tx queues; prepare for RSS support
Scott Feldman [Thu, 3 Sep 2009 17:02:24 +0000 (17:02 +0000)]
enic: provision for multiple Rx/Tx queues; prepare for RSS support

Provision for multiple Rx/Tx queues.  Max of 8 WQs and 8 RQs.  Max for
completion queue is 8+8=16 and max for interrupt resources is 8+8+2.

Add driver/firmware interface for setting up RSS secret key and indirection
table.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: bug fix: included MAC drops in rx_dropped netstat
Scott Feldman [Thu, 3 Sep 2009 17:02:19 +0000 (17:02 +0000)]
enic: bug fix: included MAC drops in rx_dropped netstat

Bug fix: included MAC drops in rx_dropped netstat.  Also track Rx trunctations
stat at the MAC

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: bug fix: protect fw call i/f with spinlock
Scott Feldman [Thu, 3 Sep 2009 17:02:14 +0000 (17:02 +0000)]
enic: bug fix: protect fw call i/f with spinlock

Some driver -> nic firmware calls weren't guarded with a spinlock, exposing
the call i/f to a race between two threads

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: use netdev_alloc_skb
Scott Feldman [Thu, 3 Sep 2009 17:02:08 +0000 (17:02 +0000)]
enic: use netdev_alloc_skb

Use netdev_alloc_skb rather than dev_alloc_skb

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: bug fix: split TSO fragments larger than 16K into multiple descs
Scott Feldman [Thu, 3 Sep 2009 17:02:03 +0000 (17:02 +0000)]
enic: bug fix: split TSO fragments larger than 16K into multiple descs

enic WQ desc supports a maximum 16K buf size, so split any send fragments
larger than 16K into several descs.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: workaround A0 erratum
Scott Feldman [Thu, 3 Sep 2009 17:01:58 +0000 (17:01 +0000)]
enic: workaround A0 erratum

A0 revision ASIC has an erratum on the RQ desc cache on chip where the
cache can become corrupted causing pkt buf writes to wrong locations.  The s/w
workaround is to post a dummy RQ desc in the ring every 32 descs, causing a
flush of the cache.  A0 parts are not production, but there are enough of
these parts in the wild in test setups to warrant including workaround.  A1
revision ASIC parts fix erratum.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: add support for multiple BARs
Scott Feldman [Thu, 3 Sep 2009 17:01:53 +0000 (17:01 +0000)]
enic: add support for multiple BARs

Nic firmware can place resources (queues, intrs, etc) on multiple BARs, so
allow driver to discover/map resources beyond BAR0.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agovlan: adds drops accounting
Eric Dumazet [Thu, 3 Sep 2009 00:39:16 +0000 (00:39 +0000)]
vlan: adds drops accounting

Its hard to tell if vlans are dropping frames, since
every frame given to vlan_???_start_xmit() functions
is accounted as fully transmitted by lower device.

We can test dev_queue_xmit() return values to
properly account for dropped frames.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomacvlan: add multiqueue capability
Eric Dumazet [Thu, 3 Sep 2009 00:11:45 +0000 (00:11 +0000)]
macvlan: add multiqueue capability

macvlan devices are currently not multi-queue capable.

We can do that defining rtnl_link_ops method,
get_tx_queues(), called from rtnl_create_link()

This new method gets num_tx_queues/real_num_tx_queues
from lower device.

macvlan_get_tx_queues() is a copy of vlan_get_tx_queues().

Because macvlan_start_xmit() has to update netdev_queue
stats only (and not dev->stats), I chose to change
tx_errors/tx_aborted_errors accounting to tx_dropped,
since netdev_queue structure doesnt define tx_errors /
tx_aborted_errors.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: Convert MDIO ioctl implementation to use struct mii_ioctl_data
Ben Hutchings [Thu, 3 Sep 2009 10:41:17 +0000 (10:41 +0000)]
netdev: Convert MDIO ioctl implementation to use struct mii_ioctl_data

A few drivers still access the arguments to MDIO ioctls as an array of
u16.  Convert them to use struct mii_ioctl_data.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: Remove redundant checks for CAP_NET_ADMIN in MDIO implementations
Ben Hutchings [Thu, 3 Sep 2009 10:39:43 +0000 (10:39 +0000)]
netdev: Remove redundant checks for CAP_NET_ADMIN in MDIO implementations

dev_ioctl() already checks capable(CAP_NET_ADMIN) before calling the
driver's implementation of MDIO ioctls.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: Remove SIOCDEVPRIVATE aliases for MDIO ioctls
Ben Hutchings [Thu, 3 Sep 2009 10:38:33 +0000 (10:38 +0000)]
netdev: Remove SIOCDEVPRIVATE aliases for MDIO ioctls

The standard MDIO ioctl numbers are well-established and these should
no longer be needed.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosky2: only enable Vaux if capable of wakeup
Stephen Hemminger [Thu, 3 Sep 2009 06:16:25 +0000 (06:16 +0000)]
sky2: only enable Vaux if capable of wakeup

While perusing vendor driver, I saw that it did not enable the Vaux
power unless device was able to wake from lan for D3cold.
This might help for Rene's power issue.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetxen: fix infinite loop on dma mapping failure
Dhananjay Phadke [Thu, 3 Sep 2009 13:10:55 +0000 (13:10 +0000)]
netxen: fix infinite loop on dma mapping failure

Fix a perpetual while() loop in unwinding partial
mapped tx skb on dma mapping failure.

Reported-by: "Juha Leppanen" <juha_motorsportcom@luukku.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetxen: remove duplicate napi_add
Dhananjay Phadke [Thu, 3 Sep 2009 13:10:54 +0000 (13:10 +0000)]
netxen: remove duplicate napi_add

Remove duplicate calls to netxen_napi_add().

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetxen: fix lro buffer allocation
Dhananjay Phadke [Thu, 3 Sep 2009 13:10:53 +0000 (13:10 +0000)]
netxen: fix lro buffer allocation

Alloc 12k skbuffs so that firmware can aggregate more
packets into one buffer. This doesn't raise memory
consumption since 9k skbs use 16k slab cache anyway.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: Add support for using FCoE DDP in 82599 as FCoE targets
Yi Zou [Thu, 3 Sep 2009 14:56:31 +0000 (14:56 +0000)]
ixgbe: Add support for using FCoE DDP in 82599 as FCoE targets

The FCoE DDP in 82599 can be used for both FCoE initiator as well as FCoE
target, depending on the indication of the exchange being the responder or
originator in the F_CTL (frame control) field in the encapsulated Fiber
Channel frame header (T10 Spec., FC-FS). For the initiator, OX_ID is used
for FCoE DDP, where for the target RX_ID is used for FCoE DDP.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: Distribute transmission of FCoE traffic in 82599
Yi Zou [Thu, 3 Sep 2009 14:56:10 +0000 (14:56 +0000)]
ixgbe: Distribute transmission of FCoE traffic in 82599

This adds a simple selection of a FCoE tx queue based on the current cpu id to
distribute transmission of FCoE traffic evenly among multiple FCoE transmit
queues.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: Add support for multiple Tx queues for FCoE in 82599
Yi Zou [Thu, 3 Sep 2009 14:55:50 +0000 (14:55 +0000)]
ixgbe: Add support for multiple Tx queues for FCoE in 82599

This patch adds support for multiple transmit queues to the Fiber Channel
over Ethernet (FCoE) feature found in 82599. Currently, FCoE has multiple
Rx queues available, along with a redirection table, that helps distribute
the I/O load across multiple CPUs based on the FC exchange ID. To make
this the most effective, we need to provide the same layout of transmit
queues to match receive.

Particularly, when Data Center Bridging (DCB) is enabled, the designated
traffic class for FCoE can have dedicated queues for just FCoE traffic,
while not affecting any other type of traffic flow.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoigb: set vf rlpml wasn't taking vlan tag into account
Alexander Duyck [Thu, 3 Sep 2009 14:49:33 +0000 (14:49 +0000)]
igb: set vf rlpml wasn't taking vlan tag into account

This patch updates things so that vlan tags are taken into account when
setting the receive large packet maximum length.  This allows the VF driver
to correctly receive full sized frames when vlans are enabled.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoigb: only disable/enable interrupt bits for igb physical function
Alexander Duyck [Thu, 3 Sep 2009 14:49:15 +0000 (14:49 +0000)]
igb: only disable/enable interrupt bits for igb physical function

The igb_irq_disable/enable calls were causing virtual functions associated
with the igb physical function to have their interrupts disabled.  In order
to prevent this from occuring we should only clear/set the bits related to
the physical function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoigb: add support for set_rx_mode netdevice operation
Alexander Duyck [Thu, 3 Sep 2009 14:48:56 +0000 (14:48 +0000)]
igb: add support for set_rx_mode netdevice operation

This patch adds support for the set_rx_mode netdevice operation so that igb
can better support multiple unicast addresses.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: Remove debugging code
Eric Dumazet [Thu, 3 Sep 2009 12:17:20 +0000 (05:17 -0700)]
net: Remove debugging code

Remove a debugging aid I accidently left in previous 'cleanup' patch

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agovlan: enable multiqueue xmits
Eric Dumazet [Thu, 3 Sep 2009 09:19:58 +0000 (02:19 -0700)]
vlan: enable multiqueue xmits

vlan_dev_hard_start_xmit() & vlan_dev_hwaccel_hard_start_xmit()
select txqueue number 0, instead of using index provided by
skb_get_queue_mapping().

This is not correct after commit 2e59af3dcbdf11635c03f
[vlan: multiqueue vlan device] because
txq->tx_packets  & txq->tx_bytes changes are performed on
a single location, and not the right locking.

Fix is to take the appropriate struct netdev_queue pointer

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: net/core/dev.c cleanups
Eric Dumazet [Thu, 3 Sep 2009 08:29:39 +0000 (01:29 -0700)]
net: net/core/dev.c cleanups

Pure style cleanup patch before surgery :)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoatm/br2684: netif_stop_queue() when atm device busy and netif_wake_queue() when we...
Karl Hiramoto [Thu, 3 Sep 2009 06:26:39 +0000 (23:26 -0700)]
atm/br2684: netif_stop_queue() when atm device busy and netif_wake_queue() when we can send packets again.

This patch removes the call to dev_kfree_skb() when the atm device is busy.
Calling dev_kfree_skb() causes heavy packet loss then the device is under
heavy load, the more correct behavior should be to stop the upper layers,
then when the lower device can queue packets again wake the upper layers.

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agofec: don't enable irqs in hard irq context
Uwe Kleine-König [Tue, 1 Sep 2009 23:14:16 +0000 (23:14 +0000)]
fec: don't enable irqs in hard irq context

fec_enet_mii, fec_enet_rx and fec_enet_tx are both only called by
fec_enet_interrupt in interrupt context.  So they must not use
spin_lock_irq/spin_unlock_irq.

This fixes:
WARNING: at kernel/lockdep.c:2140 trace_hardirqs_on_caller+0x130/0x194()
...

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Matt Waddel <Matt.Waddel@freescale.com>
Cc: netdev@vger.kernel.org
Cc: Tim Sander <tim01@vlsi.informatik.tu-darmstadt.de>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agofec: fix recursive locking of mii_lock
Uwe Kleine-König [Tue, 1 Sep 2009 23:14:15 +0000 (23:14 +0000)]
fec: fix recursive locking of mii_lock

mii_discover_phy is only called by fec_enet_mii (via mip->mii_func).  So
&fep->mii_lock is already held and mii_discover_phy must not call
mii_queue which locks &fep->mii_lock, too.

This was noticed by lockdep:

=============================================
[ INFO: possible recursive locking detected ]
2.6.31-rc8-00038-g37d0892 #109
---------------------------------------------
swapper/1 is trying to acquire lock:
 (&fep->mii_lock){-.....}, at: [<c01569f8>] mii_queue+0x2c/0xcc

but task is already holding lock:
 (&fep->mii_lock){-.....}, at: [<c0156328>] fec_enet_interrupt+0x78/0x460

other info that might help us debug this:
2 locks held by swapper/1:
 #0:  (rtnl_mutex){+.+.+.}, at: [<c0183534>] rtnl_lock+0x18/0x20
 #1:  (&fep->mii_lock){-.....}, at: [<c0156328>] fec_enet_interrupt+0x78/0x460

stack backtrace:
Backtrace:
[<c00226fc>] (dump_backtrace+0x0/0x108) from [<c01eac14>] (dump_stack+0x18/0x1c)
 r6:c781d118 r5:c03e41d8 r4:00000001
[<c01eabfc>] (dump_stack+0x0/0x1c) from [<c005bae4>] (__lock_acquire+0x1a20/0x1a88)
[<c005a0c4>] (__lock_acquire+0x0/0x1a88) from [<c005bbac>] (lock_acquire+0x60/0x74)
[<c005bb4c>] (lock_acquire+0x0/0x74) from [<c01edda8>] (_spin_lock_irqsave+0x54/0x68)
 r7:60000093 r6:c01569f8 r5:c785e468 r4:00000000
[<c01edd54>] (_spin_lock_irqsave+0x0/0x68) from [<c01569f8>] (mii_queue+0x2c/0xcc)
 r7:c785e468 r6:c0156b24 r5:600a0000 r4:c785e000
[<c01569cc>] (mii_queue+0x0/0xcc) from [<c0156b78>] (mii_discover_phy+0x54/0xa8)
 r8:00000002 r7:00000032 r6:c785e000 r5:c785e360 r4:c785e000
[<c0156b24>] (mii_discover_phy+0x0/0xa8) from [<c0156354>] (fec_enet_interrupt+0xa4/0x460)
 r5:c785e360 r4:c077a170
[<c01562b0>] (fec_enet_interrupt+0x0/0x460) from [<c0066674>] (handle_IRQ_event+0x48/0x120)
[<c006662c>] (handle_IRQ_event+0x0/0x120) from [<c0068438>] (handle_level_irq+0x94/0x11c)
...

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Matt Waddel <Matt.Waddel@freescale.com>
Cc: netdev@vger.kernel.org
Cc: Tim Sander <tim01@vlsi.informatik.tu-darmstadt.de>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoNET: Fix possible corruption in bpqether driver
Ralf Baechle [Thu, 3 Sep 2009 06:09:29 +0000 (23:09 -0700)]
NET: Fix possible corruption in bpqether driver

The bpq ether driver is modifying the data art of the skb by first
dropping the KISS byte (a command byte for the radio) then prepending the
length + 4 of the remaining AX.25 packet to be transmitted as a little
endian 16-bit number.  If the high byte of the length has a different
value than the dropped KISS byte users of clones of the skb may observe
this as corruption.  This was observed with by running listen(8) -a which
uses a packet socket which clones transmit packets.  The corruption will
then typically be displayed for as a KISS "TX Delay" command for AX.25
packets in the range of 252..508 bytes or any other KISS command for
yet larger packets.

Fixed by using skb_cow to create a private copy should the skb be cloned.
Using skb_cow also allows us to cleanup the old logic to ensure sufficient
headroom in the skb.

While at it, replace a return of 0 from bpq_xmit with the proper constant
NETDEV_TX_OK which is now being used everywhere else in this function.

Affected: all 2.2, 2.4 and 2.6 kernels.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoWARNING: some request_irq() failures ignored in el2_open()
roel kluin [Tue, 1 Sep 2009 06:24:53 +0000 (06:24 +0000)]
WARNING: some request_irq() failures ignored in el2_open()

Request_irq() may fail in different ways, handle accordingly.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotcp: replace hard coded GFP_KERNEL with sk_allocation
Wu Fengguang [Thu, 3 Sep 2009 06:45:45 +0000 (23:45 -0700)]
tcp: replace hard coded GFP_KERNEL with sk_allocation

This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

mount_root => nfs_root_data => tcp_close => lock sk_lock =>
tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.

But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.

CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet/ethtool: Add support for the ethtool feature to flash firmware image from a speci...
Ajit Khaparde [Wed, 2 Sep 2009 17:02:55 +0000 (17:02 +0000)]
net/ethtool: Add support for the ethtool feature to flash firmware image from a specified file.

This patch adds support to flash a firmware image to a device using ethtool.
The driver gets the filename of the firmware image and flashes the image
using the request firmware path.

The region "on the chip" to be flashed can be specified by an option.
It is upto the device driver to enumerate the region number passed by ethtool,
to the region to be flashed.

The default behavior is to flash all the regions on the chip.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers: Kill now superfluous ->last_rx stores
Eric Dumazet [Mon, 31 Aug 2009 06:34:50 +0000 (06:34 +0000)]
drivers: Kill now superfluous ->last_rx stores

The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoip: Report qdisc packet drops
Eric Dumazet [Thu, 3 Sep 2009 01:05:33 +0000 (18:05 -0700)]
ip: Report qdisc packet drops

Christoph Lameter pointed out that packet drops at qdisc level where not
accounted in SNMP counters. Only if application sets IP_RECVERR, drops
are reported to user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but these are not needed to update system wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after an UDP tx flood
# netstat -s
...
IP:
    1487048 outgoing packets dropped
...
Udp:
...
    SndbufErrors: 1487048

send() syscalls, do however still return an OK status, to not
break applications.

Note : send() manual page explicitly says for -ENOBUFS error :

 "The output queue for a network interface was full.
  This generally indicates that the interface has stopped sending,
  but may be caused by transient congestion.
  (Normally, this does not occur in Linux. Packets are just silently
  dropped when a device queue overflows.) "

This is not true for IP_RECVERR enabled sockets : a send() syscall
that hit a qdisc drop returns an ENOBUFS error.

Many thanks to Christoph, David, and last but not least, Alexey !

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agovlan: multiqueue vlan device
Eric Dumazet [Thu, 3 Sep 2009 01:03:00 +0000 (18:03 -0700)]
vlan: multiqueue vlan device

vlan devices are currently not multi-queue capable.

We can do that with a new rtnl_link_ops method,
get_tx_queues(), called from rtnl_create_link()

This new method gets num_tx_queues/real_num_tx_queues
from real device.

register_vlan_device() is also handled.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: drop_monitor: make last_rx timestamp private
Neil Horman [Wed, 2 Sep 2009 21:37:45 +0000 (14:37 -0700)]
net: drop_monitor: make last_rx timestamp private

It was recently pointed out to me that the last_rx field of the
net_device structure wasn't updated regularly.  In fact only the
bonding driver really uses it currently.  Since the drop_monitor code
relies on the last_rx field to detect drops on recevie in hardware, We
need to find a more reliable way to rate limit our drop checks (so
that we don't check for drops on every frame recevied, which would be
inefficient.  This patch makes a last_rx timestamp that is private to
the drop monitor code and is updated for every device that we track.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
David S. Miller [Wed, 2 Sep 2009 21:18:09 +0000 (14:18 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next-2.6

14 years agoath9k: Reconfigure beacon timers after the scan is completed.
Vivek Natarajan [Wed, 2 Sep 2009 10:20:55 +0000 (15:50 +0530)]
ath9k: Reconfigure beacon timers after the scan is completed.

Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agocfg80211: fix looping soft lockup in find_ie()
Bob Copeland [Tue, 1 Sep 2009 22:12:11 +0000 (18:12 -0400)]
cfg80211: fix looping soft lockup in find_ie()

The find_ie() function uses a size_t for the len parameter, and
directly uses len as a loop variable.  If any received packets
are malformed, it is possible for the decrease of len to overflow,
and since the result is unsigned, the loop will not terminate.
Change it to a signed int so the loop conditional works for
negative values.

This fixes the following soft lockup:

[38573.102007] BUG: soft lockup - CPU#0 stuck for 61s! [phy0:2230]
[38573.102007] Modules linked in: aes_i586 aes_generic fuse af_packet ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state iptable_filter ip_tables x_tables acpi_cpufreq binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod kvm_intel kvm uinput i915 arc4 ecb drm snd_hda_codec_idt ath5k snd_hda_intel hid_apple mac80211 usbhid appletouch snd_hda_codec snd_pcm ath cfg80211 snd_timer i2c_algo_bit ohci1394 video snd processor ieee1394 rfkill ehci_hcd sg sky2 backlight snd_page_alloc uhci_hcd joydev output ac thermal button battery sr_mod applesmc cdrom input_polldev evdev unix [last unloaded: scsi_wait_scan]
[38573.102007] irq event stamp: 2547724535
[38573.102007] hardirqs last  enabled at (2547724534): [<c1002ffc>] restore_all_notrace+0x0/0x18
[38573.102007] hardirqs last disabled at (2547724535): [<c10038f4>] apic_timer_interrupt+0x28/0x34
[38573.102007] softirqs last  enabled at (92950144): [<c103ab48>] __do_softirq+0x108/0x210
[38573.102007] softirqs last disabled at (92950274): [<c1348e74>] _spin_lock_bh+0x14/0x80
[38573.102007]
[38573.102007] Pid: 2230, comm: phy0 Tainted: G        W  (2.6.31-rc7-wl #8) MacBook1,1
[38573.102007] EIP: 0060:[<f8ea2d50>] EFLAGS: 00010292 CPU: 0
[38573.102007] EIP is at cmp_ies+0x30/0x180 [cfg80211]
[38573.102007] EAX: 00000082 EBX: 00000000 ECX: ffffffc1 EDX: d8efd014
[38573.102007] ESI: ffffff7c EDI: 0000004d EBP: eee2dc50 ESP: eee2dc3c
[38573.102007]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[38573.102007] CR0: 8005003b CR2: d8efd014 CR3: 01694000 CR4: 000026d0
[38573.102007] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[38573.102007] DR6: ffff0ff0 DR7: 00000400
[38573.102007] Call Trace:
[38573.102007]  [<f8ea2f8d>] cmp_bss+0xed/0x100 [cfg80211]
[38573.102007]  [<f8ea33e4>] cfg80211_bss_update+0x84/0x410 [cfg80211]
[38573.102007]  [<f8ea3884>] cfg80211_inform_bss_frame+0x114/0x180 [cfg80211]
[38573.102007]  [<f97255ff>] ieee80211_bss_info_update+0x4f/0x180 [mac80211]
[38573.102007]  [<f972b118>] ieee80211_rx_bss_info+0x88/0xf0 [mac80211]
[38573.102007]  [<f9739297>] ? ieee802_11_parse_elems+0x27/0x30 [mac80211]
[38573.102007]  [<f972b224>] ieee80211_rx_mgmt_probe_resp+0xa4/0x1c0 [mac80211]
[38573.102007]  [<f972bc59>] ieee80211_sta_rx_queued_mgmt+0x919/0xc50 [mac80211]
[38573.102007]  [<c1009707>] ? sched_clock+0x27/0xa0
[38573.102007]  [<c1009707>] ? sched_clock+0x27/0xa0
[38573.102007]  [<c105ffd0>] ? mark_held_locks+0x60/0x80
[38573.102007]  [<c1348be5>] ? _spin_unlock_irqrestore+0x55/0x70
[38573.102007]  [<c134baa5>] ? sub_preempt_count+0x85/0xc0
[38573.102007]  [<c1348bce>] ? _spin_unlock_irqrestore+0x3e/0x70
[38573.102007]  [<c12c1c0f>] ? skb_dequeue+0x4f/0x70
[38573.102007]  [<f972c021>] ieee80211_sta_work+0x91/0xb80 [mac80211]
[38573.102007]  [<c1009707>] ? sched_clock+0x27/0xa0
[38573.102007]  [<c134baa5>] ? sub_preempt_count+0x85/0xc0
[38573.102007]  [<c10479af>] worker_thread+0x18f/0x320
[38573.102007]  [<c104794e>] ? worker_thread+0x12e/0x320
[38573.102007]  [<c1348be5>] ? _spin_unlock_irqrestore+0x55/0x70
[38573.102007]  [<f972bf90>] ? ieee80211_sta_work+0x0/0xb80 [mac80211]
[38573.102007]  [<c104cbb0>] ? autoremove_wake_function+0x0/0x50
[38573.102007]  [<c1047820>] ? worker_thread+0x0/0x320
[38573.102007]  [<c104c854>] kthread+0x84/0x90
[38573.102007]  [<c104c7d0>] ? kthread+0x0/0x90
[38573.102007]  [<c1003ab7>] kernel_thread_helper+0x7/0x10

Cc: stable@kernel.org
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agowireless: remove mac80211 rate selection extra menu
Luis R. Rodriguez [Tue, 1 Sep 2009 15:22:46 +0000 (08:22 -0700)]
wireless: remove mac80211 rate selection extra menu

We can just display this upon enabling mac80211 with an
'if MAC80211 != n' check.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agowireless: update reg debug kconfig entry
Luis R. Rodriguez [Tue, 1 Sep 2009 15:22:43 +0000 (08:22 -0700)]
wireless: update reg debug kconfig entry

Refer to the wireless wiki for more information.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>