Revision tags: v3.18-rc4 |
|
#
1d76c1d0 |
| 05-Nov-2014 |
David S. Miller <davem@davemloft.net> |
Merge branch 'gue-next'
Tom Herbert says:
====================
gue: Remote checksum offload
This patch set implements remote checksum offload for GUE, which is a mechanism that provides checksum offload of encapsulated packets using rudimentary offload capabilities found in most Network Interface Card (NIC) devices. The outer header checksum for UDP is enabled in packets and, with some additional meta information in the GUE header, a receiver is able to deduce the checksum to be set for an inner encapsulated packet. Effectively this offloads the computation of the inner checksum. Enabling the outer checksum in encapsulation has the additional advantage that it covers more of the packet than the inner checksum does, including the encapsulation headers.
Remote checksum offload is described in: http://tools.ietf.org/html/draft-herbert-remotecsumoffload-01
The GUE transmit and receive paths are modified to support the remote checksum offload option. The option contains a checksum offset and a checksum start, which are directly derived from the values set in the stack when doing CHECKSUM_PARTIAL. On receipt of the option, the operation is to calculate the packet checksum from "start" to the end of the packet (normally derived from checksum complete), and then set the resultant value at checksum "offset" (the checksum field has already been primed with the pseudo header). This emulates a NIC that implements NETIF_F_HW_CSUM.
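The receive-side operation described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel implementation: the helper names (`csum_add_bytes`, `csum_fold32`, `rco_resolve`) are hypothetical, both `start` and `offset` are taken as byte offsets from the beginning of the decapsulated packet, and the sum is computed directly over the bytes rather than derived from a checksum-complete value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Add buf[0..len) to sum, reading the bytes as big-endian 16-bit words. */
static uint32_t csum_add_bytes(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;	/* pad a trailing odd byte with zero */
	return sum;
}

/*
 * Hypothetical helper: resolve an offloaded checksum on a decapsulated packet.
 * Assumes the 16-bit field at 'offset' already holds the pseudo header sum,
 * as it would for CHECKSUM_PARTIAL. Returns 0 on success, -1 on bad offsets.
 */
static int rco_resolve(uint8_t *pkt, size_t len, size_t start, size_t offset)
{
	uint16_t csum;

	assert(pkt != NULL);
	if (start > offset || offset + 2 > len || (offset - start) % 2 != 0)
		return -1;

	/* Sum from "start" to the end of the packet, then complement. */
	csum = (uint16_t)~csum_fold32(csum_add_bytes(pkt + start, len - start, 0));

	/* Write the result at "offset" in network byte order. */
	pkt[offset] = (uint8_t)(csum >> 8);
	pkt[offset + 1] = (uint8_t)(csum & 0xff);
	return 0;
}
```

After `rco_resolve()` succeeds, summing the same region again together with the pseudo header sum folds to 0xffff, which is the usual check a receiver applies to a valid transport checksum.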
The primary purpose of this feature is to eliminate the cost of performing checksum calculation over a packet when encapsulating.
In this patch set:
- Move fou_build_header into fou.c and split it into a couple of functions
- Enable offloading of outer UDP checksum in encapsulation
- Change udp_offload to support remote checksum offload; this includes a new GSO type and ensuring that encapsulated layers (TCP) don't try to set a checksum covered by RCO
- TX support for RCO with GUE. This is configured through ip_tunnel and sets the option on transmit when the packet being encapsulated is CHECKSUM_PARTIAL
- RX support for RCO with GUE for normal and GRO paths. Includes resolving the offloaded checksum
v2: Address comments from davem: Move accounting for private option field in gue_encap_hlen to patch in which we add the remote checksum offload option.
Testing:
I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200 streams, comparing GUE with and without remote checksum offload (doing checksum-unnecessary to complete conversion in both cases). These were run on mlnx4 and bnx2x. Some mlnx4 results are below.
GRE/GUE
  TCP_STREAM
    IPv4, with remote checksum offload
      9.71% TX CPU utilization
      7.42% RX CPU utilization
      36380 Mbps
    IPv4, without remote checksum offload
      12.40% TX CPU utilization
      7.36% RX CPU utilization
      36591 Mbps
  TCP_RR
    IPv4, with remote checksum offload
      77.79% CPU utilization
      91/144/216 90/95/99% latencies
      1.95127e+06 tps
    IPv4, without remote checksum offload
      78.70% CPU utilization
      89/152/297 90/95/99% latencies
      1.95458e+06 tps

IPIP/GUE
  TCP_STREAM
    With remote checksum offload
      10.30% TX CPU utilization
      7.43% RX CPU utilization
      36486 Mbps
    Without remote checksum offload
      12.47% TX CPU utilization
      7.49% RX CPU utilization
      36694 Mbps
  TCP_RR
    With remote checksum offload
      77.80% CPU utilization
      87/153/270 90/95/99% latencies
      1.98735e+06 tps
    Without remote checksum offload
      77.98% CPU utilization
      87/150/287 90/95/99% latencies
      1.98737e+06 tps

SIT/GUE
  TCP_STREAM
    With remote checksum offload
      9.68% TX CPU utilization
      7.36% RX CPU utilization
      35971 Mbps
    Without remote checksum offload
      12.95% TX CPU utilization
      8.04% RX CPU utilization
      36177 Mbps
  TCP_RR
    With remote checksum offload
      79.32% CPU utilization
      94/158/295 90/95/99% latencies
      1.88842e+06 tps
    Without remote checksum offload
      80.23% CPU utilization
      94/149/226 90/95/99% latencies
      1.90338e+06 tps

VXLAN
  TCP_STREAM
    35.03% TX CPU utilization
    20.85% RX CPU utilization
    36230 Mbps
  TCP_RR
    77.36% CPU utilization
    84/146/270 90/95/99% latencies
    2.08063e+06 tps
We can also look at CPU time in csum_partial using perf (with bnx2x setup). For GRE with TCP_STREAM I see:
  With remote checksum offload
    0.33% TX
    1.81% RX
  Without remote checksum offload
    6.00% TX
    0.51% RX
I suspect the fact that time in csum_partial noticeably increases with remote checksum offload for RX is due to taking the cache miss on the encapsulated header in that function. By similar reasoning, if on the TX side the packet were not in cache (say we did a splice from a file whose data was never touched by the CPU) the CPU savings for TX would probably be more pronounced.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c1aa8347 |
| 04-Nov-2014 |
Tom Herbert <therbert@google.com> |
gue: Protocol constants for remote checksum offload
Define a private flag for remote checksum offload as well as a length for the option.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5024c33a |
| 04-Nov-2014 |
Tom Herbert <therbert@google.com> |
gue: Add infrastructure for flags and options
Add functions and basic definitions for processing standard flags, private flags, and control messages. This includes definitions to compute the length of optional fields corresponding to a set of flags. Flag validation is done in the validate_gue_flags function. This checks for unknown flags, and that the length of the optional fields is <= the length given by hlen in guehdr.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
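The two checks named in this commit message (reject unknown flags, and reject a flag set whose optional fields would not fit in the header length) can be sketched as follows. This is a minimal illustration, not the kernel's validate_gue_flags: every `EX_`/`ex_` name, the flag values, and the field sizes are assumptions chosen for the example, and `optlen` stands in for the option length that would be derived from hlen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only (not the kernel's definitions). */
#define EX_FLAG_PRIV		0x0001u		/* a private-flags field is present */
#define EX_KNOWN_FLAGS		EX_FLAG_PRIV
#define EX_LEN_PRIV		4u		/* bytes taken by the private-flags field */

#define EX_PFLAG_REMCSUM	0x80000000u	/* remote checksum option is present */
#define EX_KNOWN_PFLAGS		EX_PFLAG_REMCSUM
#define EX_PLEN_REMCSUM		4u		/* bytes taken by that option */

/* Bytes of optional fields implied by the standard flags. */
static size_t ex_flags_len(uint16_t flags)
{
	return (flags & EX_FLAG_PRIV) ? EX_LEN_PRIV : 0;
}

/* Bytes of optional fields implied by the private flags. */
static size_t ex_priv_flags_len(uint32_t pflags)
{
	return (pflags & EX_PFLAG_REMCSUM) ? EX_PLEN_REMCSUM : 0;
}

/*
 * Return 0 if the flags are acceptable, -1 otherwise.
 * 'opt' points at the optional fields after the base header and 'optlen'
 * is the number of option bytes the header claims to carry.
 */
static int ex_validate_flags(uint16_t flags, const uint8_t *opt, size_t optlen)
{
	size_t len;

	assert(opt != NULL || optlen == 0);

	if (flags & ~EX_KNOWN_FLAGS)
		return -1;			/* unknown standard flag */

	len = ex_flags_len(flags);
	if (len > optlen)
		return -1;			/* standard options do not fit */

	if (flags & EX_FLAG_PRIV) {
		/* The private-flags field is the last standard option (big-endian). */
		const uint8_t *p = opt + len - EX_LEN_PRIV;
		uint32_t pflags = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
				  ((uint32_t)p[2] << 8) | (uint32_t)p[3];

		if (pflags & ~EX_KNOWN_PFLAGS)
			return -1;		/* unknown private flag */

		len += ex_priv_flags_len(pflags);
		if (len > optlen)
			return -1;		/* private options do not fit */
	}
	return 0;
}
```

Checking the length before reading the private-flags field keeps the function from touching bytes beyond what the header declares.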
|
Revision tags: v3.18-rc3 |
|
#
d5432503 |
| 27-Oct-2014 |
Takashi Iwai <tiwai@suse.de> |
Merge tag 'asoc-v3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v3.18
A few small driver fixes for v3.18 plus the removal of the s6000 support since the relevant chip is no longer supported in mainline.
|
Revision tags: v3.18-rc2 |
|
#
93035286 |
| 22-Oct-2014 |
Takashi Iwai <tiwai@suse.de> |
Merge branch 'topic/enum-info-cleanup' into for-next
This is a series of patches that simply converts the plain info callback for enum ctl elements to snd_ctl_elem_info(). It also includes the extension of snd_ctl_elem_info() to catch unexpected string cut-off and to handle zero items.
|
#
1b62f19c |
| 21-Oct-2014 |
Mauro Carvalho Chehab <mchehab@osg.samsung.com> |
Merge tag 'v3.18-rc1' into v4l_for_linus
Linux 3.18-rc1
* tag 'v3.18-rc1': (9167 commits)
  Linux 3.18-rc1
  MAINTAINERS: corrected bcm2835 search
  Net: DSA: Fix checking for get_phy_flags function
  sparc64: Do not define thread fpregs save area as zero-length array.
  sparc64: Fix corrupted thread fault code.
  MAINTAINERS: Become the docs maintainer
  x86,kvm,vmx: Preserve CR4 across VM entry
  ipv6: fix a potential use after free in sit.c
  ipv6: fix a potential use after free in ip6_offload.c
  ipv4: fix a potential use after free in gre_offload.c
  tcp: fix build error if IPv6 is not enabled
  futex: Ensure get_futex_key_refs() always implies a barrier
  bna: fix skb->truesize underestimation
  net: dsa: add includes for ethtool and phy_fixed definitions
  openvswitch: Set flow-key members.
  netrom: use linux/uaccess.h
  dsa: Fix conversion from host device to mii bus
  tipc: fix bug in bundled buffer reception
  ipv6: introduce tcp_v6_iif()
  sfc: add support for skb->xmit_more
  ...
|
#
1ef24960 |
| 21-Oct-2014 |
Mauro Carvalho Chehab <mchehab@osg.samsung.com> |
Merge tag 'v3.18-rc1' into patchwork
Linux 3.18-rc1
* tag 'v3.18-rc1': (9526 commits)
  Linux 3.18-rc1
  MAINTAINERS: corrected bcm2835 search
  Net: DSA: Fix checking for get_phy_flags function
  sparc64: Do not define thread fpregs save area as zero-length array.
  sparc64: Fix corrupted thread fault code.
  MAINTAINERS: Become the docs maintainer
  x86,kvm,vmx: Preserve CR4 across VM entry
  ipv6: fix a potential use after free in sit.c
  ipv6: fix a potential use after free in ip6_offload.c
  ipv4: fix a potential use after free in gre_offload.c
  tcp: fix build error if IPv6 is not enabled
  futex: Ensure get_futex_key_refs() always implies a barrier
  bna: fix skb->truesize underestimation
  net: dsa: add includes for ethtool and phy_fixed definitions
  openvswitch: Set flow-key members.
  netrom: use linux/uaccess.h
  dsa: Fix conversion from host device to mii bus
  tipc: fix bug in bundled buffer reception
  ipv6: introduce tcp_v6_iif()
  sfc: add support for skb->xmit_more
  ...
|
#
a13926db |
| 21-Oct-2014 |
Chris Zankel <chris@zankel.net> |
Merge tag 'v3.18-rc1' into for_next
Linux 3.18-rc1
|
Revision tags: v3.18-rc1 |
|
#
35a9ad8a |
| 09-Oct-2014 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller: "Most notable changes in here:
1) By far the biggest accomplishment, thanks to a large range of contributors, is the addition of multi-send for transmit. This is the result of discussions back in Chicago, and the hard work of several individuals.
Now, when the ->ndo_start_xmit() method of a driver sees skb->xmit_more as true, it can choose to defer the doorbell telling the device to start processing the new TX queue entries.
skb->xmit_more means that the generic networking is guaranteed to call the driver immediately with another SKB to send.
There is logic added to the qdisc layer to dequeue multiple packets at a time, and the handling of mis-predicted offloads in software is now done with no locks held.
Finally, pktgen is extended to have a "burst" parameter that can be used to test a multi-send implementation.
Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4, virtio_net
Adding support is almost trivial, so expect more drivers to support this optimization soon.
I want to thank, in no particular or implied order, Jesper Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann, David Tat, Hannes Frederic Sowa, and Rusty Russell.
2) PTP and timestamping support in bnx2x, from Michal Kalderon.
3) Allow adjusting the rx_copybreak threshold for a driver via ethtool, and add rx_copybreak support to enic driver. From Govindarajulu Varadarajan.
4) Significant enhancements to the generic PHY layer and the bcm7xxx driver in particular (EEE support, auto power down, etc.) from Florian Fainelli.
5) Allow raw buffers to be used for flow dissection, allowing drivers to determine the optimal "linear pull" size for devices that DMA into pools of pages. The objective is to get exactly the necessary amount of headers into the linear SKB area pre-pulled, but no more. The new interface drivers use is eth_get_headlen(). From WANG Cong, with driver conversions (several had their own by-hand duplicated implementations) by Alexander Duyck and Eric Dumazet.
6) Support checksumming more smoothly and efficiently for encapsulations, and add "foo over UDP" facility. From Tom Herbert.
7) Add Broadcom SF2 switch driver to DSA layer, from Florian Fainelli.
8) eBPF now can load programs via a system call and has an extensive testsuite. Alexei Starovoitov and Daniel Borkmann.
9) Major overhaul of the packet scheduler to use RCU in several major areas such as the classifiers and rate estimators. From John Fastabend.
10) Add driver for Intel FM10000 Ethernet Switch, from Alexander Duyck.
11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric Dumazet.
12) Add Datacenter TCP congestion control algorithm support, from Florian Westphal.
13) Reorganize sk_buff so that __copy_skb_header() is significantly faster. From Eric Dumazet"
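The doorbell deferral described for skb->xmit_more in item 1 above can be illustrated with a small user-space model. This is a sketch under stated assumptions rather than real driver code: `struct ex_tx_ring`, `ex_ring_doorbell()`, and `ex_start_xmit()` are hypothetical names, and the "doorbell" is modelled as a counter instead of a device register write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a TX ring; not a kernel data structure. */
struct ex_tx_ring {
	unsigned int queued;	/* descriptors posted but not yet announced */
	unsigned int sent;	/* descriptors the device has been told about */
	unsigned int doorbells;	/* number of (expensive) doorbell writes */
};

/* Tell the device about everything queued so far. */
static void ex_ring_doorbell(struct ex_tx_ring *ring)
{
	ring->sent += ring->queued;
	ring->queued = 0;
	ring->doorbells++;
}

/*
 * Model of a transmit handler. 'xmit_more' means the stack promises to
 * call again immediately with another packet, so the doorbell can wait.
 */
static void ex_start_xmit(struct ex_tx_ring *ring, bool xmit_more)
{
	assert(ring != NULL);

	ring->queued++;			/* post the descriptor for this packet */
	if (!xmit_more)
		ex_ring_doorbell(ring);	/* last packet of the burst: notify now */
}
```

Sending a burst of four packets with xmit_more set on the first three results in a single doorbell for all four descriptors. A real driver would also have to ring the doorbell when its queue is stopped, which this model leaves out.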
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
|
Revision tags: v3.17 |
|
#
6106253e |
| 04-Oct-2014 |
David S. Miller <davem@davemloft.net> |
Merge branch 'gudp'
Tom Herbert says:
====================
net: Generic UDP Encapsulation
Generic UDP Encapsulation (GUE) is a UDP encapsulation protocol which encapsulates packets of various IP protocols. The GUE protocol is described in http://tools.ietf.org/html/draft-herbert-gue-01.
The receive path of GUE is implemented in the foo-over-UDP (FOU) module. This includes a UDP encap receive function for GUE as well as GUE-specific GRO functions. Management and configuration of GUE ports shares most of the same code with FOU.
For the transmit path, the previous FOU support for IPIP, sit, and GRE was simply extended for GUE (when GUE is enabled, insert the GUE header on transmit in addition to the UDP header inserted for FOU).
Semantically GUE is the same as FOU in that the encapsulation (UDP and GUE headers) is inserted on transmission and removed on reception, so that the IP packet is processed with the inner header.
This patch set includes:
- Some fixes to FOU, removal of IPv4,v6 specific GRO functions
- Support to configure a GUE receive port
- Implementation of GUE receive path (normal and GRO)
- Additions to ip_tunnel netlink to configure GUE
- GUE header insertion in ip_tunnel transmit path
v2: - Include net/gue.h in patch set
Testing:
I ran performance numbers using netperf TCP_RR with 200 streams, comparing encapsulation without GUE, encapsulation with GUE, and encapsulation with FOU.
GRE
  TCP_STREAM
    IPv4, FOU, UDP checksum enabled
      14.04% TX CPU utilization
      13.17% RX CPU utilization
      9211 Mbps
    IPv4, GUE, UDP checksum enabled
      14.99% TX CPU utilization
      13.79% RX CPU utilization
      9185 Mbps
    IPv4, FOU, UDP checksum disabled
      13.14% TX CPU utilization
      23.18% RX CPU utilization
      9277 Mbps
    IPv4, GUE, UDP checksum disabled
      13.66% TX CPU utilization
      23.57% RX CPU utilization
      9184 Mbps
  TCP_RR
    IPv4, FOU, UDP checksum enabled
      94.2% CPU utilization
      155/249/460 90/95/99% latencies
      1.17018e+06 tps
    IPv4, GUE, UDP checksum enabled
      93.9% CPU utilization
      158/253/472 90/95/99% latencies
      1.15045e+06 tps

IPIP
  TCP_STREAM
    FOU, UDP checksum enabled
      15.28% TX CPU utilization
      13.92% RX CPU utilization
      9342 Mbps
    GUE, UDP checksum enabled
      13.99% TX CPU utilization
      13.34% RX CPU utilization
      9210 Mbps
    FOU, UDP checksum disabled
      15.08% TX CPU utilization
      24.64% RX CPU utilization
      9226 Mbps
    GUE, UDP checksum disabled
      15.90% TX CPU utilization
      24.77% RX CPU utilization
      9197 Mbps
  TCP_RR
    FOU, UDP checksum enabled
      94.23% CPU utilization
      149/237/429 90/95/99% latencies
      1.19553e+06 tps
    GUE, UDP checksum enabled
      93.75% CPU utilization
      152/243/442 90/95/99% latencies
      1.17027e+06 tps

SIT
  TCP_STREAM
    FOU, UDP checksum enabled
      14.47% TX CPU utilization
      14.58% RX CPU utilization
      9106 Mbps
    GUE, UDP checksum enabled
      15.09% TX CPU utilization
      14.84% RX CPU utilization
      9080 Mbps
    FOU, UDP checksum disabled
      15.70% TX CPU utilization
      27.93% RX CPU utilization
      9097 Mbps
    GUE, UDP checksum disabled
      15.04% TX CPU utilization
      27.54% RX CPU utilization
      9073 Mbps
  TCP_RR
    FOU, UDP checksum enabled
      96.9% CPU utilization
      170/281/581 90/95/99% latencies
      1.03372e+06 tps
    GUE, UDP checksum enabled
      97.16% CPU utilization
      172/286/576 90/95/99% latencies
      1.00469e+06 tps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37dd0247 |
| 04-Oct-2014 |
Tom Herbert <therbert@google.com> |
gue: Receive side for Generic UDP Encapsulation
This patch adds support for receiving GUE packets in the fou module. The fou module now supports direct foo-over-udp (no encapsulation header) and GUE. To support this, a type parameter is added to the fou netlink parameters.
For a GUE socket we define gue_udp_recv, gue_gro_receive, and gue_gro_complete to handle the specifics of the GUE protocol. Most of the code to manage and configure sockets is common with fou.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|