| #
57f94001 |
| 12-Jun-2026 |
Florian Westphal <fw@strlen.de> |
netfilter: conntrack: add deprecation warnings for irc and pptp trackers
IRC Direct client-to-client requires plaintext. IRC over TLS should be preferred, making this helper ineffective. Add a dep
netfilter: conntrack: add deprecation warnings for irc and pptp trackers
IRC Direct client-to-client requires plaintext. IRC over TLS should be preferred, making this helper ineffective. Add a deprecation warning and update the help text to better reflect that this is needed for the DCC extension, not IRC itself.
PPTP is esoteric these days and it is the only helper that requires the destroy callback in the conntrack helper API.
Removal would simplify the conntrack core.
Both helpers are IPv4 only.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
| #
d4349ba9 |
| 15-Apr-2026 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: allow nfnetlink built-in only
Netfilter has its own netlink multiplexer, initially only a few subsystem were using it, most notably conntrack, queue and log, later in time nf_tables. Thes
netfilter: allow nfnetlink built-in only
Netfilter has its own netlink multiplexer, initially only a few subsystem were using it, most notably conntrack, queue and log, later in time nf_tables. These days it is the control plane of preference.
Just remove modular support for this, allow it built-in only.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
| #
403cec8a |
| 07-May-2026 |
Florian Westphal <fw@strlen.de> |
netfilter: add option for GCOV profiling
Similar to a few other subsystems: add a new config toggle to enable netfilter gcov profiling in netfilter, including ebtables, arptables and so on.
ipset a
netfilter: add option for GCOV profiling
Similar to a few other subsystems: add a new config toggle to enable netfilter gcov profiling in netfilter, including ebtables, arptables and so on.
ipset and ipvs gain their own, dedicated toggles.
Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
| #
84dee05d |
| 30-Mar-2026 |
Fernando Fernandez Mancera <fmancera@suse.de> |
netfilter: conntrack: remove UDP-Lite conntrack support
UDP-Lite (RFC 3828) socket support was recently retired from the core networking stack. As a follow-up of that, drop the connection tracker an
netfilter: conntrack: remove UDP-Lite conntrack support
UDP-Lite (RFC 3828) socket support was recently retired from the core networking stack. As a follow-up of that, drop the connection tracker and NAT support for UDP-Lite in Netfilter.
This patch removes CONFIG_NF_CT_PROTO_UDPLITE and scrubs UDP-Lite awareness from the conntrack core, NAT core, nft_ct, and ctnetlink. Please note that stateless packet inspection, matching, ipsets or logging support for IPPROTO_UDPLITE is preserved.
As conntrack no longer extracts UDP-Lite ports or tracks its L4 state, when performing NAT the UDP-Lite checksum cannot be updated anymore. That is an expected and acceptable consequence of removing UDP-Lite conntrack module.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
| #
309b905d |
| 25-Mar-2026 |
Fernando Fernandez Mancera <fmancera@suse.de> |
ipv6: convert CONFIG_IPV6 to built-in only and clean up Kconfigs
Maintaining a modular IPv6 stack offers image size savings for specific setups, this benefit is outweighed by the architectural burde
ipv6: convert CONFIG_IPV6 to built-in only and clean up Kconfigs
Maintaining a modular IPv6 stack offers image size savings for specific setups, this benefit is outweighed by the architectural burden it imposes on the subsystems on implementation and maintenance. Therefore, drop it.
Change CONFIG_IPV6 from tristate to bool. Remove all Kconfig dependencies across the tree that explicitly checked for IPV6=m. In addition, remove MODULE_DESCRIPTION(), MODULE_ALIAS(), MODULE_AUTHOR() and MODULE_LICENSE().
This is also replacing module_init() by device_initcall(). It is not possible to use fs_initcall() as IPv4 does because that creates a race condition on IPv6 addrconf.
Finally, modify the default configs from CONFIG_IPV6=m to CONFIG_IPV6=y except for m68k as according to the bloat-o-meter the image is increasing by 330KB~ and that isn't acceptable. Instead, disable IPv6 on this architecture by default. This is aligned with m68k RAM requirements and recommendations [1].
[1] http://www.linux-m68k.org/faq/ram.html
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> # arm64 Link: https://patch.msgid.link/20260325120928.15848-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
| #
9fce6658 |
| 30-Jun-2025 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: Exclude LEGACY TABLES on PREEMPT_RT.
The seqcount xt_recseq is used to synchronize the replacement of xt_table::private in xt_replace_table() against all readers such as ipt_do_table()
T
netfilter: Exclude LEGACY TABLES on PREEMPT_RT.
The seqcount xt_recseq is used to synchronize the replacement of xt_table::private in xt_replace_table() against all readers such as ipt_do_table()
To ensure that there is only one writer, the writing side disables bottom halves. The sequence counter can be acquired recursively. Only the first invocation modifies the sequence counter (signaling that a writer is in progress) while the following (recursive) writer does not modify the counter. The lack of a proper locking mechanism for the sequence counter can lead to live lock on PREEMPT_RT if the high prior reader preempts the writer. Additionally if the per-CPU lock on PREEMPT_RT is removed from local_bh_disable() then there is no synchronisation for the per-CPU sequence counter.
The affected code is "just" the legacy netfilter code which is replaced by "netfilter tables". That code can be disabled without sacrificing functionality because everything is provided by the newer implementation. This will only requires the usage of the "-nft" tools instead of the "-legacy" ones. The long term plan is to remove the legacy code so lets accelerate the progress.
Relax dependencies on iptables legacy, replace select with depends on, this should cause no harm to existing kernel configs and users can still toggle IP{6}_NF_IPTABLES_LEGACY in any case. Make EBTABLES_LEGACY, IPTABLES_LEGACY and ARPTABLES depend on NETFILTER_XTABLES_LEGACY. Hide xt_recseq and its users, xt_register_table() and xt_percpu_counter_alloc() behind NETFILTER_XTABLES_LEGACY. Let NETFILTER_XTABLES_LEGACY depend on !PREEMPT_RT.
This will break selftest expecing the legacy options enabled and will be addressed in a following patch.
Co-developed-by: Florian Westphal <fw@strlen.de> Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
| #
fd72f265 |
| 22-May-2025 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: conntrack: remove DCCP protocol support
The DCCP socket family has now been removed from this tree, see:
8bb3212be4b4 ("Merge branch 'net-retire-dccp-socket'")
Remove connection track
netfilter: conntrack: remove DCCP protocol support
The DCCP socket family has now been removed from this tree, see:
8bb3212be4b4 ("Merge branch 'net-retire-dccp-socket'")
Remove connection tracking and NAT support for this protocol, this should not pose a problem because no DCCP traffic is expected to be seen on the wire.
As for the code for matching on dccp header for iptables and nftables, mark it as deprecated and keep it in place. Ruleset restoration is an atomic operation. Without dccp matching support, an astray match on dccp could break this operation leaving your computer with no policy in place, so let's follow a more conservative approach for matches.
Add CONFIG_NFT_EXTHDR_DCCP which is set to 'n' by default to deprecate dccp extension support. Similarly, label CONFIG_NETFILTER_XT_MATCH_DCCP as deprecated too and also set it to 'n' by default.
Code to match on DCCP protocol from ebtables also remains in place, this is just a few checks on IPPROTO_DCCP from _check() path which is exercised when ruleset is loaded. There is another use of IPPROTO_DCCP from the _check() path in the iptables multiport match. Another check for IPPROTO_DCCP from the packet in the reject target is also removed.
So let's schedule removal of the dccp matching for a second stage, this should not interfer with the dccp retirement since this is only matching on the dccp header.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
| #
99de9d40 |
| 19-May-2025 |
Eric Biggers <ebiggers@google.com> |
sctp: use skb_crc32c() instead of __skb_checksum()
Make sctp_compute_cksum() just use the new function skb_crc32c(), instead of calling __skb_checksum() with a skb_checksum_ops struct that does CRC3
sctp: use skb_crc32c() instead of __skb_checksum()
Make sctp_compute_cksum() just use the new function skb_crc32c(), instead of calling __skb_checksum() with a skb_checksum_ops struct that does CRC32C. This is faster and simpler.
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-6-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
| #
3ba0032a |
| 01-Apr-2025 |
Michal Koutný <mkoutny@suse.com> |
netfilter: xt_cgroup: Make it independent from net_cls
The xt_group matching supports the default hierarchy since commit c38c4597e4bf3 ("netfilter: implement xt_cgroup cgroup2 path match"). The cgro
netfilter: xt_cgroup: Make it independent from net_cls
The xt_group matching supports the default hierarchy since commit c38c4597e4bf3 ("netfilter: implement xt_cgroup cgroup2 path match"). The cgroup v1 matching (based on clsid) and cgroup v2 matching (based on path) are rather independent. Downgrade the Kconfig dependency to mere CONFIG_SOCK_GROUP_DATA so that xt_group can be built even without CONFIG_NET_CLS_CGROUP for path matching. Also add a message for users when they attempt to specify any clsid.
Link: https://lists.opensuse.org/archives/list/kernel@lists.opensuse.org/thread/S23NOILB7MUIRHSKPBOQKJHVSK26GP6X/ Cc: Jan Engelhardt <ej@inai.de> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
| #
b261d222 |
| 02-Apr-2025 |
Eric Biggers <ebiggers@google.com> |
lib/crc: remove CONFIG_LIBCRC32C
Now that LIBCRC32C does nothing besides select CRC32, make every option that selects LIBCRC32C instead select CRC32 directly. Then remove LIBCRC32C.
Reviewed-by: C
lib/crc: remove CONFIG_LIBCRC32C
Now that LIBCRC32C does nothing besides select CRC32, make every option that selects LIBCRC32C instead select CRC32 directly. Then remove LIBCRC32C.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250401221600.24878-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
| #
a9525c7f |
| 24-Jan-2024 |
Florian Westphal <fw@strlen.de> |
netfilter: xtables: allow xtables-nft only builds
Add hidden IP(6)_NF_IPTABLES_LEGACY symbol.
When any of the "old" builtin tables are enabled the "old" iptables interface will be supported.
To di
netfilter: xtables: allow xtables-nft only builds
Add hidden IP(6)_NF_IPTABLES_LEGACY symbol.
When any of the "old" builtin tables are enabled the "old" iptables interface will be supported.
To disable the old set/getsockopt interface the existing options for the builtin tables need to be turned off:
CONFIG_IP_NF_IPTABLES=m CONFIG_IP_NF_FILTER is not set CONFIG_IP_NF_NAT is not set CONFIG_IP_NF_MANGLE is not set CONFIG_IP_NF_RAW is not set CONFIG_IP_NF_SECURITY is not set
Same for CONFIG_IP6_NF_ variants.
This allows to build a kernel that only supports ip(6)tables-nft (iptables-over-nftables api).
In the future the _LEGACY symbol will become visible and the select statements will be turned into 'depends on', but for now be on safe side so "make oldconfig" won't break things.
Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
| #
84601d6e |
| 21-Apr-2023 |
Florian Westphal <fw@strlen.de> |
bpf: add bpf_link support for BPF_NETFILTER programs
Add bpf_link support skeleton. To keep this reviewable, no bpf program can be invoked yet, if a program is attached only a c-stub is called and
bpf: add bpf_link support for BPF_NETFILTER programs
Add bpf_link support skeleton. To keep this reviewable, no bpf program can be invoked yet, if a program is attached only a c-stub is called and not the actual bpf program.
Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig.
Uapi example usage: union bpf_attr attr = { };
attr.link_create.prog_fd = progfd; attr.link_create.attach_type = 0; /* unused */ attr.link_create.netfilter.pf = PF_INET; attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN; attr.link_create.netfilter.priority = -128;
err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
... this would attach progfd to ipv4:input hook.
Such hook gets removed automatically if the calling program exits.
BPF_NETFILTER program invocation is added in followup change.
NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it allows to tell userspace which program is attached at the given hook when user runs 'nft hook list' command rather than just the priority and not-very-helpful 'this hook runs a bpf prog but I can't tell which one'.
Will also be used to disallow registration of two bpf programs with same priority in a followup patch.
v4: arm32 cmpxchg only supports 32bit operand s/prio/priority/ v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if more use cases pop up (arptables, ebtables, netdev ingress/egress etc).
Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-2-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
| #
bde7170a |
| 16-Mar-2023 |
Florian Westphal <fw@strlen.de> |
netfilter: xtables: disable 32bit compat interface by default
This defaulted to 'y' because before this knob existed the 32bit compat layer was always compiled in if CONFIG_COMPAT was set.
32bit ip
netfilter: xtables: disable 32bit compat interface by default
This defaulted to 'y' because before this knob existed the 32bit compat layer was always compiled in if CONFIG_COMPAT was set.
32bit iptables on 64bit kernel isn't common anymore, so remove the default-y now.
Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
| #
c0c3ab63 |
| 07-Feb-2023 |
Xin Long <lucien.xin@gmail.com> |
net: create nf_conntrack_ovs for ovs and tc use
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack
net: create nf_conntrack_ovs for ovs and tc use
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack_ovs to get these functions shared by OVS and TC only.
There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper in this patch, and will be more in the following patches.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
| #
ebddb140 |
| 08-Dec-2022 |
Xin Long <lucien.xin@gmail.com> |
net: move the nat function to nf_nat_ovs for ovs and tc
There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
This patch c
net: move the nat function to nf_nat_ovs for ovs and tc
There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
This patch creates nf_nat_ovs.c under netfilter and moves them there then exports nf_ct_nat() so that it can be shared by both OVS and TC, and keeps the nat (type) check and nat flag update in OVS and TC's own place, as these parts are different between OVS and TC.
Note that in OVS nat function it was using skb->protocol to get the proto as it already skips vlans in key_extract(), while it doesn't in TC, and TC has to call skb_protocol() to get proto. So in nf_ct_nat_execute(), we keep using skb_protocol() which works for both OVS and TC contrack.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
| #
d037abc2 |
| 21-Oct-2022 |
Florian Westphal <fw@strlen.de> |
netfilter: nft_objref: make it builtin
nft_objref is needed to reference named objects, it makes no sense to disable it.
Before: text data bss dec filename 4014 424 0
netfilter: nft_objref: make it builtin
nft_objref is needed to reference named objects, it makes no sense to disable it.
Before: text data bss dec filename 4014 424 0 4438 nft_objref.o 4174 1128 0 5302 nft_objref.ko 359351 15276 864 375491 nf_tables.ko After: text data bss dec filename 3815 408 0 4223 nft_objref.o 363161 15692 864 379717 nf_tables.ko
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
| #
aa5762c3 |
| 15-Aug-2022 |
Geert Uytterhoeven <geert@linux-m68k.org> |
netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y
NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca68557b09 ("netfilter: provide config option to disable ancient procf
netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y
NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca68557b09 ("netfilter: provide config option to disable ancient procfs parts") in v3.3.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
| #
b06ada6d |
| 04-Aug-2022 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: flowtable: fix incorrect Kconfig dependencies
Remove default to 'y', this infrastructure is not fundamental for the flowtable operational.
Add a missing dependency on CONFIG_NF_FLOW_TABL
netfilter: flowtable: fix incorrect Kconfig dependencies
Remove default to 'y', this infrastructure is not fundamental for the flowtable operational.
Add a missing dependency on CONFIG_NF_FLOW_TABLE.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: b038177636f8 ("netfilter: nf_flow_table: count pending offload workqueue tasks") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
| #
b0381776 |
| 15-Jun-2022 |
Vlad Buslov <vladbu@nvidia.com> |
netfilter: nf_flow_table: count pending offload workqueue tasks
To improve hardware offload debuggability count pending 'add', 'del' and 'stats' flow_table offload workqueue tasks. Counters are incr
netfilter: nf_flow_table: count pending offload workqueue tasks
To improve hardware offload debuggability count pending 'add', 'del' and 'stats' flow_table offload workqueue tasks. Counters are incremented before scheduling new task and decremented when workqueue handler finishes executing. These counters allow user to diagnose congestion on hardware offload workqueues that can happen when either CPU is starved and workqueue jobs are executed at lower rate than new ones are added or when hardware/driver can't keep up with the rate.
Implement the described counters as percpu counters inside new struct netns_ft which is stored inside struct net. Expose them via new procfs file '/proc/net/stats/nf_flowtable' that is similar to existing 'nf_conntrack' file.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
| #
023223df |
| 17-Dec-2021 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: nf_tables: make counter support built-in
Make counter support built-in to allow for direct call in case of CONFIG_RETPOLINE.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| #
bdfa75ad |
| 22-Oct-2021 |
David S. Miller <davem@davemloft.net> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Lots of simnple overlapping additions.
With a build fix from Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| #
42df6e1d |
| 08-Oct-2021 |
Lukas Wunner <lukas@wunner.de> |
netfilter: Introduce egress hook
Support classifying packets with netfilter on egress to satisfy user requirements such as: * outbound security policies for containers (Laura) * filtering and mangli
netfilter: Introduce egress hook
Support classifying packets with netfilter on egress to satisfy user requirements such as: * outbound security policies for containers (Laura) * filtering and mangling intra-node Direct Server Return (DSR) traffic on a load balancer (Laura) * filtering locally generated traffic coming in through AF_PACKET, such as local ARP traffic generated for clustering purposes or DHCP (Laura; the AF_PACKET plumbing is contained in a follow-up commit) * L2 filtering from ingress and egress for AVB (Audio Video Bridging) and gPTP with nftables (Pablo) * in the future: in-kernel NAT64/NAT46 (Pablo)
The egress hook introduced herein complements the ingress hook added by commit e687ad60af09 ("netfilter: add netfilter ingress hook after handle_ing() under unique static key"). A patch for nftables to hook up egress rules from user space has been submitted separately, so users may immediately take advantage of the feature.
Alternatively or in addition to netfilter, packets can be classified with traffic control (tc). On ingress, packets are classified first by tc, then by netfilter. On egress, the order is reversed for symmetry. Conceptually, tc and netfilter can be thought of as layers, with netfilter layered above tc.
Traffic control is capable of redirecting packets to another interface (man 8 tc-mirred). E.g., an ingress packet may be redirected from the host namespace to a container via a veth connection: tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container)
In this case, netfilter egress classifying is not performed when leaving the host namespace! That's because the packet is still on the tc layer. If tc redirects the packet to a physical interface in the host namespace such that it leaves the system, the packet is never subjected to netfilter egress classifying. That is only logical since it hasn't passed through netfilter ingress classifying either.
Packets can alternatively be redirected at the netfilter layer using nft fwd. Such a packet *is* subjected to netfilter egress classifying since it has reached the netfilter layer.
Internally, the skb->nf_skip_egress flag controls whether netfilter is invoked on egress by __dev_queue_xmit(). Because __dev_queue_xmit() may be called recursively by tunnel drivers such as vxlan, the flag is reverted to false after sch_handle_egress(). This ensures that netfilter is applied both on the overlay and underlying network.
Interaction between tc and netfilter is possible by setting and querying skb->mark.
If netfilter egress classifying is not enabled on any interface, it is patched out of the data path by way of a static_key and doesn't make a performance difference that is discernible from noise:
Before: 1537 1538 1538 1537 1538 1537 Mb/sec After: 1536 1534 1539 1539 1539 1540 Mb/sec Before + tc accept: 1418 1418 1418 1419 1419 1418 Mb/sec After + tc accept: 1419 1424 1418 1419 1422 1420 Mb/sec Before + tc drop: 1620 1619 1619 1619 1620 1620 Mb/sec After + tc drop: 1616 1624 1625 1624 1622 1619 Mb/sec
When netfilter egress classifying is enabled on at least one interface, a minimal performance penalty is incurred for every egress packet, even if the interface it's transmitted over doesn't have any netfilter egress rules configured. That is caused by checking dev->nf_hooks_egress against NULL.
Measurements were performed on a Core i7-3615QM. Commands to reproduce: ip link add dev foo type dummy ip link set dev foo up modprobe pktgen echo "add_device foo" > /proc/net/pktgen/kpktgend_3 samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1
Accept all traffic with tc: tc qdisc add dev foo clsact tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'
Drop all traffic with tc: tc qdisc add dev foo clsact tc filter add dev foo egress bpf da bytecode '1,6 0 0 2,'
Apply this patch when measuring packet drops to avoid errors in dmesg: https://lore.kernel.org/netdev/a73dda33-57f4-95d8-ea51-ed483abd6a7a@iogearbox.net/
Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Laura García Liébana <nevola@gmail.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
| #
77076934 |
| 05-Oct-2021 |
Vegard Nossum <vegard.nossum@gmail.com> |
netfilter: Kconfig: use 'default y' instead of 'm' for bool config option
This option, NF_CONNTRACK_SECMARK, is a bool, so it can never be 'm'.
Fixes: 33b8e77605620 ("[NETFILTER]: Add CONFIG_NETFIL
netfilter: Kconfig: use 'default y' instead of 'm' for bool config option
This option, NF_CONNTRACK_SECMARK, is a bool, so it can never be 'm'.
Fixes: 33b8e77605620 ("[NETFILTER]: Add CONFIG_NETFILTER_ADVANCED option") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
| #
d4fb1f95 |
| 08-Jun-2021 |
Florian Westphal <fw@strlen.de> |
netfilter: nfnetlink_hook: add depends-on nftables
nfnetlink_hook.c: In function 'nfnl_hook_put_nft_chain_info': nfnetlink_hook.c:76:7: error: implicit declaration of 'nft_is_active'
This macro is
netfilter: nfnetlink_hook: add depends-on nftables
nfnetlink_hook.c: In function 'nfnl_hook_put_nft_chain_info': nfnetlink_hook.c:76:7: error: implicit declaration of 'nft_is_active'
This macro is only defined when NF_TABLES is enabled. While its possible to also add an ifdef-guard, the infrastructure is currently not useful without nf_tables.
Reported-by: kernel test robot <lkp@intel.com> Fixes: 252956528caa ("netfilter: add new hook nfnl subsystem") Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
| #
e2cf17d3 |
| 04-Jun-2021 |
Florian Westphal <fw@strlen.de> |
netfilter: add new hook nfnl subsystem
This nfnl subsystem allows to dump the list of all active netfiler hooks, e.g. defrag, conntrack, nf/ip/arp/ip6tables and so on.
This helps to see what kind o
netfilter: add new hook nfnl subsystem
This nfnl subsystem allows to dump the list of all active netfiler hooks, e.g. defrag, conntrack, nf/ip/arp/ip6tables and so on.
This helps to see what kind of features are currently enabled in the network stack.
Sample output from nft tool using this infra:
$ nft list hook ip input family ip hook input { +0000000010 nft_do_chain_inet [nf_tables] # nft table firewalld INPUT +0000000100 nf_nat_ipv4_local_in [nf_nat] +2147483647 ipv4_confirm [nf_conntrack] }
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|