#
f80acbd2 |
| 25-May-2018 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf-task-fd-query'
Yonghong Song says:
==================== Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bp
Merge branch 'bpf-task-fd-query'
Yonghong Song says:
==================== Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g., bpftool, wants to show which bpf program is attached to which tracepoint/kprobe/uprobe. Such attachment information will be really useful to understand the overall bpf deployment in the system.
There is a name field (16 bytes) for each program, which could be used to encode the attachment point. There are some drawbacks for this approaches. First, bpftool user (e.g., an admin) may not really understand the association between the name and the attachment point. Second, if one program is attached to multiple places, encoding a proper name which can imply all these attachments becomes difficult.
This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY. Given a pid and fd, this command will return bpf related information to user space. Right now it only supports tracepoint/kprobe/uprobe perf event fd's. For such a fd, BPF_TASK_FD_QUERY will return . prog_id . tracepoint name, or . k[ret]probe funcname + offset or kernel addr, or . u[ret]probe filename + offset to the userspace. The user can use "bpftool prog" to find more information about bpf program itself with prog_id.
Patch #1 adds function perf_get_event() in kernel/events/core.c. Patch #2 implements the bpf subcommand BPF_TASK_FD_QUERY. Patch #3 syncs tools bpf.h header and also add bpf_task_fd_query() in the libbpf library for samples/selftests/bpftool to use. Patch #4 adds ksym_get_addr() utility function. Patch #5 add a test in samples/bpf for querying k[ret]probes and u[ret]probes. Patch #6 add a test in tools/testing/selftests/bpf for querying raw_tracepoint and tracepoint. Patch #7 add a new subcommand "perf" to bpftool.
Changelogs: v4 -> v5: . return strlen(buf) instead of strlen(buf) + 1 in the attr.buf_len. As long as user provides non-empty buffer, it will be filed with empty string, truncated string, or full string based on the buffer size and the length of to-be-copied string. v3 -> v4: . made attr buf_len input/output. The length of actual buffter is written to buf_len so user space knows what is actually needed. If user provides a buffer with length >= 1 but less than required, do partial copy and return -ENOSPC. . code simplification with put_user. . changed query result attach_info to fd_type. . add tests at selftests/bpf to test zero len, null buf and insufficient buf. v2 -> v3: . made perf_get_event() return perf_event pointer const. this was to ensure that event fields are not meddled. . detect whether newly BPF_TASK_FD_QUERY is supported or not in "bpftool perf" and warn users if it is not. v1 -> v2: . changed bpf subcommand name from BPF_PERF_EVENT_QUERY to BPF_TASK_FD_QUERY. . fixed various "bpftool perf" issues and added documentation and auto-completion. ====================
Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
30687ad9 |
| 24-May-2018 |
Yonghong Song <yhs@fb.com> |
tools/bpf: sync kernel header bpf.h and add bpf_task_fd_query in libbpf
Sync kernel header bpf.h to tools/include/uapi/linux/bpf.h and implement bpf_task_fd_query() in libbpf. The test programs in s
tools/bpf: sync kernel header bpf.h and add bpf_task_fd_query in libbpf
Sync kernel header bpf.h to tools/include/uapi/linux/bpf.h and implement bpf_task_fd_query() in libbpf. The test programs in samples/bpf and tools/testing/selftests/bpf, and later bpftool will use this libbpf function to query kernel.
Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
75445134 |
| 24-May-2018 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v4.17-rc6' into next
Sync up with mainline to bring in Atmel controller changes for Caroline.
|
#
ff4fb475 |
| 23-May-2018 |
Daniel Borkmann <daniel@iogearbox.net> |
Merge branch 'btf-uapi-cleanups'
Martin KaFai Lau says:
==================== This patch set makes some changes to cleanup the unused bits in BTF uapi. It also makes the btf_header extensible.
Ple
Merge branch 'btf-uapi-cleanups'
Martin KaFai Lau says:
==================== This patch set makes some changes to cleanup the unused bits in BTF uapi. It also makes the btf_header extensible.
Please see individual patches for details.
v2: - Remove NR_SECS from patch 2 - Remove "unsigned" check on array->index_type from patch 3 - Remove BTF_INT_VARARGS and further limit BTF_INT_ENCODING from 8 bits to 4 bits in patch 4 - Adjustments in test_btf.c to reflect changes in v2 ====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
61746dbe |
| 23-May-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Add tests for the btf uapi changes
This patch does the followings: 1. Modify libbpf and test_btf to reflect the uapi changes in btf 2. Add test for the btf_header changes 3. Add tests for
bpf: btf: Add tests for the btf uapi changes
This patch does the followings: 1. Modify libbpf and test_btf to reflect the uapi changes in btf 2. Add test for the btf_header changes 3. Add tests for array->index_type 4. Add err_str check to the tests 5. Fix a 4 bytes hole in "struct test #1" by swapping "m" and "n"
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
dd8070bf |
| 23-May-2018 |
Johannes Berg <johannes.berg@intel.com> |
Merge remote-tracking branch 'net-next/master' into mac80211-next
Bring in net-next which had pulled in net, so I have the changes from mac80211 and can apply a patch that would otherwise conflict.
Merge remote-tracking branch 'net-next/master' into mac80211-next
Bring in net-next which had pulled in net, so I have the changes from mac80211 and can apply a patch that would otherwise conflict.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
show more ...
|
Revision tags: v4.17-rc6 |
|
#
b9f672af |
| 17-May-2018 |
David S. Miller <davem@davemloft.net> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2018-05-17
The following pull-request contains BPF updates for yo
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2018-05-17
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Provide a new BPF helper for doing a FIB and neighbor lookup in the kernel tables from an XDP or tc BPF program. The helper provides a fast-path for forwarding packets. The API supports IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are implemented in this initial work, from David (Ahern).
2) Just a tiny diff but huge feature enabled for nfp driver by extending the BPF offload beyond a pure host processing offload. Offloaded XDP programs are allowed to set the RX queue index and thus opening the door for defining a fully programmable RSS/n-tuple filter replacement. Once BPF decided on a queue already, the device data-path will skip the conventional RSS processing completely, from Jakub.
3) The original sockmap implementation was array based similar to devmap. However unlike devmap where an ifindex has a 1:1 mapping into the map there are use cases with sockets that need to be referenced using longer keys. Hence, sockhash map is added reusing as much of the sockmap code as possible, from John.
4) Introduce BTF ID. The ID is allocatd through an IDR similar as with BPF maps and progs. It also makes BTF accessible to user space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data through BPF_OBJ_GET_INFO_BY_FD, from Martin.
5) Enable BPF stackmap with build_id also in NMI context. Due to the up_read() of current->mm->mmap_sem build_id cannot be parsed. This work defers the up_read() via a per-cpu irq_work so that at least limited support can be enabled, from Song.
6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND JIT conversion as well as implementation of an optimized 32/64 bit immediate load in the arm64 JIT that allows to reduce the number of emitted instructions; in case of tested real-world programs they were shrinking by three percent, from Daniel.
7) Add ifindex parameter to the libbpf loader in order to enable BPF offload support. Right now only iproute2 can load offloaded BPF and this will also enable libbpf for direct integration into other applications, from David (Beckett).
8) Convert the plain text documentation under Documentation/bpf/ into RST format since this is the appropriate standard the kernel is moving to for all documentation. Also add an overview README.rst, from Jesper.
9) Add __printf verification attribute to the bpf_verifier_vlog() helper. Though it uses va_list we can still allow gcc to check the format string, from Mathieu.
10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...' is a bash 4.0+ feature which is not guaranteed to be available when calling out to shell, therefore use a more portable variant, from Joe.
11) Fix a 64 bit division in xdp_umem_reg() by using div_u64() instead of relying on the gcc built-in, from Björn.
12) Fix a sock hashmap kmalloc warning reported by syzbot when an overly large key size is used in hashmap then causing overflows in htab->elem_size. Reject bogus attr->key_size early in the sock_hash_alloc(), from Yonghong.
13) Ensure in BPF selftests when urandom_read is being linked that --build-id is always enabled so that test_stacktrace_build_id[_nmi] won't be failing, from Alexei.
14) Add bitsperlong.h as well as errno.h uapi headers into the tools header infrastructure which point to one of the arch specific uapi headers. This was needed in order to fix a build error on some systems for the BPF selftests, from Sirio.
15) Allow for short options to be used in the xdp_monitor BPF sample code. And also a bpf.h tools uapi header sync in order to fix a selftest build failure. Both from Prashant.
16) More formally clarify the meaning of ID in the direct packet access section of the BPF documentation, from Wang. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f0307a7e |
| 16-May-2018 |
David Beckett <david.beckett@netronome.com> |
libbpf: add ifindex to enable offload support
BPF programs currently can only be offloaded using iproute2. This patch will allow programs to be offloaded using libbpf calls.
Signed-off-by: David Be
libbpf: add ifindex to enable offload support
BPF programs currently can only be offloaded using iproute2. This patch will allow programs to be offloaded using libbpf calls.
Signed-off-by: David Beckett <david.beckett@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
Revision tags: v4.17-rc5 |
|
#
bba95255 |
| 13-May-2018 |
Zhi Wang <zhi.a.wang@intel.com> |
Merge branch 'drm-intel-next-queued' into gvt-next
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
|
#
94cc2fde |
| 11-May-2018 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Merge remote-tracking branch 'drm/drm-next' into drm-misc-next
drm-misc-next is still based on v4.16-rc7, and was getting a bit stale.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.inte
Merge remote-tracking branch 'drm/drm-next' into drm-misc-next
drm-misc-next is still based on v4.16-rc7, and was getting a bit stale.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
show more ...
|
#
a1d1f079 |
| 09-May-2018 |
Daniel Borkmann <daniel@iogearbox.net> |
Merge branch 'bpf-btf-id'
Martin KaFai Lau says:
==================== This series introduces BTF ID which is exposed through the new BPF_BTF_GET_FD_BY_ID cmd, new "struct bpf_btf_info" and new memb
Merge branch 'bpf-btf-id'
Martin KaFai Lau says:
==================== This series introduces BTF ID which is exposed through the new BPF_BTF_GET_FD_BY_ID cmd, new "struct bpf_btf_info" and new members in the "struct bpf_map_info".
Please see individual patch for details. ====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
Revision tags: v4.17-rc4 |
|
#
cd8b8928 |
| 04-May-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Tests for BPF_OBJ_GET_INFO_BY_FD and BPF_BTF_GET_FD_BY_ID
This patch adds test for BPF_BTF_GET_FD_BY_ID and the new btf_id/btf_key_id/btf_value_id in the "struct bpf_map_info".
It also mo
bpf: btf: Tests for BPF_OBJ_GET_INFO_BY_FD and BPF_BTF_GET_FD_BY_ID
This patch adds test for BPF_BTF_GET_FD_BY_ID and the new btf_id/btf_key_id/btf_value_id in the "struct bpf_map_info".
It also modifies the existing BPF_OBJ_GET_INFO_BY_FD test to reflect the new "struct bpf_btf_info".
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
53f071e1 |
| 02-May-2018 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next-queued
Need d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API") in dinq to be able to fix https://bugs.f
Merge drm/drm-next into drm-intel-next-queued
Need d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API") in dinq to be able to fix https://bugs.freedesktop.org/show_bug.cgi?id=106085.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
show more ...
|
#
552c69b3 |
| 02-May-2018 |
John Johansen <john.johansen@canonical.com> |
Merge tag 'v4.17-rc3' into apparmor-next
Linux v4.17-rc3
Merge in v4.17 for LSM updates
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
Revision tags: v4.17-rc3 |
|
#
8cad95f5 |
| 24-Apr-2018 |
Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> |
Merge tag 'v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into fbdev-for-next
Linux 4.17-rc2
|
#
b393a707 |
| 23-Apr-2018 |
James Morris <james.morris@microsoft.com> |
Merge tag 'v4.17-rc2' into next-general
Sync to Linux 4.17-rc2 for developers.
|
Revision tags: v4.17-rc2 |
|
#
1b80f86e |
| 21-Apr-2018 |
David S. Miller <davem@davemloft.net> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2018-04-21
The following pull-request contains BPF updates for yo
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2018-04-21
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Initial work on BPF Type Format (BTF) is added, which is a meta data format which describes the data types of BPF programs / maps. BTF has its roots from CTF (Compact C-Type format) with a number of changes to it. First use case is to provide a generic pretty print capability for BPF maps inspection, later work will also add BTF to bpftool. pahole support to convert dwarf to BTF will be upstreamed as well (https://github.com/iamkafai/pahole/tree/btf), from Martin.
2) Add a new xdp_bpf_adjust_tail() BPF helper for XDP that allows for changing the data_end pointer. Only shrinking is currently supported which helps for crafting ICMP control messages. Minor changes in drivers have been added where needed so they recalc the packet's length also when data_end was adjusted, from Nikita.
3) Improve bpftool to make it easier to feed hex bytes via cmdline for map operations, from Quentin.
4) Add support for various missing BPF prog types and attach types that have been added to kernel recently but neither to bpftool nor libbpf yet. Doc and bash completion updates have been added as well for bpftool, from Andrey.
5) Proper fix for avoiding to leak info stored in frame data on page reuse for the two bpf_xdp_adjust_{head,meta} helpers by disallowing to move the pointers into struct xdp_frame area, from Jesper.
6) Follow-up compile fix from BTF in order to include stdbool.h in libbpf, from Björn.
7) Few fixes in BPF sample code, that is, a typo on the netdevice in a comment and fixup proper dump of XDP action code in the tracepoint exception, from Wang and Jesper. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
878a4d32 |
| 20-Apr-2018 |
Björn Töpel <bjorn.topel@intel.com> |
libbpf: fixed build error for samples/bpf/
Commit 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf") did not include stdbool.h, so GCC complained when building samples/bpf/.
In file included from
libbpf: fixed build error for samples/bpf/
Commit 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf") did not include stdbool.h, so GCC complained when building samples/bpf/.
In file included from /home/btopel/src/ext/linux/samples/bpf/libbpf.h:6:0, from /home/btopel/src/ext/linux/samples/bpf/test_lru_dist.c:24: /home/btopel/src/ext/linux/tools/lib/bpf/bpf.h:105:4: error: unknown type name ‘bool’; did you mean ‘_Bool’? bool do_log); ^~~~ _Bool
Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
34df9d37 |
| 19-Apr-2018 |
Daniel Borkmann <daniel@iogearbox.net> |
Merge branch 'bpf-type-format'
Martin KaFai Lau says:
==================== This patch introduces BPF Type Format (BTF).
BTF (BPF Type Format) is the meta data format which describes the data types
Merge branch 'bpf-type-format'
Martin KaFai Lau says:
==================== This patch introduces BPF Type Format (BTF).
BTF (BPF Type Format) is the meta data format which describes the data types of BPF program/map. Hence, it basically focus on the C programming language which the modern BPF is primary using. The first use case is to provide a generic pretty print capability for a BPF map.
A modified pahole that can convert dwarf to BTF is here:
https://github.com/iamkafai/pahole/tree/btf
Please see individual patch for details.
v5: - Remove BTF_KIND_FLOAT and BTF_KIND_FUNC which are not currently used. They can be added in the future. Some bpf_df_xxx() are removed together. - Add comment in patch 7 to clarify that the new bpffs_map_fops should not be extended further.
v4: - Fix warning (remove unneeded semicolon) - Remove a redundant variable (nr_bytes) from btf_int_check_meta() in patch 1. Caught by W=1.
v3: - Rebase to bpf-next - Fix sparse warning (by adding static) - Add BTF header logging: btf_verifier_log_hdr() - Fix the alignment test on btf->type_off - Add tests for the BTF header - Lower the max BTF size to 16MB. It should be enough for some time. We could raise it later if it would be needed.
v2: - Use kvfree where needed in patch 1 and 2 - Also consider BTF_INT_OFFSET() in the btf_int_check_meta() in patch 1 - Fix an incorrect goto target in map_create() during the btf-error-path in patch 7 - re-org some local vars to keep the rev xmas tree in btf.c ====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
8a138aed |
| 19-Apr-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Add BTF support to libbpf
If the ".BTF" elf section exists, libbpf will try to create a btf_fd (through BPF_BTF_LOAD). If that fails, it will still continue loading the bpf prog/map witho
bpf: btf: Add BTF support to libbpf
If the ".BTF" elf section exists, libbpf will try to create a btf_fd (through BPF_BTF_LOAD). If that fails, it will still continue loading the bpf prog/map without the BTF.
If the bpf_object has a BTF loaded, it will create a map with the btf_fd. libbpf will try to figure out the btf_key_id and btf_value_id of a map by finding the BTF type with name "<map_name>_key" and "<map_name>_value". If they cannot be found, it will continue without using the BTF.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
30596ec3 |
| 17-Apr-2018 |
Zhenyu Wang <zhenyuw@linux.intel.com> |
Back merge 'drm-intel-fixes' into gvt-fixes
Need for 4.17-rc1
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
Revision tags: v4.17-rc1 |
|
#
664b0bae |
| 05-Apr-2018 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 4.17 merge window.
|
#
5bb053be |
| 03-Apr-2018 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) Support offloading wireless authentication to userspace via NL80211_CMD_EXTERNA
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) Support offloading wireless authentication to userspace via NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari.
2) A lot of work on network namespace setup/teardown from Kirill Tkhai. Setup and cleanup of namespaces now all run asynchronously and thus performance is significantly increased.
3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon Streiff.
4) Support zerocopy on RDS sockets, from Sowmini Varadhan.
5) Use denser instruction encoding in x86 eBPF JIT, from Daniel Borkmann.
6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime Chevallier.
7) Support grafting of child qdiscs in mlxsw driver, from Nogah Frankel.
8) Add packet forwarding tests to selftests, from Ido Schimmel.
9) Deal with sub-optimal GSO packets better in BBR congestion control, from Eric Dumazet.
10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern.
11) Add path MTU tests to selftests, from Stefano Brivio.
12) Various bits of IPSEC offloading support for mlx5, from Aviad Yehezkel, Yossi Kuperman, and Saeed Mahameed.
13) Support RSS spreading on ntuple filters in SFC driver, from Edward Cree.
14) Lots of sockmap work from John Fastabend. Applications can use eBPF to filter sendmsg and sendpage operations.
15) In-kernel receive TLS support, from Dave Watson.
16) Add XDP support to ixgbevf, this is significant because it should allow optimized XDP usage in various cloud environments. From Tony Nguyen.
17) Add new Intel E800 series "ice" ethernet driver, from Anirudh Venkataramanan et al.
18) IP fragmentation match offload support in nfp driver, from Pieter Jansen van Vuuren.
19) Support XDP redirect in i40e driver, from Björn Töpel.
20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of tracepoints in their raw form, from Alexei Starovoitov.
21) Lots of striding RQ improvements to mlx5 driver with many performance improvements, from Tariq Toukan.
22) Use rhashtable for inet frag reassembly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits) net: mvneta: improve suspend/resume net: mvneta: split rxq/txq init and txq deinit into SW and HW parts ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() net: bgmac: Correctly annotate register space route: check sysctl_fib_multipath_use_neigh earlier than hash fix typo in command value in drivers/net/phy/mdio-bitbang. sky2: Increase D3 delay to sky2 stops working after suspend net/mlx5e: Set EQE based as default TX interrupt moderation mode ibmvnic: Disable irqs before exiting reset from closed state net: sched: do not emit messages while holding spinlock vlan: also check phy_driver ts_info for vlan's real device Bluetooth: Mark expected switch fall-throughs Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME Bluetooth: btrsi: remove unused including <linux/version.h> Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4 sh_eth: kill useless check in __sh_eth_get_regs() sh_eth: add sh_eth_cpu_data::no_xdfar flag ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data() ipv4: factorize sk_wmem_alloc updates done by __ip_append_data() ...
show more ...
|
Revision tags: v4.16 |
|
#
d4069fe6 |
| 01-Apr-2018 |
David S. Miller <davem@davemloft.net> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2018-03-31
The following pull-request contains BPF updates for yo
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2018-03-31
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add raw BPF tracepoint API in order to have a BPF program type that can access kernel internal arguments of the tracepoints in their raw form similar to kprobes based BPF programs. This infrastructure also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which returns an anon-inode backed fd for the tracepoint object that allows for automatic detach of the BPF program resp. unregistering of the tracepoint probe on fd release, from Alexei.
2) Add new BPF cgroup hooks at bind() and connect() entry in order to allow BPF programs to reject, inspect or modify user space passed struct sockaddr, and as well a hook at post bind time once the port has been allocated. They are used in FB's container management engine for implementing policy, replacing fragile LD_PRELOAD wrapper intercepting bind() and connect() calls that only works in limited scenarios like glibc based apps but not for other runtimes in containerized applications, from Andrey.
3) BPF_F_INGRESS flag support has been added to sockmap programs for their redirect helper call bringing it in line with cls_bpf based programs. Support is added for both variants of sockmap programs, meaning for tx ULP hooks as well as recv skb hooks, from John.
4) Various improvements on BPF side for the nfp driver, besides others this work adds BPF map update and delete helper call support from the datapath, JITing of 32 and 64 bit XADD instructions as well as offload support of bpf_get_prandom_u32() call. Initial implementation of nfp packet cache has been tackled that optimizes memory access (see merge commit for further details), from Jakub and Jiong.
5) Removal of struct bpf_verifier_env argument from the print_bpf_insn() API has been done in order to prepare to use print_bpf_insn() soon out of perf tool directly. This makes the print_bpf_insn() API more generic and pushes the env into private data. bpftool is adjusted as well with the print_bpf_insn() argument removal, from Jiri.
6) Couple of cleanups and prep work for the upcoming BTF (BPF Type Format). The latter will reuse the current BPF verifier log as well, thus bpf_verifier_log() is further generalized, from Martin.
7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read and write support has been added in similar fashion to existing IPv6 IPV6_TCLASS socket option we already have, from Nikita.
8) Fixes in recent sockmap scatterlist API usage, which did not use sg_init_table() for initialization thus triggering a BUG_ON() in scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and uses a small helper sg_init_marker() to properly handle the affected cases, from Prashant.
9) Let the BPF core follow IDR code convention and therefore use the idr_preload() and idr_preload_end() helpers, which would also help idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory pressure, from Shaohua.
10) Last but not least, a spelling fix in an error message for the BPF cookie UID helper under BPF sample code, from Colin. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
7828f20e |
| 31-Mar-2018 |
Daniel Borkmann <daniel@iogearbox.net> |
Merge branch 'bpf-cgroup-bind-connect'
Andrey Ignatov says:
==================== v2->v3: - rebase due to conflicts - fix ipv6=m build
v1->v2: - support expected_attach_type at prog load time so th
Merge branch 'bpf-cgroup-bind-connect'
Andrey Ignatov says:
==================== v2->v3: - rebase due to conflicts - fix ipv6=m build
v1->v2: - support expected_attach_type at prog load time so that prog (incl. context accesses and calls to helpers) can be validated with regard to specific attach point it is supposed to be attached to. Later, at attach time, attach type is checked so that it must be same as at load time if it was provided - reworked hooks to rely on expected_attach_type, and reduced number of new prog types from 6 to just 1: BPF_PROG_TYPE_CGROUP_SOCK_ADDR - reused BPF_PROG_TYPE_CGROUP_SOCK for sys_bind post-hooks - add selftests for post-sys_bind hook
For our container management we've been using complicated and fragile setup consisting of LD_PRELOAD wrapper intercepting bind and connect calls from all containerized applications. Unfortunately it doesn't work for apps that don't use glibc and changing all applications that run in the datacenter is not possible due to 3rd party code and libraries (despite being open source code) and sheer amount of legacy code that has to be rewritten (we're rewriting what we can in parallel)
These applications are written without containers in mind and have builtin assumptions about network services. Like an application X expects to connect localhost:special_port and find service Y in there. To move application X and service Y into two different containers LD_PRELOAD approach is used to help one service connect to another without rewriting them. Moving these two applications into different L2 (netns) or L3 (vrf) network isolation scopes doesn't help to solve the problem, since applications need to see each other like they were running on the host without containers. So if app X and app Y would run in different netns something would need to punch a connectivity hole in those namespaces. That would be real layering violation (with corresponding network debugging pains), since clean l2, l3 abstraction would suddenly support something that breaks through the layers.
Instead we used LD_PRELOAD (and now bpf programs) at bind/connect time to help applications discover and connect to each other. All applications are running in init_nens and there are no vrfs. After bind/connect the normal fib/neighbor core networking logic works as it should always do and the whole system is clean from network point of view and can be debugged with standard tools.
We also considered resurrecting Hannes's afnetns work, but all hierarchical namespace abstraction don't work due to these builtin networking assumptions inside the apps. To run an application inside cgroup container that was not written with containers in mind we have to make an illusion of running in non-containerized environment. In some cases we remember the port and container id in the post-bind hook in a bpf map and when some other task in a different container is trying to connect to a service we need to know where this service is running. It can be remote and can be local. Both client and service may or may not be written with containers in mind and this sockaddr rewrite is providing connectivity and load balancing feature.
BPF+cgroup looks to be the best solution for this problem. Hence we introduce 3 hooks: - at entry into sys_bind and sys_connect to let bpf prog look and modify 'struct sockaddr' provided by user space and fail bind/connect when appropriate - post sys_bind after port is allocated
The approach works great and has zero overhead for anyone who doesn't use it and very low overhead when deployed.
Different use case for this feature is to do low overhead firewall that doesn't need to inspect all packets and works at bind/connect time. ====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|