#
a15702d8 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Detect dead SCC.
When iterating SCC, we call unix_vertex_dead() for each vertex to check if the vertex is close()d and has no bridge to another SCC.
If both conditions are true for every v
af_unix: Detect dead SCC.
When iterating SCC, we call unix_vertex_dead() for each vertex to check if the vertex is close()d and has no bridge to another SCC.
If both conditions are true for every vertex in SCC, we can execute garbage collection for all skb in the SCC.
The actual garbage collection is done in the following patch, replacing the old implementation.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-14-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
bfdb0128 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Assign a unique index to SCC.
The definition of the lowlink in Tarjan's algorithm is the smallest index of a vertex that is reachable with at most one back-edge in SCC. This is not useful
af_unix: Assign a unique index to SCC.
The definition of the lowlink in Tarjan's algorithm is the smallest index of a vertex that is reachable with at most one back-edge in SCC. This is not useful for a cross-edge.
If we start traversing from A in the following graph, the final lowlink of D is 3. The cross-edge here is one between D and C.
A -> B -> D D = (4, 3) (index, lowlink) ^ | | C = (3, 1) | V | B = (2, 1) `--- C <--' A = (1, 1)
This is because the lowlink of D is updated with the index of C.
In the following patch, we detect a dead SCC by checking two conditions for each vertex.
1) vertex has no edge directed to another SCC (no bridge) 2) vertex's out_degree is the same as the refcount of its file
If 1) is false, there is a receiver of all fds of the SCC and its ancestor SCC.
To evaluate 1), we need to assign a unique index to each SCC and assign it to all vertices in the SCC.
This patch changes the lowlink update logic for cross-edge so that in the example above, the lowlink of D is updated with the lowlink of C.
A -> B -> D D = (4, 1) (index, lowlink) ^ | | C = (3, 1) | V | B = (2, 1) `--- C <--' A = (1, 1)
Then, all vertices in the same SCC have the same lowlink, and we can quickly find the bridge connecting to different SCC if exists.
However, it is no longer called lowlink, so we rename it to scc_index. (It's sometimes called lowpoint.)
Also, we add a global variable to hold the last index used in DFS so that we do not reset the initial index in each DFS.
This patch can be squashed to the SCC detection patch but is split deliberately for anyone wondering why lowlink is not used as used in the original Tarjan's algorithm and many reference implementations.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-13-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
ad081928 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Avoid Tarjan's algorithm if unnecessary.
Once a cyclic reference is formed, we need to run GC to check if there is dead SCC.
However, we do not need to run Tarjan's algorithm if we know th
af_unix: Avoid Tarjan's algorithm if unnecessary.
Once a cyclic reference is formed, we need to run GC to check if there is dead SCC.
However, we do not need to run Tarjan's algorithm if we know that the shape of the inflight graph has not been changed.
If an edge is added/updated/deleted and the edge's successor is inflight, we set false to unix_graph_grouped, which means we need to re-classify SCC.
Once we finalise SCC, we set true to unix_graph_grouped.
While unix_graph_grouped is true, we can iterate the grouped SCC using vertex->scc_entry in unix_walk_scc_fast().
list_add() and list_for_each_entry_reverse() uses seem weird, but they are to keep the vertex order consistent and make writing test easier.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
77e5593a |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Skip GC if no cycle exists.
We do not need to run GC if there is no possible cyclic reference. We use unix_graph_maybe_cyclic to decide if we should run GC.
If a fd of an AF_UNIX socket is
af_unix: Skip GC if no cycle exists.
We do not need to run GC if there is no possible cyclic reference. We use unix_graph_maybe_cyclic to decide if we should run GC.
If a fd of an AF_UNIX socket is passed to an already inflight AF_UNIX socket, they could form a cyclic reference. Then, we set true to unix_graph_maybe_cyclic and later run Tarjan's algorithm to group them into SCC.
Once we run Tarjan's algorithm, we are 100% sure whether cyclic references exist or not. If there is no cycle, we set false to unix_graph_maybe_cyclic and can skip the entire garbage collection next time.
When finalising SCC, we set true to unix_graph_maybe_cyclic if SCC consists of multiple vertices.
Even if SCC is a single vertex, a cycle might exist as self-fd passing. Given the corner case is rare, we detect it by checking all edges of the vertex and set true to unix_graph_maybe_cyclic.
With this change, __unix_gc() is just a spin_lock() dance in the normal usage.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
ba31b4a4 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Save O(n) setup of Tarjan's algo.
Before starting Tarjan's algorithm, we need to mark all vertices as unvisited. We can save this O(n) setup by reserving two special indices (0, 1) and usi
af_unix: Save O(n) setup of Tarjan's algo.
Before starting Tarjan's algorithm, we need to mark all vertices as unvisited. We can save this O(n) setup by reserving two special indices (0, 1) and using two variables.
The first time we link a vertex to unix_unvisited_vertices, we set unix_vertex_unvisited_index to index.
During DFS, we can see that the index of unvisited vertices is the same as unix_vertex_unvisited_index.
When we finalise SCC later, we set unix_vertex_grouped_index to each vertex's index.
Then, we can know (i) that the vertex is on the stack if the index of a visited vertex is >= 2 and (ii) that it is not on the stack and belongs to a different SCC if the index is unix_vertex_grouped_index.
After the whole algorithm, all indices of vertices are set as unix_vertex_grouped_index.
Next time we start DFS, we know that all unvisited vertices have unix_vertex_grouped_index, and we can use unix_vertex_unvisited_index as the not-on-stack marker.
To use the same variable in __unix_walk_scc(), we can swap unix_vertex_(grouped|unvisited)_index at the end of Tarjan's algorithm.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
dcf70df2 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Fix up unix_edge.successor for embryo socket.
To garbage collect inflight AF_UNIX sockets, we must define the cyclic reference appropriately. This is a bit tricky if the loop consists of e
af_unix: Fix up unix_edge.successor for embryo socket.
To garbage collect inflight AF_UNIX sockets, we must define the cyclic reference appropriately. This is a bit tricky if the loop consists of embryo sockets.
Suppose that the fd of AF_UNIX socket A is passed to D and the fd B to C and that C and D are embryo sockets of A and B, respectively. It may appear that there are two separate graphs, A (-> D) and B (-> C), but this is not correct.
A --. .-- B X C <-' `-> D
Now, D holds A's refcount, and C has B's refcount, so unix_release() will never be called for A and B when we close() them. However, no one can call close() for D and C to free skbs holding refcounts of A and B because C/D is in A/B's receive queue, which should have been purged by unix_release() for A and B.
So, here's another type of cyclic reference. When a fd of an AF_UNIX socket is passed to an embryo socket, the reference is indirectly held by its parent listening socket.
.-> A .-> B | `- sk_receive_queue | `- sk_receive_queue | `- skb | `- skb | `- sk == C | `- sk == D | `- sk_receive_queue | `- sk_receive_queue | `- skb +---------' `- skb +-. | | `---------------------------------------------------------'
Technically, the graph must be denoted as A <-> B instead of A (-> D) and B (-> C) to find such a cyclic reference without touching each socket's receive queue.
.-> A --. .-- B <-. | X | == A <-> B `-- C <-' `-> D --'
We apply this fixup during GC by fetching the real successor by unix_edge_successor().
When we call accept(), we clear unix_sock.listener under unix_gc_lock not to confuse GC.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
3484f063 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Detect Strongly Connected Components.
In the new GC, we use a simple graph algorithm, Tarjan's Strongly Connected Components (SCC) algorithm, to find cyclic references.
The algorithm visit
af_unix: Detect Strongly Connected Components.
In the new GC, we use a simple graph algorithm, Tarjan's Strongly Connected Components (SCC) algorithm, to find cyclic references.
The algorithm visits every vertex exactly once using depth-first search (DFS).
DFS starts by pushing an input vertex to a stack and assigning it a unique number. Two fields, index and lowlink, are initialised with the number, but lowlink could be updated later during DFS.
If a vertex has an edge to an unvisited inflight vertex, we visit it and do the same processing. So, we will have vertices in the stack in the order they appear and number them consecutively in the same order.
If a vertex has a back-edge to a visited vertex in the stack, we update the predecessor's lowlink with the successor's index.
After iterating edges from the vertex, we check if its index equals its lowlink.
If the lowlink is different from the index, it shows there was a back-edge. Then, we go backtracking and propagate the lowlink to its predecessor and resume the previous edge iteration from the next edge.
If the lowlink is the same as the index, we pop vertices before and including the vertex from the stack. Then, the set of vertices is SCC, possibly forming a cycle. At the same time, we move the vertices to unix_visited_vertices.
When we finish the algorithm, all vertices in each SCC will be linked via unix_vertex.scc_entry.
Let's take an example. We have a graph including five inflight vertices (F is not inflight):
A -> B -> C -> D -> E (-> F) ^ | `---------'
Suppose that we start DFS from C. We will visit C, D, and B first and initialise their index and lowlink. Then, the stack looks like this:
> B = (3, 3) (index, lowlink) D = (2, 2) C = (1, 1)
When checking B's edge to C, we update B's lowlink with C's index and propagate it to D.
B = (3, 1) (index, lowlink) > D = (2, 1) C = (1, 1)
Next, we visit E, which has no edge to an inflight vertex.
> E = (4, 4) (index, lowlink) B = (3, 1) D = (2, 1) C = (1, 1)
When we leave from E, its index and lowlink are the same, so we pop E from the stack as single-vertex SCC. Next, we leave from B and D but do nothing because their lowlink are different from their index.
B = (3, 1) (index, lowlink) D = (2, 1) > C = (1, 1)
Then, we leave from C, whose index and lowlink are the same, so we pop B, D and C as SCC.
Last, we do DFS for the rest of vertices, A, which is also a single-vertex SCC.
Finally, each unix_vertex.scc_entry is linked as follows:
A -. B -> C -> D E -. ^ | ^ | ^ | `--' `---------' `--'
We use SCC later to decide whether we can garbage-collect the sockets.
Note that we still cannot detect SCC properly if an edge points to an embryo socket. The following two patches will sort it out.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
6ba76fd2 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Iterate all vertices by DFS.
The new GC will use a depth first search graph algorithm to find cyclic references. The algorithm visits every vertex exactly once.
Here, we implement the DFS
af_unix: Iterate all vertices by DFS.
The new GC will use a depth first search graph algorithm to find cyclic references. The algorithm visits every vertex exactly once.
Here, we implement the DFS part without recursion so that no one can abuse it.
unix_walk_scc() marks every vertex unvisited by initialising index as UNIX_VERTEX_INDEX_UNVISITED and iterates inflight vertices in unix_unvisited_vertices and call __unix_walk_scc() to start DFS from an arbitrary vertex.
__unix_walk_scc() iterates all edges starting from the vertex and explores the neighbour vertices with DFS using edge_stack.
After visiting all neighbours, __unix_walk_scc() moves the visited vertex to unix_visited_vertices so that unix_walk_scc() will not restart DFS from the visited vertex.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
22c3c0c5 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.
Currently, we track the number of inflight sockets in two variables. unix_tot_inflight is the total number of inflight AF_UNIX
af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.
Currently, we track the number of inflight sockets in two variables. unix_tot_inflight is the total number of inflight AF_UNIX sockets on the host, and user->unix_inflight is the number of inflight fds per user.
We update them one by one in unix_inflight(), which can be done once in batch. Also, sendmsg() could fail even after unix_inflight(), then we need to acquire unix_gc_lock only to decrement the counters.
Let's bulk update the counters in unix_add_edges() and unix_del_edges(), which is called only for successfully passed fds.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
42f298c0 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Link struct unix_edge when queuing skb.
Just before queuing skb with inflight fds, we call scm_stat_add(), which is a good place to set up the preallocated struct unix_vertex and struct uni
af_unix: Link struct unix_edge when queuing skb.
Just before queuing skb with inflight fds, we call scm_stat_add(), which is a good place to set up the preallocated struct unix_vertex and struct unix_edge in UNIXCB(skb).fp.
Then, we call unix_add_edges() and construct the directed graph as follows:
1. Set the inflight socket's unix_sock to unix_edge.predecessor. 2. Set the receiver's unix_sock to unix_edge.successor. 3. Set the preallocated vertex to inflight socket's unix_sock.vertex. 4. Link inflight socket's unix_vertex.entry to unix_unvisited_vertices. 5. Link unix_edge.vertex_entry to the inflight socket's unix_vertex.edges.
Let's say we pass the fd of AF_UNIX socket A to B and the fd of B to C. The graph looks like this:
+-------------------------+ | unix_unvisited_vertices | <-------------------------. +-------------------------+ | + | | +--------------+ +--------------+ | +--------------+ | | unix_sock A | <---. .---> | unix_sock B | <-|-. .---> | unix_sock C | | +--------------+ | | +--------------+ | | | +--------------+ | .-+ | vertex | | | .-+ | vertex | | | | | vertex | | | +--------------+ | | | +--------------+ | | | +--------------+ | | | | | | | | | | +--------------+ | | | +--------------+ | | | | '-> | unix_vertex | | | '-> | unix_vertex | | | | | +--------------+ | | +--------------+ | | | `---> | entry | +---------> | entry | +-' | | |--------------| | | |--------------| | | | edges | <-. | | | edges | <-. | | +--------------+ | | | +--------------+ | | | | | | | | | .----------------------' | | .----------------------' | | | | | | | | | +--------------+ | | | +--------------+ | | | | unix_edge | | | | | unix_edge | | | | +--------------+ | | | +--------------+ | | `-> | vertex_entry | | | `-> | vertex_entry | | | |--------------| | | |--------------| | | | predecessor | +---' | | predecessor | +---' | |--------------| | |--------------| | | successor | +-----' | successor | +-----' +--------------+ +--------------+
Henceforth, we denote such a graph as A -> B (-> C).
Now, we can express all inflight fd graphs that do not contain embryo sockets. We will support the particular case later.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
29b64e35 |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd.
As with the previous patch, we preallocate to skb's scm_fp_list an array of struct unix_edge in the number of inflight AF_UNIX fds.
af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd.
As with the previous patch, we preallocate to skb's scm_fp_list an array of struct unix_edge in the number of inflight AF_UNIX fds.
There we just preallocate memory and do not use immediately because sendmsg() could fail after this point. The actual use will be in the next patch.
When we queue skb with inflight edges, we will set the inflight socket's unix_sock as unix_edge->predecessor and the receiver's unix_sock as successor, and then we will link the edge to the inflight socket's unix_vertex.edges.
Note that we set NULL to cloned scm_fp_list.edges in scm_fp_dup() so that MSG_PEEK does not change the shape of the directed graph.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
1fbfdfaa |
| 25-Mar-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd.
We will replace the garbage collection algorithm for AF_UNIX, where we will consider each inflight AF_UNIX socket as a vertex and i
af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd.
We will replace the garbage collection algorithm for AF_UNIX, where we will consider each inflight AF_UNIX socket as a vertex and its file descriptor as an edge in a directed graph.
This patch introduces a new struct unix_vertex representing a vertex in the graph and adds its pointer to struct unix_sock.
When we send a fd using the SCM_RIGHTS message, we allocate struct scm_fp_list to struct scm_cookie in scm_fp_copy(). Then, we bump each refcount of the inflight fds' struct file and save them in scm_fp_list.fp.
After that, unix_attach_fds() inexplicably clones scm_fp_list of scm_cookie and sets it to skb. (We will remove this part after replacing GC.)
Here, we add a new function call in unix_attach_fds() to preallocate struct unix_vertex per inflight AF_UNIX fd and link each vertex to skb's scm_fp_list.vertices.
When sendmsg() succeeds later, if the socket of the inflight fd is still not inflight yet, we will set the preallocated vertex to struct unix_sock.vertex and link it to a global list unix_unvisited_vertices under spin_lock(&unix_gc_lock).
If the socket is already inflight, we free the preallocated vertex. This is to avoid taking the lock unnecessarily when sendmsg() could fail later.
In the following patch, we will similarly allocate another struct per edge, which will finally be linked to the inflight socket's unix_vertex.edges.
And then, we will count the number of edges as unix_vertex.out_degree.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
b7e1e969 |
| 26-Mar-2024 |
Takashi Iwai <tiwai@suse.de> |
Merge branch 'topic/sound-devel-6.10' into for-next
|
Revision tags: v6.8, v6.8-rc7 |
|
#
06d07429 |
| 29-Feb-2024 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync to get the drm_printer changes to drm-intel-next.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
d0331aa9 |
| 14-Apr-2024 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into perf/core, to pick up perf/urgent fixes
Pick up perf/urgent fixes that are upstream already, but not yet in the perf/core development branch.
Signed-off-by: Ingo Molnar <m
Merge branch 'linus' into perf/core, to pick up perf/urgent fixes
Pick up perf/urgent fixes that are upstream already, but not yet in the perf/core development branch.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
f4566a1e |
| 25-Mar-2024 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v6.9-rc1' into sched/core, to pick up fixes and to refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
52afb15e |
| 25-Apr-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wireless and bluetooth.
Nothing ma
Merge tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wireless and bluetooth.
Nothing major, regression fixes are mostly in drivers, two more of those are flowing towards us thru various trees. I wish some of the changes went into -rc5, we'll try to keep an eye on frequency of PRs from sub-trees.
Also disproportional number of fixes for bugs added in v6.4, strange coincidence.
Current release - regressions:
- igc: fix LED-related deadlock on driver unbind
- wifi: mac80211: small fixes to recent clean up of the connection process
- Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices", kernel doesn't have all the code to deal with that version, yet
- Bluetooth: - set power_ctrl_enabled on NULL returned by gpiod_get_optional() - qca: fix invalid device address check, again
- eth: ravb: fix registered interrupt names
Current release - new code bugs:
- wifi: mac80211: check EHT/TTLM action frame length
Previous releases - regressions:
- fix sk_memory_allocated_{add|sub} for architectures where __this_cpu_{add|sub}* are not IRQ-safe
- dsa: mv88e6xx: fix link setup for 88E6250
Previous releases - always broken:
- ip: validate dev returned from __in_dev_get_rcu(), prevent possible null-derefs in a few places
- switch number of for_each_rcu() loops using call_rcu() on the iterator to for_each_safe()
- macsec: fix isolation of broadcast traffic in presence of offload
- vxlan: drop packets from invalid source address
- eth: mlxsw: trap and ACL programming fixes
- eth: bnxt: PCIe error recovery fixes, fix counting dropped packets
- Bluetooth: - lots of fixes for the command submission rework from v6.4 - qca: fix NULL-deref on non-serdev suspend
Misc:
- tools: ynl: don't ignore errors in NLMSG_DONE messages"
* tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits) af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc(). net: b44: set pause params only when interface is up tls: fix lockless read of strp->msg_ready in ->poll dpll: fix dpll_pin_on_pin_register() for multiple parent pins net: ravb: Fix registered interrupt names octeontx2-af: fix the double free in rvu_npc_freemem() net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets ice: fix LAG and VF lock dependency in ice_reset_vf() iavf: Fix TC config comparison with existing adapter TC config i40e: Report MFS in decimal base instead of hex i40e: Do not use WQ_MEM_RECLAIM flag for workqueue net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns() net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst ethernet: Add helper for assigning packet type when dest address does not match device address macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads net: phy: dp83869: Fix MII mode failure netfilter: nf_tables: honor table dormant flag from netdev release event path eth: bnxt: fix counting packets discarded due to OOM and netpoll igc: Fix LED-related deadlock on driver unbind ...
show more ...
|
#
1971d13f |
| 24-Apr-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
syzbot reported a lockdep splat regarding unix_gc_lock and unix_state_lock().
One is called from recvmsg() for a conne
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
syzbot reported a lockdep splat regarding unix_gc_lock and unix_state_lock().
One is called from recvmsg() for a connected socket, and another is called from GC for TCP_LISTEN socket.
So, the splat is false-positive.
Let's add a dedicated lock class for the latter to suppress the splat.
Note that this change is not necessary for net-next.git as the issue is only applied to the old GC impl.
[0]: WARNING: possible circular locking dependency detected 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted ----------------------------------------------------- kworker/u8:1/11 is trying to acquire lock: ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
but task is already holding lock: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (unix_gc_lock){+.+.}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] unix_notinflight+0x13d/0x390 net/unix/garbage.c:140 unix_detach_fds net/unix/af_unix.c:1819 [inline] unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188 skb_release_all net/core/skbuff.c:1200 [inline] __kfree_skb net/core/skbuff.c:1216 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252 kfree_skb include/linux/skbuff.h:1262 [inline] manage_oob net/unix/af_unix.c:2672 [inline] unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749 unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981 do_splice_read fs/splice.c:985 [inline] splice_file_to_pipe+0x299/0x500 fs/splice.c:1295 do_splice+0xf2d/0x1880 fs/splice.c:1379 __do_splice fs/splice.c:1436 [inline] __do_sys_splice fs/splice.c:1652 [inline] __se_sys_splice+0x331/0x4a0 fs/splice.c:1634 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&u->lock){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(unix_gc_lock); lock(&u->lock); lock(unix_gc_lock); lock(&u->lock);
*** DEADLOCK ***
3 locks held by kworker/u8:1/11: #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline] #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline] #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
stack backtrace: CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK>
Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
2ae9a897 |
| 11-Apr-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth.
Current release - new code bugs:
Merge tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth.
Current release - new code bugs:
- netfilter: complete validation of user input
- mlx5: disallow SRIOV switchdev mode when in multi-PF netdev
Previous releases - regressions:
- core: fix u64_stats_init() for lockdep when used repeatedly in one file
- ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
- bluetooth: fix memory leak in hci_req_sync_complete()
- batman-adv: avoid infinite loop trying to resize local TT
- drv: geneve: fix header validation in geneve[6]_xmit_skb
- drv: bnxt_en: fix possible memory leak in bnxt_rdma_aux_device_init()
- drv: mlx5: offset comp irq index in name by one
- drv: ena: avoid double-free clearing stale tx_info->xdpf value
- drv: pds_core: fix pdsc_check_pci_health deadlock
Previous releases - always broken:
- xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
- bluetooth: fix setsockopt not validating user input
- af_unix: clear stale u->oob_skb.
- nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
- drv: virtio_net: fix guest hangup on invalid RSS update
- drv: mlx5e: Fix mlx5e_priv_init() cleanup flow
- dsa: mt7530: trap link-local frames regardless of ST Port State"
* tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits) net: ena: Set tx_info->xdpf value to NULL net: ena: Fix incorrect descriptor free behavior net: ena: Wrong missing IO completions check order net: ena: Fix potential sign extension issue af_unix: Fix garbage collector racing against connect() net: dsa: mt7530: trap link-local frames regardless of ST Port State Revert "s390/ism: fix receive message buffer allocation" net: sparx5: fix wrong config being used when reconfiguring PCS net/mlx5: fix possible stack overflows net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev net/mlx5e: RSS, Block XOR hash with over 128 channels net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit net/mlx5e: HTB, Fix inconsistencies with QoS SQs number net/mlx5e: Fix mlx5e_priv_init() cleanup flow net/mlx5e: RSS, Block changing channels number when RXFH is configured net/mlx5: Correctly compare pkt reformat ids net/mlx5: Properly link new fs rules into the tree net/mlx5: offset comp irq index in name by one net/mlx5: Register devlink first under devlink lock net/mlx5: E-switch, store eswitch pointer before registering devlink_param ...
show more ...
|
#
47d8ac01 |
| 09-Apr-2024 |
Michal Luczaj <mhal@rbox.co> |
af_unix: Fix garbage collector racing against connect()
Garbage collector does not take into account the risk of embryo getting enqueued during the garbage collection. If such embryo has a peer that
af_unix: Fix garbage collector racing against connect()
Garbage collector does not take into account the risk of embryo getting enqueued during the garbage collection. If such embryo has a peer that carries SCM_RIGHTS, two consecutive passes of scan_children() may see a different set of children. Leading to an incorrectly elevated inflight count, and then a dangling pointer within the gc_inflight_list.
sockets are AF_UNIX/SOCK_STREAM S is an unconnected socket L is a listening in-flight socket bound to addr, not in fdtable V's fd will be passed via sendmsg(), gets inflight count bumped
connect(S, addr) sendmsg(S, [V]); close(V) __unix_gc() ---------------- ------------------------- -----------
NS = unix_create1() skb1 = sock_wmalloc(NS) L = unix_find_other(addr) unix_state_lock(L) unix_peer(S) = NS // V count=1 inflight=0
NS = unix_peer(S) skb2 = sock_alloc() skb_queue_tail(NS, skb2[V])
// V became in-flight // V count=2 inflight=1
close(V)
// V count=1 inflight=1 // GC candidate condition met
for u in gc_inflight_list: if (total_refs == inflight_refs) add u to gc_candidates
// gc_candidates={L, V}
for u in gc_candidates: scan_children(u, dec_inflight)
// embryo (skb1) was not // reachable from L yet, so V's // inflight remains unchanged __skb_queue_tail(L, skb1) unix_state_unlock(L) for u in gc_candidates: if (u.inflight) scan_children(u, inc_inflight_move_tail)
// V count=1 inflight=2 (!)
If there is a GC-candidate listening socket, lock/unlock its state. This makes GC wait until the end of any ongoing connect() to that socket. After flipping the lock, a possibly SCM-laden embryo is already enqueued. And if there is another embryo coming, it can not possibly carry SCM_RIGHTS. At this point, unix_inflight() can not happen because unix_gc_lock is already taken. Inflight graph remains unaffected.
Fixes: 1fd05ba5a2f2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
100c8542 |
| 05-Apr-2024 |
Takashi Iwai <tiwai@suse.de> |
Merge tag 'asoc-fix-v6.9-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.9
A relatively large set of fixes here, the biggest piece of it is a
Merge tag 'asoc-fix-v6.9-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.9
A relatively large set of fixes here, the biggest piece of it is a series correcting some problems with the delay reporting for Intel SOF cards but there's a bunch of other things. Everything here is driver specific except for a fix in the core for an issue with sign extension handling volume controls.
show more ...
|
#
36a1818f |
| 25-Mar-2024 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging to get drm-misc-fixes to the state of v6.9-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
a2e7496b |
| 13-Mar-2024 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging to sync before merging the patchset at [1].
[1] https://lore.kernel.org/all/cover.1709913674.git.jani.nikula@intel.com/
Signed-off-by: Thomas Zi
Merge drm/drm-fixes into drm-misc-fixes
Backmerging to sync before merging the patchset at [1].
[1] https://lore.kernel.org/all/cover.1709913674.git.jani.nikula@intel.com/
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
show more ...
|
#
9f234784 |
| 21-Mar-2024 |
Takashi Iwai <tiwai@suse.de> |
Merge tag 'asoc-fix-v6.9-merge-window' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.9
A bunch of fixes that came in during the merge window, pr
Merge tag 'asoc-fix-v6.9-merge-window' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.9
A bunch of fixes that came in during the merge window, probably the most substantial thing is the DPCM locking fix for compressed audio which has been lurking for a while.
show more ...
|
#
5bd249ae |
| 18-Mar-2024 |
Mark Brown <broonie@kernel.org> |
spi: Merge up v6.8 release
An i.MX fix depends on other fixes that were sent to v6.8.
|