History log of /linux/net/unix/garbage.c (Results 76 – 100 of 633)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# a15702d8 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Detect dead SCC.

When iterating SCC, we call unix_vertex_dead() for each vertex
to check if the vertex is close()d and has no bridge to another
SCC.

If both conditions are true for every v

af_unix: Detect dead SCC.

When iterating SCC, we call unix_vertex_dead() for each vertex
to check if the vertex is close()d and has no bridge to another
SCC.

If both conditions are true for every vertex in SCC, we can
execute garbage collection for all skb in the SCC.

The actual garbage collection is done in the following patch,
replacing the old implementation.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-14-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# bfdb0128 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Assign a unique index to SCC.

The definition of the lowlink in Tarjan's algorithm is the
smallest index of a vertex that is reachable with at most one
back-edge in SCC. This is not useful

af_unix: Assign a unique index to SCC.

The definition of the lowlink in Tarjan's algorithm is the
smallest index of a vertex that is reachable with at most one
back-edge in SCC. This is not useful for a cross-edge.

If we start traversing from A in the following graph, the final
lowlink of D is 3. The cross-edge here is one between D and C.

A -> B -> D D = (4, 3) (index, lowlink)
^ | | C = (3, 1)
| V | B = (2, 1)
`--- C <--' A = (1, 1)

This is because the lowlink of D is updated with the index of C.

In the following patch, we detect a dead SCC by checking two
conditions for each vertex.

1) vertex has no edge directed to another SCC (no bridge)
2) vertex's out_degree is the same as the refcount of its file

If 1) is false, there is a receiver of all fds of the SCC and
its ancestor SCC.

To evaluate 1), we need to assign a unique index to each SCC and
assign it to all vertices in the SCC.

This patch changes the lowlink update logic for cross-edge so
that in the example above, the lowlink of D is updated with the
lowlink of C.

A -> B -> D D = (4, 1) (index, lowlink)
^ | | C = (3, 1)
| V | B = (2, 1)
`--- C <--' A = (1, 1)

Then, all vertices in the same SCC have the same lowlink, and we
can quickly find the bridge connecting to different SCC if exists.

However, it is no longer called lowlink, so we rename it to
scc_index. (It's sometimes called lowpoint.)

Also, we add a global variable to hold the last index used in DFS
so that we do not reset the initial index in each DFS.

This patch can be squashed to the SCC detection patch but is
split deliberately for anyone wondering why lowlink is not used
as used in the original Tarjan's algorithm and many reference
implementations.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-13-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# ad081928 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Avoid Tarjan's algorithm if unnecessary.

Once a cyclic reference is formed, we need to run GC to check if
there is dead SCC.

However, we do not need to run Tarjan's algorithm if we know th

af_unix: Avoid Tarjan's algorithm if unnecessary.

Once a cyclic reference is formed, we need to run GC to check if
there is dead SCC.

However, we do not need to run Tarjan's algorithm if we know that
the shape of the inflight graph has not been changed.

If an edge is added/updated/deleted and the edge's successor is
inflight, we set false to unix_graph_grouped, which means we need
to re-classify SCC.

Once we finalise SCC, we set true to unix_graph_grouped.

While unix_graph_grouped is true, we can iterate the grouped
SCC using vertex->scc_entry in unix_walk_scc_fast().

list_add() and list_for_each_entry_reverse() uses seem weird, but
they are to keep the vertex order consistent and make writing test
easier.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-12-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 77e5593a 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Skip GC if no cycle exists.

We do not need to run GC if there is no possible cyclic reference.
We use unix_graph_maybe_cyclic to decide if we should run GC.

If a fd of an AF_UNIX socket is

af_unix: Skip GC if no cycle exists.

We do not need to run GC if there is no possible cyclic reference.
We use unix_graph_maybe_cyclic to decide if we should run GC.

If a fd of an AF_UNIX socket is passed to an already inflight AF_UNIX
socket, they could form a cyclic reference. Then, we set true to
unix_graph_maybe_cyclic and later run Tarjan's algorithm to group
them into SCC.

Once we run Tarjan's algorithm, we are 100% sure whether cyclic
references exist or not. If there is no cycle, we set false to
unix_graph_maybe_cyclic and can skip the entire garbage collection
next time.

When finalising SCC, we set true to unix_graph_maybe_cyclic if SCC
consists of multiple vertices.

Even if SCC is a single vertex, a cycle might exist as self-fd passing.
Given the corner case is rare, we detect it by checking all edges of
the vertex and set true to unix_graph_maybe_cyclic.

With this change, __unix_gc() is just a spin_lock() dance in the normal
usage.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-11-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# ba31b4a4 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Save O(n) setup of Tarjan's algo.

Before starting Tarjan's algorithm, we need to mark all vertices
as unvisited. We can save this O(n) setup by reserving two special
indices (0, 1) and usi

af_unix: Save O(n) setup of Tarjan's algo.

Before starting Tarjan's algorithm, we need to mark all vertices
as unvisited. We can save this O(n) setup by reserving two special
indices (0, 1) and using two variables.

The first time we link a vertex to unix_unvisited_vertices, we set
unix_vertex_unvisited_index to index.

During DFS, we can see that the index of unvisited vertices is the
same as unix_vertex_unvisited_index.

When we finalise SCC later, we set unix_vertex_grouped_index to each
vertex's index.

Then, we can know (i) that the vertex is on the stack if the index
of a visited vertex is >= 2 and (ii) that it is not on the stack and
belongs to a different SCC if the index is unix_vertex_grouped_index.

After the whole algorithm, all indices of vertices are set as
unix_vertex_grouped_index.

Next time we start DFS, we know that all unvisited vertices have
unix_vertex_grouped_index, and we can use unix_vertex_unvisited_index
as the not-on-stack marker.

To use the same variable in __unix_walk_scc(), we can swap
unix_vertex_(grouped|unvisited)_index at the end of Tarjan's
algorithm.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-10-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# dcf70df2 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Fix up unix_edge.successor for embryo socket.

To garbage collect inflight AF_UNIX sockets, we must define the
cyclic reference appropriately. This is a bit tricky if the loop
consists of e

af_unix: Fix up unix_edge.successor for embryo socket.

To garbage collect inflight AF_UNIX sockets, we must define the
cyclic reference appropriately. This is a bit tricky if the loop
consists of embryo sockets.

Suppose that the fd of AF_UNIX socket A is passed to D and the fd B
to C and that C and D are embryo sockets of A and B, respectively.
It may appear that there are two separate graphs, A (-> D) and
B (-> C), but this is not correct.

A --. .-- B
X
C <-' `-> D

Now, D holds A's refcount, and C has B's refcount, so unix_release()
will never be called for A and B when we close() them. However, no
one can call close() for D and C to free skbs holding refcounts of A
and B because C/D is in A/B's receive queue, which should have been
purged by unix_release() for A and B.

So, here's another type of cyclic reference. When a fd of an AF_UNIX
socket is passed to an embryo socket, the reference is indirectly held
by its parent listening socket.

.-> A .-> B
| `- sk_receive_queue | `- sk_receive_queue
| `- skb | `- skb
| `- sk == C | `- sk == D
| `- sk_receive_queue | `- sk_receive_queue
| `- skb +---------' `- skb +-.
| |
`---------------------------------------------------------'

Technically, the graph must be denoted as A <-> B instead of A (-> D)
and B (-> C) to find such a cyclic reference without touching each
socket's receive queue.

.-> A --. .-- B <-.
| X | == A <-> B
`-- C <-' `-> D --'

We apply this fixup during GC by fetching the real successor by
unix_edge_successor().

When we call accept(), we clear unix_sock.listener under unix_gc_lock
not to confuse GC.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 3484f063 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Detect Strongly Connected Components.

In the new GC, we use a simple graph algorithm, Tarjan's Strongly
Connected Components (SCC) algorithm, to find cyclic references.

The algorithm visit

af_unix: Detect Strongly Connected Components.

In the new GC, we use a simple graph algorithm, Tarjan's Strongly
Connected Components (SCC) algorithm, to find cyclic references.

The algorithm visits every vertex exactly once using depth-first
search (DFS).

DFS starts by pushing an input vertex to a stack and assigning it
a unique number. Two fields, index and lowlink, are initialised
with the number, but lowlink could be updated later during DFS.

If a vertex has an edge to an unvisited inflight vertex, we visit
it and do the same processing. So, we will have vertices in the
stack in the order they appear and number them consecutively in
the same order.

If a vertex has a back-edge to a visited vertex in the stack,
we update the predecessor's lowlink with the successor's index.

After iterating edges from the vertex, we check if its index
equals its lowlink.

If the lowlink is different from the index, it shows there was a
back-edge. Then, we go backtracking and propagate the lowlink to
its predecessor and resume the previous edge iteration from the
next edge.

If the lowlink is the same as the index, we pop vertices before
and including the vertex from the stack. Then, the set of vertices
is SCC, possibly forming a cycle. At the same time, we move the
vertices to unix_visited_vertices.

When we finish the algorithm, all vertices in each SCC will be
linked via unix_vertex.scc_entry.

Let's take an example. We have a graph including five inflight
vertices (F is not inflight):

A -> B -> C -> D -> E (-> F)
^ |
`---------'

Suppose that we start DFS from C. We will visit C, D, and B first
and initialise their index and lowlink. Then, the stack looks like
this:

> B = (3, 3) (index, lowlink)
D = (2, 2)
C = (1, 1)

When checking B's edge to C, we update B's lowlink with C's index
and propagate it to D.

B = (3, 1) (index, lowlink)
> D = (2, 1)
C = (1, 1)

Next, we visit E, which has no edge to an inflight vertex.

> E = (4, 4) (index, lowlink)
B = (3, 1)
D = (2, 1)
C = (1, 1)

When we leave from E, its index and lowlink are the same, so we
pop E from the stack as single-vertex SCC. Next, we leave from
B and D but do nothing because their lowlink are different from
their index.

B = (3, 1) (index, lowlink)
D = (2, 1)
> C = (1, 1)

Then, we leave from C, whose index and lowlink are the same, so
we pop B, D and C as SCC.

Last, we do DFS for the rest of vertices, A, which is also a
single-vertex SCC.

Finally, each unix_vertex.scc_entry is linked as follows:

A -. B -> C -> D E -.
^ | ^ | ^ |
`--' `---------' `--'

We use SCC later to decide whether we can garbage-collect the
sockets.

Note that we still cannot detect SCC properly if an edge points
to an embryo socket. The following two patches will sort it out.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 6ba76fd2 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Iterate all vertices by DFS.

The new GC will use a depth first search graph algorithm to find
cyclic references. The algorithm visits every vertex exactly once.

Here, we implement the DFS

af_unix: Iterate all vertices by DFS.

The new GC will use a depth first search graph algorithm to find
cyclic references. The algorithm visits every vertex exactly once.

Here, we implement the DFS part without recursion so that no one
can abuse it.

unix_walk_scc() marks every vertex unvisited by initialising index
as UNIX_VERTEX_INDEX_UNVISITED and iterates inflight vertices in
unix_unvisited_vertices and call __unix_walk_scc() to start DFS from
an arbitrary vertex.

__unix_walk_scc() iterates all edges starting from the vertex and
explores the neighbour vertices with DFS using edge_stack.

After visiting all neighbours, __unix_walk_scc() moves the visited
vertex to unix_visited_vertices so that unix_walk_scc() will not
restart DFS from the visited vertex.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 22c3c0c5 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.

Currently, we track the number of inflight sockets in two variables.
unix_tot_inflight is the total number of inflight AF_UNIX

af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.

Currently, we track the number of inflight sockets in two variables.
unix_tot_inflight is the total number of inflight AF_UNIX sockets on
the host, and user->unix_inflight is the number of inflight fds per
user.

We update them one by one in unix_inflight(), which can be done once
in batch. Also, sendmsg() could fail even after unix_inflight(), then
we need to acquire unix_gc_lock only to decrement the counters.

Let's bulk update the counters in unix_add_edges() and unix_del_edges(),
which is called only for successfully passed fds.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 42f298c0 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Link struct unix_edge when queuing skb.

Just before queuing skb with inflight fds, we call scm_stat_add(),
which is a good place to set up the preallocated struct unix_vertex
and struct uni

af_unix: Link struct unix_edge when queuing skb.

Just before queuing skb with inflight fds, we call scm_stat_add(),
which is a good place to set up the preallocated struct unix_vertex
and struct unix_edge in UNIXCB(skb).fp.

Then, we call unix_add_edges() and construct the directed graph
as follows:

1. Set the inflight socket's unix_sock to unix_edge.predecessor.
2. Set the receiver's unix_sock to unix_edge.successor.
3. Set the preallocated vertex to inflight socket's unix_sock.vertex.
4. Link inflight socket's unix_vertex.entry to unix_unvisited_vertices.
5. Link unix_edge.vertex_entry to the inflight socket's unix_vertex.edges.

Let's say we pass the fd of AF_UNIX socket A to B and the fd of B
to C. The graph looks like this:

+-------------------------+
| unix_unvisited_vertices | <-------------------------.
+-------------------------+ |
+ |
| +--------------+ +--------------+ | +--------------+
| | unix_sock A | <---. .---> | unix_sock B | <-|-. .---> | unix_sock C |
| +--------------+ | | +--------------+ | | | +--------------+
| .-+ | vertex | | | .-+ | vertex | | | | | vertex |
| | +--------------+ | | | +--------------+ | | | +--------------+
| | | | | | | |
| | +--------------+ | | | +--------------+ | | |
| '-> | unix_vertex | | | '-> | unix_vertex | | | |
| +--------------+ | | +--------------+ | | |
`---> | entry | +---------> | entry | +-' | |
|--------------| | | |--------------| | |
| edges | <-. | | | edges | <-. | |
+--------------+ | | | +--------------+ | | |
| | | | | |
.----------------------' | | .----------------------' | |
| | | | | |
| +--------------+ | | | +--------------+ | |
| | unix_edge | | | | | unix_edge | | |
| +--------------+ | | | +--------------+ | |
`-> | vertex_entry | | | `-> | vertex_entry | | |
|--------------| | | |--------------| | |
| predecessor | +---' | | predecessor | +---' |
|--------------| | |--------------| |
| successor | +-----' | successor | +-----'
+--------------+ +--------------+

Henceforth, we denote such a graph as A -> B (-> C).

Now, we can express all inflight fd graphs that do not contain
embryo sockets. We will support the particular case later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 29b64e35 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd.

As with the previous patch, we preallocate to skb's scm_fp_list an
array of struct unix_edge in the number of inflight AF_UNIX fds.

af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd.

As with the previous patch, we preallocate to skb's scm_fp_list an
array of struct unix_edge in the number of inflight AF_UNIX fds.

There we just preallocate memory and do not use immediately because
sendmsg() could fail after this point. The actual use will be in
the next patch.

When we queue skb with inflight edges, we will set the inflight
socket's unix_sock as unix_edge->predecessor and the receiver's
unix_sock as successor, and then we will link the edge to the
inflight socket's unix_vertex.edges.

Note that we set NULL to cloned scm_fp_list.edges in scm_fp_dup()
so that MSG_PEEK does not change the shape of the directed graph.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 1fbfdfaa 25-Mar-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd.

We will replace the garbage collection algorithm for AF_UNIX, where
we will consider each inflight AF_UNIX socket as a vertex and i

af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd.

We will replace the garbage collection algorithm for AF_UNIX, where
we will consider each inflight AF_UNIX socket as a vertex and its file
descriptor as an edge in a directed graph.

This patch introduces a new struct unix_vertex representing a vertex
in the graph and adds its pointer to struct unix_sock.

When we send a fd using the SCM_RIGHTS message, we allocate struct
scm_fp_list to struct scm_cookie in scm_fp_copy(). Then, we bump
each refcount of the inflight fds' struct file and save them in
scm_fp_list.fp.

After that, unix_attach_fds() inexplicably clones scm_fp_list of
scm_cookie and sets it to skb. (We will remove this part after
replacing GC.)

Here, we add a new function call in unix_attach_fds() to preallocate
struct unix_vertex per inflight AF_UNIX fd and link each vertex to
skb's scm_fp_list.vertices.

When sendmsg() succeeds later, if the socket of the inflight fd is
still not inflight yet, we will set the preallocated vertex to struct
unix_sock.vertex and link it to a global list unix_unvisited_vertices
under spin_lock(&unix_gc_lock).

If the socket is already inflight, we free the preallocated vertex.
This is to avoid taking the lock unnecessarily when sendmsg() could
fail later.

In the following patch, we will similarly allocate another struct
per edge, which will finally be linked to the inflight socket's
unix_vertex.edges.

And then, we will count the number of edges as unix_vertex.out_degree.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240325202425.60930-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# b7e1e969 26-Mar-2024 Takashi Iwai <tiwai@suse.de>

Merge branch 'topic/sound-devel-6.10' into for-next


Revision tags: v6.8, v6.8-rc7
# 06d07429 29-Feb-2024 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync to get the drm_printer changes to drm-intel-next.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# d0331aa9 14-Apr-2024 Ingo Molnar <mingo@kernel.org>

Merge branch 'linus' into perf/core, to pick up perf/urgent fixes

Pick up perf/urgent fixes that are upstream already, but not
yet in the perf/core development branch.

Signed-off-by: Ingo Molnar <m

Merge branch 'linus' into perf/core, to pick up perf/urgent fixes

Pick up perf/urgent fixes that are upstream already, but not
yet in the perf/core development branch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# f4566a1e 25-Mar-2024 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.9-rc1' into sched/core, to pick up fixes and to refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 52afb15e 25-Apr-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and bluetooth.

Nothing ma

Merge tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and bluetooth.

Nothing major, regression fixes are mostly in drivers, two more of
those are flowing towards us thru various trees. I wish some of the
changes went into -rc5, we'll try to keep an eye on frequency of PRs
from sub-trees.

Also disproportional number of fixes for bugs added in v6.4, strange
coincidence.

Current release - regressions:

- igc: fix LED-related deadlock on driver unbind

- wifi: mac80211: small fixes to recent clean up of the connection
process

- Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices", kernel
doesn't have all the code to deal with that version, yet

- Bluetooth:
- set power_ctrl_enabled on NULL returned by gpiod_get_optional()
- qca: fix invalid device address check, again

- eth: ravb: fix registered interrupt names

Current release - new code bugs:

- wifi: mac80211: check EHT/TTLM action frame length

Previous releases - regressions:

- fix sk_memory_allocated_{add|sub} for architectures where
__this_cpu_{add|sub}* are not IRQ-safe

- dsa: mv88e6xx: fix link setup for 88E6250

Previous releases - always broken:

- ip: validate dev returned from __in_dev_get_rcu(), prevent possible
null-derefs in a few places

- switch number of for_each_rcu() loops using call_rcu() on the
iterator to for_each_safe()

- macsec: fix isolation of broadcast traffic in presence of offload

- vxlan: drop packets from invalid source address

- eth: mlxsw: trap and ACL programming fixes

- eth: bnxt: PCIe error recovery fixes, fix counting dropped packets

- Bluetooth:
- lots of fixes for the command submission rework from v6.4
- qca: fix NULL-deref on non-serdev suspend

Misc:

- tools: ynl: don't ignore errors in NLMSG_DONE messages"

* tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
net: b44: set pause params only when interface is up
tls: fix lockless read of strp->msg_ready in ->poll
dpll: fix dpll_pin_on_pin_register() for multiple parent pins
net: ravb: Fix registered interrupt names
octeontx2-af: fix the double free in rvu_npc_freemem()
net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
ice: fix LAG and VF lock dependency in ice_reset_vf()
iavf: Fix TC config comparison with existing adapter TC config
i40e: Report MFS in decimal base instead of hex
i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec
macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
ethernet: Add helper for assigning packet type when dest address does not match device address
macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
net: phy: dp83869: Fix MII mode failure
netfilter: nf_tables: honor table dormant flag from netdev release event path
eth: bnxt: fix counting packets discarded due to OOM and netpoll
igc: Fix LED-related deadlock on driver unbind
...

show more ...


# 1971d13f 24-Apr-2024 Kuniyuki Iwashima <kuniyu@amazon.com>

af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().

syzbot reported a lockdep splat regarding unix_gc_lock and
unix_state_lock().

One is called from recvmsg() for a conne

af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().

syzbot reported a lockdep splat regarding unix_gc_lock and
unix_state_lock().

One is called from recvmsg() for a connected socket, and another
is called from GC for TCP_LISTEN socket.

So, the splat is false-positive.

Let's add a dedicated lock class for the latter to suppress the splat.

Note that this change is not necessary for net-next.git as the issue
is only applied to the old GC impl.

[0]:
WARNING: possible circular locking dependency detected
6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted
-----------------------------------------------------
kworker/u8:1/11 is trying to acquire lock:
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302

but task is already holding lock:
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (unix_gc_lock){+.+.}-{2:2}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
unix_notinflight+0x13d/0x390 net/unix/garbage.c:140
unix_detach_fds net/unix/af_unix.c:1819 [inline]
unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876
skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188
skb_release_all net/core/skbuff.c:1200 [inline]
__kfree_skb net/core/skbuff.c:1216 [inline]
kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252
kfree_skb include/linux/skbuff.h:1262 [inline]
manage_oob net/unix/af_unix.c:2672 [inline]
unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749
unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981
do_splice_read fs/splice.c:985 [inline]
splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
do_splice+0xf2d/0x1880 fs/splice.c:1379
__do_splice fs/splice.c:1436 [inline]
__do_sys_splice fs/splice.c:1652 [inline]
__se_sys_splice+0x331/0x4a0 fs/splice.c:1634
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&u->lock){+.+.}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__unix_gc+0x40e/0xf70 net/unix/garbage.c:302
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(unix_gc_lock);
lock(&u->lock);
lock(unix_gc_lock);
lock(&u->lock);

*** DEADLOCK ***

3 locks held by kworker/u8:1/11:
#0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
#0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335
#1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
#1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335
#2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
#2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261

stack backtrace:
CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: events_unbound __unix_gc
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__unix_gc+0x40e/0xf70 net/unix/garbage.c:302
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>

Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 2ae9a897 11-Apr-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.

Current release - new code bugs:

Merge tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.

Current release - new code bugs:

- netfilter: complete validation of user input

- mlx5: disallow SRIOV switchdev mode when in multi-PF netdev

Previous releases - regressions:

- core: fix u64_stats_init() for lockdep when used repeatedly in one
file

- ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr

- bluetooth: fix memory leak in hci_req_sync_complete()

- batman-adv: avoid infinite loop trying to resize local TT

- drv: geneve: fix header validation in geneve[6]_xmit_skb

- drv: bnxt_en: fix possible memory leak in
bnxt_rdma_aux_device_init()

- drv: mlx5: offset comp irq index in name by one

- drv: ena: avoid double-free clearing stale tx_info->xdpf value

- drv: pds_core: fix pdsc_check_pci_health deadlock

Previous releases - always broken:

- xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING

- bluetooth: fix setsockopt not validating user input

- af_unix: clear stale u->oob_skb.

- nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies

- drv: virtio_net: fix guest hangup on invalid RSS update

- drv: mlx5e: Fix mlx5e_priv_init() cleanup flow

- dsa: mt7530: trap link-local frames regardless of ST Port State"

* tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
net: ena: Set tx_info->xdpf value to NULL
net: ena: Fix incorrect descriptor free behavior
net: ena: Wrong missing IO completions check order
net: ena: Fix potential sign extension issue
af_unix: Fix garbage collector racing against connect()
net: dsa: mt7530: trap link-local frames regardless of ST Port State
Revert "s390/ism: fix receive message buffer allocation"
net: sparx5: fix wrong config being used when reconfiguring PCS
net/mlx5: fix possible stack overflows
net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev
net/mlx5e: RSS, Block XOR hash with over 128 channels
net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
net/mlx5e: Fix mlx5e_priv_init() cleanup flow
net/mlx5e: RSS, Block changing channels number when RXFH is configured
net/mlx5: Correctly compare pkt reformat ids
net/mlx5: Properly link new fs rules into the tree
net/mlx5: offset comp irq index in name by one
net/mlx5: Register devlink first under devlink lock
net/mlx5: E-switch, store eswitch pointer before registering devlink_param
...

show more ...


# 47d8ac01 09-Apr-2024 Michal Luczaj <mhal@rbox.co>

af_unix: Fix garbage collector racing against connect()

Garbage collector does not take into account the risk of embryo getting
enqueued during the garbage collection. If such embryo has a peer that

af_unix: Fix garbage collector racing against connect()

Garbage collector does not take into account the risk of embryo getting
enqueued during the garbage collection. If such embryo has a peer that
carries SCM_RIGHTS, two consecutive passes of scan_children() may see a
different set of children. Leading to an incorrectly elevated inflight
count, and then a dangling pointer within the gc_inflight_list.

sockets are AF_UNIX/SOCK_STREAM
S is an unconnected socket
L is a listening in-flight socket bound to addr, not in fdtable
V's fd will be passed via sendmsg(), gets inflight count bumped

connect(S, addr) sendmsg(S, [V]); close(V) __unix_gc()
---------------- ------------------------- -----------

NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
// V count=1 inflight=0

NS = unix_peer(S)
skb2 = sock_alloc()
skb_queue_tail(NS, skb2[V])

// V became in-flight
// V count=2 inflight=1

close(V)

// V count=1 inflight=1
// GC candidate condition met

for u in gc_inflight_list:
if (total_refs == inflight_refs)
add u to gc_candidates

// gc_candidates={L, V}

for u in gc_candidates:
scan_children(u, dec_inflight)

// embryo (skb1) was not
// reachable from L yet, so V's
// inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
for u in gc_candidates:
if (u.inflight)
scan_children(u, inc_inflight_move_tail)

// V count=1 inflight=2 (!)

If there is a GC-candidate listening socket, lock/unlock its state. This
makes GC wait until the end of any ongoing connect() to that socket. After
flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
this point, unix_inflight() can not happen because unix_gc_lock is already
taken. Inflight graph remains unaffected.

Fixes: 1fd05ba5a2f2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

show more ...


# 100c8542 05-Apr-2024 Takashi Iwai <tiwai@suse.de>

Merge tag 'asoc-fix-v6.9-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.9

A relatively large set of fixes here, the biggest piece of it is a

Merge tag 'asoc-fix-v6.9-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.9

A relatively large set of fixes here, the biggest piece of it is a
series correcting some problems with the delay reporting for Intel SOF
cards but there's a bunch of other things. Everything here is driver
specific except for a fix in the core for an issue with sign extension
handling volume controls.

show more ...


# 36a1818f 25-Mar-2024 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Backmerging to get drm-misc-fixes to the state of v6.9-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# a2e7496b 13-Mar-2024 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Backmerging to sync before merging the patchset at [1].

[1] https://lore.kernel.org/all/cover.1709913674.git.jani.nikula@intel.com/

Signed-off-by: Thomas Zi

Merge drm/drm-fixes into drm-misc-fixes

Backmerging to sync before merging the patchset at [1].

[1] https://lore.kernel.org/all/cover.1709913674.git.jani.nikula@intel.com/

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>

show more ...


# 9f234784 21-Mar-2024 Takashi Iwai <tiwai@suse.de>

Merge tag 'asoc-fix-v6.9-merge-window' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.9

A bunch of fixes that came in during the merge window, pr

Merge tag 'asoc-fix-v6.9-merge-window' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.9

A bunch of fixes that came in during the merge window, probably the most
substantial thing is the DPCM locking fix for compressed audio which has
been lurking for a while.

show more ...


# 5bd249ae 18-Mar-2024 Mark Brown <broonie@kernel.org>

spi: Merge up v6.8 release

An i.MX fix depends on other fixes that were sent to v6.8.


12345678910>>...26