drm/dp_mst: Introduce new refcounting scheme for mstbs and ports

The current way of handling refcounting in the DP MST helpers is really confusing and probably just plain wrong, because it's been hacked up many times over the years without anyone actually going over the code and seeing if things could be simplified.

To the best of my understanding, the current scheme works like this: drm_dp_mst_port and drm_dp_mst_branch both have a single refcount. When this refcount hits 0 for either of the two, they're removed from the topology state, but not immediately freed. Both ports and branch devices will reinitialize their kref once it's hit 0 before actually destroying themselves. The intended purpose behind this is so that we can avoid problems like not being able to free a remote payload that might still be active, due to us having removed all of the port/branch device structures in memory, as per:

commit 91a25e463130 ("drm/dp/mst: deallocate payload on port destruction")

Which may have worked, but then it caused use-after-free errors. Being new to MST at the time, I tried fixing it:

commit 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()")

But that was broken: both drm_dp_mst_port and drm_dp_mst_branch structs are validated in almost every DP MST helper function. Simply put, this means we go through the topology and try to see if the given drm_dp_mst_branch or drm_dp_mst_port is still attached to something before trying to use it, in order to avoid dereferencing freed memory (something that has happened a LOT in the past with this library). Because of this, it doesn't actually matter whether or not we keep the ports and branches around in memory. That's not enough, because any function that validates the branches and ports passed to it will still reject them anyway, since they're no longer in the topology structure.
So, use-after-free errors were fixed but payload deallocation was completely broken.

Two years later, AMD informed me about this issue and I attempted to come up with a temporary fix, pending a long-overdue cleanup of this library:

commit c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")

But then that introduced use-after-free errors, so I quickly reverted it:

commit 9765635b3075 ("Revert "drm/dp_mst: Skip validating ports during destruction, just ref"")

And in the process, I learned that there is just no simple fix for this: the design is just broken. Unfortunately, the usage of these helpers is quite broken as well. Some drivers like i915 have been smart enough to avoid accessing any kind of information from MST port structures, but others like nouveau have assumed, understandably so, that drm_dp_mst_port structures are normal and can just be accessed at any time without worrying about use-after-free errors.

After a lot of discussion, Daniel Vetter and I came up with a better idea to replace all of this.

To summarize (this is documented far more in depth in the documentation this patch introduces), we make it so that drm_dp_mst_port and drm_dp_mst_branch structures have two different classes of refcounts: topology_kref and malloc_kref. topology_kref corresponds to the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in its given topology. Once it hits zero, any associated connectors are removed and the branch or port can no longer be validated. malloc_kref corresponds to the lifetime of the memory allocation for the actual structure, and will always be non-zero so long as the topology_kref is non-zero. This gives us a way to allow callers to hold onto port and branch device structures past their topology lifetime, and dramatically simplifies the lifetimes of both structures.
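The relationship between the two refcount classes can be illustrated with a minimal user-space sketch. This is not the drm_dp_mst code: every sketch_* name is hypothetical, C11 atomics stand in for the kernel's struct kref, and the compare-and-swap loop only approximates the behaviour of kref_get_unless_zero(). The assumption encoded here is that the topology reference owns one malloc reference, so the allocation can outlive the topology lifetime for as long as anyone else holds a malloc reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the kernel's struct kref. */
struct sketch_kref {
	atomic_int refcount;
};

static void sketch_kref_init(struct sketch_kref *k)
{
	atomic_init(&k->refcount, 1);
}

static void sketch_kref_get(struct sketch_kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Approximates kref_get_unless_zero(): only take a ref if count != 0. */
static bool sketch_kref_get_unless_zero(struct sketch_kref *k)
{
	int old = atomic_load(&k->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&k->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a ref; returns true if this call released the last one. */
static bool sketch_kref_put(struct sketch_kref *k)
{
	return atomic_fetch_sub(&k->refcount, 1) == 1;
}

/* Hypothetical port carrying both refcount classes. */
struct sketch_port {
	struct sketch_kref topology_kref; /* lifetime in the topology */
	struct sketch_kref malloc_kref;   /* lifetime of the allocation */
	bool in_topology;
};

static struct sketch_port *sketch_port_alloc(void)
{
	struct sketch_port *port = malloc(sizeof(*port));

	if (!port)
		return NULL;
	sketch_kref_init(&port->topology_kref);
	/* This initial malloc ref is owned by the topology ref. */
	sketch_kref_init(&port->malloc_kref);
	port->in_topology = true;
	return port;
}

static void sketch_port_get_malloc(struct sketch_port *port)
{
	sketch_kref_get(&port->malloc_kref);
}

static void sketch_port_put_malloc(struct sketch_port *port)
{
	if (sketch_kref_put(&port->malloc_kref))
		free(port);
}

/* Fails once topology_kref has hit 0, so it can never be revived. */
static bool sketch_port_try_get_topology(struct sketch_port *port)
{
	return sketch_kref_get_unless_zero(&port->topology_kref);
}

static void sketch_port_put_topology(struct sketch_port *port)
{
	if (sketch_kref_put(&port->topology_kref)) {
		/* Leave the topology, then drop the malloc ref it owned. */
		port->in_topology = false;
		sketch_port_put_malloc(port);
	}
}
```

In this sketch, a caller that took a malloc reference can still safely read the port after its last topology reference is dropped, while any later attempt to take a topology reference fails instead of bringing the port back to life.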
This also finally fixes the port deallocation problem, properly.

Additionally: since this now means that we can keep ports and branch devices allocated in memory for however long we need, we no longer need a significant amount of the port validation that we currently do.

Additionally, there is one last scenario that this fixes, which couldn't have been fixed properly beforehand:

- CPU1 unrefs port from topology (refcount 1->0)
- CPU2 refs port in topology (refcount 0->1)

Since we now can guarantee memory safety for ports and branches as-needed, we also can make our main reference counting functions fix this problem by using kref_get_unless_zero() internally so that topology refcounts can only ever reach 0 once.

Changes since v4:
* Change the kernel-figure summary for dp-mst/topology-figure-1.dot a bit - danvet
* Remove figure numbers - danvet

Changes since v3:
* Remove rebase detritus - danvet
* Split out purely style changes into separate patches - hwentlan

Changes since v2:
* Fix commit message - checkpatch
* s/)-1/) - 1/g - checkpatch

Changes since v1:
* Remove forward declarations - danvet
* Move "Branch device and port refcounting" section from documentation into kernel-doc comments - danvet
* Export internal topology lifetime functions into their own section in the kernel-docs - danvet
* s/@/&/g for struct references in kernel-docs - danvet
* Drop the "when they are no longer being used" bits from the kernel docs - danvet
* Modify diagrams to show how the DRM driver interacts with the topology and payloads - danvet
* Make suggested documentation changes for drm_dp_mst_topology_get_mstb() and drm_dp_mst_topology_get_port() - danvet
* Better explain the relationship between malloc refs and topology krefs in the documentation for drm_dp_mst_topology_get_port() and drm_dp_mst_topology_get_mstb() - danvet
* Fix "See also" in drm_dp_mst_topology_get_mstb() - danvet
* Rename drm_dp_mst_topology_get_(port|mstb)() -> drm_dp_mst_topology_try_get_(port|mstb)() and drm_dp_mst_topology_ref_(port|mstb)()
  -> drm_dp_mst_topology_get_(port|mstb)() - danvet
* s/should/must in docs - danvet
* WARN_ON(refcount == 0) in topology_get_(mstb|port) - danvet
* Move kdocs for mstb/port structs inline - danvet
* Split drm_dp_get_last_connected_port_and_mstb() changes into their own commit - danvet

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@redhat.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Juston Li <juston.li@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190111005343.17443-7-lyude@redhat.com