bc784972 | 08-May-2025 | Ben Skeggs <bskeggs@nvidia.com>
drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY
The current code using NV90F1_CTRL_CMD_VASPACE_COPY_SERVER_RESERVED_PDES not only requires changes to support the new page table layout used on Hopper/Blackwell GPUs, but is also broken in that it always mirrors the PDEs used for virtual address 0, rather than the area reserved for RM.
This works fine for the non-NVK case where the kernel has full control of the VMM layout and things end up in the right place, but NVK puts its kernel reserved area much higher in the address space.
Fixing the code to work at any VA is not enough, as some parts of RM want the reserved area at a specific location, and NVK would then hit other assertions in RM instead.
Fortunately, it appears that RM never needs to allocate anything within its reserved area for DRM clients, and the COPY_SERVER_RESERVED_PDES control call primarily serves to allow RM to locate the root page table when initialising a channel's instance block.
Flag VMMs allocated by the DRM driver as externally owned, and use NV0080_CTRL_CMD_DMA_SET_PAGE_DIRECTORY to inform RM of the root page table in a similar way to NVIDIA's UVM driver.
The COPY_SERVER_RESERVED_PDES paths are kept for the golden context image and gr scrubber channel, where RM needs the reserved area.
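For illustration, a minimal sketch of what issuing this control call could look like through nouveau's GSP RM-control helpers (nvkm_gsp_rm_ctrl_get()/nvkm_gsp_rm_ctrl_wr() exist in the tree and the parameter layout follows the OpenRM SDK header, but the wrapper function, the field choices, and the defaulted flags here are assumptions, not the actual patch):

/* Sketch: point RM at the root page directory of an externally-owned
 * VMM, in a similar way to NVIDIA's UVM driver. */
static int
example_set_page_directory(struct nvkm_gsp_object *device,
			   u64 pd_addr, u32 nr_entries, u32 hvaspace)
{
	NV0080_CTRL_DMA_SET_PAGE_DIRECTORY_PARAMS *ctrl;

	ctrl = nvkm_gsp_rm_ctrl_get(device,
				    NV0080_CTRL_CMD_DMA_SET_PAGE_DIRECTORY,
				    sizeof(*ctrl));
	if (IS_ERR(ctrl))
		return PTR_ERR(ctrl);

	ctrl->physAddress = pd_addr;    /* bus address of the root PD */
	ctrl->numEntries  = nr_entries; /* valid top-level PDEs */
	ctrl->hVASpace    = hvaspace;   /* RM handle of the client VA space */
	/* ctrl->flags: aperture etc.; left at defaults in this sketch */

	return nvkm_gsp_rm_ctrl_wr(device, ctrl);
}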
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
f9643364 | 14-Nov-2024 | Ben Skeggs <bskeggs@nvidia.com>
drm/nouveau/gsp: split device handling out on its own
Split handling of NV01_DEVICE (and other related objects) out into its own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
be33f499 | 14-Nov-2024 | Ben Skeggs <bskeggs@nvidia.com>
drm/nouveau/gsp: split rm alloc handling out on its own
Split base RM_ALLOC handling out into its own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
063d193f | 14-Nov-2024 | Ben Skeggs <bskeggs@nvidia.com>
drm/nouveau/gsp: split rm ctrl handling out on its own
Split base RM_CONTROL handling out into its own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
a738fa91 | 27-Feb-2025 | Zhi Wang <zhiw@nvidia.com>
drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL
Some GSP RPC commands need a new reply policy: the caller doesn't care about the message content, but wants to make sure a reply is received. To support this case, introduce a new reply policy.
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY is a large GSP RPC command, and the policy it actually requires is NVKM_GSP_RPC_REPLY_POLL. This can be observed from a dump of the GSP message queue: after the large RPC command is issued, GSP writes only an empty RPC header to the queue as the reply.
Without this change, the "receive the entire message" policy is used for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, which causes a timeout while waiting for the returned GSP message in the suspend/resume path.
Introduce the new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for the returned GSP message but discards it for the caller. Use the new policy NVKM_GSP_RPC_REPLY_POLL on the GSP RPC command NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY.
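For reference, a condensed sketch of the resulting call pattern (the enum members mirror the ones named in this series, but the nvkm_gsp_rpc_push() wrapper and its exact signature are assumptions to be checked against the tree):

/* Reply policies for GSP RPCs (sketch). */
enum nvkm_gsp_rpc_reply_policy {
	NVKM_GSP_RPC_REPLY_NOWAIT = 0, /* don't wait for a reply */
	NVKM_GSP_RPC_REPLY_RECV,       /* wait for and return the full message */
	NVKM_GSP_RPC_REPLY_POLL,       /* wait for a reply, then discard it */
};

/* ALLOC_MEMORY is a large RPC: GSP replies with only an empty RPC
 * header, so waiting to receive an entire message would time out. */
rpc = nvkm_gsp_rpc_push(gsp, rpc, NVKM_GSP_RPC_REPLY_POLL, sizeof(*rpc));
if (IS_ERR(rpc))
	return PTR_ERR(rpc);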
Fixes: 50f290053d79 ("drm/nouveau: support handling the return of large GSP message")
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-3-zhiw@nvidia.com
214c9539 | 30-Oct-2024 | Timur Tabi <ttabi@nvidia.com>
drm/nouveau: expose GSP-RM logging buffers via debugfs
The LOGINIT, LOGINTR, LOGRM, and LOGPMU buffers are circular buffers that have printf-like logs from GSP-RM and PMU encoded in them.
LOGINIT, LOGINTR, and LOGRM are allocated by Nouveau and their DMA addresses are passed to GSP-RM during initialization. The buffers are required for GSP-RM to initialize properly.
LOGPMU is also allocated by Nouveau, but its contents are updated when Nouveau receives an NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPC from GSP-RM. Nouveau then copies the RPC to the buffer.
The messages are encoded as an array of variable-length structures that contain the parameters to an NV_PRINTF call. The format string and parameter count are stored in a special ELF image that contains only logging strings. This image is not currently shipped with the Nvidia driver.
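To make the encoding concrete, here is a hypothetical record layout; the real definition lives in the GSP-RM/libos headers and may differ, and every name below is illustrative:

/* Hypothetical layout of one encoded NV_PRINTF call. */
struct example_log_record {
	u32 fmt;     /* identifies the format string in the logging ELF */
	u32 nr_args; /* number of parameters that follow */
	u64 args[];  /* the NV_PRINTF parameters, hence variable length */
};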
There are two methods to extract the logs.
OpenRM tries to load the logging ELF, and if present, parses the log buffers in real time and outputs the strings to the kernel console.
Alternatively, and this is the method used by this patch, the buffers can be exposed to user space, and a user-space tool (along with the logging ELF image) can parse the buffer and dump the logs.
This method has the advantage that it allows the buffers to be parsed even when the logging ELF file is not available to the user. However, it has the disadvantage that the debugfs entries need to remain until the driver is unloaded.
The buffers are exposed via debugfs. If GSP-RM fails to initialize, then Nouveau immediately shuts down the GSP interface. This would normally also deallocate the logging buffers, thereby preventing the user from capturing the debug logs.
To avoid this, introduce the keep-gsp-logging command line parameter. If specified, and if at least one logging buffer has content, then Nouveau will migrate these buffers into new debugfs entries that are retained until the driver unloads.
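As a rough sketch of the two pieces involved (module_param(), MODULE_PARM_DESC() and debugfs_create_blob() are standard kernel APIs, and the parameter name matches the commit text, but the helper and its surrounding structure are illustrative assumptions):

#include <linux/debugfs.h>
#include <linux/module.h>

/* Optional: retain the GSP-RM log buffers in debugfs after init failure. */
static bool keep_gsp_logging;
module_param(keep_gsp_logging, bool, 0444);
MODULE_PARM_DESC(keep_gsp_logging,
		 "Migrate the GSP-RM logging buffers to retained debugfs entries");

/* Expose one circular log buffer as a read-only debugfs blob. */
static void
example_expose_log(struct dentry *parent, const char *name,
		   struct debugfs_blob_wrapper *blob,
		   void *data, unsigned long size)
{
	blob->data = data; /* CPU mapping of the DMA-allocated log buffer */
	blob->size = size;
	debugfs_create_blob(name, 0444, parent, blob);
}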
An end-user can capture the logs using the following commands:
cp /sys/kernel/debug/nouveau/<path>/loginit loginit
cp /sys/kernel/debug/nouveau/<path>/logrm logrm
cp /sys/kernel/debug/nouveau/<path>/logintr logintr
cp /sys/kernel/debug/nouveau/<path>/logpmu logpmu
where (for a PCI device) <path> is the PCI bus address of the GPU (e.g. 0000:65:00.0).
Since LOGPMU is not needed for normal GSP-RM operation, it is only created if debugfs is available. Otherwise, the NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPCs are ignored.
A simple way to test the buffer migration feature is to have nvkm_gsp_init() return an error code.
Tested-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241030202952.694055-2-ttabi@nvidia.com