=========================================
I915 GuC Submission/DRM Scheduler Section
=========================================

Upstream plan
=============
For upstream the overall plan for landing GuC submission and integrating the
i915 with the DRM scheduler is:

* Merge basic GuC submission

  * Basic submission support for all gen11+ platforms
  * Not enabled by default on any current platforms but can be enabled via
    modparam enable_guc
  * Lots of rework will need to be done to integrate with the DRM scheduler,
    so there is no need to nitpick everything in the code; it just needs to
    be functional, have no major coding style / layering errors, and not
    regress execlists
  * Update IGTs / selftests as needed to work with GuC submission
  * Enable CI on supported platforms for a baseline
  * Rework CI as needed to get it healthy for GuC submission

* Merge new parallel submission uAPI

  * Bonding uAPI is completely incompatible with GuC submission, plus it has
    severe design issues in general, which is why we want to retire it no
    matter what
  * New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT context setup
    step which configures a slot with N contexts
  * After I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT a user can submit N
    batches to a slot in a single execbuf IOCTL and the batches run on the
    GPU in parallel
  * Initially only for GuC submission but execlists can be supported if
    needed

* Convert the i915 to use the DRM scheduler

  * GuC submission backend fully integrated with DRM scheduler

    * All request queues removed from backend (e.g. all backpressure
      handled in DRM scheduler)
    * Resets / cancels hook into DRM scheduler
    * Watchdog hooks into DRM scheduler
    * Lots of complexity of the GuC backend can be pulled out once
      integrated with DRM scheduler (e.g. state machine gets simpler,
      locking gets simpler, etc...)

  * Execlists backend will do the minimum required to hook in the DRM
    scheduler

    * Legacy interface
    * Features like timeslicing / preemption / virtual engines would be
      difficult to integrate with the DRM scheduler and these features are
      not required for GuC submission as the GuC does these things for us
    * Low ROI on fully integrating into DRM scheduler
    * Fully integrating would add lots of complexity to DRM scheduler

  * Port i915 priority inheritance / boosting feature to DRM scheduler

    * Used for i915 page flip, may be useful to other DRM drivers as well
    * Will be an optional feature in the DRM scheduler

  * Remove in-order completion assumptions from DRM scheduler

    * Even when using the DRM scheduler the backends will handle
      preemption, timeslicing, etc., so it is possible for jobs to finish
      out of order

  * Pull out i915 priority levels and use DRM priority levels
  * Optimize DRM scheduler as needed

TODOs for GuC submission upstream
=================================

* Need an update to GuC firmware / i915 to enable error state capture
* Open source tool to decode GuC logs
* Public GuC spec

New uAPI for basic GuC submission
=================================
No major changes are required to the uAPI for basic GuC submission. The only
change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
This attribute indicates that the 2k i915 user priority levels are statically
mapped into 3 levels as follows:

* -1k to -1: Low priority
* 0: Medium priority
* 1 to 1k: High priority

This is needed because the GuC only has 4 priority bands and the highest
priority band is reserved for the kernel, leaving 3 bands for user contexts.
This aligns with the DRM scheduler priority levels too.
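
The static mapping can be sketched in C as below. This is an illustration under stated assumptions, not the i915 or GuC implementation: the enum names and the helper map_user_prio() are hypothetical, and the user range is assumed to be -1023..1023 (the "-1k" and "1k" bounds above).

```c
/* Hypothetical band names modelling the text above; not the GuC ABI. */
enum sketch_prio_band {
	SKETCH_BAND_KERNEL = 0, /* highest band, reserved for the kernel */
	SKETCH_BAND_HIGH,
	SKETCH_BAND_MEDIUM,
	SKETCH_BAND_LOW,
	SKETCH_BAND_INVALID,    /* priority outside the assumed user range */
};

/* Assumed inclusive user priority range ("-1k" to "1k"). */
#define SKETCH_MIN_USER_PRIO (-1023)
#define SKETCH_MAX_USER_PRIO 1023

/*
 * Statically map an i915 user priority onto one of the three user bands.
 * SKETCH_BAND_KERNEL is never returned: user contexts cannot reach it.
 */
static enum sketch_prio_band map_user_prio(int prio)
{
	if (prio < SKETCH_MIN_USER_PRIO || prio > SKETCH_MAX_USER_PRIO)
		return SKETCH_BAND_INVALID;
	if (prio < 0)
		return SKETCH_BAND_LOW;     /* -1k .. -1 */
	if (prio == 0)
		return SKETCH_BAND_MEDIUM;  /* 0 */
	return SKETCH_BAND_HIGH;            /* 1 .. 1k */
}
```

Because the mapping is static, every priority within a band is treated the same by the GuC; for example, user priorities 1 and 1000 both land in the high band.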

Spec references:
----------------
* https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
* https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority
* https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t

New parallel submission uAPI
============================
The existing bonding uAPI is completely broken with GuC submission because
whether a submission is a single context submit or parallel submit isn't known
until execbuf time, when it is activated via I915_SUBMIT_FENCE. To submit
multiple contexts in parallel with the GuC, the context must be explicitly
registered with N contexts and all N contexts must be submitted in a single
command to the GuC. The GuC interfaces do not support dynamically changing
between N contexts as the bonding uAPI does, hence the need for a new parallel
submission interface. Also, the legacy bonding uAPI is quite confusing and not
intuitive at all. Furthermore, I915_SUBMIT_FENCE is by design a future fence,
so it is not really something we should continue to support.

The new parallel submission uAPI consists of 3 parts:

* Export engines logical mapping
* A 'set_parallel' extension to configure contexts for parallel
  submission
* Extend execbuf2 IOCTL to support submitting N BBs (batch buffers) in a
  single IOCTL

Export engines logical mapping
------------------------------
Certain use cases require BBs to be placed on engine instances in logical order
(e.g. split-frame on gen11+). The logical mapping of engine instances can change
based on fusing. Rather than making UMDs aware of fusing, simply expose the
logical mapping with the existing query engine info IOCTL. Also, the GuC
submission interface currently only supports submitting multiple contexts to
engines in logical order, which is a new requirement compared to execlists.
Lastly, all current platforms have at most 2 engine instances and the logical
order is the same as the uAPI order. This will change on platforms with more
than 2 engine instances.

A single bit will be added to drm_i915_engine_info.flags indicating that the
logical instance has been returned, and a new field,
drm_i915_engine_info.logical_instance, returns the logical instance.
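
As a rough sketch of how a UMD might consume the new field, the C helper below orders the engines of one class by logical instance so that BB k can be placed on the engine returned in slot k. The struct is a hypothetical, trimmed-down stand-in for drm_i915_engine_info holding only the fields discussed above, and the flag bit value is an assumption rather than the real uAPI definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real value is defined by the uAPI header. */
#define SKETCH_ENGINE_INFO_HAS_LOGICAL_INSTANCE (1u << 1)

/* Hypothetical, trimmed-down stand-in for drm_i915_engine_info. */
struct sketch_engine_info {
	uint16_t engine_instance;  /* uAPI instance, e.g. the n in VCS<n> */
	uint64_t flags;
	uint16_t logical_instance; /* valid only if the flag above is set */
};

/*
 * Fill out[] with uAPI instances ordered by logical instance, so that
 * out[k] is the engine whose logical instance is k. Returns 0 on success,
 * or -1 if any engine lacks a logical instance or the logical instances
 * are not a permutation of 0..count-1.
 */
static int sketch_logical_order(const struct sketch_engine_info *info,
				size_t count, uint16_t *out)
{
	uint8_t seen[64] = { 0 };

	if (count > sizeof(seen))
		return -1;

	for (size_t i = 0; i < count; i++) {
		uint16_t l = info[i].logical_instance;

		if (!(info[i].flags & SKETCH_ENGINE_INFO_HAS_LOGICAL_INSTANCE))
			return -1;
		if (l >= count || seen[l])
			return -1;
		seen[l] = 1;
		out[l] = info[i].engine_instance;
	}
	return 0;
}
```

For example, if fusing made VCS2 logical instance 0 and VCS0 logical instance 1, the helper would return {2, 0}, so the first BB of a parallel submission goes to VCS2 without the UMD knowing anything about fusing.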

A 'set_parallel' extension to configure contexts for parallel submission
------------------------------------------------------------------------
The 'set_parallel' extension configures a slot for parallel submission of N BBs.
It is a setup step that must be called before using any of the contexts. See
I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
similar existing examples. Once a slot is configured for parallel submission,
the execbuf2 IOCTL can be called to submit N BBs in a single IOCTL. Initially
only GuC submission is supported; execlists support can be added later if
needed.

Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
struct i915_context_engines_parallel_submit to the uAPI to implement this
extension.

.. c:namespace-push:: rfc

.. kernel-doc:: include/uapi/drm/i915_drm.h
        :functions: i915_context_engines_parallel_submit

.. c:namespace-pop::

Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
-------------------------------------------------------------------
Contexts that have been configured with the 'set_parallel' extension can only
submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects
in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST is
set. The number of BBs is implicit, based on the slot submitted to and how it
has been configured by 'set_parallel' or other extensions. No uAPI changes are
required to the execbuf2 IOCTL.
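
The BB selection rule can be illustrated with a small, hypothetical C helper that reports which entries of the drm_i915_gem_exec_object2 list are treated as BBs. The function name and the boolean standing in for I915_EXEC_BATCH_FIRST are illustrative assumptions; N is passed in explicitly because, as described above, the kernel derives it from the slot's configuration rather than from any execbuf2 field.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the first BB in an exec object
 * list of buffer_count entries, for a slot configured for num_batches BBs.
 * The BBs occupy indices [first, first + num_batches). batch_first models
 * the I915_EXEC_BATCH_FIRST flag. Returns -1 if the list is too short.
 */
static int sketch_first_batch_index(size_t buffer_count, size_t num_batches,
				    bool batch_first)
{
	if (num_batches == 0 || num_batches > buffer_count)
		return -1;
	/* BATCH_FIRST: the first N objects; otherwise the last N objects. */
	return batch_first ? 0 : (int)(buffer_count - num_batches);
}
```

With 5 exec objects and a slot configured for 2 BBs, the BBs are objects 3 and 4, or objects 0 and 1 when I915_EXEC_BATCH_FIRST is set.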