// SPDX-License-Identifier: MIT
/*
 * Copyright © 2014 Intel Corporation
 */

/**
 * DOC: Logical Rings, Logical Ring Contexts and Execlists
 *
 * Motivation:
 * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
 * These expanded contexts enable a number of new abilities, especially
 * "Execlists" (also implemented in this file).
 *
 * One of the main differences with the legacy HW contexts is that logical
 * ring contexts incorporate many more things into the context's state, like
 * PDPs or ringbuffer control registers:
 *
 * The reason why PDPs are included in the context is straightforward: as
 * PPGTTs (per-process GTTs) are actually per-context, having the PDPs
 * contained there means you don't need to do a ppgtt->switch_mm yourself;
 * instead, the GPU will do it for you on the context switch.
 *
 * But what about the ringbuffer control registers (head, tail, etc.)?
 * Shouldn't we just need a set of those per engine command streamer? This is
 * where the name "Logical Rings" starts to make sense: by virtualizing the
 * rings, the engine cs shifts to a new "ring buffer" with every context
 * switch. When you want to submit a workload to the GPU you: A) choose your
 * context, B) find its appropriate virtualized ring, C) write commands to it
 * and then, finally, D) tell the GPU to switch to that context.
 *
 * Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
 * to a context is via a context execution list, ergo "Execlists".
 *
 * LRC implementation:
 * Regarding the creation of contexts, we have:
 *
 * - One global default context.
 * - One local default context for each opened fd.
 * - One local extra context for each context create ioctl call.
 *
 * Now that ringbuffers belong per-context (and not per-engine, like before)
 * and that contexts are uniquely tied to a given engine (and not reusable,
 * like before) we need:
 *
 * - One ringbuffer per-engine inside each context.
 * - One backing object per-engine inside each context.
 *
 * The global default context starts its life with these new objects fully
 * allocated and populated. The local default context for each opened fd is
 * more complex, because we don't know at creation time which engine is going
 * to use them. To handle this, we have implemented a deferred creation of LR
 * contexts:
 *
 * The local context starts its life as a hollow or blank holder, that only
 * gets populated for a given engine once we receive an execbuffer. If later
 * on we receive another execbuffer ioctl for the same context but a different
 * engine, we allocate/populate a new ringbuffer and context backing object and
 * so on.
 *
 * Finally, regarding local contexts created using the ioctl call: as they are
 * only allowed with the render ring, we can allocate & populate them right
 * away (no need to defer anything, at least for now).
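 *
 * A sketch of the deferred-creation rule in illustrative pseudocode only
 * (the helper and field named here are invented, not the driver's actual
 * ones):
 *
 *	on_execbuffer(ctx, engine):
 *		if (!ctx->engine_state[engine])
 *			ctx->engine_state[engine] = alloc_ring_and_backing();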
 *
 * Execlists implementation:
 * Execlists are the new method by which, on gen8+ hardware, workloads are
 * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
 * This method works as follows:
 *
 * When a request is committed, its commands (the BB start and any leading or
 * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
 * for the appropriate context. The tail pointer in the hardware context is not
 * updated at this time, but instead, kept by the driver in the ringbuffer
 * structure. A structure representing this request is added to a request queue
 * for the appropriate engine: this structure contains a copy of the context's
 * tail after the request was written to the ring buffer and a pointer to the
 * context itself.
 *
 * If the engine's request queue was empty before the request was added, the
 * queue is processed immediately. Otherwise the queue will be processed during
 * a context switch interrupt. In any case, elements on the queue will get sent
 * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
 * globally unique 20-bit submission ID.
 *
 * When execution of a request completes, the GPU updates the context status
 * buffer with a context complete event and generates a context switch interrupt.
 * During the interrupt handling, the driver examines the events in the buffer:
 * for each context complete event, if the announced ID matches that on the head
 * of the request queue, then that request is retired and removed from the queue.
 *
 * After processing, if any requests were retired and the queue is not empty
 * then a new execution list can be submitted. The two requests at the front of
 * the queue are next to be submitted but since a context may not occur twice in
 * an execution list, if subsequent requests have the same ID as the first then
 * the two requests must be combined. This is done simply by discarding requests
 * at the head of the queue until either only one request is left (in which case
 * we use a NULL second context) or the first two requests have unique IDs.
 *
 * By always executing the first two requests in the queue the driver ensures
 * that the GPU is kept as busy as possible. In the case where a single context
 * completes but a second context is still executing, the request for this second
 * context will be at the head of the queue when we remove the first one. This
 * request will then be resubmitted along with a new request for a different context,
 * which will cause the hardware to continue executing the second request and queue
 * the new request (the GPU detects the condition of a context getting preempted
 * with the same context and optimizes the context switch flow by not doing
 * preemption, but just sampling the new tail pointer).
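 *
 * A sketch of that pairing rule in illustrative pseudocode only (the queue
 * helpers named here are invented, not the driver's actual API):
 *
 *	while (queue_depth() > 1 && same_context(queue[0], queue[1]))
 *		queue_pop();	(queue[1]'s tail subsumes queue[0]'s work)
 *	elsp[0] = queue[0];
 *	elsp[1] = queue_depth() > 1 ? queue[1] : NULL;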
 *
 */

#include <linux/interrupt.h>
#include <linux/string_helpers.h>

#include <drm/drm_print.h>

#include "gen8_engine_cs.h"
#include "i915_drv.h"
#include "i915_list_util.h"
#include "i915_reg.h"
#include "i915_timer_util.h"
#include "i915_trace.h"
#include "i915_vgpu.h"
#include "i915_wait_util.h"
#include "intel_breadcrumbs.h"
#include "intel_context.h"
#include "intel_engine_heartbeat.h"
#include "intel_engine_pm.h"
#include "intel_engine_regs.h"
#include "intel_engine_stats.h"
#include "intel_execlists_submission.h"
#include "intel_gt.h"
#include "intel_gt_irq.h"
#include "intel_gt_pm.h"
#include "intel_gt_regs.h"
#include "intel_gt_requests.h"
#include "intel_lrc.h"
#include "intel_lrc_reg.h"
#include "intel_mocs.h"
#include "intel_reset.h"
#include "intel_ring.h"
#include "intel_workarounds.h"
#include "shmem_utils.h"

#define RING_EXECLIST_QFULL		(1 << 0x2)
#define RING_EXECLIST1_VALID		(1 << 0x3)
#define RING_EXECLIST0_VALID		(1 << 0x4)
#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
#define RING_EXECLIST0_ACTIVE		(1 << 0x12)

#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)

#define GEN8_CTX_STATUS_COMPLETED_MASK \
	 (GEN8_CTX_STATUS_COMPLETE | GEN8_CTX_STATUS_PREEMPTED)

#define GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE	(0x1) /* lower csb dword */
#define GEN12_CTX_SWITCH_DETAIL(csb_dw)	((csb_dw) & 0xF) /* upper csb dword */
#define GEN12_CSB_SW_CTX_ID_MASK		GENMASK(25, 15)
#define GEN12_IDLE_CTX_ID		0x7FF
#define GEN12_CSB_CTX_VALID(csb_dw) \
	(FIELD_GET(GEN12_CSB_SW_CTX_ID_MASK, csb_dw) != GEN12_IDLE_CTX_ID)

#define XEHP_CTX_STATUS_SWITCHED_TO_NEW_QUEUE	BIT(1) /* upper csb dword */
#define XEHP_CSB_SW_CTX_ID_MASK			GENMASK(31, 10)
#define XEHP_IDLE_CTX_ID			0xFFFF
#define XEHP_CSB_CTX_VALID(csb_dw) \
	(FIELD_GET(XEHP_CSB_SW_CTX_ID_MASK, csb_dw) != XEHP_IDLE_CTX_ID)

/* Typical size of the average request (2 pipecontrols and a MI_BB) */
#define EXECLISTS_REQUEST_SIZE 64 /* bytes */
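/*
 * Worked example for the validity macros above (invented CSB value): for a
 * Gen12 dword of 0x03ff8000, FIELD_GET(GEN12_CSB_SW_CTX_ID_MASK, dw)
 * extracts bits 25:15 == 0x7ff == GEN12_IDLE_CTX_ID, so GEN12_CSB_CTX_VALID()
 * is false: the event describes a switch to/from idle, not a real context.
 */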
struct virtual_engine {
	struct intel_engine_cs base;
	struct intel_context context;
	struct rcu_work rcu;

	/*
	 * We allow only a single request through the virtual engine at a time
	 * (each request in the timeline waits for the completion fence of
	 * the previous before being submitted). By restricting ourselves to
	 * only submitting a single request, each request is placed on to a
	 * physical engine to maximise load spreading (by virtue of the late
	 * greedy scheduling -- each real engine takes the next available
	 * request upon idling).
	 */
	struct i915_request *request;

	/*
	 * We keep an rbtree of available virtual engines inside each physical
	 * engine, sorted by priority. Here we preallocate the nodes we need
	 * for the virtual engine, indexed by physical_engine->id.
	 */
	struct ve_node {
		struct rb_node rb;
		int prio;
	} nodes[I915_NUM_ENGINES];

	/* And finally, which physical engines this virtual engine maps onto. */
	unsigned int num_siblings;
	struct intel_engine_cs *siblings[];
};

static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
{
	GEM_BUG_ON(!intel_engine_is_virtual(engine));
	return container_of(engine, struct virtual_engine, base);
}

static struct intel_context *
execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count,
			 unsigned long flags);

static struct i915_request *
__active_request(const struct intel_timeline * const tl,
		 struct i915_request *rq,
		 int error)
{
	struct i915_request *active = rq;

	list_for_each_entry_from_reverse(rq, &tl->requests, link) {
		if (__i915_request_is_complete(rq))
			break;

		if (error) {
			i915_request_set_error_once(rq, error);
			__i915_request_skip(rq);
		}
		active = rq;
	}

	return active;
}

static struct i915_request *
active_request(const struct intel_timeline * const tl, struct i915_request *rq)
{
	return __active_request(tl, rq, 0);
}

static void ring_set_paused(const struct intel_engine_cs *engine, int state)
{
	/*
	 * We inspect HWS_PREEMPT with a semaphore inside
	 * engine->emit_fini_breadcrumb. If the dword is true,
	 * the ring is paused as the semaphore will busywait
	 * until the dword is false.
	 */
	engine->status_page.addr[I915_GEM_HWS_PREEMPT] = state;
	if (state)
		wmb();
}

static struct i915_priolist *to_priolist(struct rb_node *rb)
{
	return rb_entry(rb, struct i915_priolist, node);
}

static int rq_prio(const struct i915_request *rq)
{
	return READ_ONCE(rq->sched.attr.priority);
}

static int effective_prio(const struct i915_request *rq)
{
	int prio = rq_prio(rq);

	/*
	 * If this request is special and must not be interrupted at any
	 * cost, so be it. Note we are only checking the most recent request
	 * in the context and so may be masking an earlier vip request. It
	 * is hoped that under the conditions where nopreempt is used, this
	 * will not matter (i.e. all requests to that context will be
	 * nopreempt for as long as desired).
	 */
	if (i915_request_has_nopreempt(rq))
		prio = I915_PRIORITY_UNPREEMPTABLE;

	return prio;
}

static int queue_prio(const struct i915_sched_engine *sched_engine)
{
	struct rb_node *rb;

	rb = rb_first_cached(&sched_engine->queue);
	if (!rb)
		return INT_MIN;

	return to_priolist(rb)->priority;
}

static int virtual_prio(const struct intel_engine_execlists *el)
{
	struct rb_node *rb = rb_first_cached(&el->virtual);

	return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
}

static bool need_preempt(const struct intel_engine_cs *engine,
			 const struct i915_request *rq)
{
	int last_prio;

	if (!intel_engine_has_semaphores(engine))
		return false;

	/*
	 * Check if the current priority hint merits a preemption attempt.
	 *
	 * We record the highest value priority we saw during rescheduling
	 * prior to this dequeue, therefore we know that if it is strictly
	 * less than the current tail of ELSP[0], we do not need to force
	 * a preempt-to-idle cycle.
	 *
	 * However, the priority hint is a mere hint that we may need to
	 * preempt. If that hint is stale or we may be trying to preempt
	 * ourselves, ignore the request.
	 *
	 * More naturally we would write
	 *	prio >= max(0, last);
	 * except that we wish to prevent triggering preemption at the same
	 * priority level: the task that is running should remain running
	 * to preserve FIFO ordering of dependencies.
	 */
	last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1);
	if (engine->sched_engine->queue_priority_hint <= last_prio)
		return false;

	/*
	 * Check against the first request in ELSP[1], it will, thanks to the
	 * power of PI, be the highest priority of that context.
	 */
	if (!list_is_last(&rq->sched.link, &engine->sched_engine->requests) &&
	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
		return true;

	/*
	 * If the inflight context did not trigger the preemption, then maybe
	 * it was the set of queued requests? Pick the highest priority in
	 * the queue (the first active priolist) and see if it deserves to be
	 * running instead of ELSP[0].
	 *
	 * The highest priority request in the queue cannot be either
	 * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same
	 * context, its priority would not exceed ELSP[0] aka last_prio.
	 */
	return max(virtual_prio(&engine->execlists),
		   queue_prio(engine->sched_engine)) > last_prio;
}
420 */ 421 if (!IS_ENABLED(CONFIG_DRM_I915_GVT)) 422 return; 423 424 atomic_notifier_call_chain(&rq->engine->context_status_notifier, 425 status, rq); 426 } 427 428 static void reset_active(struct i915_request *rq, 429 struct intel_engine_cs *engine) 430 { 431 struct intel_context * const ce = rq->context; 432 u32 head; 433 434 /* 435 * The executing context has been cancelled. We want to prevent 436 * further execution along this context and propagate the error on 437 * to anything depending on its results. 438 * 439 * In __i915_request_submit(), we apply the -EIO and remove the 440 * requests' payloads for any banned requests. But first, we must 441 * rewind the context back to the start of the incomplete request so 442 * that we do not jump back into the middle of the batch. 443 * 444 * We preserve the breadcrumbs and semaphores of the incomplete 445 * requests so that inter-timeline dependencies (i.e other timelines) 446 * remain correctly ordered. And we defer to __i915_request_submit() 447 * so that all asynchronous waits are correctly handled. 448 */ 449 ENGINE_TRACE(engine, "{ reset rq=%llx:%lld }\n", 450 rq->fence.context, rq->fence.seqno); 451 452 /* On resubmission of the active request, payload will be scrubbed */ 453 if (__i915_request_is_complete(rq)) 454 head = rq->tail; 455 else 456 head = __active_request(ce->timeline, rq, -EIO)->head; 457 head = intel_ring_wrap(ce->ring, head); 458 459 /* Scrub the context image to prevent replaying the previous batch */ 460 lrc_init_regs(ce, engine, true); 461 462 /* We've switched away, so this should be a no-op, but intent matters */ 463 ce->lrc.lrca = lrc_update_regs(ce, engine, head); 464 } 465 466 static bool bad_request(const struct i915_request *rq) 467 { 468 return rq->fence.error && i915_request_started(rq); 469 } 470 471 static struct intel_engine_cs * 472 __execlists_schedule_in(struct i915_request *rq) 473 { 474 struct intel_engine_cs * const engine = rq->engine; 475 struct intel_context * const ce = rq->context; 476 477 intel_context_get(ce); 478 479 if (unlikely(intel_context_is_closed(ce) && 480 !intel_engine_has_heartbeat(engine))) 481 intel_context_set_exiting(ce); 482 483 if (unlikely(!intel_context_is_schedulable(ce) || bad_request(rq))) 484 reset_active(rq, engine); 485 486 if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 487 lrc_check_regs(ce, engine, "before"); 488 489 if (ce->tag) { 490 /* Use a fixed tag for OA and friends */ 491 GEM_BUG_ON(ce->tag <= BITS_PER_LONG); 492 ce->lrc.ccid = ce->tag; 493 } else if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) { 494 /* We don't need a strict matching tag, just different values */ 495 unsigned int tag = ffs(READ_ONCE(engine->context_tag)); 496 497 GEM_BUG_ON(tag == 0 || tag >= BITS_PER_LONG); 498 clear_bit(tag - 1, &engine->context_tag); 499 ce->lrc.ccid = tag << (XEHP_SW_CTX_ID_SHIFT - 32); 500 501 BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID); 502 503 } else { 504 /* We don't need a strict matching tag, just different values */ 505 unsigned int tag = __ffs(engine->context_tag); 506 507 GEM_BUG_ON(tag >= BITS_PER_LONG); 508 __clear_bit(tag, &engine->context_tag); 509 ce->lrc.ccid = (1 + tag) << (GEN11_SW_CTX_ID_SHIFT - 32); 510 511 BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID); 512 } 513 514 ce->lrc.ccid |= engine->execlists.ccid; 515 516 __intel_gt_pm_get(engine->gt); 517 if (engine->fw_domain && !engine->fw_active++) 518 intel_uncore_forcewake_get(engine->uncore, engine->fw_domain); 519 execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN); 520 
static void execlists_schedule_in(struct i915_request *rq, int idx)
{
	struct intel_context * const ce = rq->context;
	struct intel_engine_cs *old;

	GEM_BUG_ON(!intel_engine_pm_is_awake(rq->engine));
	trace_i915_request_in(rq, idx);

	old = ce->inflight;
	if (!old)
		old = __execlists_schedule_in(rq);
	WRITE_ONCE(ce->inflight, ptr_inc(old));

	GEM_BUG_ON(intel_context_inflight(ce) != rq->engine);
}

static void
resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve)
{
	struct intel_engine_cs *engine = rq->engine;

	spin_lock_irq(&engine->sched_engine->lock);

	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
	WRITE_ONCE(rq->engine, &ve->base);
	ve->base.submit_request(rq);

	spin_unlock_irq(&engine->sched_engine->lock);
}

static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
{
	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
	struct intel_engine_cs *engine = rq->engine;

	/*
	 * After this point, the rq may be transferred to a new sibling, so
	 * before we clear ce->inflight make sure that the context has been
	 * removed from the b->signalers and furthermore we need to make sure
	 * that the concurrent iterator in signal_irq_work is no longer
	 * following ce->signal_link.
	 */
	if (!list_empty(&ce->signals))
		intel_context_remove_breadcrumbs(ce, engine->breadcrumbs);

	/*
	 * This engine is now too busy to run this virtual request, so
	 * see if we can find an alternative engine for it to execute on.
	 * Once a request has become bonded to this engine, we treat it the
	 * same as any other native request.
	 */
	if (i915_request_in_priority_queue(rq) &&
	    rq->execution_mask != engine->mask)
		resubmit_virtual_request(rq, ve);

	if (READ_ONCE(ve->request))
		tasklet_hi_schedule(&ve->base.sched_engine->tasklet);
}

static void __execlists_schedule_out(struct i915_request * const rq,
				     struct intel_context * const ce)
{
	struct intel_engine_cs * const engine = rq->engine;
	unsigned int ccid;

	/*
	 * NB process_csb() is not under the engine->sched_engine->lock and hence
	 * schedule_out can race with schedule_in meaning that we should
	 * refrain from doing non-trivial work here.
	 */

	CE_TRACE(ce, "schedule-out, ccid:%x\n", ce->lrc.ccid);
	GEM_BUG_ON(ce->inflight != engine);

	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
		lrc_check_regs(ce, engine, "after");

	/*
	 * If we have just completed this context, the engine may now be
	 * idle and we want to re-enter powersaving.
	 */
	if (intel_timeline_is_last(ce->timeline, rq) &&
	    __i915_request_is_complete(rq))
		intel_engine_add_retire(engine, ce->timeline);

	ccid = ce->lrc.ccid;
	if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) {
		ccid >>= XEHP_SW_CTX_ID_SHIFT - 32;
		ccid &= XEHP_MAX_CONTEXT_HW_ID;
	} else {
		ccid >>= GEN11_SW_CTX_ID_SHIFT - 32;
		ccid &= GEN12_MAX_CONTEXT_HW_ID;
	}

	if (ccid < BITS_PER_LONG) {
		GEM_BUG_ON(ccid == 0);
		GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag));
		__set_bit(ccid - 1, &engine->context_tag);
	}
	intel_engine_context_out(engine);
	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
	if (engine->fw_domain && !--engine->fw_active)
		intel_uncore_forcewake_put(engine->uncore, engine->fw_domain);
	intel_gt_pm_put_async_untracked(engine->gt);

	/*
	 * If this is part of a virtual engine, its next request may
	 * have been blocked waiting for access to the active context.
	 * We have to kick all the siblings again in case we need to
	 * switch (e.g. the next request is not runnable on this
	 * engine). Hopefully, we will already have submitted the next
	 * request before the tasklet runs and do not need to rebuild
	 * each virtual tree and kick everyone again.
	 */
	if (ce->engine != engine)
		kick_siblings(rq, ce);

	WRITE_ONCE(ce->inflight, NULL);
	intel_context_put(ce);
}

static inline void execlists_schedule_out(struct i915_request *rq)
{
	struct intel_context * const ce = rq->context;

	trace_i915_request_out(rq);

	GEM_BUG_ON(!ce->inflight);
	ce->inflight = ptr_dec(ce->inflight);
	if (!__intel_context_inflight_count(ce->inflight))
		__execlists_schedule_out(rq, ce);

	i915_request_put(rq);
}

static u32 map_i915_prio_to_lrc_desc_prio(int prio)
{
	if (prio > I915_PRIORITY_NORMAL)
		return GEN12_CTX_PRIORITY_HIGH;
	else if (prio < I915_PRIORITY_NORMAL)
		return GEN12_CTX_PRIORITY_LOW;
	else
		return GEN12_CTX_PRIORITY_NORMAL;
}

static u64 execlists_update_context(struct i915_request *rq)
{
	struct intel_context *ce = rq->context;
	u64 desc;
	u32 tail, prev;

	desc = ce->lrc.desc;
	if (rq->engine->flags & I915_ENGINE_HAS_EU_PRIORITY)
		desc |= map_i915_prio_to_lrc_desc_prio(rq_prio(rq));

	/*
	 * WaIdleLiteRestore:bdw,skl
	 *
	 * We should never submit the context with the same RING_TAIL twice
	 * just in case we submit an empty ring, which confuses the HW.
	 *
	 * We append a couple of NOOPs (gen8_emit_wa_tail) after the end of
	 * the normal request to be able to always advance the RING_TAIL on
	 * subsequent resubmissions (for lite restore). Should that fail us,
	 * and we try and submit the same tail again, force the context
	 * reload.
	 *
	 * If we need to return to a preempted context, we need to skip the
	 * lite-restore and force it to reload the RING_TAIL. Otherwise, the
	 * HW has a tendency to ignore us rewinding the TAIL to the end of
	 * an earlier request.
	 */
	GEM_BUG_ON(ce->lrc_reg_state[CTX_RING_TAIL] != rq->ring->tail);
	prev = rq->ring->tail;
	tail = intel_ring_set_tail(rq->ring, rq->tail);
	if (unlikely(intel_ring_direction(rq->ring, tail, prev) <= 0))
		desc |= CTX_DESC_FORCE_RESTORE;
	ce->lrc_reg_state[CTX_RING_TAIL] = tail;
	rq->tail = rq->wa_tail;

	/*
	 * Make sure the context image is complete before we submit it to HW.
	 *
	 * Ostensibly, writes (including the WCB) should be flushed prior to
	 * an uncached write such as our mmio register access, but the
	 * empirical evidence (esp. on Braswell) suggests that the WC write
	 * into memory may not be visible to the HW prior to the completion
	 * of the UC register write and that we may begin execution from the
	 * context before its image is complete, leading to invalid PD
	 * chasing.
	 */
	wmb();

	ce->lrc.desc &= ~CTX_DESC_FORCE_RESTORE;
	return desc;
}

static void write_desc(struct intel_engine_execlists *execlists, u64 desc, u32 port)
{
	if (execlists->ctrl_reg) {
		writel(lower_32_bits(desc), execlists->submit_reg + port * 2);
		writel(upper_32_bits(desc), execlists->submit_reg + port * 2 + 1);
	} else {
		writel(upper_32_bits(desc), execlists->submit_reg);
		writel(lower_32_bits(desc), execlists->submit_reg);
	}
}
static __maybe_unused char *
dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq)
{
	if (!rq)
		return "";

	snprintf(buf, buflen, "%sccid:%x %llx:%lld%s prio %d",
		 prefix,
		 rq->context->lrc.ccid,
		 rq->fence.context, rq->fence.seqno,
		 __i915_request_is_complete(rq) ? "!" :
		 __i915_request_has_started(rq) ? "*" :
		 "",
		 rq_prio(rq));

	return buf;
}

static __maybe_unused noinline void
trace_ports(const struct intel_engine_execlists *execlists,
	    const char *msg,
	    struct i915_request * const *ports)
{
	const struct intel_engine_cs *engine =
		container_of(execlists, typeof(*engine), execlists);
	char __maybe_unused p0[40], p1[40];

	if (!ports[0])
		return;

	ENGINE_TRACE(engine, "%s { %s%s }\n", msg,
		     dump_port(p0, sizeof(p0), "", ports[0]),
		     dump_port(p1, sizeof(p1), ", ", ports[1]));
}

static bool
reset_in_progress(const struct intel_engine_cs *engine)
{
	return unlikely(!__tasklet_is_enabled(&engine->sched_engine->tasklet));
}

static __maybe_unused noinline bool
assert_pending_valid(const struct intel_engine_execlists *execlists,
		     const char *msg)
{
	struct intel_engine_cs *engine =
		container_of(execlists, typeof(*engine), execlists);
	struct i915_request * const *port, *rq, *prev = NULL;
	struct intel_context *ce = NULL;
	u32 ccid = -1;

	trace_ports(execlists, msg, execlists->pending);

	/* We may be messing around with the lists during reset, lalala */
	if (reset_in_progress(engine))
		return true;

	if (!execlists->pending[0]) {
		GEM_TRACE_ERR("%s: Nothing pending for promotion!\n",
			      engine->name);
		return false;
	}

	if (execlists->pending[execlists_num_ports(execlists)]) {
		GEM_TRACE_ERR("%s: Excess pending[%d] for promotion!\n",
			      engine->name, execlists_num_ports(execlists));
		return false;
	}

	for (port = execlists->pending; (rq = *port); port++) {
		unsigned long flags;
		bool ok = true;

		GEM_BUG_ON(!kref_read(&rq->fence.refcount));
		GEM_BUG_ON(!i915_request_is_active(rq));

		if (ce == rq->context) {
			GEM_TRACE_ERR("%s: Dup context:%llx in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			return false;
		}
		ce = rq->context;

		if (ccid == ce->lrc.ccid) {
			GEM_TRACE_ERR("%s: Dup ccid:%x context:%llx in pending[%zd]\n",
				      engine->name,
				      ccid, ce->timeline->fence_context,
				      port - execlists->pending);
			return false;
		}
		ccid = ce->lrc.ccid;

		/*
		 * Sentinels are supposed to be the last request so they flush
		 * the current execution off the HW. Check that they are the
		 * only request in the pending submission.
		 *
		 * NB: Due to the async nature of preempt-to-busy and request
		 * cancellation we need to handle the case where a request
		 * becomes a sentinel in parallel to CSB processing.
		 */
		if (prev && i915_request_has_sentinel(prev) &&
		    !READ_ONCE(prev->fence.error)) {
			GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			return false;
		}
		prev = rq;

		/*
		 * We want virtual requests to only be in the first slot so
		 * that they are never stuck behind a hog and can be immediately
		 * transferred onto the next idle engine.
		 */
		if (rq->execution_mask != engine->mask &&
		    port != execlists->pending) {
			GEM_TRACE_ERR("%s: virtual engine:%llx not in prime position[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			return false;
		}

		/* Hold tightly onto the lock to prevent concurrent retires! */
		if (!spin_trylock_irqsave(&rq->lock, flags))
			continue;

		if (__i915_request_is_complete(rq))
			goto unlock;

		if (i915_active_is_idle(&ce->active) &&
		    !intel_context_is_barrier(ce)) {
			GEM_TRACE_ERR("%s: Inactive context:%llx in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			ok = false;
			goto unlock;
		}

		if (!i915_vma_is_pinned(ce->state)) {
			GEM_TRACE_ERR("%s: Unpinned context:%llx in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			ok = false;
			goto unlock;
		}

		if (!i915_vma_is_pinned(ce->ring->vma)) {
			GEM_TRACE_ERR("%s: Unpinned ring:%llx in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			ok = false;
			goto unlock;
		}

unlock:
		spin_unlock_irqrestore(&rq->lock, flags);
		if (!ok)
			return false;
	}

	return ce;
}
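/*
 * Illustrative pending[] shapes for the checks above (A and B are invented
 * contexts): { A, B, NULL } is a valid promotion; { A, A, NULL } trips the
 * duplicate-context check, a request following an unfaulted sentinel trips
 * the sentinel check, and a virtual request anywhere but pending[0] trips
 * the prime-position check.
 */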
static void execlists_submit_ports(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists *execlists = &engine->execlists;
	unsigned int n;

	GEM_BUG_ON(!assert_pending_valid(execlists, "submit"));

	/*
	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
	 * not be relinquished until the device is idle (see
	 * i915_gem_idle_work_handler()). As a precaution, we make sure
	 * that all ELSP are drained i.e. we have processed the CSB,
	 * before allowing ourselves to idle and calling intel_runtime_pm_put().
	 */
	GEM_BUG_ON(!intel_engine_pm_is_awake(engine));

	/*
	 * ELSQ note: the submit queue is not cleared after being submitted
	 * to the HW so we need to make sure we always clean it up. This is
	 * currently ensured by the fact that we always write the same number
	 * of elsq entries, keep this in mind before changing the loop below.
	 */
	for (n = execlists_num_ports(execlists); n--; ) {
		struct i915_request *rq = execlists->pending[n];

		write_desc(execlists,
			   rq ? execlists_update_context(rq) : 0,
			   n);
	}

	/* we need to manually load the submit queue */
	if (execlists->ctrl_reg)
		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
}

static bool ctx_single_port_submission(const struct intel_context *ce)
{
	return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
		intel_context_force_single_submission(ce));
}

static bool can_merge_ctx(const struct intel_context *prev,
			  const struct intel_context *next)
{
	if (prev != next)
		return false;

	if (ctx_single_port_submission(prev))
		return false;

	return true;
}

static unsigned long i915_request_flags(const struct i915_request *rq)
{
	return READ_ONCE(rq->fence.flags);
}

static bool can_merge_rq(const struct i915_request *prev,
			 const struct i915_request *next)
{
	GEM_BUG_ON(prev == next);
	GEM_BUG_ON(!assert_priority_queue(prev, next));

	/*
	 * We do not submit known completed requests. Therefore if the next
	 * request is already completed, we can pretend to merge it in
	 * with the previous context (and we will skip updating the ELSP
	 * and tracking). Thus hopefully keeping the ELSP full with active
	 * contexts, despite the best efforts of preempt-to-busy to confuse
	 * us.
	 */
	if (__i915_request_is_complete(next))
		return true;

	if (unlikely((i915_request_flags(prev) | i915_request_flags(next)) &
		     (BIT(I915_FENCE_FLAG_NOPREEMPT) |
		      BIT(I915_FENCE_FLAG_SENTINEL))))
		return false;

	if (!can_merge_ctx(prev->context, next->context))
		return false;

	GEM_BUG_ON(i915_seqno_passed(prev->fence.seqno, next->fence.seqno));
	return true;
}

static bool virtual_matches(const struct virtual_engine *ve,
			    const struct i915_request *rq,
			    const struct intel_engine_cs *engine)
{
	const struct intel_engine_cs *inflight;

	if (!rq)
		return false;

	if (!(rq->execution_mask & engine->mask)) /* We peeked too soon! */
		return false;

	/*
	 * We track when the HW has completed saving the context image
	 * (i.e. when we have seen the final CS event switching out of
	 * the context) and must not overwrite the context image before
	 * then. This restricts us to only using the active engine
	 * while the previous virtualized request is inflight (so
	 * we reuse the register offsets). This is a very small
	 * hysteresis on the greedy selection algorithm.
	 */
	inflight = intel_context_inflight(&ve->context);
	if (inflight && inflight != engine)
		return false;

	return true;
}

static struct virtual_engine *
first_virtual_engine(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists *el = &engine->execlists;
	struct rb_node *rb = rb_first_cached(&el->virtual);

	while (rb) {
		struct virtual_engine *ve =
			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
		struct i915_request *rq = READ_ONCE(ve->request);

		/* lazily clean up after another engine handled rq */
		if (!rq || !virtual_matches(ve, rq, engine)) {
			rb_erase_cached(rb, &el->virtual);
			RB_CLEAR_NODE(rb);
			rb = rb_first_cached(&el->virtual);
			continue;
		}

		return ve;
	}

	return NULL;
}

static void virtual_xfer_context(struct virtual_engine *ve,
				 struct intel_engine_cs *engine)
{
	unsigned int n;

	if (likely(engine == ve->siblings[0]))
		return;

	GEM_BUG_ON(READ_ONCE(ve->context.inflight));
	if (!intel_engine_has_relative_mmio(engine))
		lrc_update_offsets(&ve->context, engine);

	/*
	 * Move the bound engine to the top of the list for
	 * future execution. We then kick this tasklet first
	 * before checking others, so that we preferentially
	 * reuse this set of bound registers.
	 */
	for (n = 1; n < ve->num_siblings; n++) {
		if (ve->siblings[n] == engine) {
			swap(ve->siblings[n], ve->siblings[0]);
			break;
		}
	}
}

static void defer_request(struct i915_request *rq, struct list_head * const pl)
{
	LIST_HEAD(list);

	/*
	 * We want to move the interrupted request to the back of
	 * the round-robin list (i.e. its priority level), but
	 * in doing so, we must then move all requests that were in
	 * flight and were waiting for the interrupted request to
	 * be run after it again.
	 */
	do {
		struct i915_dependency *p;

		GEM_BUG_ON(i915_request_is_active(rq));
		list_move_tail(&rq->sched.link, pl);

		for_each_waiter(p, rq) {
			struct i915_request *w =
				container_of(p->waiter, typeof(*w), sched);

			if (p->flags & I915_DEPENDENCY_WEAK)
				continue;

			/* Leave semaphores spinning on the other engines */
			if (w->engine != rq->engine)
				continue;

			/* No waiter should start before its signaler */
			GEM_BUG_ON(i915_request_has_initial_breadcrumb(w) &&
				   __i915_request_has_started(w) &&
				   !__i915_request_is_complete(rq));

			if (!i915_request_is_ready(w))
				continue;

			if (rq_prio(w) < rq_prio(rq))
				continue;

			GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
			GEM_BUG_ON(i915_request_is_active(w));
			list_move_tail(&w->sched.link, &list);
		}

		rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
	} while (rq);
}

static void defer_active(struct intel_engine_cs *engine)
{
	struct i915_request *rq;

	rq = __unwind_incomplete_requests(engine);
	if (!rq)
		return;

	defer_request(rq, i915_sched_lookup_priolist(engine->sched_engine,
						     rq_prio(rq)));
}

static bool
timeslice_yield(const struct intel_engine_execlists *el,
		const struct i915_request *rq)
{
	/*
	 * Once bitten, forever smitten!
	 *
	 * If the active context ever busy-waited on a semaphore,
	 * it will be treated as a hog until the end of its timeslice (i.e.
	 * until it is scheduled out and replaced by a new submission,
	 * possibly even its own lite-restore). The HW only sends an interrupt
	 * on the first miss, and we do not know if that semaphore has been
	 * signaled, or even if it is now stuck on another semaphore. Play
	 * safe, yield if it might be stuck -- it will be given a fresh
	 * timeslice in the near future.
	 */
	return rq->context->lrc.ccid == READ_ONCE(el->yield);
}

static bool needs_timeslice(const struct intel_engine_cs *engine,
			    const struct i915_request *rq)
{
	if (!intel_engine_has_timeslices(engine))
		return false;

	/* If not currently active, or about to switch, wait for next event */
	if (!rq || __i915_request_is_complete(rq))
		return false;

	/* We do not need to start the timeslice until after the ACK */
	if (READ_ONCE(engine->execlists.pending[0]))
		return false;

	/* If ELSP[1] is occupied, always check to see if worth slicing */
	if (!list_is_last_rcu(&rq->sched.link,
			      &engine->sched_engine->requests)) {
		ENGINE_TRACE(engine, "timeslice required for second inflight context\n");
		return true;
	}

	/* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */
	if (!i915_sched_engine_is_empty(engine->sched_engine)) {
		ENGINE_TRACE(engine, "timeslice required for queue\n");
		return true;
	}

	if (!RB_EMPTY_ROOT(&engine->execlists.virtual.rb_root)) {
		ENGINE_TRACE(engine, "timeslice required for virtual\n");
		return true;
	}

	return false;
}

static bool
timeslice_expired(struct intel_engine_cs *engine, const struct i915_request *rq)
{
	const struct intel_engine_execlists *el = &engine->execlists;

	if (i915_request_has_nopreempt(rq) && __i915_request_has_started(rq))
		return false;

	if (!needs_timeslice(engine, rq))
		return false;

	return timer_expired(&el->timer) || timeslice_yield(el, rq);
}

static unsigned long timeslice(const struct intel_engine_cs *engine)
{
	return READ_ONCE(engine->props.timeslice_duration_ms);
}

static void start_timeslice(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists *el = &engine->execlists;
	unsigned long duration;

	/* Disable the timer if there is nothing to switch to */
	duration = 0;
	if (needs_timeslice(engine, *el->active)) {
		/* Avoid continually prolonging an active timeslice */
		if (timer_active(&el->timer)) {
			/*
			 * If we just submitted a new ELSP after an old
			 * context, that context may have already consumed
			 * its timeslice, so recheck.
			 */
			if (!timer_pending(&el->timer))
				tasklet_hi_schedule(&engine->sched_engine->tasklet);
			return;
		}

		duration = timeslice(engine);
	}

	set_timer_ms(&el->timer, duration);
}

static void record_preemption(struct intel_engine_execlists *execlists)
{
	(void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++);
}

static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
					    const struct i915_request *rq)
{
	if (!rq)
		return 0;

	/* Only allow ourselves to force reset the currently active context */
	engine->execlists.preempt_target = rq;

	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
		return INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS;

	return READ_ONCE(engine->props.preempt_timeout_ms);
}

static void set_preempt_timeout(struct intel_engine_cs *engine,
				const struct i915_request *rq)
{
	if (!intel_engine_has_preempt_reset(engine))
		return;

	set_timer_ms(&engine->execlists.preempt,
		     active_preempt_timeout(engine, rq));
}

static bool completed(const struct i915_request *rq)
{
	if (i915_request_has_sentinel(rq))
		return false;

	return __i915_request_is_complete(rq);
}

static void execlists_dequeue(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;
	struct i915_sched_engine * const sched_engine = engine->sched_engine;
	struct i915_request **port = execlists->pending;
	struct i915_request ** const last_port = port + execlists->port_mask;
	struct i915_request *last, * const *active;
	struct virtual_engine *ve;
	struct rb_node *rb;
	bool submit = false;

	/*
	 * Hardware submission is through 2 ports. Conceptually each port
	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
	 * static for a context, and unique to each, so we only execute
	 * requests belonging to a single context from each ring. RING_HEAD
	 * is maintained by the CS in the context image, it marks the place
	 * where it got up to last time, and through RING_TAIL we tell the CS
	 * where we want to execute up to this time.
	 *
	 * In this list the requests are in order of execution. Consecutive
	 * requests from the same context are adjacent in the ringbuffer. We
	 * can combine these requests into a single RING_TAIL update:
	 *
	 *              RING_HEAD...req1...req2
	 *                                    ^- RING_TAIL
	 * since to execute req2 the CS must first execute req1.
	 *
	 * Our goal then is to point each port to the end of a consecutive
	 * sequence of requests as being the most optimal (fewest wake ups
	 * and context switches) submission.
	 */

	spin_lock(&sched_engine->lock);

	/*
	 * If the queue is higher priority than the last
	 * request in the currently active context, submit afresh.
	 * We will resubmit again afterwards in case we need to split
	 * the active context to interject the preemption request,
	 * i.e. we will retrigger preemption following the ack in case
	 * of trouble.
	 */
	active = execlists->active;
	while ((last = *active) && completed(last))
		active++;

	if (last) {
		if (need_preempt(engine, last)) {
			ENGINE_TRACE(engine,
				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
				     last->fence.context,
				     last->fence.seqno,
				     last->sched.attr.priority,
				     sched_engine->queue_priority_hint);
			record_preemption(execlists);

			/*
			 * Don't let the RING_HEAD advance past the breadcrumb
			 * as we unwind (and until we resubmit) so that we do
			 * not accidentally tell it to go backwards.
			 */
			ring_set_paused(engine, 1);

			/*
			 * Note that we have not stopped the GPU at this point,
			 * so we are unwinding the incomplete requests as they
			 * remain inflight and so by the time we do complete
			 * the preemption, some of the unwound requests may
			 * complete!
			 */
			__unwind_incomplete_requests(engine);

			last = NULL;
		} else if (timeslice_expired(engine, last)) {
			ENGINE_TRACE(engine,
				     "expired:%s last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
				     str_yes_no(timer_expired(&execlists->timer)),
				     last->fence.context, last->fence.seqno,
				     rq_prio(last),
				     sched_engine->queue_priority_hint,
				     str_yes_no(timeslice_yield(execlists, last)));

			/*
			 * Consume this timeslice; ensure we start a new one.
			 *
			 * The timeslice expired, and we will unwind the
			 * running contexts and recompute the next ELSP.
			 * If that submit will be the same pair of contexts
			 * (due to dependency ordering), we will skip the
			 * submission. If we don't cancel the timer now,
			 * we will see that the timer has expired and
			 * reschedule the tasklet; continually until the
			 * next context switch or other preemption event.
			 *
			 * Since we have decided to reschedule based on
			 * consumption of this timeslice, if we submit the
			 * same context again, grant it a full timeslice.
			 */
			cancel_timer(&execlists->timer);
			ring_set_paused(engine, 1);
			defer_active(engine);

			/*
			 * Unlike for preemption, if we rewind and continue
			 * executing the same context as previously active,
			 * the order of execution will remain the same and
			 * the tail will only advance. We do not need to
			 * force a full context restore, as a lite-restore
			 * is sufficient to resample the monotonic TAIL.
			 *
			 * If we switch to any other context, similarly we
			 * will not rewind TAIL of current context, and
			 * normal save/restore will preserve state and allow
			 * us to later continue executing the same request.
			 */
			last = NULL;
		} else {
			/*
			 * Otherwise if we already have a request pending
			 * for execution after the current one, we can
			 * just wait until the next CS event before
			 * queuing more. In either case we will force a
			 * lite-restore preemption event, but if we wait
			 * we hopefully coalesce several updates into a single
			 * submission.
			 */
			if (active[1]) {
				/*
				 * Even if ELSP[1] is occupied and not worthy
				 * of timeslices, our queue might be.
				 */
				spin_unlock(&sched_engine->lock);
				return;
			}
		}
	}

	/* XXX virtual is always taking precedence */
	while ((ve = first_virtual_engine(engine))) {
		struct i915_request *rq;

		spin_lock(&ve->base.sched_engine->lock);

		rq = ve->request;
		if (unlikely(!virtual_matches(ve, rq, engine)))
			goto unlock; /* lost the race to a sibling */

		GEM_BUG_ON(rq->engine != &ve->base);
		GEM_BUG_ON(rq->context != &ve->context);

		if (unlikely(rq_prio(rq) < queue_prio(sched_engine))) {
			spin_unlock(&ve->base.sched_engine->lock);
			break;
		}

		if (last && !can_merge_rq(last, rq)) {
			spin_unlock(&ve->base.sched_engine->lock);
			spin_unlock(&engine->sched_engine->lock);
			return; /* leave this for another sibling */
		}

		ENGINE_TRACE(engine,
			     "virtual rq=%llx:%lld%s, new engine? %s\n",
			     rq->fence.context,
			     rq->fence.seqno,
			     __i915_request_is_complete(rq) ? "!" :
			     __i915_request_has_started(rq) ? "*" :
			     "",
			     str_yes_no(engine != ve->siblings[0]));

		WRITE_ONCE(ve->request, NULL);
		WRITE_ONCE(ve->base.sched_engine->queue_priority_hint, INT_MIN);

		rb = &ve->nodes[engine->id].rb;
		rb_erase_cached(rb, &execlists->virtual);
		RB_CLEAR_NODE(rb);

		GEM_BUG_ON(!(rq->execution_mask & engine->mask));
		WRITE_ONCE(rq->engine, engine);

		if (__i915_request_submit(rq)) {
			/*
			 * Only after we confirm that we will submit
			 * this request (i.e. it has not already
			 * completed), do we want to update the context.
			 *
			 * This serves two purposes. It avoids
			 * unnecessary work if we are resubmitting an
			 * already completed request after timeslicing.
			 * But more importantly, it prevents us altering
			 * ve->siblings[] on an idle context, where
			 * we may be using ve->siblings[] in
			 * virtual_context_enter / virtual_context_exit.
			 */
			virtual_xfer_context(ve, engine);
			GEM_BUG_ON(ve->siblings[0] != engine);

			submit = true;
			last = rq;
		}

		i915_request_put(rq);
unlock:
		spin_unlock(&ve->base.sched_engine->lock);

		/*
		 * Hmm, we have a bunch of virtual engine requests,
		 * but the first one was already completed (thanks
		 * preempt-to-busy!). Keep looking at the veng queue
		 * until we have no more relevant requests (i.e.
		 * the normal submit queue has higher priority).
		 */
		if (submit)
			break;
	}

	while ((rb = rb_first_cached(&sched_engine->queue))) {
		struct i915_priolist *p = to_priolist(rb);
		struct i915_request *rq, *rn;

		priolist_for_each_request_consume(rq, rn, p) {
			bool merge = true;

			/*
			 * Can we combine this request with the current port?
			 * It has to be the same context/ringbuffer and not
			 * have any exceptions (e.g. GVT saying never to
			 * combine contexts).
			 *
			 * If we can combine the requests, we can execute both
			 * by updating the RING_TAIL to point to the end of the
			 * second request, and so we never need to tell the
			 * hardware about the first.
			 */
			if (last && !can_merge_rq(last, rq)) {
				/*
				 * If we are on the second port and cannot
				 * combine this request with the last, then we
				 * are done.
				 */
				if (port == last_port)
					goto done;

				/*
				 * We must not populate both ELSP[] with the
				 * same LRCA, i.e. we must submit 2 different
				 * contexts if we submit 2 ELSP.
				 */
				if (last->context == rq->context)
					goto done;

				if (i915_request_has_sentinel(last))
					goto done;

				/*
				 * We avoid submitting virtual requests into
				 * the secondary ports so that we can migrate
				 * the request immediately to another engine
				 * rather than wait for the primary request.
				 */
				if (rq->execution_mask != engine->mask)
					goto done;

				/*
				 * If GVT overrides us we only ever submit
				 * port[0], leaving port[1] empty. Note that we
				 * also have to be careful that we don't queue
				 * the same context (even though a different
				 * request) to the second port.
				 */
				if (ctx_single_port_submission(last->context) ||
				    ctx_single_port_submission(rq->context))
					goto done;

				merge = false;
			}

			if (__i915_request_submit(rq)) {
				if (!merge) {
					*port++ = i915_request_get(last);
					last = NULL;
				}

				GEM_BUG_ON(last &&
					   !can_merge_ctx(last->context,
							  rq->context));
				GEM_BUG_ON(last &&
					   i915_seqno_passed(last->fence.seqno,
							     rq->fence.seqno));

				submit = true;
				last = rq;
			}
		}

		rb_erase_cached(&p->node, &sched_engine->queue);
		i915_priolist_free(p);
	}
done:
	*port++ = i915_request_get(last);

	/*
	 * Here be a bit of magic! Or sleight-of-hand, whichever you prefer.
	 *
	 * We choose the priority hint such that if we add a request of greater
	 * priority than this, we kick the submission tasklet to decide on
	 * the right order of submitting the requests to hardware. We must
	 * also be prepared to reorder requests as they are in-flight on the
	 * HW. We derive the priority hint then as the first "hole" in
	 * the HW submission ports and if there are no available slots,
	 * the priority of the lowest executing request, i.e. last.
	 *
	 * When we do receive a higher priority request ready to run from the
	 * user, see queue_request(), the priority hint is bumped to that
	 * request triggering preemption on the next dequeue (or subsequent
	 * interrupt for secondary ports).
	 */
	sched_engine->queue_priority_hint = queue_prio(sched_engine);
	i915_sched_engine_reset_on_empty(sched_engine);
	spin_unlock(&sched_engine->lock);

	/*
	 * We can skip poking the HW if we ended up with exactly the same set
	 * of requests as currently running, e.g. trying to timeslice a pair
	 * of ordered contexts.
	 */
	if (submit &&
	    memcmp(active,
		   execlists->pending,
		   (port - execlists->pending) * sizeof(*port))) {
		*port = NULL;
		while (port-- != execlists->pending)
			execlists_schedule_in(*port, port - execlists->pending);

		WRITE_ONCE(execlists->yield, -1);
		set_preempt_timeout(engine, *active);
		execlists_submit_ports(engine);
	} else {
		ring_set_paused(engine, 0);
		while (port-- != execlists->pending)
			i915_request_put(*port);
		*execlists->pending = NULL;
	}
}
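/*
 * Worked example for the priority hint above (invented priorities): after
 * dequeue submits the requests at prio 2 and prio 0, queue_priority_hint
 * falls back to the highest priority still queued (or INT_MIN when the
 * queue is empty), so only a request arriving above that value kicks the
 * tasklet to rerun execlists_dequeue() and consider preemption.
 */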
static void execlists_dequeue_irq(struct intel_engine_cs *engine)
{
	local_irq_disable(); /* Suspend interrupts across request submission */
	execlists_dequeue(engine);
	local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
}

static void clear_ports(struct i915_request **ports, int count)
{
	memset_p((void **)ports, NULL, count);
}

static void
copy_ports(struct i915_request **dst, struct i915_request **src, int count)
{
	/* A memcpy_p() would be very useful here! */
	while (count--)
		WRITE_ONCE(*dst++, *src++); /* avoid write tearing */
}

static struct i915_request **
cancel_port_requests(struct intel_engine_execlists * const execlists,
		     struct i915_request **inactive)
{
	struct i915_request * const *port;

	for (port = execlists->pending; *port; port++)
		*inactive++ = *port;
	clear_ports(execlists->pending, ARRAY_SIZE(execlists->pending));

	/* Mark the end of active before we overwrite *active */
	for (port = xchg(&execlists->active, execlists->pending); *port; port++)
		*inactive++ = *port;
	clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight));

	smp_wmb(); /* complete the seqlock for execlists_active() */
	WRITE_ONCE(execlists->active, execlists->inflight);

	/* Having cancelled all outstanding process_csb(), stop their timers */
	GEM_BUG_ON(execlists->pending[0]);
	cancel_timer(&execlists->timer);
	cancel_timer(&execlists->preempt);

	return inactive;
}

/*
 * Starting with Gen12, the status has a new format:
 *
 *     bit  0:     switched to new queue
 *     bit  1:     reserved
 *     bit  2:     semaphore wait mode (poll or signal), only valid when
 *                 switch detail is set to "wait on semaphore"
 *     bits 3-5:   engine class
 *     bits 6-11:  engine instance
 *     bits 12-14: reserved
 *     bits 15-25: sw context id of the lrc the GT switched to
 *     bits 26-31: sw counter of the lrc the GT switched to
 *     bits 32-35: context switch detail
 *                  - 0: ctx complete
 *                  - 1: wait on sync flip
 *                  - 2: wait on vblank
 *                  - 3: wait on scanline
 *                  - 4: wait on semaphore
 *                  - 5: context preempted (not on SEMAPHORE_WAIT or
 *                       WAIT_FOR_EVENT)
 *     bit  36:    reserved
 *     bits 37-43: wait detail (for switch detail 1 to 4)
 *     bits 44-46: reserved
 *     bits 47-57: sw context id of the lrc the GT switched away from
 *     bits 58-63: sw counter of the lrc the GT switched away from
 *
 * Xe_HP csb shuffles things around compared to TGL:
 *
 *     bits 0-3:   context switch detail (same possible values as TGL)
 *     bits 4-9:   engine instance
 *     bits 10-25: sw context id of the lrc the GT switched to
 *     bits 26-31: sw counter of the lrc the GT switched to
 *     bit  32:    semaphore wait mode (poll or signal), only valid when
 *                 switch detail is set to "wait on semaphore"
 *     bit  33:    switched to new queue
 *     bits 34-41: wait detail (for switch detail 1 to 4)
 *     bits 42-57: sw context id of the lrc the GT switched away from
 *     bits 58-63: sw counter of the lrc the GT switched away from
 */
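/*
 * Worked example (invented event): a Gen12 entry of 0x03ff8000'00008001
 * (upper'lower dword) has lower-dword bit 0 set, i.e. "switched to new
 * queue", with a valid to-context (sw ctx id 1) and an idle away-context
 * (0x7ff), so gen12_csb_parse() below reports a promotion; an entry with
 * bit 0 clear and a valid away-context instead decodes as the completion
 * or preemption of the active submission.
 */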
static inline bool
__gen12_csb_parse(bool ctx_to_valid, bool ctx_away_valid, bool new_queue,
		  u8 switch_detail)
{
	/*
	 * The context switch detail is not guaranteed to be 5 when a preemption
	 * occurs, so we can't just check for that. The check below works for
	 * all the cases we care about, including preemptions of WAIT
	 * instructions and lite-restore. Preempt-to-idle via the CTRL register
	 * would require some extra handling, but we don't support that.
	 */
	if (!ctx_away_valid || new_queue) {
		GEM_BUG_ON(!ctx_to_valid);
		return true;
	}

	/*
	 * switch detail = 5 is covered by the case above and we do not expect a
	 * context switch on an unsuccessful wait instruction since we always
	 * use polling mode.
	 */
	GEM_BUG_ON(switch_detail);
	return false;
}

static bool xehp_csb_parse(const u64 csb)
{
	return __gen12_csb_parse(XEHP_CSB_CTX_VALID(lower_32_bits(csb)), /* cxt to */
				 XEHP_CSB_CTX_VALID(upper_32_bits(csb)), /* cxt away */
				 upper_32_bits(csb) & XEHP_CTX_STATUS_SWITCHED_TO_NEW_QUEUE,
				 GEN12_CTX_SWITCH_DETAIL(lower_32_bits(csb)));
}

static bool gen12_csb_parse(const u64 csb)
{
	return __gen12_csb_parse(GEN12_CSB_CTX_VALID(lower_32_bits(csb)), /* cxt to */
				 GEN12_CSB_CTX_VALID(upper_32_bits(csb)), /* cxt away */
				 lower_32_bits(csb) & GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE,
				 GEN12_CTX_SWITCH_DETAIL(upper_32_bits(csb)));
}

static bool gen8_csb_parse(const u64 csb)
{
	return csb & (GEN8_CTX_STATUS_IDLE_ACTIVE | GEN8_CTX_STATUS_PREEMPTED);
}

static noinline u64
wa_csb_read(const struct intel_engine_cs *engine, u64 * const csb)
{
	u64 entry;

	/*
	 * Reading from the HWSP has one particular advantage: we can detect
	 * a stale entry. Since the write into HWSP is broken, we have no reason
	 * to trust the HW at all, the mmio entry may equally be unordered, so
	 * we prefer the path that is self-checking and, as a last resort,
	 * return the mmio value.
	 *
	 * tgl,dg1:HSDES#22011327657
	 */
	preempt_disable();
	if (wait_for_atomic_us((entry = READ_ONCE(*csb)) != -1, 10)) {
		int idx = csb - engine->execlists.csb_status;
		int status;

		status = GEN8_EXECLISTS_STATUS_BUF;
		if (idx >= 6) {
			status = GEN11_EXECLISTS_STATUS_BUF2;
			idx -= 6;
		}
		status += sizeof(u64) * idx;

		entry = intel_uncore_read64(engine->uncore,
					    _MMIO(engine->mmio_base + status));
	}
	preempt_enable();

	return entry;
}

static u64 csb_read(const struct intel_engine_cs *engine, u64 * const csb)
{
	u64 entry = READ_ONCE(*csb);

	/*
	 * Unfortunately, the GPU does not always serialise its write
	 * of the CSB entries before its write of the CSB pointer, at least
	 * from the perspective of the CPU, using what is known as a Global
	 * Observation Point. We may read a new CSB tail pointer, but then
	 * read the stale CSB entries, causing us to misinterpret the
	 * context-switch events, and eventually declare the GPU hung.
	 *
	 * icl:HSDES#1806554093
	 * tgl:HSDES#22011248461
	 */
	if (unlikely(entry == -1))
		entry = wa_csb_read(engine, csb);

	/* Consume this entry so that we can spot its future reuse. */
	WRITE_ONCE(*csb, -1);

	/* ELSP is an implicit wmb() before the GPU wraps and overwrites csb */
	return entry;
}
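/*
 * A minimal sketch (hypothetical helper) of the poison-and-detect pattern
 * used by csb_read()/wa_csb_read() above: every consumed slot is poisoned
 * with an impossible value, so the next read of that slot can tell a fresh
 * HW event apart from stale data.
 */
static u64 __maybe_unused consume_slot_sketch(u64 *slot)
{
	u64 entry = READ_ONCE(*slot);

	if (entry == -1) /* still poisoned: no new event has landed */
		return -1;

	WRITE_ONCE(*slot, -1); /* re-poison so future reuse is visible */
	return entry;
}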
static void new_timeslice(struct intel_engine_execlists *el)
{
	/* By cancelling, we will start afresh in start_timeslice() */
	cancel_timer(&el->timer);
}

static struct i915_request **
process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;
	u64 * const buf = execlists->csb_status;
	const u8 num_entries = execlists->csb_size;
	struct i915_request **prev;
	u8 head, tail;

	/*
	 * As we modify our execlists state tracking we require exclusive
	 * access. Either we are inside the tasklet, or the tasklet is disabled
	 * and we assume that it is only disabled inside the reset paths and so
	 * serialised.
	 */
	GEM_BUG_ON(!tasklet_is_locked(&engine->sched_engine->tasklet) &&
		   !reset_in_progress(engine));

	/*
	 * Note that csb_write, csb_status may be either in HWSP or mmio.
	 * When reading from the csb_write mmio register, we have to be
	 * careful to only use the GEN8_CSB_WRITE_PTR portion, which is
	 * the low 4 bits. As it happens we know the next 4 bits are always
	 * zero and so we can simply mask off the low u8 of the register
	 * and treat it identically to reading from the HWSP (without having
	 * to use explicit shifting and masking, and probably bifurcating
	 * the code to handle the legacy mmio read).
	 */
	head = execlists->csb_head;
	tail = READ_ONCE(*execlists->csb_write);
	if (unlikely(head == tail))
		return inactive;

	/*
	 * We will consume all events from HW, or at least pretend to.
	 *
	 * The sequence of events from the HW is deterministic, and derived
	 * from our writes to the ELSP, with a smidgen of variability for
	 * the arrival of the asynchronous requests wrt the in-flight
	 * execution. If the HW sends an event that does not correspond with
	 * the one we are expecting, we have to abandon all hope as we lose
	 * all tracking of what the engine is actually executing. We will
	 * only detect we are out of sequence with the HW when we get an
	 * 'impossible' event because we have already drained our own
	 * preemption/promotion queue. If this occurs, we know that we likely
	 * lost track of execution earlier and must unwind and restart; the
	 * simplest way is to stop processing the event queue and force the
	 * engine to reset.
	 */
	execlists->csb_head = tail;
	ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);

	/*
	 * Hopefully paired with a wmb() in HW!
	 *
	 * We must complete the read of the write pointer before any reads
	 * from the CSB, so that we do not see stale values. Without an rmb
	 * (lfence) the HW may speculatively perform the CSB[] reads *before*
	 * we perform the READ_ONCE(*csb_write).
	 */
	rmb();

	/* Remember who was last running under the timer */
	prev = inactive;
	*prev = NULL;

	do {
		bool promote;
		u64 csb;

		if (++head == num_entries)
			head = 0;

		/*
		 * We are flying near dragons again.
		 *
		 * We hold a reference to the request in execlist_port[]
		 * but no more than that. We are operating in softirq
		 * context and so cannot hold any mutex or sleep.
		 * That means we cannot stop the requests we are processing
		 * in port[] from being retired simultaneously (the
		 * breadcrumb will be complete before we see the
		 * context-switch). As we only hold the reference to the
		 * request, any pointer chasing underneath the request
		 * is subject to a potential use-after-free. Thus we
		 * store all of the bookkeeping within port[] as
		 * required, and avoid using unguarded pointers beneath
		 * the request itself. The same applies to the atomic
		 * status notifier.
		 */

		csb = csb_read(engine, buf + head);
		ENGINE_TRACE(engine, "csb[%d]: status=0x%08x:0x%08x\n",
			     head, upper_32_bits(csb), lower_32_bits(csb));

		if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55))
			promote = xehp_csb_parse(csb);
		else if (GRAPHICS_VER(engine->i915) >= 12)
			promote = gen12_csb_parse(csb);
		else
			promote = gen8_csb_parse(csb);
		if (promote) {
			struct i915_request * const *old = execlists->active;

			if (GEM_WARN_ON(!*execlists->pending)) {
				execlists->error_interrupt |= ERROR_CSB;
				break;
			}

			ring_set_paused(engine, 0);

			/* Point active to the new ELSP; prevent overwriting */
			WRITE_ONCE(execlists->active, execlists->pending);
			smp_wmb(); /* notify execlists_active() */

			/* cancel old inflight, prepare for switch */
			trace_ports(execlists, "preempted", old);
			while (*old)
				*inactive++ = *old++;

			/* switch pending to inflight */
			GEM_BUG_ON(!assert_pending_valid(execlists, "promote"));
			copy_ports(execlists->inflight,
				   execlists->pending,
				   execlists_num_ports(execlists));
			smp_wmb(); /* complete the seqlock */
			WRITE_ONCE(execlists->active, execlists->inflight);

			/* XXX Magic delay for tgl */
			ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);

			WRITE_ONCE(execlists->pending[0], NULL);
		} else {
			if (GEM_WARN_ON(!*execlists->active)) {
				execlists->error_interrupt |= ERROR_CSB;
				break;
			}

			/* port0 completed, advanced to port1 */
			trace_ports(execlists, "completed", execlists->active);

			/*
			 * We rely on the hardware being strongly
			 * ordered, that the breadcrumb write is
			 * coherent (visible from the CPU) before the
			 * user interrupt is processed. One might assume
			 * that, since the breadcrumb write precedes both
			 * the user interrupt and the CS event for the
			 * context switch, it would likewise be visible
			 * before the CS event itself...
			 */
			if (GEM_SHOW_DEBUG() &&
			    !__i915_request_is_complete(*execlists->active)) {
				struct i915_request *rq = *execlists->active;
				const u32 *regs __maybe_unused =
					rq->context->lrc_reg_state;

				ENGINE_TRACE(engine,
					     "context completed before request!\n");
				ENGINE_TRACE(engine,
					     "ring:{start:0x%08x, head:%04x, tail:%04x, ctl:%08x, mode:%08x}\n",
					     ENGINE_READ(engine, RING_START),
					     ENGINE_READ(engine, RING_HEAD) & HEAD_ADDR,
					     ENGINE_READ(engine, RING_TAIL) & TAIL_ADDR,
					     ENGINE_READ(engine, RING_CTL),
					     ENGINE_READ(engine, RING_MI_MODE));
				ENGINE_TRACE(engine,
					     "rq:{start:%08x, head:%04x, tail:%04x, seqno:%llx:%d, hwsp:%d}, ",
					     i915_ggtt_offset(rq->ring->vma),
					     rq->head, rq->tail,
					     rq->fence.context,
					     lower_32_bits(rq->fence.seqno),
					     hwsp_seqno(rq));
				ENGINE_TRACE(engine,
					     "ctx:{start:%08x, head:%04x, tail:%04x}, ",
					     regs[CTX_RING_START],
					     regs[CTX_RING_HEAD],
					     regs[CTX_RING_TAIL]);
			}

			*inactive++ = *execlists->active++;

			GEM_BUG_ON(execlists->active - execlists->inflight >
				   execlists_num_ports(execlists));
		}
	} while (head != tail);

	/*
	 * Gen11 has proven to fail, with respect to the global observation
	 * point, to order the CSB entry write before the tail update, and
	 * thus we may see an old entry in the context status buffer.
	 *
	 * Forcibly evict the entries before the next GPU CSB update, to
	 * increase the odds that we get fresh entries even with non-working
	 * hardware. The cost of doing so mostly comes out in the wash, as
	 * the hardware, working or not, will need to do the invalidation
	 * anyway.
	 */
	drm_clflush_virt_range(&buf[0], num_entries * sizeof(buf[0]));

	/*
	 * We assume that any event reflects a change in context flow
	 * and merits a fresh timeslice. We reinstall the timer after
	 * inspecting the queue to see if we need to resubmit.
	 */
	if (*prev != *execlists->active) { /* elide lite-restores */
		struct intel_context *prev_ce = NULL, *active_ce = NULL;

		/*
		 * Note the inherent discrepancy between the HW runtime,
		 * recorded as part of the context switch, and the CPU
		 * adjustment for active contexts. We have to hope that
		 * the delay in processing the CS event is very small
		 * and consistent. It works to our advantage to have
		 * the CPU adjustment _undershoot_ (i.e. start later than)
		 * the CS timestamp so we never overreport the runtime
		 * and correct ourselves later when updating from HW.
		 */
		if (*prev)
			prev_ce = (*prev)->context;
		if (*execlists->active)
			active_ce = (*execlists->active)->context;
		if (prev_ce != active_ce) {
			if (prev_ce)
				lrc_runtime_stop(prev_ce);
			if (active_ce)
				lrc_runtime_start(active_ce);
		}
		new_timeslice(execlists);
	}

	return inactive;
}
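/*
 * Sketch (hypothetical helper) of the reader side that pairs with the
 * smp_wmb() "seqlock" writes in process_csb() and cancel_port_requests()
 * above, along the lines of execlists_active(): sample the active pointer,
 * read the request, then re-sample; if the pointer moved we may have raced
 * with a promotion and must retry.
 */
static struct i915_request * __maybe_unused
active_request_sketch(const struct intel_engine_execlists *el)
{
	struct i915_request * const *cur, * const *old, *rq;

	cur = READ_ONCE(el->active);
	smp_rmb();
	do {
		old = cur;
		rq = READ_ONCE(*cur);
		cur = READ_ONCE(el->active);
		smp_rmb(); /* pairs with the writer's smp_wmb() */
	} while (cur != old);

	return rq;
}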
static void post_process_csb(struct i915_request **port,
			     struct i915_request **last)
{
	while (port != last)
		execlists_schedule_out(*port++);
}

static void __execlists_hold(struct i915_request *rq)
{
	LIST_HEAD(list);

	do {
		struct i915_dependency *p;

		if (i915_request_is_active(rq))
			__i915_request_unsubmit(rq);

		clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
		list_move_tail(&rq->sched.link,
			       &rq->engine->sched_engine->hold);
		i915_request_set_hold(rq);
		RQ_TRACE(rq, "on hold\n");

		for_each_waiter(p, rq) {
			struct i915_request *w =
				container_of(p->waiter, typeof(*w), sched);

			if (p->flags & I915_DEPENDENCY_WEAK)
				continue;

			/* Leave semaphores spinning on the other engines */
			if (w->engine != rq->engine)
				continue;

			if (!i915_request_is_ready(w))
				continue;

			if (__i915_request_is_complete(w))
				continue;

			if (i915_request_on_hold(w))
				continue;

			list_move_tail(&w->sched.link, &list);
		}

		rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
	} while (rq);
}

static bool execlists_hold(struct intel_engine_cs *engine,
			   struct i915_request *rq)
{
	if (i915_request_on_hold(rq))
		return false;

	spin_lock_irq(&engine->sched_engine->lock);

	if (__i915_request_is_complete(rq)) { /* too late! */
		rq = NULL;
		goto unlock;
	}

	/*
	 * Transfer this request onto the hold queue to prevent it
	 * being resubmitted to HW (and potentially completed) before we have
	 * released it. Since we may have already submitted following
	 * requests, we need to remove those as well.
	 */
	GEM_BUG_ON(i915_request_on_hold(rq));
	GEM_BUG_ON(rq->engine != engine);
	__execlists_hold(rq);
	GEM_BUG_ON(list_empty(&engine->sched_engine->hold));

unlock:
	spin_unlock_irq(&engine->sched_engine->lock);
	return rq;
}
static bool hold_request(const struct i915_request *rq)
{
	struct i915_dependency *p;
	bool result = false;

	/*
	 * If one of our ancestors is on hold, we must also be on hold,
	 * otherwise we will bypass it and execute before it.
	 */
	rcu_read_lock();
	for_each_signaler(p, rq) {
		const struct i915_request *s =
			container_of(p->signaler, typeof(*s), sched);

		if (s->engine != rq->engine)
			continue;

		result = i915_request_on_hold(s);
		if (result)
			break;
	}
	rcu_read_unlock();

	return result;
}

static void __execlists_unhold(struct i915_request *rq)
{
	LIST_HEAD(list);

	do {
		struct i915_dependency *p;

		RQ_TRACE(rq, "hold release\n");

		GEM_BUG_ON(!i915_request_on_hold(rq));
		GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));

		i915_request_clear_hold(rq);
		list_move_tail(&rq->sched.link,
			       i915_sched_lookup_priolist(rq->engine->sched_engine,
							  rq_prio(rq)));
		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);

		/* Also release any children on this engine that are ready */
		for_each_waiter(p, rq) {
			struct i915_request *w =
				container_of(p->waiter, typeof(*w), sched);

			if (p->flags & I915_DEPENDENCY_WEAK)
				continue;

			if (w->engine != rq->engine)
				continue;

			if (!i915_request_on_hold(w))
				continue;

			/* Check that no other parents are also on hold */
			if (hold_request(w))
				continue;

			list_move_tail(&w->sched.link, &list);
		}

		rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
	} while (rq);
}

static void execlists_unhold(struct intel_engine_cs *engine,
			     struct i915_request *rq)
{
	spin_lock_irq(&engine->sched_engine->lock);

	/*
	 * Move this request back to the priority queue, and all of its
	 * children and grandchildren that were suspended along with it.
	 */
	__execlists_unhold(rq);

	if (rq_prio(rq) > engine->sched_engine->queue_priority_hint) {
		engine->sched_engine->queue_priority_hint = rq_prio(rq);
		tasklet_hi_schedule(&engine->sched_engine->tasklet);
	}

	spin_unlock_irq(&engine->sched_engine->lock);
}

struct execlists_capture {
	struct work_struct work;
	struct i915_request *rq;
	struct i915_gpu_coredump *error;
};

static void execlists_capture_work(struct work_struct *work)
{
	struct execlists_capture *cap = container_of(work, typeof(*cap), work);
	const gfp_t gfp = __GFP_KSWAPD_RECLAIM | __GFP_RETRY_MAYFAIL |
		__GFP_NOWARN;
	struct intel_engine_cs *engine = cap->rq->engine;
	struct intel_gt_coredump *gt = cap->error->gt;
	struct intel_engine_capture_vma *vma;

	/* Compress all the objects attached to the request, slow! */
	vma = intel_engine_coredump_add_request(gt->engine, cap->rq, gfp);
	if (vma) {
		struct i915_vma_compress *compress =
			i915_vma_capture_prepare(gt);

		intel_engine_coredump_add_vma(gt->engine, vma, compress);
		i915_vma_capture_finish(gt, compress);
	}

	gt->simulated = gt->engine->simulated;
	cap->error->simulated = gt->simulated;

	/* Publish the error state, and announce it to the world */
	i915_error_state_store(cap->error);
	i915_gpu_coredump_put(cap->error);

	/* Return this request and all that depend upon it for signaling */
	execlists_unhold(engine, cap->rq);
	i915_request_put(cap->rq);

	kfree(cap);
}

static struct execlists_capture *capture_regs(struct intel_engine_cs *engine)
{
	const gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
	struct execlists_capture *cap;

	cap = kmalloc_obj(*cap, gfp);
	if (!cap)
		return NULL;

	cap->error = i915_gpu_coredump_alloc(engine->i915, gfp);
	if (!cap->error)
		goto err_cap;

	cap->error->gt = intel_gt_coredump_alloc(engine->gt, gfp, CORE_DUMP_FLAG_NONE);
	if (!cap->error->gt)
		goto err_gpu;

	cap->error->gt->engine = intel_engine_coredump_alloc(engine, gfp, CORE_DUMP_FLAG_NONE);
	if (!cap->error->gt->engine)
		goto err_gt;

	cap->error->gt->engine->hung = true;

	return cap;

err_gt:
	kfree(cap->error->gt);
err_gpu:
	kfree(cap->error);
err_cap:
	kfree(cap);
	return NULL;
}

static struct i915_request *
active_context(struct intel_engine_cs *engine, u32 ccid)
{
	const struct intel_engine_execlists * const el = &engine->execlists;
	struct i915_request * const *port, *rq;

	/*
	 * Use the most recent result from process_csb(), but just in case
	 * we trigger an error (via interrupt) before the first CS event has
	 * been written, peek at the next submission.
	 */

	for (port = el->active; (rq = *port); port++) {
		if (rq->context->lrc.ccid == ccid) {
			ENGINE_TRACE(engine,
				     "ccid:%x found at active:%zd\n",
				     ccid, port - el->active);
			return rq;
		}
	}

	for (port = el->pending; (rq = *port); port++) {
		if (rq->context->lrc.ccid == ccid) {
			ENGINE_TRACE(engine,
				     "ccid:%x found at pending:%zd\n",
				     ccid, port - el->pending);
			return rq;
		}
	}

	ENGINE_TRACE(engine, "ccid:%x not found\n", ccid);
	return NULL;
}

static u32 active_ccid(struct intel_engine_cs *engine)
{
	return ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI);
}

static void execlists_capture(struct intel_engine_cs *engine)
{
	struct drm_i915_private *i915 = engine->i915;
	struct execlists_capture *cap;

	if (!IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR))
		return;

	/*
	 * We need to _quickly_ capture the engine state before we reset.
	 * We are inside an atomic section (softirq) here and we are delaying
	 * the forced preemption event.
	 */
	cap = capture_regs(engine);
	if (!cap)
		return;

	spin_lock_irq(&engine->sched_engine->lock);
	cap->rq = active_context(engine, active_ccid(engine));
	if (cap->rq) {
		cap->rq = active_request(cap->rq->context->timeline, cap->rq);
		cap->rq = i915_request_get_rcu(cap->rq);
	}
	spin_unlock_irq(&engine->sched_engine->lock);
	if (!cap->rq)
		goto err_free;

	/*
	 * Remove the request from the execlists queue, and take ownership
	 * of the request. We pass it to our worker who will _slowly_ compress
	 * all the pages the _user_ requested for debugging their batch, after
	 * which we return it to the queue for signaling.
	 *
	 * By removing them from the execlists queue, we also remove the
	 * requests from being processed by __unwind_incomplete_requests()
	 * during the intel_engine_reset(), and so they will *not* be replayed
	 * afterwards.
	 *
	 * Note that because we have not yet reset the engine at this point,
	 * it is possible that the request we have identified as being guilty
	 * did in fact complete, and we will then hit an arbitration
	 * point allowing the outstanding preemption to succeed. The likelihood
	 * of that is very low (as capturing of the engine registers should be
	 * fast enough to run inside an irq-off atomic section!), so we will
	 * simply hold that request accountable for being non-preemptible
	 * long enough to force the reset.
	 */
	if (!execlists_hold(engine, cap->rq))
		goto err_rq;

	INIT_WORK(&cap->work, execlists_capture_work);
	queue_work(i915->unordered_wq, &cap->work);
	return;

err_rq:
	i915_request_put(cap->rq);
err_free:
	i915_gpu_coredump_put(cap->error);
	kfree(cap);
}

static void execlists_reset(struct intel_engine_cs *engine, const char *msg)
{
	const unsigned int bit = I915_RESET_ENGINE + engine->id;
	unsigned long *lock = &engine->gt->reset.flags;

	if (!intel_has_reset_engine(engine->gt))
		return;

	if (test_and_set_bit(bit, lock))
		return;

	ENGINE_TRACE(engine, "reset for %s\n", msg);

	/* Mark this tasklet as disabled to avoid waiting for it to complete */
	tasklet_disable_nosync(&engine->sched_engine->tasklet);

	ring_set_paused(engine, 1); /* Freeze the current request in place */
	execlists_capture(engine);
	intel_engine_reset(engine, msg);

	tasklet_enable(&engine->sched_engine->tasklet);
	clear_and_wake_up_bit(bit, lock);
}

static bool preempt_timeout(const struct intel_engine_cs *const engine)
{
	const struct timer_list *t = &engine->execlists.preempt;

	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
		return false;

	if (!timer_expired(t))
		return false;

	return engine->execlists.pending[0];
}

/*
 * Check the unread Context Status Buffers and manage the submission of new
 * contexts to the ELSP accordingly.
 */
static void execlists_submission_tasklet(struct tasklet_struct *t)
{
	struct i915_sched_engine *sched_engine =
		from_tasklet(sched_engine, t, tasklet);
	struct intel_engine_cs * const engine = sched_engine->private_data;
	struct i915_request *post[2 * EXECLIST_MAX_PORTS];
	struct i915_request **inactive;

	rcu_read_lock();
	inactive = process_csb(engine, post);
	GEM_BUG_ON(inactive - post > ARRAY_SIZE(post));

	if (unlikely(preempt_timeout(engine))) {
		const struct i915_request *rq = *engine->execlists.active;

		/*
		 * If after the preempt-timeout expired, we are still on the
		 * same active request/context as before we initiated the
		 * preemption, reset the engine.
		 *
		 * However, if we have processed a CS event to switch contexts,
		 * but not yet processed the CS event for the pending
		 * preemption, reset the timer allowing the new context to
		 * gracefully exit.
		 */
		cancel_timer(&engine->execlists.preempt);
		if (rq == engine->execlists.preempt_target)
			engine->execlists.error_interrupt |= ERROR_PREEMPT;
		else
			set_timer_ms(&engine->execlists.preempt,
				     active_preempt_timeout(engine, rq));
	}

	if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) {
		const char *msg;

		/* Generate the error message in priority order wrt the user! */
		if (engine->execlists.error_interrupt & GENMASK(15, 0))
			msg = "CS error"; /* thrown by a user payload */
		else if (engine->execlists.error_interrupt & ERROR_CSB)
			msg = "invalid CSB event";
		else if (engine->execlists.error_interrupt & ERROR_PREEMPT)
			msg = "preemption timeout";
		else
			msg = "internal error";

		engine->execlists.error_interrupt = 0;
		execlists_reset(engine, msg);
	}

	if (!engine->execlists.pending[0]) {
		execlists_dequeue_irq(engine);
		start_timeslice(engine);
	}

	post_process_csb(post, inactive);
	rcu_read_unlock();
}

static void execlists_irq_handler(struct intel_engine_cs *engine, u16 iir)
{
	bool tasklet = false;

	if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
		u32 eir;

		/* Upper 16b are the enabling mask, rsvd for internal errors */
		eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
		ENGINE_TRACE(engine, "CS error: %x\n", eir);

		/* Disable the error interrupt until after the reset */
		if (likely(eir)) {
			ENGINE_WRITE(engine, RING_EMR, ~0u);
			ENGINE_WRITE(engine, RING_EIR, eir);
			WRITE_ONCE(engine->execlists.error_interrupt, eir);
			tasklet = true;
		}
	}

	if (iir & GT_WAIT_SEMAPHORE_INTERRUPT) {
		WRITE_ONCE(engine->execlists.yield,
			   ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI));
		ENGINE_TRACE(engine, "semaphore yield: %08x\n",
			     engine->execlists.yield);
		if (timer_delete(&engine->execlists.timer))
			tasklet = true;
	}

	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
		tasklet = true;

	if (iir & GT_RENDER_USER_INTERRUPT)
		intel_engine_signal_breadcrumbs(engine);

	if (tasklet)
		tasklet_hi_schedule(&engine->sched_engine->tasklet);
}

static void __execlists_kick(struct intel_engine_execlists *execlists)
{
	struct intel_engine_cs *engine =
		container_of(execlists, typeof(*engine), execlists);

	/* Kick the tasklet for some interrupt coalescing and reset handling */
	tasklet_hi_schedule(&engine->sched_engine->tasklet);
}

#define execlists_kick(t, member) \
	__execlists_kick(container_of(t, struct intel_engine_execlists, member))

static void execlists_timeslice(struct timer_list *timer)
{
	execlists_kick(timer, timer);
}

static void execlists_preempt(struct timer_list *timer)
{
	execlists_kick(timer, preempt);
}

static void queue_request(struct intel_engine_cs *engine,
			  struct i915_request *rq)
{
	GEM_BUG_ON(!list_empty(&rq->sched.link));
	list_add_tail(&rq->sched.link,
		      i915_sched_lookup_priolist(engine->sched_engine,
						 rq_prio(rq)));
	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
}

static bool submit_queue(struct intel_engine_cs *engine,
			 const struct i915_request *rq)
{
	struct i915_sched_engine *sched_engine = engine->sched_engine;

	if (rq_prio(rq) <= sched_engine->queue_priority_hint)
		return false;

	sched_engine->queue_priority_hint = rq_prio(rq);
	return true;
}

static bool ancestor_on_hold(const struct intel_engine_cs *engine,
			     const struct i915_request *rq)
{
	GEM_BUG_ON(i915_request_on_hold(rq));
	return !list_empty(&engine->sched_engine->hold) && hold_request(rq);
}

static void execlists_submit_request(struct i915_request *request)
{
	struct intel_engine_cs *engine = request->engine;
	unsigned long flags;

	/* Will be called from irq-context when using foreign fences. */
	spin_lock_irqsave(&engine->sched_engine->lock, flags);

	if (unlikely(ancestor_on_hold(engine, request))) {
		RQ_TRACE(request, "ancestor on hold\n");
		list_add_tail(&request->sched.link,
			      &engine->sched_engine->hold);
		i915_request_set_hold(request);
	} else {
		queue_request(engine, request);

		GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
		GEM_BUG_ON(list_empty(&request->sched.link));

		if (submit_queue(engine, request))
			__execlists_kick(&engine->execlists);
	}

	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
}

static int
__execlists_context_pre_pin(struct intel_context *ce,
			    struct intel_engine_cs *engine,
			    struct i915_gem_ww_ctx *ww, void **vaddr)
{
	int err;

	err = lrc_pre_pin(ce, engine, ww, vaddr);
	if (err)
		return err;

	if (!__test_and_set_bit(CONTEXT_INIT_BIT, &ce->flags)) {
		lrc_init_state(ce, engine, *vaddr);

		__i915_gem_object_flush_map(ce->state->obj, 0, engine->context_size);
	}

	return 0;
}

static int execlists_context_pre_pin(struct intel_context *ce,
				     struct i915_gem_ww_ctx *ww,
				     void **vaddr)
{
	return __execlists_context_pre_pin(ce, ce->engine, ww, vaddr);
}

static int execlists_context_pin(struct intel_context *ce, void *vaddr)
{
	return lrc_pin(ce, ce->engine, vaddr);
}

static int execlists_context_alloc(struct intel_context *ce)
{
	return lrc_alloc(ce, ce->engine);
}

static void execlists_context_cancel_request(struct intel_context *ce,
					     struct i915_request *rq)
{
	struct intel_engine_cs *engine = NULL;

	i915_request_active_engine(rq, &engine);

	if (engine && intel_engine_pulse(engine))
		intel_gt_handle_error(engine->gt, engine->mask, 0,
				      "request cancellation by %s",
				      current->comm);
}
static struct intel_context *
execlists_create_parallel(struct intel_engine_cs **engines,
			  unsigned int num_siblings,
			  unsigned int width)
{
	struct intel_context *parent = NULL, *ce, *err;
	int i;

	GEM_BUG_ON(num_siblings != 1);

	for (i = 0; i < width; ++i) {
		ce = intel_context_create(engines[i]);
		if (IS_ERR(ce)) {
			err = ce;
			goto unwind;
		}

		if (i == 0)
			parent = ce;
		else
			intel_context_bind_parent_child(parent, ce);
	}

	parent->parallel.fence_context = dma_fence_context_alloc(1);

	intel_context_set_nopreempt(parent);
	for_each_child(parent, ce)
		intel_context_set_nopreempt(ce);

	return parent;

unwind:
	if (parent)
		intel_context_put(parent);
	return err;
}

static const struct intel_context_ops execlists_context_ops = {
	.flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES,

	.alloc = execlists_context_alloc,

	.cancel_request = execlists_context_cancel_request,

	.pre_pin = execlists_context_pre_pin,
	.pin = execlists_context_pin,
	.unpin = lrc_unpin,
	.post_unpin = lrc_post_unpin,

	.enter = intel_context_enter_engine,
	.exit = intel_context_exit_engine,

	.reset = lrc_reset,
	.destroy = lrc_destroy,

	.create_parallel = execlists_create_parallel,
	.create_virtual = execlists_create_virtual,
};

static int emit_pdps(struct i915_request *rq)
{
	const struct intel_engine_cs * const engine = rq->engine;
	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(rq->context->vm);
	int err, i;
	u32 *cs;

	GEM_BUG_ON(intel_vgpu_active(rq->i915));

	/*
	 * Beware ye of the dragons, this sequence is magic!
	 *
	 * Small changes to this sequence can cause anything from
	 * GPU hangs to forcewake errors and machine lockups!
	 */

	cs = intel_ring_begin(rq, 2);
	if (IS_ERR(cs))
		return PTR_ERR(cs);

	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
	*cs++ = MI_NOOP;
	intel_ring_advance(rq, cs);

	/* Flush any residual operations from the context load */
	err = engine->emit_flush(rq, EMIT_FLUSH);
	if (err)
		return err;

	/* Magic required to prevent forcewake errors! */
	err = engine->emit_flush(rq, EMIT_INVALIDATE);
	if (err)
		return err;

	cs = intel_ring_begin(rq, 4 * GEN8_3LVL_PDPES + 2);
	if (IS_ERR(cs))
		return PTR_ERR(cs);

	/* Ensure the LRI have landed before we invalidate & continue */
	*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
	for (i = GEN8_3LVL_PDPES; i--; ) {
		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
		u32 base = engine->mmio_base;

		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, i));
		*cs++ = upper_32_bits(pd_daddr);
		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, i));
		*cs++ = lower_32_bits(pd_daddr);
	}
	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
	intel_ring_advance(rq, cs);

	return 0;
}

static int execlists_request_alloc(struct i915_request *request)
{
	int ret;

	GEM_BUG_ON(!intel_context_is_pinned(request->context));

	/*
	 * Flush enough space to reduce the likelihood of waiting after
	 * we start building the request - in which case we will just
	 * have to repeat work.
	 */
	request->reserved_space += EXECLISTS_REQUEST_SIZE;

	/*
	 * Note that after this point, we have committed to using
	 * this request as it is being used to both track the
	 * state of engine initialisation and liveness of the
	 * golden renderstate above. Think twice before you try
	 * to cancel/unwind this request now.
	 */

	if (!i915_vm_is_4lvl(request->context->vm)) {
		ret = emit_pdps(request);
		if (ret)
			return ret;
	}

	/* Unconditionally invalidate GPU caches and TLBs. */
	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
	if (ret)
		return ret;

	request->reserved_space -= EXECLISTS_REQUEST_SIZE;
	return 0;
}

static void reset_csb_pointers(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;
	const unsigned int reset_value = execlists->csb_size - 1;

	ring_set_paused(engine, 0);

	/*
	 * Sometimes Icelake forgets to reset its pointers on a GPU reset.
	 * Bludgeon them with a mmio update to be sure.
	 */
	ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR,
		     0xffff << 16 | reset_value << 8 | reset_value);
	ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);

	/*
	 * After a reset, the HW starts writing into CSB entry [0]. We
	 * therefore have to set our HEAD pointer back one entry so that
	 * the *first* entry we check is entry 0. To complicate this further,
	 * as we don't wait for the first interrupt after reset, we have to
	 * fake the HW write to point back to the last entry so that our
	 * inline comparison of our cached head position against the last HW
	 * write works even before the first interrupt.
	 */
	execlists->csb_head = reset_value;
	WRITE_ONCE(*execlists->csb_write, reset_value);
	wmb(); /* Make sure this is visible to HW (paranoia?) */

	/* Check that the GPU does indeed update the CSB entries! */
	memset(execlists->csb_status, -1, (reset_value + 1) * sizeof(u64));
	drm_clflush_virt_range(execlists->csb_status,
			       execlists->csb_size *
			       sizeof(execlists->csb_status));

	/* Once more for luck and our trusty paranoia */
	ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR,
		     0xffff << 16 | reset_value << 8 | reset_value);
	ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);

	GEM_BUG_ON(READ_ONCE(*execlists->csb_write) != reset_value);
}
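/*
 * A small sketch (hypothetical helper) of the head/tail arithmetic implied
 * by reset_csb_pointers() above: with both pointers reset to csb_size - 1,
 * the number of unread events once the HW bumps the write pointer is a
 * simple modular distance, and the first entry consumed after a reset is
 * entry 0.
 */
static u8 __maybe_unused csb_entries_pending_sketch(u8 head, u8 tail,
						    u8 num_entries)
{
	return (tail - head + num_entries) % num_entries;
}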
static void sanitize_hwsp(struct intel_engine_cs *engine)
{
	struct intel_timeline *tl;

	list_for_each_entry(tl, &engine->status_page.timelines, engine_link)
		intel_timeline_reset_seqno(tl);
}

static void execlists_sanitize(struct intel_engine_cs *engine)
{
	GEM_BUG_ON(execlists_active(&engine->execlists));

	/*
	 * Poison residual state on resume, in case the suspend didn't!
	 *
	 * We have to assume that across suspend/resume (or other loss
	 * of control) the contents of our pinned buffers has been
	 * lost, replaced by garbage. Since this doesn't always happen,
	 * let's poison such state so that we more quickly spot when
	 * we falsely assume it has been preserved.
	 */
	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
		memset(engine->status_page.addr, POISON_INUSE, PAGE_SIZE);

	reset_csb_pointers(engine);

	/*
	 * The kernel_context HWSP is stored in the status_page. As above,
	 * that may be lost on resume/initialisation, and so we need to
	 * reset the value in the HWSP.
	 */
	sanitize_hwsp(engine);

	/* And scrub the dirty cachelines for the HWSP */
	drm_clflush_virt_range(engine->status_page.addr, PAGE_SIZE);

	intel_engine_reset_pinned_contexts(engine);
}

static void enable_error_interrupt(struct intel_engine_cs *engine)
{
	u32 status;

	engine->execlists.error_interrupt = 0;
	ENGINE_WRITE(engine, RING_EMR, ~0u);
	ENGINE_WRITE(engine, RING_EIR, ~0u); /* clear all existing errors */

	status = ENGINE_READ(engine, RING_ESR);
	if (unlikely(status)) {
		drm_err(&engine->i915->drm,
			"engine '%s' resumed still in error: %08x\n",
			engine->name, status);
		intel_gt_reset_engine(engine);
	}

	/*
	 * On current gen8+, we have 2 signals to play with
	 *
	 * - I915_ERROR_INSTRUCTION (bit 0)
	 *
	 *   Generate an error if the command parser encounters an invalid
	 *   instruction.
	 *
	 *   This is a fatal error.
	 *
	 * - CP_PRIV (bit 2)
	 *
	 *   Generate an error on privilege violation (where the CP replaces
	 *   the instruction with a no-op). This also fires for writes into
	 *   read-only scratch pages.
	 *
	 *   This is a non-fatal error, parsing continues.
	 *
	 * * there are a few others defined for odd HW that we do not use
	 *
	 * Since CP_PRIV fires for cases where we have chosen to ignore the
	 * error (as the HW is validating and suppressing the mistakes), we
	 * only unmask the instruction error bit.
	 */
	ENGINE_WRITE(engine, RING_EMR, ~I915_ERROR_INSTRUCTION);
}

static void enable_execlists(struct intel_engine_cs *engine)
{
	u32 mode;

	assert_forcewakes_active(engine->uncore, FORCEWAKE_ALL);

	intel_engine_set_hwsp_writemask(engine, ~0u); /* HWSTAM */

	if (GRAPHICS_VER(engine->i915) >= 11)
		mode = REG_MASKED_FIELD_ENABLE(GEN11_GFX_DISABLE_LEGACY_MODE);
	else
		mode = REG_MASKED_FIELD_ENABLE(GFX_RUN_LIST_ENABLE);
	ENGINE_WRITE_FW(engine, RING_MODE_GEN7, mode);

	ENGINE_WRITE_FW(engine, RING_MI_MODE, REG_MASKED_FIELD_DISABLE(STOP_RING));

	ENGINE_WRITE_FW(engine,
			RING_HWS_PGA,
			i915_ggtt_offset(engine->status_page.vma));
	ENGINE_POSTING_READ(engine, RING_HWS_PGA);

	enable_error_interrupt(engine);
}

static int execlists_resume(struct intel_engine_cs *engine)
{
	intel_mocs_init_engine(engine);
	intel_breadcrumbs_reset(engine->breadcrumbs);

	enable_execlists(engine);

	if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE)
		xehp_enable_ccs_engines(engine);

	return 0;
}

static void execlists_reset_prepare(struct intel_engine_cs *engine)
{
	ENGINE_TRACE(engine, "depth<-%d\n",
		     atomic_read(&engine->sched_engine->tasklet.count));

	/*
	 * Prevent request submission to the hardware until we have
	 * completed the reset in i915_gem_reset_finish(). If a request
	 * is completed by one engine, it may then queue a request
	 * to a second via its execlists->tasklet *just* as we are
	 * calling engine->resume() and also writing the ELSP.
	 * Turning off the execlists->tasklet until the reset is over
	 * prevents the race.
	 */
	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
	GEM_BUG_ON(!reset_in_progress(engine));

	/*
	 * We stop the engines, as otherwise we might get a failed reset and
	 * a dead gpu (on elk). Also, a gpu as modern as kbl can suffer a
	 * system hang if a batchbuffer is progressing when the reset is
	 * issued, regardless of the READY_TO_RESET ack. Thus assume it is
	 * best to stop engines on all gens where we have a gpu reset.
	 *
	 * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES)
	 *
	 * FIXME: Wa for more modern gens needs to be validated
	 */
	ring_set_paused(engine, 1);
	intel_engine_stop_cs(engine);

	/*
	 * Wa_22011802037: In addition to stopping the cs, we need
	 * to wait for any pending mi force wakeups
	 */
	if (intel_engine_reset_needs_wa_22011802037(engine->gt))
		intel_engine_wait_for_pending_mi_fw(engine);

	engine->execlists.reset_ccid = active_ccid(engine);
}

static struct i915_request **
reset_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;

	drm_clflush_virt_range(execlists->csb_write,
			       sizeof(execlists->csb_write[0]));

	inactive = process_csb(engine, inactive); /* drain preemption events */

	/* Following the reset, we need to reload the CSB read/write pointers */
	reset_csb_pointers(engine);

	return inactive;
}

static void
execlists_reset_active(struct intel_engine_cs *engine, bool stalled)
{
	struct intel_context *ce;
	struct i915_request *rq;
	u32 head;

	/*
	 * Save the currently executing context; even if we completed
	 * its request, it was still running at the time of the
	 * reset and will have been clobbered.
	 */
	rq = active_context(engine, engine->execlists.reset_ccid);
	if (!rq)
		return;

	ce = rq->context;
	GEM_BUG_ON(!i915_vma_is_pinned(ce->state));

	if (__i915_request_is_complete(rq)) {
		/* Idle context; tidy up the ring so we can restart afresh */
		head = intel_ring_wrap(ce->ring, rq->tail);
		goto out_replay;
	}

	/* We still have requests in-flight; the engine should be active */
	GEM_BUG_ON(!intel_engine_pm_is_awake(engine));

	/* Context has requests still in-flight; it should not be idle! */
	GEM_BUG_ON(i915_active_is_idle(&ce->active));

	rq = active_request(ce->timeline, rq);
	head = intel_ring_wrap(ce->ring, rq->head);
	GEM_BUG_ON(head == ce->ring->tail);

	/*
	 * If this request hasn't started yet, e.g. it is waiting on a
	 * semaphore, we need to avoid skipping the request or else we
	 * break the signaling chain. However, if the context is corrupt
	 * the request will not restart and we will be stuck with a wedged
	 * device. It is quite often the case that if we issue a reset
	 * while the GPU is loading the context image, that the context
	 * image becomes corrupt.
	 *
	 * Otherwise, if we have not started yet, the request should replay
	 * perfectly and we do not need to flag the result as being erroneous.
	 */
	if (!__i915_request_has_started(rq))
		goto out_replay;

	/*
	 * If the request was innocent, we leave the request in the ELSP
	 * and will try to replay it on restarting.
	 * The context image may
	 * have been corrupted by the reset, in which case we may have
	 * to service a new GPU hang, but more likely we can continue on
	 * without impact.
	 *
	 * If the request was guilty, we presume the context is corrupt
	 * and have to at least restore the RING register in the context
	 * image back to the expected values to skip over the guilty request.
	 */
	__i915_request_reset(rq, stalled);

	/*
	 * We want a simple context + ring to execute the breadcrumb update.
	 * We cannot rely on the context being intact across the GPU hang,
	 * so clear it and rebuild just what we need for the breadcrumb.
	 * All pending requests for this context will be zapped, and any
	 * future request will be after userspace has had the opportunity
	 * to recreate its own state.
	 */
out_replay:
	ENGINE_TRACE(engine, "replay {head:%04x, tail:%04x}\n",
		     head, ce->ring->tail);
	lrc_reset_regs(ce, engine);
	ce->lrc.lrca = lrc_update_regs(ce, engine, head);
}

static void execlists_reset_csb(struct intel_engine_cs *engine, bool stalled)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;
	struct i915_request *post[2 * EXECLIST_MAX_PORTS];
	struct i915_request **inactive;

	rcu_read_lock();
	inactive = reset_csb(engine, post);

	execlists_reset_active(engine, true);

	inactive = cancel_port_requests(execlists, inactive);
	post_process_csb(post, inactive);
	rcu_read_unlock();
}

static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled)
{
	unsigned long flags;

	ENGINE_TRACE(engine, "\n");

	/* Process the csb, find the guilty context and throw away */
	execlists_reset_csb(engine, stalled);

	/* Push back any incomplete requests for replay after the reset. */
	rcu_read_lock();
	spin_lock_irqsave(&engine->sched_engine->lock, flags);
	__unwind_incomplete_requests(engine);
	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
	rcu_read_unlock();
}

static void nop_submission_tasklet(struct tasklet_struct *t)
{
	struct i915_sched_engine *sched_engine =
		from_tasklet(sched_engine, t, tasklet);
	struct intel_engine_cs * const engine = sched_engine->private_data;

	/* The driver is wedged; don't process any more events. */
	WRITE_ONCE(engine->sched_engine->queue_priority_hint, INT_MIN);
}

static void execlists_reset_cancel(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;
	struct i915_sched_engine * const sched_engine = engine->sched_engine;
	struct i915_request *rq, *rn;
	struct rb_node *rb;
	unsigned long flags;

	ENGINE_TRACE(engine, "\n");

	/*
	 * Before we call engine->cancel_requests(), we should have exclusive
	 * access to the submission state. This is arranged for us by the
	 * caller disabling the interrupt generation, the tasklet and other
	 * threads that may then access the same state, giving us a free hand
	 * to reset state. However, we still need to let lockdep be aware that
	 * we know this state may be accessed in hardirq context, so we
	 * disable the irq around this manipulation and we want to keep
	 * the spinlock focused on its duties and not accidentally conflate
	 * coverage to the submission's irq state.
	 * (Similarly, although we
	 * shouldn't need to disable irq around the manipulation of the
	 * submission's irq state, we also wish to remind ourselves that
	 * it is irq state.)
	 */
	execlists_reset_csb(engine, true);

	rcu_read_lock();
	spin_lock_irqsave(&engine->sched_engine->lock, flags);

	/* Mark all executing requests as skipped. */
	list_for_each_entry(rq, &engine->sched_engine->requests, sched.link)
		i915_request_put(i915_request_mark_eio(rq));
	intel_engine_signal_breadcrumbs(engine);

	/* Flush the queued requests to the timeline list (for retiring). */
	while ((rb = rb_first_cached(&sched_engine->queue))) {
		struct i915_priolist *p = to_priolist(rb);

		priolist_for_each_request_consume(rq, rn, p) {
			if (i915_request_mark_eio(rq)) {
				__i915_request_submit(rq);
				i915_request_put(rq);
			}
		}

		rb_erase_cached(&p->node, &sched_engine->queue);
		i915_priolist_free(p);
	}

	/* On-hold requests will be flushed to timeline upon their release */
	list_for_each_entry(rq, &sched_engine->hold, sched.link)
		i915_request_put(i915_request_mark_eio(rq));

	/* Cancel all attached virtual engines */
	while ((rb = rb_first_cached(&execlists->virtual))) {
		struct virtual_engine *ve =
			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);

		rb_erase_cached(rb, &execlists->virtual);
		RB_CLEAR_NODE(rb);

		spin_lock(&ve->base.sched_engine->lock);
		rq = fetch_and_zero(&ve->request);
		if (rq) {
			if (i915_request_mark_eio(rq)) {
				rq->engine = engine;
				__i915_request_submit(rq);
				i915_request_put(rq);
			}
			i915_request_put(rq);

			ve->base.sched_engine->queue_priority_hint = INT_MIN;
		}
		spin_unlock(&ve->base.sched_engine->lock);
	}

	/* Remaining _unready_ requests will be nop'ed when submitted */

	sched_engine->queue_priority_hint = INT_MIN;
	sched_engine->queue = RB_ROOT_CACHED;

	GEM_BUG_ON(__tasklet_is_enabled(&engine->sched_engine->tasklet));
	engine->sched_engine->tasklet.callback = nop_submission_tasklet;

	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
	rcu_read_unlock();
}

static void execlists_reset_finish(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;

	/*
	 * After a GPU reset, we may have requests to replay. Do so now while
	 * we still have the forcewake to be sure that the GPU is not allowed
	 * to sleep before we restart and reload a context.
	 *
	 * If the GPU reset fails, the engine may still be alive with requests
	 * inflight. We expect those to complete, or for the device to be
	 * reset as the next level of recovery, and as a final resort we
	 * will declare the device wedged.
	 */
	GEM_BUG_ON(!reset_in_progress(engine));

	/* And kick in case we missed a new request submission. */
	if (__tasklet_enable(&engine->sched_engine->tasklet))
		__execlists_kick(execlists);

	ENGINE_TRACE(engine, "depth->%d\n",
		     atomic_read(&engine->sched_engine->tasklet.count));
}

static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine)
{
	ENGINE_WRITE(engine, RING_IMR,
		     ~(engine->irq_enable_mask | engine->irq_keep_mask));
	ENGINE_POSTING_READ(engine, RING_IMR);
}

static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine)
{
	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask);
}

static void execlists_park(struct intel_engine_cs *engine)
{
	cancel_timer(&engine->execlists.timer);
	cancel_timer(&engine->execlists.preempt);

	/* Reset upon idling, or we may delay the busy wakeup. */
	WRITE_ONCE(engine->sched_engine->queue_priority_hint, INT_MIN);
}

static void add_to_engine(struct i915_request *rq)
{
	lockdep_assert_held(&rq->engine->sched_engine->lock);
	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
}

static void remove_from_engine(struct i915_request *rq)
{
	struct intel_engine_cs *engine, *locked;

	/*
	 * Virtual engines complicate acquiring the engine timeline lock,
	 * as their rq->engine pointer is not stable until under that
	 * engine lock. The simple ploy we use is to take the lock then
	 * check that the rq still belongs to the newly locked engine.
	 */
	locked = READ_ONCE(rq->engine);
	spin_lock_irq(&locked->sched_engine->lock);
	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
		spin_unlock(&locked->sched_engine->lock);
		spin_lock(&engine->sched_engine->lock);
		locked = engine;
	}
	list_del_init(&rq->sched.link);

	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);

	/* Prevent further __await_execution() registering a cb, then flush */
	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);

	spin_unlock_irq(&locked->sched_engine->lock);

	i915_request_notify_execute_cb_imm(rq);
}

static bool can_preempt(struct intel_engine_cs *engine)
{
	return GRAPHICS_VER(engine->i915) > 8;
}

static void kick_execlists(const struct i915_request *rq, int prio)
{
	struct intel_engine_cs *engine = rq->engine;
	struct i915_sched_engine *sched_engine = engine->sched_engine;
	const struct i915_request *inflight;

	/*
	 * We only need to kick the tasklet once for the high priority
	 * new context we add into the queue.
	 */
	if (prio <= sched_engine->queue_priority_hint)
		return;

	rcu_read_lock();

	/* Nothing currently active? We're overdue for a submission! */
	inflight = execlists_active(&engine->execlists);
	if (!inflight)
		goto unlock;

	/*
	 * If we are already the currently executing context, don't
	 * bother evaluating if we should preempt ourselves.
	 */
	if (inflight->context == rq->context)
		goto unlock;

	ENGINE_TRACE(engine,
		     "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n",
		     prio,
		     rq->fence.context, rq->fence.seqno,
		     inflight->fence.context, inflight->fence.seqno,
		     inflight->sched.attr.priority);

	sched_engine->queue_priority_hint = prio;

	/*
	 * Allow preemption of low -> normal -> high, but we do
	 * not allow low priority tasks to preempt other low priority
	 * tasks under the impression that latency for low priority
	 * tasks does not matter (as much as background throughput),
	 * so keep it simple (KISS).
	 */
	if (prio >= max(I915_PRIORITY_NORMAL, rq_prio(inflight)))
		tasklet_hi_schedule(&sched_engine->tasklet);

unlock:
	rcu_read_unlock();
}

static void execlists_set_default_submission(struct intel_engine_cs *engine)
{
	engine->submit_request = execlists_submit_request;
	engine->sched_engine->schedule = i915_schedule;
	engine->sched_engine->kick_backend = kick_execlists;
	engine->sched_engine->tasklet.callback = execlists_submission_tasklet;
}

static void execlists_shutdown(struct intel_engine_cs *engine)
{
	/* Synchronise with residual timers and any softirq they raise */
	timer_delete_sync(&engine->execlists.timer);
	timer_delete_sync(&engine->execlists.preempt);
	tasklet_kill(&engine->sched_engine->tasklet);
}

static void execlists_release(struct intel_engine_cs *engine)
{
	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */

	execlists_shutdown(engine);

	intel_engine_cleanup_common(engine);
	lrc_fini_wa_ctx(engine);
}

static ktime_t __execlists_engine_busyness(struct intel_engine_cs *engine,
					   ktime_t *now)
{
	struct intel_engine_execlists_stats *stats = &engine->stats.execlists;
	ktime_t total = stats->total;

	/*
	 * If the engine is executing something at the moment,
	 * add it to the total.
	 */
	*now = ktime_get();
	if (READ_ONCE(stats->active))
		total = ktime_add(total, ktime_sub(*now, stats->start));

	return total;
}

static ktime_t execlists_engine_busyness(struct intel_engine_cs *engine,
					 ktime_t *now)
{
	struct intel_engine_execlists_stats *stats = &engine->stats.execlists;
	unsigned int seq;
	ktime_t total;

	do {
		seq = read_seqcount_begin(&stats->lock);
		total = __execlists_engine_busyness(engine, now);
	} while (read_seqcount_retry(&stats->lock, seq));

	return total;
}
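/*
 * Sketch (hypothetical helper, loosely modelled on the driver's
 * intel_engine_context_in()) of the writer side of stats->lock: the
 * submission path brackets updates of start/active in a seqcount write
 * section so that execlists_engine_busyness() above can retry racing
 * readers instead of taking a lock on the hot path.
 */
static void __maybe_unused
stats_context_in_sketch(struct intel_engine_execlists_stats *stats)
{
	unsigned long flags;

	local_irq_save(flags); /* the seqcount reader may be in hardirq */
	write_seqcount_begin(&stats->lock);

	stats->start = ktime_get();
	stats->active++;

	write_seqcount_end(&stats->lock);
	local_irq_restore(flags);
}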
static void
logical_ring_default_vfuncs(struct intel_engine_cs *engine)
{
	/* Default vfuncs which can be overridden by each engine. */

	engine->resume = execlists_resume;

	engine->cops = &execlists_context_ops;
	engine->request_alloc = execlists_request_alloc;
	engine->add_active_request = add_to_engine;
	engine->remove_active_request = remove_from_engine;

	engine->reset.prepare = execlists_reset_prepare;
	engine->reset.rewind = execlists_reset_rewind;
	engine->reset.cancel = execlists_reset_cancel;
	engine->reset.finish = execlists_reset_finish;

	engine->park = execlists_park;
	engine->unpark = NULL;

	engine->emit_flush = gen8_emit_flush_xcs;
	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
	if (GRAPHICS_VER(engine->i915) >= 12) {
		engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_xcs;
		engine->emit_flush = gen12_emit_flush_xcs;
	}
	engine->set_default_submission = execlists_set_default_submission;

	if (GRAPHICS_VER(engine->i915) < 11) {
		engine->irq_enable = gen8_logical_ring_enable_irq;
		engine->irq_disable = gen8_logical_ring_disable_irq;
	} else {
		/*
		 * TODO: On Gen11 interrupt masks need to be clear
		 * to allow C6 entry. Keep interrupts enabled at all
		 * times and take the hit of generating extra interrupts
		 * until a more refined solution exists.
		 */
	}
	intel_engine_set_irq_handler(engine, execlists_irq_handler);

	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
	if (!intel_vgpu_active(engine->i915)) {
		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
		if (can_preempt(engine)) {
			engine->flags |= I915_ENGINE_HAS_PREEMPTION;
			if (CONFIG_DRM_I915_TIMESLICE_DURATION)
				engine->flags |= I915_ENGINE_HAS_TIMESLICES;
		}
	}

	if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) {
		if (intel_engine_has_preemption(engine))
			engine->emit_bb_start = xehp_emit_bb_start;
		else
			engine->emit_bb_start = xehp_emit_bb_start_noarb;
	} else {
		if (intel_engine_has_preemption(engine))
			engine->emit_bb_start = gen8_emit_bb_start;
		else
			engine->emit_bb_start = gen8_emit_bb_start_noarb;
	}

	engine->busyness = execlists_engine_busyness;
}

static void logical_ring_default_irqs(struct intel_engine_cs *engine)
{
	unsigned int shift = 0;

	if (GRAPHICS_VER(engine->i915) < 11) {
		const u8 irq_shifts[] = {
			[RCS0]  = GEN8_RCS_IRQ_SHIFT,
			[BCS0]  = GEN8_BCS_IRQ_SHIFT,
			[VCS0]  = GEN8_VCS0_IRQ_SHIFT,
			[VCS1]  = GEN8_VCS1_IRQ_SHIFT,
			[VECS0] = GEN8_VECS_IRQ_SHIFT,
		};

		shift = irq_shifts[engine->id];
	}

	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift;
	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
	engine->irq_keep_mask |= GT_CS_MASTER_ERROR_INTERRUPT << shift;
	engine->irq_keep_mask |= GT_WAIT_SEMAPHORE_INTERRUPT << shift;
}

static void rcs_submission_override(struct intel_engine_cs *engine)
{
	switch (GRAPHICS_VER(engine->i915)) {
	case 12:
		engine->emit_flush = gen12_emit_flush_rcs;
		engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_rcs;
		break;
	case 11:
		engine->emit_flush = gen11_emit_flush_rcs;
		engine->emit_fini_breadcrumb = gen11_emit_fini_breadcrumb_rcs;
		break;
	default:
		engine->emit_flush = gen8_emit_flush_rcs;
		engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
		break;
	}
}
intel_execlists_submission_setup(struct intel_engine_cs *engine) 3534 { 3535 struct intel_engine_execlists * const execlists = &engine->execlists; 3536 struct drm_i915_private *i915 = engine->i915; 3537 struct intel_uncore *uncore = engine->uncore; 3538 u32 base = engine->mmio_base; 3539 3540 tasklet_setup(&engine->sched_engine->tasklet, execlists_submission_tasklet); 3541 timer_setup(&engine->execlists.timer, execlists_timeslice, 0); 3542 timer_setup(&engine->execlists.preempt, execlists_preempt, 0); 3543 3544 logical_ring_default_vfuncs(engine); 3545 logical_ring_default_irqs(engine); 3546 3547 seqcount_init(&engine->stats.execlists.lock); 3548 3549 if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) 3550 rcs_submission_override(engine); 3551 3552 lrc_init_wa_ctx(engine); 3553 3554 if (HAS_LOGICAL_RING_ELSQ(i915)) { 3555 execlists->submit_reg = intel_uncore_regs(uncore) + 3556 i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(base)); 3557 execlists->ctrl_reg = intel_uncore_regs(uncore) + 3558 i915_mmio_reg_offset(RING_EXECLIST_CONTROL(base)); 3559 3560 engine->fw_domain = intel_uncore_forcewake_for_reg(engine->uncore, 3561 RING_EXECLIST_CONTROL(engine->mmio_base), 3562 FW_REG_WRITE); 3563 } else { 3564 execlists->submit_reg = intel_uncore_regs(uncore) + 3565 i915_mmio_reg_offset(RING_ELSP(base)); 3566 } 3567 3568 execlists->csb_status = 3569 (u64 *)&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX]; 3570 3571 execlists->csb_write = 3572 &engine->status_page.addr[INTEL_HWS_CSB_WRITE_INDEX(i915)]; 3573 3574 if (GRAPHICS_VER(i915) < 11) 3575 execlists->csb_size = GEN8_CSB_ENTRIES; 3576 else 3577 execlists->csb_size = GEN11_CSB_ENTRIES; 3578 3579 engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0); 3580 if (GRAPHICS_VER(engine->i915) >= 11 && 3581 GRAPHICS_VER_FULL(engine->i915) < IP_VER(12, 55)) { 3582 execlists->ccid |= engine->instance << (GEN11_ENGINE_INSTANCE_SHIFT - 32); 3583 execlists->ccid |= engine->class << (GEN11_ENGINE_CLASS_SHIFT - 32); 3584 } 3585 3586 /* Finally, take ownership and responsibility for cleanup! */ 3587 engine->sanitize = execlists_sanitize; 3588 engine->release = execlists_release; 3589 3590 return 0; 3591 } 3592 3593 static struct list_head *virtual_queue(struct virtual_engine *ve) 3594 { 3595 return &ve->base.sched_engine->default_priolist.requests; 3596 } 3597 3598 static void rcu_virtual_context_destroy(struct work_struct *wrk) 3599 { 3600 struct virtual_engine *ve = 3601 container_of(wrk, typeof(*ve), rcu.work); 3602 unsigned int n; 3603 3604 GEM_BUG_ON(ve->context.inflight); 3605 3606 /* Preempt-to-busy may leave a stale request behind. */ 3607 if (unlikely(ve->request)) { 3608 struct i915_request *old; 3609 3610 spin_lock_irq(&ve->base.sched_engine->lock); 3611 3612 old = fetch_and_zero(&ve->request); 3613 if (old) { 3614 GEM_BUG_ON(!__i915_request_is_complete(old)); 3615 __i915_request_submit(old); 3616 i915_request_put(old); 3617 } 3618 3619 spin_unlock_irq(&ve->base.sched_engine->lock); 3620 } 3621 3622 /* 3623 * Flush the tasklet in case it is still running on another core. 3624 * 3625 * This needs to be done before we remove ourselves from the siblings' 3626 * rbtrees as in the case it is running in parallel, it may reinsert 3627 * the rb_node into a sibling. 3628 */ 3629 tasklet_kill(&ve->base.sched_engine->tasklet); 3630 3631 /* Decouple ourselves from the siblings, no more access allowed. 
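 * Each sibling's ve_node is unlinked from that sibling's execlists.virtual
 * rbtree under the sibling's sched_engine lock, the same lock the submission
 * tasklets take before touching the node.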
*/ 3632 for (n = 0; n < ve->num_siblings; n++) { 3633 struct intel_engine_cs *sibling = ve->siblings[n]; 3634 struct rb_node *node = &ve->nodes[sibling->id].rb; 3635 3636 if (RB_EMPTY_NODE(node)) 3637 continue; 3638 3639 spin_lock_irq(&sibling->sched_engine->lock); 3640 3641 /* Detachment is lazily performed in the sched_engine->tasklet */ 3642 if (!RB_EMPTY_NODE(node)) 3643 rb_erase_cached(node, &sibling->execlists.virtual); 3644 3645 spin_unlock_irq(&sibling->sched_engine->lock); 3646 } 3647 GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.sched_engine->tasklet)); 3648 GEM_BUG_ON(!list_empty(virtual_queue(ve))); 3649 3650 lrc_fini(&ve->context); 3651 intel_context_fini(&ve->context); 3652 3653 if (ve->base.breadcrumbs) 3654 intel_breadcrumbs_put(ve->base.breadcrumbs); 3655 if (ve->base.sched_engine) 3656 i915_sched_engine_put(ve->base.sched_engine); 3657 intel_engine_free_request_pool(&ve->base); 3658 3659 kfree(ve); 3660 } 3661 3662 static void virtual_context_destroy(struct kref *kref) 3663 { 3664 struct virtual_engine *ve = 3665 container_of(kref, typeof(*ve), context.ref); 3666 3667 GEM_BUG_ON(!list_empty(&ve->context.signals)); 3668 3669 /* 3670 * When destroying the virtual engine, we have to be aware that 3671 * it may still be in use from a hardirq/softirq context causing 3672 * the resubmission of a completed request (background completion 3673 * due to preempt-to-busy). Before we can free the engine, we need 3674 * to flush the submission code and tasklets that are still potentially 3675 * accessing the engine. Flushing the tasklets requires process context, 3676 * and since we can guard the resubmit onto the engine with an RCU read 3677 * lock, we can delegate the free of the engine to an RCU worker. 3678 */ 3679 INIT_RCU_WORK(&ve->rcu, rcu_virtual_context_destroy); 3680 queue_rcu_work(ve->context.engine->i915->unordered_wq, &ve->rcu); 3681 } 3682 3683 static void virtual_engine_initial_hint(struct virtual_engine *ve) 3684 { 3685 int swp; 3686 3687 /* 3688 * Pick a random sibling when starting to help spread the load around. 3689 * 3690 * New contexts are typically created with exactly the same order 3691 * of siblings, and often started in batches. Due to the way we iterate 3692 * the array of siblings when submitting requests, sibling[0] is 3693 * prioritised for dequeuing. If we make sure that sibling[0] is fairly 3694 * randomised across the system, we also help spread the load by the 3695 * first engine we inspect being different each time. 3696 * 3697 * NB This does not force us to execute on this engine, it will just 3698 * typically be the first we inspect for submission.
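 *
 * For example, with two siblings { vcs0, vcs1 }, roughly half of the
 * virtual engines created will have vcs1 swapped into sibling[0] and so
 * will probe vcs1 first when dequeuing.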
3699 */ 3700 swp = get_random_u32_below(ve->num_siblings); 3701 if (swp) 3702 swap(ve->siblings[swp], ve->siblings[0]); 3703 } 3704 3705 static int virtual_context_alloc(struct intel_context *ce) 3706 { 3707 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3708 3709 return lrc_alloc(ce, ve->siblings[0]); 3710 } 3711 3712 static int virtual_context_pre_pin(struct intel_context *ce, 3713 struct i915_gem_ww_ctx *ww, 3714 void **vaddr) 3715 { 3716 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3717 3718 /* Note: we must use a real engine class for setting up reg state */ 3719 return __execlists_context_pre_pin(ce, ve->siblings[0], ww, vaddr); 3720 } 3721 3722 static int virtual_context_pin(struct intel_context *ce, void *vaddr) 3723 { 3724 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3725 3726 return lrc_pin(ce, ve->siblings[0], vaddr); 3727 } 3728 3729 static void virtual_context_enter(struct intel_context *ce) 3730 { 3731 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3732 unsigned int n; 3733 3734 for (n = 0; n < ve->num_siblings; n++) 3735 intel_engine_pm_get(ve->siblings[n]); 3736 3737 intel_timeline_enter(ce->timeline); 3738 } 3739 3740 static void virtual_context_exit(struct intel_context *ce) 3741 { 3742 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3743 unsigned int n; 3744 3745 intel_timeline_exit(ce->timeline); 3746 3747 for (n = 0; n < ve->num_siblings; n++) 3748 intel_engine_pm_put(ve->siblings[n]); 3749 } 3750 3751 static struct intel_engine_cs * 3752 virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling) 3753 { 3754 struct virtual_engine *ve = to_virtual_engine(engine); 3755 3756 if (sibling >= ve->num_siblings) 3757 return NULL; 3758 3759 return ve->siblings[sibling]; 3760 } 3761 3762 static const struct intel_context_ops virtual_context_ops = { 3763 .flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES, 3764 3765 .alloc = virtual_context_alloc, 3766 3767 .cancel_request = execlists_context_cancel_request, 3768 3769 .pre_pin = virtual_context_pre_pin, 3770 .pin = virtual_context_pin, 3771 .unpin = lrc_unpin, 3772 .post_unpin = lrc_post_unpin, 3773 3774 .enter = virtual_context_enter, 3775 .exit = virtual_context_exit, 3776 3777 .destroy = virtual_context_destroy, 3778 3779 .get_sibling = virtual_get_sibling, 3780 }; 3781 3782 static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) 3783 { 3784 struct i915_request *rq; 3785 intel_engine_mask_t mask; 3786 3787 rq = READ_ONCE(ve->request); 3788 if (!rq) 3789 return 0; 3790 3791 /* The rq is ready for submission; rq->execution_mask is now stable. 
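 * The mask is checked against each sibling->mask in the tasklet below to
 * decide which physical engines' priority trees may carry this request.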
*/ 3792 mask = rq->execution_mask; 3793 if (unlikely(!mask)) { 3794 /* Invalid selection, submit to a random engine in error */ 3795 i915_request_set_error_once(rq, -ENODEV); 3796 mask = ve->siblings[0]->mask; 3797 } 3798 3799 ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n", 3800 rq->fence.context, rq->fence.seqno, 3801 mask, ve->base.sched_engine->queue_priority_hint); 3802 3803 return mask; 3804 } 3805 3806 static void virtual_submission_tasklet(struct tasklet_struct *t) 3807 { 3808 struct i915_sched_engine *sched_engine = 3809 from_tasklet(sched_engine, t, tasklet); 3810 struct virtual_engine * const ve = 3811 (struct virtual_engine *)sched_engine->private_data; 3812 const int prio = READ_ONCE(sched_engine->queue_priority_hint); 3813 intel_engine_mask_t mask; 3814 unsigned int n; 3815 3816 rcu_read_lock(); 3817 mask = virtual_submission_mask(ve); 3818 rcu_read_unlock(); 3819 if (unlikely(!mask)) 3820 return; 3821 3822 for (n = 0; n < ve->num_siblings; n++) { 3823 struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]); 3824 struct ve_node * const node = &ve->nodes[sibling->id]; 3825 struct rb_node **parent, *rb; 3826 bool first; 3827 3828 if (!READ_ONCE(ve->request)) 3829 break; /* already handled by a sibling's tasklet */ 3830 3831 spin_lock_irq(&sibling->sched_engine->lock); 3832 3833 if (unlikely(!(mask & sibling->mask))) { 3834 if (!RB_EMPTY_NODE(&node->rb)) { 3835 rb_erase_cached(&node->rb, 3836 &sibling->execlists.virtual); 3837 RB_CLEAR_NODE(&node->rb); 3838 } 3839 3840 goto unlock_engine; 3841 } 3842 3843 if (unlikely(!RB_EMPTY_NODE(&node->rb))) { 3844 /* 3845 * Cheat and avoid rebalancing the tree if we can 3846 * reuse this node in situ. 3847 */ 3848 first = rb_first_cached(&sibling->execlists.virtual) == 3849 &node->rb; 3850 if (prio == node->prio || (prio > node->prio && first)) 3851 goto submit_engine; 3852 3853 rb_erase_cached(&node->rb, &sibling->execlists.virtual); 3854 } 3855 3856 rb = NULL; 3857 first = true; 3858 parent = &sibling->execlists.virtual.rb_root.rb_node; 3859 while (*parent) { 3860 struct ve_node *other; 3861 3862 rb = *parent; 3863 other = rb_entry(rb, typeof(*other), rb); 3864 if (prio > other->prio) { 3865 parent = &rb->rb_left; 3866 } else { 3867 parent = &rb->rb_right; 3868 first = false; 3869 } 3870 } 3871 3872 rb_link_node(&node->rb, rb, parent); 3873 rb_insert_color_cached(&node->rb, 3874 &sibling->execlists.virtual, 3875 first); 3876 3877 submit_engine: 3878 GEM_BUG_ON(RB_EMPTY_NODE(&node->rb)); 3879 node->prio = prio; 3880 if (first && prio > sibling->sched_engine->queue_priority_hint) 3881 tasklet_hi_schedule(&sibling->sched_engine->tasklet); 3882 3883 unlock_engine: 3884 spin_unlock_irq(&sibling->sched_engine->lock); 3885 3886 if (intel_context_inflight(&ve->context)) 3887 break; 3888 } 3889 } 3890 3891 static void virtual_submit_request(struct i915_request *rq) 3892 { 3893 struct virtual_engine *ve = to_virtual_engine(rq->engine); 3894 unsigned long flags; 3895 3896 ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n", 3897 rq->fence.context, 3898 rq->fence.seqno); 3899 3900 GEM_BUG_ON(ve->base.submit_request != virtual_submit_request); 3901 3902 spin_lock_irqsave(&ve->base.sched_engine->lock, flags); 3903 3904 /* By the time we resubmit a request, it may be completed */ 3905 if (__i915_request_is_complete(rq)) { 3906 __i915_request_submit(rq); 3907 goto unlock; 3908 } 3909 3910 if (ve->request) { /* background completion from preempt-to-busy */ 3911 GEM_BUG_ON(!__i915_request_is_complete(ve->request)); 3912 
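/*
 * Flush the stale request left behind by preempt-to-busy: it has already
 * completed on the hardware, so resubmitting it here merely hands it back
 * to the engine to be retired before we drop our reference and install
 * the new request.
 */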
__i915_request_submit(ve->request); 3913 i915_request_put(ve->request); 3914 } 3915 3916 ve->base.sched_engine->queue_priority_hint = rq_prio(rq); 3917 ve->request = i915_request_get(rq); 3918 3919 GEM_BUG_ON(!list_empty(virtual_queue(ve))); 3920 list_move_tail(&rq->sched.link, virtual_queue(ve)); 3921 3922 tasklet_hi_schedule(&ve->base.sched_engine->tasklet); 3923 3924 unlock: 3925 spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags); 3926 } 3927 3928 static struct intel_context * 3929 execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count, 3930 unsigned long flags) 3931 { 3932 struct drm_i915_private *i915 = siblings[0]->i915; 3933 struct virtual_engine *ve; 3934 unsigned int n; 3935 int err; 3936 3937 ve = kzalloc_flex(*ve, siblings, count); 3938 if (!ve) 3939 return ERR_PTR(-ENOMEM); 3940 3941 ve->base.i915 = i915; 3942 ve->base.gt = siblings[0]->gt; 3943 ve->base.uncore = siblings[0]->uncore; 3944 ve->base.id = -1; 3945 3946 ve->base.class = OTHER_CLASS; 3947 ve->base.uabi_class = I915_ENGINE_CLASS_INVALID; 3948 ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; 3949 ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; 3950 3951 /* 3952 * The decision on whether to submit a request using semaphores 3953 * depends on the saturated state of the engine. We only compute 3954 * this during HW submission of the request, and we need for this 3955 * state to be globally applied to all requests being submitted 3956 * to this engine. Virtual engines encompass more than one physical 3957 * engine and so we cannot accurately tell in advance if one of those 3958 * engines is already saturated and so cannot afford to use a semaphore 3959 * and be pessimized in priority for doing so -- if we are the only 3960 * context using semaphores after all other clients have stopped, we 3961 * will be starved on the saturated system. Such a global switch for 3962 * semaphores is less than ideal, but alas is the current compromise. 3963 */ 3964 ve->base.saturated = ALL_ENGINES; 3965 3966 snprintf(ve->base.name, sizeof(ve->base.name), "virtual"); 3967 3968 intel_engine_init_execlists(&ve->base); 3969 3970 ve->base.sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL); 3971 if (!ve->base.sched_engine) { 3972 err = -ENOMEM; 3973 goto err_put; 3974 } 3975 ve->base.sched_engine->private_data = &ve->base; 3976 3977 ve->base.cops = &virtual_context_ops; 3978 ve->base.request_alloc = execlists_request_alloc; 3979 3980 ve->base.sched_engine->schedule = i915_schedule; 3981 ve->base.sched_engine->kick_backend = kick_execlists; 3982 ve->base.submit_request = virtual_submit_request; 3983 3984 INIT_LIST_HEAD(virtual_queue(ve)); 3985 tasklet_setup(&ve->base.sched_engine->tasklet, virtual_submission_tasklet); 3986 3987 intel_context_init(&ve->context, &ve->base); 3988 3989 ve->base.breadcrumbs = intel_breadcrumbs_create(NULL); 3990 if (!ve->base.breadcrumbs) { 3991 err = -ENOMEM; 3992 goto err_put; 3993 } 3994 3995 for (n = 0; n < count; n++) { 3996 struct intel_engine_cs *sibling = siblings[n]; 3997 3998 GEM_BUG_ON(!is_power_of_2(sibling->mask)); 3999 if (sibling->mask & ve->base.mask) { 4000 drm_dbg(&i915->drm, 4001 "duplicate %s entry in load balancer\n", 4002 sibling->name); 4003 err = -EINVAL; 4004 goto err_put; 4005 } 4006 4007 /* 4008 * The virtual engine implementation is tightly coupled to 4009 * the execlists backend -- we push out request directly 4010 * into a tree inside each physical engine. 
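 * (each sibling owns a ve_node, ve->nodes[sibling->id], which the virtual
 * tasklet links into that engine's execlists.virtual rbtree keyed by
 * request priority).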
We could support 4011 * layering if we handle cloning of the requests and 4012 * submitting a copy into each backend. 4013 */ 4014 if (sibling->sched_engine->tasklet.callback != 4015 execlists_submission_tasklet) { 4016 err = -ENODEV; 4017 goto err_put; 4018 } 4019 4020 GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb)); 4021 RB_CLEAR_NODE(&ve->nodes[sibling->id].rb); 4022 4023 ve->siblings[ve->num_siblings++] = sibling; 4024 ve->base.mask |= sibling->mask; 4025 ve->base.logical_mask |= sibling->logical_mask; 4026 4027 /* 4028 * All physical engines must be compatible for their emission 4029 * functions (as we build the instructions during request 4030 * construction and do not alter them before submission 4031 * on the physical engine). We use the engine class as a guide 4032 * here, although that could be refined. 4033 */ 4034 if (ve->base.class != OTHER_CLASS) { 4035 if (ve->base.class != sibling->class) { 4036 drm_dbg(&i915->drm, 4037 "invalid mixing of engine class, sibling %d, already %d\n", 4038 sibling->class, ve->base.class); 4039 err = -EINVAL; 4040 goto err_put; 4041 } 4042 continue; 4043 } 4044 4045 ve->base.class = sibling->class; 4046 ve->base.uabi_class = sibling->uabi_class; 4047 snprintf(ve->base.name, sizeof(ve->base.name), 4048 "v%dx%d", ve->base.class, count); 4049 ve->base.context_size = sibling->context_size; 4050 4051 ve->base.add_active_request = sibling->add_active_request; 4052 ve->base.remove_active_request = sibling->remove_active_request; 4053 ve->base.emit_bb_start = sibling->emit_bb_start; 4054 ve->base.emit_flush = sibling->emit_flush; 4055 ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb; 4056 ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb; 4057 ve->base.emit_fini_breadcrumb_dw = 4058 sibling->emit_fini_breadcrumb_dw; 4059 4060 ve->base.flags = sibling->flags; 4061 } 4062 4063 ve->base.flags |= I915_ENGINE_IS_VIRTUAL; 4064 4065 virtual_engine_initial_hint(ve); 4066 return &ve->context; 4067 4068 err_put: 4069 intel_context_put(&ve->context); 4070 return ERR_PTR(err); 4071 } 4072 4073 void intel_execlists_show_requests(struct intel_engine_cs *engine, 4074 struct drm_printer *m, 4075 void (*show_request)(struct drm_printer *m, 4076 const struct i915_request *rq, 4077 const char *prefix, 4078 int indent), 4079 unsigned int max) 4080 { 4081 const struct intel_engine_execlists *execlists = &engine->execlists; 4082 struct i915_sched_engine *sched_engine = engine->sched_engine; 4083 struct i915_request *rq, *last; 4084 unsigned long flags; 4085 unsigned int count; 4086 struct rb_node *rb; 4087 4088 spin_lock_irqsave(&sched_engine->lock, flags); 4089 4090 last = NULL; 4091 count = 0; 4092 list_for_each_entry(rq, &sched_engine->requests, sched.link) { 4093 if (count++ < max - 1) 4094 show_request(m, rq, "\t\t", 0); 4095 else 4096 last = rq; 4097 } 4098 if (last) { 4099 if (count > max) { 4100 drm_printf(m, 4101 "\t\t...skipping %d executing requests...\n", 4102 count - max); 4103 } 4104 show_request(m, last, "\t\t", 0); 4105 } 4106 4107 if (sched_engine->queue_priority_hint != INT_MIN) 4108 drm_printf(m, "\t\tQueue priority hint: %d\n", 4109 READ_ONCE(sched_engine->queue_priority_hint)); 4110 4111 last = NULL; 4112 count = 0; 4113 for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) { 4114 struct i915_priolist *p = rb_entry(rb, typeof(*p), node); 4115 4116 priolist_for_each_request(rq, p) { 4117 if (count++ < max - 1) 4118 show_request(m, rq, "\t\t", 0); 4119 else 4120 last = rq; 4121 } 4122 } 4123 if (last) { 4124 if 
(count > max) { 4125 drm_printf(m, 4126 "\t\t...skipping %d queued requests...\n", 4127 count - max); 4128 } 4129 show_request(m, last, "\t\t", 0); 4130 } 4131 4132 last = NULL; 4133 count = 0; 4134 for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) { 4135 struct virtual_engine *ve = 4136 rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 4137 struct i915_request *rq = READ_ONCE(ve->request); 4138 4139 if (rq) { 4140 if (count++ < max - 1) 4141 show_request(m, rq, "\t\t", 0); 4142 else 4143 last = rq; 4144 } 4145 } 4146 if (last) { 4147 if (count > max) { 4148 drm_printf(m, 4149 "\t\t...skipping %d virtual requests...\n", 4150 count - max); 4151 } 4152 show_request(m, last, "\t\t", 0); 4153 } 4154 4155 spin_unlock_irqrestore(&sched_engine->lock, flags); 4156 } 4157 4158 void intel_execlists_dump_active_requests(struct intel_engine_cs *engine, 4159 struct i915_request *hung_rq, 4160 struct drm_printer *m) 4161 { 4162 unsigned long flags; 4163 4164 spin_lock_irqsave(&engine->sched_engine->lock, flags); 4165 4166 intel_engine_dump_active_requests(&engine->sched_engine->requests, hung_rq, m); 4167 4168 drm_printf(m, "\tOn hold?: %zu\n", 4169 list_count_nodes(&engine->sched_engine->hold)); 4170 4171 spin_unlock_irqrestore(&engine->sched_engine->lock, flags); 4172 } 4173 4174 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) 4175 #include "selftest_execlists.c" 4176 #endif 4177