Lines matching the full word "we":
72 * We could extend the life of a context to beyond that of all in i915_fence_get_timeline_name()
74 * or we just give them a false name. Since in i915_fence_get_timeline_name()
129 * freed when the slab cache itself is freed, and so we would get in i915_fence_release()
138 * We do not hold a reference to the engine here and so have to be in i915_fence_release()
139 * very careful in what rq->engine we poke. The virtual engine is in i915_fence_release()
140 * referenced via the rq->context and we released that ref during in i915_fence_release()
141 * i915_request_retire(), ergo we must not dereference a virtual in i915_fence_release()
142 * engine here. Not that we would want to, as the only consumer of in i915_fence_release()
147 * we know that it will have been processed by the HW and will in i915_fence_release()
153 * power-of-two we assume that rq->engine may still be a virtual in i915_fence_release()
154 * engine and so a dangling invalid pointer that we cannot dereference in i915_fence_release()
158 * that we might execute on). On processing the bond, the request mask in i915_fence_release()
162 * after timeslicing away, see __unwind_incomplete_requests(). Thus we in i915_fence_release()
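
The i915_fence_release() fragments above explain why rq->engine must not be dereferenced blindly once the request has been retired. A minimal sketch of the guard they hint at, assuming (as the fragments suggest) that rq->execution_mask carries one bit per physical engine and that a virtual engine exposes several bits, so only a power-of-two mask guarantees rq->engine is a single, still-valid physical engine:

	#include <linux/log2.h>

	/*
	 * Sketch only: the execution mask (rq->execution_mask in the fragments
	 * above) has one bit per physical engine the request may run on; a
	 * virtual engine exposes several bits. A power-of-two mask therefore
	 * means "exactly one physical engine", and only then can rq->engine be
	 * treated as a stable pointer in i915_fence_release().
	 */
	static bool engine_pointer_is_stable(unsigned int execution_mask)
	{
		return is_power_of_2(execution_mask);
	}
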
252 * is-banned?, or we know the request is already inflight. in i915_request_active_engine()
254 * Note that rq->engine is unstable, and so we double in i915_request_active_engine()
255 * check that we have acquired the lock on the final engine. in i915_request_active_engine()
370 * We know the GPU must have read the request to have in i915_request_retire()
375 * Note this requires that we are always called in request in i915_request_retire()
381 /* Poison before we release our space in the ring */ in i915_request_retire()
395 * We only loosely track inflight requests across preemption, in i915_request_retire()
396 * and so we may find ourselves attempting to retire a _completed_ in i915_request_retire()
397 * request that we have removed from the HW and put back on a run in i915_request_retire()
400 * As we set I915_FENCE_FLAG_ACTIVE on the request, this should be in i915_request_retire()
401 * after removing the breadcrumb and signaling it, so that we do not in i915_request_retire()
447 * Even if we have unwound the request, it may still be on in __request_in_flight()
454 * As we know that there are always preemption points between in __request_in_flight()
455 * requests, we know that only the currently executing request in __request_in_flight()
456 * may still be active even though we have cleared the flag. in __request_in_flight()
457 * However, we can't rely on our tracking of ELSP[0] to know in __request_in_flight()
466 * latter, it may send the ACK and we process the event copying the in __request_in_flight()
468 * this implies the HW is arbitrating and not stuck in *active, we do in __request_in_flight()
469 * not worry about complete accuracy, but we do require no read/write in __request_in_flight()
471 * as the array is being overwritten, for which we require the writes in __request_in_flight()
477 * that we received an ACK from the HW, and so the context is not in __request_in_flight()
478 * stuck -- if we do not see ourselves in *active, the inflight status in __request_in_flight()
479 * is valid. If instead we see ourselves being copied into *active, in __request_in_flight()
480 * we are inflight and may signal the callback. in __request_in_flight()
520 * active. This ensures that if we race with the in __await_execution()
521 * __notify_execute_cb from i915_request_submit() and we are not in __await_execution()
522 * included in that list, we get a second bite of the cherry and in __await_execution()
526 * In i915_request_retire() we set the ACTIVE bit on a completed in __await_execution()
528 * callback first, then checking the ACTIVE bit, we serialise with in __await_execution()
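
The __await_execution() fragments describe a lost-wakeup dance: the waiter adds its callback and then re-checks I915_FENCE_FLAG_ACTIVE (the "second bite of the cherry"), while i915_request_submit() sets the flag and then notifies the list. A generic, hedged sketch of that ordering; the type, helper names and bit index are illustrative, not the driver's own:

	#include <linux/atomic.h>
	#include <linux/bitops.h>
	#include <linux/llist.h>

	#define EXEC_ACTIVE_BIT	0	/* illustrative stand-in for I915_FENCE_FLAG_ACTIVE */

	struct exec_cb {
		struct llist_node node;
		void (*fn)(struct exec_cb *cb);
	};

	/* Whoever wins llist_del_all() owns that batch, so each callback runs once. */
	static void drain_execute_cbs(struct llist_head *list)
	{
		struct exec_cb *cb, *tmp;

		llist_for_each_entry_safe(cb, tmp, llist_del_all(list), node)
			cb->fn(cb);
	}

	/* Submitter side: mark the request active, then notify the callbacks. */
	static void mark_active_and_notify(unsigned long *flags, struct llist_head *list)
	{
		set_bit(EXEC_ACTIVE_BIT, flags);
		smp_mb__after_atomic();
		drain_execute_cbs(list);
	}

	/* Waiter side: publish the callback, then re-check in case we raced with submit. */
	static void add_execute_cb(struct exec_cb *cb, unsigned long *flags,
				   struct llist_head *list)
	{
		llist_add(&cb->node, list);
		smp_mb();	/* order the add against the ACTIVE test */
		if (test_bit(EXEC_ACTIVE_BIT, flags))
			drain_execute_cbs(list);
	}

With full ordering on both sides, at least one of the two parties sees the other's update, so a callback added concurrently with submission cannot be missed by both.
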
564 * breadcrumb at the end (so we get the fence notifications). in __i915_request_skip()
615 * With the advent of preempt-to-busy, we frequently encounter in __i915_request_submit()
616 * requests that we have unsubmitted from HW, but left running in __i915_request_submit()
618 * resubmission of that completed request, we can skip in __i915_request_submit()
622 * We must remove the request from the caller's priority queue, in __i915_request_submit()
625 * request has *not* yet been retired and we can safely move in __i915_request_submit()
642 * Are we using semaphores when the GPU is already saturated? in __i915_request_submit()
650 * If we installed a semaphore on this request and we only submit in __i915_request_submit()
653 * increases the amount of work we are doing. If so, we disable in __i915_request_submit()
654 * further use of semaphores until we are idle again, whence we in __i915_request_submit()
681 * In the future, perhaps when we have an active time-slicing scheduler, in __i915_request_submit()
684 * quite hairy, we have to carefully rollback the fence and do a in __i915_request_submit()
690 /* We may be recursing from the signal callback of another i915 fence */ in __i915_request_submit()
724 * Before we remove this breadcrumb from the signal list, we have in __i915_request_unsubmit()
726 * attach itself. We first mark the request as no longer active and in __i915_request_unsubmit()
735 /* We've already spun, don't charge on resubmitting. */ in __i915_request_unsubmit()
740 * We don't need to wake_up any waiters on request->execute, they in __i915_request_unsubmit()
787 * We need to serialize use of the submit_request() callback in submit_notify()
789 * i915_gem_set_wedged(). We use the RCU mechanism to mark the in submit_notify()
840 /* If we cannot wait, dip into our reserves */ in request_alloc_slow()
865 /* Retire our old requests in the hope that we free some */ in request_alloc_slow()
909 * We use RCU to look up requests in flight. The lookups may in __i915_request_create()
911 * That is the request we are writing to here, may be in the process in __i915_request_create()
913 * we have to be very careful when overwriting the contents. During in __i915_request_create()
914 * the RCU lookup, we chase the request->engine pointer, in __i915_request_create()
920 * with dma_fence_init(). This increment is safe for release as we in __i915_request_create()
921 * check that the request we have a reference to matches the active in __i915_request_create()
924 * Before we increment the refcount, we chase the request->engine in __i915_request_create()
925 * pointer. We must not call kmem_cache_zalloc() or else we set in __i915_request_create()
927 * we see the request is completed (based on the value of the in __i915_request_create()
929 * If we decide the request is not completed (new engine or seqno), in __i915_request_create()
930 * then we grab a reference and double check that it is still the in __i915_request_create()
966 /* We bump the ref for the fence chain */ in __i915_request_create()
986 * Note that due to how we add reserved_space to intel_ring_begin() in __i915_request_create()
987 * we need to double our request to ensure that if we need to wrap in __i915_request_create()
996 * should we detect the updated seqno part-way through the in __i915_request_create()
997 * GPU processing the request, we never over-estimate the in __i915_request_create()
1016 /* Make sure we didn't add ourselves to external state before freeing */ in __i915_request_create()
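
The RCU-lookup fragments above (the SLAB_TYPESAFE_BY_RCU rules for requests in flight) boil down to: take a reference, then confirm the slot still points at the same object, because the request may be freed and recycled at any moment. A hedged sketch of that loop; it mirrors what dma_fence_get_rcu_safe() provides, and the helper name is illustrative:

	#include <linux/dma-fence.h>
	#include <linux/rcupdate.h>

	static struct dma_fence *get_active_fence(struct dma_fence __rcu **slot)
	{
		struct dma_fence *fence;

		rcu_read_lock();
		do {
			fence = rcu_dereference(*slot);
			if (!fence)
				break;

			/* The object may be freed and reused at any time ... */
			if (!dma_fence_get_rcu(fence))
				continue;

			/* ... so re-check that the slot still refers to it. */
			if (fence == rcu_access_pointer(*slot))
				break;

			dma_fence_put(fence);
		} while (1);
		rcu_read_unlock();

		return fence;
	}
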
1048 /* Check that we do not interrupt ourselves with a new request */ in i915_request_create()
1071 * The caller holds a reference on @signal, but we do not serialise in i915_request_await_start()
1074 * We do not hold a reference to the request before @signal, and in i915_request_await_start()
1076 * we follow the link backwards. in i915_request_await_start()
1129 * both the GPU and CPU. We want to limit the impact on others, in already_busywaiting()
1131 * latency. Therefore we restrict ourselves to not using more in already_busywaiting()
1133 * if we have detected the engine is saturated (i.e. would not be in already_busywaiting()
1137 * See the are-we-too-late? check in __i915_request_submit(). in already_busywaiting()
1155 /* We need to pin the signaler's HWSP until we are finished reading. */ in __emit_semaphore_wait()
1169 * Using greater-than-or-equal here means we have to worry in __emit_semaphore_wait()
1170 * about seqno wraparound. To side step that issue, we swap in __emit_semaphore_wait()
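
The greater-than-or-equal fragment above is cut off, but the wraparound concern it raises is the usual one for 32-bit seqnos. The CPU-side idiom the driver uses for this (cf. i915_seqno_passed()) compares the signed distance rather than the raw values; a minimal sketch:

	#include <linux/types.h>

	/*
	 * Sketch of the wraparound-safe comparison: "has seq1 passed seq2?"
	 * is answered by the sign of their 32-bit difference, so it keeps
	 * working across the u32 wrap.
	 */
	static inline bool seqno_passed(u32 seq1, u32 seq2)
	{
		return (s32)(seq1 - seq2) >= 0;
	}
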
1218 * that may fail catastrophically, then we want to avoid using in emit_semaphore_wait()
1219 * semaphores as they bypass the fence signaling metadata, and we in emit_semaphore_wait()
1225 /* Just emit the first semaphore we see as request space is limited. */ in emit_semaphore_wait()
1283 * The execution cb fires when we submit the request to HW. But in in __i915_request_await_execution()
1285 * run (consider that we submit 2 requests for the same context, where in __i915_request_await_execution()
1286 * the request of interest is behind an indefinite spinner). So we hook in __i915_request_await_execution()
1288 * in the worst case, though we hope that the await_start is elided. in __i915_request_await_execution()
1297 * Now that we are queued to the HW at roughly the same time (thanks in __i915_request_await_execution()
1301 * signaler depends on a semaphore, so indirectly do we, and we do not in __i915_request_await_execution()
1303 * So we wait. in __i915_request_await_execution()
1305 * However, there is also a second condition for which we need to wait in __i915_request_await_execution()
1310 * immediate execution, and so we must wait until it reaches the in __i915_request_await_execution()
1337 * The downside of using semaphores is that we lose metadata passing in mark_external()
1338 * along the signaling chain. This is particularly nasty when we in mark_external()
1340 * fatal errors we want to scrub the request before it is executed, in mark_external()
1341 * which means that we cannot preload the request onto HW and have in mark_external()
1429 * We don't squash repeated fence dependencies here as we in i915_request_await_execution()
1452 * If we are waiting on a virtual engine, then it may be in await_request_submit()
1455 * engine and then passed to the physical engine. We cannot allow in await_request_submit()
1508 * we should *not* decompose it into its individual fences. However, in i915_request_await_dma_fence()
1509 * we don't currently store which mode the fence-array is operating in i915_request_await_dma_fence()
1511 * amdgpu and we should not see any incoming fence-array from in i915_request_await_dma_fence()
1563 * @rq: request we are wishing to use
1583 * @to: request we are wishing to use
1588 * Conceptually we serialise writes between engines inside the GPU.
1589 * We only allow one engine to write into a buffer at any time, but
1590 * multiple readers. To ensure each has a coherent view of memory, we must:
1596 * - If we are a write request (pending_write_domain is set), the new
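
The kerneldoc fragments above state the implicit-sync rule: a new reader only waits for earlier writers, while a new writer waits for all earlier users. A hedged sketch of how that rule maps onto a reservation object, reusing i915_request_await_dma_fence() from the fragments further down; dma_resv_for_each_fence() and dma_resv_usage_rw() are the current upstream API and differ on older kernels:

	#include <linux/dma-resv.h>

	/*
	 * Hedged sketch: a reader only waits for prior writers, a writer waits
	 * for all prior users. The caller is assumed to hold the dma_resv lock.
	 */
	static int await_reservation(struct i915_request *rq,
				     struct dma_resv *resv, bool write)
	{
		struct dma_resv_iter cursor;
		struct dma_fence *fence;
		int err = 0;

		dma_resv_for_each_fence(&cursor, resv, dma_resv_usage_rw(write), fence) {
			err = i915_request_await_dma_fence(rq, fence);
			if (err)
				break;
		}

		return err;
	}
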
1688 * we need to be wary in case the timeline->last_request in __i915_request_ensure_ordering()
1728 * we still expect the window between us starting to accept submissions in __i915_request_add_to_timeline()
1736 * is special cased so that we can eliminate redundant ordering in __i915_request_add_to_timeline()
1737 * operations while building the request (we know that the timeline in __i915_request_add_to_timeline()
1738 * itself is ordered, and here we guarantee it). in __i915_request_add_to_timeline()
1740 * As we know we will need to emit tracking along the timeline, in __i915_request_add_to_timeline()
1741 * we embed the hooks into our request struct -- at the cost of in __i915_request_add_to_timeline()
1746 * that we can apply a slight variant of the rules specialised in __i915_request_add_to_timeline()
1748 * If we consider the case of virtual engine, we must emit a dma-fence in __i915_request_add_to_timeline()
1754 * We do not order parallel submission requests on the timeline as each in __i915_request_add_to_timeline()
1758 * timeline we store a pointer to the last request submitted in the in __i915_request_add_to_timeline()
1761 * alternatively we use a completion fence if the gem context has a single in __i915_request_add_to_timeline()
1805 * should we detect the updated seqno part-way through the in __i915_request_commit()
1806 * GPU processing the request, we never over-estimate the in __i915_request_commit()
1828 * request - i.e. we may want to preempt the current request in order in __i915_request_queue()
1829 * to run a high priority dependency chain *before* we can execute this in __i915_request_queue()
1832 * This is called before the request is ready to run so that we can in __i915_request_queue()
1879 * the comparisons are no longer valid if we switch CPUs. Instead of in local_clock_ns()
1880 * blocking preemption for the entire busywait, we can detect the CPU in local_clock_ns()
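
The local_clock_ns() fragments describe how the busywait copes with CPU migration without keeping preemption disabled for its whole duration: sample the timestamp and the CPU id together, and treat a later CPU change as a reason to stop spinning. A hedged sketch of that helper pair (names are illustrative):

	#include <linux/sched/clock.h>
	#include <linux/smp.h>
	#include <linux/types.h>

	static u64 sample_local_clock_ns(unsigned int *cpu)
	{
		u64 t;

		*cpu = get_cpu();	/* disables preemption */
		t = local_clock();	/* ns, only comparable on this CPU */
		put_cpu();

		return t;
	}

	static bool busywait_expired(u64 timeout_ns, unsigned int cpu)
	{
		unsigned int this_cpu;

		if (sample_local_clock_ns(&this_cpu) > timeout_ns)
			return true;

		/* Migrated: the clocks are no longer comparable, so give up. */
		return this_cpu != cpu;
	}
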
1907 * Only wait for the request if we know it is likely to complete. in __i915_spin_request()
1909 * We don't track the timestamps around requests, nor the average in __i915_spin_request()
1910 * request length, so we do not have a good indicator that this in __i915_spin_request()
1911 * request will complete within the timeout. What we do know is the in __i915_spin_request()
1912 * order in which requests are executed by the context and so we can in __i915_spin_request()
1924 * rate. By busywaiting on the request completion for a short while we in __i915_spin_request()
1926 * if it is a slow request, we want to sleep as quickly as possible. in __i915_spin_request()
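
The __i915_spin_request() fragments boil down to: we cannot predict how long a request will take, but we do know it is next in line on its context, so a very short poll before paying for interrupt setup and sleeping is often a win. A hedged sketch of such a bounded busywait, reusing the helpers sketched after the local_clock_ns() fragments above:

	/*
	 * Sketch only: @state is the task state the caller would later sleep
	 * in (e.g. TASK_INTERRUPTIBLE); rq->fence is the request's embedded
	 * dma_fence.
	 */
	static bool spin_for_request(struct i915_request *rq, u64 spin_ns, int state)
	{
		unsigned int cpu;
		u64 deadline = sample_local_clock_ns(&cpu) + spin_ns;

		do {
			if (dma_fence_is_signaled(&rq->fence))
				return true;

			if (signal_pending_state(state, current))
				break;

			if (busywait_expired(deadline, cpu))
				break;

			cpu_relax();
		} while (!need_resched());

		return false;
	}
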
2000 * We must never wait on the GPU while holding a lock as we in i915_request_wait_timeout()
2001 * may need to perform a GPU reset. So while we don't need to in i915_request_wait_timeout()
2002 * serialise wait/reset with an explicit lock, we do want in i915_request_wait_timeout()
2010 * We may use a rather large value here to offset the penalty of in i915_request_wait_timeout()
2017 * short wait, we first spin to see if the request would have completed in i915_request_wait_timeout()
2020 * We need up to 5us to enable the irq, and up to 20us to hide the in i915_request_wait_timeout()
2028 * duration, which we currently lack. in i915_request_wait_timeout()
2040 * We can circumvent that by promoting the GPU frequency to maximum in i915_request_wait_timeout()
2041 * before we sleep. This makes the GPU throttle up much more quickly in i915_request_wait_timeout()
2056 * We sometimes experience some latency between the HW interrupts and in i915_request_wait_timeout()
2063 * If the HW is being lazy, this is the last chance before we go to in i915_request_wait_timeout()
2064 * sleep to catch any pending events. We will check periodically in in i915_request_wait_timeout()
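
The i915_request_wait_timeout() fragments give the overall shape: never wait while holding locks a GPU reset might need, optionally spin first, optionally raise the GPU frequency, then sleep until the fence signals or the timeout expires. A hedged sketch of just the sleep phase, built on the generic dma_fence callback API; the struct and helper names are illustrative:

	#include <linux/dma-fence.h>
	#include <linux/errno.h>
	#include <linux/sched.h>
	#include <linux/sched/signal.h>

	struct wait_cb {
		struct dma_fence_cb cb;
		struct task_struct *tsk;
	};

	static void wait_cb_wake(struct dma_fence *fence, struct dma_fence_cb *cb)
	{
		wake_up_process(container_of(cb, struct wait_cb, cb)->tsk);
	}

	static long sleep_for_fence(struct dma_fence *fence, int state, long timeout)
	{
		struct wait_cb wait = { .tsk = current };

		/* Install a callback that wakes this task when the fence signals. */
		if (dma_fence_add_callback(fence, &wait.cb, wait_cb_wake))
			return timeout;	/* already signaled */

		for (;;) {
			set_current_state(state);

			if (dma_fence_is_signaled(fence))
				break;

			if (signal_pending_state(state, current)) {
				timeout = -ERESTARTSYS;
				break;
			}

			if (!timeout) {
				timeout = -ETIME;
				break;
			}

			timeout = io_schedule_timeout(timeout);
		}
		__set_current_state(TASK_RUNNING);

		dma_fence_remove_callback(fence, &wait.cb);
		return timeout;
	}
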
2192 * The prefix is used to show the queue status, for which we use in i915_request_show()