xref: /freebsd/crypto/openssl/doc/designs/quic-design/server/quic-polling.md (revision 24e4dcf4ba5e9dedcf89efd358ea3e1fe5867020)
QUIC Polling API Design
=======================

- [QUIC Polling API Design](#quic-polling-api-design)
  * [Background](#background)
  * [Requirements](#requirements)
  * [Reflections on Past Mistakes in Poller Interface Design](#reflections-on-past-mistakes-in-poller-interface-design)
  * [Example Use Cases](#example-use-cases)
    + [Use Case A: Simple Blocking or Non-Blocking Application](#use-case-a--simple-blocking-or-non-blocking-application)
    + [Use Case B: Application-Controlled Hierarchical Polling](#use-case-b--application-controlled-hierarchical-polling)
  * [Use of Poll Descriptors](#use-of-poll-descriptors)
  * [Event Types and Representation](#event-types-and-representation)
  * [Designs](#designs)
    + [Sketch A: One-Shot/Immediate Mode API](#sketch-a--one-shot-immediate-mode-api)
    + [Sketch B: Registered/Retained Mode API](#sketch-b--registered-retained-mode-api)
      - [Use Case Examples](#use-case-examples)
  * [Proposal](#proposal)
  * [Custom Poller Methods](#custom-poller-methods)
    + [Translation](#translation)
    + [Custom Poller Methods API](#custom-poller-methods-api)
    + [Internal Polling: Usage within SSL Objects](#internal-polling--usage-within-ssl-objects)
    + [External Polling: Usage over SSL Objects](#external-polling--usage-over-ssl-objects)
    + [Future Adaptation to Internal Pollable Resources](#future-adaptation-to-internal-pollable-resources)
  * [Worked Examples](#worked-examples)
    + [Internal Polling — Default Poll Method](#internal-polling---default-poll-method)
    + [Internal Polling — Custom Poll Method](#internal-polling---custom-poll-method)
    + [External Polling — Immediate Mode](#external-polling---immediate-mode)
    + [External Polling — Retained Mode](#external-polling---retained-mode)
    + [External Polling — Immediate Mode Without Event Handling](#external-polling---immediate-mode-without-event-handling)
  * [Change Notification Callback Mechanism](#change-notification-callback-mechanism)
  * [Q&A](#q-a)
  * [Windows support](#windows-support)
  * [Extra features on QUIC objects](#extra-features-on-quic-objects)
    + [Low-watermark functionality](#low-watermark-functionality)
    + [Timeouts](#timeouts)
    + [Autotick control](#autotick-control)

Background
----------

An application can create multiple QLSOs (see the [server API design
document](quic-server-api.md)), each bound to a single read/write network BIO
pair. Therefore an application needs to be able to poll:

- a QLSO for new incoming connection events;
- a QCSO for new incoming stream events;
- a QCSO for new incoming datagram events (when we support the datagram
  extension);
- a QCSO for stream creatability events;
- a QCSO for new connection error events;
- a QSSO (or QCSO with a default stream attached) for readability events;
- a QSSO (or QCSO with a default stream attached) for writability events;
- non-OpenSSL objects, such as OS socket handles.

Observations:

- There are a large number of event types an application might want to poll on.

- There are different object types we might want to poll on.

- These object types are currently all SSL objects, though we should not assume
  that this will always be the case.

- The nature of a polling interface means that it must be possible to
  poll (i.e., block) on all desired objects in a single call; polling
  cannot really be composed using multiple sequential calls. Thus, it must be
  possible for an application to request wakeup on the first of an arbitrary
  subset of any of the above kinds of events in a single polling call.

Requirements
------------

- **Universal cross-pollability.** Ability to poll on any combination of the above
  event types and pollable objects in a single poller call.

- **Support external polling.** An application must be able to be in control
  of its own polling if desired. This means no libssl code does any blocking I/O
  or poll(2)-style calls; the application handles all poll(2)-like calls to the
  OS. The application must thereafter be able to find out from us what QUIC
  objects are ready to be serviced.

- **Support internal polling.** Support a blocking poll(2)-like call provided
  by libssl for applications that want us to arrange OS polling.

- **Timeouts.** Support for optional timeouts.

- **Multi-threading.** The API must have semantics suitable for performant
  multi-threaded use, including for concurrent access to the same QUIC objects
  where supported by our API contract. This includes in particular
  avoidance of the thundering herd problem.

Desirable:

- Avoid needless impedance discontinuities with COTS polling interfaces (e.g.
  select(2), poll(2)).

- Efficient and performant design.

- Future extensibility.

Reflections on Past Mistakes in Poller Interface Design
-------------------------------------------------------

The deficiencies of select(2) are fairly evident and essentially attested to by
its replacement with poll(2) in POSIX operating systems. To the extent that
poll(2) has been replaced, it is largely due to the performance issues it poses
when evaluating large numbers of file descriptors. However, this design
is also unable to address the thundering herd problem, which we discuss
subsequently.

The replacements for poll(2) include Linux's epoll(2) and BSD's kqueue(2).

The design of Linux's epoll(2) interface in particular has often been noted to
contain a large number of design issues:

- It is designed to poll only FDs; this is probably a partial cause behind
  Linux's adaptation of everything into an FD (PIDs, signals, timers, eventfd,
  etc.)

- Events registered with epoll are associated with the underlying kernel
  object (file description), rather than a file descriptor; therefore events can
  still be received for an FD after the FD is closed(!) by a process, even
  quoting an incorrect FD in the reported events, unless a process takes care to
  unregister the FD prior to calling close(2).

- There are separate `EPOLL_CTL_ADD` and `EPOLL_CTL_MOD` calls which are needed
  to add a new FD registration and modify an existing FD registration, when
  most of the time what is desired is an “upsert” (update or insert) call. Thus
  callers have to track whether an FD has already been added or not.

- Only one FD can be registered, modified, or unregistered per syscall, rather
  than several FDs at once (syscall overhead).

- The design is poorly equipped to handle multithreaded use due to the
  thundering herd issue. If a single UDP datagram arrives and multiple threads
  are polling for such an event, only one of these threads should be woken up.
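Because epoll(2) separates `EPOLL_CTL_ADD` from `EPOLL_CTL_MOD`, callers usually end up writing an upsert wrapper of their own. The following is a minimal Linux-only sketch of one. The helper name `epoll_upsert` is our own and is not part of any library. It first tries `EPOLL_CTL_ADD` and, if the kernel reports `EEXIST`, falls back to `EPOLL_CTL_MOD`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>

/*
 * Hypothetical helper (not part of any library): register fd with the epoll
 * instance epfd, or update its event mask if it is already registered.
 * Returns 0 on success, or -1 with errno set on failure. Linux-only.
 */
static int epoll_upsert(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = {0};

    assert(epfd >= 0);
    ev.events = events;
    ev.data.fd = fd;

    /* Optimistically try to add a new registration... */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
        return 0;

    /* ...and fall back to modifying it if the fd was already present. */
    if (errno != EEXIST)
        return -1;

    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

The second syscall on the `EEXIST` path is exactly the kind of bookkeeping the text argues a poller interface should not push onto its callers.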

BSD's kqueue(2) has generally been regarded as a good, well-thought-out design,
and avoids most or all of these issues.

Example Use Cases
-----------------

Suppose there exists a hypothetical poll(2)-like API called `SSL_poll`. We
explore various possible use cases below:

### Use Case A: Simple Blocking or Non-Blocking Application

An application has two QCSOs open, each with one QSSO. The QCSOs and QSSOs might
be in blocking or non-blocking mode. It wants to block until any of these have
data ready to read (or a connection error) and wants to know which SSL object is
ready and for what reason. It also wants to time out after 1 second.

```text
SSL_poll([qcso0, qcso1, qsso0, qsso1],
         [READ|ERR, READ|ERR, READ|ERR, READ|ERR], timeout=1sec)
    → (OK, [qcso0], [READ])
    | Timeout
```
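For comparison, here is the same request written against POSIX poll(2), the interface the hypothetical `SSL_poll` is modelled on. Plain file descriptors stand in for the four QUIC objects, and the helper `wait_readable` is our own. Note one difference: poll(2) always reports error conditions, whereas the `READ|ERR` masks above request them explicitly.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

#define MAX_WAIT_FDS 16

/*
 * Block until any of fds[0..n-1] is readable or in an error state, or until
 * timeout_ms elapses (-1 = wait forever). Returns the index of the first
 * ready descriptor, -1 on timeout, or -2 if poll(2) itself failed.
 */
static int wait_readable(const int *fds, size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_WAIT_FDS];
    size_t i;
    int rc;

    assert(n <= MAX_WAIT_FDS);
    for (i = 0; i < n; ++i) {
        pfds[i].fd = fds[i];       /* negative fds are ignored by poll(2) */
        pfds[i].events = POLLIN;   /* POLLERR/POLLHUP/POLLNVAL are implicit */
        pfds[i].revents = 0;
    }

    rc = poll(pfds, (nfds_t)n, timeout_ms);
    if (rc < 0)
        return -2;                 /* failure of the mechanism itself */
    if (rc == 0)
        return -1;                 /* timed out */

    /* Each revents says which events actually fired for that entry. */
    for (i = 0; i < n; ++i)
        if (pfds[i].revents != 0)
            return (int)i;

    return -1;
}
```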

### Use Case B: Application-Controlled Hierarchical Polling

An application has two QCSOs open, each with one QSSO, all in non-blocking mode.
It wants to block until any of these have data ready to read (or a connection
error) and wants to know which SSL object is ready and for what reason, but also
wants to block until various other application-specific non-QUIC events occur.
As such, it wants to handle its own polling.

This usage pattern is supported via hierarchical polling:

- An application collects file descriptors and event flags to poll from our QUIC
  implementation, either by using `SSL_get_[rw]poll_descriptor` and
  `SSL_net_(read|write)_desired` on each QCSO and deduplicating the results, or
  using those calls on each QLSO. It also determines the QUIC event handling
  timeout using `SSL_get_event_timeout`.

- An application does its own polling and timeout handling.

- An application calls `SSL_handle_events` if the polling process indicated
  an event for either of the QUIC poll descriptors or the QUIC event handling
  timeout has expired. The call needs to be made only on an Event Leader but can
  be made on any QUIC SSL object in the hierarchy.

- An application calls `SSL_poll` similarly to the above example, but with
  timeout set to 0 (and possibly with some kind of `NO_HANDLE_EVENTS` flag). The
  purpose of this call is **not** to block but to narrow down what QUIC objects
  are now ready for servicing.
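The timeout-handling step is where the QUIC deadline meets the application's own deadlines. The sketch below shows one way to merge them into a single poll(2) timeout. It assumes the QUIC value is the remaining time until the next QUIC timer event, delivered as the `struct timeval` and `is_infinite` outputs of `SSL_get_event_timeout()`. The helper name `merge_poll_timeout_ms` is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical helper: combine the QUIC event-handling timeout (remaining
 * time as a timeval, or "infinite") with an application timeout in
 * milliseconds (-1 = none) into one poll(2) timeout (-1 = block forever).
 * Sub-millisecond remainders are rounded up so the application does not wake
 * just before the QUIC deadline and spin.
 */
static int merge_poll_timeout_ms(const struct timeval *quic_tv,
                                 int quic_is_infinite, int app_timeout_ms)
{
    long long quic_ms;

    assert(quic_is_infinite || quic_tv != NULL);
    if (quic_is_infinite)
        return app_timeout_ms;          /* only the application deadline */

    quic_ms = (long long)quic_tv->tv_sec * 1000
              + ((long long)quic_tv->tv_usec + 999) / 1000;
    if (quic_ms > INT_MAX)
        quic_ms = INT_MAX;

    /* Take whichever deadline comes first. */
    if (app_timeout_ms < 0 || quic_ms < app_timeout_ms)
        return (int)quic_ms;
    return app_timeout_ms;
}
```

After poll(2) returns, the application would call `SSL_handle_events()` if a QUIC descriptor fired or the QUIC deadline has passed, and then call `SSL_poll` with a zero timeout, as in the steps above.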

This demonstrates the principle of hierarchical polling, whereby an application
can do its own polling and then use a poller in a mode where it always returns
immediately to narrow things down to specific QUIC objects. This is necessary
because one QCSO may service many QSSOs, etc.

The requirement implied by this use case is:

- An application must be able to use our polling interface without blocking and
  without having `SSL_handle_events` or OS polling APIs be called, if desired.

Use of Poll Descriptors
-----------------------

As discussed in the [I/O Architecture Design Document](../quic-io-arch.md), the
notion of poll descriptors is used to provide an abstraction over arbitrary
pollable resources. A `BIO_POLL_DESCRIPTOR` is a tagged union structure which
can contain different kinds of handles.

This concept maps directly to our capacity for application-level polling of the
QUIC stack defined in this document, so it is used here. This creates a
consistent interface around polling.

To date, `BIO_POLL_DESCRIPTOR` structures have been used to contain an OS socket
file descriptor (`int` for POSIX, `SOCKET` for Win32), which can be used with
APIs such as `select(2)`. The tagged union structure is extended to support
specifying an SSL object pointer:

```c
#define BIO_POLL_DESCRIPTOR_TYPE_SSL 2   /* (SSL *) */

typedef struct bio_poll_descriptor_st {
    uint32_t type;
    union {
        /* ... existing members ... */
        SSL     *ssl;
    } value;
} BIO_POLL_DESCRIPTOR;
```
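To illustrate the tagged-union idea, here is a self-contained mirror of the structure together with a dispatch on its tag. Everything in it is a local stand-in and not OpenSSL's definitions: the `DEMO_` names, the values of the NONE and SOCK_FD tags, and the helper `describe_poll_desc` are all assumptions. Only the SSL tag value of 2 comes from the definition above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct ssl_st SSL;   /* opaque here, as in libssl */

/* Local stand-ins; only the SSL tag value (2) comes from the text. */
#define DEMO_POLL_DESCRIPTOR_TYPE_NONE     0   /* assumed */
#define DEMO_POLL_DESCRIPTOR_TYPE_SOCK_FD  1   /* assumed */
#define DEMO_POLL_DESCRIPTOR_TYPE_SSL      2

typedef struct demo_poll_descriptor_st {
    uint32_t type;           /* selects the live member of value */
    union {
        int  fd;             /* DEMO_POLL_DESCRIPTOR_TYPE_SOCK_FD */
        SSL *ssl;            /* DEMO_POLL_DESCRIPTOR_TYPE_SSL */
    } value;
} DEMO_POLL_DESCRIPTOR;

/* Describe d into buf; only the member named by d->type is ever read. */
static int describe_poll_desc(const DEMO_POLL_DESCRIPTOR *d,
                              char *buf, size_t len)
{
    assert(d != NULL && buf != NULL);
    switch (d->type) {
    case DEMO_POLL_DESCRIPTOR_TYPE_SOCK_FD:
        return snprintf(buf, len, "socket fd %d", d->value.fd);
    case DEMO_POLL_DESCRIPTOR_TYPE_SSL:
        return snprintf(buf, len, "SSL object %p", (void *)d->value.ssl);
    default:
        /* NONE or an unknown tag: a poller would report this as F. */
        return snprintf(buf, len, "unsupported type %u", (unsigned)d->type);
    }
}
```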

Event Types and Representation
------------------------------

Regardless of which API design is chosen, the event types can be defined first.

### Summary of Event Types

We define the following event types:

- **R (Readable):** There is application data available to be read.

- **W (Writable):** It is currently possible to write more application data.

- **ER (Exception on Read):** The receive part of a stream has been remotely
  reset via a `RESET_STREAM` frame.

- **EW (Exception on Write):** The send part of a stream has been remotely
  reset via a `STOP_SENDING` frame.

- **EC (Exception on Connection):** A connection has started terminating
  (Terminating or Terminated states).

- **EL (Exception on Listener):** A QUIC listener SSL object has failed,
  for example due to a permanent error on an underlying network BIO.

- **ECD (Exception on Connection Drained):** A connection has *finished*
  terminating (Terminated state).

- **IC (Incoming Connection):** There is at least one incoming connection
  available to be popped using `SSL_accept_connection()`.

- **ISB (Incoming Stream — Bidirectional):** There is at least one
  bidirectional stream incoming and available to be popped using
  `SSL_accept_stream()`.

- **ISU (Incoming Stream — Unidirectional):** There is at least one
  unidirectional stream incoming and available to be popped using
  `SSL_accept_stream()`.

- **OSB (Outgoing Stream — Bidirectional):** It is currently possible
  to create at least one additional bidirectional stream.

- **OSU (Outgoing Stream — Unidirectional):** It is currently possible
  to create at least one additional unidirectional stream.

- **F (Failure):** Identifies failure of the `SSL_poll()` mechanism itself.

While this is a fairly large number of event types, there are valid use cases
for all of these and reasons why they need to be separate from one another. The
following dialogue explores the various design considerations.

### General Principles

From our discussion below we derive some general principles:

- It is important to provide an adequate granularity of event types so as to
  ensure an application can avoid wakeups it doesn't want.

- Event types which are not applicable to a particular object are simply
  ignored if requested by the application and never raised, similar to
  `poll(2)`.

- While not all event masks may make sense (e.g. `R` but not `ER`), we do not
  seek to prescribe combinations at this time. This is dissimilar to `poll(2)`
  which makes some event types “mandatory”. We may evolve this in future.

- Exception events on some successfully polled resource are not the same as the
  failure of the `SSL_poll()` mechanism itself (`SSL_poll()` returning 0).

### Header File Definitions

```c
#define SSL_POLL_EVENT_NONE         0

/*
 * Fundamental Definitions
 * -----------------------
 */

/* F (Failure) */
#define SSL_POLL_EVENT_F            (1U << 0)

/* EL (Exception on Listener) */
#define SSL_POLL_EVENT_EL           (1U << 1)

/* EC (Exception on Connection) */
#define SSL_POLL_EVENT_EC           (1U << 2)

/* ECD (Exception on Connection Drained) */
#define SSL_POLL_EVENT_ECD          (1U << 3)

/* ER (Exception on Read) */
#define SSL_POLL_EVENT_ER           (1U << 4)

/* EW (Exception on Write) */
#define SSL_POLL_EVENT_EW           (1U << 5)

/* R (Readable) */
#define SSL_POLL_EVENT_R            (1U << 6)

/* W (Writable) */
#define SSL_POLL_EVENT_W            (1U << 7)

/* IC (Incoming Connection) */
#define SSL_POLL_EVENT_IC           (1U << 8)

/* ISB (Incoming Stream: Bidirectional) */
#define SSL_POLL_EVENT_ISB          (1U << 9)

/* ISU (Incoming Stream: Unidirectional) */
#define SSL_POLL_EVENT_ISU          (1U << 10)

/* OSB (Outgoing Stream: Bidirectional) */
#define SSL_POLL_EVENT_OSB          (1U << 11)

/* OSU (Outgoing Stream: Unidirectional) */
#define SSL_POLL_EVENT_OSU          (1U << 12)

/*
 * Composite Definitions
 * ---------------------
 */

/* Read/write. */
#define SSL_POLL_EVENT_RW           (SSL_POLL_EVENT_R  | SSL_POLL_EVENT_W)

/* Read/write and associated exception event types. */
#define SSL_POLL_EVENT_RE           (SSL_POLL_EVENT_R  | SSL_POLL_EVENT_ER)
#define SSL_POLL_EVENT_WE           (SSL_POLL_EVENT_W  | SSL_POLL_EVENT_EW)
#define SSL_POLL_EVENT_RWE          (SSL_POLL_EVENT_RE | SSL_POLL_EVENT_WE)

/* All exception event types. */
#define SSL_POLL_EVENT_E            (SSL_POLL_EVENT_EL | SSL_POLL_EVENT_EC \
                                     | SSL_POLL_EVENT_ER | SSL_POLL_EVENT_EW)

/* Streams and connections. */
#define SSL_POLL_EVENT_IS           (SSL_POLL_EVENT_ISB | SSL_POLL_EVENT_ISU)
#define SSL_POLL_EVENT_I            (SSL_POLL_EVENT_IS  | SSL_POLL_EVENT_IC)
#define SSL_POLL_EVENT_OS           (SSL_POLL_EVENT_OSB | SSL_POLL_EVENT_OSU)
```
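Because each event type is an independent bit, masks compose with `|` and are tested with `&`. The sketch below renders a mask using the abbreviations from the summary, which can be handy when logging `revents`. The helper `ssl_poll_event_names` is hypothetical. The bit values are copied from the definitions above so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit values copied from the header definitions above. */
#define SSL_POLL_EVENT_F    (1U << 0)
#define SSL_POLL_EVENT_EL   (1U << 1)
#define SSL_POLL_EVENT_EC   (1U << 2)
#define SSL_POLL_EVENT_ECD  (1U << 3)
#define SSL_POLL_EVENT_ER   (1U << 4)
#define SSL_POLL_EVENT_EW   (1U << 5)
#define SSL_POLL_EVENT_R    (1U << 6)
#define SSL_POLL_EVENT_RE   (SSL_POLL_EVENT_R | SSL_POLL_EVENT_ER)
#define SSL_POLL_EVENT_E    (SSL_POLL_EVENT_EL | SSL_POLL_EVENT_EC \
                             | SSL_POLL_EVENT_ER | SSL_POLL_EVENT_EW)

/* Abbreviations indexed by bit position (bits 0..12). */
static const char *const ssl_poll_event_abbrev[] = {
    "F", "EL", "EC", "ECD", "ER", "EW", "R", "W",
    "IC", "ISB", "ISU", "OSB", "OSU"
};

/*
 * Hypothetical debugging helper: render mask as e.g. "ER|R" into buf,
 * lowest bit first, or "NONE" for an empty mask. Output is truncated
 * (but always NUL-terminated) if buf is too small.
 */
static void ssl_poll_event_names(uint64_t mask, char *buf, size_t len)
{
    size_t i;
    const size_t n = sizeof(ssl_poll_event_abbrev)
                     / sizeof(ssl_poll_event_abbrev[0]);

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < n; ++i) {
        if ((mask & ((uint64_t)1 << i)) == 0)
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, ssl_poll_event_abbrev[i], len - strlen(buf) - 1);
    }
    if (buf[0] == '\0')
        strncat(buf, "NONE", len - 1);
}
```

Note that `SSL_POLL_EVENT_E` includes neither `ECD` nor `F`. `ECD` implies `EC`, and `F` is reported whether or not it is requested.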

### Discussion

#### `EL`: Exception on Listener

**Q. When is this event type raised?**

A. This event type is raised only on listener (port) failure, which occurs when
an underlying network BIO encounters a permanent error.

**Q. Does `EL` imply `EC` and `ECD` on all child connections?**

A. Yes. A permanent network BIO failure causes immediate failure of all
connections dependent on it without first going through `TERMINATING` (except
possibly in the future with multipath for connections which aren't exclusively
reliant on that port).

**Q. What SSL object types can raise this event type?**

A. The event type is raised on a QLSO only. This may be revisited in future
(e.g. having it also be raised on child QCSOs).

**Q. Why does this event type need to be distinct from `EC`?**

A. An application which is not immediately concerned with the failure of an
individual connection likely still needs to be notified if an entire port fails.

#### `EC`, `ECD`: Exception on Connection (/Drained)

**Q. Should this event be reported when a connection begins shutdown, begins
terminating, or finishes terminating?**

A.

- There is a use case to learn when we finish terminating because that is when
  we can throw away our port safely (raised on `TERMINATED`);

- there is a use case for learning as soon as we start terminating (raised on
  `TERMINATING` or `TERMINATED`);

- shutdown (i.e., waiting for streams to be done transmitting and then
  terminating, as per `SSL_shutdown_ex()`) is always initiated by the local
  application, thus there is no valid need for an application to poll on it.

As such, separate event types must be available both for the start of the
termination process and the conclusion of the termination process. `EC`
corresponds to `TERMINATING` or `TERMINATED` and `ECD` corresponds to
`TERMINATED` only.
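The mapping from connection state to these two event types is small enough to write down directly. The enum and function below are a hypothetical model and not libssl internals. The state names follow the `TERMINATING`/`TERMINATED` states used above, and the bit values repeat the header definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values repeated from the header definitions above. */
#define SSL_POLL_EVENT_EC   (1U << 2)
#define SSL_POLL_EVENT_ECD  (1U << 3)

/* Hypothetical, simplified connection lifecycle. */
enum conn_state {
    CONN_ACTIVE,       /* also covers local shutdown, which raises nothing */
    CONN_TERMINATING,  /* termination has started */
    CONN_TERMINATED    /* entered directly on idle timeout or port failure */
};

/* Exception events raised for a connection in the given state. */
static uint64_t conn_exception_events(enum conn_state st)
{
    switch (st) {
    case CONN_ACTIVE:
        return 0;
    case CONN_TERMINATING:
        return SSL_POLL_EVENT_EC;
    case CONN_TERMINATED:
        /* ECD never appears without EC. */
        return SSL_POLL_EVENT_EC | SSL_POLL_EVENT_ECD;
    }
    assert(0 && "unknown connection state");
    return 0;
}
```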

**Q. What happens in the event of idle timeout?**

A. Idle timeout is an immediate transition to `TERMINATED` as per the channel
code.

**Q. Does `ECD` imply `EC`?**

A. Yes, as `EC` is raised in both the `TERMINATING` and `TERMINATED` states.

**Q. Can `ECD` occur without `EC` also occurring?**

A. No, this is not possible.

**Q. Does it make sense for an application to be able to mask this?**

A. Possibly not, though there is nothing particularly requiring us to prevent
this at this time.

**Q. Does it make sense for an application to be able to listen for this but not
`EL`?**

A. Yes, since `EL` implies `EC`, it is valid for an application to handle
port/listener failure purely in terms of the emergent consequence of all
connections failing.

#### `R`: Readable

Application data or FIN is available for popping via `SSL_read`. Never raised
after a stream FIN has been retired.

**Q. Is this raised on `RESET_STREAM`?**

A. No. Applications which wish to know of receive stream part failure should
listen for `ER`.

**Q. Should this be reported if the connection fails?**

A. If there is still application data that can be read, yes. Otherwise, no.

**Q. Should this be reported if shutdown has commenced?**

A. Potentially — if there is still data to be read or more data arrives at the
last minute.

**Q. What happens if this event is enabled on a send-only stream?**

A. The event is never raised.

**Q. Can this event be received before a connection has been (fully)
established?**

A. Potentially on the server side in the future due to incoming 0-RTT data.

#### `ER`: Exception on Read

Raised only when the receive part of a stream has been reset by the remote peer
using a `RESET_STREAM` frame.

**Q. Should this be reported if a stream has already been concluded normally and
that FIN has been retired by the application by calling `SSL_read()`?**

A. No. We consider FIN retirement a success condition for our purposes here, so
normal stream conclusion and the retirement of that event do not cause `ER`.

**Q. Should this be reported if the connection fails?**

A. No, because that can be separately determined via the `EC` event and this
provides greater clarity as to what event is occurring and why. Also, it is
possible that a connection could fail and some application data is still
buffered to be read by the application, so `EC` does not imply `!R`.

**Q. Should this be reported if shutdown has commenced?**

A. No — so long as the connection is alive more data could still be received at
the last minute.

**Q. What happens if this event is enabled on a send-only stream?**

A. The event is never raised.

**Q. What happens if this event is enabled on a QCSO?**

A. The event is applicable if the QCSO has a default stream attached. Otherwise,
it is never raised.

**Q. Why is this event separate from `R`?**

A. If an application receives an `R` event, this means more application data is
available to be read but this may be a business-as-usual circumstance which the
application does not feel obliged to handle urgently; therefore, it might mask
`R` in some circumstances.

If a stream reset is triggered by a peer, this needs to be notifiable to an
application immediately even if the application would not care about more
ordinary application data arriving on a stream for now.

Therefore, `ER` *must* be separate from `R`, otherwise such applications would
be unable to prevent spurious wakeups due to normal application data when they
only care about the possibility of a stream reset.

**Q. Should applications be able to listen on `R` but not `ER`?**

A. This would enable an application to listen for more application data but not
care about stream resets. This can be permitted for now even if it raises some
questions about the robustness of such applications.

**Q. How will the future reliable stream resets extension be handled?**

A. `R` will be raised until all data up to the reliable reset point has been
retired by the application, then `ER` is raised and `R` is never again raised.

**Q. What happens if a stream is reset after the FIN has been retired by the
application?**

A. The reset is ignored; as per RFC 9000 s. 3.2, the Data Read state is terminal
and has no `RESET_STREAM` transition. Moreover, after an application is done
with a stream it can free the QSSO, which means a post-FIN-retirement reset
cannot be reliably received anyway.

Note that this does not preclude handling of `RESET_STREAM` in the normal way
for a stream which was concluded normally but where the application has *not*
yet read all data, which is potentially useful.
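The receive-side rules above can be condensed into a small predicate: `R` is never raised after FIN retirement, `ER` is raised only on `RESET_STREAM`, and a reset is ignored once the FIN has been retired. This is a hypothetical model. The `struct recv_part` fields are our own simplification, not libssl's internal stream state, and the future reliable-reset behavior is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SSL_POLL_EVENT_ER  (1U << 4)
#define SSL_POLL_EVENT_R   (1U << 6)

/* Hypothetical, simplified view of a stream's receive part. */
struct recv_part {
    int    has_recv_part;   /* 0 for a send-only stream */
    size_t bytes_buffered;  /* application data not yet read */
    int    fin_received;    /* peer FIN received but not yet retired */
    int    fin_retired;     /* FIN popped via SSL_read(): terminal state */
    int    reset_received;  /* RESET_STREAM received from the peer */
};

static uint64_t recv_part_events(const struct recv_part *rp)
{
    assert(rp != NULL);

    /* Send-only streams never raise R or ER; Data Read is terminal. */
    if (!rp->has_recv_part || rp->fin_retired)
        return 0;

    /* A reset raises ER only; R is not raised on RESET_STREAM. */
    if (rp->reset_received)
        return SSL_POLL_EVENT_ER;

    /* Connection failure (EC) does not suppress R while data remains. */
    if (rp->bytes_buffered > 0 || rp->fin_received)
        return SSL_POLL_EVENT_R;

    return 0;
}
```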

#### `W`: Writable

Raised when send buffer space is available, so that it is possible to write
application data via `SSL_write`.

**Q. Is this raised on `STOP_SENDING`?**

A. No. Applications which wish to know of remotely-triggered send stream part
reset should listen for `EW`.

**Q. Should this be reported if the connection fails?**

A. No.

**Q. Should this be reported if shutdown has commenced?**

A. No.

**Q. What happens if this event is enabled on a concluded send part?**

A. The event is never raised after the stream is concluded.

**Q. What happens if this event is enabled on a receive-only stream?**

A. The event is never raised.

**Q. What happens if this event is enabled on a QCSO?**

A. The event is applicable if the QCSO has a default stream attached. Otherwise,
it is never raised.

**Q. Can this event be raised before a connection has been established?**

A. Potentially in the future, if 0-RTT is in use and we have a cached 0-RTT
session including flow control budgets which establish we have room to write
more data for 0-RTT.

#### `EW`: Exception on Write

Raised only when the send part of a stream has been reset by the remote peer via
`STOP_SENDING`.

**Q. Should this be raised if a stream's send part has been concluded
normally?**

A. No. We consider that a success condition for our purposes here.

**Q. Should this be reported if the connection fails?**

A. No, because that can be separately determined via the `EC` event and this
provides greater clarity as to what event is occurring and why.

**Q. What happens if this event is enabled on a receive-only stream?**

A. The event is never raised.

**Q. Should this be reported if the send part was reset locally via
`SSL_stream_reset()`?**

A. There is no need for this since the application knows what it did, though
there is no particular harm in doing so. Current decision: do not report it.

**Q. What if the send part was reset locally and then we also received a
`STOP_SENDING` frame for it?**

A. If the local application has reset a stream locally, it already knows about
this, so there is no need to raise `EW`. The local reset takes precedence.

**Q. Should this be reported if shutdown has commenced?**

A. Probably not, since shutdown is under local application control and so if an
application does this it already knows about it. Therefore there is no reason to
poll for it.

**Q. Why is this event separate from `W`?**

A. It is useful for an application to be able to determine if something odd has
happened on a stream (like it being reset remotely via `STOP_SENDING`) even if
it does not currently want to write anything (and therefore is not listening for
`W`). Since stream resets can occur asynchronously and have application
protocol-defined semantics, it is important an application can be notified of
them immediately.

**Q. Should applications be able to listen on `W` but not `EW`?**

A. This would enable an application to listen for the opportunity to write but
not care about `STOP_SENDING` events. This is probably valid even if it raises
some questions about the robustness of such applications. It can be allowed,
even if not recommended (see the General Principles section above).

**Q. How will the future reliable stream resets extension be handled?**

A. The extension does not offer a `STOP_SENDING` equivalent so this is not a
relevant concern.
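The same approach works for the send side. The following is a hypothetical model of when `W` and `EW` are raised, following the answers above. `W` is not raised once the send part is concluded or reset, nor after connection failure or once shutdown has begun. `EW` is raised only for a peer `STOP_SENDING` on a send part the application has not reset locally. The `struct send_part` fields are illustrative and are not libssl internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SSL_POLL_EVENT_EW  (1U << 5)
#define SSL_POLL_EVENT_W   (1U << 7)

/* Hypothetical, simplified view of a stream's send part. */
struct send_part {
    int    has_send_part;     /* 0 for a receive-only stream */
    size_t buf_space;         /* free send buffer space in bytes */
    int    concluded;         /* FIN queued by the application */
    int    local_reset;       /* reset locally by the application */
    int    stop_sending_rx;   /* STOP_SENDING received from the peer */
    int    conn_terminating;  /* connection failed or shutdown commenced */
};

static uint64_t send_part_events(const struct send_part *sp)
{
    uint64_t ev = 0;

    assert(sp != NULL);
    if (!sp->has_send_part)
        return 0;             /* receive-only stream: neither W nor EW */

    /* EW: remote STOP_SENDING, unless a local reset takes precedence. */
    if (sp->stop_sending_rx && !sp->local_reset)
        ev |= SSL_POLL_EVENT_EW;

    /* W: room to write on a live, unconcluded, non-reset send part. */
    if (sp->buf_space > 0 && !sp->concluded && !sp->local_reset
        && !sp->stop_sending_rx && !sp->conn_terminating)
        ev |= SSL_POLL_EVENT_W;

    return ev;
}
```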

#### `ISB`, `ISU`: Incoming Stream Availability

Indicates one or more incoming bidirectional or unidirectional streams which
have yet to be popped via `SSL_accept_stream()`.

**Q. Is this raised on `RESET_STREAM`?**

A. It is raised on anything that would cause `SSL_accept_stream()` to return a
stream. This could include a stream which was created by being reset.

**Q. What happens if this event is enabled on a QSSO or QLSO?**

A. The event is never raised.

**Q. If a stream is in the accept queue and then the connection fails, should it
still be reported?**

A. Yes. The application may still be able to accept the stream later and pop any
application data which was already received. It is the application's choice to
listen for `EC` and have it take priority if it wishes.

**Q. Can this event be raised before a connection has been established?**

A. Client — no. Server — no initially, except possibly during 0-RTT when a
connection is not considered fully established yet.

#### `OSB`, `OSU`: Outgoing Stream Readiness

Indicates we have the ability, based on current stream count flow control state,
to initiate an outgoing bidirectional or unidirectional stream.

**Q. Should this be reported if the connection fails?**

A. No.

**Q. Should this be reported if shutdown has commenced?**

A. No.

**Q. What happens if this event is enabled on a QLSO or QSSO?**

A. The event is never raised.

**Q. Can this event be raised before a connection has been established?**

A. Potentially in future, on the client side only, if 0-RTT is in use and we
have a cached 0-RTT session including flow control budgets which establish we
have room to write more data for 0-RTT.

#### `IC`: Incoming Connection

Indicates at least one incoming connection is available to be popped using
`SSL_accept_connection()`.

**Q. Should this be reported if the port fails?**

A. Potentially. A connection could have already been able to receive application
data prior to it being popped from the accept queue by the application calling
`SSL_accept_connection()`. Whether or not application data was received on any
stream, a successfully established connection should be reported so that the
application knows it happened.

**Q. Can this event be raised before a connection has been established?**

A. Potentially in future, if 0-RTT is in use; we could receive connection data
before the connection process is complete (handshake confirmation).

**Q. What happens if this event is enabled on a QCSO or QSSO?**

A. The event is never raised.

#### `F`: Failure

Indicates that the `SSL_poll` mechanism itself has failed. This may be due to
specifying an unsupported `BIO_POLL_DESCRIPTOR` type, an unsupported `SSL`
object, and so on. This indicates a caller usage error. It is wholly distinct
from an exception condition on a successfully polled resource (e.g. `ER`, `EW`,
`EC`, `EL`).

**Q. Can this event type be masked?**

A. No — this event type may always be raised even if not requested. Requesting
it is a no-op (similar to `poll(2)` `POLLERR`). This is the only non-maskable
event type.

**Q. What happens if an `F` event is raised?**

A. The `F` event is reported in one or more elements of the items array. The
`result_count` output value reflects the number of items in the items array with
non-zero `revents` fields, as always. This includes any `F` events (there may be
multiple), and any non-`F` events which were output for earlier entries in the
items array (where an `F` event occurs for a subsequent entry in the items
array).

`SSL_poll()` then returns 0. The ERR stack *always* has at least one entry
placed on it, which reflects the first `F` event which was output. Any
subsequent `F` events do not have error information available.
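The `result_count` rules above can be made concrete with a non-blocking mock. Everything in it is hypothetical: `struct demo_item` stands in for `SSL_POLL_ITEM`, a `desc_type` of 1 means a supported descriptor, `ready` simulates the object's readiness, and the ERR stack is not modelled. The sketch adopts one reading of the text, namely that once an `F` has been raised, later entries are checked only for further `F` events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SSL_POLL_EVENT_F  (1U << 0)

/* Hypothetical stand-in for SSL_POLL_ITEM with simulated readiness. */
struct demo_item {
    uint32_t desc_type;   /* 1 = supported, anything else = unsupported */
    uint64_t ready;       /* events the polled object would report now */
    uint64_t events, revents;
};

/*
 * Evaluate items once, without blocking. Returns 1 on success or 0 if any F
 * event was raised; *result_count receives the number of items with non-zero
 * revents, F events included.
 */
static int demo_poll_once(struct demo_item *items, size_t n,
                          size_t *result_count)
{
    size_t i, count = 0;
    int failed = 0;

    assert((items != NULL || n == 0) && result_count != NULL);
    for (i = 0; i < n; ++i) {
        items[i].revents = 0;               /* the only field we modify */
        if (items[i].events == 0)
            continue;                       /* zero mask: entry ignored */
        if (items[i].desc_type != 1) {
            items[i].revents = SSL_POLL_EVENT_F;   /* F is never masked */
            failed = 1;
        } else if (!failed) {
            items[i].revents = items[i].ready & items[i].events;
        }
        if (items[i].revents != 0)
            ++count;
    }

    *result_count = count;
    return !failed;
}
```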
731
732Designs
733-------
734
735Two designs are considered here:
736
737- Sketch A: An “immediate-mode” poller interface similar to poll(2).
738
739- Sketch B: A “registered” poller interface similar to BSD's kqueue(2) (or Linux's
740  epoll(2)).
741
742Sketch A is simpler but is likely to be less performant. Sketch B is a bit more
743elaborate but can offer more performance. It is possible to offer both APIs if
744desired.
745
746### Sketch A: One-Shot/Immediate Mode API
747
748We define a common structure for representing polled events:
749
750```c
751typedef struct ssl_poll_item_st {
752    BIO_POLL_DESCRIPTOR desc;
753    uint64_t            events, revents;
754} SSL_POLL_ITEM;
755```
756
757This structure works similarly to the `struct pollfd` structure used by poll(2).
758`desc` describes the object to be polled, `events` is a bitmask of
759`SSL_POLL_EVENT` values describing what events to listen for, and `revents` is
760a bitmask of zero or more events which are actually raised.
761
762Polling implementations are only permitted to modify the `revents` field in a

```c
/*
 * SSL_poll
 * --------
 *
 * SSL_poll evaluates each of the items in the given array of SSL_POLL_ITEMs
 * and determines which poll items have relevant readiness events raised. It is
 * similar to POSIX poll(2).
 *
 * The events field of each item specifies the events the caller is interested
 * in and is the bitwise OR of zero or more SSL_POLL_EVENT_* values. When using
 * SSL_poll in a blocking fashion, only the occurrence of one or more events
 * specified in the events field, or a timeout or failure of the polling
 * mechanism, will cause SSL_poll to return.
 *
 * When SSL_poll returns, the revents field is set to the events actually active
 * on an item. This may or may not also include events which were not requested
 * in the events field.
 *
 * Specifying an item with an events field of zero is a no-op; the array entry
 * is ignored. Unlike poll(2), error events are not automatically included
 * and it is the application's responsibility to request them.
 *
 * Each item to be polled is described by a BIO_POLL_DESCRIPTOR. A
 * BIO_POLL_DESCRIPTOR is an extensible tagged union structure which describes
 * some kind of object which SSL_poll might (or might not) know how to poll.
 * Currently, SSL_poll can poll the following kinds of BIO_POLL_DESCRIPTOR:
 *
 *   BIO_POLL_DESCRIPTOR_TYPE_SOCK_FD   (int fd)    -- OS-pollable sockets only
 *      Note: Some OSes consider sockets to be a different kind of handle type
 *            from ordinary file handles. Therefore, this type is used
 *            specifically for OS socket handles only (e.g. SOCKET on Win32).
 *            It cannot be used to poll other OS handle types.
 *
 *   BIO_POLL_DESCRIPTOR_TYPE_SSL       (SSL *ssl)  -- QUIC SSL objects only
 *
 * num_items is the number of items in the passed array.
 *
 * stride must be set to sizeof(SSL_POLL_ITEM).
 *
 * timeout specifies how long to wait for at least one passed SSL_POLL_ITEM to
 * have at least one event to report. If it is set to NULL, this function does
 * not time out and waits forever. Otherwise, it expresses a timeout duration
 * with microsecond resolution. The value expresses a duration, not a
 * deadline.
 *
 * This function can be used in a non-blocking mode where it will provide
 * information on readiness for each of the items and then return immediately,
 * even if no item is ready. To facilitate this, pass a zero-value timeout
 * structure.
 *
 * If num_items is set to zero, this function returns with a timeout condition
 * after the specified timeout, or immediately with failure if no timeout
 * was requested (as otherwise it would logically deadlock).
 *
 * flags must be zero or more SSL_POLL_FLAG values:
 *
 *   - SSL_POLL_FLAG_NO_HANDLE_EVENTS:
 *       This may be used only when a zero timeout is specified (non-blocking
 *       mode). Ordinarily in this case, relevant SSL objects have internal
 *       event processing performed as this may help them to become ready.
 *       This may also cause network I/O to occur. If this flag is specified,
 *       no such processing will be performed. This means that SSL_poll
 *       will only report pre-existing readiness events for the specified objects.
 *
 *       If timeout is NULL or non-zero, specifying this flag is an error.
 *
 * Regardless of whether this function succeeds, times out, or fails for other
 * reasons, the revents field of each item is set to a valid value reflecting
 * the current readiness, or to 0, and *result_count (if non-NULL) is written
 * with the total number of items whose revents field, when masked with the
 * corresponding events field, is nonzero at the time the function returns.
 * Note that these entries in the items array may not be consecutive or at the
 * start of the array.
 *
 * There is a distinction between exception conditions on a resource which is
 * polled (such as a connection being terminated) and a failure in the polling
 * code itself. A mere exception condition is not considered a failure of
 * the polling mechanism itself and does not cause SSL_poll to return 0. If
 * the polling mechanism itself fails (for example, because an unsupported
 * BIO_POLL_DESCRIPTOR type or SSL object type is passed), the F event type
 * is raised on at least one poll item and the function returns 0. At least
 * one ERR stack entry will be raised describing the cause of the first F event
 * for the input items. Any additional F events do not have their error
 * information reported.
 *
 * Returns 1 on success or timeout, and 0 on failure. Timeout conditions can
 * be distinguished by the *result_count field being written as 0.
 *
 * This function does not modify any item's events or desc field.
 * The initial value of an revents field when this function is called is of no
 * consequence.
 *
 * This is a "one-shot" API; greater performance may be obtained from using
 * an API which requires advance registration of pollables.
 */
#define SSL_POLL_FLAG_NO_HANDLE_EVENTS      (1U << 0)

int SSL_poll(SSL_POLL_ITEM *items,
             size_t num_items, size_t stride,
             const struct timeval *timeout,
             uint64_t flags,
             size_t *result_count);
```

**Performance and thundering-herd issues.** There are two intrinsic performance
issues with this design:

- Because it does not involve advance registration of things being polled,
  the entire object list needs to be scanned in each call, and there is
  no real opportunity to maintain internal state which would make polling
  more efficient.

- Because this design is inherently “stateless”, it cannot really solve
  the thundering herd problem in any reasonable way. In other words, if n
  threads are all calling `SSL_poll` on the same set of objects and events,
  there is no way for an event to be efficiently distributed to just one of
  those threads.

  This limitation is intrinsic to the design of `poll(2)` and poll-esque APIs.
  It is not necessarily a reason not to offer this rather simple API, as use of
  poll(2) and poll(2)-like APIs is widespread and users are likely to appreciate
  an API which does not provide significant impedance discontinuities to
  applications which use select/poll, even if those applications suffer impaired
  performance as a result.

### Sketch B: Registered/Retained Mode API

Alternatively, an API which requires advance registration of pollable objects is
proposed.

Attention is called to certain design features:

- This design can solve the thundering herd problem, achieving efficient
  distribution of work to threads by auto-disabling an event mask bit after
  distribution of the readiness event to one thread currently calling the poll
  function.

- The fundamental call, `SSL_POLL_GROUP_change_poll`, combines the operations
  of adding/removing/changing registered events and actually polling. This is
  important because, due to the herd-avoidance design above, events can be
  automatically disarmed and then need rearming as frequently as the poll
  function is called. Combining both operations in one call therefore enhances
  efficiency. This design aspect is inspired directly by kqueue.

- Addition of registered events and mutation of existing events uses an
  idempotent upsert-type operation, which is what most applications actually
  want (unlike e.g. epoll).

```c
typedef struct ssl_poll_group_st SSL_POLL_GROUP;

/*
 * The means of obtaining an SSL_POLL_GROUP instance is discussed
 * subsequently. For now, you can imagine the following strawman function:
 *
 *     SSL_POLL_GROUP *SSL_POLL_GROUP_new(void);
 *
 */

void SSL_POLL_GROUP_free(SSL_POLL_GROUP *pg);

#define SSL_POLL_EVENT_FLAG_NONE            0

/*
 * Registered event is deleted (not disabled) after one event fires.
 */
#define SSL_POLL_EVENT_FLAG_ONESHOT         (1U << 0)

/*
 * Work queue dispatch (anti-thundering herd) - dispatch to one concurrent call
 * and set DISABLED.
 */
#define SSL_POLL_EVENT_FLAG_DISPATCH        (1U << 1)

/* Registered event is disabled and will not return events. */
#define SSL_POLL_EVENT_FLAG_DISABLED        (1U << 2)

/* Delete a registered event. */
#define SSL_POLL_EVENT_FLAG_DELETE          (1U << 3)

/* Change previous cookie value. Cookie is normally only set on initial add. */
#define SSL_POLL_EVENT_FLAG_UPDATE_COOKIE   (1U << 4)

/*
 * A structure to request registration, deregistration or modification of a
 * registered event.
 */
typedef struct ssl_poll_change_st {
    /* The pollable object to be polled. */
    BIO_POLL_DESCRIPTOR desc;
    size_t              instance;

    /* An opaque application value passed through in any reported event. */
    void                *cookie;

    /*
     * Disables and enables event types. Any events in disable_events are
     * disabled, and then any events in enable_events are enabled. disable_events
     * is processed before enable_events, therefore the enabled event types may
     * be set (ignoring any previous value) by setting disable_events to
     * UINT64_MAX and enable_events to the desired event types. Non-existent
     * event types are ignored.
     */
    uint64_t            disable_events, enable_events;

    /*
     * Enables and disables registered event flags in the same way that
     * disable_events and enable_events manage registered event types.
     * This is used to disable and enable SSL_POLL_EVENT_FLAG bits.
     */
    uint64_t            disable_flags, enable_flags;
} SSL_POLL_CHANGE;

typedef struct ssl_poll_event_st {
    BIO_POLL_DESCRIPTOR desc;
    size_t              instance;
    void                *cookie;
    uint64_t            revents;
} SSL_POLL_EVENT;

/*
 * SSL_POLL_GROUP_change_poll
 * --------------------------
 *
 * This function performs the following actions:
 *
 *   - firstly, if num_changes is non-zero, it updates registered events on the
 *     specified poll group, adding, removing and modifying registered events as
 *     specified by the changes in the array given in changes;
 *
 *   - secondly, if num_events is non-zero, it polls for any events that have
 *     arisen that match the registered events, and places up to num_events such
 *     events in the array given in events.
 *
 * This function may be used for either of these effects, or both at the same
 * time. Changes to event registrations are applied before events are returned.
 *
 * If num_changes is non-zero, change_stride must be set to
 * sizeof(SSL_POLL_CHANGE).
 *
 * If num_events is non-zero, event_stride must be set to
 * sizeof(SSL_POLL_EVENT).
 *
 * If timeout is NULL, this function blocks forever until an applicable event
 * occurs. If it points to a zero value, this function never blocks and will
 * apply given changes, return any applicable events, if any, and then return
 * immediately. Note that any requested changes are always applied regardless of
 * timeout outcome.
 *
 * flags must be zero or more SSL_POLL_FLAG values. If
 * SSL_POLL_FLAG_NO_HANDLE_EVENTS is set, polled objects do not automatically
 * have I/O performed which might enable them to raise applicable events. If
 * SSL_POLL_FLAG_NO_POLL is set, changes are processed but no polling is
 * performed. This is useful when an event array is passed only so that any
 * errors arising while processing changes can be received. Passing
 * SSL_POLL_FLAG_NO_POLL forces a timeout of 0 (non-blocking mode); the timeout
 * argument is ignored.
 *
 * The number of events written to events is written to *num_events_out,
 * regardless of whether this function succeeds or fails.
 *
 * Returns 1 on success or 0 on failure. A timeout is considered a success case
 * which returns 0 events; thus in this case, the function returns 1 and
 * *num_events_out is written as 0.
 *
 * This function differs from poll-style interfaces in that the events reported
 * in the events array bear no positional relationship to the registration
 * changes indicated in changes. Thus the length of these arrays is unrelated.
 *
 * An error may occur when processing a change. If this occurs, an entry
 * describing the error is written out as an event to the event array. The
 * function still returns success, unless there is no room in the events array
 * for the error (for example, if num_events is 0), in which case failure is
 * returned.
 *
 * When an event is output from this function, desc is set to the original
 * registered poll descriptor, cookie is set to the cookie value which was
 * passed in when registering the event, and revents is set to any applicable
 * events, which might be a superset of the events which were actually asked
 * for. (However, only events actually asked for at registration time will
 * cause a blocking call to SSL_POLL_GROUP_change_poll to return.)
 *
 * An event structure which represents a change processing error will have the
 * pseudo-event SSL_POLL_EVENT_POLL_ERROR set, with copies of the desc and
 * cookie provided. This is not a real event and cannot be requested in a
 * change.
 *
 * The 'primary key' for any registered event is the tuple (poll descriptor,
 * instance). Changing an existing event is done by passing a change structure
 * with the same values for the poll descriptor and instance. The instance field
 * can be used to register multiple separate registered events on the same
 * poll descriptor. Many applications will be able to use an instance field of
 * 0 in all circumstances.
 *
 * To unregister an event, pass a matching poll descriptor and instance value
 * and set SSL_POLL_EVENT_FLAG_DELETE in enable_flags.
 *
 * It is recommended that callers delete a registered event from a poll group
 * before freeing the underlying resource. If an object which is registered
 * inside a poll group is freed, the semantics depend on the type of the poll
 * descriptor used. For example, libssl has no safe way to detect if an OS
 * socket poll descriptor is closed, therefore it is essential callers
 * deregister such registered events prior to closing the socket handle.
 *
 * Other poll descriptor types may implement automatic deregistration from poll
 * groups which they are registered into when they are freed. This varies by
 * poll descriptor type. However, even if a poll descriptor type does implement
 * this, applications must still ensure no events in an SSL_POLL_EVENT
 * structure recorded from a previous call to this function are left over, which
 * may still reference that poll descriptor. Therefore, applications must still
 * exercise caution when freeing resources which are registered, or which were
 * previously registered in a poll group.
 */
#define SSL_POLL_FLAG_NO_HANDLE_EVENTS       (1U << 0)
#define SSL_POLL_FLAG_NO_POLL                (1U << 1)

#define SSL_POLL_EVENT_POLL_ERROR            (((uint64_t)1) << 63)

int SSL_POLL_GROUP_change_poll(SSL_POLL_GROUP *pg,

                               const SSL_POLL_CHANGE *changes,
                               size_t num_changes,
                               size_t change_stride,

                               SSL_POLL_EVENT *events,
                               size_t num_events,
                               size_t event_stride,

                               const struct timeval *timeout,
                               uint64_t flags,
                               size_t *num_events_out);

/* These macros may be used if only one function is desired. */
#define SSL_POLL_GROUP_change(pg, changes, num_changes, flags)      \
    SSL_POLL_GROUP_change_poll((pg), (changes), (num_changes),      \
                               sizeof(SSL_POLL_CHANGE),             \
                               NULL, 0, 0, NULL, (flags), NULL)

#define SSL_POLL_GROUP_poll(pg, events, num_events, timeout, flags, result_c) \
    SSL_POLL_GROUP_change_poll((pg), NULL, 0, 0,                              \
                               (events), (num_events),                        \
                               sizeof(SSL_POLL_EVENT),                        \
                               (timeout), (flags), (result_c))

/* Convenience inlines. */
static ossl_inline ossl_unused void SSL_POLL_CHANGE_set(SSL_POLL_CHANGE *chg,
                                                        BIO_POLL_DESCRIPTOR desc,
                                                        size_t instance,
                                                        void *cookie,
                                                        uint64_t events,
                                                        uint64_t flags)
{
    chg->desc           = desc;
    chg->instance       = instance;
    chg->cookie         = cookie;
    chg->disable_events = UINT64_MAX;
    chg->enable_events  = events;
    chg->disable_flags  = UINT64_MAX;
    chg->enable_flags   = flags;
}

static ossl_inline ossl_unused void SSL_POLL_CHANGE_delete(SSL_POLL_CHANGE *chg,
                                                           BIO_POLL_DESCRIPTOR desc,
                                                           size_t instance)
{
    chg->desc           = desc;
    chg->instance       = instance;
    chg->cookie         = NULL;
    chg->disable_events = 0;
    chg->enable_events  = 0;
    chg->disable_flags  = 0;
    chg->enable_flags   = SSL_POLL_EVENT_FLAG_DELETE;
}

static ossl_inline ossl_unused void
SSL_POLL_CHANGE_chevent(SSL_POLL_CHANGE *chg,
                        BIO_POLL_DESCRIPTOR desc,
                        size_t instance,
                        uint64_t disable_events,
                        uint64_t enable_events)
{
    chg->desc           = desc;
    chg->instance       = instance;
    chg->cookie         = NULL;
    chg->disable_events = disable_events;
    chg->enable_events  = enable_events;
    chg->disable_flags  = 0;
    chg->enable_flags   = 0;
}

static ossl_inline ossl_unused void
SSL_POLL_CHANGE_chflag(SSL_POLL_CHANGE *chg,
                       BIO_POLL_DESCRIPTOR desc,
                       size_t instance,
                       uint64_t disable_flags,
                       uint64_t enable_flags)
{
    chg->desc           = desc;
    chg->instance       = instance;
    chg->cookie         = NULL;
    chg->disable_events = 0;
    chg->enable_events  = 0;
    chg->disable_flags  = disable_flags;
    chg->enable_flags   = enable_flags;
}

static ossl_inline ossl_unused BIO_POLL_DESCRIPTOR
SSL_as_poll_descriptor(SSL *s)
{
    BIO_POLL_DESCRIPTOR d;

    d.type      = BIO_POLL_DESCRIPTOR_TYPE_SSL;
    d.value.ssl = s;

    return d;
}
```

#### Use Case Examples

```c
/*
 * Scenario 1: Register multiple events on different QUIC objects and
 * immediately start blocking for events.
 */
{
    int rc;

    SSL *qconn1 = get_some_quic_conn();
    SSL *qconn2 = get_some_quic_conn();
    SSL *qstream1 = get_some_quic_stream();
    SSL *qlisten1 = get_some_quic_listener();
    int sock = get_some_socket_handle();

    SSL_POLL_GROUP *pg = SSL_POLL_GROUP_new();
    SSL_POLL_CHANGE changes[32], *chg = changes;
    SSL_POLL_EVENT events[32];
    void *cookie = some_app_ptr;
    size_t i, nchanges = 0, nevents = 0;

    if (pg == NULL)
        return 0;

    /* Wait for an incoming stream or conn error on conn 1 and 2. */
    SSL_POLL_CHANGE_set(chg++, SSL_as_poll_descriptor(qconn1), 0, cookie,
                        SSL_POLL_EVENT_IS | SSL_POLL_EVENT_E, 0);
    ++nchanges;

    SSL_POLL_CHANGE_set(chg++, SSL_as_poll_descriptor(qconn2), 0, cookie,
                        SSL_POLL_EVENT_IS | SSL_POLL_EVENT_E, 0);
    ++nchanges;

    /* Wait for incoming data (or reset) on stream 1. */
    SSL_POLL_CHANGE_set(chg++, SSL_as_poll_descriptor(qstream1), 0, cookie,
                        SSL_POLL_EVENT_R, 0);
    ++nchanges;

    /* Wait for an incoming connection. */
    SSL_POLL_CHANGE_set(chg++, SSL_as_poll_descriptor(qlisten1), 0, cookie,
                        SSL_POLL_EVENT_IC, 0);
    ++nchanges;

    /* Also poll on an ordinary OS socket. */
    SSL_POLL_CHANGE_set(chg++, OS_socket_as_poll_descriptor(sock), 0, cookie,
                        SSL_POLL_EVENT_RW, 0);
    ++nchanges;

    /* Immediately register all of these events and wait for an event. */
    rc = SSL_POLL_GROUP_change_poll(pg,
                                    changes, nchanges, sizeof(changes[0]),
                                    events, OSSL_NELEM(events), sizeof(events[0]),
                                    NULL, 0, &nevents);
    if (!rc)
        return 0;

    for (i = 0; i < nevents; ++i) {
        if ((events[i].revents & SSL_POLL_EVENT_POLL_ERROR) != 0)
            return 0;

        process_event(&events[i]);
    }

    return 1;
}

void process_event(const SSL_POLL_EVENT *event)
{
    APP_INFO *app = event->cookie;

    do_something(app, event->revents);
}

/*
 * Scenario 2: Test for pre-existing registered events in non-blocking mode
 * as part of a hierarchical polling strategy.
 */
{
    int rc;

    SSL_POLL_EVENT events[32];
    size_t i, nevents = 0;
    struct timeval timeout = { 0 };

    /*
     * Find out what is ready without blocking.
     * Assume application already did I/O event handling and do not tick again.
     */
    rc = SSL_POLL_GROUP_poll(pg, events, OSSL_NELEM(events),
                             &timeout, SSL_POLL_FLAG_NO_HANDLE_EVENTS,
                             &nevents);
    if (!rc)
        return 0;

    for (i = 0; i < nevents; ++i)
        process_event(&events[i]);

    return 1;
}

/*
 * Scenario 3: Remove one event but don't poll.
 */
{
    SSL_POLL_CHANGE changes[1], *chg = changes;
    size_t nchanges = 0;

    SSL_POLL_CHANGE_delete(chg++, SSL_as_poll_descriptor(qstream1), 0);
    ++nchanges;

    if (!SSL_POLL_GROUP_change(pg, changes, nchanges, 0))
        return 0;

    return 1;
}

/*
 * Scenario 4: Efficient (non-thundering-herd) multi-thread dispatch with
 * efficient rearm.
 *
 * Assume all registered events have SSL_POLL_EVENT_FLAG_DISPATCH set on them.
 *
 * Assume this function is being called concurrently from a large number of
 * threads.
 */
{
    SSL_POLL_CHANGE changes[32], *chg;
    SSL_POLL_EVENT events[32];
    size_t i, nchanges, nevents = 0;

    /*
     * This will block, and then the first event to occur will be returned on
     * *one* thread, and the event will be disabled. Other threads will keep
     * waiting.
     */
    if (!SSL_POLL_GROUP_poll(pg, events, OSSL_NELEM(events),
                             NULL, 0, &nevents))
        return 0;

    /* Application event loop */
    while (!app_should_stop()) {
        chg = changes;
        nchanges = 0;

        for (i = 0; i < nevents; ++i) {
            process_event(&events[i]); /* do something in application */

            /* We have processed the event, so now re-enable it. */
            SSL_POLL_CHANGE_chflag(chg++, events[i].desc, events[i].instance,
                                   SSL_POLL_EVENT_FLAG_DISABLED, 0);
            ++nchanges;
        }

        /* Re-enable any event we processed and go to sleep again. */
        if (!SSL_POLL_GROUP_change_poll(pg, changes, nchanges, sizeof(changes[0]),
                                        events, OSSL_NELEM(events), sizeof(events[0]),
                                        NULL, 0, &nevents))
            return 0;
    }

    return 1;
}
```

Proposal
--------

It is proposed to offer both of these API sketches. The simple `SSL_poll` API is
compelling for simple use cases, and both APIs have merits and cases where they
will be highly desirable. The ability of the registered API to support
thundering herd mitigation is of particular importance.

Custom Poller Methods
---------------------

It is also desirable to support custom poller methods provided by an
application. This allows an application to support custom poll descriptor types
and provide a way to poll on those poll descriptors. For example, an application
could provide a BIO_dgram_pair (which ordinarily cannot support polling and
cannot be used with the blocking API) and a custom poller which can poll some
opaque poll descriptor handle provided by the application (which might, for
example, be based on condition variables).

We therefore now discuss modifications to the above APIs to support custom
poller methods.

### Translation

When a poller polls a QUIC SSL object, it must figure out how to block on this
object. This means it must ultimately make some blocking poll(2)-like call to
the OS. Since an OS only knows how to block on resources it issues, all
resources such as QUIC SSL objects must be reduced into OS resources before
polling can occur.

This process occurs via translation. Suppose `SSL_poll` is called with a QCSO,
two QSSOs on that QCSO, and an OS socket handle:

  - `SSL_poll` will convert the poll descriptors pointing to SSL objects
    to network-side poll descriptors by calling `SSL_get_[rw]poll_descriptor`,
    which calls through to `BIO_get_[rw]poll_descriptor`;

  - The yielded poll descriptors are then reduced to a set of unique poll
    descriptors (for example, the QCSO and both QSSOs will have the same
    underlying poll descriptor, so duplicates are removed);

  - The OS socket handle poll descriptor which was passed in is simply
    passed through as-is;

  - The resulting set of poll descriptors is then passed on to an underlying
    poller implementation, which might be based on e.g. poll(2). But it might
    also be a custom method provided by an application if one of the SSL objects
    resolved to a custom poll descriptor type.

  - When the underlying poll call returns, reverse translation occurs.
    Poll descriptors which have become ready in some aspect and which were
    translated are mapped back to the input SSL objects which they were derived
    from (since duplicates are removed, this may be multiple SSL objects per
    poll descriptor). This set of SSL objects is reduced to a unique set of
    event leaders and those event leaders are ticked. The QUIC SSL objects are
    then probed for their current state to determine current readiness and this
    information is returned.

The above scheme also means that the retained-mode polling API can be more
efficient since translation information can be retained internally rather than
being re-derived every time.
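
The forward and reverse translation steps can be modelled in a few lines. This is a sketch under simplifying assumptions: each input item is reduced to the integer network descriptor it translates to (or is passed through as) plus an integer id for its event leader, with -1 marking a raw OS socket that has no leader; `model_input` and the `model_*` functions are hypothetical, and linear scans stand in for whatever lookup structure an implementation would use.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ITEMS 16

/* One input poll item after forward translation. */
typedef struct {
    int sock;    /* network-side descriptor (translated or passed through) */
    int leader;  /* event leader id for a QUIC SSL object; -1 for a raw socket */
} model_input;

/*
 * Forward translation: reduce the items to the unique set of descriptors that
 * is handed to the underlying poller. Returns the number written to out.
 */
static size_t model_unique_descs(const model_input *in, size_t n, int *out)
{
    size_t i, j, nout = 0;

    assert(n <= MODEL_MAX_ITEMS);
    for (i = 0; i < n; ++i) {
        for (j = 0; j < nout && out[j] != in[i].sock; ++j)
            continue;
        if (j == nout)                   /* first time this descriptor is seen */
            out[nout++] = in[i].sock;
    }
    return nout;
}

/*
 * Reverse translation: map descriptors reported ready back to the SSL objects
 * derived from them, then reduce those to the unique event leaders to tick.
 */
static size_t model_leaders_to_tick(const model_input *in, size_t n,
                                    const int *ready, size_t nready, int *out)
{
    size_t i, j, k, nout = 0;

    assert(n <= MODEL_MAX_ITEMS);
    for (i = 0; i < n; ++i) {
        if (in[i].leader < 0)
            continue;                    /* raw OS socket: nothing to tick */
        for (j = 0; j < nready && ready[j] != in[i].sock; ++j)
            continue;
        if (j == nready)
            continue;                    /* its descriptor did not become ready */
        for (k = 0; k < nout && out[k] != in[i].leader; ++k)
            continue;
        if (k == nout)
            out[nout++] = in[i].leader;
    }
    return nout;
}
```

With a QCSO and two of its QSSOs all translating to descriptor 5 and a plain socket 9, the underlying poller sees only {5, 9}, and readiness on 5 results in a single tick of the connection's event leader.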

### Custom Poller Methods API

There are two kinds of polling that occur:

- Internal polling for blocking API: This is where an SSL object automatically
  polls internally to support blocking API operation. If an underlying network
  BIO cannot support a poll descriptor which we understand how to poll on, we
  cannot support blocking API operation. We can support a poll descriptor if it
  is an OS socket handle, or if a custom poller is configured that knows how to
  poll it.

- External polling support: This is where an application calls a polling API.

Firstly, the `SSL_POLL_METHOD` object is defined abstractly as follows:

```c
/* API (Pseudocode) */
#define SSL_POLL_METHOD_CAP_IMMEDIATE  (1U << 0) /* supports immediate mode */
#define SSL_POLL_METHOD_CAP_RETAINED   (1U << 1) /* supports retained mode */

interface SSL_POLL_METHOD {
    int free(void);
    int up_ref(void);

    uint64_t get_caps(void);
    int supports_poll_descriptor(const BIO_POLL_DESCRIPTOR *desc);
    int poll(/* as shown for SSL_poll */);
    SSL_POLL_GROUP *create_poll_group(const OSSL_PARAM *params);
}

interface SSL_POLL_GROUP {
    int free(void);
    int up_ref(void);

    int change_poll(/* as shown for SSL_POLL_GROUP_change_poll */);
}
```

This interface is realised as follows:

```c
typedef struct ssl_poll_method_st SSL_POLL_METHOD;
typedef struct ssl_poll_group_st SSL_POLL_GROUP;

typedef struct ssl_poll_method_funcs_st {
    int (*free)(SSL_POLL_METHOD *self);
    int (*up_ref)(SSL_POLL_METHOD *self);

    uint64_t (*get_caps)(const SSL_POLL_METHOD *self);
    int (*supports_poll_descriptor)(SSL_POLL_METHOD *self,
                                    const BIO_POLL_DESCRIPTOR *desc);
    int (*poll)(SSL_POLL_METHOD *self, /* as shown for SSL_poll */);
    SSL_POLL_GROUP *(*create_poll_group)(SSL_POLL_METHOD *self,
                                         const OSSL_PARAM *params);
} SSL_POLL_METHOD_FUNCS;

SSL_POLL_METHOD *SSL_POLL_METHOD_new(const SSL_POLL_METHOD_FUNCS *funcs,
                                     size_t funcs_len, size_t data_len);

void *SSL_POLL_METHOD_get0_data(const SSL_POLL_METHOD *self);

int SSL_POLL_METHOD_free(SSL_POLL_METHOD *self);
void SSL_POLL_METHOD_do_free(SSL_POLL_METHOD *self);
int SSL_POLL_METHOD_up_ref(SSL_POLL_METHOD *self);

uint64_t SSL_POLL_METHOD_get_caps(const SSL_POLL_METHOD *self);
int SSL_POLL_METHOD_supports_poll_descriptor(SSL_POLL_METHOD *self,
                                             const BIO_POLL_DESCRIPTOR *desc);
int SSL_POLL_METHOD_poll(SSL_POLL_METHOD *self, ...);
SSL_POLL_GROUP *SSL_POLL_METHOD_create_poll_group(SSL_POLL_METHOD *self,
                                                  const OSSL_PARAM *params);

typedef struct ssl_poll_group_funcs_st {
    int (*free)(SSL_POLL_GROUP *self);
    int (*up_ref)(SSL_POLL_GROUP *self);

    int (*change_poll)(SSL_POLL_GROUP *self, /* as shown for change_poll */);
} SSL_POLL_GROUP_FUNCS;

SSL_POLL_GROUP *SSL_POLL_GROUP_new(const SSL_POLL_GROUP_FUNCS *funcs,
                                   size_t funcs_len, size_t data_len);
void *SSL_POLL_GROUP_get0_data(const SSL_POLL_GROUP *self);

int SSL_POLL_GROUP_free(SSL_POLL_GROUP *self);
int SSL_POLL_GROUP_up_ref(SSL_POLL_GROUP *self);
int SSL_POLL_GROUP_change_poll(SSL_POLL_GROUP *self,
                               /* as shown for change_poll */);
```
1491```

Here is how an application might define and create an `SSL_POLL_METHOD`
instance of its own:

```c
/* Per-method data; SSL_POLL_METHOD_new allocates data_len bytes of it. */
typedef struct app_poll_method_st {
    uint32_t refcount;
} APP_POLL_METHOD;

static int app_poll_method_free(SSL_POLL_METHOD *self)
{
    APP_POLL_METHOD *data = SSL_POLL_METHOD_get0_data(self);

    /* Last reference dropped: release the method object itself. */
    if (!--data->refcount)
        SSL_POLL_METHOD_do_free(self);

    return 1;
}

static int app_poll_method_up_ref(SSL_POLL_METHOD *self)
{
    APP_POLL_METHOD *data = SSL_POLL_METHOD_get0_data(self);

    ++data->refcount;

    return 1;
}

static uint64_t app_poll_method_get_caps(const SSL_POLL_METHOD *self)
{
    /* Immediate mode only; create_poll_group is NULL in the table below. */
    return SSL_POLL_METHOD_CAP_IMMEDIATE;
}

static int app_poll_method_supports_poll_descriptor(SSL_POLL_METHOD *self,
                                                    const BIO_POLL_DESCRIPTOR *d)
{
    /* This poller only understands OS socket handles. */
    return d->type == BIO_POLL_DESCRIPTOR_TYPE_SOCK_FD;
}

/* etc. (app_poll_method_poll and so on) */

SSL_POLL_METHOD *app_create_custom_poll_method(void)
{
    SSL_POLL_METHOD *self;
    APP_POLL_METHOD *data;

    static const SSL_POLL_METHOD_FUNCS funcs = {
        app_poll_method_free,
        app_poll_method_up_ref,
        app_poll_method_get_caps,
        app_poll_method_supports_poll_descriptor,
        app_poll_method_poll,
        NULL /* create_poll_group: retained mode not supported by app */
    };

    self = SSL_POLL_METHOD_new(&funcs, sizeof(funcs), sizeof(APP_POLL_METHOD));
    if (self == NULL)
        return NULL;

    data = SSL_POLL_METHOD_get0_data(self);
    data->refcount = 1;
    return self; /* return the method, not its private data */
}
```

We also provide a “default” method:

```c
SSL_POLL_METHOD *SSL_get0_default_poll_method(const OSSL_PARAM *params);
```

No params are currently defined; this is reserved for future use.

`SSL_poll` is a shorthand for using the method provided by
`SSL_get0_default_poll_method(NULL)`.

### Internal Polling: Usage within SSL Objects

To support custom pollers for internal polling, SSL objects receive an API that
allows a custom poller to be configured. To avoid confusion, custom pollers can
only be configured on an event leader, but the getter function, when called on
any QUIC SSL object in the hierarchy, returns the custom poller configured on
the event leader, or NULL if none is configured.

An `SSL_POLL_METHOD` can be associated with an SSL object. It can also be set
on an `SSL_CTX` object, in which case it is inherited by SSL objects created
from it:

```c
int SSL_CTX_set1_poll_method(SSL_CTX *ctx, SSL_POLL_METHOD *method);
SSL_POLL_METHOD *SSL_CTX_get0_poll_method(const SSL_CTX *ctx);

int SSL_set1_poll_method(SSL *ssl, SSL_POLL_METHOD *method);
SSL_POLL_METHOD *SSL_get0_poll_method(const SSL *ssl);
```

As long as `SSL_set1_poll_method` has never been called on it directly, an SSL
object created from an `SSL_CTX` inherits the value set on that `SSL_CTX`,
including any change made to the `SSL_CTX` poll method after the SSL object was
created. Calling `SSL_set1_poll_method(..., NULL)` overrides this behaviour.

When a poll method is set on a QUIC domain, blocking API calls use that poller
to block as needed.

Our QUIC implementation may, if it wishes, use the provided poll method to
construct a poll group, but is not guaranteed to do so. We reserve the right to
use either the immediate mode or the retained mode API of the poller as
desired. If we use the retained mode, we handle state updates and teardown as
needed if the caller later changes the configured poll method by calling
`SSL_set1_poll_method` again.

If the poll method is set to NULL, we use the default poll method, which is the
same as the method provided by `SSL_get0_default_poll_method`.
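
The resolution rules above can be modelled as a small lookup function. This is a minimal sketch under stated assumptions: `POLL_METHOD`, `CTX`, `OBJ` and the helper functions are hypothetical stand-ins rather than OpenSSL types, reference counting and the event-leader restriction are left out, and "overrides this behaviour" is read as meaning that an explicit `SSL_set1_poll_method(..., NULL)` stops inheritance from the `SSL_CTX` and selects the default poll method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not OpenSSL's internal layout. */
typedef struct {
    const char *name;
} POLL_METHOD;

/* Stand-in for the method returned by SSL_get0_default_poll_method(NULL). */
static POLL_METHOD default_method = { "default" };

typedef struct {
    POLL_METHOD *poll_method;   /* as set by SSL_CTX_set1_poll_method */
} CTX;

typedef struct {
    const CTX *ctx;             /* the SSL_CTX the object was created from */
    int poll_method_set;        /* was SSL_set1_poll_method ever called? */
    POLL_METHOD *poll_method;   /* value passed to SSL_set1_poll_method */
} OBJ;

/* Models SSL_set1_poll_method: any call, even with NULL, ends inheritance. */
static void obj_set_poll_method(OBJ *obj, POLL_METHOD *method)
{
    assert(obj != NULL);
    obj->poll_method_set = 1;
    obj->poll_method = method;
}

/* The method actually used for internal blocking. */
static POLL_METHOD *obj_effective_poll_method(const OBJ *obj)
{
    /* Read the SSL_CTX value live, so later changes to it are picked up. */
    POLL_METHOD *m = obj->poll_method_set ? obj->poll_method
                                          : obj->ctx->poll_method;

    /* NULL means "use the default poll method". */
    return m != NULL ? m : &default_method;
}
```

Reading the `SSL_CTX` field at lookup time, rather than copying it when the object is created, is what makes later changes to the `SSL_CTX` visible to objects that never set their own method.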

Because the poll method provided is used to handle blocking on network I/O, a
poll method provided in this context only needs to handle OS socket handles,
similar to our own reactor polling in QUIC MVP.

### External Polling: Usage over SSL Objects

An application can also use an `SSL_POLL_METHOD` itself, whether via the
immediate or retained mode. In the latter case it creates one or more
`SSL_POLL_GROUP` instances.

Custom pollers are responsible for their own translation arrangements.
Retained-mode usage can be more efficient because the translation of registered
objects into implementation-specific polling data can be staged recursively
once and cached, rather than being redone on every poll call. For example,
suppose an application enrolls a QCSO and two subsidiary QSSOs in a poll group.
The reduction of these three objects to a single pair of read/write BIO poll
descriptors, as provided by an SSL object, can be cached.

### Future Adaptation to Internal Pollable Resources

Suppose that in the future our QUIC implementation becomes more sophisticated
and we want to use a different kind of pollable resource to mask a more
elaborate internal reactor. For example, suppose we want to switch to an
internal thread-based reactor design, and signal readiness not via an OS socket
handle but via a condition variable or a Linux-style `eventfd`.

Our design would hold up under these conditions as follows:

- For condition variables this would require a new poll descriptor type.
  Our default poller could be amended to support this new poll descriptor type.
  However, most OSes do not provide a way to simultaneously wait on a condition
  variable and other resources, so there are issues here unless an additional
  thread is used to adapt socket readiness to a condition variable.

- For something like `eventfd` things will work well with the existing `SOCK_FD`
  type. A QUIC SSL object simply starts returning an eventfd file descriptor
  from `BIO_get_rpoll_descriptor`, and this becomes readable when signalled by
  our internal engine. `BIO_get_wpoll_descriptor` works in the same way. (Of
  course a change on this level would probably require some sort of
  application opt-in via our API.)

- For something like Win32 Events, `WaitForSingleObject` or
  `WaitForMultipleObjects` works, but would require a new poll descriptor type.
  It is possible to plumb socket readiness into this API also, assuming Vista
  (WSAEventSelect).
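
The `eventfd` case can be illustrated with a Linux-only sketch. The `reactor_notifier_*` helpers are hypothetical names for what an internal engine might do; the point is that `fd_is_readable`, which is all a poller supporting `SOCK_FD` descriptors needs to do, works on the eventfd unchanged.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical: create the fd an internal reactor could hand out. */
static int reactor_notifier_new(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Hypothetical: the internal engine signals readiness. */
static int reactor_notifier_signal(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one);
}

/* Hypothetical: consume the signal so the fd is no longer readable. */
static int reactor_notifier_drain(int efd)
{
    uint64_t count;

    return read(efd, &count, sizeof(count)) == (ssize_t)sizeof(count);
}

/* What any poller that understands SOCK_FD already does: poll() the fd. */
static int fd_is_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    assert(fd >= 0);
    return poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN) != 0;
}
```

An eventfd is readable whenever its counter is nonzero, and a read resets the counter, so it behaves like a level-triggered readiness flag from the poller's point of view.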

Worked Examples
---------------

### Internal Polling — Default Poll Method

- Application creates a new QCSO
- Application does not set a custom poll method on it
- Application uses it in blocking mode and sets network BIOs
- Our QUIC implementation requests poll descriptors from the network BIOs
- Our QUIC implementation asks the default poller if it understands
  how to poll those poll descriptors. If not, blocking cannot be supported.
- When it needs to block, our QUIC implementation uses the default poll method
  in either immediate or retained mode based on the poll descriptors reported by
  the network BIOs provided

### Internal Polling — Custom Poll Method

- Application instantiates a custom poll method
- Application creates a new QCSO
- Application sets the custom poll method on the QCSO
- Application configures the QCSO for blocking mode and sets network BIOs
- Our QUIC implementation requests poll descriptors from the network BIOs
- Our QUIC implementation asks the custom poll method if it understands how to
  poll those poll descriptors. If not, blocking cannot be supported.
- When it needs to block, our QUIC implementation uses the custom poll method
  in either immediate or retained mode based on the poll descriptors reported
  by the network BIOs provided (internal polling)

### External Polling — Immediate Mode

- Application gets a poll method (default or custom)
- Application invokes poll() on the poll method on some number of QLSOs, QCSOs,
  QSSOs and OS sockets, etc.
- The poll method performs translation to a set of OS resources.
- The poll method asks the OS to poll/block.
- The poll method examines the results reported from the OS and performs reverse
  translation.
- The poll method poll() call reports the results and returns.

Note that custom poller methods configured on an SSL object are used for
internal polling (blocking API calls) only. Thus they have no effect on the
above scenario.
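
The translate, poll and reverse-translate sequence of immediate mode can be sketched with POSIX `poll()`. `POLL_ITEM` and `immediate_poll` are hypothetical; each item is assumed to have already been reduced to a single OS descriptor (for a QUIC SSL object, the socket behind its poll descriptor), and the translation is redone on every call.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>   /* pipe(), write(), close() in the usage example */

/* Hypothetical poll item, already reduced to a single OS descriptor. */
typedef struct {
    int fd;          /* translated OS resource */
    short events;    /* requested: POLLIN and/or POLLOUT */
    short revents;   /* reported: filled in by reverse translation */
} POLL_ITEM;

/*
 * Immediate-mode poll: translate every item on every call, block in the OS,
 * then map the OS results back onto the items. Returns the number of ready
 * items, or -1 on error.
 */
static int immediate_poll(POLL_ITEM *items, size_t n, int timeout_ms)
{
    struct pollfd *pfds;
    size_t i;
    int rc, ready = 0;

    assert(items != NULL || n == 0);
    if (n == 0)
        return 0;                       /* nothing to poll */

    pfds = calloc(n, sizeof(*pfds));
    if (pfds == NULL)
        return -1;

    /* Translation: item -> OS resource. */
    for (i = 0; i < n; ++i) {
        pfds[i].fd = items[i].fd;
        pfds[i].events = items[i].events;
    }

    /* Ask the OS to poll/block. */
    rc = poll(pfds, (nfds_t)n, timeout_ms);

    /* Reverse translation: OS result -> item. */
    if (rc >= 0) {
        for (i = 0; i < n; ++i) {
            items[i].revents = pfds[i].revents;
            if (pfds[i].revents != 0)
                ++ready;
        }
    }

    free(pfds);
    return rc < 0 ? -1 : ready;
}
```

Rebuilding the `pollfd` array on every call is cheap for a handful of descriptors, but its cost grows with the number of items polled; this is the overhead that retained mode is meant to avoid.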

### External Polling — Retained Mode

- Application gets a poll method (default or custom)
- Application uses the poll method to create a poll group
- Application registers some number of QLSOs, QCSOs, QSSOs and OS sockets, etc.
  in the poll group.
- The poll group caches translations to a set of OS resources. It may create
  an OS device for fast polling (e.g. epoll) and register these resources
  with that device.
- Application polls using the poll group.
- The poll group asks the OS to poll/block.
- The poll group examines the results reported from the OS and performs reverse
  translation.
- The poll group reports the results and returns.
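
On Linux, the retained-mode steps map naturally onto `epoll`: translation happens once at registration, and the kernel hands back a pointer to the registered item, so reverse translation is a lookup rather than a scan. `POLL_GROUP`, `GROUP_ITEM` and the `poll_group_*` functions are hypothetical names for this sketch; the proposed `SSL_POLL_GROUP` API is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16

/* Hypothetical registered item; fd is its cached translation. */
typedef struct {
    int fd;
    unsigned int events;    /* requested: EPOLLIN and/or EPOLLOUT */
    unsigned int revents;   /* reported by poll_group_poll() */
} GROUP_ITEM;

/* Hypothetical retained-mode poll group backed by an epoll instance. */
typedef struct {
    int epfd;
} POLL_GROUP;

static int poll_group_init(POLL_GROUP *pg)
{
    pg->epfd = epoll_create1(EPOLL_CLOEXEC);
    return pg->epfd >= 0;
}

/* Registration: translate once and register the result with the OS device. */
static int poll_group_add(POLL_GROUP *pg, GROUP_ITEM *item)
{
    struct epoll_event ev = { .events = item->events, .data.ptr = item };

    item->revents = 0;
    return epoll_ctl(pg->epfd, EPOLL_CTL_ADD, item->fd, &ev) == 0;
}

/* Deregistration: must happen before the item's fd is closed. */
static int poll_group_remove(POLL_GROUP *pg, GROUP_ITEM *item)
{
    return epoll_ctl(pg->epfd, EPOLL_CTL_DEL, item->fd, NULL) == 0;
}

/*
 * Poll: no per-call translation. The kernel reports only ready items and
 * data.ptr performs the reverse translation. Writes up to max_ready item
 * pointers to ready[] and returns their count, or -1 on error.
 */
static int poll_group_poll(POLL_GROUP *pg, GROUP_ITEM **ready, int max_ready,
                           int timeout_ms)
{
    struct epoll_event evs[MAX_EVENTS];
    int i, n;

    assert(max_ready > 0);
    if (max_ready > MAX_EVENTS)
        max_ready = MAX_EVENTS;

    n = epoll_wait(pg->epfd, evs, max_ready, timeout_ms);
    for (i = 0; i < n; ++i) {
        ready[i] = evs[i].data.ptr;
        ready[i]->revents = evs[i].events;
    }
    return n;
}

static void poll_group_cleanup(POLL_GROUP *pg)
{
    close(pg->epfd);
}
```

The trade-off is that the registration state held by the kernel must be kept in sync with the registered objects, which is why the document needs a way to learn when BIOs change or objects are freed.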

### External Polling — Immediate Mode Without Event Handling

- Application gets a poll method (default or custom)
- Application invokes poll() on the poll method on some number of QLSOs, QCSOs,
  and QSSOs with `NO_HANDLE_EVENTS` set.
- If the poll method is the default poll method, it knows how to examine
  QUIC SSL objects for readiness and does so.
- If the poll method is a custom poll method, it could choose to subdelegate
  this work to the default poll method, or implement it itself.

Change Notification Callback Mechanism
--------------------------------------

We propose to allow applications and libssl code to register callbacks for
lifecycle events on SSL objects, as discussed above. This can be used both by us
and by applications (e.g. to implement custom poller methods). The advantage
here is that an SSL object registered into a poll group can be automatically
unregistered from that poll group when it is freed.

The proposed API is as follows:

```c
/*
 * The SSL object is about to be freed (the refcount has reached zero).
 * The SSL object is still completely healthy until this call returns.
 * If a reference to the SSL object is taken during a callback, the freeing is
 * cancelled, and the callback then has full responsibility for its lifecycle.
 */
#define SSL_LIFECYCLE_EVENT_TYPE_PRE_FREE       1

/*
 * Either the read or write network BIO on an SSL object has just been changed,
 * or both. The fields in data.bio_change specify the old and new BIO pointers.
 * If a BIO reference is being set to NULL on an SSL object, the 'new' pointer
 * will be NULL; conversely, if a BIO is being set on an SSL object where
 * previously no BIO was set, the 'old' pointer will be NULL. If the applicable
 * flag (R or W) is not set, the old and new fields will be set to NULL.
 */
#define SSL_LIFECYCLE_EVENT_TYPE_BIO_CHANGE     2

#define SSL_LIFECYCLE_EVENT_FLAG_R              (1U << 0) /* read BIO changed */
#define SSL_LIFECYCLE_EVENT_FLAG_W              (1U << 1) /* write BIO changed */

typedef struct ssl_lifecycle_event_st SSL_LIFECYCLE_EVENT;
typedef struct ssl_lifecycle_cb_cookie_st *SSL_LIFECYCLE_CB_COOKIE;

/* Returns SSL_LIFECYCLE_EVENT_TYPE */
uint32_t SSL_LIFECYCLE_EVENT_get_type(const SSL_LIFECYCLE_EVENT *event);

/* Returns SSL_LIFECYCLE_EVENT_FLAG */
uint32_t SSL_LIFECYCLE_EVENT_get_flags(const SSL_LIFECYCLE_EVENT *event);

/* Returns an SSL object associated with the event (if applicable) */
SSL *SSL_LIFECYCLE_EVENT_get0_ssl(const SSL_LIFECYCLE_EVENT *event);

/*
 * For a BIO_CHANGE event, fills the passed pointers if non-NULL with the
 * applicable values. For other event types, fails.
 */
int SSL_LIFECYCLE_EVENT_get0_bios(const SSL_LIFECYCLE_EVENT *event,
                                  BIO **r_old, BIO **r_new,
                                  BIO **w_old, BIO **w_new);

/*
 * Register a lifecycle callback. Multiple lifecycle callbacks may be
 * registered. *cookie is written with an opaque value which may be used to
 * subsequently unregister the callback.
 */
int SSL_register_lifecycle_callback(SSL *ssl,
                                    void (*cb)(const SSL_LIFECYCLE_EVENT *event,
                                               void *arg),
                                    void *arg,
                                    SSL_LIFECYCLE_CB_COOKIE *cookie);

int SSL_unregister_lifecycle_callback(SSL *ssl, SSL_LIFECYCLE_CB_COOKIE cookie);
```
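
The following is a simplified model of how a poll group might use these callbacks to invalidate a cached translation on `BIO_CHANGE` and to deregister an object on `PRE_FREE`. All names are hypothetical stand-ins: the event type values mirror the `SSL_LIFECYCLE_EVENT_TYPE_*` constants above, the cookie is simply the registry node, and the cancellation of freeing on re-reference is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirror SSL_LIFECYCLE_EVENT_TYPE_PRE_FREE and _BIO_CHANGE above. */
#define EV_PRE_FREE   1
#define EV_BIO_CHANGE 2

typedef void (*lifecycle_cb)(uint32_t type, void *obj, void *arg);

/* Registry node; a pointer to it doubles as the opaque cookie. */
typedef struct cb_node_st {
    lifecycle_cb cb;
    void *arg;
    struct cb_node_st *next;
} CB_NODE;

/* Hypothetical stand-in for an SSL object. */
typedef struct {
    CB_NODE *callbacks;
} OBJECT;

static int register_cb(OBJECT *obj, lifecycle_cb cb, void *arg,
                       CB_NODE **cookie)
{
    CB_NODE *n = malloc(sizeof(*n));

    if (n == NULL)
        return 0;
    n->cb = cb;
    n->arg = arg;
    n->next = obj->callbacks;
    obj->callbacks = n;
    *cookie = n;
    return 1;
}

static int unregister_cb(OBJECT *obj, CB_NODE *cookie)
{
    CB_NODE **pp;

    for (pp = &obj->callbacks; *pp != NULL; pp = &(*pp)->next)
        if (*pp == cookie) {
            *pp = cookie->next;
            free(cookie);
            return 1;
        }
    return 0;
}

/* Deliver an event; a callback may unregister itself (but not others). */
static void fire(OBJECT *obj, uint32_t type)
{
    CB_NODE *n, *next;

    for (n = obj->callbacks; n != NULL; n = next) {
        next = n->next;
        n->cb(type, obj, n->arg);
    }
}

/* A poll group tracking one registered object, for brevity. */
typedef struct {
    OBJECT *registered;
    CB_NODE *cookie;
    int cache_valid;        /* cached translation of the object's BIOs */
} GROUP;

static void group_on_lifecycle(uint32_t type, void *obj, void *arg)
{
    GROUP *g = arg;

    if (type == EV_BIO_CHANGE) {
        g->cache_valid = 0;             /* re-translate on next poll */
    } else if (type == EV_PRE_FREE && g->registered == obj) {
        unregister_cb(g->registered, g->cookie);
        g->registered = NULL;           /* automatic deregistration */
        g->cookie = NULL;
    }
}

/* Models freeing the object once its refcount reaches zero. */
static void object_destroy(OBJECT *obj)
{
    fire(obj, EV_PRE_FREE);
    while (obj->callbacks != NULL)      /* drop any remaining registrations */
        unregister_cb(obj, obj->callbacks);
}
```

Because the poll group learns about the free before it happens, it never holds a dangling pointer, which is the property the Q&A below relies on.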

Q&A
---

**Q. How do we support poll methods which only support immediate mode?**

A. We simply have a fallback path for this when our QUIC implementation consumes
a custom poller. This is easy enough.

**Q. How do we support poll methods which only support retained mode?**

A. We intend to implement support for retained mode in our QUIC implementation's
internal blocking code, so this should also work OK. Remember that an external
poller method does not interact with an internal poller method (i.e., a method
set on an SSL object). In particular, no two poller methods ever interact
directly with one another. This avoids the need for recursive state shadowing
(where one poll method's retained mode API maintains state and also makes calls
to another poll method's retained mode API).

**Q. How does this design interact with hierarchical polling?**

A. We assume an application uses its own polling arrangements initially and then
uses calls to an OpenSSL external polling API (such as `SSL_poll` or a poll
method) to drill down into what is actually ready, as discussed above. There is
no issue here. An application can also use OpenSSL polling APIs instead of its
own, if desired; for example it could create a poll group from the default poll
method and use it to poll only network sockets, some of which may be from QUIC
SSL object poll descriptors, and then if needed call `SSL_poll` to narrow things
down once something becomes ready.

**Q. Should we support immediate and retained mode in the same API or segregate
these?**

A. They are in the same API, though we let applications use capability bits
to report support for only one of these if they wish.

**Q. How do we support extensibility of the poller interface?**

A. Using an extensible function table. An application can set a function
pointer to NULL if it does not support the corresponding operation. Capability
flags are used to advertise what is supported.

**Q. If an application sets a poll method on both an event leader and a poll
group, what happens?**

A. Setting a poll method on an event leader provides a mechanism used for
internal blocking when making blocking calls. It is currently never used if no
QUIC SSL object in the QUIC domain is used in blocking mode (though this isn't a
contractual guarantee; in future we might use it to quickly identify what needs
handling if we come to handle multiple OS-level sockets).

Setting a poll method on a poll group provides a mechanism used for polling
using that poll group. Note that a custom poll method configured on an SSL
object is **not** used for the translation process performed by a poll group,
even when polling that SSL object. Translation is driven by
`SSL_get_[rw]poll_descriptor`.

**Q. What if different poll methods are configured on different event leaders
(QUIC domains) and an application then tries to poll them all?**

A. Because the poll method configured on an event leader is ignored in favour of
the poll method directly invoked, there is no conflict here. The poll method
handles all polling when it is specifically invoked.

**Q. Where should the responsibility for poll descriptor translation lie?**

A. With the poll method or poll group being called at the time.

**Q. What method does `SSL_poll` use?**

A. It uses the default poll method. If an application wishes to use a different
poll method, it can call `SSL_POLL_METHOD_poll` on that `SSL_POLL_METHOD`
directly.

**Q. An application creates a poll group, registers an SSL object and later
changes the network BIOs set on that SSL object, or changes the poll descriptors
they return. What happens?**

A. This is solved with two design aspects:

- The poll descriptors returned by a BIO are not allowed to change silently. If
  an application wishes to change them, it must call `SSL_set_bio` again, even
  with the same BIOs already set.

- We will need to either:

    - have a callback registration interface so retained mode pollers
      which have performed cached translation can be notified that a poll
      descriptor they have relied on is changing (proposed above).

    - require retained mode pollers to check for changes to translated objects
      (less efficient).

      This might cause issues with epoll because we don't have an opportunity
      to deregister an FD in this case.

  We choose the first option.

**Q. An application creates a poll group, registers a QCSO and some subsidiary
QSSOs and later frees all of these objects. What happens? (In other words, are
SSL objects auto-deregistered from poll groups?)**

A. We must assume a poll group retains an SSL object pointer if such an object
has been registered with it. Thus our options are either:

- require applications to deregister objects from any poll group they are using
  prior to freeing them; or

- add internal callback registration machinery to QUIC SSL objects so we can
  get a cleanup notification (see the above callback mechanism).

We choose the latter.

**Q. An application creates a poll group, registers a (non-QUIC-related) OS
socket handle and then closes it. What happens?**

A. Since OSes in general do not provide a way to get notified of these closures,
it is not really possible to handle this automatically. It is essential that an
application deregisters the handle from the poll group first.

**Q. How does code using a poll method determine what poll descriptors that
method supports?**

A. A query method, `SSL_POLL_METHOD_supports_poll_descriptor`, is provided which
can be used to determine if the method supports a given descriptor.

Windows support
---------------

Windows customarily poses a number of issues for supporting polling APIs. This
is largely because Windows chose an approach based around I/O *completion*
notification rather than around I/O *readiness* notification. While an
implementation of the Berkeley select(2)-style API is available, the options for
higher performance polling are largely confined to using I/O completion ports.

Because the semantics of I/O readiness and I/O completion are very different, it
has proven impossible in practice to create an I/O readiness API as an
abstraction over Windows's I/O completion API. The converse is not true; it is
fairly easy to create an I/O completion notification API over an I/O readiness
API.

It is therefore prudent to give some consideration to how Windows can be
supported:

1. We can always use `select` (or on Vista and later, `WSAPoll`).
   These scale poorly with the number of sockets, but this may not actually be
   much of a problem, as even in a server role, with QUIC we are likely to be
   handling a lot of clients on a relatively small number of OS sockets.

2. `WSAAsyncSelect` could be used with a helper thread. One thread could service
   multiple sockets, possibly even multiple poll groups.

3. `WSAEventSelect` allows a Win32 Event to be signalled on readiness,
   but this is not very useful because `WaitForMultipleObjects` is limited to 64
   objects (and even if it weren't, it would pose the same issues as `select`,
   so one is back where one started).

4. I/O Completion Ports are the “official” way to do high-performance I/O
   but notify on completion rather than readiness. It is impossible to build
   a poller API on top of this as such. As mentioned above, nobody has ever
   really managed to do so successfully.

5. `IOCTL_AFD_POLL`. This is an undocumented function of Winsock internals
   which allows epoll/kqueue-style interfaces to be built over Winsock which
   a) are highly performant, like epoll/kqueue, and b) use IOCPs to signal
   *readiness* rather than *completion*. In fact, this is what the `select`
   and `WSAPoll` functions use internally. Unlike those functions, this is
   based around registering sockets in advance and submits readiness
   notifications to an IOCP, so this can be quite performant.

   `IOCTL_AFD_POLL` is an internal, undocumented API. It is however widely used,
   and is now the basis of libuv (the I/O library used by Node.js), ZeroMQ, and
   Rust's entire asynchronous I/O ecosystem on Windows. In other words, while
   officially being undocumented and internal, it has in practice become widely
   used by third-party software, to the point where it cannot really be changed
   in future without breaking massive amounts of software. `IOCTL_AFD_POLL` has
   been around since at least NT 4 and is supported by Wine. Moreover it is
   worth noting that the reason why so many projects have resorted to using this
   API on Windows is due to the sheer lack of anything providing the appropriate
   functionality in the public API. The high level of reliance on this
   functionality in contemporary software doing asynchronous I/O does give
   reasonable confidence in using this API.

An immediate mode interface can be implemented using option 1.

Based on the above, options 1, 2 and 5 are viable for implementation of a
retained mode interface, with option 2 being a fairly substantial hack and
option 5 being the preferred approach for projects wanting an epoll/kqueue-style
model on Windows. The suggested approach is therefore to implement option 5,
though option 1 is also a viable fallback.

In any case, it appears the poller API as designed and proposed above
can be implemented adequately on Windows.

Extra features on QUIC objects
------------------------------

These are unlikely to be implemented initially; this is just some exploration
of features we might want to offer in future and how they would interact with
the polling design.

### Low-watermark functionality

Sometimes an application knows it does not need to do anything until at least N
bytes are available to read or write. In conventional Berkeley sockets APIs this
is known as “low-watermark” (LOWAT) functionality.

Rather than making polling interfaces more convoluted by adding fields to
polling-related structures, we propose to add a knob which can be configured on
an individual QUIC stream:

```c
#define SSL_LOWAT_FLAG_ONESHOT     (1U << 0)

int SSL_set_read_lowat(SSL *ssl, size_t lowat, uint64_t flags);
int SSL_get_read_lowat(SSL *ssl, size_t *lowat);

int SSL_set_write_lowat(SSL *ssl, size_t lowat, uint64_t flags);
int SSL_get_write_lowat(SSL *ssl, size_t *lowat);
```

If `ONESHOT` is set, the low-watermark condition is automatically cleared
after the next call to a read or write function respectively. The low-watermark
condition can also be cleared by passing a low-watermark of 0.

If low-watermark mode is configured, a poller will not report a stream as having
data ready to read, or room to write data, if the amount of data available to
read (or room available to write) is less than the configured watermark.
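
The read-side rule, together with `ONESHOT` clearing, can be sketched as follows. `STREAM` and its helpers are hypothetical stand-ins for a QSSO; only the read direction is shown, and other conditions a poller would also report (such as end of stream or a reset) are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors SSL_LOWAT_FLAG_ONESHOT from the proposal above. */
#define LOWAT_FLAG_ONESHOT (UINT64_C(1) << 0)

/* Hypothetical read-side state of a QUIC stream. */
typedef struct {
    size_t read_avail;          /* bytes buffered and ready to read */
    size_t read_lowat;          /* 0 means no low-watermark condition */
    uint64_t read_lowat_flags;
} STREAM;

/* Models SSL_set_read_lowat; a low-watermark of 0 clears the condition. */
static void stream_set_read_lowat(STREAM *s, size_t lowat, uint64_t flags)
{
    assert(s != NULL);
    s->read_lowat = lowat;
    s->read_lowat_flags = lowat != 0 ? flags : 0;
}

/* What a poller would report for read readiness. */
static int stream_poll_readable(const STREAM *s)
{
    if (s->read_avail == 0)
        return 0;
    return s->read_avail >= s->read_lowat;
}

/* Models a read of up to len bytes; ONESHOT clears the condition after it. */
static size_t stream_read(STREAM *s, size_t len)
{
    size_t n = len < s->read_avail ? len : s->read_avail;

    s->read_avail -= n;
    if ((s->read_lowat_flags & LOWAT_FLAG_ONESHOT) != 0)
        stream_set_read_lowat(s, 0, 0);
    return n;
}
```

Keeping the watermark as per-stream state means the poller's readiness test stays a single comparison, with no extra fields needed in the polling structures.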

### Timeouts

It is desirable to be able to cause blocking I/O operations to time out. For
example, an application might want to perform a blocking read from a peer but
only wait for a certain amount of time.

We support this with a configurable timeout for each type of operation.

```c
/* All operations - defined as separate bit for forward ABI compatibility */
#define SSL_OP_CLASS_ALL        (1U << 0)
/* The timeout concerns reads. */
#define SSL_OP_CLASS_R          (1U << 1)
/* The timeout concerns writes. */
#define SSL_OP_CLASS_W          (1U << 2)
/* The timeout concerns accepts. */
#define SSL_OP_CLASS_A          (1U << 3)
/* The timeout concerns new stream creation (which may be blocked on FC). */
#define SSL_OP_CLASS_N          (1U << 4)
/* The timeout concerns connects. */
#define SSL_OP_CLASS_C          (1U << 5)

/*
 * If set, t is a deadline (absolute time), otherwise it is a duration which
 * starts whenever an operation is commenced.
 */
#define SSL_TIMEOUT_FLAG_DEADLINE    (1U << 0)

/*
 * Configure a timeout for one or more operation types. At least one operation
 * type must be specified. If t is NULL, the timeout is unset for the given
 * operation types. This may be called multiple times to set different timeouts
 * for different operations.
 */
int SSL_set_io_timeout(SSL *ssl, uint64_t operation,
                       const struct timeval *t, uint64_t flags);

/*
 * Retrieves a configured timeout value. operation must be a single operation
 * flag from SSL_OP_CLASS_*. If a timeout is configured for the operation
 * type, *is_set is written as 1 and *t is written with the configured timeout.
 * *flags is written with SSL_TIMEOUT_FLAG_DEADLINE or 0 as applicable.
 * Otherwise, *is_set is written as 0, the value of *t is undefined and *flags
 * is set to 0. Returns 1 on success (including if unset) and 0 on failure (for
 * example if called on an unsupported SSL object type).
 */
int SSL_get_io_timeout(SSL *ssl, uint64_t operation,
                       struct timeval *t, int *is_set,
                       uint64_t *flags);

/*
 * Returns 1 if the last invocation of an applicable operation specified by
 * operation failed due to a timeout.
 *
 * For SSL_OP_CLASS_R, this means SSL_read or SSL_read_ex.
 * For SSL_OP_CLASS_W, this means SSL_write or SSL_write_ex.
 * For SSL_OP_CLASS_A, this means SSL_accept_stream.
 * For SSL_OP_CLASS_N, this means SSL_new_stream.
 * For SSL_OP_CLASS_C, this means SSL_do_handshake or any
 *   function which implicitly calls it, which includes any other I/O function
 *   if the connection process has not been completed yet.
 *
 * If a function is called in non-blocking mode and it cannot execute
 * immediately, this is considered to be a timeout. Therefore while timeouts are
 * not useful in non-blocking mode, this function can be used to determine if a
 * function failed because it would otherwise block.
 *
 * Invoking any operation of a given operation class clears the timeout flag
 * for that operation class regardless of the outcome of that operation.
 */
int SSL_timed_out(SSL *ssl, uint64_t operation);
```
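
The difference between a duration and a deadline (`SSL_TIMEOUT_FLAG_DEADLINE`) can be illustrated by the conversion an implementation might perform when an operation starts. `IO_TIMEOUT`, `effective_deadline` and `deadline_expired` are hypothetical helpers for this sketch, not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Mirrors SSL_TIMEOUT_FLAG_DEADLINE from the proposal above. */
#define TIMEOUT_FLAG_DEADLINE (UINT64_C(1) << 0)

/* Hypothetical per-operation-class timeout slot. */
typedef struct {
    struct timeval t;   /* duration, or absolute time if DEADLINE is set */
    uint64_t flags;
} IO_TIMEOUT;

/* Convert the configured timeout into an absolute deadline at op start. */
static struct timeval effective_deadline(const IO_TIMEOUT *to,
                                         struct timeval op_start)
{
    struct timeval d;

    assert(to->t.tv_usec >= 0 && to->t.tv_usec < 1000000);

    if ((to->flags & TIMEOUT_FLAG_DEADLINE) != 0)
        return to->t;                       /* already absolute */

    d.tv_sec = op_start.tv_sec + to->t.tv_sec;
    d.tv_usec = op_start.tv_usec + to->t.tv_usec;
    if (d.tv_usec >= 1000000) {             /* carry into seconds */
        d.tv_sec += 1;
        d.tv_usec -= 1000000;
    }
    return d;
}

/* Nonzero once now has reached the deadline. */
static int deadline_expired(struct timeval deadline, struct timeval now)
{
    return now.tv_sec > deadline.tv_sec
           || (now.tv_sec == deadline.tv_sec
               && now.tv_usec >= deadline.tv_usec);
}
```

Converting to an absolute deadline once at the start of an operation means a blocking loop that wakes up several times does not extend the timeout on each wakeup.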

We could consider adding a new `SSL_get_error` code also (`SSL_ERROR_TIMEOUT`).
There are no compatibility issues here because it will only be returned if an
application chooses to use the timeout functionality.

TODO: Check for duplicate existing APIs

TODO: Consider using ctrls

### Autotick control

We automatically engage in event handling when an I/O function such as
`SSL_read`, `SSL_write`, `SSL_accept_stream` or `SSL_new_stream` is called.
This is likely to be undesirable for applications in many circumstances,
so we should have a way to inhibit this.

```c
#define SSL_EVENT_FLAG_INHIBIT      (1U << 0)
#define SSL_EVENT_FLAG_INHIBIT_ONCE (1U << 1)

/*
 * operation is one or more SSL_OP_CLASS values. Inhibition can be enabled for a
 * single future call to an operation of that type (INHIBIT_ONCE), after which
 * it is disabled, or enabled persistently (INHIBIT).
 */
int SSL_set_event_flags(SSL *ssl, uint64_t operation, uint64_t flags);

/*
 * operation must specify a single operation. The flags configured are reported
 * in *flags.
 */
int SSL_get_event_flags(SSL *ssl, uint64_t operation, uint64_t *flags);
```
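
How an I/O function might consult these flags before ticking can be sketched as below. `CONN`, `should_autotick` and the indexing of flags by operation-class bit position are assumptions for illustration. Whether `INHIBIT_ONCE` should be consumed by a call made in blocking mode is not specified above; this sketch leaves it untouched in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror SSL_EVENT_FLAG_INHIBIT and SSL_EVENT_FLAG_INHIBIT_ONCE above. */
#define EVENT_FLAG_INHIBIT      (UINT64_C(1) << 0)
#define EVENT_FLAG_INHIBIT_ONCE (UINT64_C(1) << 1)

#define NUM_OP_CLASSES 6    /* bits of SSL_OP_CLASS_ALL .. SSL_OP_CLASS_C */

/* Hypothetical per-object state. */
typedef struct {
    int blocking;
    uint64_t event_flags[NUM_OP_CLASSES];   /* indexed by op-class bit */
} CONN;

/*
 * Called on entry to an I/O function of the given operation class: returns
 * nonzero if the function should perform event handling first (as
 * SSL_handle_events would).
 */
static int should_autotick(CONN *c, unsigned int op_class_bit)
{
    uint64_t *f;

    assert(op_class_bit < NUM_OP_CLASSES);
    f = &c->event_flags[op_class_bit];

    if (c->blocking)
        return 1;                           /* inhibition ignored when blocking */

    if ((*f & EVENT_FLAG_INHIBIT_ONCE) != 0) {
        *f &= ~EVENT_FLAG_INHIBIT_ONCE;     /* consumed by this call */
        return 0;
    }
    return (*f & EVENT_FLAG_INHIBIT) == 0;
}
```

Clearing `INHIBIT_ONCE` inside the check itself keeps the "single future call" semantics in one place, so individual I/O functions need not track it.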

Autotick inhibition is only useful in non-blocking mode and it is ignored in
blocking mode. Using it in non-blocking mode carries the following implications:

- Data can be drained using `SSL_read` from existing buffers, but network I/O
  is not serviced and no new data will arrive (unless `SSL_handle_events` is
  called).

- Data can be placed into available write buffer space using `SSL_write`,
  but data will not be transmitted (unless `SSL_handle_events` is called).

- Likewise, no new incoming stream events will occur, and if calls to
  `SSL_new_stream` are currently blocked due to flow control, this
  situation will not change.

- `SSL_do_handshake` will simply report whether the handshake is done or not.