QUIC Polling API Design
=======================

- [QUIC Polling API Design](#quic-polling-api-design)
  * [Background](#background)
  * [Requirements](#requirements)
  * [Reflections on Past Mistakes in Poller Interface Design](#reflections-on-past-mistakes-in-poller-interface-design)
  * [Example Use Cases](#example-use-cases)
    + [Use Case A: Simple Blocking or Non-Blocking Application](#use-case-a--simple-blocking-or-non-blocking-application)
    + [Use Case B: Application-Controlled Hierarchical Polling](#use-case-b--application-controlled-hierarchical-polling)
  * [Use of Poll Descriptors](#use-of-poll-descriptors)
  * [Event Types and Representation](#event-types-and-representation)
  * [Designs](#designs)
    + [Sketch A: One-Shot/Immediate Mode API](#sketch-a--one-shot-immediate-mode-api)
    + [Sketch B: Registered/Retained Mode API](#sketch-b--registered-retained-mode-api)
      - [Use Case Examples](#use-case-examples)
  * [Proposal](#proposal)
  * [Custom Poller Methods](#custom-poller-methods)
    + [Translation](#translation)
    + [Custom Poller Methods API](#custom-poller-methods-api)
    + [Internal Polling: Usage within SSL Objects](#internal-polling--usage-within-ssl-objects)
    + [External Polling: Usage over SSL Objects](#external-polling--usage-over-ssl-objects)
    + [Future Adaptation to Internal Pollable Resources](#future-adaptation-to-internal-pollable-resources)
  * [Worked Examples](#worked-examples)
    + [Internal Polling — Default Poll Method](#internal-polling---default-poll-method)
    + [Internal Polling — Custom Poll Method](#internal-polling---custom-poll-method)
    + [External Polling — Immediate Mode](#external-polling---immediate-mode)
    + [External Polling — Retained Mode](#external-polling---retained-mode)
    + [External Polling — Immediate Mode Without Event Handling](#external-polling---immediate-mode-without-event-handling)
  * [Change Notification Callback Mechanism](#change-notification-callback-mechanism)
  * [Q&A](#q-a)
  * [Windows support](#windows-support)
  * [Extra features on QUIC objects](#extra-features-on-quic-objects)
    + [Low-watermark functionality](#low-watermark-functionality)
    + [Timeouts](#timeouts)
    + [Autotick control](#autotick-control)

Background
----------

An application can create multiple QLSOs (see the [server API design
document](quic-server-api.md)), each bound to a single read/write network BIO
pair. Therefore an application needs to be able to poll:

- a QLSO for new incoming connection events;
- a QCSO for new incoming stream events;
- a QCSO for new incoming datagram events (when we support the datagram
  extension);
- a QCSO for stream creatability events;
- a QCSO for new connection error events;
- a QSSO (or QCSO with a default stream attached) for readability events;
- a QSSO (or QCSO with a default stream attached) for writeability events;
- non-OpenSSL objects, such as OS socket handles.

Observations:

- There are a large number of event types an application might want to poll on.

- There are different object types we might want to poll on.

- These object types are currently all SSL objects, though we should not assume
  that this will always be the case.

- The nature of a polling interface means that it must be possible to
  poll (i.e., block) on all desired objects in a single call. That is, polling
  cannot really be composed using multiple sequential calls. Thus, it must be
  possible for an application to request wakeup on the first of an arbitrary
  subset of any of the above kinds of events in a single polling call.

Requirements
------------

- **Universal cross-pollability.** Ability to poll on any combination of the above
  event types and pollable objects in a single poller call.

- **Support external polling.** An application must be able to be in control
  of its own polling if desired.
  This means no libssl code does any blocking I/O
  or poll(2)-style calls; the application handles all poll(2)-like calls to the
  OS. The application must thereafter be able to find out from us what QUIC
  objects are ready to be serviced.

- **Support internal polling.** Support a blocking poll(2)-like call provided
  by libssl for applications that want us to arrange OS polling.

- **Timeouts.** Support for optional timeouts.

- **Multi-threading.** The API must have semantics suitable for performant
  multi-threaded use, including for concurrent access to the same QUIC objects
  where supported by our API contract. This includes in particular
  avoidance of the thundering herd problem.

Desirable:

- Avoid needless impedance discontinuities with COTS polling interfaces (e.g.
  select(2), poll(2)).

- Efficient and performant design.

- Future extensibility.

Reflections on Past Mistakes in Poller Interface Design
-------------------------------------------------------

The deficiencies of select(2) are fairly evident and essentially attested to by
its replacement with poll(2) in POSIX operating systems. To the extent that
poll(2) has been replaced, it is largely due to the performance issues it poses
when evaluating large numbers of file descriptors. However, this design
is also unable to address the thundering herd problem, which we discuss
subsequently.

The replacements for poll(2) include Linux's epoll(2) and BSD's kqueue(2).

The design of Linux's epoll(2) interface in particular has often been noted to
contain a large number of design issues:

- It is designed to poll only FDs; this is probably a partial cause behind
  Linux's adaptation of everything into a FD (PIDs, signals, timers, eventfd,
  etc.)

- Events registered with epoll are associated with the underlying kernel
  object (file description), rather than a file descriptor; therefore events can
  still be received for a FD after the FD is closed(!) by a process, even
  quoting an incorrect FD in the reported events, unless a process takes care to
  unregister the FD prior to calling close(2).

- There are separate `EPOLL_CTL_ADD` and `EPOLL_CTL_MOD` calls which are needed
  to add a new FD registration and modify an existing FD registration, when
  most of the time what is desired is an “upsert” (update or insert) call. Thus
  callers have to track whether an FD has already been added or not.

- Only one FD can be registered, modified, or unregistered per syscall, rather
  than several FDs at once (syscall overhead).

- The design is poorly equipped to handle multithreaded use due to the
  thundering herd issue. If a single UDP datagram arrives and multiple threads
  are polling for such an event, only one of these threads should be woken up.

BSD's kqueue(2) has generally been regarded as a good, well thought out design,
and avoids most or all of these issues.

Example Use Cases
-----------------

Suppose there exists a hypothetical poll(2)-like API called `SSL_poll`. We
explore various possible use cases below:

### Use Case A: Simple Blocking or Non-Blocking Application

An application has two QCSOs open each with one QSSO. The QCSOs and QSSOs might
be in blocking or non-blocking mode. It wants to block until any of these have
data ready to read (or a connection error) and wants to know which SSL object is
ready and for what reason. It also wants to time out after 1 second.

```text
SSL_poll([qcso0, qcso1, qsso0, qsso1],
         [READ|ERR, READ|ERR, READ|ERR, READ|ERR], timeout=1sec)
    → (OK, [qcso0], [READ])
    | Timeout
```

### Use Case B: Application-Controlled Hierarchical Polling

An application has two QCSOs open each with one QSSO, all in non-blocking mode.
It wants to block until any of these have data ready to read (or a connection
error) and wants to know which SSL object is ready and for what reason, but also
wants to block until various other application-specific non-QUIC events occur.
As such, it wants to handle its own polling.

This usage pattern is supported via hierarchical polling:

- An application collects file descriptors and event flags to poll from our QUIC
  implementation, either by using `SSL_get_[rw]poll_descriptor` and
  `SSL_net_(read|write)_desired` on each QCSO and deduplicating the results, or
  using those calls on each QLSO. It also determines the QUIC event handling
  timeout using `SSL_get_event_timeout`.

- An application does its own polling and timeout handling.

- An application calls `SSL_handle_events` if the polling process indicated
  an event for either of the QUIC poll descriptors or the QUIC event handling
  timeout has expired. The call need be made only on an Event Leader but can
  be made on any QUIC SSL object in the hierarchy.

- An application calls `SSL_poll` similarly to the above example, but with
  timeout set to 0 (and possibly with some kind of `NO_HANDLE_EVENTS` flag). The
  purpose of this call is **not** to block but to narrow down what QUIC objects
  are now ready for servicing.

This demonstrates the principle of hierarchical polling, whereby an application
can do its own polling and then use a poller in a mode where it always returns
immediately to narrow things down to specific QUIC objects. This is necessary as
one QCSO may obviously service many QSSOs, etc.
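
The final narrowing step of this pattern can be modelled with a small
self-contained sketch. It does not use libssl: `TOY_ITEM`, `toy_poll_now`, the
`TOY_EVENT_*` flags and the `ready` table are hypothetical stand-ins chosen for
illustration only, and the real interface is designed later in this document.
The sketch shows a zero-timeout pass which never blocks and merely reports which
of the requested objects are ready and for what reason.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event flags for this sketch only (not libssl definitions). */
#define TOY_EVENT_READ (1U << 0)
#define TOY_EVENT_ERR  (1U << 1)

/* Hypothetical poll item: an object index plus requested/returned events. */
typedef struct toy_item_st {
    size_t   obj;       /* index of the QUIC object in the ready table */
    uint64_t events;    /* events the application asked about */
    uint64_t revents;   /* requested events found raised; set by the poller */
} TOY_ITEM;

/*
 * Zero-timeout narrowing pass: never blocks, performs no I/O, and only
 * reports readiness already recorded in ready[] (one event mask per object).
 * Returns the number of items with at least one requested event raised.
 */
static size_t toy_poll_now(TOY_ITEM *items, size_t num_items,
                           const uint64_t *ready, size_t num_objs)
{
    size_t i, count = 0;

    assert(num_items == 0 || (items != NULL && ready != NULL));

    for (i = 0; i < num_items; ++i) {
        /* Unknown objects are treated as having nothing raised. */
        uint64_t raised = items[i].obj < num_objs ? ready[items[i].obj] : 0;

        items[i].revents = raised & items[i].events;
        if (items[i].revents != 0)
            ++count;
    }

    return count;
}
```

In the flow described above, an application would run such a pass only after its
own OS-level poll has returned and `SSL_handle_events` has been called, so that
the recorded readiness is up to date.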

The requirement implied by this use case is:

- An application must be able to use our polling interface without blocking and
  without having `SSL_handle_events` or OS polling APIs be called, if desired.

Use of Poll Descriptors
-----------------------

As discussed in the [I/O Architecture Design Document](../quic-io-arch.md), the
notion of poll descriptors is used to provide an abstraction over arbitrary
pollable resources. A `BIO_POLL_DESCRIPTOR` is a tagged union structure which
can contain different kinds of handles.

This concept maps directly to our capacity for application-level polling of the
QUIC stack defined in this document, so it is used here. This creates a
consistent interface around polling.

To date, `BIO_POLL_DESCRIPTOR` structures have been used to contain an OS socket
file descriptor (`int` for POSIX, `SOCKET` for Win32), which can be used with
APIs such as `select(2)`. The tagged union structure is extended to support
specifying an SSL object pointer:

```c
#define BIO_POLL_DESCRIPTOR_TYPE_SSL    2 /* (SSL *) */

typedef struct bio_poll_descriptor_st {
    uint32_t type;
    union {
        ...
        SSL     *ssl;
    } value;
} BIO_POLL_DESCRIPTOR;
```

Event Types and Representation
------------------------------

Regardless of the API design chosen, event types can first be defined.

### Summary of Event Types

We define the following event types:

- **R (Readable):** There is application data available to be read.

- **W (Writable):** It is currently possible to write more application data.

- **ER (Exception on Read):** The receive part of a stream has been remotely
  reset via a `RESET_STREAM` frame.

- **EW (Exception on Write):** The send part of a stream has been remotely
  reset via a `STOP_SENDING` frame.

- **EC (Exception on Connection):** A connection has started terminating
  (Terminating or Terminated states).

- **EL (Exception on Listener):** A QUIC listener SSL object has failed,
  for example due to a permanent error on an underlying network BIO.

- **ECD (Exception on Connection Drained):** A connection has *finished*
  terminating (Terminated state).

- **IC (Incoming Connection):** There is at least one incoming connection
  available to be popped using `SSL_accept_connection()`.

- **ISB (Incoming Stream — Bidirectional):** There is at least one
  bidirectional stream incoming and available to be popped using
  `SSL_accept_stream()`.

- **ISU (Incoming Stream — Unidirectional):** There is at least one
  unidirectional stream incoming and available to be popped using
  `SSL_accept_stream()`.

- **OSB (Outgoing Stream — Bidirectional):** It is currently possible
  to create at least one additional bidirectional stream.

- **OSU (Outgoing Stream — Unidirectional):** It is currently possible
  to create at least one additional unidirectional stream.

- **F (Failure):** Identifies failure of the `SSL_poll()` mechanism itself.

While this is a fairly large number of event types, there are valid use cases
for all of these and reasons why they need to be separate from one another. The
following dialogue explores the various design considerations.

### General Principles

From our discussion below we derive some general principles:

- It is important to provide an adequate granularity of event types so as to
  ensure an application can avoid wakeups it doesn't want.

- Event types which are not given by a particular object are simply ignored
  if requested by the application and never raised, similar to `poll(2)`.

- While not all event masks may make sense (e.g. `R` but not `ER`), we do not
  seek to prescribe combinations at this time.
  This is dissimilar to `poll(2)`
  which makes some event types “mandatory”. We may evolve this in future.

- Exception events on some successfully polled resource are not the same as the
  failure of the `SSL_poll()` mechanism itself (`SSL_poll()` returning 0).

### Header File Definitions

```c
#define SSL_POLL_EVENT_NONE        0

/*
 * Fundamental Definitions
 * -----------------------
 */

/* F (Failure) */
#define SSL_POLL_EVENT_F           (1U <<  0)

/* EL (Exception on Listener) */
#define SSL_POLL_EVENT_EL          (1U <<  1)

/* EC (Exception on Connection) */
#define SSL_POLL_EVENT_EC          (1U <<  2)

/* ECD (Exception on Connection Drained) */
#define SSL_POLL_EVENT_ECD         (1U <<  3)

/* ER (Exception on Read) */
#define SSL_POLL_EVENT_ER          (1U <<  4)

/* EW (Exception on Write) */
#define SSL_POLL_EVENT_EW          (1U <<  5)

/* R (Readable) */
#define SSL_POLL_EVENT_R           (1U <<  6)

/* W (Writable) */
#define SSL_POLL_EVENT_W           (1U <<  7)

/* IC (Incoming Connection) */
#define SSL_POLL_EVENT_IC          (1U <<  8)

/* ISB (Incoming Stream: Bidirectional) */
#define SSL_POLL_EVENT_ISB         (1U <<  9)

/* ISU (Incoming Stream: Unidirectional) */
#define SSL_POLL_EVENT_ISU         (1U << 10)

/* OSB (Outgoing Stream: Bidirectional) */
#define SSL_POLL_EVENT_OSB         (1U << 11)

/* OSU (Outgoing Stream: Unidirectional) */
#define SSL_POLL_EVENT_OSU         (1U << 12)

/*
 * Composite Definitions
 * ---------------------
 */

/* Read/write. */
#define SSL_POLL_EVENT_RW          (SSL_POLL_EVENT_R | SSL_POLL_EVENT_W)

/* Read/write and associated exception event types. */
#define SSL_POLL_EVENT_RE          (SSL_POLL_EVENT_R | SSL_POLL_EVENT_ER)
#define SSL_POLL_EVENT_WE          (SSL_POLL_EVENT_W | SSL_POLL_EVENT_EW)
#define SSL_POLL_EVENT_RWE         (SSL_POLL_EVENT_RE | SSL_POLL_EVENT_WE)

/* All exception event types. */
#define SSL_POLL_EVENT_E           (SSL_POLL_EVENT_EL | SSL_POLL_EVENT_EC \
                                    | SSL_POLL_EVENT_ER | SSL_POLL_EVENT_EW)

/* Streams and connections. */
#define SSL_POLL_EVENT_IS          (SSL_POLL_EVENT_ISB | SSL_POLL_EVENT_ISU)
#define SSL_POLL_EVENT_I           (SSL_POLL_EVENT_IS | SSL_POLL_EVENT_IC)
#define SSL_POLL_EVENT_OS          (SSL_POLL_EVENT_OSB | SSL_POLL_EVENT_OSU)
```

### Discussion

#### `EL`: Exception on Listener

**Q. When is this event type raised?**

A. This event type is raised only on listener (port) failure, which occurs when
an underlying network BIO encounters a permanent error.

**Q. Does `EL` imply `EC` and `ECD` on all child connections?**

A. Yes. A permanent network BIO failure causes immediate failure of all
connections dependent on it without first going through `TERMINATING` (except
possibly in the future with multipath for connections which aren't exclusively
reliant on that port).

**Q. What SSL object types can raise this event type?**

A. The event type is raised on a QLSO only. This may be revisited in future
(e.g. having it also be raised on child QCSOs.)

**Q. Why does this event type need to be distinct from `EC`?**

A. An application which is not immediately concerned by the failure of an
individual connection likely still needs to be notified if an entire port fails.

#### `EC`, `ECD`: Exception on Connection (/Drained)

**Q. Should this event be reported when a connection begins shutdown, begins
terminating, or finishes terminating?**

A.

- There is a use case to learn when we finish terminating because that is when
  we can throw away our port safely (raised on `TERMINATED`);

- there is a use case for learning as soon as we start terminating (raised on
  `TERMINATING` or `TERMINATED`);

- shutdown (i.e., waiting for streams to be done transmitting and then
  terminating, as per `SSL_shutdown_ex()`) is always initiated by the local
  application, thus there is no valid need for an application to poll on it.

As such, separate event types must be available both for the start of the
termination process and the conclusion of the termination process. `EC`
corresponds to `TERMINATING` or `TERMINATED` and `ECD` corresponds to
`TERMINATED` only.

**Q. What happens in the event of idle timeout?**

A. Idle timeout is an immediate transition to `TERMINATED` as per the channel
code.

**Q. Does `ECD` imply `EC`?**

A. Yes, as `EC` is raised in both the `TERMINATING` and `TERMINATED` states.

**Q. Can `ECD` occur without `EC` also occurring?**

A. No, this is not possible.

**Q. Does it make sense for an application to be able to mask this?**

A. Possibly not, though there is nothing particularly requiring us to prevent
this at this time.

**Q. Does it make sense for an application to be able to listen for this but not
`EL`?**

A. Yes, since `EL` implies `EC`, it is valid for an application to handle
port/listener failure purely in terms of the emergent consequence of all
connections failing.

#### `R`: Readable

Application data or FIN is available for popping via `SSL_read`. Never raised
after a stream FIN has been retired.

**Q. Is this raised on `RESET_STREAM`?**

A. No. Applications which wish to know of receive stream part failure should
listen for `ER`.

**Q. Should this be reported if the connection fails?**

A.
If there is still application data that can be read, yes. Otherwise, no.

**Q. Should this be reported if shutdown has commenced?**

A. Potentially — if there is still data to be read or more data arrives at the
last minute.

**Q. What happens if this event is enabled on a send-only stream?**

A. The event is never raised.

**Q. Can this event be received before a connection has been (fully)
established?**

A. Potentially on the server side in the future due to incoming 0-RTT data.

#### `ER`: Exception on Read

Raised only when the receive part of a stream has been reset by the remote peer
using a `RESET_STREAM` frame.

**Q. Should this be reported if a stream has already been concluded normally and
that FIN has been retired by the application by calling `SSL_read()`?**

A. No. We consider FIN retirement a success condition for our purposes here, so
normal stream conclusion and the retirement of that event does not cause `ER`.

**Q. Should this be reported if the connection fails?**

A. No, because that can be separately determined via the `EC` event and this
provides greater clarity as to what event is occurring and why. Also, it is
possible that a connection could fail and some application data is still
buffered to be read by the application, so `EC` does not imply `!R`.

**Q. Should this be reported if shutdown has been commenced?**

A. No — so long as the connection is alive more data could still be received at
the last minute.

**Q. What happens if this event is enabled on a send-only stream?**

A. The event is never raised.

**Q. What happens if this event is enabled on a QCSO?**

A. The event is applicable if the QCSO has a default stream attached. Otherwise,
it is never raised.

**Q. Why is this event separate from `R`?**

A.
If an application receives an `R` event, this means more application data is
available to be read but this may be a business-as-usual circumstance which the
application does not feel obliged to handle urgently; therefore, it might mask
`R` in some circumstances.

If a stream reset is triggered by a peer, this needs to be notifiable to an
application immediately even if the application would not care about more
ordinary application data arriving on a stream for now.

Therefore, `ER` *must* be separate from `R`, otherwise such applications would
be unable to prevent spurious wakeups due to normal application data when they
only care about the possibility of a stream reset.

**Q. Should applications be able to listen on `R` but not `ER`?**

A. This would enable an application to listen for more application data but not
care about stream resets. This can be permitted for now even if it raises some
questions about the robustness of such applications.

**Q. How will the future reliable stream resets extension be handled?**

A. `R` will be raised until all data up to the reliable reset point has been
retired by the application, then `ER` is raised and `R` is never again raised.

**Q. What happens if a stream is reset after the FIN has been retired by the
application?**

A. The reset is ignored; as per RFC 9000 s. 3.2, the Data Read state is terminal
and has no `RESET_STREAM` transition. Moreover, after an application is done
with a stream it can free the QSSO, which means a post-FIN-retirement reset
cannot be reliably received anyway.

Note that this does not preclude handling of `RESET_STREAM` in the normal way
for a stream which was concluded normally but where the application has *not*
yet read all data, which is potentially useful.
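
The answers given for `R` and `ER` can be summarised as a small decision
function. This is a toy model under stated assumptions, not libssl code:
`TOY_RECV_PART` and `toy_recv_wakeup` are hypothetical names, the two flag values
are copied from the header file definitions earlier in this document, and the
future reliable stream resets extension is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as given in the header file definitions in this document. */
#define SSL_POLL_EVENT_ER (1U << 4)
#define SSL_POLL_EVENT_R  (1U << 6)

/* Hypothetical model of a stream's receive part; not an OpenSSL type. */
typedef struct toy_recv_part_st {
    int is_send_only;   /* stream has no receive part at all */
    int fin_retired;    /* application already read the FIN via SSL_read() */
    int remote_reset;   /* RESET_STREAM arrived before the FIN was retired */
    int have_data;      /* unread application data or an unretired FIN */
} TOY_RECV_PART;

/*
 * Returns the subset of the requested R/ER events which would wake an
 * application polling this receive part. Connection failure and shutdown
 * are not modelled because neither changes the R/ER outcome described above.
 */
static uint64_t toy_recv_wakeup(const TOY_RECV_PART *rp, uint64_t requested)
{
    uint64_t raised = 0;

    assert(rp != NULL);

    if (rp->is_send_only || rp->fin_retired)
        raised = 0;                     /* never raised in these cases */
    else if (rp->remote_reset)
        raised = SSL_POLL_EVENT_ER;     /* reset raises ER, not R */
    else if (rp->have_data)
        raised = SSL_POLL_EVENT_R;      /* ordinary readable data or FIN */

    return raised & requested;
}
```

An application which masks `R` and requests only `ER` is not woken by ordinary
data in this model, but is still woken by a remote reset, which is the reason
the two event types are kept separate.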

#### `W`: Writable

Raised when send buffer space is available, so that it is possible to write
application data via `SSL_write`.

**Q. Is this raised on `STOP_SENDING`?**

A. No. Applications which wish to know of remotely-triggered send stream part
reset should listen for `EW`.

**Q. Should this be reported if the connection fails?**

A. No.

**Q. Should this be reported if shutdown has commenced?**

A. No.

**Q. What happens if this event is enabled on a concluded send part?**

A. The event is never raised after the stream is concluded.

**Q. What happens if this event is enabled on a receive-only stream?**

A. The event is never raised.

**Q. What happens if this event is enabled on a QCSO?**

A. The event is applicable if the QCSO has a default stream attached. Otherwise,
it is never raised.

**Q. Can this event be raised before a connection has been established?**

A. Potentially in the future, if 0-RTT is in use and we have a cached 0-RTT
session including flow control budgets which establish we have room to write
more data for 0-RTT.

#### `EW`: Exception on Write

Raised only when the send part of a stream has been reset by the remote peer via
`STOP_SENDING`.

**Q. Should this be raised if a stream's send part has been concluded
normally?**

A. No. We consider that a success condition for our purposes here.

**Q. Should this be reported if the connection fails?**

A. No, because that can be separately determined via the `EC` event and this
provides greater clarity as to what event is occurring and why.

**Q. What happens if this event is enabled on a receive-only stream?**

A. The event is never raised.

**Q. Should this be reported if the send part was reset locally via
`SSL_reset_stream()`?**

A.
There is no need for this since the application knows what it did, though
there is no particular harm in doing so. Current decision: do not report it.

**Q. What if the send part was reset locally and then we also received a
`STOP_SENDING` frame for it?**

A. If the local application has reset a stream locally, it already knows this,
so there is no need to raise `EW`. The local reset takes precedence.

**Q. Should this be reported if shutdown has commenced?**

A. Probably not, since shutdown is under local application control and so if an
application does this it already knows about it. Therefore there is no reason to
poll for it.

**Q. Why is this event separate from `W`?**

A. It is useful for an application to be able to determine if something odd has
happened on a stream (like it being reset remotely via `STOP_SENDING`) even if
it does not currently want to write anything (and therefore is not listening for
`W`). Since stream resets can occur asynchronously and have application
protocol-defined semantics, it is important an application can be notified of
them immediately.

**Q. Should applications be able to listen on `W` but not `EW`?**

A. This would enable an application to listen for the opportunity to write but
not care about `STOP_SENDING` events. This is probably valid even if it raises
some questions about the robustness of such applications. It can be allowed,
even if not recommended (see the General Principles section above).

**Q. How will the future reliable stream resets extension be handled?**

A. The extension does not offer a `STOP_SENDING` equivalent so this is not a
relevant concern.

#### `ISB`, `ISU`: Incoming Stream Availability

Indicates one or more incoming bidirectional or unidirectional streams which have
yet to be popped via `SSL_accept_stream()`.

**Q. Is this raised on `RESET_STREAM`?**

A.
It is raised on anything that would cause `SSL_accept_stream()` to return a
stream. This could include a stream which was created by being reset.

**Q. What happens if this event is enabled on a QSSO or QLSO?**

A. The event is never raised.

**Q. If a stream is in the accept queue and then the connection fails, should it
still be reported?**

A. Yes. The application may still be able to accept the stream later and pop any
application data which was already received. It is the application's choice to
listen for `EC` and have it take priority if it wishes.

**Q. Can this event be raised before a connection has been established?**

A. Client — no. Server — no initially, except possibly during 0-RTT when a
connection is not considered fully established yet.

#### `OSB`, `OSU`: Outgoing Stream Readiness

Indicates we have the ability, based on current stream count flow control state,
to initiate an outgoing bidirectional or unidirectional stream.

**Q. Should this be reported if the connection fails?**

A. No.

**Q. Should this be reported if shutdown has commenced?**

A. No.

**Q. What happens if this event is enabled on a QLSO or QSSO?**

A. The event is never raised.

**Q. Can this event be raised before a connection has been established?**

A. Potentially in future, on the client side only, if 0-RTT is in use and we
have a cached 0-RTT session including flow control budgets which establish we
have room to write more data for 0-RTT.

#### `IC`: Incoming Connection

Indicates at least one incoming connection is available to be popped using
`SSL_accept_connection()`.

**Q. Should this be reported if the port fails?**

A. Potentially. A connection could have already been able to receive application
data prior to it being popped from the accept queue by the application calling
`SSL_accept_connection()`.
Whether or not application data was received on any
stream, a successfully established connection should be reported so that the
application knows it happened.

**Q. Can this event be raised before a connection has been established?**

A. Potentially in future, if 0-RTT is in use; we could receive connection data
before the connection process is complete (handshake confirmation).

**Q. What happens if this event is enabled on a QCSO or QSSO?**

A. The event is never raised.

#### `F`: Failure

Indicates that the `SSL_poll` mechanism itself has failed. This may be due to
specifying an unsupported `BIO_POLL_DESCRIPTOR` type, or an unsupported `SSL`
object, or so on. This indicates a caller usage error. It is wholly distinct
from an exception condition on a successfully polled resource (e.g. `ER`, `EW`,
`EC`, `EL`).

**Q. Can this event type be masked?**

A. No — this event type may always be raised even if not requested. Requesting
it is a no-op (similar to `poll(2)` `POLLERR`). This is the only non-maskable
event type.

**Q. What happens if an `F` event is raised?**

A. The `F` event is reported in one or more elements of the items array. The
`result_count` output value reflects the number of items in the items array with
non-zero `revents` fields, as always. This includes any `F` events (there may be
multiple), and any non-`F` events which were output for earlier entries in the
items array (where an `F` event occurs for a subsequent entry in the items
array).

`SSL_poll()` then returns 0. The ERR stack *always* has at least one entry
placed on it, which reflects the first `F` event which was output. Any
subsequent `F` events do not have error information available.

Designs
-------

Two designs are considered here:

- Sketch A: An “immediate-mode” poller interface similar to poll(2).
738 739- Sketch B: A “registered” poller interface similar to BSD's kqueue(2) (or Linux's 740 epoll(2)). 741 742Sketch A is simpler but is likely to be less performant. Sketch B is a bit more 743elaborate but can offer more performance. It is possible to offer both APIs if 744desired. 745 746### Sketch A: One-Shot/Immediate Mode API 747 748We define a common structure for representing polled events: 749 750```c 751typedef struct ssl_poll_item_st { 752 BIO_POLL_DESCRIPTOR desc; 753 uint64_t events, revents; 754} SSL_POLL_ITEM; 755``` 756 757This structure works similarly to the `struct pollfd` structure used by poll(2). 758`desc` describes the object to be polled, `events` is a bitmask of 759`SSL_POLL_EVENT` values describing what events to listen for, and `revents` is 760a bitmask of zero or more events which are actually raised. 761 762Polling implementations are only permitted to modify the `revents` field in a 763`SSL_POLL_ITEM` structure passed by the caller. 764 765```c 766/* 767 * SSL_poll 768 * -------- 769 * 770 * SSL_poll evaluates each of the items in the given array of SSL_POLL_ITEMs 771 * and determines which poll items have relevant readiness events raised. It is 772 * similar to POSIX poll(2). 773 * 774 * The events field of each item specifies the events the caller is interested 775 * in and is the sum of zero or more SSL_POLL_EVENT_* values. When using 776 * SSL_poll in a blocking fashion, only the occurrence of one or more events 777 * specified in the events field, or a timeout or failure of the polling 778 * mechanism, will cause SSL_poll to return. 779 * 780 * When SSL_poll returns, the revents field is set to the events actually active 781 * on an item. This may or may not also include events which were not requested 782 * in the events field. 783 * 784 * Specifying an item with an events field of zero is a no-op; the array entry 785 * is ignored. 
 * Unlike poll(2), error events are not automatically included
 * and it is the application's responsibility to request them.
 *
 * Each item to be polled is described by a BIO_POLL_DESCRIPTOR. A
 * BIO_POLL_DESCRIPTOR is an extensible tagged union structure which describes
 * some kind of object which SSL_poll might (or might not) know how to poll.
 * Currently, SSL_poll can poll the following kinds of BIO_POLL_DESCRIPTOR:
 *
 *   BIO_POLL_DESCRIPTOR_TYPE_SOCK_FD (int fd)   -- OS-pollable sockets only
 *     Note: Some OSes consider sockets to be a different kind of handle type
 *           to ordinary file handles. Therefore, this type is used
 *           specifically for OS socket handles only (e.g. SOCKET on Win32).
 *           It cannot be used to poll other OS handle types.
 *
 *   BIO_POLL_DESCRIPTOR_TYPE_SSL (SSL *ssl)     -- QUIC SSL objects only
 *
 * num_items is the number of items in the passed array.
 *
 * stride must be set to sizeof(SSL_POLL_ITEM).
 *
 * timeout specifies how long to wait for at least one passed SSL_POLL_ITEM to
 * have at least one event to report. If it is set to NULL, this function does
 * not time out and waits forever. Otherwise, it points to a struct timeval
 * expressing the timeout duration with microsecond resolution. The value
 * expresses a duration, not a deadline.
 *
 * This function can be used in a non-blocking mode where it will provide
 * information on readiness for each of the items and then return immediately,
 * even if no item is ready. To facilitate this, pass a zero-value timeout
 * structure.
 *
 * If num_items is set to zero, this function returns with a timeout condition
 * after the specified timeout, or immediately with failure if no timeout
 * was requested (as otherwise it would logically deadlock).
 *
 * flags must be zero or more SSL_POLL_FLAG values:
 *
 *   - SSL_POLL_FLAG_NO_HANDLE_EVENTS:
 *       This may be used only when a zero timeout is specified (non-blocking
 *       mode). Ordinarily in this case, relevant SSL objects have internal
 *       event processing performed as this may help them to become ready.
 *       This may also cause network I/O to occur. If this flag is specified,
 *       no such processing will be performed. This means that SSL_poll
 *       will only report pre-existing readiness events for the specified
 *       objects.
 *
 *       If timeout is NULL or non-zero, specifying this flag is an error.
 *
 * Regardless of whether this function succeeds, times out, or fails for other
 * reasons, the revents field of each item is set to a valid value reflecting
 * the current readiness, or to 0, and *result_count (if non-NULL) is written
 * with the total number of items having an revents field, which,
 * when masked with the corresponding events field, is nonzero at the time the
 * function returns. Note that these entries in the items array may not be
 * consecutive or at the start of the array.
 *
 * There is a distinction between exception conditions on a resource which is
 * polled (such as a connection being terminated) and a failure in the polling
 * code itself. A mere exception condition is not considered a failure of
 * the polling mechanism itself and does not cause SSL_poll to return 0. If
 * the polling mechanism itself fails (for example, because an unsupported
 * BIO_POLL_DESCRIPTOR type or SSL object type is passed), the F event type
 * is raised on at least one poll item and the function returns 0. At least
 * one ERR stack entry will be raised describing the cause of the first F event
 * for the input items. Any additional F events do not have their error
 * information reported.
 *
 * Returns 1 on success or timeout, and 0 on failure.
 * Timeout conditions can be distinguished by the *result_count field being
 * written as 0.
 *
 * This function does not modify any item's events or desc field.
 * The initial value of an revents field when this function is called is of no
 * consequence.
 *
 * This is a "one-shot" API; greater performance may be obtained from using
 * an API which requires advance registration of pollables.
 */
#define SSL_POLL_FLAG_NO_HANDLE_EVENTS (1U << 0)

int SSL_poll(SSL_POLL_ITEM *items,
             size_t num_items, size_t stride,
             const struct timeval *timeout,
             uint64_t flags,
             size_t *result_count);
```

**Performance and thundering-herd issues.** There are two intrinsic performance
issues with this design:

- Because it does not involve advance registration of things being polled,
  the entire object list needs to be scanned in each call, and there is
  no real opportunity to maintain internal state which would make polling
  more efficient.

- Because this design is inherently “stateless”, it cannot really solve
  the thundering herd problem in any reasonable way. In other words, if n
  threads are all calling `SSL_poll` on the same set of objects and events,
  there is no way for an event to be efficiently distributed to just one of
  those threads.

  This limitation is intrinsic to the design of `poll(2)` and poll-esque APIs.
  It is not necessarily a reason not to offer this rather simple API, as use of
  poll(2) and poll(2)-like APIs is widespread and users are likely to appreciate
  an API which does not provide significant impedance discontinuities to
  applications which use select/poll, even if those applications suffer impaired
  performance as a result.

### Sketch B: Registered/Retained Mode API

Alternatively, an API which requires advance registration of pollable objects is
proposed.

Attention is called to certain design features:

- This design can solve the thundering herd problem, achieving efficient
  distribution of work to threads by auto-disabling an event mask bit after
  distribution of the readiness event to one thread currently calling the poll
  function.

- The fundamental call, `SSL_POLL_GROUP_change_poll`, combines the operations
  of adding/removing/changing registered events and actually polling. This is
  important as due to the herd-avoidance design above, events can be and are
  automatically disarmed and need rearming as frequently as the poll function is
  called. This streamlined design therefore enhances efficiency. This design
  aspect is inspired directly by kqueue.

- Addition of registered events and mutation of existing events uses an
  idempotent upsert-type operation, which is what most applications actually
  want (unlike e.g. epoll).

```c
typedef struct ssl_poll_group_st SSL_POLL_GROUP;

/*
 * The means of obtaining an SSL_POLL_GROUP instance is discussed
 * subsequently. For now, you can imagine the following strawman function:
 *
 *   SSL_POLL_GROUP *SSL_POLL_GROUP_new(void);
 *
 */

void SSL_POLL_GROUP_free(SSL_POLL_GROUP *pg);

#define SSL_POLL_EVENT_FLAG_NONE 0

/*
 * Registered event is deleted (not disabled) after one event fires.
 */
#define SSL_POLL_EVENT_FLAG_ONESHOT (1U << 0)

/*
 * Work queue dispatch (anti-thundering herd) - dispatch to one concurrent call
 * and set DISABLED.
 */
#define SSL_POLL_EVENT_FLAG_DISPATCH (1U << 1)

/* Registered event is disabled and will not return events. */
#define SSL_POLL_EVENT_FLAG_DISABLED (1U << 2)

/* Delete a registered event. */
#define SSL_POLL_EVENT_FLAG_DELETE (1U << 3)

/* Change previous cookie value. Cookie is normally only set on initial add.
 */
#define SSL_POLL_EVENT_FLAG_UPDATE_COOKIE (1U << 4)

/*
 * A structure to request registration, deregistration or modification of a
 * registered event.
 */
typedef struct ssl_poll_change_st {
    /* The pollable object to be polled. */
    BIO_POLL_DESCRIPTOR desc;
    size_t              instance;

    /* An opaque application value passed through in any reported event. */
    void                *cookie;

    /*
     * Disables and enables event types. Any events in disable_events are
     * disabled, and then any events in enable_events are enabled.
     * disable_events is processed before enable_events, therefore the enabled
     * event types may be set (ignoring any previous value) by setting
     * disable_events to UINT64_MAX and enable_events to the desired event
     * types. Non-existent event types are ignored.
     */
    uint64_t            disable_events, enable_events;

    /*
     * Enables and disables registered event flags in the same vein as
     * disable_events and enable_events manages registered event types.
     * This is used to disable and enable SSL_POLL_EVENT_FLAG bits.
     */
    uint64_t            disable_flags, enable_flags;
} SSL_POLL_CHANGE;

typedef struct ssl_poll_event_st {
    BIO_POLL_DESCRIPTOR desc;
    size_t              instance;
    void                *cookie;
    uint64_t            revents;
} SSL_POLL_EVENT;

/*
 * SSL_POLL_GROUP_change_poll
 * --------------------------
 *
 * This function performs the following actions:
 *
 *   - firstly, if num_changes is non-zero, it updates registered events on the
 *     specified poll group, adding, removing and modifying registered events as
 *     specified by the changes in the array given in changes;
 *
 *   - secondly, if num_events is non-zero, it polls for any events that have
 *     arisen that match the registered events, and places up to num_events such
 *     events in the array given in events.
 *
 * This function may be used for either of these effects, or both at the same
 * time.
 * Changes to event registrations are applied before events are returned.
 *
 * If num_changes is non-zero, change_stride must be set to
 * sizeof(SSL_POLL_CHANGE).
 *
 * If num_events is non-zero, event_stride must be set to
 * sizeof(SSL_POLL_EVENT).
 *
 * If timeout is NULL, this function blocks forever until an applicable event
 * occurs. If it points to a zero value, this function never blocks and will
 * apply given changes, return any applicable events, if any, and then return
 * immediately. Note that any requested changes are always applied regardless of
 * timeout outcome.
 *
 * flags must be zero or more SSL_POLL_FLAGS. If SSL_POLL_FLAG_NO_HANDLE_EVENTS
 * is set, polled objects do not automatically have I/O performed which might
 * enable them to raise applicable events. If SSL_POLL_FLAG_NO_POLL is set,
 * changes are processed but no polling is performed. This is useful if it is
 * desired to provide an event array to allow errors when processing changes
 * to be received. Passing SSL_POLL_FLAG_NO_POLL forces a timeout of 0
 * (non-blocking mode); the timeout argument is ignored.
 *
 * The number of events written to events is written to *num_events_out,
 * regardless of whether this function succeeds or fails.
 *
 * Returns 1 on success or 0 on failure. A timeout is considered a success case
 * which returns 0 events; thus in this case, the function returns 1 and
 * *num_events_out is written as 0.
 *
 * This function differs from poll-style interfaces in that the events reported
 * in the events array bear no positional relationship to the registration
 * changes indicated in changes. Thus the length of these arrays is unrelated.
 *
 * An error may occur when processing a change. If this occurs, an entry
 * describing the error is written out as an event to the event array.
 * The function still returns success, unless there is no room in the events
 * array for the error (for example, if num_events is 0), in which case failure
 * is returned.
 *
 * When an event is output from this function, desc is set to the original
 * registered poll descriptor, cookie is set to the cookie value which was
 * passed in when registering the event, and revents is set to any applicable
 * events, which might be a superset of the events which were actually asked
 * for. (However, only events actually asked for at registration time will
 * cause a blocking call to SSL_POLL_GROUP_change_poll to return.)
 *
 * An event structure which represents a change processing error will have the
 * pseudo-event SSL_POLL_EVENT_POLL_ERROR set, with copies of the desc and
 * cookie provided. This is not a real event and cannot be requested in a
 * change.
 *
 * The 'primary key' for any registered event is the tuple (poll descriptor,
 * instance). Changing an existing event is done by passing a change structure
 * with the same values for the poll descriptor and instance. The instance field
 * can be used to register multiple separate registered events on the same
 * poll descriptor. Many applications will be able to use an instance field of
 * 0 in all circumstances.
 *
 * To unregister an event, pass a matching poll descriptor and instance value
 * and set DELETE in enable_flags.
 *
 * It is recommended that callers delete a registered event from a poll group
 * before freeing the underlying resource. If an object which is registered
 * inside a poll group is freed, the semantics depend on the type of the poll
 * descriptor used. For example, libssl has no safe way to detect if an OS
 * socket poll descriptor is closed, therefore it is essential callers
 * deregister such registered events prior to closing the socket handle.
 *
 * Other poll descriptor types may implement automatic deregistration from poll
 * groups which they are registered into when they are freed. This varies by
 * poll descriptor type. However, even if a poll descriptor type does implement
 * this, applications must still ensure no events in an SSL_POLL_EVENT
 * structure recorded from a previous call to this function are left over, which
 * may still reference that poll descriptor. Therefore, applications must still
 * exercise caution when freeing resources which are registered, or which were
 * previously registered in a poll group.
 */
#define SSL_POLL_FLAG_NO_HANDLE_EVENTS (1U << 0)
#define SSL_POLL_FLAG_NO_POLL (1U << 1)

#define SSL_POLL_EVENT_POLL_ERROR (((uint64_t)1) << 63)

int SSL_POLL_GROUP_change_poll(SSL_POLL_GROUP *pg,

                               const SSL_POLL_CHANGE *changes,
                               size_t num_changes,
                               size_t change_stride,

                               SSL_POLL_EVENT *events,
                               size_t num_events,
                               size_t event_stride,

                               const struct timeval *timeout,
                               uint64_t flags,
                               size_t *num_events_out);

/* These macros may be used if only one function is desired. */
#define SSL_POLL_GROUP_change(pg, changes, num_changes, flags) \
    SSL_POLL_GROUP_change_poll((pg), (changes), (num_changes), \
                               sizeof(SSL_POLL_CHANGE), \
                               NULL, 0, 0, NULL, (flags), NULL)

/* The output array holds SSL_POLL_EVENT entries, so that is the stride. */
#define SSL_POLL_GROUP_poll(pg, items, num_items, timeout, flags, result_c) \
    SSL_POLL_GROUP_change_poll((pg), NULL, 0, 0, \
                               (items), (num_items), sizeof(SSL_POLL_EVENT), \
                               (timeout), (flags), (result_c))

/* Convenience inlines.
 */
static ossl_inline ossl_unused void SSL_POLL_CHANGE_set(SSL_POLL_CHANGE *chg,
                                                        BIO_POLL_DESCRIPTOR desc,
                                                        size_t instance,
                                                        void *cookie,
                                                        uint64_t events,
                                                        uint64_t flags)
{
    chg->desc = desc;
    chg->instance = instance;
    chg->cookie = cookie;
    chg->disable_events = UINT64_MAX;
    chg->enable_events = events;
    chg->disable_flags = UINT64_MAX;
    chg->enable_flags = flags;
}

static ossl_inline ossl_unused void SSL_POLL_CHANGE_delete(SSL_POLL_CHANGE *chg,
                                                           BIO_POLL_DESCRIPTOR desc,
                                                           size_t instance)
{
    chg->desc = desc;
    chg->instance = instance;
    chg->cookie = NULL; /* cookie is a plain void * in SSL_POLL_CHANGE */
    chg->disable_events = 0;
    chg->enable_events = 0;
    chg->disable_flags = 0;
    chg->enable_flags = SSL_POLL_EVENT_FLAG_DELETE;
}

static ossl_inline ossl_unused void
SSL_POLL_CHANGE_chevent(SSL_POLL_CHANGE *chg,
                        BIO_POLL_DESCRIPTOR desc,
                        size_t instance,
                        uint64_t disable_events,
                        uint64_t enable_events)
{
    chg->desc = desc;
    chg->instance = instance;
    chg->cookie = NULL;
    chg->disable_events = disable_events;
    chg->enable_events = enable_events;
    chg->disable_flags = 0;
    chg->enable_flags = 0;
}

static ossl_inline ossl_unused void
SSL_POLL_CHANGE_chflag(SSL_POLL_CHANGE *chg,
                       BIO_POLL_DESCRIPTOR desc,
                       size_t instance,
                       uint64_t disable_flags,
                       uint64_t enable_flags)
{
    chg->desc = desc;
    chg->instance = instance;
    chg->cookie = NULL;
    chg->disable_events = 0;
    chg->enable_events = 0;
    chg->disable_flags = disable_flags;
    chg->enable_flags = enable_flags;
}

static ossl_inline ossl_unused BIO_POLL_DESCRIPTOR
SSL_as_poll_descriptor(SSL *s)
{
    BIO_POLL_DESCRIPTOR d;

    d.type = BIO_POLL_DESCRIPTOR_TYPE_SSL;
    d.value.ssl = s;

    return d;
}
```

#### Use Case Examples

```c
/*
 * Scenario 1: Register multiple events on different QUIC objects and
 * immediately start blocking for events.
 */
{
    int rc;

    SSL *qconn1 = get_some_quic_conn();
    SSL *qconn2 = get_some_quic_conn();
    SSL *qstream1 = get_some_quic_stream();
    SSL *qlisten1 = get_some_quic_listener();
    int socket = get_some_socket_handle();

    SSL_POLL_GROUP *pg = SSL_POLL_GROUP_new();
    SSL_POLL_CHANGE changes[32], *chg = changes;
    SSL_POLL_EVENT events[32];
    void *cookie = some_app_ptr;
    size_t i, nchanges = 0, nevents = 0;

    /* Wait for an incoming stream or conn error on conn 1 and 2. */
    SSL_POLL_CHANGE_set(chg++, SSL_as_poll_descriptor(qconn1), 0, cookie,
                        SSL_POLL_EVENT_IS | SSL_POLL_EVENT_E, 0);
    ++nchanges;

    SSL_POLL_CHANGE_set(chg++, SSL_as_poll_descriptor(qconn2), 0, cookie,
                        SSL_POLL_EVENT_IS | SSL_POLL_EVENT_E, 0);
    ++nchanges;

    /* Wait for incoming data (or reset) on stream 1. */
    SSL_POLL_CHANGE_set(chg++, SSL_as_poll_descriptor(qstream1), 0, cookie,
                        SSL_POLL_EVENT_R, 0);
    ++nchanges;

    /* Wait for an incoming connection. */
    SSL_POLL_CHANGE_set(chg++, SSL_as_poll_descriptor(qlisten1), 0, cookie,
                        SSL_POLL_EVENT_IC, 0);
    ++nchanges;

    /* Also poll on an ordinary OS socket. */
    SSL_POLL_CHANGE_set(chg++, OS_socket_as_poll_descriptor(socket), 0, cookie,
                        SSL_POLL_EVENT_RW, 0);
    ++nchanges;

    /* Immediately register all of these events and wait for an event.
     */
    rc = SSL_POLL_GROUP_change_poll(pg,
                                    changes, nchanges, sizeof(changes[0]),
                                    events, OSSL_NELEM(events), sizeof(events[0]),
                                    NULL, 0, &nevents);
    if (!rc)
        return 0;

    for (i = 0; i < nevents; ++i) {
        if ((events[i].revents & SSL_POLL_EVENT_POLL_ERROR) != 0)
            return 0;

        process_event(&events[i]);
    }

    return 1;
}

void process_event(const SSL_POLL_EVENT *event)
{
    /* cookie is the plain void * supplied at registration time. */
    APP_INFO *app = event->cookie;

    do_something(app, event->revents);
}

/*
 * Scenario 2: Test for pre-existing registered events in non-blocking mode
 * as part of a hierarchical polling strategy.
 */
{
    int rc;

    SSL_POLL_EVENT events[32];
    size_t i, nevents = 0;
    struct timeval timeout = { 0 };

    /*
     * Find out what is ready without blocking.
     * Assume application already did I/O event handling and do not tick again.
     */
    rc = SSL_POLL_GROUP_poll(pg, events, OSSL_NELEM(events),
                             &timeout, SSL_POLL_FLAG_NO_HANDLE_EVENTS,
                             &nevents);
    if (!rc)
        return 0;

    for (i = 0; i < nevents; ++i)
        process_event(&events[i]);
}

/*
 * Scenario 3: Remove one event but don't poll.
 */
{
    int rc;
    SSL_POLL_CHANGE changes[1], *chg = changes;
    size_t nchanges = 0;

    SSL_POLL_CHANGE_delete(chg++, SSL_as_poll_descriptor(qstream1), 0);
    ++nchanges;

    if (!SSL_POLL_GROUP_change(pg, changes, nchanges, 0))
        return 0;

    return 1;
}

/*
 * Scenario 4: Efficient (non-thundering-herd) multi-thread dispatch with
 * efficient rearm.
 *
 * Assume all registered events have SSL_POLL_EVENT_FLAG_DISPATCH set on them.
 *
 * Assume this function is being called concurrently from a large number of
 * threads.
 */
{
    int rc;
    SSL_POLL_CHANGE changes[32], *chg;
    SSL_POLL_EVENT events[32];
    size_t i, nchanges, nevents = 0;

    /*
     * This will block, and then the first event to occur will be returned on
     * *one* thread, and the event will be disabled. Other threads will keep
     * waiting.
     */
    if (!SSL_POLL_GROUP_poll(pg, events, OSSL_NELEM(events),
                             NULL, 0, &nevents))
        return 0;

    /* Application event loop */
    while (!app_should_stop()) {
        chg = changes;
        nchanges = 0;

        for (i = 0; i < nevents; ++i) {
            process_event(&events[i]); /* do something in application */

            /* We have processed the event so now reenable it. */
            SSL_POLL_CHANGE_chflag(chg++, events[i].desc, events[i].instance,
                                   SSL_POLL_EVENT_FLAG_DISABLED, 0);
            ++nchanges;
        }

        /* Reenable any event we processed and go to sleep again. */
        if (!SSL_POLL_GROUP_change_poll(pg, changes, nchanges, sizeof(changes[0]),
                                        events, OSSL_NELEM(events), sizeof(events[0]),
                                        NULL, 0, &nevents))
            return 0;
    }

    return 1;
}
```

Proposal
--------

It is proposed to offer both of these API sketches. The simple `SSL_poll` API is
compelling for simple use cases, and both APIs have merits and cases where they
will be highly desirable. The ability of the registered API to support
thundering herd mitigation is of particular importance.

Custom Poller Methods
---------------------

It is also desirable to support custom poller methods provided by an
application. This allows an application to support custom poll descriptor types
and provide a way to poll on those poll descriptors.
For example, an application
could provide a BIO_dgram_pair (which ordinarily cannot support polling and
cannot be used with the blocking API) and a custom poller which can poll some
opaque poll descriptor handle provided by the application (which might be e.g.
based on condition variables or so on).

We therefore now discuss modifications to the above APIs to support custom
poller methods.

### Translation

When a poller polls a QUIC SSL object, it must figure out how to block on this
object. This means it must ultimately make some blocking poll(2)-like call to
the OS. Since an OS only knows how to block on resources it issues, this means
that all resources such as QUIC SSL objects must be reduced into OS resources
before polling can occur.

This process occurs via translation. Suppose `SSL_poll` is called with a QCSO,
two QSSOs on that QCSO, and an OS socket handle:

  - `SSL_poll` will convert the poll descriptors pointing to SSL objects
    to network-side poll descriptors by calling `SSL_get_[rw]poll_descriptor`,
    which calls through to `BIO_get_[rw]poll_descriptor`;

  - The yielded poll descriptors are then reduced to a set of unique poll
    descriptors (for example, both QSSOs will have the same underlying
    poll descriptor, so duplicates are removed);

  - The OS socket handle poll descriptor which was passed in is simply
    passed through as-is;

  - The resulting set of poll descriptors is then passed on to an underlying
    poller implementation, which might be based on e.g. poll(2). But it might
    also be a custom method provided by an application if one of the SSL objects
    resolved to a custom poll descriptor type.

  - When the underlying poll call returns, reverse translation occurs.
    Poll descriptors which have become ready in some aspect and which were
    translated are mapped back to the input SSL objects which they were derived
    from (since duplicates are removed, this may be multiple SSL objects per
    poll descriptor). This set of SSL objects is reduced to a unique set of
    event leaders and those event leaders are ticked. The QUIC SSL objects are
    then probed for their current state to determine current readiness and this
    information is returned.

The above scheme also means that the retained-mode polling API can be more
efficient since translation information can be retained internally rather than
being re-derived every time.

### Custom Poller Methods API

There are two kinds of polling that occur:

- Internal polling for blocking API: This is where an SSL object automatically
  polls internally to support blocking API operation. If an underlying network
  BIO cannot support a poll descriptor which we understand how to poll on, we
  cannot support blocking API operation. We can support a poll descriptor if it
  is an OS socket handle, or if a custom poller is configured that knows how to
  poll it.

- External polling support: This is where an application calls a polling API.
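
The internal polling rule in the first bullet above can be illustrated with a
small decision function. This is only a sketch under stated assumptions: the
`DEMO_*` types and `demo_*` functions are hypothetical stand-ins invented for
illustration (they are not part of libssl), and it assumes that a configured
custom poller is the only one consulted, with the default poller understanding
OS socket handles only.

```c
#include <stddef.h>

/* Hypothetical descriptor kinds standing in for BIO_POLL_DESCRIPTOR types. */
enum demo_desc_type {
    DEMO_DESC_NONE = 0,     /* network BIO offers no poll descriptor */
    DEMO_DESC_SOCK_FD,      /* OS socket handle */
    DEMO_DESC_APP_HANDLE    /* opaque application-defined handle */
};

typedef struct demo_desc_st {
    enum demo_desc_type type;
} DEMO_DESC;

/* Hypothetical shape of a poller's "do you understand this?" query. */
typedef int (*demo_supports_fn)(const DEMO_DESC *desc);

/* Assumed default poller behaviour: OS socket handles only. */
static int demo_default_supports(const DEMO_DESC *desc)
{
    return desc->type == DEMO_DESC_SOCK_FD;
}

/* Example custom poller which only understands the application's handles. */
static int demo_app_supports(const DEMO_DESC *desc)
{
    return desc->type == DEMO_DESC_APP_HANDLE;
}

/*
 * Blocking operation is possible only if the effective poller (the custom one
 * if configured, otherwise the default) understands both the read-side and
 * write-side descriptors reported by the network BIOs.
 */
static int demo_can_block(const DEMO_DESC *rdesc, const DEMO_DESC *wdesc,
                          demo_supports_fn custom)
{
    demo_supports_fn supports = custom != NULL ? custom : demo_default_supports;

    return supports(rdesc) && supports(wdesc);
}
```

With these assumptions, a socket-backed BIO pair can block using the default
poller, while a BIO pair exposing an application handle can block only once a
custom poller such as `demo_app_supports` is supplied.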

Firstly, the `SSL_POLL_METHOD` object is defined abstractly as follows:

```c
/* API (Pseudocode) */
#define SSL_POLL_METHOD_CAP_IMMEDIATE (1U << 0) /* supports immediate mode */
#define SSL_POLL_METHOD_CAP_RETAINED  (1U << 1) /* supports retained mode */

interface SSL_POLL_METHOD {
    int free(void);
    int up_ref(void);

    uint64_t get_caps(void);
    int supports_poll_descriptor(const BIO_POLL_DESCRIPTOR *desc);
    int poll(/* as shown for SSL_poll */);
    SSL_POLL_GROUP *create_poll_group(const OSSL_PARAM *params);
}

interface SSL_POLL_GROUP {
    int free(void);
    int up_ref(void);

    int change_poll(/* as shown for SSL_POLL_GROUP_change_poll */);
}
```

This interface is realised as follows:

```c
typedef struct ssl_poll_method_st SSL_POLL_METHOD;
typedef struct ssl_poll_group_st SSL_POLL_GROUP;

typedef struct ssl_poll_method_funcs_st {
    int (*free)(SSL_POLL_METHOD *self);
    int (*up_ref)(SSL_POLL_METHOD *self);

    uint64_t (*get_caps)(const SSL_POLL_METHOD *self);
    /* Listed to match the abstract interface above. */
    int (*supports_poll_descriptor)(SSL_POLL_METHOD *self,
                                    const BIO_POLL_DESCRIPTOR *desc);
    int (*poll)(SSL_POLL_METHOD *self, /* as shown for SSL_poll */);
    SSL_POLL_GROUP *(*create_poll_group)(SSL_POLL_METHOD *self,
                                         const OSSL_PARAM *params);
} SSL_POLL_METHOD_FUNCS;

SSL_POLL_METHOD *SSL_POLL_METHOD_new(const SSL_POLL_METHOD_FUNCS *funcs,
                                     size_t funcs_len, size_t data_len);

void *SSL_POLL_METHOD_get0_data(const SSL_POLL_METHOD *self);

int SSL_POLL_METHOD_free(SSL_POLL_METHOD *self);
void SSL_POLL_METHOD_do_free(SSL_POLL_METHOD *self);
int SSL_POLL_METHOD_up_ref(SSL_POLL_METHOD *self);

uint64_t SSL_POLL_METHOD_get_caps(const SSL_POLL_METHOD *self);
int SSL_POLL_METHOD_supports_poll_descriptor(SSL_POLL_METHOD *self,
                                             const BIO_POLL_DESCRIPTOR *desc);
int SSL_POLL_METHOD_poll(SSL_POLL_METHOD *self, ...);
SSL_POLL_GROUP *SSL_POLL_METHOD_create_poll_group(SSL_POLL_METHOD *self,
                                                  const OSSL_PARAM *params);

typedef struct ssl_poll_group_funcs_st {
    int (*free)(SSL_POLL_GROUP *self);
    int (*up_ref)(SSL_POLL_GROUP *self);

    int (*change_poll)(SSL_POLL_GROUP *self, /* as shown for change_poll */);
} SSL_POLL_GROUP_FUNCS;

SSL_POLL_GROUP *SSL_POLL_GROUP_new(const SSL_POLL_GROUP_FUNCS *funcs,
                                   size_t funcs_len, size_t data_len);
void *SSL_POLL_GROUP_get0_data(const SSL_POLL_GROUP *self);

int SSL_POLL_GROUP_free(SSL_POLL_GROUP *self);
int SSL_POLL_GROUP_up_ref(SSL_POLL_GROUP *self);
int SSL_POLL_GROUP_change_poll(SSL_POLL_GROUP *self,
                               /* as shown for change_poll */);
```

Here is how an application might define and create an `SSL_POLL_METHOD` instance
of its own:

```c
typedef struct app_poll_method_st {
    uint32_t refcount;
} APP_POLL_METHOD;

static int app_poll_method_free(SSL_POLL_METHOD *self)
{
    APP_POLL_METHOD *data = SSL_POLL_METHOD_get0_data(self);

    if (!--data->refcount)
        SSL_POLL_METHOD_do_free(self);

    return 1;
}

static int app_poll_method_up_ref(SSL_POLL_METHOD *self)
{
    APP_POLL_METHOD *data = SSL_POLL_METHOD_get0_data(self);

    ++data->refcount;

    return 1;
}

static uint64_t app_poll_method_get_caps(const SSL_POLL_METHOD *self)
{
    return SSL_POLL_METHOD_CAP_IMMEDIATE;
}

static int app_poll_method_supports_poll_descriptor(SSL_POLL_METHOD *self,
                                                    const BIO_POLL_DESCRIPTOR *d)
{
    return d->type == BIO_POLL_DESCRIPTOR_TYPE_SOCK_FD;
}

/* etc.
 */

SSL_POLL_METHOD *app_create_custom_poll_method(void)
{
    SSL_POLL_METHOD *self;
    APP_POLL_METHOD *data;

    static const SSL_POLL_METHOD_FUNCS funcs = {
        app_poll_method_free,
        app_poll_method_up_ref,
        app_poll_method_get_caps,
        app_poll_method_supports_poll_descriptor,
        app_poll_method_poll,
        NULL /* not supported by app */
    };

    self = SSL_POLL_METHOD_new(&funcs, sizeof(funcs), sizeof(APP_POLL_METHOD));
    if (self == NULL)
        return NULL;

    data = SSL_POLL_METHOD_get0_data(self);
    data->refcount = 1;
    return self; /* return the method object, not its private data */
}
```

We also provide a “default” method:

```c
SSL_POLL_METHOD *SSL_get0_default_poll_method(const OSSL_PARAM *params);
```

No params are currently defined; this is reserved for future use.

`SSL_poll` is a shorthand for using the method provided by
`SSL_get0_default_poll_method(NULL)`.

### Internal Polling: Usage within SSL Objects

To support custom pollers for internal polling, SSL objects receive an API that
allows a custom poller to be configured. To avoid confusion, custom pollers can
only be configured on an event leader, but the getter function will return the
custom poller configured on an event leader when called on any QUIC SSL object
in the hierarchy, or NULL if none is configured.

An `SSL_POLL_METHOD` can be associated with an SSL object.
It can also be set
on an `SSL_CTX` object, in which case it is inherited by SSL objects created from
it:

```c
int SSL_CTX_set1_poll_method(SSL_CTX *ctx, SSL_POLL_METHOD *method);
SSL_POLL_METHOD *SSL_CTX_get0_poll_method(const SSL_CTX *ctx);

int SSL_set1_poll_method(SSL *ssl, SSL_POLL_METHOD *method);
SSL_POLL_METHOD *SSL_get0_poll_method(const SSL *ssl);
```

An SSL object which has never had `SSL_set1_poll_method` called on it directly
inherits the value set on the `SSL_CTX` it was created from, including if the
poll method set on the `SSL_CTX` is changed after the SSL object is created.
Calling `SSL_set1_poll_method(..., NULL)` overrides this behaviour.

When a poll method is set on a QUIC domain, blocking API calls use that poller
to block as needed.

Our QUIC implementation may, if it wishes, use the provided poll method to
construct a poll group, but is not guaranteed to do so. We reserve the right to
use the immediate mode or retained mode API of the poller as desired. If we use
the retained mode, we handle state updates and teardown as needed if the caller
later changes the configured poll method by calling `SSL_set1_poll_method`
again.

If the poll method is set to NULL, we use the default poll method, which is the
same as the method provided by `SSL_get0_default_poll_method`.

Because the poll method provided is used to handle blocking on network I/O, a
poll method provided in this context only needs to handle OS socket handles,
similar to our own reactor polling in QUIC MVP.

### External Polling: Usage over SSL Objects

An application can also use an `SSL_POLL_METHOD` itself, whether via the
immediate or retained mode. In the latter case it creates one or more
`SSL_POLL_GROUP` instances.

Custom pollers are responsible for their own translation arrangements.
Retained-mode usage can be more efficient because it can allow recursive staging
of implementation-specific polling data. For example, suppose an application
enrolls a QCSO and two subsidiary QSSOs in a poll group. The reduction of these
three objects to a single pair of read/write BIO poll descriptors as provided by
an SSL object can be cached.

### Future Adaptation to Internal Pollable Resources

Suppose that in the future our QUIC implementation becomes more sophisticated
and we want to use a different kind of pollable resource to mask a more
elaborate internal reactor. For example, suppose we want to switch to an
internal thread-based reactor design, and signal readiness not via an OS socket
handle but via a condition variable or Linux-style `eventfd`.

Our design would hold up under these conditions as follows:

- For condition variables this would require a new poll descriptor type.
  Our default poller could be amended to support this new poll descriptor type.
  However, most OSes do not provide a way to simultaneously wait on a condition
  variable and other resources, so there are issues here unless an additional
  thread is used to adapt socket readiness to a condition variable.

- For something like `eventfd` things will work well with the existing `SOCK_FD`
  type. A QUIC SSL object simply starts returning an eventfd fd for
  `BIO_get_rpoll_descriptor` and this becomes readable when signalled by our
  internal engine. `BIO_get_wpoll_descriptor` works in the same way. (Of course
  a change on this level would probably require some sort of application
  opt-in via our API.)

- For something like Win32 Events, `WaitForSingleObject` or
  `WaitForMultipleObjects` works, but would require a new poll descriptor type.
  It is possible to plumb socket readiness into this API also, assuming Vista
  (`WSAEventSelect`).
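
The `eventfd` case above can be illustrated with a small, Linux-only sketch.
It is not part of the proposed API: `demo_engine` and the `demo_*` functions
are hypothetical names standing in for an internal engine, and the only real
calls are `eventfd(2)`, `poll(2)`, `read(2)` and `write(2)`. It shows why an
existing fd-based poller needs no changes: the eventfd simply looks like a
readable descriptor once the engine has signalled it.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for an internal engine's readiness signal. */
typedef struct {
    int efd; /* eventfd that would be exposed as a SOCK_FD-style descriptor */
} demo_engine;

/* Create the engine's eventfd; returns 0 on success, -1 on failure. */
static int demo_engine_init(demo_engine *e)
{
    e->efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return e->efd < 0 ? -1 : 0;
}

/* Called by the (hypothetical) engine when the object becomes ready. */
static int demo_engine_signal(demo_engine *e)
{
    uint64_t one = 1;

    return write(e->efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Reset the signal once the application has serviced the object. */
static int demo_engine_clear(demo_engine *e)
{
    uint64_t count;

    return read(e->efd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? 0 : -1;
}

/*
 * Application side: non-blocking readiness check using an ordinary poller.
 * Returns 1 if readable, 0 if not, -1 on error.
 */
static int demo_fd_is_readable(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0);
    rc = poll(&pfd, 1, 0);
    if (rc < 0)
        return -1;
    return rc > 0 && (pfd.revents & POLLIN) != 0;
}

static void demo_engine_cleanup(demo_engine *e)
{
    close(e->efd);
}
```

In this sketch the application would call `demo_fd_is_readable(e.efd)` exactly
as it would for a socket; the descriptor reports readable after
`demo_engine_signal` and stops doing so after `demo_engine_clear`.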

Worked Examples
---------------

### Internal Polling — Default Poll Method

- Application creates a new QCSO
- Application does not set a custom poll method on it
- Application uses it in blocking mode and sets network BIOs
- Our QUIC implementation requests poll descriptors from the network BIOs
- Our QUIC implementation asks the default poller if it understands
  how to poll those poll descriptors. If not, blocking cannot be supported.
- When it needs to block, our QUIC implementation uses the default poll method
  in either immediate or retained mode based on the poll descriptors reported by
  the network BIOs provided

### Internal Polling — Custom Poll Method

- Application instantiates a custom poll method
- Application creates a new QCSO
- Application sets the custom poll method on the QCSO
- Application configures the QCSO for blocking mode and sets network BIOs
- Our QUIC implementation requests poll descriptors from the network BIOs
- Our QUIC implementation asks the custom poll method if it understands how to
  poll those poll descriptors. If not, blocking cannot be supported.
- When it needs to block, our QUIC implementation uses the custom poll method
  in either immediate or retained mode based on the poll descriptors reported
  by the network BIOs provided (internal polling)

### External Polling — Immediate Mode

- Application gets a poll method (default or custom)
- Application invokes poll() on the poll method on some number of QLSOs, QCSOs, QSSOs
  and OS sockets, etc.
- The poll method performs translation to a set of OS resources.
- The poll method asks the OS to poll/block.
- The poll method examines the results reported from the OS and performs reverse
  translation.
- The poll method poll() call reports the results and returns.
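
The immediate-mode steps above (translate, poll, reverse-translate) can be
sketched roughly as follows. This is an illustration only, not the proposed
API: `demo_object`, `demo_item` and `demo_poll` are hypothetical names, the
only real call is POSIX `poll(2)`, and the sketch assumes that every object
reduces to a single readable OS descriptor.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define DEMO_MAX_ITEMS 16

/* Hypothetical pollable object: a stand-in for a QUIC SSL object. */
typedef struct {
    int net_fd; /* OS handle the object reports as its read poll descriptor */
} demo_object;

typedef enum { DEMO_ITEM_FD, DEMO_ITEM_OBJECT } demo_item_type;

/* Hypothetical poll item: either a raw OS socket or a higher-level object. */
typedef struct {
    demo_item_type type;
    union {
        int fd;
        const demo_object *obj;
    } u;
    int readable; /* output: set to 1 if the item is ready to read */
} demo_item;

/*
 * One-shot poll over a mixed set of items. Returns the number of ready items,
 * or -1 on error. timeout_ms follows poll(2) conventions.
 */
static int demo_poll(demo_item *items, size_t n, int timeout_ms)
{
    struct pollfd pfds[DEMO_MAX_ITEMS];
    size_t i;
    int ready = 0;

    assert(items != NULL || n == 0);
    if (n > DEMO_MAX_ITEMS)
        return -1;

    /* Translation: reduce every item to an OS-level descriptor. */
    for (i = 0; i < n; ++i) {
        pfds[i].fd = items[i].type == DEMO_ITEM_FD ? items[i].u.fd
                                                   : items[i].u.obj->net_fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    /* Ask the OS to poll/block. */
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0)
        return -1;

    /* Reverse translation: map OS results back onto the caller's items. */
    for (i = 0; i < n; ++i) {
        items[i].readable = (pfds[i].revents & POLLIN) != 0;
        ready += items[i].readable;
    }

    return ready;
}
```

A real poll method would have more to do in the reverse translation step:
several QUIC objects may share one network descriptor, and OS-level readiness
of that descriptor does not by itself mean that a particular stream is
readable.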

Note that custom poller methods configured on an SSL object are used for internal
polling (blocking API calls) only. Thus they have no effect on the above
scenario.

### External Polling — Retained Mode

- Application gets a poll method (default or custom)
- Application uses the poll method to create a poll group
- Application registers some number of QLSOs, QCSOs, QSSOs and OS sockets, etc.
  in the poll group.
- The poll group caches translations to a set of OS resources. It may create
  an OS device for fast polling (e.g. epoll) and register these resources
  with that device.
- Application polls using the poll group.
- The poll group asks the OS to poll/block.
- The poll group examines the results reported from the OS and performs reverse
  translation.
- The poll method reports the results and returns.

### External Polling — Immediate Mode Without Event Handling

- Application gets a poll method (default or custom)
- Application invokes poll() on the poll method on some number of QLSOs, QCSOs,
  and QSSOs with `NO_HANDLE_EVENTS` set.
- If the poll method is the default poll method, it knows how to examine
  QUIC SSL objects for readiness and does so.
- If the poll method is a custom poll method, it could choose to subdelegate
  this work to the default poll method, or implement it itself.

Change Notification Callback Mechanism
--------------------------------------

We propose to allow applications and libssl code to register callbacks for
lifecycle events on SSL objects, as discussed above. This can be used both by us
and by applications (e.g. to implement custom poller methods). The advantage
here is that an SSL object registered into a poll group can be automatically
unregistered from that poll group when it is freed.

The proposed API is as follows:

```c
/*
 * The SSL object is about to be freed (the refcount has reached zero).
 * The SSL object is still completely healthy until this call returns.
 * If the SSL object is reffed during a callback, the freeing is cancelled.
 * The callback then has full responsibility for its lifecycle.
 */
#define SSL_LIFECYCLE_EVENT_TYPE_PRE_FREE       1

/*
 * Either the read or write network BIO on an SSL object has just been changed,
 * or both. The fields in data.bio_change specify the old and new BIO pointers.
 * If a BIO reference is being set to NULL on an SSL object, the 'new' pointer
 * will be NULL; conversely, if a BIO is being set on an SSL object where
 * previously no BIO was set, the 'old' pointer will be NULL. If the applicable
 * flag (R or W) is not set, the old and new fields will be set to NULL.
 */
#define SSL_LIFECYCLE_EVENT_TYPE_BIO_CHANGE     2

#define SSL_LIFECYCLE_EVENT_FLAG_R      (1U << 0) /* read BIO changed */
#define SSL_LIFECYCLE_EVENT_FLAG_W      (1U << 1) /* write BIO changed */

typedef struct ssl_lifecycle_event_st SSL_LIFECYCLE_EVENT;
typedef struct ssl_lifecycle_cb_cookie_st *SSL_LIFECYCLE_CB_COOKIE;

/* Returns SSL_LIFECYCLE_EVENT_TYPE */
uint32_t SSL_LIFECYCLE_EVENT_get_type(const SSL_LIFECYCLE_EVENT *event);

/* Returns SSL_LIFECYCLE_EVENT_FLAG */
uint32_t SSL_LIFECYCLE_EVENT_get_flags(const SSL_LIFECYCLE_EVENT *event);

/* Returns an SSL object associated with the event (if applicable) */
SSL *SSL_LIFECYCLE_EVENT_get0_ssl(const SSL_LIFECYCLE_EVENT *event);

/*
 * For a BIO_CHANGE event, fills the passed pointers if non-NULL with the
 * applicable values. For other event types, fails.
 */
int SSL_LIFECYCLE_EVENT_get0_bios(const SSL_LIFECYCLE_EVENT *event,
                                  BIO **r_old, BIO **r_new,
                                  BIO **w_old, BIO **w_new);

/*
 * Register a lifecycle callback. Multiple lifecycle callbacks may be
 * registered. *cookie is written with an opaque value which may be used to
 * subsequently unregister the callback.
 */
int SSL_register_lifecycle_callback(SSL *ssl,
                                    void (*cb)(const SSL_LIFECYCLE_EVENT *event,
                                               void *arg),
                                    void *arg,
                                    SSL_LIFECYCLE_CB_COOKIE *cookie);

int SSL_unregister_lifecycle_callback(SSL *ssl, SSL_LIFECYCLE_CB_COOKIE cookie);
```

Q&A
---

**Q. How do we support poll methods which only support immediate mode?**

A. We simply have a fallback path for this when our QUIC implementation consumes
a custom poller. This is easy enough.

**Q. How do we support poll methods which only support retained mode?**

A. We intend to implement support for retained mode in our QUIC implementation's
internal blocking code, so this should also work OK. Remember that an external
poller method does not interact with an internal poller method (i.e., a method
set on an SSL object). In particular, no two poller methods ever interact
directly with one another. This avoids the need for recursive state shadowing
(where one poll method's retained mode API maintains state and also makes calls
to another poll method's retained mode API).

**Q. How does this design interact with hierarchical polling?**

A. We assume an application uses its own polling arrangements initially and then
uses calls to an OpenSSL external polling API (such as `SSL_poll` or a poll
method) to drill down into what is actually ready, as discussed above. There is
no issue here.
An application can also use OpenSSL polling APIs instead of its
own, if desired; for example it could create a poll group from the default poll
method and use it to poll only network sockets, some of which may be from QUIC
SSL object poll descriptors, and then if needed call `SSL_poll` to narrow things
down once something becomes ready.

**Q. Should we support immediate and retained mode in the same API or segregate
these?**

A. They are in the same API, though we let applications use capability bits
to report support for only one of these if they wish.

**Q. How do we support extensibility of the poller interface?**

A. Using an extensible function table. An application can set a function
   pointer to NULL if it does not support it. Capability flags are used to
   advertise what is supported.

**Q. If an application sets a poll method on both an event leader and a poll
   group, what happens?**

A. Setting a poll method on an event leader provides a mechanism used for internal
blocking when making blocking calls. It is currently never used unless some QUIC
SSL object in the QUIC domain is used in blocking mode (though this isn't a
contractual guarantee and we might do so in future for fast identification of
what we need to handle if we handle multiple OS-level sockets in future).

Setting a poll method on a poll group provides a mechanism used for polling
using that poll group. Note that a custom poll method configured on an SSL
object is **not** used for the translation process performed by a poll group,
even when polling that SSL object. Translation is driven by
`SSL_get_[rw]poll_descriptor`.

**Q. What if different poll methods are configured on different event leaders
   (QUIC domains) and an application then tries to poll them all?**

A.
Because the poll method configured on an event leader is ignored in favour of
the poll method directly invoked, there is no conflict here. The poll method
handles all polling when it is specifically invoked.

**Q. Where should the responsibility for poll descriptor translation lie?**

A. With the poll method or poll group being called at the time.

**Q. What method does `SSL_poll` use?**

A. It uses the default poll method. If an application wishes to use a different
poll method, it can call the `poll` method directly on that `BIO_POLL_METHOD`.

**Q. An application creates a poll group, registers an SSL object and later
changes the network BIOs set on that SSL object, or changes the poll descriptors
they return. What happens?**

A. This is solved with two design aspects:

- An application is not allowed to have the poll descriptors returned by a BIO
  change silently. If it wishes to change these, it must call `SSL_set_bio`
  again, even if with the same BIOs already set.

- We will need to either:

  - have a callback registration interface so retained mode pollers
    which have performed cached translation can be notified that a poll
    descriptor they have relied on is changing (proposed above).

  - require retained mode pollers to check for changes to translated objects
    (less efficient).

    This might cause issues with epoll because we don't have an opportunity
    to deregister an FD in this case.

  We choose the first option.

**Q. An application creates a poll group, registers a QCSO and some subsidiary
QSSOs and later frees all of these objects. What happens? (In other words, are
SSL objects auto-deregistered from poller groups?)**

A. We must assume a poll group retains an SSL object pointer if such an object
has been registered with it.
Thus our options are either:

- require applications to deregister objects from any poll group they are using
  prior to freeing them; or

- add internal callback registration machinery to QUIC SSL objects so we can
  get a cleanup notification (see the above callback mechanism).

We choose the latter.

**Q. An application creates a poll group, registers a (non-QUIC-related) OS
socket handle and then closes it. What happens?**

A. Since OSes in general do not provide a way to get notified of these closures it
is not really possible to handle this automatically. It is essential that an
application deregisters the handle from the poll group first.

**Q. How does code using a poll method determine what poll descriptors that
method supports?**

A. A query method is provided which can be used to determine if the method supports
a given descriptor.

Windows support
---------------

Windows customarily poses a number of issues for supporting polling APIs. This
is largely because Windows chose an approach based around I/O *completion*
notification rather than around I/O *readiness* notification. While an implementation
of the Berkeley select(2)-style API is available, the options for higher
performance polling are largely confined to using I/O completion ports.

Because the semantics of I/O readiness and I/O completion are very different, it
has proven impossible in practice to create an I/O readiness API as an
abstraction over Windows's I/O completion API. The converse is not true; it is
fairly easy to create an I/O completion notification API over an I/O readiness
API.

It is therefore prudent to give some consideration to how Windows can be
supported:

1. We can always use `select` (or on Vista and later, `WSAPoll`).
   This may not actually be much of a problem as even in a server role, with QUIC
   we are likely to be handling a lot of clients on a relatively small number of
   OS sockets.

2. `WSAAsyncSelect` could be used with a helper thread. One thread could service
   multiple sockets, possibly even multiple poll groups.

3. `WSAEventSelect` allows a Win32 Event to be signalled on readiness,
   but this is not very useful because `WaitForMultipleObjects` is limited to 64
   objects (and even if it were not, it poses the same issues as `select`, so
   one is back where one started).

4. I/O Completion Ports are the “official” way to do high-performance I/O
   but notify on completion rather than readiness. It is impossible to build
   a poller API on top of this as such. As mentioned above, nobody has ever
   really managed to do so successfully.

5. `IOCTL_AFD_POLL`. This is an undocumented function of Winsock internals
   which allows a) epoll/kqueue-style interfaces to be built over Winsock, b)
   which are highly performant, like epoll/kqueue, and c) which use IOCPs to
   signal *readiness* rather than *completion*. In fact, this is what the
   `select` and `WSAPoll` functions use internally. Unlike those functions, this
   is based around registering sockets in advance and submits readiness
   notifications to an IOCP, so this can be quite performant.

   `IOCTL_AFD_POLL` is an internal, undocumented API. It is however widely used,
   and is now the basis of libuv (the I/O library used by Node.js), ZeroMQ, and
   Rust's entire asynchronous I/O ecosystem on Windows. In other words, while
   officially being undocumented and internal, it has in practice become widely
   used by third-party software, to the point where it cannot really be changed
   in future without breaking massive amounts of software. `IOCTL_AFD_POLL` has
   been around since at least NT 4 and is supported by Wine.
   Moreover it is
   worth noting that the reason why so many projects have resorted to using this
   API on Windows is due to the sheer lack of anything providing the appropriate
   functionality in the public API. The high level of reliance on this
   functionality in contemporary software doing asynchronous I/O does give
   reasonable confidence in using this API.

An immediate mode interface can be implemented using option 1.

Based on the above, options 1, 2 and 5 are viable for implementation of a
retained mode interface, with option 2 being a fairly substantial hack and
option 5 being the preferred approach for projects wanting an epoll/kqueue-style
model on Windows. The suggested approach is therefore to implement option 5,
though option 1 is also a viable fallback.

In any case, it appears the poller API as designed and proposed above
can be implemented adequately on Windows.

Extra features on QUIC objects
------------------------------

These are unlikely to be implemented initially; this is just some exploration
of features we might want to offer in future and how they would interact with
the polling design.

### Low-watermark functionality

Sometimes an application knows it does not need to do anything until at least N
bytes are available to read or write. In conventional Berkeley sockets APIs this
is known as “low-watermark” (LOWAT) functionality.

Rather than making polling interfaces more convoluted by adding fields to
polling-related structures, we propose to add a knob which can be configured on
an individual QUIC stream:

```c
#define SSL_LOWAT_FLAG_ONESHOT      (1U << 0)

int SSL_set_read_lowat(SSL *ssl, size_t lowat, uint64_t flags);
int SSL_get_read_lowat(SSL *ssl, size_t *lowat);

int SSL_set_write_lowat(SSL *ssl, size_t lowat, uint64_t flags);
int SSL_get_write_lowat(SSL *ssl, size_t *lowat);
```

If `ONESHOT` is set, the low-watermark condition is automatically cleared
after the next call to a read or write function respectively. The low-watermark
condition can also be cleared by passing a low-watermark of 0.

If low-watermark mode is configured, a poller will not report a stream as having
data ready to read, or room to write data, if the amount of data or room
available is less than the configured watermark.

### Timeouts

It is desirable to be able to cause blocking I/O operations to time out. For
example, an application might want to perform a blocking read from a peer but
only wait for a certain amount of time.

We support this with a configurable timeout for each type of operation.

```c
/* All operations - defined as separate bit for forward ABI compatibility */
#define SSL_OP_CLASS_ALL    (1U << 0)
/* The timeout concerns reads. */
#define SSL_OP_CLASS_R      (1U << 1)
/* The timeout concerns writes. */
#define SSL_OP_CLASS_W      (1U << 2)
/* The timeout concerns accepts. */
#define SSL_OP_CLASS_A      (1U << 3)
/* The timeout concerns new stream creation (which may be blocked on FC). */
#define SSL_OP_CLASS_N      (1U << 4)
/* The timeout concerns connects. */
#define SSL_OP_CLASS_C      (1U << 5)

/*
 * If set, t is a deadline (absolute time), otherwise it is a duration which
 * starts whenever an operation is commenced.
 */
#define SSL_TIMEOUT_FLAG_DEADLINE   (1U << 0)

/*
 * Configure a timeout for one or more operation types. At least one operation
 * type must be specified. If t is NULL, the timeout is unset for the given
 * operation. This may be called multiple times to set different timeouts
 * for different operations.
 */
int SSL_set_io_timeout(SSL *ssl, uint64_t operation,
                       const struct timeval *t, uint64_t flags);

/*
 * Retrieves a configured timeout value. operation must be a single operation
 * flag from SSL_OP_CLASS. If a timeout is configured for the operation
 * type, *is_set is written as 1 and *t is written with the configured timeout.
 * *flags is written with SSL_TIMEOUT_FLAG_DEADLINE or 0 as applicable.
 * Otherwise, *is_set is written as 0, the value of *t is undefined and *flags
 * is set to 0. Returns 1 on success (including if unset) and 0 on failure (for
 * example if called on an unsupported SSL object type).
 */
int SSL_get_io_timeout(SSL *ssl, uint64_t operation,
                       struct timeval *t, int *is_set,
                       uint64_t *flags);

/*
 * Returns 1 if the last invocation of an applicable operation specified by
 * operation failed due to a timeout.
 *
 * For SSL_OP_CLASS_R, this means SSL_read or SSL_read_ex.
 * For SSL_OP_CLASS_W, this means SSL_write or SSL_write_ex.
 * For SSL_OP_CLASS_A, this means SSL_accept_stream.
 * For SSL_OP_CLASS_N, this means SSL_new_stream.
 * For SSL_OP_CLASS_C, this means SSL_do_handshake or any
 * function which implicitly calls it, which includes any other I/O function
 * if the connection process has not been completed yet.
 *
 * If a function is called in non-blocking mode and it cannot execute
 * immediately, this is considered to be a timeout.
 * Therefore while timeouts are
 * not useful in non-blocking mode, this function can be used to determine if a
 * function failed because it would otherwise block.
 *
 * Invoking any operation of a given operation class clears the timeout flag
 * for that operation class regardless of the outcome of that operation.
 */
int SSL_timed_out(SSL *ssl, uint64_t operation);
```

We could consider adding a new `SSL_get_error` code also (`SSL_ERROR_TIMEOUT`).
There are no compatibility issues here because it will only be returned if an
application chooses to use the timeout functionality.

TODO: Check for duplicate existing APIs

TODO: Consider using ctrls

### Autotick control

We automatically engage in event handling when an I/O function such as
`SSL_read`, `SSL_write`, `SSL_accept_stream` or `SSL_new_stream` is called.
This is likely to be undesirable for applications in many circumstances,
so we should have a way to inhibit this.

```c
#define SSL_EVENT_FLAG_INHIBIT          (1U << 0)
#define SSL_EVENT_FLAG_INHIBIT_ONCE     (1U << 1)

/*
 * operation is one or more SSL_OP_CLASS values. Inhibition can be enabled for a
 * single future call to an operation of that type (INHIBIT_ONCE), after which
 * it is disabled, or enabled persistently (INHIBIT).
 */
int SSL_set_event_flags(SSL *ssl, uint64_t operation, uint64_t flags);

/*
 * operation must specify a single operation. The flags configured are reported
 * in *flags.
 */
int SSL_get_event_flags(SSL *ssl, uint64_t operation, uint64_t *flags);
```

Autotick inhibition is only useful in non-blocking mode and it is ignored in
blocking mode.
Using it in non-blocking mode carries the following implications:

- Data can be drained using `SSL_read` from existing buffers, but network I/O
  is not serviced and no new data will arrive (unless `SSL_handle_events` is
  called).

- Data can be placed into available write buffer space using `SSL_write`,
  but data will not be transmitted (unless `SSL_handle_events` is called).

- Likewise, no new incoming stream events will occur, and if calls to
  `SSL_new_stream` are currently blocked due to flow control, this
  situation will not change.

- `SSL_do_handshake` will simply report whether the handshake is done or not.

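
The read-side implication above can be illustrated with a toy model. Nothing
here is OpenSSL code: `demo_stream`, `demo_handle_events` and `demo_read` are
hypothetical names that merely mimic the described behaviour of
`SSL_handle_events` and a non-blocking `SSL_read`, under the assumption that
event handling is what moves data from the network into the read buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_SIZE 64

/* Toy stand-in for a QUIC stream; not an OpenSSL type. */
typedef struct {
    char network[DEMO_BUF_SIZE]; /* bytes "on the wire", not yet processed */
    size_t network_len;
    char rx[DEMO_BUF_SIZE];      /* bytes already processed and buffered */
    size_t rx_len;
    int inhibit_autotick;        /* models the INHIBIT flag */
} demo_stream;

/* Models event handling: move pending network bytes into the rx buffer. */
static void demo_handle_events(demo_stream *s)
{
    size_t n = s->network_len;

    if (n > DEMO_BUF_SIZE - s->rx_len)
        n = DEMO_BUF_SIZE - s->rx_len;
    memcpy(s->rx + s->rx_len, s->network, n);
    s->rx_len += n;
    memmove(s->network, s->network + n, s->network_len - n);
    s->network_len -= n;
}

/* Models a non-blocking read: returns the number of bytes copied out. */
static size_t demo_read(demo_stream *s, char *out, size_t out_len)
{
    size_t n;

    assert(s != NULL && out != NULL);
    if (!s->inhibit_autotick)
        demo_handle_events(s); /* autotick: service the network implicitly */

    n = s->rx_len < out_len ? s->rx_len : out_len;
    memcpy(out, s->rx, n);
    memmove(s->rx, s->rx + n, s->rx_len - n);
    s->rx_len -= n;
    return n;
}
```

With `inhibit_autotick` set, `demo_read` returns only what is already
buffered, and newly arrived bytes stay in `network` until the application
calls `demo_handle_events` itself; with it clear, each read services the
network first.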