#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source.  A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#

#
# Copyright 2020 Robert Mustacchi
#
     _      _ _
 ___| |_ __| (_) ___
/ __| __/ _` | |/ _ \
\__ \ || (_| | | (_) |
|___/\__\__,_|_|\___/

Notes on the design of stdio.

------------
File Streams
------------

At the heart of stdio is the 'FILE *'. The 'FILE *' represents a stream
that can be read, written, and seeked. Streams traditionally refer to a
file descriptor, when created by fopen(3C), or may refer to memory, when
created by open_memstream(3C) or fmemopen(3C). This document focuses on
the implementation of streams. Other miscellaneous functions in stdio
are not discussed.

------------
Organization
------------

Most functions exist in a file with the same name. When adding new
files to stdio the file name should match the primary function name.
There are a few exceptions. Almost all of the logic related to both
flushing and knowledge of how to handle the 32-bit ABI issues (described
in the next section) can be found in flush.c.

-----------------------------
struct __FILE_TAG and the ABI
-----------------------------

The definition of the 'FILE *' is a pointer to a 'struct __FILE_TAG'.
The 'struct __FILE_TAG' structure has a long history that dates back to
historical UNIX. For better or for worse, we have inherited some of the
design decisions of the past. It's important to understand what those
are, as they have a profound impact on the stdio design and serve as a
good cautionary tale for future ABI decisions.

In the original UNIX designs, the 'struct __FILE_TAG' was exposed as a
non-opaque structure. This was also true on other platforms. This had a
couple of challenges:

* It meant the size of the 'struct __FILE_TAG' was part of the ABI.
* Consumers would access the members directly. You can find examples of
  this in our public headers where things like getc() are inlined in
  terms of the implementation. Various 3rd-party software that has
  existed for quite some time knows the offset of members and directly
  manipulates them. This is still true as of 2020.
* The 'struct __FILE_TAG' only used an unsigned char (uint8_t) for the
  file descriptor in the 32-bit version. Other systems used a short, so
  they were in better shape. This was changed in the 64-bit version to
  use an int.
* The main C stdio symbols 'stdin', 'stdout', and 'stderr', were (and
  still are) exposed as an array. This means that while the 64-bit
  structure is opaque, its size is actually part of the ABI.
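
Two of the issues above can be illustrated with a minimal sketch. The
'demo_' types, the array, and the macro below are hypothetical stand-ins
and not the real libc definitions. They only show why an 8-bit file
descriptor member cannot hold a descriptor above 255 and why exposing
streams as array elements bakes the structure size into every consumer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-opaque, 32-bit style stream structure. */
typedef struct demo_file {
	int		df_cnt;		/* bytes available in the buffer */
	unsigned char	*df_ptr;	/* next byte in the buffer */
	unsigned char	*df_base;	/* start of the buffer */
	unsigned char	df_flag;	/* stream state */
	unsigned char	df_fd;		/* file descriptor, only 8 bits wide */
} demo_file_t;

/* Hypothetical array export, analogous to exposing the standard streams. */
demo_file_t demo_iob[3];
#define	demo_stdout	(&demo_iob[1])

/*
 * A consumer that expands demo_stdout computes this offset at compile
 * time, so sizeof (demo_file_t) ends up baked into its binary.
 */
size_t
demo_stdout_offset(void)
{
	return ((size_t)((char *)demo_stdout - (char *)demo_iob));
}

/* Store a descriptor in the 8-bit member and return what survives. */
int
demo_store_fd(demo_file_t *fp, int fd)
{
	fp->df_fd = (unsigned char)fd;
	return (fp->df_fd);
}
```

Storing any descriptor larger than 255 through demo_store_fd() silently
wraps, which is the failure mode that the extended file mechanism
described later has to work around.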

All of these issues have been dealt with in different ways in the
system. The first thing that is a little confusing is where to find the
definitions of the actual implementation. The 32-bit 'struct __FILE_TAG'
is split into two different pieces, the part that is public and a
secondary, private part.

The public definition of the 'struct __FILE_TAG' for 32-bit code and the
opaque definition for 64-bit code may be found in
'usr/src/head/stdio_impl.h'. The actual definition of the 64-bit
structure and the 32-bit additions are all found in
'usr/src/lib/libc/inc/file64.h'.

In file64.h, one will find the 'struct xFILEdata' (extended FILE * data).
This represents all of the data that has been added to stdio that is
missing from the public structure. Whenever a 'FILE *' is allocated,
32-bit code always ensures that there is a corresponding 'struct
xFILEdata' that exists. Currently, we still have plenty of padding left
in the 64-bit version of the structure for at least 3 pointers.

To add a member to the structure, one has to add data to the structures
in 'usr/src/lib/libc/inc/file64.h'. If all of the padding has been used
up, then you must stop. The size of the 64-bit structure _cannot_ be
extended because, as noted earlier, it is part of the ABI. If we hit
this case, then one must introduce the 'struct xFILEdata' for the LP64
environment.

--------------------------
Allocating FILE Structures
--------------------------

libc defines a number of 'FILE *' structures by default. These can all
be found in 'data.c'. The first _NFILE (20 or 60 depending on the
platform) are defined statically. In the 32-bit case, the corresponding
'struct xFILEdata' is allocated along with it.

To determine if a structure is free or not in the array, the `_flag`
member is consulted. If the flag has been set to zero, then the stream
is considered free and can be allocated. All of the allocated (whether
used or not) 'FILE *' structures are present on a linked list which is
found in 'flush.c' rooted at the symbol '__first_link'. This list is
always scanned to try to reuse an existing 'FILE *' structure before
allocating a new one. If all of the existing ones are in use, then one
will be allocated.

An important thing to understand is that once allocated, a 'FILE *' will
never be freed by libc. It will always exist on the global list of
structures to be reused.
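
The reuse policy described above can be sketched as follows. This is a
simplified illustration under stated assumptions: the 'demo_' names, the
flag value, and the one-node-per-stream list are hypothetical, and the
sketch omits the locking that a real implementation needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define	DEMO_INUSE	0x01	/* hypothetical "stream is open" flag bit */

/* Hypothetical, simplified stream slot; a zero flag marks it as free. */
typedef struct demo_stream {
	unsigned char		ds_flag;
	struct demo_stream	*ds_next;
} demo_stream_t;

/* Root of the global list, playing the role described for __first_link. */
static demo_stream_t *demo_first;

/* Reuse a free slot if one exists; otherwise grow the list. */
demo_stream_t *
demo_findiop(void)
{
	demo_stream_t *sp, **tailp = &demo_first;

	for (sp = demo_first; sp != NULL; sp = sp->ds_next) {
		if (sp->ds_flag == 0) {
			sp->ds_flag = DEMO_INUSE;
			return (sp);
		}
		tailp = &sp->ds_next;
	}

	/* Every existing slot is in use, so allocate and append a new one. */
	if ((sp = calloc(1, sizeof (*sp))) == NULL)
		return (NULL);
	sp->ds_flag = DEMO_INUSE;
	*tailp = sp;
	return (sp);
}

/* "Closing" only clears the flag; the slot stays on the list for reuse. */
void
demo_release(demo_stream_t *sp)
{
	sp->ds_flag = 0;
}
```

Note that demo_release() never calls free(), mirroring the rule that a
structure stays on the global list once it has been allocated.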

---------
Buffering
---------

Every stream in stdio starts out as buffered. Buffering can be changed
by calling either setbuf(3C) or setvbuf(3C). This buffer is stored in
the `_base` member of the 'struct __FILE_TAG'. The amount of valid data
in the buffer is maintained in the '_cnt' member of the structure. By
default, there is no associated buffer with a stream. When the stream is
first used, the buffer will be assigned by a call to _findbuf() in
_findbuf.c.

There are pre-allocated buffers that exist. There are two specifically
for stdin and stdout (stderr is unbuffered). These include space for
both the buffer and the pushback buffer. The pushback buffer is used so
someone can call ungetc(3C) regardless of whether a buffering mode is
enabled or not. Characters that we 'unget' are placed on the pushback
buffer.

For other buffering modes, we'll try to allocate an appropriately sized
buffer. The buffer size defaults to BUFSIZ, but if the stream is backed
by a file descriptor, we'll use fstat() to determine the appropriate
size to use and match the file system block size. If we cannot allocate
that, we'll fall back to trying to allocate a pushback buffer.

libc defines static data for _NFILE worth of pushback buffers which are
indexed based on the underlying file descriptor. This and the stdin and
stdout buffers are all found in 'data.c' in _smbuf, _sibuf, and _sobuf
respectively.
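
The buffer sizing policy can be sketched as below. The function name
demo_bufsize() is hypothetical, and the use of the st_blksize member is
an assumption about how "match the file system block size" would
naturally be expressed; the logic in _findbuf.c may differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Pick a buffer size for a stream. Prefer the preferred I/O block size
 * reported by fstat() when there is a usable descriptor, and fall back
 * to BUFSIZ otherwise.
 */
size_t
demo_bufsize(int fd)
{
	struct stat st;

	if (fd >= 0 && fstat(fd, &st) == 0 && st.st_blksize > 0)
		return ((size_t)st.st_blksize);
	return (BUFSIZ);
}
```

A caller would pass -1 for a stream that has no descriptor, in which
case the default of BUFSIZ is used.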

------------------------------
Reading, Writing, and Flushing
------------------------------

By default, reads and writes on a stream, whether backed by a
file-descriptor or not, go through the buffer described in the previous
section. If a read or write can be satisfied by the buffer, then no
underlying I/O will occur, unless buffering has been disabled.

The various function entry points that read such as fread(3C) or
fgetc(3C) will not call read() directly but will instead try to fill the
buffer, which will cause a read if required. This is centralized in
_filbuf(). When a read is required from the underlying file, it will
call _xread() in flush.c. For more on _xread() see the operations vector
section further along.

Unlike reads, writes are much less centralized and each of the main
writing entry points has reimplemented the path of writing to the buffer
and flushing it. It would be good in the future to consolidate them. In
general, data will be written directly to the stdio buffer. When that
buffer needs to be flushed either the _flsbuf() or _xflsbuf() functions
will be called to actually flush out the buffer.

When data needs to be flushed from a buffer to its underlying file
descriptor (or other backing store), all of the write family functions
ultimately call _xwrite().

Flushes can occur in a few different ways:

1. A write has filled up the buffer.
2. A new line ('\n') is written and line buffering is used.
3. fflush(3C) or a similar function has been called.
4. A read occurs on a buffer that has unflushed writes.
5. The stream is being closed.

Most of these methods are fairly similar; however, the fflush(3C) case
is a little different. fflush() may be asked to flush all of the streams
when it is passed a NULL stream. Even when that happens it will still
utilize the same underlying mechanism via _xflsbuf() or _flsbuf().
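
The first three flush triggers listed above can be modeled with a small
sketch. Everything here is hypothetical ('demo_wstream_t', the 8-byte
buffer, the in-memory sink) and is far simpler than the real write
paths. Flushing before a read and flushing at close both reduce to an
explicit demo_flush() call in this model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	DEMO_BUFSZ	8

typedef struct demo_wstream {
	char	dw_buf[DEMO_BUFSZ];	/* pending, unflushed bytes */
	size_t	dw_used;		/* number of pending bytes */
	int	dw_linebuf;		/* non-zero: flush on '\n' */
	char	dw_sink[64];		/* stand-in for the backing store */
	size_t	dw_sinklen;		/* bytes that reached the sink */
	int	dw_nflush;		/* how many flushes have happened */
} demo_wstream_t;

/* Push all pending bytes to the sink and empty the buffer. */
void
demo_flush(demo_wstream_t *wp)
{
	if (wp->dw_used == 0)
		return;
	assert(wp->dw_sinklen + wp->dw_used <= sizeof (wp->dw_sink));
	(void) memcpy(wp->dw_sink + wp->dw_sinklen, wp->dw_buf, wp->dw_used);
	wp->dw_sinklen += wp->dw_used;
	wp->dw_used = 0;
	wp->dw_nflush++;
}

/* Buffer one byte, flushing when the buffer fills or a line completes. */
void
demo_putc(demo_wstream_t *wp, char c)
{
	wp->dw_buf[wp->dw_used++] = c;
	if (wp->dw_used == DEMO_BUFSZ || (wp->dw_linebuf && c == '\n'))
		demo_flush(wp);
}
```

In this model no I/O reaches the sink until one of the triggers fires,
which matches the idea that a write satisfied by the buffer causes no
underlying I/O.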

-----------
Orientation
-----------

Streams handle both wide characters and narrow characters. There is an
internal multi-byte conversion state buffer that is included with every
stream. A stream may exist in one of three modes:

1. It may have an explicit narrow orientation
2. It may have an explicit wide orientation
3. It may have no orientation

When most streams are created, they have no orientation. The orientation
can then be explicitly set by calling fwide(3C). Some streams are also
created with an explicit orientation, for example, open_wmemstream(3C)
always sets the stream to be wide.

The C standard dictates that certain operations will actually cause a
stream with no orientation to have an explicit orientation set. Calling
a narrow or wide character function, such as 'fgetc(3C)' or
'fgetwc(3C)' respectively, will cause the orientation to be set if
it has not been. Once an orientation for a stream has been set, it
cannot be changed until the stream has been closed or it is reset by
calling freopen(3C).

There are a few functions that don't change this today. One example is
ungetc(3C). Often this isn't indicative of whether it should or
shouldn't change the orientation, but is a side effect of the history of
the stdio implementation.
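
From the consumer's side, the orientation rules can be observed with
fwide(3C), which reports the current orientation when passed a mode of
zero. The following is a minimal sketch using only standard C
interfaces; the 'demo_' function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Normalize fwide()'s result to -1 (narrow), 0 (none), or 1 (wide). */
int
demo_orientation(FILE *fp)
{
	int o = fwide(fp, 0);

	return ((o > 0) - (o < 0));
}

/*
 * Record the orientation of a stream before a narrow write, after the
 * narrow write, and after a later attempt to switch the stream to wide.
 */
void
demo_orient(FILE *fp, int results[3])
{
	results[0] = demo_orientation(fp);
	(void) fputc('x', fp);
	results[1] = demo_orientation(fp);
	(void) fwide(fp, 1);
	results[2] = demo_orientation(fp);
}
```

When handed a freshly created stream, such as one from tmpfile(3C), the
three results should show no orientation, then narrow, then narrow
again, since the later fwide() call cannot change an orientation that
has already been set.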

-------------------------------------
Operations Vectors and Memory Streams
-------------------------------------

Traditionally, stdio streams were always backed by a file descriptor of
some kind and therefore always called out into functions like read(2),
write(2), lseek(2), and close(2) directly. A series of new functions
were introduced in POSIX 2008 that add support for streams backed by
memory in the form of fmemopen(3C), open_memstream(3C), and
open_wmemstream(3C).
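
As a consumer-level illustration of a stream with no file descriptor
behind it, the sketch below writes through open_memstream(3C) and hands
back the resulting buffer. The function name demo_memstream_greeting()
is hypothetical; only standard POSIX interfaces are used.

```c
/* Request the POSIX.1-2008 interfaces, which include open_memstream(). */
#define	_POSIX_C_SOURCE	200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format a greeting into a memory-backed stream and return the buffer
 * that open_memstream() allocated. The caller must free() the result.
 */
char *
demo_memstream_greeting(const char *name)
{
	char *buf = NULL;
	size_t len = 0;
	FILE *fp = open_memstream(&buf, &len);
	int ok;

	if (fp == NULL)
		return (NULL);

	ok = (fprintf(fp, "hello, %s", name) >= 0);
	/* Closing the stream finalizes buf and len. */
	if (fclose(fp) != 0)
		ok = 0;
	if (!ok) {
		free(buf);
		return (NULL);
	}
	return (buf);
}
```

All of the usual stdio calls work on such a stream, which is why the
implementation needs a way to redirect the underlying I/O.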

To deal with this and other possible designs, an operations vector was
added to the stream represented by the 'stdio_ops_t' structure. This is
stored in the '_ops' member of the 'struct __FILE_TAG'. For a normal
stream backed by a file descriptor, this member will be NULL.

In places where a normal system call would have been made there is now a
call to a corresponding function such as _xread(), _xwrite(), _xseek(),
_xseek64(), and _xclose(). If an operations vector is defined, it will
call into the corresponding operation vector. If not, it will perform
the traditional system call. This design choice consolidates all of the
work required to implement non-file descriptor backed streams.
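
The dispatch pattern can be sketched as follows. The structure layouts
and 'demo_' names are hypothetical, and the real 'stdio_ops_t' and
_xread() may differ. The sketch only shows the shape of the decision:
use the vector when one is present, otherwise make the system call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical operations vector with a single read operation. */
typedef struct demo_ops {
	ssize_t	(*do_read)(void *arg, void *buf, size_t len);
	void	*do_arg;
} demo_ops_t;

typedef struct demo_ostream {
	int		ds_fd;		/* used only when ds_ops is NULL */
	demo_ops_t	*ds_ops;	/* NULL for a descriptor-backed stream */
} demo_ostream_t;

/* Use the vector when present; otherwise fall back to read(2). */
ssize_t
demo_xread(demo_ostream_t *sp, void *buf, size_t len)
{
	if (sp->ds_ops != NULL)
		return (sp->ds_ops->do_read(sp->ds_ops->do_arg, buf, len));
	return (read(sp->ds_fd, buf, len));
}

/* Hypothetical memory backing store and its read operation. */
typedef struct demo_mem {
	const char	*dm_data;
	size_t		dm_len;
	size_t		dm_off;
} demo_mem_t;

ssize_t
demo_mem_read(void *arg, void *buf, size_t len)
{
	demo_mem_t *mp = arg;
	size_t left;

	assert(mp->dm_off <= mp->dm_len);
	left = mp->dm_len - mp->dm_off;
	if (len > left)
		len = left;
	(void) memcpy(buf, mp->dm_data + mp->dm_off, len);
	mp->dm_off += len;
	return ((ssize_t)len);
}
```

Because callers only ever go through the wrapper, the buffering code
does not need to know which kind of stream it is working with.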

When creating a non-file backed stream there are several expectations in
the system:

* The stream code should obtain a stream normally through a call to
  _findiop().
* If one needs to translate the normal fopen(3C) arguments, they should
  use the _stdio_flags() function. This will also construct the
  appropriate internal stdio flags for the stream.
* The stream code must call _xassoc() to set the file operations vector
  before returning a 'FILE *' out of libc.
* All of the operations vectors must be implemented.
* If the stream is seekable, it must explicitly use the SET_SEEKABLE()
  macro before returning the stream.
* If the stream is supposed to have a default orientation, it must set
  it by calling _setorientation(). Not all streams have a default
  orientation.
* In the stream's close entry point it should call _xunassoc().

--------------------------
Extended File and fileno()
--------------------------

The 32-bit libc has historically been limited to 255 open streams
because of the use of an unsigned char. This problem does not impact the
64-bit libc. To deal with this, libc uses a series of techniques which
are summarized for users in extendedFILE(7). The usage of extendedFILE
can also be enabled by passing the special 'F' character to fopen(3C).

The '_magic' member in the 32-bit 'struct __FILE_TAG' contains what used
to be the file descriptor. When extendedFILE is not in use, the _magic
member still contains the file descriptor. However, when extendedFILE
is enabled, the _magic member contains a sentinel value and the actual
value is stored in the 'struct xFILEdata' _magic member.

The act of getting the correct file descriptor has been centralized in a
function called _get_fd(). This function knows how to handle the special
32-bit case and the normal case. It also centralizes the logic of
checking for a non-file backed stream. There are many cases in libc
where we want to know the file descriptor to perform some operation;
however, non-file backed streams do not have a corresponding file
descriptor. When such a stream is detected, we will explicitly return
-1. This ensures that a bad file descriptor will be used if someone
mistakenly calls a system call. Functions like _fileno() call this
directly.
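
The lookup order described above can be sketched as follows. The types,
member names, and sentinel value are all hypothetical and chosen only
for illustration; they are not the definitions found in file64.h.

```c
#include <assert.h>
#include <stddef.h>

#define	DEMO_EXT_SENTINEL	0xff	/* hypothetical "look elsewhere" mark */

/* Hypothetical stand-in for the private, extended stream data. */
typedef struct demo_xdata {
	int	dx_fd;			/* real descriptor when extended */
} demo_xdata_t;

typedef struct demo_fstream {
	unsigned char	df_magic;	/* descriptor, or the sentinel */
	const void	*df_ops;	/* non-NULL for a non-file stream */
	demo_xdata_t	df_xdata;	/* private extension */
} demo_fstream_t;

/*
 * Return the descriptor behind a stream, or -1 when the stream is not
 * backed by a descriptor at all.
 */
int
demo_get_fd(const demo_fstream_t *fp)
{
	if (fp->df_ops != NULL)
		return (-1);
	if (fp->df_magic == DEMO_EXT_SENTINEL)
		return (fp->df_xdata.dx_fd);
	return (fp->df_magic);
}
```

Keeping this logic in one function means that callers cannot
accidentally interpret the sentinel as a real descriptor.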

-------
Testing
-------

There is a burgeoning test suite for stdio in
usr/src/test/libc-tests/tests/stdio. If working in stdio (or libc more
generally) it is recommended that you run this test suite and add new
tests to it where appropriate. For most new functionality it is
encouraged that you both import test suites that may already exist and
that you also write your own test suites to properly cover a number of
error and corner cases.

Tests should also be written against libumem(3LIB), and umem debugging
should be explicitly enabled in the program. Enabling umem debugging can
catch a number of common memory usage errors. It also makes it easier to
test for memory leaks by taking a core file and using the mdb
'::findleaks' dcmd. A good starting point is to place the following in
the program:

const char *
_umem_debug_init(void)
{
	return ("default,verbose");
}

const char *
_umem_logging_init(void)
{
	return ("fail,contents");
}

For the definition of these flags, see umem_debug(3MALLOC).

In addition, by leveraging umem debugging it becomes very easy to
simulate malloc failure when required. This can be enabled by calling
umem_setmtbf(1), which ensures that any subsequent memory requests
through malloc(), including those made indirectly by libc, will fail. To
restore the behavior after a test, one can simply call umem_setmtbf(0).