.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2016 Joyent, Inc.
.\"
.Dd January 31, 2016
.Dt BYTEORDER 5
.Os
.Sh NAME
.Nm byteorder ,
.Nm endian
.Nd byte order and endianness
.Sh DESCRIPTION
Integer values which occupy more than 1 byte in memory can be laid out
in different ways on different platforms. In particular, there is a
major split between those which place the least significant byte of an
integer at the lowest address, and those which place the most
significant byte there instead. As this difference relates to which
end of the integer is found in memory first, the term
.Em endian
is used to refer to a particular byte order.
.Pp
A platform is referred to as using a
.Em big-endian
byte order when it places the most significant byte at the lowest
address, and
.Em little-endian
when it places the least significant byte first. Some platforms may also
switch between big- and little-endian mode and run code compiled for
either.
.Pp
Historically, there have also been some systems that utilized
.Em middle-endian
byte orders for integers larger than 2 bytes. Such orderings are not in
common use today.
.Pp
Endianness is also of particular importance when dealing with values
that are being read into memory from an external source. For example,
network protocols such as IP conventionally define the fields in a
packet as being always stored in big-endian byte order. This means that
a little-endian machine will have to perform transformations on these
fields in order to process them.
.Ss Examples
To illustrate endianness in memory, let us consider the decimal integer
2864434397. This number fits in 32 bits of storage (4 bytes).
.Pp
On a big-endian system, this integer would be written into memory as
the bytes 0xAA, 0xBB, 0xCC, 0xDD, in order from lowest memory address to
highest.
.Pp
On a little-endian system, it would be written instead as the bytes
0xDD, 0xCC, 0xBB, 0xAA, in that order.
.Pp
If both the big- and little-endian systems were asked to store this
integer at address 0x100, we would see the following in each of their
memory:
.Bd -literal

                    Big-Endian

        ++------++------++------++------++
        || 0xAA || 0xBB || 0xCC || 0xDD ||
        ++------++------++------++------++
            ^^      ^^      ^^      ^^
          0x100   0x101   0x102   0x103
            vv      vv      vv      vv
        ++------++------++------++------++
        || 0xDD || 0xCC || 0xBB || 0xAA ||
        ++------++------++------++------++

                  Little-Endian
.Ed
.Pp
It is particularly important to note that even though the byte order is
different between these two machines, the bit ordering within each byte,
by convention, is still the same.
.Pp
For example, take the decimal integer 4660, which occupies in 16 bits (2
bytes).
.Pp
On a big-endian system, this would be written into memory as 0x12, then
0x34.
.Pp
On a little-endian system, it would be written as 0x34, then 0x12.  Note
that this is not at all the same as seeing 0x43 then 0x21 in memory --
only the bytes are re-ordered, not any bits (or nybbles) within them.
.Pp
As before, storing this at address 0x100:
.Bd -literal
                    Big-Endian

                ++------++------++
                || 0x12 || 0x34 ||
                ++------++------++
                    ^^      ^^
                  0x100   0x101
                    vv      vv
                ++------++------++
                || 0x34 || 0x12 ||
                ++------++------++

                   Little-Endian
.Ed
.Pp
This example shows how an eight byte number, 0xBADCAFEDEADBEEF is stored
in both big and little-endian:
.Bd -literal
                        Big-Endian

    +------+------+------+------+------+------+------+------+
    | 0xBA | 0xDC | 0xAF | 0xFE | 0xDE | 0xAD | 0xBE | 0xEF |
    +------+------+------+------+------+------+------+------+
       ^^     ^^     ^^     ^^     ^^     ^^     ^^     ^^
     0x100  0x101  0x102  0x103  0x104  0x105  0x106  0x107
       vv     vv     vv     vv     vv     vv     vv     vv
    +------+------+------+------+------+------+------+------+
    | 0xEF | 0xBE | 0xAD | 0xDE | 0xFE | 0xAF | 0xDC | 0xBA |
    +------+------+------+------+------+------+------+------+

                       Little-Endian

.Ed
.Pp
The treatment of different endian values would not be complete without
discussing
.Em PDP-endian ,
which is also known as
.Em middle-endian .
While the PDP-11 was a 16-bit little-endian system, it laid out 32-bit
values in a different way from current little-endian systems. First, it
would divide a 32-bit number into two 16-bit numbers. Each 16-bit number
would be stored in little-endian; however, the two 16-bit words would be
stored with the larger 16-bit word appearing first in memory, followed
by the latter.
.Pp
The following image illustrates PDP-endian and compares it against
little-endian values. Here, we'll start with the value 0xAABBCCDD and
show how the four bytes for it will be laid out, starting at 0x100.
.Bd -literal
                    PDP-Endian

        ++------++------++------++------++
        || 0xBB || 0xAA || 0xDD || 0xCC ||
        ++------++------++------++------++
            ^^      ^^      ^^      ^^
          0x100   0x101   0x102   0x103
            vv      vv      vv      vv
        ++------++------++------++------++
        || 0xDD || 0xCC || 0xBB || 0xAA ||
        ++------++------++------++------++

                  Little-Endian

.Ed
.Ss Network Byte Order
The term 'network byte order' refers to big-endian ordering, and
originates from the IEEE. Early disagreements over which byte ordering
to use for network traffic prompted RFC1700 to define that all
IETF-specified network protocols use big-endian ordering unless noted
explicitly otherwise. The Internet protocol family (IP, and thus TCP and
UDP etc) particularly adhere to this convention.
.Ss Determining the System's Byte Order
The operating system supports both big-endian and little-endian CPUs. To
make it easier for programs to determine the endianness of the
platform they are being compiled for, functions and macro constants are
provided in the system header files.
.Pp
The endianness of the system can be obtained by including the header
.In sys/types.h
and using the pre-processor macros
.Sy _LITTLE_ENDIAN
and
.Sy _BIG_ENDIAN .
See
.Xr types.h 3HEAD
for more information.
.Pp
Additionally, the header
.In endian.h
defines an alternative means for determining the endianness of the
current system. See
.Xr endian.h 3HEAD
for more information.
.Pp
illumos runs on both big- and little-endian systems. When writing
software for which the endianness is important, one must always check
the byte order and convert it appropriately.
.Ss Converting Between Byte Orders
The system provides two different sets of functions to convert values
between big-endian and little-endian. They are defined in
.Xr byteorder 3C
and
.Xr endian 3C .
.Pp
The
.Xr byteorder 3SOCKET
family of functions convert data between the host's native byte order
and big- or little-endian.
The functions operate on either 16-bit, 32-bit, or 64-bit values.
Functions that convert from network byte order to the host's byte order
start with the string
.Sy ntoh ,
while functions which convert from the host's byte order to network byte
order, begin with
.Sy hton .
For example, to convert a 32-bit value, a long, from network byte order
to the host's, one would use the function
.Xr ntohl 3SOCKET .
.Pp
These functions have been standardized by POSIX. However, the 64-bit variants,
.Xr ntohll 3SOCKET
and
.Xr htonll 3SOCKET
are not standardized and may not be found on other systems. For more
information on these functions, see
.Xr byteorder 3SOCKET .
.Pp
The second family of functions,
.Xr endian 3C ,
provide a means to convert between the host's byte order
and big-endian and little-endian specifically. While these functions are
similar to those in
.Xr byteorder 3C ,
they more explicitly cover different data conversions. Like them, these
functions operate on either 16-bit, 32-bit, or 64-bit values. When
converting from big-endian, to the host's endianness, the functions
begin with
.Sy betoh .
If instead, one is converting data from the host's native endianness to
another, then it starts with
.Sy htobe .
When working with little-endian data, the prefixes
.Sy letoh
and
.Sy htole
convert little-endian data to the host's endianness and from the host's
to little-endian respectively.
.Pp
These functions
are not standardized and the header they appear in varies between the
BSDs and GNU/Linux. Applications that wish to be portable, should
instead use the
.Xr byteorder 3C
functions.
.Pp
All of these functions in both families simply return their input when
the host's native byte order is the same as the desired order. For
example, when calling
.Xr htonl 3SOCKET
on a big-endian system the original data is returned with no conversion
or modification.
.Sh SEE ALSO
.Xr endian 3C ,
.Xr endian.h 3HEAD ,
.Xr inet 3HEAD ,
.Xr byteorder 3SOCKET