
.. index:: Field Formatting
.. _field-formatting:

Field Formatting
----------------

The field format is similar to the format string for printf(3). Its
use varies based on the role of the field, but generally is used to
format the field's contents.

If the format string is not provided for a value field, it defaults to
"%s".

Note a field definition can contain zero or more printf-style
'directives', which are sequences that start with a '%' and end with
one of the following characters: "diouxXDOUeEfFgGaAcCsSp". Each
directive is matched by one or more arguments to the xo_emit function.

The format string has the form::

  '%' format-modifier * format-character

The format-modifier can be:

- a '#' character, indicating the output value should be prefixed
  with '0x', typically to indicate a base 16 (hex) value.
- a minus sign ('-'), indicating the output value should be padded on
  the right instead of the left.
- a leading zero ('0') indicating the output value should be padded on the
  left with zeroes instead of spaces (' ').
- one or more digits ('0' - '9') indicating the minimum width of the
  argument. If the width in columns of the output value is less than
  the minimum width, the value will be padded to reach the minimum.
- a period followed by one or more digits indicating the maximum
  number of bytes which will be examined for a string argument, or the
  maximum width for a non-string argument. For ASCII strings this
  also acts as the maximum field width, but with multi-byte characters
  a single character may be composed of multiple bytes. xo_emit will
  never dereference memory beyond the given number of bytes.
- a second period followed by one or more digits indicating the maximum
  width for a string argument. This modifier cannot be given for
  non-string arguments.
- one or more 'h' characters, indicating shorter input data.
- one or more 'l' characters, indicating longer input data.
- a 'z' character, indicating a 'size_t' argument.
- a 't' character, indicating a 'ptrdiff_t' argument.
- a ' ' character, indicating a space should be emitted before
  positive numbers.
- a '+' character, indicating a sign should be emitted before any
  number.

Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
removed eventually.

The format character is described in the following table:

===== ================= ======================
 Ltr   Argument Type     Format
===== ================= ======================
 d     int               base 10 (decimal)
 i     int               base 10 (decimal)
 o     int               base 8 (octal)
 u     unsigned          base 10 (decimal)
 x     unsigned          base 16 (hex)
 X     unsigned long     base 16 (hex)
 D     long              base 10 (decimal)
 O     unsigned long     base 8 (octal)
 U     unsigned long     base 10 (decimal)
 e     double            [-]d.ddde+-dd
 E     double            [-]d.dddE+-dd
 f     double            [-]ddd.ddd
 F     double            [-]ddd.ddd
 g     double            as 'e' or 'f'
 G     double            as 'E' or 'F'
 a     double            [-]0xh.hhhp[+-]d
 A     double            [-]0Xh.hhhp[+-]d
 c     unsigned char     a character
 C     wint_t            a character
 s     char \*           a UTF-8 string
 S     wchar_t \*        a unicode/WCS string
 p     void \*           '%#lx'
===== ================= ======================

The 'h' and 'l' modifiers affect the size and treatment of the
argument:

===== ============= ====================
 Mod   d, i          o, u, x, X
===== ============= ====================
 hh    signed char   unsigned char
 h     short         unsigned short
 l     long          unsigned long
 ll    long long     unsigned long long
 j     intmax_t      uintmax_t
 t     ptrdiff_t     ptrdiff_t
 z     size_t        size_t
 q     quad_t        u_quad_t
===== ============= ====================

.. index:: UTF-8
.. index:: Locale
.. _utf-8:

UTF-8 and Locale Strings
~~~~~~~~~~~~~~~~~~~~~~~~

For strings, the 'h' and 'l' modifiers affect the interpretation of
the bytes pointed to by the argument. The default '%s' string is a
'char \*' pointer to a string encoded as UTF-8. Since UTF-8 is
compatible with ASCII data, a normal 7-bit ASCII string can be used.
'%ls' expects a 'wchar_t \*' pointer to a wide-character string,
encoded as 32-bit Unicode values.
'%hs' expects a 'char \*' pointer to a multi-byte string encoded with
the current locale, as given by the LC_CTYPE, LANG, or LC_ALL
environment variables. The first of this list of variables is used
and if none of the variables are set, the locale defaults to "UTF-8".

libxo will convert these arguments as needed to either UTF-8 (for XML,
JSON, and HTML styles) or locale-based strings for display in text
style::

   xo_emit("All strings are utf-8 content {:tag/%ls}",
           L"except for wide strings");

======== ================== =================================================
 Format   Argument Type      Argument Contents
======== ================== =================================================
 %s       const char \*      UTF-8 string
 %S       const wchar_t \*   Wide character UNICODE string (alias for '%ls')
 %ls      const wchar_t \*   Wide character UNICODE string
 %hs      const char \*      locale-based string
======== ================== =================================================

.. admonition:: "Long", not "locale"

  The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
  It is not "*locale*", a common mis-mnemonic. "%S" is equivalent to
  "%ls".

For example, the following function is passed a locale-based name, a
hat size, and a time value. The hat size is formatted into a UTF-8
(ASCII) string, and the time value is formatted into a wchar_t
string::

    #include <stdio.h>
    #include <time.h>
    #include <wchar.h>
    #include <libxo/xo.h>

    void print_order (const char *name, int size,
                      struct tm *timep) {
        char buf[32];
        const char *size_val = "unknown";

        if (size > 0) {
            snprintf(buf, sizeof(buf), "%d", size);
            size_val = buf;
        }

        wchar_t when[32];
        /* wcsftime takes a count of wide characters, not bytes */
        wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);

        xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
                name, size_val);
        xo_emit("It was ordered on {:order-time/%ls}.\n",
                when);
    }

Note that xo_emit performs whatever conversion is required to produce
appropriate output. Text style output uses the current locale (as
described above), while XML, JSON, and HTML use UTF-8.

UTF-8 and locale-encoded strings can use multiple bytes to encode one
column of data.
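
The gap between bytes and columns can be seen with a small helper. The
following is a minimal sketch, not part of libxo: ``utf8_codepoints`` is
a hypothetical function that counts code points in a UTF-8 string by
skipping continuation bytes. It assumes valid UTF-8 input, and it only
approximates column width, since wide and combining characters do not
occupy exactly one column each.

```c
#include <stddef.h>

/*
 * Count the Unicode code points in a NUL-terminated UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte
 * that does not match that pattern starts a new code point.
 * Hypothetical helper for illustration; it is not part of libxo.
 */
size_t
utf8_codepoints (const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80)
            count++;
    }

    return count;
}
```

For the string ``"caf\xc3\xa9"`` ("café" encoded as UTF-8), strlen()
reports 5 bytes while this helper reports 4 code points, so a byte
limit and a column limit on the same string are different quantities.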
The traditional "precision" (aka "max-width") value for "%s" printf
formatting becomes overloaded since it specifies both the number of
bytes that can be safely referenced and the maximum number of columns
to emit. xo_emit uses the precision as the former, and adds a third
value for specifying the maximum number of columns.

In this example, the name field is printed with a minimum of 3 columns
and a maximum of 6. Up to ten bytes of data at the location given by
'name' are used in filling those columns::

   xo_emit("{:name/%3.10.6s}", name);

Characters Outside of Field Definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Characters in the format string that are not part of a field definition
are copied to the output for the TEXT style, and are ignored for the
JSON and XML styles. For HTML, these characters are placed in a