.. index:: Field Formatting
.. _field-formatting:

Field Formatting
----------------

The field format is similar to the format string for printf(3). Its
use varies based on the role of the field, but generally is used to
format the field's contents.

If the format string is not provided for a value field, it defaults to
"%s".

Note a field definition can contain zero or more printf-style
'directives', which are sequences that start with a '%' and end with
one of the following characters: "diouxXDOUeEfFgGaAcCsSp". Each directive
is matched by one or more arguments to the xo_emit function.

The format string has the form::

    '%' format-modifier * format-character

The format-modifier can be:

- a '#' character, indicating the output value should be prefixed
  with '0x', typically to indicate a base 16 (hex) value.
- a minus sign ('-'), indicating the output value should be padded on
  the right instead of the left.
- a leading zero ('0') indicating the output value should be padded on the
  left with zeroes instead of spaces (' ').
- one or more digits ('0' - '9') indicating the minimum width of the
  argument. If the width in columns of the output value is less than
  the minimum width, the value will be padded to reach the minimum.
- a period followed by one or more digits indicating the maximum
  number of bytes which will be examined for a string argument, or the maximum
  width for a non-string argument. For ASCII strings this also acts as
  the maximum field width, but with multi-byte characters a single
  character may be composed of multiple bytes.
  xo_emit will never dereference memory beyond the given number of bytes.
- a second period followed by one or more digits indicating the maximum
  width for a string argument. This modifier cannot be given for non-string
  arguments.
- one or more 'h' characters, indicating shorter input data.
- one or more 'l' characters, indicating longer input data.
- a 'z' character, indicating a 'size_t' argument.
- a 't' character, indicating a 'ptrdiff_t' argument.
- a ' ' character, indicating a space should be emitted before
  positive numbers.
- a '+' character, indicating a sign should be emitted before any number.

Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
removed eventually.

The format character is described in the following table:

  ===== ================= ======================
  Ltr   Argument Type     Format
  ===== ================= ======================
  d     int               base 10 (decimal)
  i     int               base 10 (decimal)
  o     unsigned          base 8 (octal)
  u     unsigned          base 10 (decimal)
  x     unsigned          base 16 (hex)
  X     unsigned          base 16 (hex)
  D     long              base 10 (decimal)
  O     unsigned long     base 8 (octal)
  U     unsigned long     base 10 (decimal)
  e     double            [-]d.ddde+-dd
  E     double            [-]d.dddE+-dd
  f     double            [-]ddd.ddd
  F     double            [-]ddd.ddd
  g     double            as 'e' or 'f'
  G     double            as 'E' or 'F'
  a     double            [-]0xh.hhhp[+-]d
  A     double            [-]0Xh.hhhp[+-]d
  c     unsigned char     a character
  C     wint_t            a character
  s     char \*           a UTF-8 string
  S     wchar_t \*        a unicode/WCS string
  p     void \*           '%#lx'
  ===== ================= ======================

The 'h' and 'l' modifiers affect the size and treatment of the
argument:

  ===== ============= ====================
  Mod   d, i          o, u, x, X
  ===== ============= ====================
  hh    signed char   unsigned char
  h     short         unsigned short
  l     long          unsigned long
  ll    long long     unsigned long long
  j     intmax_t      uintmax_t
  t     ptrdiff_t     ptrdiff_t
  z     size_t        size_t
  q     quad_t        u_quad_t
  ===== ============= ====================

.. index:: UTF-8
.. index:: Locale

.. _utf-8:

UTF-8 and Locale Strings
~~~~~~~~~~~~~~~~~~~~~~~~

For strings, the 'h' and 'l' modifiers affect the interpretation of
the bytes pointed to by the argument.
The default '%s' string is a 'char \*'
pointer to a string encoded as UTF-8. Since UTF-8 is compatible with
ASCII data, a normal 7-bit ASCII string can be used. '%ls' expects a
'wchar_t \*' pointer to a wide-character string, encoded as 32-bit
Unicode values. '%hs' expects a 'char \*' pointer to a multi-byte
string encoded with the current locale, as given by the LC_CTYPE,
LANG, or LC_ALL environment variables. The first variable in this
list that is set is used; if none of the variables are set, the locale
defaults to "UTF-8".

libxo will convert these arguments as needed to either UTF-8 (for XML,
JSON, and HTML styles) or locale-based strings for display in text
style::

    xo_emit("All strings are utf-8 content {:tag/%ls}",
            L"except for wide strings");

======== ================== ================================================
Format   Argument Type      Argument Contents
======== ================== ================================================
%s       const char \*      UTF-8 string
%S       const wchar_t \*   Wide character UNICODE string (alias for '%ls')
%ls      const wchar_t \*   Wide character UNICODE string
%hs      const char \*      locale-based string
======== ================== ================================================

.. admonition:: "Long", not "locale"

  The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
  It is not "*locale*", a common mis-mnemonic. "%S" is equivalent to
  "%ls".

For example, the following function is passed a locale-based name, a
hat size, and a time value.
The hat size is formatted in a UTF-8
(ASCII) string, and the time value is formatted into a wchar_t
string::

    void print_order (const char *name, int size,
                      struct tm *timep) {
        char buf[32];
        const char *size_val = "unknown";

        if (size > 0) {
            snprintf(buf, sizeof(buf), "%d", size);
            size_val = buf;
        }

        wchar_t when[32];
        /* wcsftime takes a count of wide characters, not bytes */
        wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);

        xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
                name, size_val);
        xo_emit("It was ordered on {:order-time/%ls}.\n",
                when);
    }

It is important to note that xo_emit will perform the conversion
required to make appropriate output. Text style output uses the
current locale (as described above), while XML, JSON, and HTML use
UTF-8.

UTF-8 and locale-encoded strings can use multiple bytes to encode one
column of data. The traditional "precision" (aka "max-width") value
for "%s" printf formatting becomes overloaded since it specifies both
the number of bytes that can be safely referenced and the maximum
number of columns to emit. xo_emit uses the precision as the former,
and adds a third value for specifying the maximum number of columns.

In this example, the name field is printed with a minimum of 3 columns
and a maximum of 6. Up to ten bytes of data at the location given by
'name' are used in filling those columns::

    xo_emit("{:name/%3.10.6s}", name);

Characters Outside of Field Definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Characters in the format string that are not part of a field
definition are copied to the output for the TEXT style, and are
ignored for the JSON and XML styles. For HTML, these characters are
placed in a <div> with class "text"::

    EXAMPLE:
        xo_emit("The hat is {:size/%s}.\n", size_val);
    TEXT:
        The hat is extra small.
    XML:
        <size>extra small</size>
    JSON:
        "size": "extra small"
    HTML:
        <div class="text">The hat is </div>
        <div class="data" data-tag="size">extra small</div>
        <div class="text">.</div>

.. index:: errno

"%m" Is Supported
~~~~~~~~~~~~~~~~~

libxo supports the '%m' directive, which formats the error message
associated with the current value of "errno". It is the equivalent
of "%s" with the argument strerror(errno)::

    xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
    xo_emit("{:filename} cannot be opened: {:error/%s}",
            filename, strerror(errno));

"%n" Is Not Supported
~~~~~~~~~~~~~~~~~~~~~

libxo does not support the '%n' directive. It's a bad idea and we
just don't do it.

The Encoding Format (eformat)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The "eformat" string is the format string used when encoding the field
for JSON and XML. If not provided, it defaults to the primary format
with any minimum width removed. If the primary is not given, both
default to "%s".

Content Strings
~~~~~~~~~~~~~~~

For padding and labels, the content string is considered the content,
unless a format is given.

.. index:: printf-like

Argument Validation
~~~~~~~~~~~~~~~~~~~

Many compilers and tool chains support validation of printf-like
arguments. When the format string fails to match the argument list,
a warning is generated. This is a valuable feature and while the
formatting strings for libxo differ considerably from printf, many of
these checks can still provide build-time protection against bugs.

libxo provides variants of its functions that enable this checking, if the
"--enable-printflike" option is passed to the "configure" script.
These functions use the "_p" suffix, like "xo_emit_p()",
"xo_emit_hp()", etc.
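
The build-time checking described above typically relies on a compiler
attribute. As a minimal sketch, assuming GCC or Clang, the hypothetical
wrapper ``emit_checked()`` below (not a libxo function, and not how libxo
itself is declared) carries the ``format(printf, ...)`` attribute, so the
compiler can compare each directive against the argument list when format
warnings are enabled:

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Assumption: GCC/Clang attribute syntax. Other compilers get an
 * empty macro and therefore no format checking.
 */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PRINTFLIKE(fmt_idx, arg_idx) \
    __attribute__((__format__(__printf__, fmt_idx, arg_idx)))
#else
#define SKETCH_PRINTFLIKE(fmt_idx, arg_idx)
#endif

/* Buffer that receives the formatted text (illustration only). */
static char emit_buf[128];

/* Hypothetical printf-like wrapper; argument 1 is the format string
 * and the variadic arguments start at position 2. */
static int emit_checked(const char *fmt, ...) SKETCH_PRINTFLIKE(1, 2);

static int
emit_checked(const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = vsnprintf(emit_buf, sizeof(emit_buf), fmt, ap);
    va_end(ap);

    return rc;    /* number of characters the full output needs */
}
```

Calling ``emit_checked("%s has %d items", "cart", 3)`` fills the buffer
as plain printf would. Swapping the two arguments would normally draw a
format warning from GCC or Clang at build time, which is the kind of
protection the "_p" variants are meant to provide.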

The following are features of libxo formatting strings that are
incompatible with printf-like testing:

- implicit formats, where "{:tag}" has an implicit "%s";
- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to
  ten bytes of data can be inspected to fill a minimum of 4 columns and
  a maximum of 6;
- percent signs in strings, where "{:filled}%" makes a single,
  trailing percent sign;
- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means
  locale-based string and "{:tag/%ls}" means a wide character string;
- distinct encoding formats, where "{:tag/#%s/%s}" means the display
  styles (text and HTML) will use "#%s" where other styles use "%s".

If none of these features are in use by your code, then using the "_p"
variants might be wise:

  ================== ========================
  Function           printf-like Equivalent
  ================== ========================
  xo_emit_hv         xo_emit_hvp
  xo_emit_h          xo_emit_hp
  xo_emit            xo_emit_p
  xo_emit_warn_hcv   xo_emit_warn_hcvp
  xo_emit_warn_hc    xo_emit_warn_hcp
  xo_emit_warn_c     xo_emit_warn_cp
  xo_emit_warn       xo_emit_warn_p
  xo_emit_warnx      xo_emit_warnx_p
  xo_emit_err        xo_emit_err_p
  xo_emit_errx       xo_emit_errx_p
  xo_emit_errc       xo_emit_errc_p
  ================== ========================

.. index:: performance
.. index:: XOEF_RETAIN

.. _retain:

Retaining Parsed Format Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

libxo can retain the parsed internal information related to the given
format string, so that on subsequent xo_emit calls the retained
information is used, avoiding repetitive parsing of the format string::

    SYNTAX:
      int xo_emit_f(xo_emit_flags_t flags, const char *fmt, ...);
    EXAMPLE:
      xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
                some, thing, fancy);

To retain parsed format information, use the XOEF_RETAIN flag to the
xo_emit_f() function. A complete set of xo_emit_f functions exists to
match all the xo_emit function signatures (with handles, variadic
arguments, and printf-like flags):

  ================== ========================
  Function           Flags Equivalent
  ================== ========================
  xo_emit_hv         xo_emit_hvf
  xo_emit_h          xo_emit_hf
  xo_emit            xo_emit_f
  xo_emit_hvp        xo_emit_hvfp
  xo_emit_hp         xo_emit_hfp
  xo_emit_p          xo_emit_fp
  ================== ========================

The format string must be immutable across multiple calls to xo_emit_f(),
since the library retains the string. Typically this is done by using
static constant strings, such as string literals. If the string is not
immutable, the XOEF_RETAIN flag must not be used.

The functions xo_retain_clear() and xo_retain_clear_all() release
internal information on either a single format string or all format
strings, respectively. Neither is required, but the library will
retain this information until it is cleared or the process exits::

    const char *fmt = "{:name} {:count/%d}\n";
    for (i = 0; i < 1000; i++) {
        xo_open_instance("item");
        xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
        xo_close_instance("item");
    }
    xo_retain_clear(fmt);

The retained information is kept as thread-specific data.
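
One way to picture the retain mechanism is as a small cache keyed on the
address of the format string. The toy sketch below is not libxo's
implementation: ``toy_parse()``, ``toy_lookup()``, and ``toy_clear()`` are
hypothetical names, and keying on the pointer rather than on the string
contents is an assumption made for illustration. It shows why the string
must stay unchanged while its parse result is retained, since a lookup by
address returns the earlier result without re-reading the text:

```c
#include <stddef.h>

/* Toy model of one retained entry (illustration only). */
typedef struct toy_entry {
    const char *te_key;     /* address of the caller's format string */
    size_t te_fields;       /* stand-in for "parsed" data: field count */
} toy_entry_t;

#define TOY_CACHE_SIZE 8

static toy_entry_t toy_cache[TOY_CACHE_SIZE];
static size_t toy_used;         /* entries currently retained */
static size_t toy_parse_calls;  /* how often the expensive step ran */

/* Stand-in for real parsing: count the '{' characters. */
static size_t
toy_parse(const char *fmt)
{
    size_t n = 0;

    toy_parse_calls++;
    for (; *fmt != '\0'; fmt++)
        if (*fmt == '{')
            n++;

    return n;
}

/* Return the retained result for this address, parsing only on a miss. */
static size_t
toy_lookup(const char *fmt)
{
    size_t i, n;

    for (i = 0; i < toy_used; i++)
        if (toy_cache[i].te_key == fmt)  /* pointer compare, not strcmp */
            return toy_cache[i].te_fields;

    n = toy_parse(fmt);
    if (toy_used < TOY_CACHE_SIZE) {
        toy_cache[toy_used].te_key = fmt;
        toy_cache[toy_used].te_fields = n;
        toy_used++;
    }

    return n;
}

/* Forget the entry for one format string, if it is present. */
static void
toy_clear(const char *fmt)
{
    size_t i;

    for (i = 0; i < toy_used; i++) {
        if (toy_cache[i].te_key == fmt) {
            toy_cache[i] = toy_cache[--toy_used];
            return;
        }
    }
}
```

In this model, a loop that calls ``toy_lookup()`` a thousand times with the
same static string runs ``toy_parse()`` once, and ``toy_clear()`` forces
the next call to parse again. If the caller rewrote the buffer in between,
the stale field count would still be returned, which is the hazard the
immutability requirement guards against.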

Example
~~~~~~~

In this example, the value for the number of items in stock is emitted::

    xo_emit("{P: }{Lwc:In stock}{:in-stock/%u}\n",
            instock);

This call will generate the following output::

    TEXT:
        In stock: 144
    XML:
        <in-stock>144</in-stock>
    JSON:
        "in-stock": 144,
    HTML:
        <div class="line">
          <div class="padding"> </div>
          <div class="label">In stock</div>
          <div class="decoration">:</div>
          <div class="padding"> </div>
          <div class="data" data-tag="in-stock">144</div>
        </div>

Clearly HTML wins the verbosity award, and this output does
not include XOF_XPATH or XOF_INFO data, which would expand the
penultimate line to::

    <div class="data" data-tag="in-stock"
         data-xpath="/top/data/item/in-stock"
         data-type="number"
         data-help="Number of items in stock">144</div>