.. index:: Field Formatting

Field Formatting
----------------

The field format is similar to the format string for printf(3).  Its
use varies based on the role of the field, but generally is used to
format the field's contents.

If the format string is not provided for a value field, it defaults to
"%s".

Note that a field definition can contain zero or more printf-style
'directives', which are sequences that start with a '%' and end with
one of the following characters: "diouxXDOUeEfFgGaAcCsSp".  Each
directive is matched by one or more arguments to the xo_emit function.

The format string has the form::

    '%' format-modifier * format-character

The format-modifier can be:

- a '#' character, indicating the output value should be prefixed
  with '0x', typically to indicate a base 16 (hex) value.
- a minus sign ('-'), indicating the output value should be padded on
  the right instead of the left.
- a leading zero ('0') indicating the output value should be padded on the
  left with zeroes instead of spaces (' ').
- one or more digits ('0' - '9') indicating the minimum width of the
  argument.  If the width in columns of the output value is less than
  the minimum width, the value will be padded to reach the minimum.
- a period followed by one or more digits indicating the maximum
  number of bytes which will be examined for a string argument, or the
  maximum width for a non-string argument.  For ASCII strings this is
  the same as the field width, but in multi-byte encodings a single
  character may be composed of multiple bytes.
  xo_emit will never dereference memory beyond the given number of bytes.
- a second period followed by one or more digits indicating the maximum
  width for a string argument.  This modifier cannot be given for non-string
  arguments.
- one or more 'h' characters, indicating shorter input data.
- one or more 'l' characters, indicating longer input data.
- a 'z' character, indicating a 'size_t' argument.
- a 't' character, indicating a 'ptrdiff_t' argument.
- a ' ' character, indicating a space should be emitted before
  positive numbers.
- a '+' character, indicating a sign should be emitted before any number.

Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
removed eventually.

The format character is described in the following table:

    ===== ================= ======================
    Ltr   Argument Type     Format
    ===== ================= ======================
    d     int               base 10 (decimal)
    i     int               base 10 (decimal)
    o     int               base 8 (octal)
    u     unsigned          base 10 (decimal)
    x     unsigned          base 16 (hex)
    X     unsigned          base 16 (hex)
    D     long              base 10 (decimal)
    O     unsigned long     base 8 (octal)
    U     unsigned long     base 10 (decimal)
    e     double            [-]d.ddde+-dd
    E     double            [-]d.dddE+-dd
    f     double            [-]ddd.ddd
    F     double            [-]ddd.ddd
    g     double            as 'e' or 'f'
    G     double            as 'E' or 'F'
    a     double            [-]0xh.hhhp[+-]d
    A     double            [-]0Xh.hhhp[+-]d
    c     unsigned char     a character
    C     wint_t            a character
    s     char \*           a UTF-8 string
    S     wchar_t \*        a unicode/WCS string
    p     void \*           '%#lx'
    ===== ================= ======================

The 'h' and 'l' modifiers affect the size and treatment of the
argument:

    ===== ============= ====================
    Mod   d, i          o, u, x, X
    ===== ============= ====================
    hh    signed char   unsigned char
    h     short         unsigned short
    l     long          unsigned long
    ll    long long     unsigned long long
    j     intmax_t      uintmax_t
    t     ptrdiff_t     ptrdiff_t
    z     size_t        size_t
    q     quad_t        u_quad_t
    ===== ============= ====================

.. index:: UTF-8
.. index:: Locale

.. _utf-8:

UTF-8 and Locale Strings
~~~~~~~~~~~~~~~~~~~~~~~~

For strings, the 'h' and 'l' modifiers affect the interpretation of
the bytes pointed to by the argument.  The default '%s' string is a
'char \*' pointer to a string encoded as UTF-8.
Since UTF-8 is compatible with
ASCII data, a normal 7-bit ASCII string can be used.  '%ls' expects a
'wchar_t \*' pointer to a wide-character string, encoded as 32-bit
Unicode values.  '%hs' expects a 'char \*' pointer to a multi-byte
string encoded with the current locale, as given by the LC_CTYPE,
LANG, or LC_ALL environment variables.  The first of this list of
variables is used and if none of the variables are set, the locale
defaults to "UTF-8".

libxo will convert these arguments as needed to either UTF-8 (for XML,
JSON, and HTML styles) or locale-based strings for display in text
style::

    xo_emit("All strings are utf-8 content {:tag/%ls}",
            L"except for wide strings");

The string formats are summarized in the following table:

    ======== ================== ===============================================
    Format   Argument Type      Argument Contents
    ======== ================== ===============================================
    %s       const char \*      UTF-8 string
    %S       const wchar_t \*   Wide character UNICODE string (alias for '%ls')
    %ls      const wchar_t \*   Wide character UNICODE string
    %hs      const char \*      locale-based string
    ======== ================== ===============================================

.. admonition:: "Long", not "locale"

   The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
   It is not "*locale*", a common mis-mnemonic.  "%S" is equivalent to
   "%ls".

For example, the following function is passed a locale-based name, a
hat size, and a time value.
The hat size is formatted in a UTF-8
(ASCII) string, and the time value is formatted into a wchar_t
string::

    void print_order (const char *name, int size,
                      struct tm *timep) {
        char buf[32];
        const char *size_val = "unknown";

        if (size > 0) {
            snprintf(buf, sizeof(buf), "%d", size);
            size_val = buf;
        }

        wchar_t when[32];
        /* wcsftime takes the buffer size in wide characters, not bytes */
        wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);

        xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
                name, size_val);
        xo_emit("It was ordered on {:order-time/%ls}.\n",
                when);
    }

It is important to note that xo_emit will perform the conversion
required to make appropriate output.  Text style output uses the
current locale (as described above), while XML, JSON, and HTML use
UTF-8.

UTF-8 and locale-encoded strings can use multiple bytes to encode one
column of data.  The traditional "precision" (aka "max-width") value
for "%s" printf formatting becomes overloaded since it specifies both
the number of bytes that can be safely referenced and the maximum
number of columns to emit.  xo_emit uses the precision as the former,
and adds a third value for specifying the maximum number of columns.

In this example, the name field is printed with a minimum of 3 columns
and a maximum of 6.  Up to ten bytes of data at the location given by
'name' are used in filling those columns::

    xo_emit("{:name/%3.10.6s}", name);

Characters Outside of Field Definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Characters in the format string that are not part of a field
definition are copied to the output for the TEXT style, and are
ignored for the JSON and XML styles.  For HTML, these characters are
placed in a <div> with class "text"::

    EXAMPLE:
        xo_emit("The hat is {:size/%s}.\n", size_val);
    TEXT:
        The hat is extra small.
    XML:
        <size>extra small</size>
    JSON:
        "size": "extra small"
    HTML:
        <div class="text">The hat is </div>
        <div class="data" data-tag="size">extra small</div>
        <div class="text">.</div>

.. index:: errno

"%m" Is Supported
~~~~~~~~~~~~~~~~~

libxo supports the '%m' directive, which formats the error message
associated with the current value of "errno".  It is the equivalent
of "%s" with the argument strerror(errno)::

    xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
    xo_emit("{:filename} cannot be opened: {:error/%s}",
            filename, strerror(errno));

"%n" Is Not Supported
~~~~~~~~~~~~~~~~~~~~~

libxo does not support the '%n' directive.  It's a bad idea and we
just don't do it.

The Encoding Format (eformat)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The "eformat" string is the format string used when encoding the field
for JSON and XML.  If not provided, it defaults to the primary format
with any minimum width removed.  If the primary is not given, both
default to "%s".

Content Strings
~~~~~~~~~~~~~~~

For padding and labels, the content string is considered the content,
unless a format is given.

.. index:: printf-like

Argument Validation
~~~~~~~~~~~~~~~~~~~

Many compilers and tool chains support validation of printf-like
arguments.  When the format string fails to match the argument list,
a warning is generated.  This is a valuable feature and while the
formatting strings for libxo differ considerably from printf, many of
these checks can still provide build-time protection against bugs.

libxo provides variants of functions that offer this ability, if the
"--enable-printflike" option is passed to the "configure" script.
These functions use the "_p" suffix, like "xo_emit_p()",
"xo_emit_hp()", etc.
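
As a rough illustration of the kind of compile-time checking described
above, the following sketch declares a function with the GCC/Clang
"format" attribute.  The names "demo_emit_p" and "DEMO_PRINTFLIKE" are
hypothetical and are not part of libxo; how libxo itself declares its
"_p" variants is not shown here, so treat this only as an example of
the general technique.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper macro, not part of libxo.  On GCC and Clang it
 * marks argument fmt_idx as a printf-style format string whose
 * variadic arguments begin at argument arg_idx.  Other compilers see
 * an empty macro and perform no checking.
 */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_PRINTFLIKE(fmt_idx, arg_idx) \
    __attribute__((__format__(__printf__, fmt_idx, arg_idx)))
#else
#define DEMO_PRINTFLIKE(fmt_idx, arg_idx)
#endif

/* Hypothetical stand-in for a "_p" function: formats into buf. */
int demo_emit_p(char *buf, size_t len, const char *fmt, ...)
    DEMO_PRINTFLIKE(3, 4);

int
demo_emit_p(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = vsnprintf(buf, len, fmt, ap);  /* plain printf semantics only */
    va_end(ap);
    return rc;
}
```

With warnings such as "-Wformat" enabled, a call like
``demo_emit_p(buf, sizeof(buf), "%s", 42)`` should be flagged at build
time, while ``demo_emit_p(buf, sizeof(buf), "%d items", 42)`` compiles
cleanly.
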

The following are features of libxo formatting strings that are
incompatible with printf-like testing:

- implicit formats, where "{:tag}" has an implicit "%s";
- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to
  ten bytes of data can be inspected to fill a minimum of 4 columns and
  a maximum of 6;
- percent signs in strings, where "{:filled}%" makes a single,
  trailing percent sign;
- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means
  locale-based string and "{:tag/%ls}" means a wide character string;
- distinct encoding formats, where "{:tag/#%s/%s}" means the display
  styles (text and HTML) will use "#%s" where other styles use "%s".

If none of these features are in use by your code, then using the "_p"
variants might be wise:

    ================== ========================
    Function           printf-like Equivalent
    ================== ========================
    xo_emit_hv         xo_emit_hvp
    xo_emit_h          xo_emit_hp
    xo_emit            xo_emit_p
    xo_emit_warn_hcv   xo_emit_warn_hcvp
    xo_emit_warn_hc    xo_emit_warn_hcp
    xo_emit_warn_c     xo_emit_warn_cp
    xo_emit_warn       xo_emit_warn_p
    xo_emit_warnx      xo_emit_warnx_p
    xo_emit_err        xo_emit_err_p
    xo_emit_errx       xo_emit_errx_p
    xo_emit_errc       xo_emit_errc_p
    ================== ========================

.. index:: performance
.. index:: XOEF_RETAIN

.. _retain:

Retaining Parsed Format Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

libxo can retain the parsed internal information related to a given
format string.  On subsequent xo_emit calls, the retained
information is used, avoiding repetitive parsing of the format string::

    SYNTAX:
        int xo_emit_f(xo_emit_flags_t flags, const char *fmt, ...);
    EXAMPLE:
        xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
                  some, thing, fancy);

To retain parsed format information, use the XOEF_RETAIN flag to the
xo_emit_f() function.  A complete set of xo_emit_f functions exists to
match all the xo_emit function signatures (with handles, variadic
argument, and printf-like flags):

    ================== ========================
    Function           Flags Equivalent
    ================== ========================
    xo_emit_hv         xo_emit_hvf
    xo_emit_h          xo_emit_hf
    xo_emit            xo_emit_f
    xo_emit_hvp        xo_emit_hvfp
    xo_emit_hp         xo_emit_hfp
    xo_emit_p          xo_emit_fp
    ================== ========================

The format string must be immutable across multiple calls to xo_emit_f(),
since the library retains the string.  Typically this is done by using
static constant strings, such as string literals.  If the string is not
immutable, the XOEF_RETAIN flag must not be used.

The functions xo_retain_clear() and xo_retain_clear_all() release
internal information on either a single format string or all format
strings, respectively.  Neither is required, but the library will
retain this information until it is cleared or the process exits::

    const char *fmt = "{:name} {:count/%d}\n";
    for (i = 0; i < 1000; i++) {
        xo_open_instance("item");
        xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
        xo_close_instance("item");
    }
    xo_retain_clear(fmt);

The retained information is kept as thread-specific data.
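
To make the immutability requirement more concrete, here is a toy
sketch of a retain-style cache.  It assumes, purely for illustration,
that retained entries are looked up by the address of the format
string rather than by its contents; the text above does not say how
libxo keys its entries.  Every "demo\_" name is hypothetical, the
"parser" merely counts opening braces, and the cache is a plain global
rather than thread-specific data.

```c
#include <stddef.h>

#define DEMO_CACHE_SIZE 8

/* Hypothetical cache entry; not a libxo data structure. */
struct demo_retained {
    const char *fmt;    /* key: the pointer itself, not the contents */
    size_t n_fields;    /* stand-in for the parsed information */
};

static struct demo_retained demo_cache[DEMO_CACHE_SIZE];
static size_t demo_cache_used;
static size_t demo_parse_calls;     /* how many real parses happened */

/* Stand-in parser: counts '{' characters as field definitions. */
static size_t
demo_parse(const char *fmt)
{
    size_t n = 0;

    demo_parse_calls++;
    for (; *fmt != '\0'; fmt++)
        if (*fmt == '{')
            n++;
    return n;
}

/* Return the field count, reusing a retained result when one exists. */
size_t
demo_fields_retained(const char *fmt)
{
    size_t i;

    for (i = 0; i < demo_cache_used; i++)
        if (demo_cache[i].fmt == fmt)
            return demo_cache[i].n_fields;  /* hit: string not reparsed */

    size_t n = demo_parse(fmt);
    if (demo_cache_used < DEMO_CACHE_SIZE) {
        demo_cache[demo_cache_used].fmt = fmt;
        demo_cache[demo_cache_used].n_fields = n;
        demo_cache_used++;
    }
    return n;
}

/* Drop the entry for one format string, if it is present. */
void
demo_retain_clear(const char *fmt)
{
    size_t i;

    for (i = 0; i < demo_cache_used; i++) {
        if (demo_cache[i].fmt == fmt) {
            demo_cache[i] = demo_cache[--demo_cache_used];
            return;
        }
    }
}
```

Under that assumption, a cache hit never looks at the string again, so
editing the buffer behind a retained pointer would leave stale parsed
information in use.  That is the kind of hazard the immutability rule
guards against.
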

Example
~~~~~~~

In this example, the value for the number of items in stock is emitted::

    xo_emit("{P: }{Lwc:In stock}{:in-stock/%u}\n",
            instock);

This call will generate the following output::

    TEXT:
        In stock: 144
    XML:
        <in-stock>144</in-stock>
    JSON:
        "in-stock": 144,
    HTML:
        <div class="line">
          <div class="padding"> </div>
          <div class="label">In stock</div>
          <div class="decoration">:</div>
          <div class="padding"> </div>
          <div class="data" data-tag="in-stock">144</div>
        </div>

Clearly HTML wins the verbosity award, and this output does
not include XOF_XPATH or XOF_INFO data, which would expand the
penultimate line to::

    <div class="data" data-tag="in-stock"
         data-xpath="/top/data/item/in-stock"
         data-type="number"
         data-help="Number of items in stock">144</div>