.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2014 Joyent, Inc.
.\"
.Dd Sep 26, 2014
.Dt CTF 4
.Os
.Sh NAME
.Nm ctf
.Nd Compact C Type Format
.Sh SYNOPSIS
.In sys/ctf.h
.Sh DESCRIPTION
.Nm
is designed to be a compact representation of the C programming
language's type information focused on serving the needs of dynamic
tracing, debuggers, and other in-situ and post-mortem introspection
tools.
.Nm
data is generally included in
.Sy ELF
objects and is tagged as
.Sy SHT_PROGBITS
to ensure that the data is accessible in a running process and in subsequent
core dumps, if generated.
.Lp
The
.Nm
data contained in each file has information about the layout and
sizes of C types, including intrinsic types, enumerations, structures,
typedefs, and unions, that are used by the corresponding
.Sy ELF
object. The
.Nm
data may also include information about the types of global objects and
the return type and arguments of functions in the symbol table.
.Lp
Because a
.Nm
file is often embedded inside another file, rather than being a standalone
file itself, it may also be referred to as a
.Nm
.Sy container .
.Lp
On illumos systems,
.Nm
data is consumed by multiple programs. It can be used by the modular
debugger,
.Xr mdb 1 ,
as well as by
.Xr dtrace 1M .
Programmatic access to
.Nm
data can be obtained through
.Xr libctf 3LIB .
.Lp
The
.Nm
file format is broken down into seven different sections.
The first
section is the
.Sy preamble
and
.Sy header ,
which describes the version of the
.Nm
file, any links it has to other
.Nm
files, and the sizes of the other sections. The next section is the
.Sy label
section,
which provides a way of identifying similar groups of
.Nm
data across multiple files. This is followed by the
.Sy object
information section, which describes the type of global
symbols. The subsequent section is the
.Sy function
information section, which describes the return
types and arguments of functions. The next section is the
.Sy type
information section, which describes
the format and layout of the C types themselves, and finally the last
section is the
.Sy string
section, which contains the names of types, enumerations, members, and
labels.
.Lp
Strictly speaking, only the
.Sy preamble
and
.Sy header
are required; however, for the data to be useful, both the type and string
sections are necessary.
.Lp
A
.Nm
file may contain all of the type information that it requires, or it
may optionally refer to another
.Nm
file which holds the remaining types. When a
.Nm
file refers to another file, it is called the
.Sy child
and the file it refers to is called the
.Sy parent .
A given file may only refer to one parent. This process is called
.Em uniquification
because it ensures each child only has type information that is
unique to it. A common example of this is that most kernel modules in
illumos are uniquified against the kernel module
.Sy genunix
and the type information that comes from the
.Sy IP
module. This means that a module only has types that are unique to
itself and the most common types in the kernel are not duplicated.
.Sh FILE FORMAT
This documents version
.Em two
of the
.Nm
file format. All applications and tools currently produce and operate on
this version.
.Lp
The file format is summarized in the following diagram; the sections
that follow cover it in more detail.
.Bd -literal
         +-------------+  0t0
+--------|  Preamble   |
|        +-------------+  0t4
|+-------|   Header    |
||       +-------------+  0t36 + cth_lbloff
||+------|   Labels    |
|||      +-------------+  0t36 + cth_objtoff
|||+-----|   Objects   |
||||     +-------------+  0t36 + cth_funcoff
||||+----|  Functions  |
|||||    +-------------+  0t36 + cth_typeoff
|||||+---|    Types    |
||||||   +-------------+  0t36 + cth_stroff
||||||+--|   Strings   |
|||||||  +-------------+  0t36 + cth_stroff + cth_strlen
|||||||
|||||||
|||||||
|||||||    +-- magic -    vers   flags
|||||||    |          |    |      |
|||||||   +------+------+------+------+
+---------| 0xcf | 0xf1 | 0x02 | 0x00 |
 ||||||   +------+------+------+------+
 ||||||   0      1      2      3      4
 ||||||
 ||||||    + parent label        + objects
 ||||||    |       + parent name |     + functions     + strings
 ||||||    |       |     + label |     |       + types |       + strlen
 ||||||    |       |     |       |     |       |       |       |
 ||||||   +------+------+------+------+------+-------+-------+-------+
 +--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 |
  |||||   +------+------+------+------+------+-------+-------+-------+
  |||||   0x04   0x08   0x0c   0x10   0x14   0x18    0x1c    0x20    0x24
  |||||
  |||||        + Label name
  |||||        |       + Label type
  |||||        |       |      + Next label
  |||||        |       |      |
  |||||       +-------+------+-----+
  +-----------| 0x01  | 0x42 | ... |
   ||||       +-------+------+-----+
   ||||  cth_lbloff   +0x4   +0x8  cth_objtoff
   ||||
   ||||
   |||| Symidx  0t15   0t43   0t44
   ||||       +------+------+------+-----+
   +----------| 0x00 | 0x42 | 0x36 | ... |
    |||       +------+------+------+-----+
    ||| cth_objtoff  +0x2   +0x4   +0x6   cth_funcoff
    |||
    |||        + CTF_TYPE_INFO        + CTF_TYPE_INFO
    |||        |        + Return type |
    |||        |        |      + arg0 |
    |||       +--------+------+------+-----+
    +---------| 0x2c10 | 0x08 | 0x0c | ... |
     ||       +--------+------+------+-----+
     || cth_funcoff    +0x2   +0x4   +0x6  cth_typeoff
     ||
     ||        + ctf_stype_t for type 1
     ||        |  integer           + integer encoding
     ||        |                    |           + ctf_stype_t for type 2
     ||        |                    |           |
     ||       +--------------------+-----------+-----+
     +--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... |
      |       +--------------------+-----------+-----+
      | cth_typeoff                +0x08       +0x0c cth_stroff
      |
      |     +--- str 0
      |     |    +--- str 1       + str 2
      |     |    |                |
      |     v    v                v
      |   +----+---+---+---+----+---+---+---+---+---+----+
      +---| \\0 | i | n | t | \\0 | f | o | o | _ | t | \\0 |
          +----+---+---+---+----+---+---+---+---+---+----+
          0    1   2   3   4    5   6   7   8   9   10   11
.Ed
.Lp
Every
.Nm
file begins with a
.Sy preamble ,
followed by a
.Sy header .
The
.Sy preamble
is defined as follows:
.Bd -literal
typedef struct ctf_preamble {
	ushort_t ctp_magic;	/* magic number (CTF_MAGIC) */
	uchar_t ctp_version;	/* data format version number (CTF_VERSION) */
	uchar_t ctp_flags;	/* flags (see below) */
} ctf_preamble_t;
.Ed
.Pp
The
.Sy preamble
is four bytes long and must be four byte aligned.
This
.Sy preamble
defines the version of the
.Nm
file, which in turn determines the format of the rest of the header. While the
header may change in subsequent versions, the preamble will not change
across versions, though the interpretation of its flags may change from
version to version. The
.Em ctp_magic
member defines the magic number for the
.Nm
file format. This must always be
.Li 0xcff1 .
If another value is encountered, then the file should not be treated as
a
.Nm
file. The
.Em ctp_version
member defines the version of the
.Nm
file. The current version is
.Li 2 .
It is possible to encounter an unsupported version. In that case,
software should not try to parse the format, as it may have changed.
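.Lp
The magic number and version checks described above can be sketched as
follows. This is an illustration only, not the
.Xr libctf 3LIB
implementation. The helper name
.Sy ctf_check_preamble
and its return values are hypothetical, the structure uses fixed-width
stand-ins for the types shown above, and the data is assumed to be in the
byte order of the machine reading it.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	CTF_MAGIC	0xcff1	/* magic number given in the text */
#define	CTF_VERSION	2	/* the version documented here */

/* Fixed-width stand-in for the ctf_preamble_t shown above. */
typedef struct ctf_preamble {
	uint16_t ctp_magic;
	uint8_t ctp_version;
	uint8_t ctp_flags;
} ctf_preamble_t;

/*
 * Hypothetical helper. Returns 0 and fills in *out when buf starts with a
 * version 2 preamble, -1 when the buffer is too short or the magic number
 * is wrong, and -2 when the version is one this code does not understand.
 * Native byte order is assumed.
 */
static int
ctf_check_preamble(const void *buf, size_t len, ctf_preamble_t *out)
{
	ctf_preamble_t p;

	if (len < sizeof (p))
		return (-1);
	(void) memcpy(&p, buf, sizeof (p));
	if (p.ctp_magic != CTF_MAGIC)
		return (-1);
	if (p.ctp_version != CTF_VERSION)
		return (-2);
	*out = p;
	return (0);
}
```
.Ed
.Pp
A caller would stop at the first non-zero return and only go on to read
the rest of the header once the preamble has been accepted.
.Pp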
Finally, the
.Em ctp_flags
member describes aspects of the file which modify its interpretation.
The following flags are currently defined:
.Bd -literal
#define	CTF_F_COMPRESS		0x01
.Ed
.Pp
The flag
.Sy CTF_F_COMPRESS
indicates that the body of the
.Nm
file, all the data following the
.Sy header ,
has been compressed through the
.Sy zlib
library and its
.Sy deflate
algorithm. If this flag is not present, then the body has not been
compressed and no special action is needed to interpret it. All offsets
into the data, as described by the
.Sy header ,
always refer to the
.Sy uncompressed
data.
.Lp
In version two of the
.Nm
file format, the
.Sy header
denotes whether or not this
.Nm
file is the child of another
.Nm
file and also indicates the size of the remaining sections. The
structure for the
.Sy header
logically contains a copy of the
.Sy preamble
and the two have a combined size of 36 bytes.
.Bd -literal
typedef struct ctf_header {
	ctf_preamble_t cth_preamble;
	uint_t cth_parlabel;	/* ref to name of parent lbl uniq'd against */
	uint_t cth_parname;	/* ref to basename of parent */
	uint_t cth_lbloff;	/* offset of label section */
	uint_t cth_objtoff;	/* offset of object section */
	uint_t cth_funcoff;	/* offset of function section */
	uint_t cth_typeoff;	/* offset of type section */
	uint_t cth_stroff;	/* offset of string section */
	uint_t cth_strlen;	/* length of string section in bytes */
} ctf_header_t;
.Ed
.Pp
After the
.Sy preamble ,
the next two members,
.Em cth_parlabel
and
.Em cth_parname ,
are used to identify the parent. The values of both members are offsets
into the
.Sy string
section which point to the start of a null-terminated string. For more
information on the encoding of strings, see the subsection on
.Sx String Identifiers .
If the value of either is zero, then there is no entry for that
member. If the member
.Em cth_parlabel
is set, then the
.Em cth_parname
member must be set, otherwise it will not be possible to find the
parent. If
.Em cth_parname
is set, it is not necessary to define
.Em cth_parlabel ,
as the parent may not have a label. For more information on labels
and their interpretation, see
.Sx The Label Section .
.Lp
The remaining members (excepting
.Em cth_strlen )
describe the beginning of the corresponding sections. These offsets are
relative to the end of the
.Sy header .
Therefore, something with an offset of 0 is at an offset of thirty-six
bytes relative to the start of the
.Nm
file. The difference between members
indicates the size of the section itself. Different offsets have
different alignment requirements. The sections at
.Em cth_objtoff
and
.Em cth_funcoff
must be two byte aligned, while the sections at
.Em cth_lbloff
and
.Em cth_typeoff
must be four byte aligned. The section at
.Em cth_stroff
has no alignment requirements. To calculate the size of a given section,
excepting the
.Sy string
section, one should subtract the offset of the section from the offset of
the following one. For example, the size of the
.Sy types
section can be calculated by subtracting
.Em cth_typeoff
from
.Em cth_stroff .
.Lp
Finally, the member
.Em cth_strlen
describes the length of the string section itself. From it, you can also
calculate the size of the entire
.Nm
file by adding together the size of the
.Sy ctf_header_t ,
the offset of the string section in
.Em cth_stroff ,
and the size of the string section in
.Em cth_strlen .
.Ss Type Identifiers
Throughout the
.Nm ctf
data, types are referred to by identifiers. A given
.Nm
file supports up to 32767 (0x7fff) types. The first valid type identifier is 0x1.
When a given
.Nm
file is a child, indicated by a non-zero entry for the
.Sy header Ns 's
.Em cth_parname ,
then the first valid type identifier is 0x8000 and the last is 0xffff.
In this case, type identifiers 0x1 through 0x7fff are references to the
parent.
.Lp
The type identifier zero is a sentinel value used to indicate that there
is no type information available or it is an unknown type.
.Lp
Throughout the file format, the identifier is stored in different sized
values; however, the minimum size to represent a given identifier is a
.Sy uint16_t .
Other consumers of
.Nm
information may use larger or opaque identifiers.
.Ss String Identifiers
String identifiers are always encoded as four byte unsigned integers
which are an offset into a string table. The
.Nm
format supports two different string tables which have an identifier of
zero or one. This identifier is stored in the high-order bit of the
unsigned four byte offset. Therefore, the maximum supported offset into
one of these tables is 0x7fffffff.
.Lp
Table identifier zero always refers to the
.Sy string
section in the CTF file itself. String table identifier one refers to an
external string table which is the ELF string table for the ELF symbol
table associated with the
.Nm
container.
.Ss Type Encoding
Every
.Nm
type begins with metadata encoded into a
.Sy uint16_t .
This encoded information tells us three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The kind of the type
.It
Whether this type is a root type or not
.It
The length of the variable data
.El
.Lp
The 16 bits that make up the encoding are broken down such that you have
five bits for the kind, one bit for indicating whether or not it is a
root type, and 10 bits for the variable length.
This is laid out as
follows:
.Bd -literal -offset indent
+--------------------+
| kind | root | vlen |
+--------------------+
15   11   10   9    0
.Ed
.Lp
The current version of the file format defines 14 different kinds. The
interpretation of these different kinds will be discussed in the section
.Sx The Type Section .
If a kind is encountered that is not listed below, then it is not a valid
.Nm
file. The kinds are defined as follows:
.Bd -literal -offset indent
#define	CTF_K_UNKNOWN	0
#define	CTF_K_INTEGER	1
#define	CTF_K_FLOAT	2
#define	CTF_K_POINTER	3
#define	CTF_K_ARRAY	4
#define	CTF_K_FUNCTION	5
#define	CTF_K_STRUCT	6
#define	CTF_K_UNION	7
#define	CTF_K_ENUM	8
#define	CTF_K_FORWARD	9
#define	CTF_K_TYPEDEF	10
#define	CTF_K_VOLATILE	11
#define	CTF_K_CONST	12
#define	CTF_K_RESTRICT	13
.Ed
.Lp
Programs reference many types directly. Types that are referenced
directly and used are called
.Sy root
types. Other types are only referenced indirectly because they are part of
some other structure. For example, a program may reference a structure
directly, but not one of its members which has a type of its own. That
member's type is not considered a
.Sy root
type. If a type is a
.Sy root
type, then it will have bit 10 set.
.Lp
The variable length section is specific to each kind and is discussed in the
section
.Sx The Type Section .
.Lp
The following macros are useful for constructing and deconstructing the encoded
type information:
.Bd -literal -offset indent
#define	CTF_MAX_VLEN		0x3ff
#define	CTF_INFO_KIND(info)	(((info) & 0xf800) >> 11)
#define	CTF_INFO_ISROOT(info)	(((info) & 0x0400) >> 10)
#define	CTF_INFO_VLEN(info)	(((info) & CTF_MAX_VLEN))

#define	CTF_TYPE_INFO(kind, isroot, vlen) \\
	(((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN))
.Ed
.Ss The Label Section
When consuming
.Nm
data, it is often useful to know whether two different
.Nm
containers come from the same source base and version. For example, when
building illumos, there are many kernel modules that are built against a
single collection of source code. A label is encoded into the
.Nm
files that corresponds with the particular build. This ensures that if
files on the system were to become mixed up from multiple releases,
they are not used together by tools, particularly when a child needs to
refer to a type in the parent. Because they are linked using the type
identifiers, if the wrong parent is used then the wrong type will be
encountered.
.Lp
Each label is encoded in the file format using the following eight byte
structure:
.Bd -literal
typedef struct ctf_lblent {
	uint_t ctl_label;	/* ref to name of label */
	uint_t ctl_typeidx;	/* last type associated with this label */
} ctf_lblent_t;
.Ed
.Lp
Each label has two different components, a name and a type identifier.
The name is encoded in the
.Em ctl_label
member which is in the format defined in the section
.Sx String Identifiers .
Generally, the names of all labels are found in the internal string
section.
.Lp
The type identifier encoded in the member
.Em ctl_typeidx
refers to the last type identifier that a label refers to in the current
file. Labels only refer to types in the current file. If the
.Nm
file is a child, then it will have the same label as its parent;
however, its label will only refer to its own types, not its parent's.
.Lp
It is also possible, though rather uncommon, for a
.Nm
file to have multiple labels. Labels are placed one after another, every
eight bytes. When multiple labels are present, types may only belong to
a single label.
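.Lp
As a minimal sketch of how the label section might be walked, the
following hypothetical helper,
.Sy ctf_label_for_type ,
looks for the label that covers a given type identifier. It is not taken
from
.Xr libctf 3LIB .
It assumes that the labels are stored in increasing order of
.Em ctl_typeidx ,
that the data is in native byte order, and that the caller passes the
length of the label section, that is,
.Em cth_objtoff
minus
.Em cth_lbloff .
The structure uses fixed-width stand-ins for the types shown above.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-width stand-in for the ctf_lblent_t shown above. */
typedef struct ctf_lblent {
	uint32_t ctl_label;	/* ref to name of label */
	uint32_t ctl_typeidx;	/* last type associated with this label */
} ctf_lblent_t;

/*
 * Hypothetical helper. Walks the labels, which are laid out every eight
 * bytes, and copies out the first one whose ctl_typeidx is at or beyond
 * id. Assumes labels are sorted by increasing ctl_typeidx. Returns 0 on
 * success and -1 if no label covers the identifier.
 */
static int
ctf_label_for_type(const void *lbls, size_t lbllen, uint32_t id,
    ctf_lblent_t *out)
{
	const char *base = lbls;
	size_t off;

	for (off = 0; off + sizeof (ctf_lblent_t) <= lbllen;
	    off += sizeof (ctf_lblent_t)) {
		ctf_lblent_t l;

		(void) memcpy(&l, base + off, sizeof (l));
		if (id <= l.ctl_typeidx) {
			*out = l;
			return (0);
		}
	}

	return (-1);
}
```
.Ed
.Lp
The
.Em ctl_label
value that comes back is a string identifier and still has to be resolved
as described in
.Sx String Identifiers .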
.Ss The Object Section
The object section provides a mapping from ELF symbols of type
.Sy STT_OBJECT
in the symbol table to a type identifier. Every entry in this section is
a
.Sy uint16_t
which contains a type identifier as described in the section
.Sx Type Identifiers .
If there is no information for an object, then the type identifier 0x0
is stored for that entry.
.Lp
To walk the object section, you need to have a corresponding
.Sy symbol table
in the ELF object that contains the
.Nm
data. Not every object is included in this section. Specifically, when
walking the symbol table, an entry is skipped if it matches any of the
following conditions:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_OBJECT
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's section index is
.Sy SHN_ABS
and the value of the symbol is zero.
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Lp
The following sample code shows an example of iterating the object
section and skipping the correct symbols:
.Bd -literal
#include <gelf.h>
#include <stdio.h>
#include <string.h>

/*
 * Given the start of the object section in the CTF file, the number of symbols,
 * and the ELF Data sections for the symbol table and the string table, this
 * prints the type identifiers that correspond to objects. Note, a more robust
 * implementation should ensure that they don't walk beyond the end of the CTF
 * object section.
 */
static int
walk_symbols(uint16_t *objtoff, Elf_Data *symdata, Elf_Data *strdata,
    long nsyms)
{
	long i;
	const char *strbase = strdata->d_buf;

	for (i = 1; i < nsyms; i++) {
		const char *name;
		GElf_Sym sym;

		if (gelf_getsym(symdata, i, &sym) == NULL)
			return (1);

		if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
			continue;
		if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0)
			continue;
		if (sym.st_shndx == SHN_ABS && sym.st_value == 0)
			continue;
		name = strbase + sym.st_name;
		if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0)
			continue;

		/* Only symbols that are not skipped consume an entry. */
		(void) printf("Symbol %ld has type %d\\n", i, *objtoff);
		objtoff++;
	}

	return (0);
}
.Ed
.Ss The Function Section
The function section of the
.Nm
file encodes the types of both the function's arguments and the function's
return type. Similar to
.Sx The Object Section ,
the function section encodes information for all symbols of type
.Sy STT_FUNC ,
excepting those that fit specific criteria. Unlike with objects, because
functions have a variable number of arguments, they start with a type encoding
as defined in
.Sx Type Encoding ,
which is the size of a
.Sy uint16_t .
Functions which have no type information available are encoded as
.Li CTF_TYPE_INFO(CTF_K_UNKNOWN, 0, 0) .
Functions with type information are encoded differently. Here, the variable
length is turned into the number of arguments in the function. If a function
is a
.Sy varargs
type function, then the number of arguments is increased by one. Functions with
type information are encoded as:
.Li CTF_TYPE_INFO(CTF_K_FUNCTION, 0, nargs) .
.Lp
For functions that have no type information, nothing else is encoded, and the
next function is encoded.
For functions with type information, the next
.Sy uint16_t
is encoded with the type identifier of the return type of the function. It is
followed by each of the type identifiers of the arguments, if any exist, in the
order that they appear in the function. Therefore, argument 0 is the first type
identifier and so on. When a function has a final varargs argument, that is
encoded with the type identifier of zero.
.Lp
Like
.Sx The Object Section ,
the function section is encoded in the order of the symbol table. It has
similar, but slightly different considerations from objects. While iterating the
symbol table, if any of the following conditions are true, then the entry is
skipped and no corresponding entry is written:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_FUNC
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Ss The Type Section
The type section is the heart of the
.Nm
data. It encodes all of the information about the types themselves. The base of
the type information comes in two forms, a short form and a long form, each of
which may be followed by a variable number of arguments.
The following
definitions describe the short and long forms:
.Bd -literal
#define	CTF_MAX_SIZE	0xfffe	/* max size of a type in bytes */
#define	CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */
#define	CTF_MAX_LSIZE	UINT64_MAX

typedef struct ctf_stype {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* size of entire type in bytes */
		ushort_t _type;	/* reference to another type */
	} _u;
} ctf_stype_t;

typedef struct ctf_type {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* always CTF_LSIZE_SENT */
		ushort_t _type;	/* do not use */
	} _u;
	uint_t ctt_lsizehi;	/* high 32 bits of type size in bytes */
	uint_t ctt_lsizelo;	/* low 32 bits of type size in bytes */
} ctf_type_t;

#define	ctt_size _u._size	/* for fundamental types that have a size */
#define	ctt_type _u._type	/* for types that reference another type */
.Ed
.Pp
Type sizes are stored in
.Sy bytes .
The basic small form uses a
.Sy ushort_t
to store the number of bytes. If the number of bytes in a structure would exceed
0xfffe, then the alternate form, the
.Sy ctf_type_t ,
is used instead. To indicate that the larger form is being used, the member
.Em ctt_size
is set to the value of
.Sy CTF_LSIZE_SENT
(0xffff). In general, when going through the type section, consumers use the
.Sy ctf_type_t
structure, but pay attention to the value of the member
.Em ctt_size
to determine whether they should increment their scan by the size of the
.Sy ctf_stype_t
or
.Sy ctf_type_t .
Not all kinds of types use
.Em ctt_size .
Those which do not will always use the
.Sy ctf_stype_t
structure. The individual sections for each kind have more information.
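.Lp
The choice between the short and long forms can be sketched as follows.
This is an illustration under stated assumptions rather than the
.Xr libctf 3LIB
code: the helper name
.Sy ctf_type_size
is hypothetical, the structures are fixed-width stand-ins with the union
flattened into a single
.Em ctt_size
member, the data is assumed to be in native byte order, and the helper is
only meaningful for kinds that use
.Em ctt_size .
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */

/* Fixed-width stand-ins for the structures shown above. */
typedef struct ctf_stype {
	uint32_t ctt_name;
	uint16_t ctt_info;
	uint16_t ctt_size;
} ctf_stype_t;

typedef struct ctf_type {
	uint32_t ctt_name;
	uint16_t ctt_info;
	uint16_t ctt_size;
	uint32_t ctt_lsizehi;
	uint32_t ctt_lsizelo;
} ctf_type_t;

/*
 * Hypothetical helper. Decodes the size of the type entry at tp, storing
 * it in *sizep, and returns how many bytes of fixed header the scan must
 * step over before any variable length data: the size of ctf_stype_t for
 * the short form or of ctf_type_t for the long form. Returns 0 if fewer
 * than that many bytes remain.
 */
static size_t
ctf_type_size(const void *tp, size_t len, uint64_t *sizep)
{
	ctf_type_t t;

	(void) memset(&t, 0, sizeof (t));
	if (len < sizeof (ctf_stype_t))
		return (0);
	(void) memcpy(&t, tp, sizeof (ctf_stype_t));
	if (t.ctt_size != CTF_LSIZE_SENT) {
		*sizep = t.ctt_size;
		return (sizeof (ctf_stype_t));
	}

	/* The sentinel says the two 32-bit size halves follow. */
	if (len < sizeof (ctf_type_t))
		return (0);
	(void) memcpy(&t, tp, sizeof (ctf_type_t));
	*sizep = ((uint64_t)t.ctt_lsizehi << 32) | t.ctt_lsizelo;
	return (sizeof (ctf_type_t));
}
```
.Ed
.Lp
A scan of the type section would add the returned header length, plus the
length of whatever variable data the kind calls for, to reach the next
type.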
.Lp
Types are written out in order. Therefore the first entry encountered has a type
id of 0x1, or 0x8000 if a child. The member
.Em ctt_name
is encoded as described in the section
.Sx String Identifiers .
The string that it points to is the name of the type. If the identifier points
to an empty string (one that consists solely of a null terminator) then the type
does not have a name. This is common with anonymous structures and unions that
only have a typedef to name them, as well as pointers and qualifiers.
.Lp
The next member, the
.Em ctt_info ,
is encoded as described in the section
.Sx Type Encoding .
The type's kind tells us how to interpret the remaining data in the
.Sy ctf_type_t
and any variable length data that may exist. The rest of this section will be
broken down into the interpretation of the various kinds.
.Ss Encoding of Integers
Integers, which are of type
.Sy CTF_K_INTEGER ,
have no variable length arguments. Instead, they are followed by a four byte
.Sy uint_t
which describes their encoding. All integers must be encoded with a variable
length of zero. The
.Em ctt_size
member describes the length of the integer in bytes. In general, integer sizes
will be rounded up to the closest power of two.
.Lp
The integer encoding contains three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The encoding of the integer
.It
The offset in
.Sy bits
of the type
.It
The size in
.Sy bits
of the type
.El
.Pp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_INT_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_INT_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Pp
The following flags are defined for the encoding at this time:
.Bd -literal -offset indent
#define	CTF_INT_SIGNED		0x01
#define	CTF_INT_CHAR		0x02
#define	CTF_INT_BOOL		0x04
#define	CTF_INT_VARARGS		0x08
.Ed
.Lp
By default, an integer is considered to be unsigned, unless it has the
.Sy CTF_INT_SIGNED
flag set. If the flag
.Sy CTF_INT_CHAR
is set, that indicates that the integer is of a type that stores character
data, for example the intrinsic C type
.Sy char
would have the
.Sy CTF_INT_CHAR
flag set. If the flag
.Sy CTF_INT_BOOL
is set, that indicates that the integer represents a boolean type. For example,
the intrinsic C type
.Sy _Bool
would have the
.Sy CTF_INT_BOOL
flag set. Finally, the flag
.Sy CTF_INT_VARARGS
indicates that the integer is used as part of a variable number of arguments.
This encoding is rather uncommon.
.Ss Encoding of Floats
Floats, which are of type
.Sy CTF_K_FLOAT ,
are similar to their integer counterparts. They have no variable length
arguments and are followed by a four byte encoding which describes the kind of
float that exists. The
.Em ctt_size
member is the size, in bytes, of the float.
The float encoding has three
different pieces of information inside of it:
.Lp
.Bl -bullet -offset indent -compact
.It
The specific kind of float that exists
.It
The offset in
.Sy bits
of the float
.It
The size in
.Sy bits
of the float
.El
.Lp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_FP_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_FP_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_FP_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Lp
Whereas the encoding for integers is a series of flags, the encoding for
floats maps to a specific kind of float. It is not a flag-based value. The
kinds of floats correspond to both their size and their encoding. This covers
all of the basic C intrinsic floating point types. The following are the
different kinds of floats represented in the encoding:
.Bd -literal -offset indent
#define	CTF_FP_SINGLE	1	/* IEEE 32-bit float encoding */
#define	CTF_FP_DOUBLE	2	/* IEEE 64-bit float encoding */
#define	CTF_FP_CPLX	3	/* Complex encoding */
#define	CTF_FP_DCPLX	4	/* Double complex encoding */
#define	CTF_FP_LDCPLX	5	/* Long double complex encoding */
#define	CTF_FP_LDOUBLE	6	/* Long double encoding */
#define	CTF_FP_INTRVL	7	/* Interval (2x32-bit) encoding */
#define	CTF_FP_DINTRVL	8	/* Double interval (2x64-bit) encoding */
#define	CTF_FP_LDINTRVL	9	/* Long double interval (2x128-bit) encoding */
#define	CTF_FP_IMAGRY	10	/* Imaginary (32-bit) encoding */
#define	CTF_FP_DIMAGRY	11	/* Long imaginary (64-bit) encoding */
#define	CTF_FP_LDIMAGRY	12	/* Long double imaginary (128-bit) encoding */
.Ed
.Ss Encoding of Arrays
Arrays, which are of type
.Sy CTF_K_ARRAY ,
have no variable length arguments.
They are followed by a structure which
describes the number of elements in the array, the type identifier of the
elements in the array, and the type identifier of the index of the array. With
arrays, the
.Em ctt_size
member is set to zero. The structure that follows an array is defined as:
.Bd -literal
typedef struct ctf_array {
	ushort_t cta_contents;	/* reference to type of array contents */
	ushort_t cta_index;	/* reference to type of array index */
	uint_t cta_nelems;	/* number of elements */
} ctf_array_t;
.Ed
.Lp
The
.Em cta_contents
and
.Em cta_index
members of the
.Sy ctf_array_t
are type identifiers which are encoded as per the section
.Sx Type Identifiers .
The member
.Em cta_nelems
is a simple four byte unsigned count of the number of elements. This count may
be zero when encountering C99's flexible array members.
.Ss Encoding of Functions
Function types, which are of type
.Sy CTF_K_FUNCTION ,
use the variable length list to be the number of arguments in the function. When
the function has a final member which is a varargs, then the argument count is
incremented by one to account for the variable argument. Here, the
.Em ctt_type
member is encoded with the type identifier of the return type of the function.
Note that the
.Em ctt_size
member is not used here.
.Lp
The variable argument list contains the type identifiers for the arguments of
the function, if any. Each one is represented by a
.Sy uint16_t
and encoded according to the
.Sx Type Identifiers
section. If the function's last argument is of type varargs, then it is also
written out, but the type identifier is zero. This is included in the count of
the function's arguments.
.Ss Encoding of Structures and Unions
Structures and Unions, which are encoded with
.Sy CTF_K_STRUCT
and
.Sy CTF_K_UNION
respectively, are very similar constructs in C.
The main difference
between them is the fact that every member of a structure follows one another,
whereas in a union, all members share the same memory. They are also very
similar in terms of their encoding in
.Nm .
The variable length argument for structures and unions represents the number of
members that they have. The value of the member
.Em ctt_size
is the size of the structure or union. There are two different structures which
are used to encode members in the variable list. When the size of a structure or
union is greater than or equal to the large member threshold, 8192, then a
different structure is used to encode the members. Within a given structure or
union, all members are encoded using the same structure. The structures for
members are as follows:
.Bd -literal
typedef struct ctf_member {
	uint_t ctm_name;	/* reference to name in string table */
	ushort_t ctm_type;	/* reference to type of member */
	ushort_t ctm_offset;	/* offset of this member in bits */
} ctf_member_t;

typedef struct ctf_lmember {
	uint_t ctlm_name;	/* reference to name in string table */
	ushort_t ctlm_type;	/* reference to type of member */
	ushort_t ctlm_pad;	/* padding */
	uint_t ctlm_offsethi;	/* high 32 bits of member offset in bits */
	uint_t ctlm_offsetlo;	/* low 32 bits of member offset in bits */
} ctf_lmember_t;
.Ed
.Lp
Both the
.Em ctm_name
and
.Em ctlm_name
refer to the name of the member. The name is encoded as an offset into the
string table as described by the section
.Sx String Identifiers .
The members
.Em ctm_type
and
.Em ctlm_type
both refer to the type of the member. They are encoded as per the section
.Sx Type Identifiers .
.Lp
The last piece of information that is present is the offset which describes the
offset in memory that the member begins at. For unions, this value will always
be zero because the start of unions in memory is always zero.
For structures,
this is the offset in
.Sy bits
that the member begins at. Note that a compiler may lay out a type with padding.
This means that the difference in offset between two consecutive members may be
larger than the size of the member. When the size of the overall structure is
strictly less than 8192 bytes, the normal structure,
.Sy ctf_member_t ,
is used and the offset in bits is stored in the member
.Em ctm_offset .
However, when the size of the structure is greater than or equal to 8192 bytes,
then the number of bits is split into two 32-bit quantities. One member,
.Em ctlm_offsethi ,
represents the upper 32 bits of the offset, while the other member,
.Em ctlm_offsetlo ,
represents the lower 32 bits of the offset. These can be joined together to get
a 64-bit offset in bits by shifting the member
.Em ctlm_offsethi
to the left by 32 and then performing a bitwise OR with
.Em ctlm_offsetlo .
.Ss Encoding of Enumerations
Enumerations, denoted by the kind
.Sy CTF_K_ENUM ,
are similar to structures. Enumerations use the variable list to note the number
of values that the enumeration contains, which we'll term enumerators. In C, an
enumeration is always equivalent to the intrinsic type
.Sy int ,
thus the value of the member
.Em ctt_size
is always the size of an integer, which is determined by the current model.
For illumos systems, this will always be 4, as an integer is always defined to
be 4 bytes in both
.Sy ILP32
and
.Sy LP64 ,
regardless of the architecture.
.Lp
The enumerators encoded in an enumeration have the following structure in the
variable list:
.Bd -literal
typedef struct ctf_enum {
	uint_t cte_name;	/* reference to name in string table */
	int cte_value;		/* value associated with this name */
} ctf_enum_t;
.Ed
.Pp
The member
.Em cte_name
refers to the name of the enumerator; it is encoded according to the
rules in the section
.Sx String Identifiers .
The member
.Em cte_value
contains the integer value of this enumerator.
.Ss Encoding of Forward References
Forward references, types of kind
.Sy CTF_K_FORWARD ,
in a
.Nm
file refer to types which may not have a definition at all, only a name. If
the
.Nm
file is a child, then the forward may be resolved to an
actual type in the parent. Otherwise, the definition may be in another
.Nm
container or may not be known at all. The only member of the
.Sy ctf_type_t
that matters for a forward declaration is the
.Em ctt_name
which points to the name of the forward reference in the string table as
described earlier. There is no other information recorded for forward
references.
.Ss Encoding of Pointers, Typedefs, Volatile, Const, and Restrict
Pointers, typedefs, volatile, const, and restrict are all similar in
.Nm .
They all refer to another type. In the case of typedefs, they provide an
alternate name, while volatile, const, and restrict change how the type is
interpreted in the C programming language. This covers the
.Nm
kinds
.Sy CTF_K_POINTER ,
.Sy CTF_K_TYPEDEF ,
.Sy CTF_K_VOLATILE ,
.Sy CTF_K_RESTRICT ,
and
.Sy CTF_K_CONST .
.Lp
These types have no variable list entries and use the member
.Em ctt_type
to refer to the base type that they modify.
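.Lp
Because these kinds only point at another type through
.Em ctt_type ,
a consumer that wants the underlying type typically follows that reference
until it reaches a kind that is not a typedef or qualifier. The following is a
minimal sketch of that walk, not the libctf implementation. It assumes a
hypothetical, already-decoded table of types; the names
.Sy ex_kind ,
.Sy ex_type_t ,
and
.Sy ex_resolve
are illustrative and do not appear in
.In sys/ctf.h .
For simplicity it treats type identifiers as one-based indices into a single
table, ignores the parent and child distinction, and treats identifier zero as
"no type".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical kind values used only by this sketch. A real consumer
 * would decode the kind from ctt_info using the constants in <sys/ctf.h>.
 */
enum ex_kind {
	EX_K_INTEGER,
	EX_K_POINTER,
	EX_K_TYPEDEF,
	EX_K_VOLATILE,
	EX_K_CONST,
	EX_K_RESTRICT
};

/* Hypothetical decoded view of one type entry. */
typedef struct ex_type {
	enum ex_kind	ext_kind;	/* kind of this type */
	uint16_t	ext_type;	/* value of ctt_type; 0 means none */
} ex_type_t;

/*
 * Follow ctt_type through typedefs, volatile, const, and restrict and
 * return the identifier of the first type of any other kind. Pointers are
 * deliberately not followed. Returns 0 if an identifier is out of range
 * or if the chain never terminates.
 */
static uint16_t
ex_resolve(const ex_type_t *tbl, size_t ntypes, uint16_t id)
{
	size_t hops;

	assert(tbl != NULL);

	/* A valid chain can never be longer than the number of types. */
	for (hops = 0; hops <= ntypes; hops++) {
		if (id == 0 || id > ntypes)
			return (0);

		switch (tbl[id - 1].ext_kind) {
		case EX_K_TYPEDEF:
		case EX_K_VOLATILE:
		case EX_K_CONST:
		case EX_K_RESTRICT:
			id = tbl[id - 1].ext_type;
			break;
		default:
			return (id);
		}
	}

	return (0);	/* too many hops: the chain loops back on itself */
}
```
.Lp
The hop limit is a defensive choice for malformed input: well-formed C cannot
produce a typedef or qualifier chain that refers back to itself, but a consumer
reading untrusted data should not rely on that.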
.Ss Encoding of Unknown Types
Types with the kind
.Sy CTF_K_UNKNOWN
are used to indicate gaps in the type identifier space. These entries consume an
identifier, but do not define anything. Nothing should refer to these gap
identifiers.
.Ss Dependencies Between Types
C types can be imagined as a directed graph that may contain cycles. Structures
and unions may
refer to each other in a way that creates a cyclic dependency. In cases such as
these, the entire type section must be read in and processed. Consumers must
not assume that every type can be laid out in dependency order; they
cannot.
.Ss The String Section
The last section of the
.Nm
file is the
.Sy string
section. This section encodes all of the strings that appear throughout
the other sections. Each string is laid out as a series of characters followed
by a null terminator. Generally, all names are written out in ASCII, as
most C compilers do not allow any characters outside of a subset of ASCII to
appear in identifiers. However, any extended character sets
should be written out as a series of UTF-8 bytes.
.Lp
The first entry in the section, at offset zero, is a single null
terminator to reference the empty string. Following that, each C string
should be written out, including the null terminator. Offsets that refer
to something in this section should refer to the first byte which begins
a string. Beyond the first byte in the section being the null
terminator, the order of strings is unimportant.
.Sh Data Encoding and ELF Considerations
.Nm
data is generally included in ELF objects which specify information to
identify the architecture and endianness of the file. A
.Nm
container inside such an object must match the endianness of the ELF
object. Aside from the question of the endian encoding of data, there
should be no other differences between architectures.
While many of the
types in this document refer to non-fixed size C integral types, they
are equivalent in the models
.Sy ILP32
and
.Sy LP64 .
If any other model is being used with
.Nm
data that has different sizes, then it must not use the model's sizes for
those integral types and instead use the fixed size equivalents based on an
.Sy ILP32
environment.
.Lp
When placing a
.Nm
container inside of an ELF object, there are certain conventions that are
expected for the purposes of tooling being able to find the
.Nm
data. In particular, a given ELF object should only contain a single
.Nm
section. Multiple containers should be merged together into a single
one.
.Lp
The
.Nm
file should be included in its own ELF section. The section's name
must be
.Ql .SUNW_ctf .
The type of the section must be
.Sy SHT_PROGBITS .
The section should have a link set to the symbol table and its address
alignment must be 4.
.Sh SEE ALSO
.Xr mdb 1 ,
.Xr dtrace 1M ,
.Xr libelf 3LIB ,
.Xr gelf 3ELF ,
.Xr a.out 4