.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2014 Joyent, Inc.
.\"
.Dd Sep 26, 2014
.Dt CTF 5
.Os
.Sh NAME
.Nm ctf
.Nd Compact C Type Format
.Sh SYNOPSIS
.In sys/ctf.h
.Sh DESCRIPTION
.Nm
is designed to be a compact representation of the C programming
language's type information, focused on serving the needs of dynamic
tracing, debuggers, and other in-situ and post-mortem introspection
tools.
.Nm
data is generally included in
.Sy ELF
objects and is tagged as
.Sy SHT_PROGBITS
to ensure that the data is accessible in a running process and in subsequent
core dumps, if generated.
.Lp
The
.Nm
data contained in each file has information about the layout and
sizes of C types, including intrinsic types, enumerations, structures,
typedefs, and unions, that are used by the corresponding
.Sy ELF
object.
The
.Nm
data may also include information about the types of global objects and
the return type and arguments of functions in the symbol table.
.Lp
Because
.Nm
data is often embedded inside another file, rather than being a standalone
file itself, it may also be referred to as a
.Nm
.Sy container .
.Lp
On
.Fx
systems,
.Nm
data is consumed by
.Xr dtrace 1 .
Programmatic access to
.Nm
data can be obtained through libctf.
.Lp
The
.Nm
file format is broken down into seven different sections.
The first section is the
.Sy preamble
and
.Sy header ,
which describe the version of the
.Nm
file, the links it has to other
.Nm
files, and the sizes of the other sections.
The next section is the
.Sy label
section,
which provides a way of identifying similar groups of
.Nm
data across multiple files.
This is followed by the
.Sy object
information section, which describes the types of global
symbols.
The subsequent section is the
.Sy function
information section, which describes the return
types and arguments of functions.
The next section is the
.Sy type
information section, which describes
the format and layout of the C types themselves.
Finally, the last section is the
.Sy string
section, which contains the names of types, enumerations, members, and
labels.
.Lp
Strictly speaking, only the
.Sy preamble
and
.Sy header
are required; however, to be actually useful, a file also needs the
type and string sections.
.Lp
A
.Nm
file may contain all of the type information that it requires, or it
may optionally refer to another
.Nm
file which holds the remaining types.
When a
.Nm
file refers to another file, it is called the
.Sy child
and the file it refers to is called the
.Sy parent .
A given file may only refer to one parent.
This process is called
.Em uniquification
because it ensures each child only has type information that is
unique to it.
A common example of this is that most kernel modules in illumos are uniquified
against the kernel module
.Sy genunix
and the type information that comes from the
.Sy IP
module.
This means that each module only contains the types that are unique to it,
and the most common types in the kernel are not duplicated.
.Sh FILE FORMAT
This documents version
.Em two
of the
.Nm
file format.
All applications and tools on
.Fx
currently produce and operate on this version.
.Lp
The file format can be summarized with the following image; the
sections that follow cover it in more detail.
.Bd -literal

         +-------------+  0t0
+--------| Preamble    |
|        +-------------+  0t4
|+-------| Header      |
||       +-------------+  0t36 + cth_lbloff
||+------| Labels      |
|||      +-------------+  0t36 + cth_objtoff
|||+-----| Objects     |
||||     +-------------+  0t36 + cth_funcoff
||||+----| Functions   |
|||||    +-------------+  0t36 + cth_typeoff
|||||+---| Types       |
||||||   +-------------+  0t36 + cth_stroff
||||||+--| Strings     |
|||||||  +-------------+  0t36 + cth_stroff + cth_strlen
|||||||
|||||||
|||||||
|||||||    +-- magic -+   + vers + flags
|||||||    |          |   |      |
|||||||   +------+------+------+------+
+---------| 0xcf | 0xf1 | 0x02 | 0x00 |
 ||||||   +------+------+------+------+
 ||||||   0      1      2      3      4
 ||||||
 ||||||    + parent label          + objects
 ||||||    |      + parent name    |    + functions    + strings
 ||||||    |      |      + label   |    |      + types |       + strlen
 ||||||    |      |      |         |    |      |       |       |
 ||||||   +------+------+------+------+------+-------+-------+-------+
 +--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 |
  |||||   +------+------+------+------+------+-------+-------+-------+
  |||||   0x04   0x08   0x0c   0x10   0x14   0x18    0x1c    0x20    0x24
  |||||
  |||||         + Label name
  |||||         |           + Label type
  |||||         |           |      + Next label
  |||||         |           |      |
  |||||       +-----------+------+-----+
  +-----------| 0x01      | 0x42 | ... |
   ||||       +-----------+------+-----+
   ||||       cth_lbloff  +0x4   +0x8  cth_objtoff
   ||||
   ||||
   |||| Symidx 0t15          0t43   0t44
   ||||       +------------+------+------+-----+
   +----------| 0x00       | 0x42 | 0x36 | ... |
    |||       +------------+------+------+-----+
    |||       cth_objtoff  +0x2   +0x4   +0x6  cth_funcoff
    |||
    |||         + CTF_TYPE_INFO            + CTF_TYPE_INFO
    |||         |            + Return type |
    |||         |            |      + arg0 |
    |||       +------------+------+------+-----+
    +---------| 0x2c10     | 0x08 | 0x0c | ... |
     ||       +------------+------+------+-----+
     ||       cth_funcoff  +0x2   +0x4   +0x6  cth_typeoff
     ||
     ||         + ctf_stype_t for type 1
     ||         | integer            + integer encoding
     ||         |                    |           + ctf_stype_t for type 2
     ||         |                    |           |
     ||       +--------------------+-----------+-----+
     +--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... |
      |       +--------------------+-----------+-----+
      |       cth_typeoff          +0x08       +0x0c cth_stroff
      |
      |     +--- str 0
      |     |    +--- str 1       + str 2
      |     |    |                |
      |     v    v                v
      |   +----+---+---+---+----+---+---+---+---+---+----+
      +---| \\0 | i | n | t | \\0 | f | o | o | _ | t | \\0 |
          +----+---+---+---+----+---+---+---+---+---+----+
          0    1   2   3   4    5   6   7   8   9   10   11
.Ed
.Lp
Every
.Nm
file begins with a
.Sy preamble ,
followed by a
.Sy header .
The
.Sy preamble
is defined as follows:
.Bd -literal
typedef struct ctf_preamble {
	ushort_t ctp_magic;	/* magic number (CTF_MAGIC) */
	uchar_t ctp_version;	/* data format version number (CTF_VERSION) */
	uchar_t ctp_flags;	/* flags (see below) */
} ctf_preamble_t;
.Ed
.Pp
The
.Sy preamble
is four bytes long and must be four-byte aligned.
The
.Sy preamble
defines the version of the
.Nm
file, which determines the format of the rest of the header.
While the header may change in subsequent versions, the preamble will not change
across versions, though the interpretation of its flags may change from
version to version.
The
.Em ctp_magic
member defines the magic number for the
.Nm
file format.
This must always be
.Li 0xcff1 .
If another value is encountered, then the file should not be treated as
a
.Nm
file.
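.Lp
As an illustration, the magic-number check described above might be sketched
as follows.
This is a minimal example rather than a reference implementation: it assumes
the data is in the reader's native byte order, uses
.In stdint.h
types in place of
.Sy ushort_t
and
.Sy uchar_t ,
and the function name
.Fn ctf_has_magic
is hypothetical.
.Bd -literal
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTF_MAGIC	0xcff1

/* Local copy of the preamble using <stdint.h> types (an assumption). */
typedef struct ctf_preamble {
	uint16_t ctp_magic;	/* magic number (CTF_MAGIC) */
	uint8_t ctp_version;	/* data format version number */
	uint8_t ctp_flags;	/* flags */
} ctf_preamble_t;

/*
 * Hypothetical helper: return 1 if buf holds at least a full preamble whose
 * ctp_magic equals CTF_MAGIC in native byte order, and 0 otherwise.
 * Byte-swapped (foreign-endian) data is not handled here.
 */
static int
ctf_has_magic(const void *buf, size_t len)
{
	ctf_preamble_t pre;

	if (len < sizeof (pre))
		return (0);
	/* memcpy avoids relying on the alignment of the caller's buffer. */
	(void) memcpy(&pre, buf, sizeof (pre));
	return (pre.ctp_magic == CTF_MAGIC);
}
.Ed
.Lp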
The
.Em ctp_version
member defines the version of the
.Nm
file.
The current version is
.Li 2 .
It is possible to encounter an unsupported version.
In that case, software should not try to parse the format, as it may have
changed.
Finally, the
.Em ctp_flags
member describes aspects of the file which modify its interpretation.
The following flags are currently defined:
.Bd -literal
#define CTF_F_COMPRESS	0x01
.Ed
.Pp
The flag
.Sy CTF_F_COMPRESS
indicates that the body of the
.Nm
file, all the data following the
.Sy header ,
has been compressed through the
.Sy zlib
library and its
.Sy deflate
algorithm.
If this flag is not present, then the body has not been compressed and no
special action is needed to interpret it.
All offsets into the data described by the
.Sy header
always refer to the
.Sy uncompressed
data.
.Lp
In version two of the
.Nm
file format, the
.Sy header
denotes whether or not this
.Nm
file is the child of another
.Nm
file and also indicates the size of the remaining sections.
The structure for the
.Sy header
logically contains a copy of the
.Sy preamble
and the two have a combined size of 36 bytes.
.Bd -literal
typedef struct ctf_header {
	ctf_preamble_t cth_preamble;
	uint_t cth_parlabel;	/* ref to name of parent lbl uniq'd against */
	uint_t cth_parname;	/* ref to basename of parent */
	uint_t cth_lbloff;	/* offset of label section */
	uint_t cth_objtoff;	/* offset of object section */
	uint_t cth_funcoff;	/* offset of function section */
	uint_t cth_typeoff;	/* offset of type section */
	uint_t cth_stroff;	/* offset of string section */
	uint_t cth_strlen;	/* length of string section in bytes */
} ctf_header_t;
.Ed
.Pp
After the
.Sy preamble ,
the next two members,
.Em cth_parlabel
and
.Em cth_parname ,
are used to identify the parent.
The values of both members are offsets into the
.Sy string
section that point to the start of a null-terminated string.
For more information on the encoding of strings, see the subsection on
.Sx String Identifiers .
If the value of either is zero, then there is no entry for that
member.
If the member
.Em cth_parlabel
is set, then the
.Em cth_parname
member must also be set; otherwise, it will not be possible to find the
parent.
If
.Em cth_parname
is set, it is not necessary to define
.Em cth_parlabel ,
as the parent may not have a label.
For more information on labels and their interpretation, see
.Sx The Label Section .
.Lp
The remaining members (excepting
.Em cth_strlen )
describe the beginning of the corresponding sections.
These offsets are relative to the end of the
.Sy header .
Therefore, a section with an offset of 0 begins thirty-six
bytes from the start of the
.Nm
file.
The difference between consecutive members indicates the size of the section
itself.
Different sections have different alignment requirements.
The sections starting at
.Em cth_objtoff
and
.Em cth_funcoff
must be two-byte aligned, while the sections starting at
.Em cth_lbloff
and
.Em cth_typeoff
must be four-byte aligned.
The section starting at
.Em cth_stroff
has no alignment requirements.
To calculate the size of a given section, excepting the
.Sy string
section, subtract the offset of the section from the offset of the following
one.
For example, the size of the
.Sy types
section can be calculated by subtracting
.Em cth_typeoff
from
.Em cth_stroff .
.Lp
Finally, the member
.Em cth_strlen
describes the length of the string section itself.
It can also be used to calculate the size of the entire
.Nm
file, by adding together the size of the
.Sy ctf_header_t ,
the offset of the string section in
.Em cth_stroff ,
and the size of the string section in
.Em cth_strlen .
.Ss Type Identifiers
Within
.Nm
data, types are referred to by identifiers.
A given
.Nm
file supports up to 32767 (0x7fff) types.
The first valid type identifier is 0x1.
When a given
.Nm
file is a child, indicated by a non-zero entry for the
.Sy header Ns 's
.Em cth_parname ,
then the first valid type identifier is 0x8000 and the last is 0xffff.
In this case, type identifiers 0x1 through 0x7fff are references to the
parent.
.Lp
The type identifier zero is a sentinel value used to indicate that there
is no type information available or that the type is unknown.
.Lp
Throughout the file format, the identifier is stored in values of different
sizes; however, the minimum size to represent a given identifier is a
.Sy uint16_t .
Other consumers of
.Nm
information may use larger or opaque identifiers.
.Ss String Identifiers
String identifiers are always encoded as four-byte unsigned integers
which are offsets into a string table.
The
.Nm
format supports two different string tables, which have identifiers of
zero and one.
This identifier is stored in the high-order bit of the unsigned four-byte
offset.
Therefore, the maximum supported offset into either table is 0x7fffffff.
.Lp
Table identifier zero always refers to the
.Sy string
section in the
.Nm
file itself.
String table identifier one refers to an external string table, which is the
ELF string table for the ELF symbol table associated with the
.Nm
container.
.Ss Type Encoding
Every
.Nm
type begins with metadata encoded into a
.Sy uint16_t .
This encoded value holds three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The kind of the type
.It
Whether this type is a root type or not
.It
The length of the variable data
.El
.Lp
The 16 bits that make up the encoding are divided into
five bits for the kind, one bit indicating whether or not it is a
root type, and 10 bits for the variable length.
This is laid out as follows:
.Bd -literal -offset indent
+--------------------+
| kind | root | vlen |
+--------------------+
15   11   10   9    0
.Ed
.Lp
The current version of the file format defines 14 different kinds.
The interpretation of these different kinds is discussed in the section
.Sx The Type Section .
If a kind that is not listed below is encountered, then the file is not a valid
.Nm
file.
The kinds are defined as follows:
.Bd -literal -offset indent
#define CTF_K_UNKNOWN	0
#define CTF_K_INTEGER	1
#define CTF_K_FLOAT	2
#define CTF_K_POINTER	3
#define CTF_K_ARRAY	4
#define CTF_K_FUNCTION	5
#define CTF_K_STRUCT	6
#define CTF_K_UNION	7
#define CTF_K_ENUM	8
#define CTF_K_FORWARD	9
#define CTF_K_TYPEDEF	10
#define CTF_K_VOLATILE	11
#define CTF_K_CONST	12
#define CTF_K_RESTRICT	13
.Ed
.Lp
Programs directly reference many types; however, other types are referenced
only indirectly because they are part of some other structure.
The types that are referenced directly and used are called
.Sy root
types.
For example, a program may reference a structure directly, but not the type
of one of its members; that member's type is not considered a
.Sy root
type.
If a type is a
.Sy root
type, then it will have bit 10 set.
.Lp
The interpretation of the variable length is specific to each kind and is
discussed in the section
.Sx The Type Section .
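.Lp
As a sketch of the bit layout above, the three fields can be extracted and
combined with plain shifts and masks.
The names
.Sy struct ctf_info_fields ,
.Fn ctf_info_decode ,
and
.Fn ctf_info_encode
are hypothetical and are not part of
.In sys/ctf.h .
For example, the value 0x2c10 decodes to kind 5
.Pq Sy CTF_K_FUNCTION
with the root bit set and a variable length of 16.
.Bd -literal
#include <stdint.h>

/* Hypothetical holder for the decoded fields of a ctt_info word. */
struct ctf_info_fields {
	unsigned int kind;	/* CTF_K_* value, bits 15-11 */
	unsigned int isroot;	/* 1 if this is a root type, bit 10 */
	unsigned int vlen;	/* kind-specific variable length, bits 9-0 */
};

static struct ctf_info_fields
ctf_info_decode(uint16_t info)
{
	struct ctf_info_fields f;

	f.kind = (info >> 11) & 0x1f;	/* five bits of kind */
	f.isroot = (info >> 10) & 0x1;	/* one root bit */
	f.vlen = info & 0x3ff;		/* ten bits of variable length */
	return (f);
}

static uint16_t
ctf_info_encode(unsigned int kind, int isroot, unsigned int vlen)
{
	/* Out-of-range kind or vlen values are truncated by the masks. */
	return ((uint16_t)(((kind & 0x1f) << 11) |
	    ((isroot ? 1u : 0u) << 10) | (vlen & 0x3ff)));
}
.Ed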
.Lp
The following macros are useful for constructing and deconstructing the encoded
type information:
.Bd -literal -offset indent
#define CTF_MAX_VLEN	0x3ff
#define CTF_INFO_KIND(info)	(((info) & 0xf800) >> 11)
#define CTF_INFO_ISROOT(info)	(((info) & 0x0400) >> 10)
#define CTF_INFO_VLEN(info)	(((info) & CTF_MAX_VLEN))

#define CTF_TYPE_INFO(kind, isroot, vlen) \\
	(((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN))
.Ed
.Ss The Label Section
When consuming
.Nm
data, it is often useful to know whether two different
.Nm
containers come from the same source base and version.
For example, when building illumos, there are many kernel modules that are built
against a single collection of source code.
A label that corresponds to the particular build is encoded into the
.Nm
files.
This ensures that if files from multiple releases were to become mixed up on
a system, tools do not use them together, particularly when a child
needs to refer to a type in the parent.
Because children and parents are linked using type identifiers, if the wrong
parent is used, then the wrong type will be encountered.
.Lp
Each label is encoded in the file format using the following eight-byte
structure:
.Bd -literal
typedef struct ctf_lblent {
	uint_t ctl_label;	/* ref to name of label */
	uint_t ctl_typeidx;	/* last type associated with this label */
} ctf_lblent_t;
.Ed
.Lp
Each label has two different components, a name and a type identifier.
The name is encoded in the
.Em ctl_label
member, in the format defined in the section
.Sx String Identifiers .
Generally, the names of all labels are found in the internal string
section.
.Lp
The type identifier encoded in the member
.Em ctl_typeidx
refers to the last type identifier that a label refers to in the current
file.
Labels only refer to types in the current file.
If the
.Nm
file is a child, then it will have the same label as its parent;
however, its label will only refer to its own types, not its parent's.
.Lp
It is also possible, though rather uncommon, for a
.Nm
file to have multiple labels.
Labels are placed one after another, every eight bytes.
When multiple labels are present, each type may only belong to a single label.
.Ss The Object Section
The object section provides a mapping from ELF symbols of type
.Sy STT_OBJECT
in the symbol table to a type identifier.
Every entry in this section is a
.Sy uint16_t
which contains a type identifier as described in the section
.Sx Type Identifiers .
If there is no information for an object, then the type identifier 0x0
is stored for that entry.
.Lp
To walk the object section, you need the corresponding
.Sy symbol table
in the ELF object that contains the
.Nm
data.
Not every object is included in this section.
Specifically, when walking the symbol table, an entry is skipped if it matches
any of the following conditions:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_OBJECT
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's section index is
.Sy SHN_ABS
and the value of the symbol is zero
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Lp
The following sample code shows an example of iterating the object
section and skipping the correct symbols:
.Bd -literal
#include <gelf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Given the start of the object section in the CTF file, the ELF data for
 * the symbol table and its string table, and the number of symbols, this
 * prints the type identifiers that correspond to objects.  Only symbols
 * that are not skipped have an entry in the object section, so the cursor
 * only advances for those.  Note, a more robust implementation should
 * ensure that it doesn't walk beyond the end of the CTF object section.
 */
static int
walk_symbols(const uint16_t *objtoff, Elf_Data *symdata, Elf_Data *strdata,
    long nsyms)
{
	long i;
	const char *strbase = strdata->d_buf;

	for (i = 1; i < nsyms; i++) {
		const char *name;
		GElf_Sym sym;

		if (gelf_getsym(symdata, (int)i, &sym) == NULL)
			return (1);

		if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
			continue;
		if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0)
			continue;
		if (sym.st_shndx == SHN_ABS && sym.st_value == 0)
			continue;
		name = strbase + sym.st_name;
		if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0)
			continue;

		(void) printf("Symbol %ld has type %u\\n", i,
		    (unsigned int)*objtoff);
		objtoff++;
	}

	return (0);
}
.Ed
.Ss The Function Section
The function section of the
.Nm
file encodes the types of both the function's arguments and the function's
return value.
Similar to
.Sx The Object Section ,
the function section encodes information for all symbols of type
.Sy STT_FUNC ,
except those that meet specific criteria.
Unlike objects, functions have a variable number of arguments, so each entry
starts with a type encoding as defined in
.Sx Type Encoding ,
which is the size of a
.Sy uint16_t .
Functions which have no type information available are encoded as
.Li CTF_TYPE_INFO(CTF_K_UNKNOWN, 0, 0) .
Functions with type information are encoded differently: the variable length
holds the number of arguments of the function.
If a function is a
.Sy varargs
type function, then the number of arguments is increased by one.
Such functions are encoded as
.Li CTF_TYPE_INFO(CTF_K_FUNCTION, 0, nargs) .
.Lp
For functions that have no type information, nothing else is encoded, and the
next function's entry follows immediately.
For functions with type information, the next
.Sy uint16_t
is encoded with the type identifier of the return type of the function.
It is followed by each of the type identifiers of the arguments, if any exist,
in the order that they appear in the function.
Therefore, argument 0 is the first type identifier and so on.
When a function has a final varargs argument, that is encoded with the type
identifier of zero.
.Lp
Like
.Sx The Object Section ,
the function section is encoded in the order of the symbol table.
It has similar, but slightly different, considerations from objects.
While iterating the symbol table, if any of the following conditions are true,
then the entry is skipped and no corresponding entry is written:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_FUNC
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Ss The Type Section
The type section is the heart of the
.Nm
data.
It encodes all of the information about the types themselves.
The base of the type information comes in two forms, a short form and a long
form, each of which may be followed by a variable number of arguments.
The following definitions describe the short and long forms:
.Bd -literal
#define CTF_MAX_SIZE	0xfffe	/* max size of a type in bytes */
#define CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */
#define CTF_MAX_LSIZE	UINT64_MAX

typedef struct ctf_stype {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* size of entire type in bytes */
		ushort_t _type;	/* reference to another type */
	} _u;
} ctf_stype_t;

typedef struct ctf_type {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* always CTF_LSIZE_SENT */
		ushort_t _type;	/* do not use */
	} _u;
	uint_t ctt_lsizehi;	/* high 32 bits of type size in bytes */
	uint_t ctt_lsizelo;	/* low 32 bits of type size in bytes */
} ctf_type_t;

#define ctt_size _u._size	/* for fundamental types that have a size */
#define ctt_type _u._type	/* for types that reference another type */
.Ed
.Pp
Type sizes are stored in
.Sy bytes .
The basic short form uses a
.Sy ushort_t
to store the number of bytes.
If the number of bytes in a type would exceed 0xfffe, then the alternate
form, the
.Sy ctf_type_t ,
is used instead.
To indicate that the larger form is being used, the member
.Em ctt_size
is set to the value of
.Sy CTF_LSIZE_SENT
(0xffff).
In general, when going through the type section, consumers use the
.Sy ctf_type_t
structure, but pay attention to the value of the member
.Em ctt_size
to determine whether they should increment their scan by the size of the
.Sy ctf_stype_t
or
.Sy ctf_type_t .
Not all kinds of types use
.Em ctt_size .
Those which do not will always use the
.Sy ctf_stype_t
structure.
The individual sections for each kind have more information.
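.Lp
A minimal sketch of that check follows.
It assumes the caller has already decoded
.Em ctt_info
and knows that the kind records a size in
.Em ctt_size ;
kinds that use
.Em ctt_type
instead always use the short form.
The structures are local copies that use
.In stdint.h
types and replace the
.Em _u
union with a plain
.Em ctt_size
member, and the function name
.Fn ctf_type_header
is hypothetical.
It returns the number of bytes occupied by the fixed part of the entry,
which is where any kind-specific data begins.
.Bd -literal
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */

/* Local copies of the two forms (assumed layout, see above). */
typedef struct ctf_stype {
	uint32_t ctt_name;	/* reference to name in string table */
	uint16_t ctt_info;	/* encoded kind, root bit, variable length */
	uint16_t ctt_size;	/* size of entire type in bytes */
} ctf_stype_t;

typedef struct ctf_type {
	uint32_t ctt_name;
	uint16_t ctt_info;
	uint16_t ctt_size;	/* always CTF_LSIZE_SENT in this form */
	uint32_t ctt_lsizehi;	/* high 32 bits of type size in bytes */
	uint32_t ctt_lsizelo;	/* low 32 bits of type size in bytes */
} ctf_type_t;

/*
 * Hypothetical helper: store the size in bytes of the type entry at tp in
 * *sizep and return the size of its fixed header, which is either that of
 * ctf_stype_t or of ctf_type_t.  Only valid for kinds that use ctt_size.
 */
static size_t
ctf_type_header(const void *tp, uint64_t *sizep)
{
	ctf_type_t t;

	/* Every entry begins with at least the short form. */
	(void) memcpy(&t, tp, sizeof (ctf_stype_t));
	if (t.ctt_size != CTF_LSIZE_SENT) {
		*sizep = t.ctt_size;
		return (sizeof (ctf_stype_t));
	}

	/* The sentinel means the long form, with a split 64-bit size. */
	(void) memcpy(&t, tp, sizeof (ctf_type_t));
	*sizep = ((uint64_t)t.ctt_lsizehi << 32) | t.ctt_lsizelo;
	return (sizeof (ctf_type_t));
}
.Ed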
.Lp
Types are written out in order.
Therefore, the first entry encountered has a type identifier of 0x1, or 0x8000
if the file is a child.
The member
.Em ctt_name
is encoded as described in the section
.Sx String Identifiers .
The string that it points to is the name of the type.
If the identifier points to an empty string (one that consists solely of a null
terminator), then the type does not have a name.
This is common with anonymous structures and unions that only have a typedef
to name them, as well as with pointers and qualifiers.
.Lp
The next member,
.Em ctt_info ,
is encoded as described in the section
.Sx Type Encoding .
The type's kind tells us how to interpret the remaining data in the
.Sy ctf_type_t
and any variable length data that may exist.
The rest of this section describes the interpretation of the
various kinds.
.Ss Encoding of Integers
Integers, which are of type
.Sy CTF_K_INTEGER ,
have no variable length arguments.
Instead, they are followed by a four-byte
.Sy uint_t
which describes their encoding.
All integers must be encoded with a variable length of zero.
The
.Em ctt_size
member describes the length of the integer in bytes.
In general, integer sizes will be rounded up to the closest power of two.
.Lp
The integer encoding contains three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The encoding of the integer
.It
The offset in
.Sy bits
of the type
.It
The size in
.Sy bits
of the type
.El
.Pp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define CTF_INT_BITS(data)	(((data) & 0x0000ffff))

#define CTF_INT_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Pp
The following flags are defined for the encoding at this time:
.Bd -literal -offset indent
#define CTF_INT_SIGNED	0x01
#define CTF_INT_CHAR	0x02
#define CTF_INT_BOOL	0x04
#define CTF_INT_VARARGS	0x08
.Ed
.Lp
By default, an integer is considered to be unsigned, unless it has the
.Sy CTF_INT_SIGNED
flag set.
If the flag
.Sy CTF_INT_CHAR
is set, the integer is of a type that stores character
data; for example, the intrinsic C type
.Sy char
would have the
.Sy CTF_INT_CHAR
flag set.
If the flag
.Sy CTF_INT_BOOL
is set, the integer represents a boolean type.
For example, the intrinsic C type
.Sy _Bool
would have the
.Sy CTF_INT_BOOL
flag set.
Finally, the flag
.Sy CTF_INT_VARARGS
indicates that the integer is used as part of a variable number of arguments.
This encoding is rather uncommon.
.Ss Encoding of Floats
Floats, which are of type
.Sy CTF_K_FLOAT ,
are similar to their integer counterparts.
They have no variable length arguments and are followed by a four-byte encoding
which describes the kind of float that exists.
The
.Em ctt_size
member is the size, in bytes, of the float.
The float encoding contains three different pieces of information:
.Lp
.Bl -bullet -offset indent -compact
.It
The specific kind of float that exists
.It
The offset in
.Sy bits
of the float
.It
The size in
.Sy bits
of the float
.El
.Lp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define CTF_FP_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define CTF_FP_BITS(data)	(((data) & 0x0000ffff))

#define CTF_FP_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Lp
Whereas the encoding for integers is a series of flags, the encoding for
floats is a single value that maps to a specific kind of float.
The kinds of floats correspond to both their size and their encoding.
This covers all of the basic C intrinsic floating-point types.
The following are the different kinds of floats represented in the encoding:
.Bd -literal -offset indent
#define CTF_FP_SINGLE	1	/* IEEE 32-bit float encoding */
#define CTF_FP_DOUBLE	2	/* IEEE 64-bit float encoding */
#define CTF_FP_CPLX	3	/* Complex encoding */
#define CTF_FP_DCPLX	4	/* Double complex encoding */
#define CTF_FP_LDCPLX	5	/* Long double complex encoding */
#define CTF_FP_LDOUBLE	6	/* Long double encoding */
#define CTF_FP_INTRVL	7	/* Interval (2x32-bit) encoding */
#define CTF_FP_DINTRVL	8	/* Double interval (2x64-bit) encoding */
#define CTF_FP_LDINTRVL	9	/* Long double interval (2x128-bit) encoding */
#define CTF_FP_IMAGRY	10	/* Imaginary (32-bit) encoding */
#define CTF_FP_DIMAGRY	11	/* Long imaginary (64-bit) encoding */
#define CTF_FP_LDIMAGRY	12	/* Long double imaginary (128-bit) encoding */
.Ed
.Ss Encoding of Arrays
Arrays, which are of type
.Sy CTF_K_ARRAY ,
have no variable length arguments.
They are followed by a structure which describes the number of elements in the
array, the type identifier of the elements in the array, and the type identifier
of the index of the array.
With arrays, the
.Em ctt_size
member is set to zero.
The structure that follows an array is defined as:
.Bd -literal
typedef struct ctf_array {
	ushort_t cta_contents;	/* reference to type of array contents */
	ushort_t cta_index;	/* reference to type of array index */
	uint_t cta_nelems;	/* number of elements */
} ctf_array_t;
.Ed
.Lp
The
.Em cta_contents
and
.Em cta_index
members of the
.Sy ctf_array_t
are type identifiers which are encoded as per the section
.Sx Type Identifiers .
The member
.Em cta_nelems
is a simple four-byte unsigned count of the number of elements.
This count may be zero when encountering C99's flexible array members.
.Ss Encoding of Functions
Function types, which are of type
.Sy CTF_K_FUNCTION ,
use the variable length to record the number of arguments in the function.
When the function's final argument is a varargs, the argument count
is incremented by one to account for the variable argument.
Here, the
.Em ctt_type
member is encoded with the type identifier of the return type of the function.
Note that the
.Em ctt_size
member is not used here.
.Lp
The variable length list contains the type identifiers for the arguments of
the function, if any.
Each one is represented by a
.Sy uint16_t
and encoded according to the
.Sx Type Identifiers
section.
If the function's last argument is of type varargs, then it is also written out,
but the type identifier is zero.
This is included in the count of the function's arguments.
An extra type identifier may follow the argument and return type identifiers
in order to maintain four-byte alignment for the following type definition.
Such a type identifier is not included in the argument count and has a value
of zero.
.Ss Encoding of Structures and Unions
Structures and unions, which are encoded with
.Sy CTF_K_STRUCT
and
.Sy CTF_K_UNION
respectively, are very similar constructs in C.
The main difference between them is that the members of a structure
follow one another, whereas in a union, all members share the same memory.
They are also very similar in terms of their encoding in
.Nm .
The variable length for structures and unions represents the number of
members that they have.
The value of the member
.Em ctt_size
is the size of the structure or union.
There are two different structures which are used to encode members in the
variable list.
When the size of a structure or union is greater than or equal to the large
member threshold of 8192 bytes,
.Sy ctf_lmember_t
is used to encode its members instead of
.Sy ctf_member_t ;
in either case, all members of a given type are encoded using the same
structure.
The structures for members are as follows:
.Bd -literal
typedef struct ctf_member {
	uint_t ctm_name;	/* reference to name in string table */
	ushort_t ctm_type;	/* reference to type of member */
	ushort_t ctm_offset;	/* offset of this member in bits */
} ctf_member_t;

typedef struct ctf_lmember {
	uint_t ctlm_name;	/* reference to name in string table */
	ushort_t ctlm_type;	/* reference to type of member */
	ushort_t ctlm_pad;	/* padding */
	uint_t ctlm_offsethi;	/* high 32 bits of member offset in bits */
	uint_t ctlm_offsetlo;	/* low 32 bits of member offset in bits */
} ctf_lmember_t;
.Ed
.Lp
Both the
.Em ctm_name
and
.Em ctlm_name
members refer to the name of the member.
The name is encoded as an offset into the string table as described by the
section
.Sx String Identifiers .
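.Lp
As a minimal sketch, the number of bytes occupied by a structure or union's
member list follows from the size of the type and its variable length.
The code assumes
.In stdint.h
types in place of
.Sy uint_t
and
.Sy ushort_t ;
the 8192-byte threshold is named
.Sy CTF_LSTRUCT_THRESH
here, and the function name
.Fn ctf_members_len
is hypothetical.
.Bd -literal
#include <stddef.h>
#include <stdint.h>

#define CTF_LSTRUCT_THRESH	8192	/* large member threshold, in bytes */

/* Local copies of the two member forms using <stdint.h> types. */
typedef struct ctf_member {
	uint32_t ctm_name;	/* reference to name in string table */
	uint16_t ctm_type;	/* reference to type of member */
	uint16_t ctm_offset;	/* offset of this member in bits */
} ctf_member_t;

typedef struct ctf_lmember {
	uint32_t ctlm_name;	/* reference to name in string table */
	uint16_t ctlm_type;	/* reference to type of member */
	uint16_t ctlm_pad;	/* padding */
	uint32_t ctlm_offsethi;	/* high 32 bits of member offset in bits */
	uint32_t ctlm_offsetlo;	/* low 32 bits of member offset in bits */
} ctf_lmember_t;

/*
 * Hypothetical helper: size in bytes of the member list for a structure or
 * union that is size bytes large and has vlen members.  Every member uses
 * the same form, chosen by the total size of the type.
 */
static size_t
ctf_members_len(uint64_t size, unsigned int vlen)
{
	size_t each = (size >= CTF_LSTRUCT_THRESH) ?
	    sizeof (ctf_lmember_t) : sizeof (ctf_member_t);

	return ((size_t)vlen * each);
}
.Ed
.Lp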
.Lp
The members
.Em ctm_type
and
.Em ctlm_type
both refer to the type of the member.
They are encoded as per the section
.Sx Type Identifiers .
.Lp
The last piece of information that is present is the offset at which the member
begins in memory.
For unions, this value will always be zero because each member of a union has
an offset of zero.
For structures, this is the offset in
.Sy bits
at which the member begins.
Note that a compiler may lay out a type with padding.
This means that the difference in offset between two consecutive members may be
larger than the size of the earlier member.
When the size of the overall structure is strictly less than 8192 bytes, the
normal structure,
.Sy ctf_member_t ,
is used and the offset in bits is stored in the member
.Em ctm_offset .
However, when the size of the structure is greater than or equal to 8192 bytes,
the offset in bits is split into two 32-bit quantities.
One member,
.Em ctlm_offsethi ,
represents the upper 32 bits of the offset, while the other member,
.Em ctlm_offsetlo ,
represents the lower 32 bits of the offset.
They can be joined to form a 64-bit offset in bits by shifting
.Em ctlm_offsethi
left by 32 bits and then taking the bitwise OR of the result and
.Em ctlm_offsetlo .
.Ss Encoding of Enumerations
Enumerations, noted by the kind
.Sy CTF_K_ENUM ,
are similar to structures.
Enumerations use the variable list to note the number of values that the
enumeration contains, which are termed enumerators.
In C, an enumeration is always equivalent to the intrinsic type
.Sy int ,
so the value of the member
.Em ctt_size
is always the size of an integer, which is determined by the current data model.
For
.Fx
systems, this will always be 4, as an integer is defined to be 4 bytes in
both
.Sy ILP32
and
.Sy LP64 ,
regardless of the architecture.
For further details, see
.Xr arch 7 .
.Lp
The enumerators encoded in an enumeration have the following structure in the
variable list:
.Bd -literal
typedef struct ctf_enum {
	uint_t cte_name;	/* reference to name in string table */
	int cte_value;		/* value associated with this name */
} ctf_enum_t;
.Ed
.Lp
The member
.Em cte_name
refers to the name of the enumerator; it is encoded according to the
rules in the section
.Sx String Identifiers .
The member
.Em cte_value
contains the integer value of this enumerator.
.Ss Encoding of Forward References
Forward references, types of kind
.Sy CTF_K_FORWARD ,
in a
.Nm
file refer to types which may not have a definition at all, only a name.
If the
.Nm
file is a child, the forward reference may be resolved to an
actual type in the parent; otherwise, the definition may be in another
.Nm
container or may not be known at all.
The only member of the
.Sy ctf_type_t
that matters for a forward declaration is
.Em ctt_name ,
which points to the name of the forward reference in the string table as
described earlier.
There is no other information recorded for forward references.
.Ss Encoding of Pointers, Typedefs, Volatile, Const, and Restrict
Pointers, typedefs, volatile, const, and restrict are all similar in
.Nm .
They all refer to another type.
Typedefs provide an alternate name for that type, while volatile, const,
and restrict change how the type is interpreted in the C programming language.
This covers the
.Nm
kinds
.Sy CTF_K_POINTER ,
.Sy CTF_K_TYPEDEF ,
.Sy CTF_K_VOLATILE ,
.Sy CTF_K_RESTRICT ,
and
.Sy CTF_K_CONST .
.Lp
These types have no variable list entries and use the member
.Em ctt_type
to refer to the base type that they modify.
.Ss Encoding of Unknown Types
Types with the kind
.Sy CTF_K_UNKNOWN
are used to indicate gaps in the type identifier space.
These entries consume an identifier, but do not define anything.
Nothing should refer to these gap identifiers.
.Ss Dependencies Between Types
C types can be imagined as a directed graph that may contain cycles.
Structures and unions may refer to each other in a way that creates a cyclic
dependency.
In cases such as these, the entire type section must be read in and processed.
Consumers must not assume that every type can be placed in dependency order,
because cycles make such an ordering impossible in general.
.Ss The String Section
The last section of the
.Nm
file is the
.Sy string
section.
This section encodes all of the strings that appear throughout the other
sections.
It is laid out as a series of null-terminated strings.
Generally, all names are written out in ASCII, as most C compilers do not allow
identifiers to contain characters outside a subset of ASCII.
However, any characters outside of ASCII should be written out as a series of
UTF-8 bytes.
.Lp
The first entry in the section, at offset zero, is a single null
terminator that represents the empty string.
Following that, each C string should be written out, including its null
terminator.
Offsets that refer to something in this section should refer to the first byte
of a string.
Apart from the requirement that the first byte of the section be the null
terminator, the order of strings is unimportant.
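.Lp
The following C fragment is a minimal sketch of how a consumer might map an
offset to a string in this section while checking the layout rules above.
It assumes that the offset already refers to the
.Nm
string section, as described in
.Sx String Identifiers ,
and that the section has been read into memory.
The function
.Sy example_strtab_lookup
is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the string beginning at byte offset off in a
 * string section of len bytes, or NULL if the section does not start with
 * the empty string, off is out of range, or no null terminator is found
 * before the end of the section.
 */
static const char *
example_strtab_lookup(const char *strtab, size_t len, size_t off)
{
	/* Offset zero must hold the empty string. */
	if (len == 0 || strtab[0] != 0)
		return (NULL);
	/* The offset must fall inside the section. */
	if (off >= len)
		return (NULL);
	/* The string must be terminated before the section ends. */
	if (memchr(strtab + off, 0, len - off) == NULL)
		return (NULL);
	return (strtab + off);
}
```

Bounding the search for the terminator by the section length keeps a corrupt
offset from causing a read past the end of the buffer.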
.Ss Data Encoding and ELF Considerations
.Nm
data is generally included in ELF objects, which record the architecture and
endianness of the file.
A
.Nm
container inside such an object must match the endianness of the ELF object.
Aside from byte order, there should be no other differences in the encoding
between architectures.
While many of the types in this document use C integral types whose sizes are
not fixed by the language, those sizes are the same in the
.Sy ILP32
and
.Sy LP64
models.
If
.Nm
data is used with any other model in which those sizes differ, it must not use
that model's sizes for these integral types and must instead use the fixed-size
equivalents from an
.Sy ILP32
environment.
.Lp
When placing a
.Nm
container inside of an ELF object, there are certain conventions that tools
rely on to find the
.Nm
data.
In particular, a given ELF object should only contain a single
.Nm
section.
Multiple containers should be merged together into a single one.
.Lp
The
.Nm
file should be included in its own ELF section.
The section's name must be
.Ql .SUNW_ctf .
The type of the section must be
.Sy SHT_PROGBITS .
The section's link should refer to the symbol table, and its address
alignment must be 4.
.Sh SEE ALSO
.Xr dtrace 1 ,
.Xr elf 3 ,
.Xr gelf 3 ,
.Xr a.out 5 ,
.Xr elf 5 ,
.Xr arch 7