1The ELF, COFF and Wasm Linkers 2============================== 3 4The ELF Linker as a Library 5--------------------------- 6 7You can embed LLD to your program by linking against it and calling the linker's 8entry point function lld::elf::link. 9 10The current policy is that it is your reponsibility to give trustworthy object 11files. The function is guaranteed to return as long as you do not pass corrupted 12or malicious object files. A corrupted file could cause a fatal error or SEGV. 13That being said, you don't need to worry too much about it if you create object 14files in the usual way and give them to the linker. It is naturally expected to 15work, or otherwise it's a linker's bug. 16 17Design 18====== 19 20We will describe the design of the linkers in the rest of the document. 21 22Key Concepts 23------------ 24 25Linkers are fairly large pieces of software. 26There are many design choices you have to make to create a complete linker. 27 28This is a list of design choices we've made for ELF and COFF LLD. 29We believe that these high-level design choices achieved a right balance 30between speed, simplicity and extensibility. 31 32* Implement as native linkers 33 34 We implemented the linkers as native linkers for each file format. 35 36 The linkers share the same design but share very little code. 37 Sharing code makes sense if the benefit is worth its cost. 38 In our case, the object formats are different enough that we thought the layer 39 to abstract the differences wouldn't be worth its complexity and run-time 40 cost. Elimination of the abstract layer has greatly simplified the 41 implementation. 42 43* Speed by design 44 45 One of the most important things in archiving high performance is to 46 do less rather than do it efficiently. 47 Therefore, the high-level design matters more than local optimizations. 48 Since we are trying to create a high-performance linker, 49 it is very important to keep the design as efficient as possible. 50 51 Broadly speaking, we do not do anything until we have to do it. 52 For example, we do not read section contents or relocations 53 until we need them to continue linking. 54 When we need to do some costly operation (such as looking up 55 a hash table for each symbol), we do it only once. 56 We obtain a handle (which is typically just a pointer to actual data) 57 on the first operation and use it throughout the process. 58 59* Efficient archive file handling 60 61 LLD's handling of archive files (the files with ".a" file extension) is 62 different from the traditional Unix linkers and similar to Windows linkers. 63 We'll describe how the traditional Unix linker handles archive files, what the 64 problem is, and how LLD approached the problem. 65 66 The traditional Unix linker maintains a set of undefined symbols during 67 linking. The linker visits each file in the order as they appeared in the 68 command line until the set becomes empty. What the linker would do depends on 69 file type. 70 71 - If the linker visits an object file, the linker links object files to the 72 result, and undefined symbols in the object file are added to the set. 73 74 - If the linker visits an archive file, it checks for the archive file's 75 symbol table and extracts all object files that have definitions for any 76 symbols in the set. 77 78 This algorithm sometimes leads to a counter-intuitive behavior. If you give 79 archive files before object files, nothing will happen because when the linker 80 visits archives, there is no undefined symbols in the set. As a result, no 81 files are extracted from the first archive file, and the link is done at that 82 point because the set is empty after it visits one file. 83 84 You can fix the problem by reordering the files, 85 but that cannot fix the issue of mutually-dependent archive files. 86 87 Linking mutually-dependent archive files is tricky. You may specify the same 88 archive file multiple times to let the linker visit it more than once. Or, 89 you may use the special command line options, `--start-group` and 90 `--end-group`, to let the linker loop over the files between the options until 91 no new symbols are added to the set. 92 93 Visiting the same archive files multiple times makes the linker slower. 94 95 Here is how LLD approaches the problem. Instead of memorizing only undefined 96 symbols, we program LLD so that it memorizes all symbols. When it sees an 97 undefined symbol that can be resolved by extracting an object file from an 98 archive file it previously visited, it immediately extracts the file and links 99 it. It is doable because LLD does not forget symbols it has seen in archive 100 files. 101 102 We believe that LLD's way is efficient and easy to justify. 103 104 The semantics of LLD's archive handling are different from the traditional 105 Unix's. You can observe it if you carefully craft archive files to exploit 106 it. However, in reality, we don't know any program that cannot link with our 107 algorithm so far, so it's not going to cause trouble. 108 109Numbers You Want to Know 110------------------------ 111 112To give you intuition about what kinds of data the linker is mainly working on, 113I'll give you the list of objects and their numbers LLD has to read and process 114in order to link a very large executable. In order to link Chrome with debug 115info, which is roughly 2 GB in output size, LLD reads 116 117- 17,000 files, 118- 1,800,000 sections, 119- 6,300,000 symbols, and 120- 13,000,000 relocations. 121 122LLD produces the 2 GB executable in 15 seconds. 123 124These numbers vary depending on your program, but in general, 125you have a lot of relocations and symbols for each file. 126If your program is written in C++, symbol names are likely to be 127pretty long because of name mangling. 128 129It is important to not waste time on relocations and symbols. 130 131In the above case, the total amount of symbol strings is 450 MB, 132and inserting all of them to a hash table takes 1.5 seconds. 133Therefore, if you causally add a hash table lookup for each symbol, 134it would slow down the linker by 10%. So, don't do that. 135 136On the other hand, you don't have to pursue efficiency 137when handling files. 138 139Important Data Structures 140------------------------- 141 142We will describe the key data structures in LLD in this section. The linker can 143be understood as the interactions between them. Once you understand their 144functions, the code of the linker should look obvious to you. 145 146* Symbol 147 148 This class represents a symbol. 149 They are created for symbols in object files or archive files. 150 The linker creates linker-defined symbols as well. 151 152 There are basically three types of Symbols: Defined, Undefined, or Lazy. 153 154 - Defined symbols are for all symbols that are considered as "resolved", 155 including real defined symbols, COMDAT symbols, common symbols, 156 absolute symbols, linker-created symbols, etc. 157 - Undefined symbols represent undefined symbols, which need to be replaced by 158 Defined symbols by the resolver until the link is complete. 159 - Lazy symbols represent symbols we found in archive file headers 160 which can turn into Defined if we read archive members. 161 162 There's only one Symbol instance for each unique symbol name. This uniqueness 163 is guaranteed by the symbol table. As the resolver reads symbols from input 164 files, it replaces an existing Symbol with the "best" Symbol for its symbol 165 name using the placement new. 166 167 The above mechanism allows you to use pointers to Symbols as a very cheap way 168 to access name resolution results. Assume for example that you have a pointer 169 to an undefined symbol before name resolution. If the symbol is resolved to a 170 defined symbol by the resolver, the pointer will "automatically" point to the 171 defined symbol, because the undefined symbol the pointer pointed to will have 172 been replaced by the defined symbol in-place. 173 174* SymbolTable 175 176 SymbolTable is basically a hash table from strings to Symbols 177 with logic to resolve symbol conflicts. It resolves conflicts by symbol type. 178 179 - If we add Defined and Undefined symbols, the symbol table will keep the 180 former. 181 - If we add Defined and Lazy symbols, it will keep the former. 182 - If we add Lazy and Undefined, it will keep the former, 183 but it will also trigger the Lazy symbol to load the archive member 184 to actually resolve the symbol. 185 186* Chunk (COFF specific) 187 188 Chunk represents a chunk of data that will occupy space in an output. 189 Each regular section becomes a chunk. 190 Chunks created for common or BSS symbols are not backed by sections. 191 The linker may create chunks to append additional data to an output as well. 192 193 Chunks know about their size, how to copy their data to mmap'ed outputs, 194 and how to apply relocations to them. 195 Specifically, section-based chunks know how to read relocation tables 196 and how to apply them. 197 198* InputSection (ELF specific) 199 200 Since we have less synthesized data for ELF, we don't abstract slices of 201 input files as Chunks for ELF. Instead, we directly use the input section 202 as an internal data type. 203 204 InputSection knows about their size and how to copy themselves to 205 mmap'ed outputs, just like COFF Chunks. 206 207* OutputSection 208 209 OutputSection is a container of InputSections (ELF) or Chunks (COFF). 210 An InputSection or Chunk belongs to at most one OutputSection. 211 212There are mainly three actors in this linker. 213 214* InputFile 215 216 InputFile is a superclass of file readers. 217 We have a different subclass for each input file type, 218 such as regular object file, archive file, etc. 219 They are responsible for creating and owning Symbols and InputSections/Chunks. 220 221* Writer 222 223 The writer is responsible for writing file headers and InputSections/Chunks to 224 a file. It creates OutputSections, put all InputSections/Chunks into them, 225 assign unique, non-overlapping addresses and file offsets to them, and then 226 write them down to a file. 227 228* Driver 229 230 The linking process is driven by the driver. The driver: 231 232 - processes command line options, 233 - creates a symbol table, 234 - creates an InputFile for each input file and puts all symbols within into 235 the symbol table, 236 - checks if there's no remaining undefined symbols, 237 - creates a writer, 238 - and passes the symbol table to the writer to write the result to a file. 239 240Link-Time Optimization 241---------------------- 242 243LTO is implemented by handling LLVM bitcode files as object files. 244The linker resolves symbols in bitcode files normally. If all symbols 245are successfully resolved, it then runs LLVM passes 246with all bitcode files to convert them to one big regular ELF/COFF file. 247Finally, the linker replaces bitcode symbols with ELF/COFF symbols, 248so that they are linked as if they were in the native format from the beginning. 249 250The details are described in this document. 251http://llvm.org/docs/LinkTimeOptimization.html 252 253Glossary 254-------- 255 256* RVA (COFF) 257 258 Short for Relative Virtual Address. 259 260 Windows executables or DLLs are not position-independent; they are 261 linked against a fixed address called an image base. RVAs are 262 offsets from an image base. 263 264 Default image bases are 0x140000000 for executables and 0x18000000 265 for DLLs. For example, when we are creating an executable, we assume 266 that the executable will be loaded at address 0x140000000 by the 267 loader, so we apply relocations accordingly. Result texts and data 268 will contain raw absolute addresses. 269 270* VA 271 272 Short for Virtual Address. For COFF, it is equivalent to RVA + image base. 273 274* Base relocations (COFF) 275 276 Relocation information for the loader. If the loader decides to map 277 an executable or a DLL to a different address than their image 278 bases, it fixes up binaries using information contained in the base 279 relocation table. A base relocation table consists of a list of 280 locations containing addresses. The loader adds a difference between 281 RVA and actual load address to all locations listed there. 282 283 Note that this run-time relocation mechanism is much simpler than ELF. 284 There's no PLT or GOT. Images are relocated as a whole just 285 by shifting entire images in memory by some offsets. Although doing 286 this breaks text sharing, I think this mechanism is not actually bad 287 on today's computers. 288 289* ICF 290 291 Short for Identical COMDAT Folding (COFF) or Identical Code Folding (ELF). 292 293 ICF is an optimization to reduce output size by merging read-only sections 294 by not only their names but by their contents. If two read-only sections 295 happen to have the same metadata, actual contents and relocations, 296 they are merged by ICF. It is known as an effective technique, 297 and it usually reduces C++ program's size by a few percent or more. 298 299 Note that this is not an entirely sound optimization. C/C++ require 300 different functions have different addresses. If a program depends on 301 that property, it would fail at runtime. 302 303 On Windows, that's not really an issue because MSVC link.exe enabled 304 the optimization by default. As long as your program works 305 with the linker's default settings, your program should be safe with ICF. 306 307 On Unix, your program is generally not guaranteed to be safe with ICF, 308 although large programs happen to work correctly. 309 LLD works fine with ICF for example. 310 311Other Info 312---------- 313 314.. toctree:: 315 :maxdepth: 1 316 317 missingkeyfunction 318