# @(#)TOUR 8.1 (Berkeley) 5/31/93
# $FreeBSD$

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell. It is provided anyway
since it provides helpful information for how the shell is structured,
but be warned that things have changed -- the current shell is
still under development.

================================================================

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES: The subdirectory bltin contains commands which can
be compiled stand-alone. The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS: Files whose names begin with "mk" are
programs that generate source code. A complete list of these
programs is:

        program         input files     generates
        -------         -----------     ---------
        mkbuiltins      builtins        builtins.h builtins.c
        mknodes         nodetypes       nodes.h nodes.c
        mksyntax            -           syntax.h syntax.c
        mktokens            -           token.h

There are undoubtedly too many of these.

EXCEPTIONS: Code for dealing with exceptions appears in
exceptions.c. The C language doesn't include exception handling,
so I implement it using setjmp and longjmp. The global variable
exception contains the type of exception. EXERROR is raised by
calling error. EXINT is an interrupt.

INTERRUPTS: In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop. (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.) The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections. Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery. INTOFF and INTON can be nested.

MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left. It also defines a
stack oriented memory allocation scheme.
Allocating off a stack is probably more efficient than allocation
using malloc, but the big advantage is that when an exception
occurs all we have to do to free up the memory in use at the time
of the exception is to restore the stack pointer. The stack is
implemented using a linked list of blocks.

STPUTC: If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;               /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);           /* repeated as many times as needed */
        grabstackstr(p);

We now start a top-down look at the code:

MAIN.C: The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop. Cmdloop
repeatedly parses and executes commands.

OPTIONS.C: This file contains the option processing code. It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin. The -i and -m
options (the latter turns on job control) require changes in
signal handling. The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING: The parser code is all in parser.c. A recursive descent
parser is used. Syntax tables (generated by mksyntax) are used to
classify characters during lexical analysis. There are four
tables: one for normal use, one for use when inside single quotes
and dollar single quotes, one for use when inside double quotes
and one for use in arithmetic. The tables are machine dependent
because they are indexed by character variables and the range of
a char varies from machine to machine.

PARSE OUTPUT: The output of the parser consists of a tree of
nodes.
The various types of nodes are defined in the file nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents. An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance. It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word. The text consists of ordinary characters and a number of
special codes defined in parser.h. The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution. The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes. The name of the variable
comes next, terminated by an equals sign. If the type is not
VSNORMAL, then the text field in the substitution follows,
terminated by a CTLENDVAR byte.
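
As an illustration of the encoding just described, here is a minimal
sketch that decodes one variable substitution and prints it back in
shell syntax. It is not taken from the ash sources. The routine name
show_subst is hypothetical, and the numeric values given to CTLVAR,
CTLENDVAR, VSTYPE, VSNUL, VSQUOTE and the VS* types are placeholders
assumed for the example; the real definitions are in parser.h and may
differ. The sketch also ignores nested substitutions and CTLESC bytes
inside the alternative text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Placeholder values assumed for this sketch; see parser.h for the real ones. */
#define CTLVAR      0x82
#define CTLENDVAR   0x83

#define VSTYPE      0x0f    /* mask selecting the substitution type */
#define VSNUL       0x10    /* set for the ':' forms */
#define VSQUOTE     0x80    /* substitution appeared inside double quotes */

#define VSNORMAL    1
#define VSMINUS     2
#define VSPLUS      3
#define VSQUESTION  4
#define VSASSIGN    5

/*
 * Hypothetical helper: decode the substitution starting at p and write
 * it to out in shell syntax.  *quoted is set from the VSQUOTE flag.
 * Returns a pointer just past the substitution, or NULL if the bytes
 * are malformed or out is too small.
 */
static const unsigned char *
show_subst(const unsigned char *p, char *out, size_t outlen, int *quoted)
{
	static const char ops[] = { 0, 0, '-', '+', '?', '=' };
	const unsigned char *name, *eq, *text, *end;
	int type, subtype, n;

	assert(out != NULL && outlen > 0);
	if (p[0] != CTLVAR)
		return NULL;
	type = p[1];
	subtype = type & VSTYPE;
	if (subtype < VSNORMAL || subtype > VSASSIGN)
		return NULL;
	if (quoted != NULL)
		*quoted = (type & VSQUOTE) != 0;

	/* The name runs up to the terminating equals sign. */
	name = p + 2;
	for (eq = name; *eq != '\0' && *eq != '='; eq++)
		continue;
	if (*eq != '=')
		return NULL;

	if (subtype == VSNORMAL) {
		/* No alternative text and no CTLENDVAR for plain $var. */
		n = snprintf(out, outlen, "$%.*s",
		    (int)(eq - name), (const char *)name);
		end = eq + 1;
	} else {
		/* The alternative text runs up to the CTLENDVAR byte. */
		text = eq + 1;
		for (end = text; *end != '\0' && *end != CTLENDVAR; end++)
			continue;
		if (*end != CTLENDVAR)
			return NULL;
		n = snprintf(out, outlen, "${%.*s%s%c%.*s}",
		    (int)(eq - name), (const char *)name,
		    (type & VSNUL) ? ":" : "", ops[subtype],
		    (int)(end - text), (const char *)text);
		end++;
	}
	return (n < 0 || (size_t)n >= outlen) ? NULL : end;
}
```

Given the bytes CTLVAR, VSMINUS|VSNUL, 'x', '=', 'd', 'e', 'f',
CTLENDVAR, the routine writes ${x:-def} and returns a pointer to the
byte following CTLENDVAR.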

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ+CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently. CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right. In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing). Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase. Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name phase.

EXECUTION: Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table. Called from expand.c.

EVAL.C: Evaltree recursively executes a parse tree. The exit
status is returned in the global variable exitstatus. The
alternative entry evalbackcmd is called to evaluate commands in
back quotes.
It saves the result in memory if the command is a builtin;
otherwise it forks off a child to execute the command and
connects the standard output of the child to a pipe.

JOBS.C: To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process. Waitforjob waits for a job
to complete. These routines take care of process groups if job
control is defined.

REDIR.C: Ash allows file descriptors to be redirected and then
restored without forking off a child process. This is
accomplished by duplicating the original file descriptors. The
redirtab structure records where the file descriptors have been
duplicated to.

EXEC.C: The routine find_command locates a command, and enters
the command in the hash table if it is not already there. The
third argument specifies whether it is to print an error message
if the command is not found. (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process. But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C: Arguments are processed in three passes. The first
(performed by the routine argstr) performs variable and command
substitution. The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.

VAR.C: Variables are stored in a hash table. Probably we should
switch to extensible hashing. The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.
Variables which the shell references internally are preallocated
so that the shell can reference the values of these variables
without doing a lookup.

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.

BUILTIN COMMANDS: The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate. They can be recognized because their
names always end in "cmd". The mapping from names to procedures
is specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program. A builtin command is allowed to overwrite its
arguments. Builtin routines can call nextopt to do option
parsing. This is kind of like getopt, but you don't pass argc and
argv to it. Builtin routines can also call error. This routine
normally terminates the shell (or returns to the main command
loop if the shell is interactive), but when called from a builtin
command it causes the builtin command to terminate with an exit
status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons. The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash. The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environment. The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash.
This #define should appear before bltin.h is included; bltin.h
will #undef main if the program is to be compiled stand-alone.

CD.C: This file defines the cd and pwd builtins.

SIGNALS: Trap.c implements the trap command. The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately. When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag. The routine
dotrap is called at appropriate points to actually handle the
signal. When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.

OUTPUT: Ash uses its own output routines. There are three output
structures allocated. "Output" represents the standard output,
"errout" the standard error, and "memout" contains output which
is to be stored in memory. This last is used when a builtin
command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating
system. The variables out1 and out2 normally point to output and
errout, respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT: The basic input routine is pgetc, which reads from the
current input file. There is a stack of input files; the current
input file is the top file on this stack. The code allows the
input to come from a string rather than a file. (This is for the
-c option and the "." and eval builtin commands.) The global
variable plinno is saved and restored when files are pushed and
popped from the stack. The parser routines store the number of
the current line in this variable.

DEBUGGING: If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace.
Most of this is done using the TRACE macro, which takes a set of
printf arguments inside two sets of parentheses. Example:
"TRACE(("n=%d\n", n))". The double parentheses are necessary
because the preprocessor can't handle functions with a variable
number of arguments. Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal. The tracing
code is in show.c.
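
The double-parentheses idiom can be sketched as follows. This is an
illustration under stated assumptions rather than the code in show.c:
the variable tracefile and the body of trace are hypothetical, and
DEBUG is defined locally only so that the example produces output.
The macro takes a single argument, the whole parenthesized printf
argument list, and pastes it after the function name, so no variadic
macro support is needed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1		/* assumption: enabled here so the example traces */

/* Hypothetical destination for trace output; NULL disables tracing. */
static FILE *tracefile;

/* A printf-like routine; the variable arguments are handled in C, not in cpp. */
static void
trace(const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	if (tracefile == NULL)
		return;
	va_start(ap, fmt);
	vfprintf(tracefile, fmt, ap);
	va_end(ap);
	fflush(tracefile);
}

#ifdef DEBUG
/* TRACE(("n=%d\n", n)) expands to: trace ("n=%d\n", n) */
#define TRACE(param)	trace param
#else
/* Without DEBUG the call and its arguments vanish entirely. */
#define TRACE(param)
#endif
```

With tracefile pointing at an open stream, TRACE(("n=%d\n", n))
writes a line such as n=42 to it. Compiled without DEBUG, the same
line expands to nothing, so the arguments are never evaluated.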