NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell. It is provided
anyway since it provides helpful information about how the shell is
structured, but be warned that things have changed -- the current
shell is still under development.

================================================================

                        A Tour through Ash

                Copyright 1989 by Kenneth Almquist.


DIRECTORIES: The subdirectory bltin contains commands which can
be compiled stand-alone. The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS: Files whose names begin with "mk" are
programs that generate source code. A complete list of these
programs is:

        program         input files     generates
        -------         -----------     ---------
        mkbuiltins      builtins.def    builtins.h builtins.c
        mknodes         nodetypes       nodes.h nodes.c
        mksyntax        -               syntax.h syntax.c
        mktokens        -               token.h

There are undoubtedly too many of these.

EXCEPTIONS: Code for dealing with exceptions appears in
exceptions.c. The C language doesn't include exception handling,
so I implement it using setjmp and longjmp. The global variable
exception contains the type of exception. EXERROR is raised by
calling error or errorwithstatus. EXINT is an interrupt.

INTERRUPTS: In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop. (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.) The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections. Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery. INTOFF and INTON can be nested.

MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left. It also defines a
stack-oriented memory allocation scheme. Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs, all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer. The stack is implemented using a
linked list of blocks.

STPUTC: If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);

We now start a top-down look at the code:

MAIN.C: The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop. Cmdloop
repeatedly parses and executes commands.

OPTIONS.C: This file contains the option processing code. It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin. The -i and -m
options (the latter turns on job control) require changes in
signal handling. The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING: The parser code is all in parser.c. A recursive descent
parser is used. Syntax tables (generated by mksyntax) are used to
classify characters during lexical analysis. There are four
tables: one for normal use, one for use when inside single quotes
and dollar single quotes, one for use when inside double quotes
and one for use in arithmetic.
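A syntax table can be pictured with the minimal sketch below. It is illustrative only: the class names (CWORD, CNL and so on), the PEOF value and the helpers init_basesyntax and syntax are hypothetical, only the normal (unquoted) table is shown, and the table is filled at run time rather than emitted as C source by mksyntax. It assumes the input routine returns either PEOF or a character value in the range 0 to UCHAR_MAX, in the style of getc, so that slot 0 can be reserved for end of input.

```c
#include <limits.h>

/* Hypothetical character classes for the normal (unquoted) context. */
enum {
    CWORD,    /* ordinary word character */
    CNL,      /* newline */
    CBACK,    /* backslash */
    CSQUOTE,  /* single quote */
    CDQUOTE,  /* double quote */
    CBQUOTE,  /* back quote */
    CVAR,     /* '$' starts an expansion */
    CSPCL,    /* blank or operator character that ends a word */
    CEOF      /* end of input */
};

#define PEOF (-1)   /* hypothetical end-of-input value */

/* Slot 0 holds PEOF; character value v (0..UCHAR_MAX) lives in slot v + 1. */
static unsigned char basesyntax[UCHAR_MAX + 2];

static void init_basesyntax(void)
{
    const char *spcl = " \t&;|<>()";

    for (int i = 0; i < UCHAR_MAX + 2; i++)
        basesyntax[i] = CWORD;
    basesyntax[0] = CEOF;
    basesyntax['\n' + 1] = CNL;
    basesyntax['\\' + 1] = CBACK;
    basesyntax['\'' + 1] = CSQUOTE;
    basesyntax['"' + 1] = CDQUOTE;
    basesyntax['`' + 1] = CBQUOTE;
    basesyntax['$' + 1] = CVAR;
    for (const char *p = spcl; *p != '\0'; p++)
        basesyntax[(unsigned char)*p + 1] = CSPCL;
}

/* Classify c, which must be PEOF or a value in 0..UCHAR_MAX. */
static int syntax(int c)
{
    return basesyntax[c + 1];
}
```

A caller holding a character in a plain char would convert it with (unsigned char) before the lookup, so that the index does not depend on whether char is signed.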
The tables are machine dependent because they are indexed by
character variables, and the range of a char varies from machine
to machine.

PARSE OUTPUT: The output of the parser consists of a tree of
nodes. The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents. An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance. It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word. The text consists of ordinary characters and a number of
special codes defined in parser.h. The special codes are:

        CTLVAR              Parameter expansion
        CTLENDVAR           End of parameter expansion
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLARI              Arithmetic expansion
        CTLENDARI           End of arithmetic expansion
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution. The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}
        VSTRIMLEFT          ${var#text}
        VSTRIMLEFTMAX       ${var##text}
        VSTRIMRIGHT         ${var%text}
        VSTRIMRIGHTMAX      ${var%%text}
        VSLENGTH            ${#var}
        VSERROR             delayed error

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes, and the VSLINENO flag set
if LINENO is being expanded (the parameter name is then the
decimal line number). The parameter's name comes next, terminated
by an equals sign.
If the type is not VSNORMAL (including when it is VSLENGTH), the
alternative text follows, terminated by a CTLENDVAR byte.

The type VSERROR is used to allow parsing bad substitutions like
${var[7]} and to generate an error when they are expanded.

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

Arithmetic expansion starts with CTLARI and ends with CTLENDARI.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently. CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right. In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).
Other here documents, and words which are not subject to file
name generation, have the CTLESC characters removed during the
variable and command substitution phase. Words which are subject
to file name generation have the CTLESC characters removed as
part of the file name phase.

EXECUTION: Command execution is handled by the following files:
        eval.c      The top level routines.
        redir.c     Code to handle redirection of input and output.
        jobs.c      Code to handle forking, waiting, and job control.
        exec.c      Code to do path searches and the actual exec sys call.
        expand.c    Code to evaluate arguments.
        var.c       Maintains the variable symbol table. Called from expand.c.

EVAL.C: Evaltree recursively executes a parse tree. The exit
status is returned in the global variable exitstatus. The
alternative entry evalbackcmd is called to evaluate commands in
back quotes. It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command
and connects the standard output of the child to a pipe.

JOBS.C: To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process. Waitforjob waits for a job
to complete.
These routines take care of process groups if job control is
defined.

REDIR.C: Ash allows file descriptors to be redirected and then
restored without forking off a child process. This is
accomplished by duplicating the original file descriptors. The
redirtab structure records where the file descriptors have been
duplicated to.

EXEC.C: The routine find_command locates a command, and enters
the command in the hash table if it is not already there. The
third argument specifies whether it is to print an error message
if the command is not found. (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process. But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C: As the routine argstr generates words by parameter
expansion, command substitution and arithmetic expansion, it
performs word splitting on the result. As each word is output,
the routine expandmeta performs file name generation (if enabled).
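Word splitting can be illustrated with a simplified splitter. This sketch is not argstr: it assumes every IFS character is whitespace (so runs of separators collapse and leading or trailing separators produce no empty words), and it ignores the fact that the real code only splits the parts of a word that came from unquoted expansions. The names ifs_split, word_fn, wordlist and collect are hypothetical; the callback marks the point where each finished word would be handed on, for example to file name generation.

```c
#include <stddef.h>
#include <string.h>

/* Called once per word; the word is not NUL-terminated. */
typedef void (*word_fn)(const char *word, size_t len, void *arg);

/*
 * Split s into words separated by characters from ifs, treating every
 * separator as IFS whitespace: runs collapse, and leading or trailing
 * separators yield no empty words.
 */
static void ifs_split(const char *s, const char *ifs, word_fn out, void *arg)
{
    const char *p = s;

    for (;;) {
        p += strspn(p, ifs);            /* skip separators */
        if (*p == '\0')
            break;
        size_t len = strcspn(p, ifs);   /* length of the next word */
        out(p, len, arg);               /* e.g. pass on to globbing */
        p += len;
    }
}

/* Example callback: copy up to 8 short words into a fixed table. */
struct wordlist {
    char word[8][32];
    int count;
};

static void collect(const char *w, size_t len, void *arg)
{
    struct wordlist *wl = arg;

    if (wl->count < 8 && len < sizeof wl->word[0]) {
        memcpy(wl->word[wl->count], w, len);
        wl->word[wl->count][len] = '\0';
        wl->count++;
    }
}
```

For instance, ifs_split("  ls  -l\t/tmp\n", " \t\n", collect, &wl) leaves the three words "ls", "-l" and "/tmp" in wl (with wl.count starting at 0).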
VAR.C: Variables are stored in a hash table. Probably we should
switch to extensible hashing. The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command. Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.

BUILTIN COMMANDS: The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate. They can be recognized because their
names always end in "cmd". The mapping from names to procedures
is specified in the file builtins.def, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program. A builtin command is allowed to overwrite its
arguments. Builtin routines can call nextopt to do option
parsing. This is kind of like getopt, but you don't pass argc
and argv to it. Builtin routines can also call error.
This routine normally terminates the shell (or returns to the
main command loop if the shell is interactive), but when called
from a non-special builtin command it causes the builtin command
to terminate with an exit status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons. The header file bltin.h takes care of most of the
differences between the ash environment and the stand-alone
environment. The user should call the main routine "main", and
#define main to be the name of the routine to use when the program
is linked into ash. This #define should appear before bltin.h is
included; bltin.h will #undef main if the program is to be
compiled stand-alone. A similar approach is used for a few
utilities from bin and usr.bin.

CD.C: This file defines the cd and pwd builtins.

SIGNALS: Trap.c implements the trap command. The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately. When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag. The routine
dotrap is called at appropriate points to actually handle the
signal.
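The split between a flag-setting handler and a later dotrap call can be sketched as follows. The sketch is modeled on the description above, not on the shell's code: the array gotsig, the flag pendingsigs, the bound MAXSIG and the helper runtrap are hypothetical, and in the shell running a trap means evaluating the command string the user gave to trap. The handler only records the signal because very little is safe to do inside a signal handler; the real work happens when dotrap is called from ordinary code.

```c
#include <signal.h>
#include <stdio.h>

#define MAXSIG 65   /* hypothetical bound: signal numbers 1..64 */

static volatile sig_atomic_t gotsig[MAXSIG];  /* per-signal "caught" flags */
static volatile sig_atomic_t pendingsigs;     /* set when any flag is set */

/* Signal handler: just remember that the signal arrived. */
static void onsig(int signo)
{
    if (signo > 0 && signo < MAXSIG) {
        gotsig[signo] = 1;
        pendingsigs = 1;
    }
}

/* Stand-in for evaluating the user's trap command for signo. */
static void runtrap(int signo)
{
    printf("running trap for signal %d\n", signo);
}

/*
 * Called at safe points (for example between commands) to act on any
 * signals recorded by onsig.  Returns the number of traps run.
 */
static int dotrap(void)
{
    int ran = 0;

    if (!pendingsigs)
        return 0;
    pendingsigs = 0;
    for (int i = 1; i < MAXSIG; i++) {
        if (gotsig[i]) {
            gotsig[i] = 0;
            runtrap(i);
            ran++;
        }
    }
    return ran;
}
```

A trap on TERM would then be installed with signal(SIGTERM, onsig); a signal that arrives while dotrap is scanning sets its flag and pendingsigs again, so it is picked up on the next call.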
When an interrupt is caught and no trap has been set for that
signal, the routine "onint" in error.c is called.

OUTPUT: Ash uses its own output routines. There are three output
structures allocated. The structure "output" represents the
standard output, "errout" the standard error, and "memout"
contains output which is to be stored in memory. This last is
used when a builtin command appears in backquotes, to allow its
output to be collected without doing any I/O through the UNIX
operating system. The variables out1 and out2 normally point to
output and errout, respectively, but they are set to point to
memout when appropriate inside backquotes.

INPUT: The basic input routine is pgetc, which reads from the
current input file. There is a stack of input files; the current
input file is the top file on this stack. The code allows the
input to come from a string rather than a file. (This is for the
-c option and the "." and eval builtin commands.) The global
variable plinno is saved and restored when files are pushed and
popped from the stack. The parser routines store the number of
the current line in this variable.

DEBUGGING: If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace. Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses. Example:
        TRACE(("n=%d\n", n));
The double parentheses are necessary because the C preprocessor
(before C99) cannot handle macros with a variable number of
arguments. Defining DEBUG also causes the shell to generate a
core dump if it is sent a quit signal. The tracing code is in
show.c.
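The double-parentheses trick can be shown with a small stand-alone sketch. The function trace and the in-memory buffer tracebuf are hypothetical (the shell writes to $HOME/trace, and its real tracing code lives in show.c); the point is only how TRACE(("n=%d\n", n)) passes a whole parenthesized argument list through a one-parameter macro.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1   /* in the shell this would come from shell.h */

#ifdef DEBUG
static char tracebuf[256];   /* stand-in for the trace file */
static size_t tracelen;      /* bytes used, always < sizeof tracebuf */

/* An ordinary variadic function does the formatting. */
static void trace(const char *fmt, ...)
{
    size_t room = sizeof tracebuf - tracelen;   /* always at least 1 */
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(tracebuf + tracelen, room, fmt, ap);
    va_end(ap);
    if (n > 0)   /* on truncation, keep only what actually fit */
        tracelen += (size_t)n < room ? (size_t)n : room - 1;
}

/* param is the whole "(fmt, args...)" list, so TRACE(("n=%d\n", n))
   expands to trace ("n=%d\n", n). */
#define TRACE(param) trace param
#else
/* Without DEBUG the call, including its arguments, disappears. */
#define TRACE(param)
#endif
```

With C99 variadic macros one could instead write #define TRACE(...) trace(__VA_ARGS__), but the double-parentheses form works with any C preprocessor.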