#	@(#)TOUR	8.1 (Berkeley) 5/31/93
# $FreeBSD$

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell.  It is provided anyway
since it provides helpful information for how the shell is structured,
but be warned that things have changed -- the current shell is
still under development.

================================================================

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES:  The subdirectory bltin contains commands which can
be compiled stand-alone.  The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS:  Files whose names begin with "mk" are
programs that generate source code.  A complete list of these
programs is:

        program         input files     generates
        -------         -----------     ---------
        mkbuiltins      builtins        builtins.h builtins.c
        mknodes         nodetypes       nodes.h nodes.c
        mksyntax            -           syntax.h syntax.c
        mktokens            -           token.h

There are undoubtedly too many of these.

EXCEPTIONS:  Code for dealing with exceptions appears in
exceptions.c.
The C language doesn't include exception handling,
so I implement it using setjmp and longjmp.  The global variable
exception contains the type of exception.  EXERROR is raised by
calling error.  EXINT is an interrupt.

INTERRUPTS:  In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop.  (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.)  The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections.  Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery.  INTOFF and INTON can be nested.

MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left.  It also defines a
stack oriented memory allocation scheme.  Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer.  The stack is implemented using a
linked list of blocks.

STPUTC:  If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;               /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);           /* repeated as many times as needed */
        grabstackstr(p);

We now start a top-down look at the code:

MAIN.C:  The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop.  Cmdloop
repeatedly parses and executes commands.

OPTIONS.C:  This file contains the option processing code.  It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin.  The -i and -m
options (the latter turns on job control) require changes in
signal handling.  The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING:  The parser code is all in parser.c.  A recursive
descent parser is used.  Syntax tables (generated by mksyntax) are
used to classify characters during lexical analysis.  There are
four tables:  one for normal use, one for use when inside single
quotes and dollar single quotes, one for use when inside double
quotes and one for use in arithmetic.
The tables are machine
dependent because they are indexed by character variables and
the range of a char varies from machine to machine.

PARSE OUTPUT:  The output of the parser consists of a tree of
nodes.  The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents.  An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance.  It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word.  The text consists of ordinary characters and a number of
special codes defined in parser.h.  The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution.  The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes.  The name of the variable
comes next, terminated by an equals sign.  If the type is not
VSNORMAL, then the text field in the substitution follows,
terminated by a CTLENDVAR byte.

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

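As a rough illustration of the variable substitution encoding
described above, the following sketch decodes one substitution from
the text of a NARG node.  The numeric values given to CTLVAR,
CTLENDVAR, VSNUL, VSQUOTE and the VS types, the VSTYPE mask, struct
varsub and decode_varsub are all invented for this example; the real
definitions are in parser.h.  The sketch also ignores CTLESC and
nested substitutions inside the alternative text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only; the real ones are defined in parser.h. */
#define CTLVAR		0x82
#define CTLENDVAR	0x83
#define VSTYPE		0x07	/* hypothetical mask for the basic type */
#define VSNUL		0x08	/* the ':' forms, e.g. ${var:-text} */
#define VSQUOTE		0x10	/* substitution was inside double quotes */
#define VSNORMAL	1
#define VSMINUS		2
#define VSPLUS		3
#define VSQUESTION	4
#define VSASSIGN	5

/* Hypothetical result record for one decoded substitution. */
struct varsub {
	int type;		/* VSNORMAL, VSMINUS, ... */
	int nul;		/* nonzero if VSNUL was set */
	int quoted;		/* nonzero if VSQUOTE was set */
	const char *name;	/* not NUL terminated; see namelen */
	size_t namelen;
	const char *alt;	/* alternative text, NULL for VSNORMAL */
	size_t altlen;
};

/*
 * Decode "CTLVAR type name '=' [ alternative-text CTLENDVAR ]"
 * starting at p.  Returns a pointer just past the substitution,
 * or NULL if the string is not laid out as expected.
 */
const char *
decode_varsub(const char *p, struct varsub *out)
{
	const char *eq, *end;
	unsigned char flags;

	assert(p != NULL && out != NULL);
	if ((unsigned char)*p++ != CTLVAR)
		return NULL;
	flags = (unsigned char)*p++;
	if (flags == 0)			/* string ended early */
		return NULL;
	out->type = flags & VSTYPE;
	out->nul = (flags & VSNUL) != 0;
	out->quoted = (flags & VSQUOTE) != 0;

	eq = strchr(p, '=');		/* the name ends at '=' */
	if (eq == NULL)
		return NULL;
	out->name = p;
	out->namelen = (size_t)(eq - p);
	p = eq + 1;

	out->alt = NULL;
	out->altlen = 0;
	if (out->type == VSNORMAL)	/* no text field follows */
		return p;

	end = strchr(p, CTLENDVAR);	/* text ends at CTLENDVAR */
	if (end == NULL)
		return NULL;
	out->alt = p;
	out->altlen = (size_t)(end - p);
	return end + 1;
}
```

Given the bytes CTLVAR, VSMINUS|VSNUL|VSQUOTE, "var=txt", CTLENDVAR
(that is, "${var:-txt}" inside double quotes), the sketch reports type
VSMINUS with both flags set, name "var" and alternative text "txt".
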
The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently.  CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right.  In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).  Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase.  Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name phase.

EXECUTION:  Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table.  Called from expand.c.

EVAL.C:  Evaltree recursively executes a parse tree.  The exit
status is returned in the global variable exitstatus.  The
alternative entry evalbackcmd is called to evaluate commands in
back quotes.  It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command and
connects the standard output of the child to a pipe.

JOBS.C:  To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process.  Waitforjob waits for a job
to complete.  These routines take care of process groups if job
control is defined.

REDIR.C:  Ash allows file descriptors to be redirected and then
restored without forking off a child process.  This is
accomplished by duplicating the original file descriptors.  The
redirtab structure records where the file descriptors have been
duplicated to.

EXEC.C:  The routine find_command locates a command, and enters
the command in the hash table if it is not already there.  The
third argument specifies whether it is to print an error message
if the command is not found.  (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process.  But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C:  Arguments are processed in three passes.  The first
(performed by the routine argstr) performs variable and command
substitution.  The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.

VAR.C:  Variables are stored in a hash table.  Probably we should
switch to extensible hashing.  The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.  Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.

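The "name=value" storage scheme can be sketched roughly as follows.
This is a minimal illustration under assumptions, not the code in
var.c:  struct var, VTABSIZE, the hash function and the routines
var_set, var_lookup and build_environment are names made up for the
example, and the exported flag stands in for whatever bookkeeping the
real shell does.  The point it shows is that the environment vector
is just an array of pointers to the stored strings, and that entering
a second "PATH=..." string replaces the first, which is how
duplicates get stripped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39		/* arbitrary table size for the sketch */

struct var {			/* hypothetical layout */
	struct var *next;	/* next entry on the same hash chain */
	int exported;		/* nonzero if passed to commands */
	char *text;		/* "name=value", owned by the caller */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part of "name" or "name=value". */
static unsigned
hashname(const char *p)
{
	unsigned h = 0;

	while (*p != '\0' && *p != '=')
		h = h * 31 + (unsigned char)*p++;
	return h % VTABSIZE;
}

/* Compare two names, each ending at '=' or at the end of the string. */
static int
samename(const char *a, const char *b)
{
	while (*a == *b && *a != '=' && *a != '\0') {
		a++;
		b++;
	}
	return (*a == '=' || *a == '\0') && (*b == '=' || *b == '\0');
}

/* Enter "name=value"; the string itself is stored, not a copy. */
void
var_set(char *text, int exported)
{
	struct var **vpp = &vartab[hashname(text)];
	struct var *vp;

	assert(strchr(text, '=') != NULL);
	for (vp = *vpp; vp != NULL; vp = vp->next) {
		if (samename(vp->text, text)) {
			vp->text = text;	/* replaces the duplicate */
			vp->exported |= exported;
			return;
		}
	}
	vp = malloc(sizeof(*vp));
	if (vp == NULL)
		abort();
	vp->text = text;
	vp->exported = exported;
	vp->next = *vpp;
	*vpp = vp;
}

/* Return a pointer to the value inside the stored string, or NULL. */
const char *
var_lookup(const char *name)
{
	struct var *vp;

	for (vp = vartab[hashname(name)]; vp != NULL; vp = vp->next)
		if (samename(vp->text, name))
			return strchr(vp->text, '=') + 1;
	return NULL;
}

/* Build a NULL terminated envp; only pointers are copied. */
char **
build_environment(void)
{
	struct var *vp;
	char **env, **ep;
	size_t i, n = 0;

	for (i = 0; i < VTABSIZE; i++)
		for (vp = vartab[i]; vp != NULL; vp = vp->next)
			if (vp->exported)
				n++;
	env = malloc((n + 1) * sizeof(*env));
	if (env == NULL)
		abort();
	ep = env;
	for (i = 0; i < VTABSIZE; i++)
		for (vp = vartab[i]; vp != NULL; vp = vp->next)
			if (vp->exported)
				*ep++ = vp->text;
	*ep = NULL;
	return env;
}
```

The caller frees the vector returned by build_environment but not
the strings it points to, since those still belong to the table.
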
BUILTIN COMMANDS:  The procedures for handling these are
scattered throughout the code, depending on which location appears
most appropriate.  They can be recognized because their names
always end in "cmd".  The mapping from names to procedures is
specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program.  A builtin command is allowed to overwrite its
arguments.  Builtin routines can call nextopt to do option
parsing.  This is kind of like getopt, but you don't pass argc and
argv to it.  Builtin routines can also call error.  This routine
normally terminates the shell (or returns to the main command
loop if the shell is interactive), but when called from a builtin
command it causes the builtin command to terminate with an exit
status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons.  The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash.  The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environment.
The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash.  This #define should appear
before bltin.h is included; bltin.h will #undef main if the
program is to be compiled stand-alone.

CD.C:  This file defines the cd and pwd builtins.

SIGNALS:  Trap.c implements the trap command.  The routine
setsignal figures out what action should be taken when a signal is
received and invokes the signal system call to set the signal
action appropriately.  When a signal that a user has set a trap for
is caught, the routine "onsig" sets a flag.  The routine dotrap
is called at appropriate points to actually handle the signal.
When an interrupt is caught and no trap has been set for that
signal, the routine "onint" in error.c is called.

OUTPUT:  Ash uses its own output routines.  There are three
output structures allocated.  "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory.  This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating system.
The variables out1 and out2 normally point to output and errout,
respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT:  The basic input routine is pgetc, which reads from the
current input file.  There is a stack of input files; the current
input file is the top file on this stack.  The code allows the
input to come from a string rather than a file.  (This is for the
-c option and the "." and eval builtin commands.)  The global
variable plinno is saved and restored when files are pushed and
popped from the stack.  The parser routines store the number of
the current line in this variable.

DEBUGGING:  If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace.  Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses.  Example:
"TRACE(("n=%d\n", n))".  The double parentheses are necessary
because the preprocessor can't handle functions with a variable
number of arguments.  Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal.  The tracing
code is in show.c.
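
The double parentheses trick can be sketched as below.  The inner
pair of parentheses turns the whole printf argument list into a
single macro argument, which the macro then pastes after the name of
an ordinary variadic function.  This is only an illustration:  the
function trace, the helper settrace and the stream they write to are
assumptions made for the example (the real code is in show.c and
writes to $HOME/trace), and DEBUG is defined here directly rather
than in shell.h.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1			/* the shell would define this in shell.h */

static FILE *tracefile;		/* hypothetical; NULL means tracing is off */

/* Choose the stream that trace output goes to (made up for the sketch). */
void
settrace(FILE *fp)
{
	tracefile = fp;
}

/* An ordinary variadic function; this is what TRACE expands into. */
void
trace(const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	if (tracefile == NULL)
		return;
	va_start(ap, fmt);
	vfprintf(tracefile, fmt, ap);
	va_end(ap);
	fflush(tracefile);	/* keep the log current if the shell dies */
}

/*
 * TRACE takes exactly one macro argument: a parenthesized printf
 * argument list.  With DEBUG defined, TRACE(("n=%d\n", n)) expands
 * to  trace ("n=%d\n", n);  otherwise the call disappears entirely.
 */
#ifdef DEBUG
#define TRACE(args)	trace args
#else
#define TRACE(args)	((void)0)
#endif
```

A caller would write, for example, settrace(stderr) once and then
TRACE(("n=%d\n", n)) wherever a trace line is wanted.  Compiling
without DEBUG removes the calls, including the evaluation of their
arguments.
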