NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell.  It is included
anyway because it gives helpful information about how the shell is
structured, but be warned that things have changed: the current
shell is still under development.

================================================================

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES:  The subdirectory bltin contains commands which can
be compiled stand-alone.  The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS:  Files whose names begin with "mk" are
programs that generate source code.  A complete list of these
programs is:

        program         input files         generates
        -------         -----------         ---------
        mkbuiltins      builtins.def        builtins.h builtins.c
        mknodes         nodetypes           nodes.h nodes.c
        mksyntax            -               syntax.h syntax.c
        mktokens            -               token.h

There are undoubtedly too many of these.

EXCEPTIONS:  Code for dealing with exceptions appears in error.c.
The C language doesn't include exception handling, so I implement
it using setjmp and longjmp.  The global variable exception
contains the type of exception.  EXERROR is raised by calling
error or errorwithstatus.  EXINT is an interrupt.
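
A minimal sketch of the setjmp/longjmp technique described above.
The names struct jmploc, handler, exraise and sh_error, and the
numeric values of EXINT and EXERROR, are assumptions made for
illustration; this is not a copy of the shell's code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical exception codes modeled on the names above. */
#define EXINT   0   /* interrupt */
#define EXERROR 1   /* error() or errorwithstatus() was called */

struct jmploc {
    jmp_buf loc;
};

static struct jmploc *handler;  /* innermost active handler, or NULL */
static int exception;           /* type of the exception being raised */

/* Record the exception type and jump to the innermost handler. */
static void
exraise(int e)
{
    assert(handler != NULL);
    exception = e;
    longjmp(handler->loc, 1);
}

/* Stand-in for the shell's error(): print a message, raise EXERROR. */
static void
sh_error(const char *msg)
{
    fprintf(stderr, "sh: %s\n", msg);
    exraise(EXERROR);
}

/* Run fn with a handler installed.  Returns -1 if fn returned
 * normally, otherwise the type of the exception it raised. */
static int
catch_exceptions(void (*fn)(void))
{
    struct jmploc jmploc;
    struct jmploc *savehandler = handler;
    int result = -1;

    if (setjmp(jmploc.loc) == 0) {
        handler = &jmploc;
        fn();
    } else {
        result = exception;
    }
    handler = savehandler;      /* reinstall the enclosing handler */
    return result;
}

static void failing_command(void) { sh_error("bad substitution"); }
static void ok_command(void) { }
```

Nested handlers work because each catcher saves the previous value
of handler and puts it back, whether or not an exception occurred.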

INTERRUPTS:  In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop.  (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.)  The INTOFF and INTON macros (defined in error.h)
provide uninterruptible critical sections.  Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery.  INTOFF and INTON can be nested.
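
A sketch of how nested INTOFF/INTON pairs can hold an interrupt
until the outermost INTON.  The variables suppressint and
intpending and the helper deliver_interrupt are assumptions.  To
keep the example deterministic, onint is called directly rather
than from a real SIGINT handler.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical state modeled on the description above. */
static volatile sig_atomic_t suppressint;  /* INTOFF nesting depth */
static volatile sig_atomic_t intpending;   /* interrupt held for later */
static int interrupts;                     /* interrupts acted upon */

/* Act on an interrupt.  The shell would raise EXINT here; this
 * sketch just counts it. */
static void
deliver_interrupt(void)
{
    intpending = 0;
    interrupts++;
}

/* What a SIGINT handler would do: hold the interrupt while inside a
 * critical section, otherwise act on it at once. */
static void
onint(void)
{
    if (suppressint)
        intpending = 1;
    else
        deliver_interrupt();
}

#define INTOFF (suppressint++)
#define INTON \
    do { \
        assert(suppressint > 0); \
        if (--suppressint == 0 && intpending) \
            deliver_interrupt(); \
    } while (0)
```

Only the INTON that brings the depth back to zero delivers a held
interrupt, which is what makes nesting safe.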

MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left.  It also defines a
stack-oriented memory allocation scheme.  Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs, all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer.  The stack is implemented using a
linked list of blocks.
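
A simplified sketch of a stack allocator built from a linked list
of blocks, with marks that let a caller free everything allocated
after a given point in one step.  The names stalloc, setstackmark
and popstackmark, the structure layouts, and the block size are
assumptions modeled on the description, not the shell's exact
interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MINSIZE 504     /* hypothetical minimum block size */

/* One block of the allocation stack; blocks form a linked list. */
struct stack_block {
    struct stack_block *prev;   /* block below this one */
    size_t size;                /* usable bytes in space[] */
    size_t used;                /* bytes already handed out */
    char space[];               /* C99 flexible array member */
};

/* A saved position in the stack. */
struct stackmark {
    struct stack_block *block;
    size_t used;
};

static struct stack_block *stackp;  /* top block, or NULL if empty */

/* Allocate n bytes off the stack, starting a new block if needed.
 * Sizes are rounded up to pointer alignment, which this sketch
 * assumes is sufficient for the data stored. */
static void *
stalloc(size_t n)
{
    struct stack_block *b = stackp;
    void *p;

    n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    if (b == NULL || b->size - b->used < n) {
        size_t size = n > MINSIZE ? n : MINSIZE;

        b = malloc(sizeof(*b) + size);
        if (b == NULL)
            abort();            /* the shell would call error() */
        b->prev = stackp;
        b->size = size;
        b->used = 0;
        stackp = b;
    }
    p = b->space + b->used;
    b->used += n;
    return p;
}

static void
setstackmark(struct stackmark *mark)
{
    mark->block = stackp;
    mark->used = stackp != NULL ? stackp->used : 0;
}

/* Release everything allocated since the mark was set.  After an
 * exception, this is all that is needed to reclaim memory. */
static void
popstackmark(const struct stackmark *mark)
{
    while (stackp != mark->block) {
        struct stack_block *b = stackp;

        assert(b != NULL);      /* the mark must be below the top */
        stackp = b->prev;
        free(b);
    }
    if (stackp != NULL)
        stackp->used = mark->used;
}
```

An exception handler only needs the mark it saved on entry: one call
to popstackmark frees every block allocated since then.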

STPUTC:  If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);
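
To show the idea behind these macros, here is a sketch that uses a
single buffer grown with realloc in place of the shell's block
list.  The buffer names and the helper growstackstr are
hypothetical, and this grabstackstr only returns the string rather
than reserving its space on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the shell's stack: one growable buffer. */
static char *stackbase;     /* start of the string being built */
static size_t stacksize;    /* bytes allocated at stackbase */

/* Enlarge the buffer and return where a pointer that was `used`
 * bytes into it now points. */
static char *
growstackstr(size_t used)
{
    size_t newsize = stacksize != 0 ? stacksize * 2 : 64;
    char *p = realloc(stackbase, newsize);

    assert(used <= stacksize);
    if (p == NULL)
        abort();            /* the shell would call error() */
    stackbase = p;
    stacksize = newsize;
    return p + used;
}

#define STARTSTACKSTR(p) \
    ((p) = stackbase != NULL ? stackbase : growstackstr(0))
/* Store c at p, first growing the buffer if p is at its end. */
#define STPUTC(c, p) \
    do { \
        if ((size_t)((p) - stackbase) == stacksize) \
            (p) = growstackstr(stacksize); \
        *(p)++ = (char)(c); \
    } while (0)
/* Hand back the finished string.  Unlike the shell's version, this
 * does not reserve the space: the next STARTSTACKSTR reuses it. */
#define grabstackstr(p) ((void)(p), stackbase)
```

Because STPUTC may move the buffer, callers must keep using the
pointer it updates rather than a copy saved earlier.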

We now start a top-down look at the code:

MAIN.C:  The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop.  Cmdloop
repeatedly parses and executes commands.

OPTIONS.C:  This file contains the option processing code.  It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin.  The -i and -m
options (the latter turns on job control) require changes in
signal handling.  The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING:  The parser code is all in parser.c.  A recursive descent
parser is used.  Syntax tables (generated by mksyntax) are used
to classify characters during lexical analysis.  There are four
tables:  one for normal use, one for use inside single quotes and
dollar single quotes, one for use inside double quotes, and one
for use in arithmetic.  The tables are machine-dependent because
they are indexed by character variables and the range of a char
(in particular, whether it is signed) varies from machine to
machine.
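
The following sketch shows why the tables depend on the machine:
an index taken from a char must be offset by CHAR_MIN so that a
signed char with the high bit set still lands inside the table.
The class names and the charclass helper are hypothetical, and
only a handful of characters are classified.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical character classes; the shell defines more. */
enum { CWORD, CNL, CBACK, CSQUOTE, CDQUOTE, CVAR, CBQUOTE, CSPCL };

/* One entry for every value a char can hold. */
static unsigned char basesyntax[CHAR_MAX - CHAR_MIN + 1];

/* Offset by CHAR_MIN so that negative values of a signed char still
 * land inside the table. */
#define SYNTAX(tbl, c) ((tbl)[(int)(c) - CHAR_MIN])

/* mksyntax emits such tables as initialized C arrays at build time;
 * this sketch fills one in at run time instead. */
static void
init_basesyntax(void)
{
    /* Entries not set below stay CWORD (zero). */
    SYNTAX(basesyntax, '\n') = CNL;
    SYNTAX(basesyntax, '\\') = CBACK;
    SYNTAX(basesyntax, '\'') = CSQUOTE;
    SYNTAX(basesyntax, '"') = CDQUOTE;
    SYNTAX(basesyntax, '$') = CVAR;
    SYNTAX(basesyntax, '`') = CBQUOTE;
    SYNTAX(basesyntax, ';') = CSPCL;
    SYNTAX(basesyntax, '&') = CSPCL;
    SYNTAX(basesyntax, '|') = CSPCL;
    SYNTAX(basesyntax, '<') = CSPCL;
    SYNTAX(basesyntax, '>') = CSPCL;
    SYNTAX(basesyntax, '(') = CSPCL;
    SYNTAX(basesyntax, ')') = CSPCL;
}

/* Look up the class of c, checking that the index is in range. */
static int
charclass(char c)
{
    int i = (int)c - CHAR_MIN;

    assert(i >= 0 && i < (int)sizeof(basesyntax));
    return basesyntax[i];
}
```

On a machine with unsigned chars CHAR_MIN is 0 and the offset
disappears, which is why generated tables cannot be shared blindly
between the two kinds of machine.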

PARSE OUTPUT:  The output of the parser consists of a tree of
nodes.  The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents.  An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance.  It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word.  The text consists of ordinary characters and a number of
special codes defined in parser.h.  The special codes are:

        CTLVAR              Parameter expansion
        CTLENDVAR           End of parameter expansion
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLARI              Arithmetic expansion
        CTLENDARI           End of arithmetic expansion
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution.  The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}
        VSTRIMLEFT          ${var#text}
        VSTRIMLEFTMAX       ${var##text}
        VSTRIMRIGHT         ${var%text}
        VSTRIMRIGHTMAX      ${var%%text}
        VSLENGTH            ${#var}
        VSERROR             delayed error

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes and the VSLINENO flag set if
LINENO is being expanded (the parameter name is the decimal line
number).  The parameter's name comes next, terminated by an equals
sign.  If the type is not VSNORMAL (including when it is
VSLENGTH), then the text field in the substitution follows,
terminated by a CTLENDVAR byte.
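
As an illustration of this layout, the sketch below encodes a
substitution such as ${foo:-bar} into the byte sequence described
above.  The byte values chosen for CTLVAR, CTLENDVAR and the VS*
codes are placeholders, and putvarsubst is a hypothetical helper;
the real values are defined in parser.h and may differ.

```c
#include <assert.h>
#include <string.h>

/* Placeholder byte values; the real definitions are in parser.h. */
#define CTLVAR    '\201'
#define CTLENDVAR '\203'
#define VSTYPE    0x0f  /* mask for the substitution type */
#define VSNUL     0x10  /* colon form, as in ${var:-text} */
#define VSQUOTE   0x80  /* inside double quotes */
enum { VSNORMAL = 1, VSMINUS, VSPLUS, VSQUESTION, VSASSIGN };

/* Write "CTLVAR type name '=' [text CTLENDVAR]" to out (which must be
 * large enough), NUL-terminate it, and return a pointer to the NUL. */
static char *
putvarsubst(char *out, int type, const char *name, const char *text)
{
    size_t len;

    assert(name != NULL);
    *out++ = CTLVAR;
    *out++ = (char)type;
    len = strlen(name);
    memcpy(out, name, len);
    out += len;
    *out++ = '=';
    if ((type & VSTYPE) != VSNORMAL) {
        /* Every other type, VSLENGTH included, carries a text part. */
        assert(text != NULL);
        len = strlen(text);
        memcpy(out, text, len);
        out += len;
        *out++ = CTLENDVAR;
    }
    *out = '\0';
    return out;
}
```

For ${foo:-bar} this produces ten bytes: CTLVAR, the type byte
VSMINUS|VSNUL, "foo=bar", and CTLENDVAR.  A plain "$x" inside
double quotes produces only four: CTLVAR, VSNORMAL|VSQUOTE, "x=".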

The type VSERROR is used to allow parsing bad substitutions like
${var[7]} and generate an error when they are expanded.

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

Arithmetic expansion starts with CTLARI and ends with CTLENDARI.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently.  CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right.  In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).  Other here
documents, and words which are not subject to file name
generation, have the CTLESC characters removed during the
variable and command substitution phase.  Words which are subject
to file name generation have the CTLESC characters removed as
part of the file name phase.
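
A sketch of the escaping scheme: quoted glob characters (and
CTLESC itself) are prefixed with CTLESC, and a later pass removes
the escapes.  The CTLESC value and the helper names putquoted and
rmescapes are assumptions, and in this sketch only CTLESC stands in
for the full set of CTL codes.

```c
#include <assert.h>
#include <string.h>

#define CTLESC '\201'   /* placeholder value; see parser.h for the real one */

/* Characters that need CTLESC when they come from quoted text. */
static int
needs_escape(char c)
{
    return c == '*' || c == '?' || c == '[' || c == '!' || c == CTLESC;
}

/* Copy quoted text to out (large enough for twice its length plus
 * one), putting CTLESC before each character that must stay literal. */
static void
putquoted(char *out, const char *in)
{
    assert(out != NULL && in != NULL);
    for (; *in != '\0'; in++) {
        if (needs_escape(*in))
            *out++ = CTLESC;
        *out++ = *in;
    }
    *out = '\0';
}

/* Remove CTLESC bytes in place, keeping the characters they escape. */
static void
rmescapes(char *s)
{
    char *q = s;

    for (; *s != '\0'; s++) {
        if (*s == CTLESC && s[1] != '\0')
            s++;
        *q++ = *s;
    }
    *q = '\0';
}
```

A glob matcher that sees CTLESC before '*' can treat the '*' as an
ordinary character; words that are not subject to file name
generation would simply go through a pass like rmescapes.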

EXECUTION:  Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table.  Called from expand.c.

EVAL.C:  Evaltree recursively executes a parse tree.  The exit
status is returned in the global variable exitstatus.  The
alternative entry evalbackcmd is called to evaluate commands in
back quotes.  It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command
and connects the standard output of the child to a pipe.

JOBS.C:  To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process.  Waitforjob waits for a job
to complete.  These routines take care of process groups if job
control is defined.

REDIR.C:  Ash allows file descriptors to be redirected and then
restored without forking off a child process.  This is
accomplished by duplicating the original file descriptors.  The
redirtab structure records where the file descriptors have been
duplicated to.
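
The save-and-restore trick can be sketched with POSIX fcntl and
dup2: the original descriptor is copied to a high-numbered
descriptor, the target file is installed in its place, and the copy
is moved back afterward.  The helpers redirect_fd and restore_fd
and the threshold of 10 are hypothetical; the shell records the
saved descriptors in its redirtab structure instead of returning
them to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define SAVEFD_MIN 10   /* hypothetical: keep saved copies out of the way */

/* Point fd at the file path (opened with flags) and return a saved
 * duplicate of the original descriptor, or -1 on failure.  This
 * sketch assumes fd is open to begin with. */
static int
redirect_fd(int fd, const char *path, int flags)
{
    int saved, newfd;

    saved = fcntl(fd, F_DUPFD, SAVEFD_MIN);
    if (saved < 0)
        return -1;
    /* Commands started later must not inherit the saved copy. */
    (void)fcntl(saved, F_SETFD, FD_CLOEXEC);
    newfd = open(path, flags, 0666);
    if (newfd < 0 || dup2(newfd, fd) < 0) {
        if (newfd >= 0)
            close(newfd);
        close(saved);
        return -1;
    }
    close(newfd);
    return saved;
}

/* Undo redirect_fd: copy the saved descriptor back and release it. */
static void
restore_fd(int fd, int saved)
{
    assert(saved >= 0);
    (void)dup2(saved, fd);
    close(saved);
}
```

No fork is needed: a builtin such as "echo hi >file" can run with
its output redirected, after which the shell's own descriptor is
restored from the saved copy.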

EXEC.C:  The routine find_command locates a command, and enters
the command in the hash table if it is not already there.  The
third argument specifies whether it is to print an error message
if the command is not found.  (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process.  But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)
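
A sketch of a find_command-style lookup: consult a table first,
otherwise search a colon-separated directory list and remember the
result.  The table layout, its size, and the helper names are
assumptions.  A simple array stands in for the hash table, and a
fuller version would also check that the file found is a regular
file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define TABLESIZE 32    /* hypothetical fixed-size command table */

struct cmdentry {
    char *name;
    char *path;         /* full path name of the executable */
};

static struct cmdentry cmdtable[TABLESIZE];
static int ncmds;

static char *
savestr(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p == NULL)
        abort();
    return memcpy(p, s, n);
}

/* Search a colon-separated directory list; an empty entry means the
 * current directory.  Returns a malloc'ed path or NULL. */
static char *
path_search(const char *name, const char *path)
{
    char buf[1024];

    for (;;) {
        size_t len = strcspn(path, ":");

        if (len == 0)
            snprintf(buf, sizeof(buf), "./%s", name);
        else
            snprintf(buf, sizeof(buf), "%.*s/%s", (int)len, path, name);
        if (access(buf, X_OK) == 0)
            return savestr(buf);
        if (path[len] == '\0')
            return NULL;
        path += len + 1;
    }
}

/* Return the full path of name, consulting the table first and
 * entering the command on a miss.  If printerr is zero, a missing
 * command is not reported (as when a pipeline is being set up). */
static const char *
find_command(const char *name, const char *path, int printerr)
{
    char *full;
    int i;

    assert(name != NULL && path != NULL);
    for (i = 0; i < ncmds; i++)
        if (strcmp(cmdtable[i].name, name) == 0)
            return cmdtable[i].path;
    full = path_search(name, path);
    if (full == NULL) {
        if (printerr)
            fprintf(stderr, "%s: not found\n", name);
        return NULL;
    }
    if (ncmds < TABLESIZE) {    /* when the table is full, don't cache */
        cmdtable[ncmds].name = savestr(name);
        cmdtable[ncmds].path = full;
        ncmds++;
    }
    return full;
}
```

Calling this quietly for every command in a pipeline before forking
fills the parent's table, so the children inherit the entries
instead of each repeating the search.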

The routine shellexec is the interface to the exec system call.

EXPAND.C:  As the routine argstr generates words by parameter
expansion, command substitution and arithmetic expansion, it
performs word splitting on the result.  As each word is output,
the routine expandmeta performs file name generation (if enabled).
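
argstr splits words as it generates them; for clarity, this sketch
splits an already-expanded string in place instead.  It assumes
every IFS character behaves like IFS white space (so runs of
separators never produce empty fields), which simplifies the POSIX
field-splitting rules.  ifs_split is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Split s in place into fields separated by runs of characters from
 * ifs, storing at most maxfields pointers in fields.  Returns the
 * number of fields found. */
static int
ifs_split(char *s, const char *ifs, char **fields, int maxfields)
{
    int n = 0;

    assert(maxfields >= 0);
    for (;;) {
        s += strspn(s, ifs);        /* skip leading separators */
        if (*s == '\0' || n == maxfields)
            break;
        fields[n++] = s;
        s += strcspn(s, ifs);       /* find the end of the field */
        if (*s == '\0')
            break;
        *s++ = '\0';                /* terminate the field */
    }
    return n;
}
```

With the default IFS of space, tab and newline, the string
"  ls   -l\t/tmp \n" yields the three fields "ls", "-l" and "/tmp".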

VAR.C:  Variables are stored in a hash table.  Probably we should
switch to extensible hashing.  The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.  Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.
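
A sketch of the "name=value" representation: the hash and
comparison functions look only at the name part, and the
environment array simply points at the stored strings.  The table
size, struct layout, and helper names are hypothetical, and unlike
the shell, this sketch puts every variable in the environment
rather than only exported ones.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39     /* hypothetical hash table size */

/* Each variable is a single "name=value" string. */
struct var {
    struct var *next;   /* hash chain */
    char *text;         /* "name=value" */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part only (up to '=' or the end of the string). */
static unsigned
hashvar(const char *p)
{
    unsigned h = 0;

    while (*p != '\0' && *p != '=')
        h = h * 31 + (unsigned char)*p++;
    return h % VTABSIZE;
}

/* Compare the names of two strings, each "name=value" or "name". */
static int
varequal(const char *a, const char *b)
{
    while (*a == *b && *a != '=' && *a != '\0') {
        a++;
        b++;
    }
    return (*a == '=' || *a == '\0') && (*b == '=' || *b == '\0');
}

/* Set a variable from "name=value", replacing any previous value. */
static void
setvareq(const char *s)
{
    struct var **pp = &vartab[hashvar(s)];
    struct var *v;
    size_t n = strlen(s) + 1;
    char *text = malloc(n);

    assert(strchr(s, '=') != NULL);
    if (text == NULL)
        abort();
    memcpy(text, s, n);
    for (v = *pp; v != NULL; v = v->next) {
        if (varequal(v->text, s)) {
            free(v->text);
            v->text = text;
            return;
        }
    }
    if ((v = malloc(sizeof(*v))) == NULL)
        abort();
    v->text = text;
    v->next = *pp;
    *pp = v;
}

/* Return the value of a variable, or NULL if it is unset. */
static const char *
lookupvar(const char *name)
{
    struct var *v;

    for (v = vartab[hashvar(name)]; v != NULL; v = v->next)
        if (varequal(v->text, name))
            return strchr(v->text, '=') + 1;
    return NULL;
}

/* Build an envp-style array whose entries point at the stored
 * strings themselves; nothing is copied. */
static char **
environment(void)
{
    size_t i, n = 0;
    struct var *v;
    char **env, **ep;

    for (i = 0; i < VTABSIZE; i++)
        for (v = vartab[i]; v != NULL; v = v->next)
            n++;
    if ((env = malloc((n + 1) * sizeof(*env))) == NULL)
        abort();
    ep = env;
    for (i = 0; i < VTABSIZE; i++)
        for (v = vartab[i]; v != NULL; v = v->next)
            *ep++ = v->text;
    *ep = NULL;
    return env;
}
```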

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.

BUILTIN COMMANDS:  The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate.  They can be recognized because their
names always end in "cmd".  The mapping from names to procedures
is specified in the file builtins.def, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program.  A builtin command is allowed to overwrite its
arguments.  Builtin routines can call nextopt to do option
parsing.  This is kind of like getopt, but you don't pass argc
and argv to it.  Builtin routines can also call error.  This
routine normally terminates the shell (or returns to the main
command loop if the shell is interactive), but when called from a
non-special builtin command it causes the builtin command to
terminate with an exit status of 2.
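
A sketch of an option parser in the style of nextopt, assuming the
remaining arguments are kept in a global (argptr here) instead of
being passed as argc and argv.  The globals argptr, optptr and
shoptarg, and the convention of returning '?' on errors, are
assumptions; the shell would report such errors through error().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical globals: the builtin's remaining arguments. */
static char **argptr;   /* next argument to examine */
static char *optptr;    /* rest of a cluster such as "-abc" */
static char *shoptarg;  /* argument of the last option, if any */

/* Return the next option letter listed in optstring (a ':' after a
 * letter means it takes an argument), '\0' when there are no more
 * options, or '?' for an unknown option or a missing argument. */
static int
nextopt(const char *optstring)
{
    char *p = optptr;
    const char *q;
    int c;

    assert(argptr != NULL);
    if (p == NULL || *p == '\0') {
        p = *argptr;
        if (p == NULL || p[0] != '-' || p[1] == '\0')
            return '\0';            /* not an option */
        argptr++;
        if (p[1] == '-' && p[2] == '\0')
            return '\0';            /* "--" ends the options */
        p++;
    }
    c = (unsigned char)*p++;
    q = strchr(optstring, c);
    if (c == ':' || q == NULL)
        return '?';
    if (q[1] == ':') {
        if (*p == '\0') {           /* argument is the next word */
            if (*argptr == NULL)
                return '?';
            p = *argptr++;
        }
        shoptarg = p;
        p = NULL;                   /* the argument used up the cluster */
    }
    optptr = p;
    return c;
}
```

A builtin loops with "while ((c = nextopt("abc:")) != '\0')" and,
when the loop ends, finds its operands starting at argptr.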

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons.  The header file bltin.h takes care of most of the
differences between the ash and the stand-alone environments.
The user should call the main routine "main", and #define main to
be the name of the routine to use when the program is linked into
ash.  This #define should appear before bltin.h is included;
bltin.h will #undef main if the program is to be compiled
stand-alone.  A similar approach is used for a few utilities from
bin and usr.bin.

CD.C:  This file defines the cd and pwd builtins.

SIGNALS:  Trap.c implements the trap command.  The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately.  When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag.  The routine
dotrap is called at appropriate points to actually handle the
signal.  When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.
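
The flag-then-handle pattern can be sketched as follows: the
signal handler only records which signal arrived, and dotrap does
the real work later at a safe point.  This sketch uses sigaction
rather than signal, and settrap, gotsig, pendingsig, MAXSIG and the
trapsrun counter are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

#define MAXSIG 64       /* hypothetical upper bound on signal numbers */

static volatile sig_atomic_t gotsig[MAXSIG];  /* per-signal flags */
static volatile sig_atomic_t pendingsig;      /* any flag set? */
static const char *trap[MAXSIG];              /* trap command text */
static int trapsrun;    /* stand-in for evaluating trap commands */

/* Signal handler: only record the signal; real work here is unsafe. */
static void
onsig(int signo)
{
    gotsig[signo] = 1;
    pendingsig = 1;
}

/* Install onsig for a trapped signal, roughly as setsignal would. */
static int
settrap(int signo, const char *action)
{
    struct sigaction sa;

    assert(signo > 0 && signo < MAXSIG);
    trap[signo] = action;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = onsig;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Called at safe points: run the trap action for each caught signal. */
static void
dotrap(void)
{
    int i;

    if (!pendingsig)
        return;
    pendingsig = 0;
    for (i = 1; i < MAXSIG; i++) {
        if (gotsig[i]) {
            gotsig[i] = 0;
            if (trap[i] != NULL)
                trapsrun++;     /* the shell would evaluate trap[i] */
        }
    }
}
```

Deferring the work keeps the handler async-signal-safe: the parser
or evaluator is never re-entered from inside a signal.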

OUTPUT:  Ash uses its own output routines.  There are three
output structures allocated.  "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory.  This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating
system.  The variables out1 and out2 normally point to output and
errout, respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT:  The basic input routine is pgetc, which reads from the
current input file.  There is a stack of input files; the current
input file is the top file on this stack.  The code allows the
input to come from a string rather than a file.  (This is for the
-c option and the "." and eval builtin commands.)  The global
variable plinno is saved and restored when files are pushed onto
and popped from the stack.  The parser routines store the number
of the current line in this variable.
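
A sketch of an input stack that saves plinno on push and restores
it on pop.  The struct parsefile fields and the names pushstring
and popfile are assumptions modeled on the description; the real
structure also holds a file descriptor and a buffer.  pgetc itself
does not count lines, since that is the parser's job.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical input-source record. */
struct parsefile {
    struct parsefile *prev;     /* next input down the stack */
    const char *nextc;          /* next character to read */
    int linno;                  /* saved plinno of the input below */
};

static struct parsefile basepf;                 /* bottom of the stack */
static struct parsefile *parsefile = &basepf;   /* current input */
static int plinno = 1;                          /* current line number */

/* Push a string as the new current input, saving plinno. */
static void
pushstring(const char *s)
{
    struct parsefile *pf = malloc(sizeof(*pf));

    if (pf == NULL)
        abort();
    pf->prev = parsefile;
    pf->nextc = s;
    pf->linno = plinno;
    parsefile = pf;
    plinno = 1;
}

/* Pop the current input and restore the line number of the one below. */
static void
popfile(void)
{
    struct parsefile *pf = parsefile;

    assert(pf != &basepf);
    plinno = pf->linno;
    parsefile = pf->prev;
    free(pf);
}

/* Return the next character of the current input, or -1 at its end. */
static int
pgetc(void)
{
    if (parsefile->nextc == NULL || *parsefile->nextc == '\0')
        return -1;
    return (unsigned char)*parsefile->nextc++;
}
```

After an eval string is read and popped, error messages about the
enclosing script again report that script's line numbers.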

DEBUGGING:  If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace.  Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses.  Example:
"TRACE(("n=%d\n", n))".  The double parentheses are necessary
because the (pre-C99) preprocessor can't handle macros with a
variable number of arguments.  Defining DEBUG also causes the
shell to generate a core dump if it is sent a quit signal.  The
tracing code is in show.c.
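
A sketch of the double-parenthesis TRACE idiom.  trace and
tracefile are hypothetical stand-ins for the code in show.c.  With
C99 variadic macros (#define TRACE(...) trace(__VA_ARGS__)) the
extra parentheses would no longer be needed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1         /* normally set in shell.h; enabled for this example */

static FILE *tracefile; /* hypothetical; opened by the tracing setup code */

/* printf-style output to the trace file, if one is open. */
static void
trace(const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    if (tracefile == NULL)
        return;
    va_start(ap, fmt);
    vfprintf(tracefile, fmt, ap);
    va_end(ap);
    fflush(tracefile);
}

#ifdef DEBUG
/* The caller's inner parentheses turn the whole printf argument list
 * into one macro argument, which C89 macros require. */
#define TRACE(param) trace param
#else
#define TRACE(param)
#endif
```

TRACE(("n=%d\n", n)) expands to trace ("n=%d\n", n) when DEBUG is
defined and to nothing otherwise, so the arguments cost nothing in
a production build.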