#	@(#)TOUR	8.1 (Berkeley) 5/31/93
# $FreeBSD$

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell.  It is provided anyway
since it provides helpful information for how the shell is structured,
but be warned that things have changed -- the current shell is
still under development.

================================================================

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES:  The subdirectory bltin contains commands which can
be compiled stand-alone.  The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS:  Files whose names begin with "mk" are
programs that generate source code.  A complete list of these
programs is:

        program         input files         generates
        -------         -----------         ---------
        mkbuiltins      builtins            builtins.h builtins.c
        mknodes         nodetypes           nodes.h nodes.c
        mksyntax            -               syntax.h syntax.c
        mktokens            -               token.h

There are undoubtedly too many of these.

EXCEPTIONS:  Code for dealing with exceptions appears in
exceptions.c.  The C language doesn't include exception handling,
so I implement it using setjmp and longjmp.  The global variable
exception contains the type of exception.  EXERROR is raised by
calling error.  EXINT is an interrupt.
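
The setjmp/longjmp technique can be sketched as follows.  This is
a minimal illustration, not the shell's actual code: the names
handler, raise_exception and run_protected and the EX_* values
are hypothetical stand-ins chosen for this example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical names and values, for illustration only. */
enum { EX_NONE = 0, EX_ERROR = 1, EX_INT = 2 };

static jmp_buf *handler;         /* innermost active handler, or NULL */
static volatile int exception;   /* type of the exception in flight */

/* Record the exception type and jump to the innermost handler. */
static void raise_exception(int type)
{
    assert(handler != NULL);
    exception = type;
    longjmp(*handler, 1);
}

/* Run fn under a handler; return EX_NONE or the exception type. */
static int run_protected(void (*fn)(void))
{
    jmp_buf here;
    jmp_buf *saved = handler;    /* not modified after setjmp */

    if (setjmp(here) != 0) {
        handler = saved;         /* unwind: restore the outer handler */
        return exception;
    }
    handler = &here;
    fn();
    handler = saved;
    return EX_NONE;
}

static void failing_command(void) { raise_exception(EX_ERROR); }
static void working_command(void) { }
```

Calling run_protected(failing_command) returns EX_ERROR, while
run_protected(working_command) returns EX_NONE.  Saving and
restoring the outer handler is what lets handlers nest.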

INTERRUPTS:  In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop.  (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.)  The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections.  Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery.  INTOFF and INTON can be nested.
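
One way to get nestable critical sections is a depth counter plus
a pending flag.  The sketch below assumes that design; the
variable names are hypothetical, and deliver_interrupt only
counts deliveries where a real shell would raise EXINT.

```c
#include <signal.h>

static volatile sig_atomic_t suppress_depth;  /* INTOFF nesting depth */
static volatile sig_atomic_t int_pending;     /* interrupt held for later */
static int interrupts_delivered;              /* demo counter (hypothetical) */

static void deliver_interrupt(void)
{
    int_pending = 0;
    interrupts_delivered++;   /* a real shell would raise EXINT here */
}

#define INTOFF  ((void)suppress_depth++)
#define INTON   do { \
        if (--suppress_depth == 0 && int_pending) \
            deliver_interrupt(); \
    } while (0)

/* What the interrupt handler would do when the signal arrives. */
static void on_interrupt(void)
{
    if (suppress_depth > 0)
        int_pending = 1;      /* inside a critical section: hold it */
    else
        deliver_interrupt();
}
```

With two nested INTOFF calls, an interrupt arriving in between is
delivered only when the second, outermost INTON runs.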

MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left.  It also defines a
stack oriented memory allocation scheme.  Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer.  The stack is implemented using a
linked list of blocks.
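
A stack allocator built on a linked list of blocks might look
like the sketch below.  The block size, the struct layout and the
names stalloc_sketch, set_mark and pop_to_mark are assumptions
made for this example; the shell's own routines call error on
failure rather than returning NULL.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 512          /* hypothetical; real block sizes differ */

struct block {
    struct block *prev;         /* previously filled block */
    size_t used;                /* bytes handed out from space[] */
    char space[BLOCK_SIZE];
};

struct mark {                   /* a saved "stack pointer" */
    struct block *blk;
    size_t used;
};

static struct block *top;       /* current block, or NULL */

/* Allocate n bytes from the stack, adding a block if needed. */
static void *stalloc_sketch(size_t n)
{
    void *p;

    n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);  /* align */
    if (n > BLOCK_SIZE)
        return NULL;
    if (top == NULL || top->used + n > BLOCK_SIZE) {
        struct block *b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;        /* real code would call error */
        b->prev = top;
        b->used = 0;
        top = b;
    }
    p = top->space + top->used;
    top->used += n;
    return p;
}

static struct mark set_mark(void)
{
    struct mark m;

    m.blk = top;
    m.used = top != NULL ? top->used : 0;
    return m;
}

/* Free everything allocated since the mark was taken. */
static void pop_to_mark(struct mark m)
{
    while (top != NULL && top != m.blk) {
        struct block *b = top;
        top = b->prev;
        free(b);
    }
    if (top != NULL)
        top->used = m.used;
}
```

An exception handler only needs the mark taken before the failed
operation: one pop_to_mark call releases every allocation made
after it, however many there were.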

STPUTC:  If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);
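
The same start/put/grab pattern can be sketched with an ordinary
growable buffer.  This is only an analogy under stated
assumptions: it grows with realloc instead of the shell's stack,
grab_str appends the terminating NUL itself, and the names
strbuf, start_str, put_char and grab_str are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct strbuf {
    char *base;     /* start of the string under construction */
    size_t len;     /* characters stored so far */
    size_t cap;     /* allocated size */
};

static void start_str(struct strbuf *s)
{
    s->base = NULL;
    s->len = 0;
    s->cap = 0;
}

/* Append one character, growing the buffer when it is full. */
static int put_char(struct strbuf *s, char c)
{
    if (s->len == s->cap) {
        size_t ncap = s->cap != 0 ? s->cap * 2 : 16;
        char *p = realloc(s->base, ncap);
        if (p == NULL)
            return -1;
        s->base = p;
        s->cap = ncap;
    }
    s->base[s->len++] = c;
    return 0;
}

/* Terminate the string and hand it to the caller. */
static char *grab_str(struct strbuf *s)
{
    if (put_char(s, '\0') != 0)
        return NULL;
    return s->base;
}
```

The caller never needs to know the final length in advance; the
capacity check hidden inside put_char plays the role that the
"run off the end" test plays in STPUTC.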

We now start a top-down look at the code:

MAIN.C:  The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop.  Cmdloop
repeatedly parses and executes commands.

OPTIONS.C:  This file contains the option processing code.  It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin.  The -i and -m
options (the latter turns on job control) require changes in
signal handling.  The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING:  The parser code is all in parser.c.  A recursive
descent parser is used.  Syntax tables (generated by mksyntax)
are used to classify characters during lexical analysis.  There
are four tables:  one for normal use, one for use when inside
single quotes and dollar single quotes, one for use when inside
double quotes and one for use in arithmetic.  The tables are
machine dependent because they are indexed by character variables
and the range of a char varies from machine to machine.
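
The machine dependence comes from char being signed on some
machines and unsigned on others.  A sketch of a table that copes
with either follows; the class names are a hypothetical subset,
and the table is filled at run time here, whereas mksyntax
generates the real tables at build time.

```c
#include <limits.h>

enum { CWORD, CNL, CSPCL, CSQUOTE, CDQUOTE };   /* hypothetical subset */

#define SYNBASE (-CHAR_MIN)                 /* offset so index 0 is CHAR_MIN */
#define SYNSIZE (CHAR_MAX - CHAR_MIN + 1)   /* one slot per char value */

static unsigned char basesyntax[SYNSIZE];   /* zero-filled, so all CWORD */

/* Mark the characters that are special outside of quotes. */
static void init_basesyntax(void)
{
    static const char special[] = ";&|()<>";
    const char *p;

    basesyntax['\n' + SYNBASE] = CNL;
    basesyntax['\'' + SYNBASE] = CSQUOTE;
    basesyntax['"' + SYNBASE] = CDQUOTE;
    for (p = special; *p != '\0'; p++)
        basesyntax[*p + SYNBASE] = CSPCL;
}

/* Classify a character; safe for negative char values too. */
static int classify(char c)
{
    return basesyntax[(int)c + SYNBASE];
}
```

Because every index is offset by SYNBASE, a char with the high
bit set lands inside the table whether char is signed or not.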

PARSE OUTPUT:  The output of the parser consists of a tree of
nodes.  The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents.  An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance.  It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word.  The text consists of ordinary characters and a number of
special codes defined in parser.h.  The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution.  The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes.  The name of the variable
comes next, terminated by an equals sign.  If the type is not
VSNORMAL, then the text field in the substitution follows,
terminated by a CTLENDVAR byte.
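
The layout above can be made concrete with a small encoder.  This
is a sketch only: the numeric values given to the constants are
placeholders (the real ones are in parser.h), and encode_varsub
is a hypothetical helper, not a routine in the shell.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder values for illustration; see parser.h for the real ones. */
enum {
    CTLVAR    = 0x82,
    CTLENDVAR = 0x83,
    VSNORMAL  = 0x01,
    VSMINUS   = 0x02,
    VSTYPE    = 0x0f,   /* mask selecting the substitution type */
    VSNUL     = 0x10    /* the ':' variants */
};

/*
 * Write "CTLVAR type name '=' [ alt CTLENDVAR ]" into out.
 * Returns the number of bytes written, or 0 if out is too small.
 * alt is ignored (and may be NULL) when the type is VSNORMAL.
 */
static size_t encode_varsub(char *out, size_t outsz, int type,
                            const char *name, const char *alt)
{
    int has_alt = (type & VSTYPE) != VSNORMAL;
    size_t namelen = strlen(name);
    size_t altlen = has_alt ? strlen(alt) : 0;
    size_t need = 3 + namelen + (has_alt ? altlen + 1 : 0);
    size_t n = 0;

    if (need > outsz)
        return 0;
    out[n++] = (char)CTLVAR;
    out[n++] = (char)type;
    memcpy(out + n, name, namelen);
    n += namelen;
    out[n++] = '=';
    if (has_alt) {
        memcpy(out + n, alt, altlen);
        n += altlen;
        out[n++] = (char)CTLENDVAR;
    }
    return n;
}
```

Under these assumptions ${HOME:-/tmp} becomes twelve bytes:
CTLVAR, the type byte VSMINUS|VSNUL, "HOME", '=', "/tmp" and
CTLENDVAR, while a plain $x is just CTLVAR, VSNORMAL, "x", '='.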

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently.  CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right.  In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).  Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase.  Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name phase.

EXECUTION:  Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table.  Called from expand.c.

EVAL.C:  Evaltree recursively executes a parse tree.  The exit
status is returned in the global variable exitstatus.  The
alternative entry evalbackcmd is called to evaluate commands in
back quotes.  It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command
and connects the standard output of the child to a pipe.

JOBS.C:  To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process.  Waitforjob waits for a job
to complete.  These routines take care of process groups if job
control is defined.

REDIR.C:  Ash allows file descriptors to be redirected and then
restored without forking off a child process.  This is
accomplished by duplicating the original file descriptors.  The
redirtab structure records where the file descriptors have been
duplicated to.
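
The duplicate-then-restore idea can be sketched with the POSIX
calls open, dup and dup2.  The struct saved_fd and the two helper
names are hypothetical; a real redirtab tracks several
descriptors at once, and this sketch assumes fd is open when
redirect_fd is called.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

struct saved_fd {
    int fd;     /* descriptor that was redirected */
    int copy;   /* duplicate of the original, kept for restoring */
};

/* Point fd at path for writing, remembering the original in s. */
static int redirect_fd(struct saved_fd *s, int fd, const char *path)
{
    int newfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (newfd < 0)
        return -1;
    s->fd = fd;
    s->copy = dup(fd);              /* save the original descriptor */
    if (s->copy < 0) {
        close(newfd);
        return -1;
    }
    if (dup2(newfd, fd) < 0) {
        close(newfd);
        close(s->copy);
        return -1;
    }
    if (newfd != fd)
        close(newfd);
    return 0;
}

/* Undo redirect_fd: put the saved copy back and release it. */
static int restore_fd(const struct saved_fd *s)
{
    if (dup2(s->copy, s->fd) < 0)
        return -1;
    close(s->copy);
    return 0;
}
```

Everything happens in the calling process, which is what allows a
builtin command to run with redirected output and the shell to
continue afterwards with its descriptors intact.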

EXEC.C:  The routine find_command locates a command, and enters
the command in the hash table if it is not already there.  The
third argument specifies whether it is to print an error message
if the command is not found.  (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process.  But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C:  Arguments are processed in three passes.  The first
(performed by the routine argstr) performs variable and command
substitution.  The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.

VAR.C:  Variables are stored in a hash table.  Probably we should
switch to extensible hashing.  The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.  Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.
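
Storing "name=value" as one string means the environment vector
is just an array of pointers into the table.  The sketch below
assumes a simple chained hash table; the table size, the hash
function and the names set_var, lookup_var and build_env are
hypothetical, and strings passed to set_var must outlive the
table because they are not copied.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39             /* hypothetical table size */

struct var {
    struct var *next;
    const char *text;           /* "name=value", stored as one string */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part only, stopping at '=' or the end. */
static unsigned hash_name(const char *s)
{
    unsigned h = 0;

    while (*s != '\0' && *s != '=')
        h = h * 31 + (unsigned char)*s++;
    return h % VTABSIZE;
}

/* Compare the name parts of two strings, each ending at '=' or NUL. */
static int name_eq(const char *p, const char *q)
{
    while (*p == *q && *p != '=' && *p != '\0') {
        p++;
        q++;
    }
    return (*p == '=' || *p == '\0') && (*q == '=' || *q == '\0');
}

/* Insert or replace a variable; text must be "name=value". */
static int set_var(const char *text)
{
    struct var **head = &vartab[hash_name(text)];
    struct var *v;

    for (v = *head; v != NULL; v = v->next) {
        if (name_eq(v->text, text)) {
            v->text = text;
            return 0;
        }
    }
    v = malloc(sizeof *v);
    if (v == NULL)
        return -1;
    v->text = text;
    v->next = *head;
    *head = v;
    return 0;
}

/* Return a pointer to the value inside the stored string, or NULL. */
static const char *lookup_var(const char *name)
{
    struct var *v;

    for (v = vartab[hash_name(name)]; v != NULL; v = v->next) {
        if (name_eq(v->text, name)) {
            const char *eq = strchr(v->text, '=');
            return eq != NULL ? eq + 1 : NULL;
        }
    }
    return NULL;
}

/* Fill envp with pointers to the stored strings; no copying. */
static size_t build_env(const char **envp, size_t max)
{
    size_t i, n = 0;
    struct var *v;

    for (i = 0; i < VTABSIZE; i++)
        for (v = vartab[i]; v != NULL; v = v->next)
            if (n + 1 < max)
                envp[n++] = v->text;
    if (max > 0)
        envp[n] = NULL;
    return n;
}
```

Setting the same name twice replaces the earlier entry, which is
also why routing variables through the table removes duplicates
before the environment is built.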

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.

BUILTIN COMMANDS:  The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate.  They can be recognized because their
names always end in "cmd".  The mapping from names to procedures
is specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program.  A builtin command is allowed to overwrite its
arguments.  Builtin routines can call nextopt to do option
parsing.  This is kind of like getopt, but you don't pass argc
and argv to it.  Builtin routines can also call error.  This
routine normally terminates the shell (or returns to the main
command loop if the shell is interactive), but when called from a
builtin command it causes the builtin command to terminate with
an exit status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons.  The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash.  The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environment.  The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash.  This #define should appear
before bltin.h is included; bltin.h will #undef main if the
program is to be compiled stand-alone.

CD.C:  This file defines the cd and pwd builtins.

SIGNALS:  Trap.c implements the trap command.  The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately.  When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag.  The routine
dotrap is called at appropriate points to actually handle the
signal.  When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.
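
The split between a handler that only sets a flag and a routine
that does the work later can be sketched as follows.  The
function names carry a _sketch suffix to show they are not the
shell's own, the array size is an assumption, and running a trap
is reduced to bumping a counter.

```c
#include <signal.h>

#define NSIG_SKETCH 65          /* hypothetical upper bound on signal numbers */

static volatile sig_atomic_t gotsig[NSIG_SKETCH];  /* per-signal flags */
static volatile sig_atomic_t pendingsigs;          /* any flag set? */
static int traps_run[NSIG_SKETCH];                 /* demo counters */

/* Signal handler: do nothing but record that the signal arrived. */
static void onsig_sketch(int signo)
{
    if (signo > 0 && signo < NSIG_SKETCH) {
        gotsig[signo] = 1;
        pendingsigs = 1;
    }
}

/* Called at safe points: handle every signal recorded so far. */
static int dotrap_sketch(void)
{
    int i, count = 0;

    while (pendingsigs) {
        pendingsigs = 0;
        for (i = 1; i < NSIG_SKETCH; i++) {
            if (gotsig[i]) {
                gotsig[i] = 0;
                traps_run[i]++;   /* a real shell would run the trap text */
                count++;
            }
        }
    }
    return count;
}

/* Install the flag-setting handler for one signal. */
static int catch_signal(int signo)
{
    return signal(signo, onsig_sketch) == SIG_ERR ? -1 : 0;
}
```

Keeping the handler this small avoids running shell code at an
arbitrary point; the trap action runs only when dotrap_sketch is
called from a place where the shell's state is consistent.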

OUTPUT:  Ash uses its own output routines.  There are three
output structures allocated.  "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory.  This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating
system.  The variables out1 and out2 normally point to output and
errout, respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT:  The basic input routine is pgetc, which reads from the
current input file.  There is a stack of input files; the current
input file is the top file on this stack.  The code allows the
input to come from a string rather than a file.  (This is for the
-c option and the "." and eval builtin commands.)  The global
variable plinno is saved and restored when files are pushed and
popped from the stack.  The parser routines store the number of
the current line in this variable.
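
A stack of inputs with a saved line number might be sketched as
below.  Several simplifications are assumed: inputs are strings
only, the stack has a fixed hypothetical depth, and the reader
itself counts newlines, whereas in the shell the parser routines
maintain plinno.

```c
#include <stddef.h>

#define PEOF_SKETCH (-1)        /* hypothetical end-of-input value */
#define MAXDEPTH 8              /* hypothetical nesting limit */

struct input {
    const char *next;           /* next unread character */
    int saved_plinno;           /* line number of the input below */
};

static struct input stack[MAXDEPTH];
static int depth;
static int plinno = 1;          /* current line in the current input */

/* Make s the current input, saving the line number in use. */
static int push_string(const char *s)
{
    if (depth == MAXDEPTH)
        return -1;
    stack[depth].next = s;
    stack[depth].saved_plinno = plinno;
    depth++;
    plinno = 1;
    return 0;
}

/* Return to the previous input and its line number. */
static void pop_input(void)
{
    if (depth > 0) {
        depth--;
        plinno = stack[depth].saved_plinno;
    }
}

/* Read one character from the input on top of the stack. */
static int pgetc_sketch(void)
{
    struct input *in;
    int c;

    if (depth == 0)
        return PEOF_SKETCH;
    in = &stack[depth - 1];
    if (*in->next == '\0')
        return PEOF_SKETCH;
    c = (unsigned char)*in->next++;
    if (c == '\n')
        plinno++;
    return c;
}
```

Pushing a second input part way through the first restarts the
count at 1, and popping it brings back the outer line number, so
error messages keep referring to the right line.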

DEBUGGING:  If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace.  Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses.  Example:
"TRACE(("n=%d\n", n))".  The double parentheses are necessary
because the preprocessor can't handle macros with a variable
number of arguments.  Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal.  The tracing
code is in show.c.
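
The double-parenthesis trick works because the inner parentheses
turn the whole argument list into a single macro argument.  A
minimal sketch, assuming a variadic function named trace and a
FILE pointer named tracefile (both hypothetical here); DEBUG is
defined in the block itself so that the sketch traces.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1                 /* normally set in shell.h */

static FILE *tracefile;         /* where trace output goes, or NULL */

/* printf-style output to the trace file; a no-op if it is not open. */
static void trace(const char *fmt, ...)
{
    va_list ap;

    if (tracefile == NULL)
        return;
    va_start(ap, fmt);
    vfprintf(tracefile, fmt, ap);
    va_end(ap);
    fflush(tracefile);
}

#ifdef DEBUG
#define TRACE(args) trace args  /* TRACE(("x=%d", x)) -> trace ("x=%d", x) */
#else
#define TRACE(args) ((void)0)   /* compiles to nothing without DEBUG */
#endif
```

Compilers that support C99 variadic macros could take the
arguments directly, but the one-argument form above also works
with older preprocessors.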