psd/01.cacm/p4

         This module is believed to contain source code proprietary to AT&T.
 Use and redistribution is subject to the Berkeley Software License
 Agreement and your Software Agreement with AT&T (Western Electric).

 @(#)p4 8.1 (Berkeley) 6/8/93
VI. THE SHELL

For most users,
communication with
the system
is carried on with the
aid of a program called the shell.
The shell is a
command-line interpreter: it reads lines typed by the user and
interprets them as requests to execute
other programs.
(The shell is described fully elsewhere [Bourne, BSTJ, this issue],
so this section will discuss only the theory of its operation.)
In simplest form, a command line consists of the command
name followed by arguments to the command, all separated
by spaces:
    command arg1 arg2 ... argn
The shell splits up the command name and the arguments into
separate strings.
Then a file with name
 command is sought;
 command may be a path name including the ``/'' character to
specify any file in the system.
If
 command is found, it is brought into
memory and executed.
The arguments
collected by the shell are accessible
to the command.
When the command is finished, the shell
resumes its own execution, and indicates its readiness
to accept another command by typing a prompt character.

If file
command cannot be found,
the shell generally prefixes a string
such as
/bin/ to
command and
attempts again to find the file.
Directory
/bin contains commands
intended to be generally used.
(The sequence of directories to be searched
may be changed by user request.)
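The lookup described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: find_command is a hypothetical helper, not part of the shell described here, and it tries just two places (the name as given, then the name prefixed with /bin/). Shells in current use search a configurable list of directories instead.

```shell
# Hypothetical helper: report which file a command name would resolve to,
# trying the name as given first and then with "/bin/" prefixed.
find_command() {
    name=$1
    if [ -f "$name" ] && [ -x "$name" ]; then
        printf '%s\n' "$name"
    elif [ -f "/bin/$name" ] && [ -x "/bin/$name" ]; then
        printf '%s\n' "/bin/$name"
    else
        printf '%s: not found\n' "$name" >&2
        return 1
    fi
}

# Assumes an executable named "sh" exists in the current directory or in /bin.
resolved=$(find_command sh)
echo "$resolved"
```

A name that resolves nowhere makes the helper return a nonzero status and write its complaint to descriptor 2.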
6.1 Standard I/O

The discussion of I/O in Section III above seems to imply that
every file used by a program must be opened or created by the program in
order to get a file descriptor for the file.
Programs executed by the shell, however, start off with
three open files with file descriptors
0, 1, and 2.
As such a program begins execution, file 1 is open for writing,
and is best understood as the standard output file.
Except under circumstances indicated below, this file
is the user's terminal.
Thus programs that wish to write informative
output ordinarily use file descriptor 1.
Conversely, file 0 starts off open for reading, and programs that
wish to read messages typed by the user
read this file.

The shell is able to change the standard assignments of
these file descriptors from the
user's terminal printer and keyboard.
If one of the
arguments to a command is prefixed by ``>'', file descriptor
1 will, for the duration of the command, refer to the
file named after the ``>''.
For example:
    ls
ordinarily lists, on the typewriter, the names of the files in the current
directory.
The command:
    ls >there
creates a file called
there and places the listing there.
Thus the argument
>there means
``place output on
there.'' On the other hand:
    ed
ordinarily enters the editor, which takes requests from the
user via his keyboard.
The command
    ed <script
interprets
script as a file of editor commands;
thus
<script means ``take input from
script.''
Although the file name following ``<'' or ``>'' appears
to be an argument to the command, in fact it is interpreted
completely by the shell and is not passed to the
command at all.
Thus no special coding to handle I/O redirection is needed within each
command; the command need merely use the standard file
descriptors 0 and 1 where appropriate.

File descriptor 2 is, like file 1,
ordinarily associated with the terminal output stream.
When an output-diversion request with ``>'' is specified,
file 2 remains attached to the terminal, so that commands
may produce diagnostic messages that
do not silently end up in the output file.
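A small experiment makes the separation of descriptors 1 and 2 concrete. In this sketch, report is a hypothetical stand-in for any command that writes both ordinary output and a diagnostic; the temporary file comes from mktemp, which is assumed to be available.

```shell
# Hypothetical command that writes a result to descriptor 1
# and a diagnostic to descriptor 2.
report() {
    echo "result line"
    echo "diagnostic line" >&2
}

outfile=$(mktemp)
# Only descriptor 1 is diverted; descriptor 2 still goes wherever it went before.
report >"$outfile"
captured=$(cat "$outfile")
echo "file contains: $captured"
rm -f "$outfile"
```

The diagnostic appears on the terminal (or wherever descriptor 2 pointed), while the file holds only the result line.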
6.2 Filters

An extension of the standard I/O notion is used
to direct output from one command to
the input of another.
A sequence of commands separated by
vertical bars causes the shell to
execute all the commands simultaneously and to arrange
that the standard output of each command
be delivered to the standard input of
the next command in the sequence.
Thus in the command line:
    ls | pr -2 | opr
ls lists the names of the files in the current directory;
its output is passed to
pr, which
paginates its input with dated headings.
(The argument ``-2'' requests
double-column output.)
Likewise, the output from
pr is input to
opr; this command spools its input onto a file for off-line
printing.

This procedure could have been carried out
more clumsily by:
    ls >temp1
    pr -2 <temp1 >temp2
    opr <temp2
followed by removal of the temporary files.
In the absence of the ability
to redirect output and input,
a still clumsier method would have been to
require the
 ls command
to accept user requests to paginate its output,
to print in multi-column format, and to arrange
that its output be delivered off-line.
Actually it would be surprising, and in fact
unwise for efficiency reasons,
to expect authors of
commands such as
 ls to provide such a wide variety of output options.

A program
such as
pr that copies its standard input to its standard output
(with processing)
is called a
filter. Some filters that we have found useful
perform
character transliteration,
selection of lines according to a pattern,
sorting of the input,
and encryption and decryption.
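The equivalence between a pipeline and a chain of temporary files can be checked directly with three of the filter kinds just listed. The sketch below assumes the commonly available tr (transliteration), grep (pattern selection), and sort; the sample word list is invented for the illustration.

```shell
words='pear
Apple
banana
avocado'

# Pipeline form: each filter reads the previous filter's output.
piped=$(printf '%s\n' "$words" | tr 'A-Z' 'a-z' | grep '^a' | sort)

# Equivalent form with explicit temporary files.
t1=$(mktemp); t2=$(mktemp)
printf '%s\n' "$words" | tr 'A-Z' 'a-z' >"$t1"
grep '^a' <"$t1" >"$t2"
staged=$(sort <"$t2")
rm -f "$t1" "$t2"

echo "$piped"
```

Both forms yield the same two lines (apple, avocado); the pipeline simply spares the user the bookkeeping of creating and removing the intermediate files.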
6.3 Command separators; multitasking

Another feature provided by the shell is relatively straightforward.
Commands need not be on different lines; instead they may be separated
by semicolons:
    ls; ed
will first list the contents of the current directory, then enter
the editor.

A related feature is more interesting.
If a command is followed
by ``&,'' the shell will not wait for the command to finish before
prompting again; instead, it is ready immediately
to accept a new command.
For example:
    as source >output &
causes
source to be assembled, with diagnostic
output going to
output; no matter how long the
assembly takes, the shell returns immediately.
When the shell does not wait for
the completion of a command,
the identification number of the
process running that command is printed.
This identification may be used to
wait for the completion of the command or to
terminate it.
The ``&'' may be used
several times in a line:
    as source >output & ls >files &
does both the assembly and the listing in the background.
In these examples, an output file
other than the terminal was provided; if this had not been
done, the outputs of the various commands would have been
intermingled.

The shell also allows parentheses in the above operations.
For example:
    (date; ls) >x &
writes the current date and time followed by
a list of the current directory onto the file
x. The shell also returns immediately for another request.
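The following sketch shows how the identification number of a background process might be used to wait for it. It assumes a shell that provides the special parameter $! (the process ID of the most recent background command) and the wait built-in; the parenthesized command is a hypothetical slow job. A non-interactive shell does not print the number by itself, so the sketch echoes it.

```shell
out=$(mktemp)
# Start a (hypothetical) slow command in the background; the shell does not wait.
( sleep 1; echo "slow job done" ) >"$out" &
pid=$!                      # process ID of the background command
echo "started process $pid"
wait "$pid"                 # block until that process finishes
status=$?
result=$(cat "$out")
rm -f "$out"
echo "$result (exit status $status)"
```

Because the job's output was diverted to a file, it cannot intermingle with anything else the shell writes in the meantime.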
6.4 The shell as a command; command files

The shell is itself a command, and may be called recursively.
Suppose file
 tryout contains the lines:
    as source
    mv a.out testprog
    testprog
The
mv command causes the file
a.out to be renamed
testprog. a.out is the (binary) output of the assembler, ready to be executed.
Thus if the three lines above were typed on the keyboard,
source would be assembled, the resulting program renamed
testprog, and
testprog executed.
When the lines are in
tryout, the command:
    sh <tryout
would cause the shell
 sh to execute the commands
sequentially.
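The same arrangement can be tried with a harmless command file. In this sketch the file name comes from mktemp and its three echo commands are placeholders standing in for the assembler example, which would need an actual source file.

```shell
cmdfile=$(mktemp)
# A hypothetical command file holding three commands, one per line.
cat >"$cmdfile" <<'EOF'
echo first
echo second
echo third
EOF

# A new shell reads the file as its standard input and stops at end of file.
output=$(sh <"$cmdfile")
rm -f "$cmdfile"
echo "$output"
```

The inner shell runs the three lines in order and then terminates, at which point the outer shell carries on.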

The shell has further capabilities, including the
ability to substitute parameters
and
to construct argument lists from a specified
subset of the file names in a directory.
It also provides general conditional and looping constructions.
6.5 Implementation of the shell

The outline of the operation of the shell can now be understood.
Most of the time, the shell
is waiting for the user to type a command.
When the
newline character ending the line
is typed, the shell's
read call returns.
The shell analyzes the command line, putting the
arguments in a form appropriate for
execute. Then
fork is called.
The child process, whose code
of course is still that of the shell, attempts
to perform an
execute with the appropriate arguments.
If successful, this will bring in and start execution of the program whose name
was given.
Meanwhile, the other process resulting from the
fork, which is the
parent process,
waits for the child process to die.
When this happens, the shell knows the command is finished, so
it types its prompt and reads the keyboard to obtain another
command.

Given this framework, the implementation of background processes
is trivial; whenever a command line contains ``&,''
the shell merely refrains from waiting for the process
that it created
to execute the command.

Happily, all of this mechanism meshes very nicely with
the notion of standard input and output files.
When a process is created by the
fork primitive, it
inherits not only the memory image of its parent
but also all the files currently open in its parent,
including those with file descriptors 0, 1, and 2.
The shell, of course, uses these files to read command
lines and to write its prompts and diagnostics, and in the ordinary case
its children (the command programs) inherit them automatically.
When an argument with ``<'' or ``>'' is given, however, the
offspring process, just before it performs
execute, makes the standard I/O
file descriptor (0 or 1, respectively) refer to the named file.
This is easy
because, by agreement,
the smallest unused file descriptor is assigned
when a new file is
opened (or
created); it is only necessary to close file 0 (or 1)
and open the named file.
Because the process in which the command program runs simply terminates
when it is through, the association between a file
specified after ``<'' or ``>'' and file descriptor 0 or 1 is ended
automatically when the process dies.
Therefore
the shell need not know the actual names of the files
that are its own standard input and output, because it need
never reopen them.
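A loose analogue of this mechanism can be observed from within a shell. The sketch below assumes a shell in which parentheses run a child shell process and in which exec given only a redirection rebinds a descriptor in the current process. It does not show the underlying close and open calls, only their visible effect.

```shell
target=$(mktemp)

# The parentheses create a child shell process. Inside it, "exec" with only a
# redirection rebinds descriptor 1 to the named file; the command that follows
# inherits that descriptor without knowing about it.
( exec >"$target"; echo "written by the child" )

# Back in the parent, descriptor 1 is unchanged.
parent_line="written by the parent"
echo "$parent_line"

child_line=$(cat "$target")
rm -f "$target"
```

The parent never has to restore anything: the rebinding lived only in the child and vanished when the child terminated.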

Filters are straightforward extensions
of standard I/O redirection with pipes used
instead of files.

In ordinary circumstances, the main loop of the shell never
terminates.
(The main loop includes the
branch of the return from
fork belonging to the
parent process; that is, the branch that does a
wait, then
reads another command line.)
The one thing that causes the shell to terminate is
discovering an end-of-file condition on its input file.
Thus, when the shell is executed as a command with
a given input file, as in:
    sh <comfile
the commands in
 comfile will be executed until
the end of
 comfile is reached; then the instance of the shell
invoked by
 sh will terminate.
Because this shell process
is the child of another instance of the shell, the
 wait executed in the latter will return, and another
command may then be processed.
6.6 Initialization

The instances of the shell to which users type
commands are themselves children of another process.
The last step in the initialization of
the system
is the creation of
a single process and the invocation (via
execute) of a program called
init. The role of
init is to create one process
for each terminal channel.
The various subinstances of
init open the appropriate terminals
for input and output
on files 0, 1, and 2,
waiting, if necessary, for carrier to be established on dial-up lines.
Then a message is typed out requesting that the user log in.
When the user types a name or other identification,
the appropriate instance of
init wakes up, receives the log-in
line, and reads a password file.
If the user's name is found, and if
he is able to supply the correct password,
init changes to the user's default current directory, sets
the process's user ID to that of the person logging in, and performs
an
execute of the shell.
At this point, the shell is ready to receive commands
and the logging-in protocol is complete.

Meanwhile, the mainstream path of
init (the parent of all
the subinstances of itself that will later become shells)
does a
wait. If one of the child processes terminates, either
because a shell found an end of file or because a user
typed an incorrect name or password, this path of
init simply recreates the defunct process, which in turn reopens the appropriate
input and output files and types another log-in message.
Thus a user may log out simply by typing the end-of-file
sequence to the shell.
6.7 Other programs as shell

The shell as described above is designed to allow users
full access to the facilities of the system, because it will
invoke the execution of any program
with appropriate protection mode.
Sometimes, however, a different interface to the system
is desirable, and this feature is easily arranged for.

Recall that after a user has successfully logged in by supplying
a name and password,
 init ordinarily invokes the shell
to interpret command lines.
The user's entry
in the password file may contain the name
of a program to be invoked after log-in instead of the shell.
This program is free to interpret the user's messages
in any way it wishes.

For example, the password file entries
for users of a secretarial editing system
might
specify that the
editor
 ed is to be used instead of the shell.
Thus when users of the editing system log in, they are inside the editor and
can begin work immediately; also, they can be prevented from
invoking
programs not intended for their use.
In practice, it has proved desirable to allow a temporary
escape from the editor
to execute the formatting program and other utilities.

Several of the games (e.g., chess, blackjack, 3D tic-tac-toe)
available on
the system
illustrate
a much more severely restricted environment.
For each of these, an entry exists
in the password file specifying that the appropriate game-playing
program is to be invoked instead of the shell.
People who log in as a player
of one of these games find themselves limited to the
game and unable to investigate the (presumably more interesting)
offerings of
the
UNIX system
as a whole.