.ds DT July 9, 1993  \" use troff -mm
.nr C 3
.nr N 2
.SA 1  \"  right justified
.TL "311466-6713" "49059-6"  \" charging case filing case
Guidelines for writing \f5ksh-93\fP built-in commands
.AU "David G. Korn" DGK FP 11267 8062 D-237 "(research!dgk)"
.AF
.TM  11267-930???-93  \"  technical memo + TM numbers
.MT 4
.AS 2   \" abstract start for TM
One of the features of \f5ksh93\fP, the latest version of \f5ksh\fP,
is the ability to add built-in commands at run time.
This feature only works on operating systems that have the ability
to load and link code into the current process at run time.
Some examples of the systems that have this feature
are System V Release 4, Solaris, Sun OS, HP-UX Release 8 and above,
AIX 3.2 and above, and Microsoft Windows systems.
.P
This memo describes how to write and compile programs
that can be loaded into \f5ksh\fP at run time as built-in
commands.
.AE   \" abstract end
.OK Shell "Command interpreter" Language UNIX  \" keyword
.MT 1  \"  memo type
.H 1 INTRODUCTION
A built-in command is executed without creating a separate process.
Instead, the command is invoked as a C function by \f5ksh\fP.
If this function has no side effects in the shell process,
then the behavior of this built-in is identical to that of
the equivalent stand-alone command.  The primary difference
in this case is performance.  The overhead of process creation
is eliminated.  For commands of short duration, the effect
can be dramatic.  For example, on SUN OS 4.1, running \f5wc\fP
on a small file of about 1000 bytes is
about 50 times faster as a built-in command.
.P
In addition, built-in commands that have side effects on the
shell environment can be written.
This is usually done to extend the application domain for
shell programming.  For example, an X-windows extension
that makes heavy use of the shell variable namespace
was added as a group of built-in commands that
are added at run time.
The result is a windowing shell that can be used to write
X-windows applications.
.P
While there are definite advantages to adding built-in
commands, there are some disadvantages as well.
Since the built-in command and \f5ksh\fP share the same
address space, a coding error in the built-in program
may affect the behavior of \f5ksh\fP, perhaps causing
it to core dump or hang.
Debugging is also more complex since your code is now
a part of a larger entity.
A built-in also gives up the isolation provided by a separate process,
which guarantees that all resources used by the command
will be freed when the command completes.
Also, since the address space of \f5ksh\fP will be larger,
this may increase the time it takes \f5ksh\fP to \f5fork()\fP and
\f5exec()\fP a non-built-in command.
It makes no sense to add a built-in command that takes
a long time to run or that is run only once, since the performance
benefits will be negligible.
Built-ins that have side effects in the current shell
environment have the disadvantage of increasing the
coupling between the built-in and \f5ksh\fP, making
the overall system less modular and more monolithic.
.P
Despite these drawbacks, in many cases extending
\f5ksh\fP by adding built-in
commands makes sense and allows reuse of the shell
scripting ability in an application-specific domain.
This memo describes how to write \f5ksh\fP extensions.
.H 1 "WRITING BUILT-IN COMMANDS"
There is a development kit available for writing \f5ksh\fP
built-ins.  The development kit has three directories,
\f5include\fP, \f5lib\fP, and \f5bin\fP.
The \f5include\fP directory contains a sub-directory
named \f5ast\fP that contains interface prototypes
for functions that you can call from built-ins.  The \f5lib\fP
directory contains the \fBast\fP library\*F
.FS
\fBast\fP stands for Advanced Software Technology
.FE
and a library named \fBlibcmd\fP that contains a version
of several of the standard POSIX\*(Rf
.RS
.I "POSIX \- Part 2: Shell and Utilities,"
IEEE Std 1003.2-1992, ISO/IEC 9945-2:1993.
.RF
utilities that can be made run time built-ins.
It is best to set the value of the environment variable
\fB\s-1PACKAGE_\s+1ast\fP to the pathname of the directory
containing the development kit.
Users of \f5nmake\fP\*(Rf
.RS
Glenn Fowler,
Nmake reference needed
.RF
2.3 and above will then be able to
use the rule
.nf
.in .5i
\f5:PACKAGE:	ast\fP
.in
.fi
in their makefiles and not have to specify any \f5-I\fP switches
to the compiler.
.P
A built-in command has a calling convention similar to
the \f5main\fP function of a program,
.nf
.in .5i
\f5int main(int argc, char *argv[])\fP.
.in
.fi
However, instead of \f5main\fP, you must use the function name
\f5b_\fP\fIname\fP, where \fIname\fP is the name
of the built-in you wish to define.
The built-in function takes a third
\f5void*\fP argument, which a simple built-in such as the one below
can ignore (see \f5sh_addbuiltin()\fP below for how a value for it is supplied).
Instead of \f5exit\fP, you need to use \f5return\fP
to terminate your command.
The return value becomes the exit status of the command.
.P
The steps necessary to create and add a run time built-in are
illustrated in the following simple example.
Suppose you wish to add a built-in command named \f5hello\fP
which requires one argument and prints the word hello followed
by its argument.  First, write the following program in the file
\f5hello.c\fP:
.nf
.in .5i
\f5#include     <stdio.h>
int b_hello(int argc, char *argv[], void *context)
{
        /* context is not used by this built-in */
        if(argc != 2)
        {
                fprintf(stderr,"Usage: hello arg\en");
                return(2);
        }
        printf("hello %s\en",argv[1]);
        return(0);
}\fP
.in
.fi
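.P
Because a built-in is an ordinary C function, it can also be exercised
outside \f5ksh\fP while it is being debugged.
The following sketch is not part of the development kit:
it repeats \f5b_hello\fP and adds a hypothetical helper, \f5run_builtin()\fP,
that counts a \f5NULL\fP-terminated argument vector and calls the built-in
with a \f5NULL\fP context, roughly the way a shell would invoke it.
It assumes an ordinary C compiler and the standard \f5<stdio.h>\fP
rather than the development kit version.

```c
#include <stdio.h>

/* b_hello as shown above; context is accepted but not used. */
int b_hello(int argc, char *argv[], void *context)
{
        (void)context;
        if(argc != 2)
        {
                fprintf(stderr, "Usage: hello arg\n");
                return(2);
        }
        printf("hello %s\n", argv[1]);
        return(0);
}

/* Hypothetical debugging helper, not a ksh or libast function:
 * computes argc from a NULL-terminated argv, as main() would see it,
 * and calls the built-in with a NULL context pointer. */
int run_builtin(int (*builtin)(int, char *[], void *), char *argv[])
{
        int argc = 0;

        while(argv[argc])
                argc++;
        return builtin(argc, argv, NULL);
}
```

A small test program can then call \f5run_builtin(b_hello, args)\fP
and check the returned exit status: 0 when exactly one argument follows
the command name, and 2 otherwise.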
.P
Next, the program needs to be compiled.
On some systems it is necessary to specify a compiler
option to produce position independent code
for dynamic linking; the name of this option varies
between compilers, and \f5-pic\fP is used below.
If you do not compile with \f5nmake\fP,
it is important to specify a special include directory
when compiling built-ins,
.nf
.in .5i
\f5cc -pic -I$PACKAGE_ast/include -c hello.c\fP
.in
.fi
since the special version of \f5<stdio.h>\fP
in the development kit is required.
This command generates \f5hello.o\fP in the current
directory.
.P
On some systems, you cannot load \f5hello.o\fP directly;
you must build a shared library instead.
Unfortunately, the method for generating a shared library
differs between operating systems.
However, if you are building with the AT\&T \f5nmake\fP
program, you can use the \f5:LIBRARY:\fP rule to specify
this in a system-independent fashion.
In addition, if you have several built-ins, it is desirable
to build a shared library that contains them all.
.P
The final step is using the built-in.
This can be done with the \f5ksh\fP command \f5builtin\fP.
To load the shared library \f5hello.so\fP and to add
the built-in \f5hello\fP, invoke the command,
.nf
.in .5i
\f5builtin -f hello hello\fP
.in
.fi
The suffix for the shared library can be omitted, in
which case the shell will add an appropriate suffix
for the system that it is loading from.
Once this command has been invoked, you can invoke \f5hello\fP
as you do any other command.
.P
It is often desirable to make a command \fIbuilt-in\fP
the first time that it is referenced.  The first
time \f5hello\fP is invoked, \f5ksh\fP should load and execute it,
whereas for subsequent invocations \f5ksh\fP should just execute the built-in.
This can be done by creating a file named \f5hello\fP
with the following contents:
.nf
.in .5i
\f5function hello
{
        unset -f hello
        builtin -f hello hello
        hello "$@"
}\fP
.in
.fi
This file \f5hello\fP needs to be placed in a directory that is
in your \fB\s-1FPATH\s+1\fP variable.  In addition, the full
pathname for \f5hello.so\fP should be used in this script
so that the run time loader will be able to find this shared library
no matter where the command \f5hello\fP is invoked.
.H 1 "CODING REQUIREMENTS AND CONVENTIONS"
As mentioned above, the entry point for built-ins must be of
the form \f5b_\fP\fIname\fP.
Your built-ins can call functions from the standard C library,
the \fBast\fP library, interface functions provided by \f5ksh\fP,
and your own functions.
You should avoid using any global symbols beginning with
.BR sh_ ,
.BR nv_ ,
and
.B ed_
since these are used by \f5ksh\fP itself.
In addition, \f5#define\fP constants in \f5ksh\fP interface
files use symbols beginning with \fBSH_\fP, so you should
avoid using names beginning with \fBSH_\fP.
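.P
If you want to check your own sources against this rule, a simple
test on each global name is enough.  The function below is a
hypothetical helper, not part of the development kit; its prefix list
is taken directly from the paragraph above.

```c
#include <string.h>

/* Hypothetical lint helper: returns 1 if name begins with one of the
 * prefixes reserved for ksh (sh_, nv_, ed_ and SH_), otherwise 0. */
int uses_reserved_prefix(const char *name)
{
        static const char *const reserved[] = { "sh_", "nv_", "ed_", "SH_" };
        size_t i;

        for(i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++)
                if(strncmp(name, reserved[i], strlen(reserved[i])) == 0)
                        return 1;
        return 0;
}
```

For example, \f5b_hello\fP passes the check, while a helper named
\f5sh_hello\fP should be renamed.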
.H 2 "Header Files"
The development kit provides a portable interface
to the C library and to libast.
The header files in the development kit are compatible with
K&R C\*(Rf,
.RS
Brian W. Kernighan and Dennis M. Ritchie,
.IR "The C Programming Language" ,
Prentice Hall, 1978.
.RF
ANSI-C\*(Rf,
.RS
American National Standard for Information Systems \- Programming
Language \- C, ANSI X3.159-1989.
.RF
and C++\*(Rf.
.RS
Bjarne Stroustrup,
.IR "C++" ,
Addison Wesley, xxxx
.RF
.P
The best thing to do is to include the header file \f5<shell.h>\fP.
This header file causes the \f5<ast.h>\fP header, the
\f5<error.h>\fP header and the \f5<stak.h>\fP
header to be included as well as defining prototypes
for functions that you can call to get shell
services for your built-ins.
The header file \f5<ast.h>\fP
provides prototypes for many \fBlibast\fP functions
and all the symbol and function definitions from the
ANSI-C headers \f5<stddef.h>\fP,
\f5<stdlib.h>\fP, \f5<stdarg.h>\fP, \f5<limits.h>\fP,
and \f5<string.h>\fP.
It also provides all the symbols and definitions for the
POSIX\*(Rf
.RS
.I "POSIX \- Part 1: System Application Program Interface,"
IEEE Std 1003.1-1990, ISO/IEC 9945-1:1990.
.RF
headers \f5<sys/types.h>\fP, \f5<fcntl.h>\fP, and
\f5<unistd.h>\fP.
You should include \f5<ast.h>\fP instead of one or more of
these headers.
The \f5<error.h>\fP header provides the interface to the error
and option parsing routines described below.
The \f5<stak.h>\fP header provides the interface to the memory
allocation routines described below.
.P
Programs that want to use the information in \f5<sys/stat.h>\fP
should include the file \f5<ls.h>\fP instead.
This provides the complete POSIX interface to \f5stat()\fP
related functions even on non-POSIX systems.
.P
.H 2 "Input/Output"
\f5ksh\fP uses \fBsfio\fP,
the Safe/Fast I/O library\*(Rf,
.RS
David Korn and Kiem-Phong Vo,
.IR "SFIO - A Safe/Fast Input/Output library,"
Proceedings of the Summer Usenix,
pp. , 1991.
.RF
to perform all I/O operations.
The \fBsfio\fP library, which is part of \fBlibast\fP,
provides a superset of the functionality provided by the standard
I/O library defined in ANSI-C.
If none of the additional functionality is required,
and if you are not familiar with \fBsfio\fP and
you do not want to spend the time learning it,
then you can use \fBsfio\fP via the \fBstdio\fP library
interface.  The development kit contains the header \f5<stdio.h>\fP
which maps \fBstdio\fP calls to \fBsfio\fP calls.
In most instances the mapping is done
by macros or inline functions so that there is no overhead.
The man page for the \fBsfio\fP library is in an Appendix.
.P
However, there are some very nice extensions and
performance improvements in \fBsfio\fP,
and if you plan any major extensions I recommend
that you use it natively.
.H 2 "Error Handling"
For error messages it is best to use the \fBast\fP library
function \f5errormsg()\fP rather than sending output to
\f5stderr\fP or the equivalent \f5sfstderr\fP directly.
Using \f5errormsg()\fP will make error messages appear
more uniform to the user.
Furthermore, using \f5errormsg()\fP should make it easier
to do error message translation for other locales
in future versions of \f5ksh\fP.
.P
The first argument to
\f5errormsg()\fP specifies the dictionary in which the string
will be searched for translation.
The second argument to \f5errormsg()\fP contains the error type
and value.  The third argument is a \fIprintf\fP style format
and the remaining arguments are arguments to be printed
as part of the message.  A new-line is inserted at the
end of each message and therefore should not appear as
part of the format string.
The second argument should be one of the following:
.VL .5i
.LI \f5ERROR_exit(\fP\fIn\fP\f5)\fP:
If \fIn\fP is non-zero, the built-in will exit with value \fIn\fP after
printing the message.
.LI \f5ERROR_system(\fP\fIn\fP\f5)\fP:
Exit the built-in with exit value \fIn\fP after printing the message.
The text corresponding to \f5errno\fP is appended to the message,
enclosed within \f5[\ ]\fP.
.LI \f5ERROR_usage(\fP\fIn\fP\f5)\fP:
Will generate a usage message and exit.  If \fIn\fP is non-zero,
the exit value will be 2.  Otherwise the exit value will be 0.
.LI \f5ERROR_debug(\fP\fIn\fP\f5)\fP:
Will print a level \fIn\fP debugging message and will then continue.
.LI \f5ERROR_warn(\fP\fIn\fP\f5)\fP:
Prints a warning message. \fIn\fP is ignored.
.LE
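.P
The rule that the trailing new-line is supplied for you can be
illustrated without \fBlibast\fP.
The function below is a hypothetical stand-in, not \f5errormsg()\fP itself:
it ignores the dictionary and the error type, formats the message
\fIprintf\fP style into a buffer, and then appends the new-line.
The \f5"name: "\fP prefix is an assumption made only for this sketch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for errormsg(): writes "cmd: " followed by
 * the formatted message and a new-line into buf, so the format string
 * should not end in a new-line of its own.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_message(char *buf, size_t size, const char *cmd,
        const char *format, ...)
{
        va_list ap;
        int n, m;

        n = snprintf(buf, size, "%s: ", cmd);
        if(n < 0 || (size_t)n >= size)
                return -1;
        va_start(ap, format);
        m = vsnprintf(buf + n, size - n, format, ap);
        va_end(ap);
        /* room is needed for the message, the new-line and the NUL */
        if(m < 0 || (size_t)n + (size_t)m + 1 >= size)
                return -1;
        buf[n + m] = '\n';
        buf[n + m + 1] = '\0';
        return n + m + 1;
}
```

Calling \f5format_message(buf, sizeof buf, "hello", "bad argument %s", "x")\fP
stores \f5hello: bad argument x\fP followed by a single new-line.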
.H 2 "Option Parsing"
The first thing that a built-in should do is to check
the arguments for correctness and to print any usage
messages on standard error.
For consistency with the rest of \f5ksh\fP, it is best
to use the \f5libast\fP functions \f5optget()\fP and
\f5optusage()\fP for this
purpose.
The header \f5<error.h>\fP includes prototypes for
these functions.
The \f5optget()\fP function is similar to the
System V C library function \f5getopt()\fP,
but provides some additional capabilities.
Built-ins that use \f5optget()\fP provide a more
consistent user interface.
.P
The \f5optget()\fP function is invoked as
.nf
.in .5i
\f5int optget(char *argv[], const char *optstring)\fP
.in
.fi
where \f5argv\fP is the argument list and \f5optstring\fP
is a string that specifies the allowable arguments and
additional information that is used to format \fIusage\fP
messages.
In fact, a complete man page in \f5troff\fP or \f5html\fP
can be generated by passing a usage string as described
by the \f5getopts\fP command.
Like \f5getopt()\fP,
single letter options are represented by the letter itself,
and options that take a string argument are followed by the \f5:\fP
character.
Option strings have the following special characters:
.VL .5i
.LI \f5:\fP
Used after a letter option to indicate that the option
takes an option argument.
The variable \f5opt_info.arg\fP will point to this
value after the given argument is encountered.
.LI \f5#\fP
Used after a letter option to indicate that the option
can only take a numerical value.
The variable \f5opt_info.num\fP will contain this
value after the given argument is encountered.
.LI \f5?\fP
Used after a \f5:\fP or \f5#\fP
to indicate that the
preceding option argument is not required.
.LI \f5[\fP...\f5]\fP
After a \f5:\fP or \f5#\fP (and after the optional \f5?\fP), the characters contained
inside the brackets are used to identify the option
argument when generating a \fIusage\fP message.
.LI \fIspace\fP
The remainder of the string will only be used when generating
usage messages.
.LE
.P
The \f5optget()\fP function returns the matching option letter if
one of the legal options is matched.
Otherwise, \f5optget()\fP returns
.VL .5i
.LI \f5':'\fP
If there is an error.  In this case the variable \f5opt_info.arg\fP
contains the error string.
.LI \f50\fP
Indicates the end of options.
The variable \f5opt_info.index\fP contains the number of arguments
processed.
.LI \f5'?'\fP
A usage message has been requested.
You normally call \f5optusage()\fP to generate and display
the usage message.
.LE
.P
The following is an example of the option parsing portion
of the \f5wc\fP utility.
.nf
.in +5
\f5#include <shell.h>
char	*file = 0;
int	n;
while((n = optget(argv, "xf:[file]"))) switch(n)
{
	case 'x':
		/* record that -x was given */
		break;
	case 'f':
		file = opt_info.arg;
		break;
	case ':':
		/* opt_info.arg holds the error text; ERROR_exit(0) does not exit */
		error(ERROR_exit(0), "%s", opt_info.arg);
		break;
	case '?':
		error(ERROR_usage(2), "%s", opt_info.arg);
		break;
}
/* skip the options; argv now points at the operands */
argv += opt_info.index;\fP
.in
.fi
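.P
To make the option string syntax concrete, the following sketch
decodes the simple subset described above: a letter, an optional
\f5:\fP or \f5#\fP, an optional \f5?\fP, an optional \f5[\fP...\f5]\fP
name, and a space that ends the options.
It is a hypothetical illustration written for this memo, not the \f5optget()\fP
implementation, and the \f5struct optdesc\fP type is invented here.

```c
#include <string.h>

/* Hypothetical description of one option; not a libast type. */
struct optdesc
{
        char    letter;         /* the option letter */
        char    argtype;        /* 0: no argument, ':' string, '#' number */
        int     optional;       /* 1 if '?' marks the argument optional */
        char    name[32];       /* text between [ and ], for usage output */
};

/* Decodes optstring into at most max entries of d and returns the
 * number decoded.  Decoding stops at the first space, because the
 * rest of the string is only used for usage messages. */
int parse_optstring(const char *s, struct optdesc *d, int max)
{
        int n = 0;

        while(*s && *s != ' ' && n < max)
        {
                struct optdesc *o = &d[n++];

                memset(o, 0, sizeof(*o));
                o->letter = *s++;
                if(*s != ':' && *s != '#')
                        continue;
                o->argtype = *s++;
                if(*s == '?')
                {
                        o->optional = 1;
                        s++;
                }
                if(*s == '[')
                {
                        const char *end = strchr(s, ']');
                        size_t len;

                        if(!end)
                                break;  /* unterminated [: stop decoding */
                        len = (size_t)(end - s - 1);
                        if(len >= sizeof(o->name))
                                len = sizeof(o->name) - 1;
                        memcpy(o->name, s + 1, len);
                        o->name[len] = '\0';
                        s = end + 1;
                }
        }
        return n;
}
```

Applied to the string from the example above, \f5"xf:[file]"\fP, it
yields two entries: \f5x\fP without an argument, and \f5f\fP with a
string argument named \f5file\fP.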
.H 2 "Storage Management"
It is important that any memory used by your built-in
be returned.  Otherwise, if your built-in is called frequently,
\f5ksh\fP will eventually run out of memory.
You should avoid using \f5malloc()\fP for memory that must
be freed before returning from your built-in, because by default,
\f5ksh\fP will terminate your built-in in the event of an
interrupt and the memory will not be freed.
.P
The best way to allocate variable sized storage is
through calls to the \fBstak\fP library
which is included in \fBlibast\fP
and which is used extensively by \f5ksh\fP itself.
Objects allocated with the \f5stakalloc()\fP
function are freed when your function completes
or aborts.
The \fBstak\fP library provides a convenient way to
build variable length strings and other objects dynamically.
The man page for the \fBstak\fP library is contained
in the Appendix.
.P
Before \f5ksh\fP calls each built-in command, it saves
the current stack location and restores it after
it returns.
It is not necessary to save and restore the stack
location in the \f5b_\fP entry function,
but you may want to write functions that use this stack
and restore it when leaving the function.
The following coding convention will do this in
an efficient manner:
.nf
.in .5i
\f5void \fIyourfunction\fP\f5(void)
{
        char	*savebase;
        int	saveoffset;
        /* if an object is being built, freeze it and remember it */
        if((saveoffset = staktell()))
        	savebase = stakfreeze(0);
        \fP...\f5
        /* restore the stack to its state on entry */
        if(saveoffset)
        	stakset(savebase,saveoffset);
        else
        	stakseek(0);
}\fP
.in
.fi
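.P
Stack-style allocation is cheap because everything allocated after a
saved location can be freed in one step by restoring that location,
which is the idea behind the convention above.
The sketch below shows that mark-and-release idea with a hypothetical
fixed-size arena; it is not the \fBstak\fP library, whose interface is
described in the Appendix, and the \f5arena_\fP names are invented for
this illustration.  It assumes a C11 compiler for \f5_Alignas\fP.

```c
#include <stddef.h>

#define ARENA_SIZE 4096

/* Hypothetical arena, not the stak library: memory is handed out by
 * advancing an offset and reclaimed by moving the offset back. */
struct arena
{
        size_t used;
        _Alignas(max_align_t) unsigned char buf[ARENA_SIZE];
};

/* Returns n bytes of suitably aligned storage, or NULL if full. */
void *arena_alloc(struct arena *a, size_t n)
{
        size_t align = _Alignof(max_align_t);
        size_t need;
        void *p;

        if(n > ARENA_SIZE)
                return NULL;
        need = (n + align - 1) / align * align; /* keep the next object aligned */
        if(need > ARENA_SIZE - a->used)
                return NULL;
        p = a->buf + a->used;
        a->used += need;
        return p;
}

/* Remembers the current position, like saving the stack location. */
size_t arena_mark(const struct arena *a)
{
        return a->used;
}

/* Frees everything allocated since mark in a single step. */
void arena_release(struct arena *a, size_t mark)
{
        a->used = mark;
}
```

As with the \f5stakset()\fP and \f5stakseek()\fP convention above, a
function that takes a mark on entry and releases it on exit leaves the
arena exactly as it found it.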
.H 1 "CALLING \f5ksh\fP SERVICES"
Some of the more interesting applications are those that extend
the functionality of \f5ksh\fP in application-specific directions.
A prime example of this is the X-windows extension which adds
built-ins to create and delete widgets.
The \fBnval\fP library is used to interface with the shell
name space.
The \fBshell\fP library is used to access other shell services.
.H 2 "The nval library"
A great deal of power is derived from the ability to use
portions of the hierarchical variable namespace provided by \f5ksh-93\fP
and turn these names into active objects.
.P
The \fBnval\fP library is used to interface with shell
variables.
A man page for this library is provided in an Appendix.
You need to include the header \f5<nval.h>\fP
to access the functions defined in the \fBnval\fP library.
All the functions provided by the \fBnval\fP library begin
with the prefix \f5nv_\fP.
Each shell variable is an object in an associative table
that is referenced by name.
The type \f5Namval_t*\fP is a pointer to a shell variable.
To operate on a shell variable, you first get a handle
to the variable with the \f5nv_open()\fP function
and then supply the handle returned as the first
argument of the function that provides an operation
on the variable.
You must call \f5nv_close()\fP when you are finished
using this handle so that the space can be freed once
the value is unset.
The two most frequent operations are to get the value of
the variable, and to assign a value to the variable.
The \f5nv_getval()\fP function returns a pointer to the
value of the variable.
In some cases the pointer returned is to a region that
will be overwritten by the next \f5nv_getval()\fP call,
so if the value isn't used immediately, it should
be copied.
Many variables can also generate a numeric value.
The \f5nv_getnum()\fP function returns a numeric
value for the given variable pointer, calling the
arithmetic evaluator if necessary.
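.P
Because the value returned by \f5nv_getval()\fP may live in a buffer
that the next call overwrites, a value that must survive should be
copied first.  The sketch below shows the hazard with a hypothetical
getter, \f5fake_getval()\fP, that deliberately reuses one static buffer,
and a hypothetical \f5copy_value()\fP that copies into storage owned by
the caller.  Neither function is part of the \fBnval\fP library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical getter that, like nv_getval() in some cases, returns
 * a pointer into a buffer that is reused by the next call. */
const char *fake_getval(int n)
{
        static char buf[32];

        snprintf(buf, sizeof(buf), "%d", n);
        return buf;
}

/* Copies value into dst (size bytes), truncating if necessary.
 * Returns 0 if the whole value fit, -1 if it was truncated. */
int copy_value(char *dst, size_t size, const char *value)
{
        size_t len = strlen(value);

        if(size == 0)
                return -1;
        if(len >= size)
        {
                memcpy(dst, value, size - 1);
                dst[size - 1] = '\0';
                return -1;
        }
        memcpy(dst, value, len + 1);
        return 0;
}
```

After a second call to \f5fake_getval()\fP, the pointer from the first
call already shows the new value, while the copy still holds the
original one.  Copying into a caller-supplied buffer also avoids
\f5malloc()\fP, in line with the storage advice given earlier.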
.P
The \f5nv_putval()\fP function is used to assign a new
value to a given variable.
The second argument to \f5nv_putval()\fP is the value
to be assigned
and the third argument is a \fIflag\fP which
is used in interpreting the second argument.
.P
Each shell variable can have one or more attributes.
The \f5nv_isattr()\fP function is used to test for the existence
of one or more attributes.
See the appendix for a complete list of attributes.
.P
By default, each shell variable passively stores the string you
give with \f5nv_putval()\fP, and returns the value
with \f5nv_getval()\fP.  However, it is possible to turn
any node into an active entity by assigning functions
to it that will be called whenever \f5nv_putval()\fP
and/or \f5nv_getval()\fP is called.
In fact there are up to five functions that can be
associated with each variable to override the
default actions.
The type \f5Namfun_t\fP is used to define these functions.
Only those that are non-\f5NULL\fP override the
default actions.
To override the default actions, you must allocate an
instance of \f5Namfun_t\fP, and then assign
the functions that you wish to override.
The \f5putval()\fP
function is called by the \f5nv_putval()\fP function.
A \f5NULL\fP for the \fIvalue\fP argument
indicates a request to unset the variable.
The \fItype\fP argument might contain the \f5NV_INTEGER\fP
bit, so you should be prepared to do a conversion if
necessary.
The \f5getval()\fP
function is called by \f5nv_getval()\fP
and must return a string.
The \f5getnum()\fP
function is called by the arithmetic evaluator
and must return a \f5double\fP.
If it is omitted, \f5nv_getval()\fP is called instead and
the result is converted to a number.
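.P
The rule that only non-\f5NULL\fP functions override the defaults amounts
to a table of optional function pointers that is consulted before the
default action.  The sketch below models that dispatch with invented
types, \f5struct var\fP and \f5struct fun\fP; they are not \f5Namval_t\fP
or \f5Namfun_t\fP, whose real definitions are in \f5<nval.h>\fP.
It also follows the fallback described above: with no \f5getnum()\fP,
the string value is converted to a number.

```c
#include <stdlib.h>

struct var;

/* Hypothetical override table; a NULL entry keeps the default action. */
struct fun
{
        const char *(*getval)(struct var *, struct fun *);
        double (*getnum)(struct var *, struct fun *);
};

/* Hypothetical variable: a stored string plus optional overrides. */
struct var
{
        const char *value;
        struct fun *disc;
};

const char *var_getval(struct var *v)
{
        if(v->disc && v->disc->getval)
                return v->disc->getval(v, v->disc);
        return v->value;                /* default: passive storage */
}

double var_getnum(struct var *v)
{
        const char *s;

        if(v->disc && v->disc->getnum)
                return v->disc->getnum(v, v->disc);
        s = var_getval(v);              /* fallback: convert the string */
        return s ? strtod(s, NULL) : 0.0;
}

/* Example override that makes a variable "active". */
const char *always_seven(struct var *v, struct fun *f)
{
        (void)v;
        (void)f;
        return "7";
}
```

A variable whose table overrides only \f5getval()\fP with
\f5always_seven()\fP reports the string 7, and its numeric value, 7,
comes from that overridden string through the fallback path.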
.P
The functionality of a variable can further be increased
by adding discipline functions that
can be associated with the variable.
A discipline function allows a script that uses your
variable to define functions whose names are
\fIvarname\fP\f5.\fP\fIdiscname\fP
where \fIvarname\fP is the name of the variable, and \fIdiscname\fP
is the name of the discipline.
When the user defines such a function, the \f5settrap()\fP
function will be called with the name of the discipline and
a pointer to the parse tree corresponding to the discipline
function.
The application determines when these functions are actually
executed.
By default, \f5ksh\fP defines \f5get\fP,
\f5set\fP, and \f5unset\fP as discipline functions.
.P
In addition, it is possible to provide a data area that
will be passed as an argument to
each of these functions whenever any of them is called.
To have private data, you need to define and allocate a structure
that looks like
.nf
.in .5i
\f5struct \fIyours\fP
{
        Namfun_t	fun;
        \fIyour_data_fields\fP;
};\fP
.in
.fi
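.P
Placing \f5fun\fP first presumably lets a discipline function that is
handed a \f5Namfun_t*\fP convert it back into a pointer to your own
structure; C guarantees that a pointer to a structure, suitably
converted, points to its first member and vice versa.
The sketch below shows this with an invented \f5struct fakefun\fP standing
in for \f5Namfun_t\fP; it is an illustration, not the \fBnval\fP interface.

```c
/* Hypothetical stand-in for Namfun_t. */
struct fakefun
{
        int (*hook)(struct fakefun *);
};

/* Private data follows the embedded function block, like struct yours. */
struct counter
{
        struct fakefun  fun;    /* must stay the first member */
        int             calls;  /* private data */
};

/* The hook only receives the fakefun pointer; because fun is the first
 * member, converting it back to struct counter * is well defined. */
int count_calls(struct fakefun *fp)
{
        struct counter *c = (struct counter *)fp;

        return ++c->calls;
}
```

Each call through \f5fun.hook\fP updates the \f5calls\fP field of the
enclosing structure, even though the hook never sees that structure's type
in its argument list.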
.H 2 "The shell library"
There are several functions that are used by \f5ksh\fP itself
that can also be called from built-in commands.
The man pages for these routines are in the Appendix.
.P
The \f5sh_addbuiltin()\fP function can be used to add or delete
built-in commands.  It takes the name of the built-in, the
address of the function that implements the built-in, and
a \f5void*\fP pointer that will be passed to this function
as the third argument whenever it is invoked.
If the function address is \f5NULL\fP, the specified built-in
will be deleted.  However, special built-in functions cannot
be deleted or modified.
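.P
The add, replace, and delete rules can be summarized by a small table
keyed by name.  The following sketch is a hypothetical model written for
this memo, not the implementation of \f5sh_addbuiltin()\fP; the names
\f5add_builtin()\fP, \f5make_special()\fP, \f5run_by_name()\fP, and
\f5b_status()\fP are invented, and the status 127 for an unknown name
merely mirrors the usual shell convention for a command that is not found.

```c
#include <string.h>

typedef int (*builtin_fn)(int, char *[], void *);

/* Hypothetical table entry, not ksh's internal representation. */
struct entry
{
        const char      *name;
        builtin_fn      fn;
        void            *context;       /* passed as the third argument */
        int             special;        /* special built-ins are fixed */
};

#define MAXBUILTINS     16
static struct entry table[MAXBUILTINS];

static struct entry *find(const char *name)
{
        int i;

        for(i = 0; i < MAXBUILTINS; i++)
                if(table[i].name && strcmp(table[i].name, name) == 0)
                        return &table[i];
        return NULL;
}

/* Adds or replaces name; a NULL fn deletes it.  Returns 0 on success,
 * -1 for a special built-in or when the table is full. */
int add_builtin(const char *name, builtin_fn fn, void *context)
{
        struct entry *e = find(name);
        int i;

        if(e && e->special)
                return -1;
        if(!fn)
        {
                if(e)
                        e->name = NULL;         /* delete */
                return 0;
        }
        for(i = 0; i < MAXBUILTINS && !e; i++)
                if(!table[i].name)
                        e = &table[i];          /* first free slot */
        if(!e)
                return -1;
        e->name = name;
        e->fn = fn;
        e->context = context;
        e->special = 0;
        return 0;
}

/* Marks an existing entry as special (for this model only). */
int make_special(const char *name)
{
        struct entry *e = find(name);

        if(!e)
                return -1;
        e->special = 1;
        return 0;
}

/* Calls a built-in by name with its registered context. */
int run_by_name(const char *name, int argc, char *argv[])
{
        struct entry *e = find(name);

        return e ? e->fn(argc, argv, e->context) : 127;
}

/* Sample built-in: returns the integer its context points to. */
int b_status(int argc, char *argv[], void *context)
{
        (void)argc;
        (void)argv;
        return context ? *(int *)context : 0;
}
```

Registering \f5b_status\fP twice with different context pointers simply
replaces the first registration, whereas an entry marked special rejects
both replacement and deletion.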
.P
The \f5sh_fmtq()\fP function takes a string and returns
a string that is quoted as necessary so that it can
be used as shell input.
This function is used to implement the \f5%q\fP format
of the shell built-in \f5printf\fP command.
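.P
One common way to quote a string for the shell is to leave it alone
when every character is harmless and otherwise to enclose it in single
quotes, writing each embedded single quote as \f5'\e''\fP.
The sketch below implements that rule; it is a hypothetical
\f5quote_for_shell()\fP written for this memo, and \f5sh_fmtq()\fP may
choose a different quoting style for the same input.
It uses \f5malloc()\fP only to stay self-contained; inside a built-in the
storage advice given earlier applies.

```c
#include <stdlib.h>
#include <string.h>

/* Characters that never need quoting in this sketch. */
static int is_safe(char c)
{
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               (c >= '0' && c <= '9') || strchr("_-./,:+@%", c) != NULL;
}

/* Returns a malloc'ed copy of s quoted for POSIX shell input:
 * unchanged if every character is safe, otherwise wrapped in single
 * quotes with each embedded ' written as '\''.  Caller frees. */
char *quote_for_shell(const char *s)
{
        size_t len = strlen(s), quotes = 0, i;
        int safe = len > 0;
        char *out, *p;

        for(i = 0; i < len; i++)
        {
                if(s[i] == '\'')
                        quotes++;
                if(!is_safe(s[i]))
                        safe = 0;
        }
        if(safe)
        {
                out = malloc(len + 1);
                if(out)
                        memcpy(out, s, len + 1);
                return out;
        }
        /* two enclosing quotes, 3 extra chars per embedded quote, NUL */
        out = malloc(len + 3 * quotes + 3);
        if(!out)
                return NULL;
        p = out;
        *p++ = '\'';
        for(i = 0; i < len; i++)
        {
                if(s[i] == '\'')
                {
                        memcpy(p, "'\\''", 4);
                        p += 4;
                }
                else
                        *p++ = s[i];
        }
        *p++ = '\'';
        *p = '\0';
        return out;
}
```

For example, \f5it's\fP becomes \f5'it'\e''s'\fP and \f5a b\fP becomes
\f5'a b'\fP, while \f5abc\fP is returned unchanged.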
.P
The \f5sh_parse()\fP function returns a parse tree corresponding
to a given file stream.  The tree can be executed by supplying
it as the first argument to
the \f5sh_trap()\fP function and giving a value of \f51\fP as the
second argument.
Alternatively, the \f5sh_trap()\fP function can parse and execute
a string by passing the string as the first argument and giving \f50\fP
as the second argument.
.P
The \f5sh_isoption()\fP function can be used to see whether one
or more of the option settings is enabled.
632