PP 3
NAME \" @(#)pp.3 (gsf@research.att.com) 04/01/92
pp - ANSI C preprocessor library
SYNOPSIS
.EX
:PACKAGE: ast
#include <pp.h>
%include "pptokens.yacc"
-lpp
.EE
DESCRIPTION
The pp library provides a tokenizing implementation of the C language preprocessor and supports K&R (Reiser), ANSI and C++ dialects. The preprocessor consists of 12 public functions, a global character class table accessed by macros, and a single global struct with 10 public elements.

pp operates in two modes. Standalone mode is used to implement the traditional standalone C preprocessor. Tokenizing mode provides a function interface to a stream of preprocessed tokens. pp is by default ANSI; the only default predefined symbols are .L __STDC__ and .LR __STDPP__ . Dialects (K&R, C++) and local conventions are determined by compiler specific probe (1) information that is included at runtime. The probe information can be overridden by providing a file .L pp_default.h with pragmas and definitions for each compiler implementation. This file is usually located in the compiler specific default include directory.

Directive, command line argument, option and pragma syntax is described in cpp (1). pp specific semantics are described below. Most semantic differences with standard or classic implementations are in the form of optimizations.

Options and pragmas map to .L ppop function calls described below. For the remaining descriptions, ``setting ppop(PP_operation)'' is a shorthand for calling .L ppop with the arguments appropriate for PP_operation.

The library interface describes only the public functions and struct elements. Static structs and pointers to structs are provided by the library. The user should not attempt to allocate structs. In particular, .L sizeof is meaningless for pp supplied structs.

The global struct .L pp provides readonly information. Any changes to .L pp must be done using the functions described below. .L pp has the following public elements:

.L "char* version" The pp implementation version string.

.L "char* lineid" The current line sync directive name. Used for standalone line sync output. The default value is the empty string. See the .L ppline function below.

.L "char* outfile" The current output file name.

.L "char* pass" The pragma pass name for pp. The default value is .LR pp .

.L "char* token" The string representation for the current input token.

.L "int flags" The inclusive or of:

.L PP_comment Set if .L ppop(PP_COMMENT) was set.

.L PP_compatibility Set if .L ppop(PP_COMPATIBILITY) was set.

.L PP_linefile Set if standalone line syncs require a file argument.

.L PP_linetype Set if standalone line syncs require a third argument. The third argument is .L 1 for include file push, .L 2 for include file pop and null otherwise.

.L PP_strict Set if .L ppop(PP_STRICT) was set.

.L PP_transition Set if .L ppop(PP_TRANSITION) was set.

.L "struct ppdirs* lcldirs" The list of directories to be searched for "..." include files. If the first directory name is "" then it is replaced by the directory of the including file at include time. The public elements of .L "struct ppdirs" are:

.L "char* name" The directory pathname.

.L "struct ppdirs* next" The next directory, .L 0 if it is the last in the list.

.L "struct ppdirs* stddirs" .L pp.stddirs->next is the list of directories to be searched for <...> include files. This list may be .LR 0 .

.L "struct ppsymbol* symbol" If .L ppop(PP_COMPILE) was set then .L pp.symbol points to the symbol table entry for the current identifier token. .L pp.symbol is undefined for non-identifier tokens. Once defined, an identifier will always have the same .L ppsymbol pointer. If .L ppop(PP_NOHASH) was also set then .L pp.symbol is defined for macro and keyword tokens and .L 0 for all other identifiers. The elements of .L "struct ppsymbol" are:

.L "char* name" The identifier name.

.L "int flags" The inclusive or of the following flags:

.L SYM_ACTIVE Currently being expanded.

.L SYM_BUILTIN Builtin macro.

.L SYM_DISABLED Macro expansion currently disabled.

.L SYM_FUNCTION Function-like macro.

.L SYM_INIT Initialization macro.

.L SYM_KEYWORD Keyword identifier.

.L SYM_LOADED Loaded checkpoint macro.

.L SYM_MULTILINE .L #macdef macro.

.L SYM_NOEXPAND No identifiers in macro body.

.L SYM_PREDEFINED Predefined macro.

.L SYM_PREDICATE Also a .L #assert predicate.

.L SYM_READONLY Readonly macro.

.L SYM_REDEFINE Ok to redefine.

.L SYM_VARIADIC Variadic function-like macro.

.L SYM_UNUSED First unused symbol flag bit index. The bits from .L (1<<SYM_UNUSED) on are initially unset and may be set by the user.

.L "struct ppmacro* macro" Non-zero if the identifier is a macro. .L "int macro->arity" is the number of formal arguments for function-like macros and .L "char* macro->value" is the macro definition value, a .L 0 terminated string that may contain internal mark sequences.

.L "char* value" Initially set to .L 0 and never modified by pp . This field may be set by the user.

.L "Hash_table_t* symtab" The macro and identifier .L "struct ppsymbol" hash table. The hash (3) routines may be used to examine the table, with the exception that the following macros must be used for individual .L pp.symtab symbol lookup:

.L "struct ppsymbol* ppsymget(Hash_table_t* table, char* name)" Return the .L ppsymbol pointer for .LR name , 0 if .L name is not defined.

.L "struct ppsymbol* ppsymset(Hash_table_t* table, char* name)" Return the .L ppsymbol pointer for .LR name . If .L name is not defined then allocate and return a new .L ppsymbol for it.
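
The include directory lists described above lend themselves to a short illustration. The following is a minimal sketch, not library code: struct dirs is a local stand-in that mirrors only the public name and next elements of struct ppdirs, and dir_of and search_order are hypothetical helpers. It shows how an empty first name in an lcldirs style list could be replaced by the directory of the including file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for the public elements of struct ppdirs. */
struct dirs {
    const char *name;
    struct dirs *next;
};

/* Copy the directory part of path into buf; "." if path has no slash. */
void dir_of(const char *path, char *buf, size_t size)
{
    const char *slash = strrchr(path, '/');
    size_t n;

    assert(size > 1);
    if (!slash) {
        snprintf(buf, size, ".");
        return;
    }
    n = (size_t)(slash - path);
    if (n == 0)
        n = 1;                  /* keep the root "/" */
    if (n >= size)
        n = size - 1;
    memcpy(buf, path, n);
    buf[n] = '\0';
}

/*
 * Write the effective search order into out as a colon separated list
 * and return the number of directories written. An empty first name
 * is replaced by the directory of the including file.
 */
size_t search_order(const struct dirs *list, const char *including,
                    char *out, size_t size)
{
    const struct dirs *d;
    char first[256];
    size_t count = 0;
    size_t used = 0;

    assert(size > 0);
    out[0] = '\0';
    for (d = list; d; d = d->next) {
        const char *name = d->name;
        int n;

        if (d == list && name[0] == '\0') {
            dir_of(including, first, sizeof(first));
            name = first;
        }
        n = snprintf(out + used, size - used, "%s%s", count ? ":" : "", name);
        if (n < 0 || (size_t)n >= size - used) {
            out[used] = '\0';   /* drop the partial entry and stop */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

The sketch covers only the "..." list; the source says nothing about empty names in the <...> list, so none are handled here.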

Error messages are reported using error (3) and the following globals relate to pp :

.L "int error_info.errors" The level 2 error count. Error levels above 2 cause immediate exit. If .L error_info.errors is non-zero then the user program exit status should also be non-zero.

.L "char* error_info.file" The current input file name.

.L "int error_info.line" The current input line number.

.L "int error_info.trace" The debug trace level, .L 0 by default. Larger negative numbers produce more trace information. Enabled when the user program is linked with the -g cc (1) option.

.L "int error_info.warnings" The level 1 error count. Warnings do not affect the exit status.

The functions are:

.L "extern int ppargs(char** argv, int last);" Passed to optjoin (3) to parse cpp (1) style options and arguments. The user may also supply application specific option parsers. Also handles non-standard options like the Sun .L -undef and GNU .LR -trigraphs . Hello in there, ever hear of getopt (3)?

.L "extern void ppcpp(void);" This is the standalone cpp (1) entry point. .L ppcpp consumes all of the input and writes the preprocessed text to the output. A single call to .L ppcpp is equivalent to, but more efficient than:
.EX
ppop(PP_SPACEOUT, 1);
while (pplex())
	ppprintf(" %s", pp.token);
.EE

.L "extern int ppcomment(char* head, char* comment, char* tail, int line);" The default comment handler that passes comments to the output. May be used as an argument to .LR ppop(PP_COMMENT) , or the user may supply an application specific handler. .L head is the comment head text, .L "/*" for C and .L "//" for C++, .L comment is the comment body, .L tail is the comment tail text, .L "*/" for C and newline for C++, and .L line is the comment starting line number.
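
An application specific comment handler takes the same arguments as ppcomment. The following is a minimal sketch under stated assumptions: log_comment, comment_log and comment_count are hypothetical names, the log buffer stands in for whatever sink the application uses, and since the meaning of the return value is not documented here the sketch always returns 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char comment_log[512];   /* hypothetical sink for collected comments */
static int comment_count;       /* number of comments recorded so far */

/*
 * Record each comment as "line:head+body+tail". The argument list
 * mirrors ppcomment; the return value convention is an assumption.
 */
int log_comment(char *head, char *comment, char *tail, int line)
{
    size_t used = strlen(comment_log);
    int n = snprintf(comment_log + used, sizeof(comment_log) - used,
                     "%d:%s%s%s", line, head, comment, tail);

    if (n < 0 || (size_t)n >= sizeof(comment_log) - used) {
        comment_log[used] = '\0';   /* log is full: drop this comment */
        return 0;
    }
    comment_count++;
    return 0;
}
```

Because the tail of a C++ comment is a newline, entries for C++ comments end their line in the log while entries for C comments do not.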

.L "extern void pperror(int level, char* format, ...);" Equivalent to error (3). All pp error and warning messages pass through .LR pperror . The user may link with an application specific .L pperror to override the library default.

.L "extern int ppincref(char* parent, char* file, int line, int push);" The default include reference handler that outputs .L file to the standard error. May be used as an argument to .LR ppop(PP_INCREF) , or the user may supply an application specific handler. .L parent is the including file name, .L file is the current include file name, .L line is the current line number in .LR file , and .L push is non-zero if .L file is being pushed or .L 0 if .L file is being popped.
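
The push argument makes it possible to reconstruct include nesting. The following is a minimal sketch of an application specific handler with the ppincref argument list: trace_incref, include_tree and include_depth are hypothetical names, and the return value of 0 is an assumption. Each pushed file is recorded with two spaces of indentation per nesting level; pops only reduce the depth.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char include_tree[512];  /* hypothetical sink for the include tree */
static int include_depth;       /* current include nesting level */

int trace_incref(char *parent, char *file, int line, int push)
{
    size_t used;
    int n;

    (void)parent;               /* unused in this sketch */
    (void)line;
    if (!push) {
        if (include_depth > 0)
            include_depth--;    /* file is being popped */
        return 0;
    }
    used = strlen(include_tree);
    n = snprintf(include_tree + used, sizeof(include_tree) - used,
                 "%*s%s\n", include_depth * 2, "", file);
    if (n < 0 || (size_t)n >= sizeof(include_tree) - used)
        include_tree[used] = '\0';  /* tree is full: drop this entry */
    include_depth++;
    return 0;
}
```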

.L "extern void ppinput(char* buffer, char* file, int line);" Pushes the .L 0 terminated .L buffer on the pp input stack. .L file is the pseudo file name used in line syncs for .L buffer and .L line is the starting line number.

.L "int pplex(void)" Returns the token type of the next input token. .L pp.token and where applicable .L pp.symbol are updated to refer to the new token. The token type constants are defined in .L pp.h for .L #include and .L pp.yacc for yacc (1) .LR %include . The token constant names match .LR T_[A-Z_]* ; some are encoded by oring with .L N_[A-Z_]* tokens. The numeric constant tokens and encodings are:
.EX
T_DOUBLE          (N_NUMBER|N_REAL)
T_DOUBLE_L        (N_NUMBER|N_REAL|N_LONG)
T_FLOAT           (N_NUMBER|N_REAL|N_FLOAT)
T_DECIMAL         (N_NUMBER)
T_DECIMAL_L       (N_NUMBER|N_LONG)
T_DECIMAL_U       (N_NUMBER|N_UNSIGNED)
T_DECIMAL_UL      (N_NUMBER|N_UNSIGNED|N_LONG)
T_OCTAL           (N_NUMBER|N_OCTAL)
T_OCTAL_L         (N_NUMBER|N_OCTAL|N_LONG)
T_OCTAL_U         (N_NUMBER|N_OCTAL|N_UNSIGNED)
T_OCTAL_UL        (N_NUMBER|N_OCTAL|N_UNSIGNED|N_LONG)
T_HEXADECIMAL     (N_NUMBER|N_HEXADECIMAL)
T_HEXADECIMAL_L   (N_NUMBER|N_HEXADECIMAL|N_LONG)
T_HEXADECIMAL_U   (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED)
T_HEXADECIMAL_UL  (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED|N_LONG)
.EE
The normal C tokens are:
.EX
T_ID              C identifier
T_INVALID         invalid token
T_HEADER          <..>
T_CHARCONST       '..'
T_WCHARCONST      L'..'
T_STRING          ".."
T_WSTRING         L".."
T_PTRMEM          ->
T_ADDADD          ++
T_SUBSUB          --
T_LSHIFT          <<
T_RSHIFT          >>
T_LE              <=
T_GE              >=
T_EQ              ==
T_NE              !=
T_ANDAND          &&
T_OROR            ||
T_MPYEQ           *=
T_DIVEQ           /=
T_MODEQ           %=
T_ADDEQ           +=
T_SUBEQ           -=
T_LSHIFTEQ        <<=
T_RSHIFTEQ        >>=
T_ANDEQ           &=
T_XOREQ           ^=
T_OREQ            |=
T_TOKCAT          ##
T_VARIADIC        ...
T_DOTREF          .*    [if PP_PLUSPLUS]
T_PTRMEMREF       ->*   [if PP_PLUSPLUS]
T_SCOPE           ::    [if PP_PLUSPLUS]
T_UMINUS          unary minus
.EE
If .L ppop(PP_COMPILE) was set then the keyword tokens are also defined. Compiler differences and dialects are detected by the pp probe (1) information, and only the appropriate keywords are enabled. The ANSI keyword tokens are:
.EX
T_AUTO      T_BREAK     T_CASE      T_CHAR
T_CONTINUE  T_DEFAULT   T_DO        T_DOUBLE_T
T_ELSE      T_EXTERN    T_FLOAT_T   T_FOR
T_GOTO      T_IF        T_INT       T_LONG
T_REGISTER  T_RETURN    T_SHORT     T_SIZEOF
T_STATIC    T_STRUCT    T_SWITCH    T_TYPEDEF
T_UNION     T_UNSIGNED  T_WHILE     T_CONST
T_ENUM      T_SIGNED    T_VOID      T_VOLATILE
.EE
and the C++ keyword tokens are:
.EX
T_CATCH     T_CLASS     T_DELETE    T_FRIEND
T_INLINE    T_NEW       T_OPERATOR  T_OVERLOAD
T_PRIVATE   T_PROTECTED T_PUBLIC    T_TEMPLATE
T_THIS      T_THROW     T_TRY       T_VIRTUAL
.EE
In addition, .L T_ASM is recognized where appropriate. Additional keyword tokens .L ">= T_KEYWORD" may be added using .LR ppop(PP_COMPILE) .

Many C implementations show no restraint in adding new keywords; some PC compilers have tripled the number of keywords. For the most part these new keywords introduce noise constructs that can be ignored for standard (reasonable) analysis and compilation. The noise keywords fall in four syntactic categories that map into the two noise keyword tokens .L T_NOISE and .LR T_NOISES . For .L T_NOISES .L pp.token points to the entire noise construct, including the offending noise keyword. The basic noise keyword categories are:

.L T_NOISE The simplest noise: a single keyword that is noise in any context and maps to .LR T_NOISE .

.L T_X_GROUP A noise keyword that precedes an optional grouping construct, either .L "(..)" or .L "{..}" and maps to .LR T_NOISES .

.L T_X_LINE A noise keyword that consumes the remaining tokens in the line and maps to .LR T_NOISES .

.L T_X_STATEMENT A noise keyword that consumes the tokens up to the next .L ; and maps to .LR T_NOISES .

If .L ppop(PP_NOISE) is .L "> 0" then implementation specific noise constructs are mapped to either .L T_NOISE or .L T_NOISES , otherwise if .L ppop(PP_NOISE) is .L "< 0" then noise constructs are completely ignored, otherwise the unmapped grouping noise tokens .L T_X_.* are returned. Token encodings may be tested by the following macros:

.L "int isnumber(int token);" Non-zero if .L token is an integral or floating point numeric constant.

.L "int isinteger(int token);" Non-zero if .L token is an integral numeric constant.

.L "int isreal(int token);" Non-zero if .L token is a floating point numeric constant.

.L "int isassignop(int token);" Non-zero if .L token is a C assignment operator.

.L "int isseparate(int token);" Non-zero if .L token must be separated from other tokens by space.

.L "int isnoise(int token);" Non-zero if .L token is a noise keyword.
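
The numeric token encodings can be tested with plain bit operations. The following is a minimal sketch, not the pp.h definitions: the N_* bit values are hypothetical, only a few of the T_* encodings are repeated, DEMO_T_ID is an invented non-numeric token, and the demo_is* macros are stand-ins that show one way such tests could be written.

```c
#include <assert.h>

/*
 * Hypothetical bit values for illustration only; the real N_* and T_*
 * constants are defined by pp.h.
 */
enum {
    N_REAL        = 0x01,
    N_LONG        = 0x02,
    N_UNSIGNED    = 0x04,
    N_FLOAT       = 0x08,
    N_OCTAL       = 0x10,
    N_HEXADECIMAL = 0x20,
    N_NUMBER      = 0x40
};

/* A few numeric token encodings, composed as in the pplex table. */
#define T_DOUBLE_L       (N_NUMBER|N_REAL|N_LONG)
#define T_FLOAT          (N_NUMBER|N_REAL|N_FLOAT)
#define T_DECIMAL        (N_NUMBER)
#define T_OCTAL_UL       (N_NUMBER|N_OCTAL|N_UNSIGNED|N_LONG)
#define T_HEXADECIMAL_U  (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED)

/* Invented non-numeric token that happens to share a bit with N_REAL. */
#define DEMO_T_ID        0x01

/* Stand-ins for isnumber, isreal and isinteger. */
#define demo_isnumber(t)  (((t) & N_NUMBER) != 0)
#define demo_isreal(t)    (((t) & (N_NUMBER|N_REAL)) == (N_NUMBER|N_REAL))
#define demo_isinteger(t) (((t) & (N_NUMBER|N_REAL)) == N_NUMBER)
```

Testing N_NUMBER together with N_REAL keeps a non-numeric token from being misclassified when its value happens to overlap one of the modifier bits.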

.L "extern int ppline(int line, char* file);" The default line sync handler that outputs line sync pragmas for the C compiler front end. May be used as an argument to .LR ppop(PP_LINE) , or the user may supply an application specific handler. .L line is the line number and .L file is the file name. If .L ppop(PP_LINEID) was set then the directive # lineid line "file" is output.

.L "extern int ppmacref(struct ppsymbol* symbol, char* file, int line, int type);" The default macro reference handler that outputs macro reference pragmas. May be used as an argument to .LR ppop(PP_MACREF) , or the user may supply an application specific handler. .L symbol is the macro .L ppsymbol pointer, .L file is the reference file, .L line is the reference line, and if .L type is non-zero a macro value checksum is also output. The pragma syntax is #pragma pp:macref "symbol->name" line checksum.

.L "int ppop(int op, ...)" .L ppop is the option control interface. .L op determines the type(s) of the remaining argument(s). Options marked by .L "/*INIT*/" must be done before .LR PP_INIT .

.L "(PP_ASSERT, char* string) /*INIT*/" .L string is asserted as if by .LR #assert .

.L "(PP_BUILTIN, char*(*fun)(char* buf, char* name, char* args)) /*INIT*/" Installs .L fun as the unknown builtin macro handler. Builtin macros are of the form .LR "#(name args)" . .L fun is called with .L name set to the unknown builtin macro name and .L args set to the arguments. .L buf is a .L MAXTOKEN+1 buffer that can be used for the .L fun return value. .L 0 should be returned on error.

.L "(PP_COMMENT, void (*fun)(char* head, char* body, char* tail, int line)) /*INIT*/"

.L "(PP_COMPATIBILITY, char* string) /*INIT*/"

.L "(PP_COMPILE, char* string) /*INIT*/"

.L "(PP_DEBUG, char* string) /*INIT*/"

.L "(PP_DEFAULT, char* string) /*INIT*/"

.L "(PP_DEFINE, char* string) /*INIT*/" .L string is defined as if by .LR #define .

.L "(PP_DIRECTIVE, char* string) /*INIT*/" The directive # string is executed.

.L "(PP_DONE, char* string) /*INIT*/"

.L "(PP_DUMP, char* string) /*INIT*/"

.L "(PP_FILEDEPS, char* string) /*INIT*/"

.L "(PP_FILENAME, char* string) /*INIT*/"

.L "(PP_HOSTDIR, char* string) /*INIT*/"

.L "(PP_HOSTED, char* string) /*INIT*/"

.L "(PP_ID, char* string) /*INIT*/"

.L "(PP_IGNORE, char* string) /*INIT*/"

.L "(PP_INCLUDE, char* string) /*INIT*/"

.L "(PP_INCREF, char* string) /*INIT*/"

.L "(PP_INIT, char* string) /*INIT*/"

.L "(PP_INPUT, char* string) /*INIT*/"

.L "(PP_LINE, char* string) /*INIT*/"

.L "(PP_LINEFILE, char* string) /*INIT*/"

.L "(PP_LINEID, char* string) /*INIT*/"

.L "(PP_LINETYPE, char* string) /*INIT*/"

.L "(PP_LOCAL, char* string) /*INIT*/"

.L "(PP_MACREF, char* string) /*INIT*/"

.L "(PP_MULTIPLE, char* string) /*INIT*/"

.L "(PP_NOHASH, char* string) /*INIT*/"

.L "(PP_NOID, char* string) /*INIT*/"

.L "(PP_NOISE, char* string) /*INIT*/"

.L "(PP_OPTION, char* string) /*INIT*/" The directive #pragma pp:string is executed.

.L "(PP_OPTARG, char* string) /*INIT*/"

.L "(PP_OUTPUT, char* string) /*INIT*/"

.L "(PP_PASSNEWLINE, char* string) /*INIT*/"

.L "(PP_PASSTHROUGH, char* string) /*INIT*/"

.L "(PP_PLUSPLUS, char* string) /*INIT*/"

.L "(PP_PRAGMA, char* string) /*INIT*/"

.L "(PP_PREFIX, char* string) /*INIT*/"

.L "(PP_PROBE, char* string) /*INIT*/"

.L "(PP_READ, char* string) /*INIT*/"

.L "(PP_RESERVED, char* string) /*INIT*/"

.L "(PP_SPACEOUT, char* string) /*INIT*/"

.L "(PP_STANDALONE, char* string) /*INIT*/"

.L "(PP_STANDARD, char* string) /*INIT*/"

.L "(PP_STRICT, char* string) /*INIT*/"

.L "(PP_TEST, char* string) /*INIT*/"

.L "(PP_TRUNCATE, char* string) /*INIT*/"

.L "(PP_UNDEF, char* string) /*INIT*/"

.L "(PP_WARN, char* string) /*INIT*/"
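
The PP_BUILTIN entry above fixes the handler argument list and the error convention. The following is a minimal sketch of such a handler under stated assumptions: demo_builtin and the builtin name upper are invented for illustration, DEMO_MAXTOKEN is a hypothetical stand-in for MAXTOKEN, and the returned string is presumed to be used as the builtin's value.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEMO_MAXTOKEN 255   /* hypothetical; the real MAXTOKEN comes from pp.h */

/*
 * Handle the invented builtin #(upper args) by upper-casing args
 * into buf. Return 0 for any other name or if the result would not
 * fit in a DEMO_MAXTOKEN+1 byte buffer.
 */
char *demo_builtin(char *buf, char *name, char *args)
{
    size_t i;
    size_t n;

    if (strcmp(name, "upper") != 0)
        return 0;               /* unknown builtin: report an error */
    n = strlen(args);
    if (n > DEMO_MAXTOKEN)
        return 0;               /* result would overflow buf */
    for (i = 0; i < n; i++)
        buf[i] = (char)toupper((unsigned char)args[i]);
    buf[n] = '\0';
    return buf;
}
```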

.L "int pppragma(char* dir, char* pass, char* name, char* value, int nl);" The default handler that copies unknown directives and pragmas to the output. May be used as an argument to .LR ppop(PP_PRAGMA) , or the user may supply an application specific handler. This function is most often called after directive and pragma mapping. Any of the arguments may be .LR 0 . .L dir is the directive name, .L pass is the pragma pass name, .L name is the pragma option name, .L value is the pragma option value, and .L nl is non-zero if a trailing newline is required if the pragma is copied to the output.

.L "int ppprintf(char* format, ...);" A printf (3) interface to the standalone pp output buffer. Macros provide limited control over output buffering: .L "void ppflushout()" flushes the output buffer, .L "void ppcheckout()" flushes the output buffer if over .L PPBUFSIZ characters are buffered, .L "int pppendout()" returns the number of pending characters in the output buffer, and .L "void ppputchar(int c)" places the character .L c in the output buffer.
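
The relationship between the four buffering macros can be illustrated with a toy buffer. The following is a minimal sketch, not the pp implementation: every demo_ name, the tiny DEMO_BUFSIZ threshold, and the in-memory sink are assumptions made for illustration. The buffer is given slack beyond the threshold because the check flushes only once more than the threshold is pending.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BUFSIZ 8                   /* hypothetical stand-in for PPBUFSIZ */

static char demo_buf[2 * DEMO_BUFSIZ];  /* output buffer with slack */
static int demo_pending;                /* characters not yet flushed */
static char demo_sink[64];              /* stands in for the output file */
static int demo_sunk;                   /* characters written to the sink */

/* Like ppflushout: write all pending characters to the sink. */
void demo_flushout(void)
{
    assert((size_t)(demo_sunk + demo_pending) < sizeof(demo_sink));
    memcpy(demo_sink + demo_sunk, demo_buf, (size_t)demo_pending);
    demo_sunk += demo_pending;
    demo_sink[demo_sunk] = '\0';
    demo_pending = 0;
}

/* Like ppcheckout: flush only if over DEMO_BUFSIZ characters are pending. */
void demo_checkout(void)
{
    if (demo_pending > DEMO_BUFSIZ)
        demo_flushout();
}

/* Like pppendout: number of pending characters. */
int demo_pendout(void)
{
    return demo_pending;
}

/* Like ppputchar: place c in the buffer without flushing. */
void demo_putchar(int c)
{
    assert((size_t)demo_pending < sizeof(demo_buf));
    demo_buf[demo_pending++] = (char)c;
}
```

In this sketch a caller that emits one character at a time calls demo_checkout after each demo_putchar and demo_flushout once at the end.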

CAVEATS
The ANSI mode is intended to be true to the standard. The compatibility mode has been proven in practice, but there are surely dark corners of some implementations that may have been omitted.
SEE ALSO
cc(1), cpp(1), nmake(1), probe(1), yacc(1),

ast(3), error(3), hash(3), optjoin(3)

AUTHOR
Glenn Fowler

(Dennis Ritchie provided the original table driven lexer.)

AT&T Bell Laboratories