xref: /freebsd/contrib/byacc/README.BTYACC (revision ee7b0571c2c18bdec848ed2044223cc88db29bd8)
1*497dcf4cSBaptiste Daroussin-- $Id: README.BTYACC,v 1.2 2014/04/22 08:18:57 Tom.Shields Exp $
20c8de5b0SBaptiste Daroussin
30c8de5b0SBaptiste DaroussinThe original README from btyacc is below.
40c8de5b0SBaptiste Daroussin
50c8de5b0SBaptiste DaroussinThe backtracking enhancements to byacc have been merged into Thomas Dickey's
60c8de5b0SBaptiste Daroussinbyacc baseline.
70c8de5b0SBaptiste Daroussin
80c8de5b0SBaptiste DaroussinThe %include and %define/%ifdef enhancements described below are not currently
90c8de5b0SBaptiste Daroussinincorporated.
100c8de5b0SBaptiste Daroussin
11*497dcf4cSBaptiste DaroussinThe position management functionality ("YYPOSN", "yyposn", "YYREDUCEPOSNFUNC",
12*497dcf4cSBaptiste Daroussin"YYREDUCEPOSNFUNCARG" & "YYCALLREDUCEPOSN") is replaced by a bison-compatible
13*497dcf4cSBaptiste Daroussin"%locations" implementation.
14*497dcf4cSBaptiste Daroussin
15*497dcf4cSBaptiste DaroussinThe memory management functionality ("YYDELETEVAL" & "YYDELETEPOSN") is
16*497dcf4cSBaptiste Daroussinreplaced by a bison-compatible "%destructor" implementation.
17*497dcf4cSBaptiste Daroussin
18*497dcf4cSBaptiste DaroussinThe detailed syntax error processing functionality ("YYERROR_DETAILED"
19*497dcf4cSBaptiste Daroussin& "yyerror_detailed()") is subsumed by the bison-compatible "yyerror()"
20*497dcf4cSBaptiste Daroussinimplementation, as modified by the %parse-param and %locations directives.
21*497dcf4cSBaptiste Daroussin
22*497dcf4cSBaptiste DaroussinThe debugging macro "YYDBPR()" in the parser skeleton is renamed
23*497dcf4cSBaptiste Daroussin"YYSTYPE_TOSTRING()".
24*497dcf4cSBaptiste Daroussin
250c8de5b0SBaptiste Daroussin-------------------------------------------------------------------------------
260c8de5b0SBaptiste Daroussin	     BTYACC -- backtracking yacc
270c8de5b0SBaptiste Daroussin	     ===========================
280c8de5b0SBaptiste Daroussin
290c8de5b0SBaptiste DaroussinBTYACC was created by Chris Dodd using ideas from many
300c8de5b0SBaptiste Daroussinplaces and lots of code from the Berkeley Yacc
310c8de5b0SBaptiste Daroussindistribution, which is a public domain yacc clone put
320c8de5b0SBaptiste Daroussintogether by the good folks at Berkeley.  This code is
330c8de5b0SBaptiste Daroussindistributed with NO WARRANTY and is public domain.
340c8de5b0SBaptiste DaroussinIt is certain to contain bugs, which you should
350c8de5b0SBaptiste Daroussinreport to: chrisd@collins.com.
360c8de5b0SBaptiste Daroussin
370c8de5b0SBaptiste DaroussinVadim Maslov of Siber Systems <vadik@siber.com>
380c8de5b0SBaptiste Daroussinconsiderably modified BTYACC to make it suitable
390c8de5b0SBaptiste Daroussinfor production environment.
400c8de5b0SBaptiste Daroussin
410c8de5b0SBaptiste DaroussinSeveral people have suggested bug fixes that
420c8de5b0SBaptiste Daroussinwere incorporated into BtYacc.
430c8de5b0SBaptiste Daroussin
440c8de5b0SBaptiste DaroussinSee the README.BYACC files for more about
450c8de5b0SBaptiste DaroussinBerkeley Yacc and other sources of info.
460c8de5b0SBaptiste Daroussin
470c8de5b0SBaptiste Daroussinhttp://www.siber.com/btyacc/ is the current home of BtYacc.
480c8de5b0SBaptiste DaroussinIt is provided courtesy of Siber Systems http://www.siber.com/.
490c8de5b0SBaptiste Daroussin
500c8de5b0SBaptiste Daroussin
510c8de5b0SBaptiste Daroussin		Version 3.0 changes
520c8de5b0SBaptiste Daroussin		-------------------
530c8de5b0SBaptiste Daroussin		  by Vadim Maslov
540c8de5b0SBaptiste Daroussin
550c8de5b0SBaptiste DaroussinChanges mostly occurred in btyaccpa.ske file that
560c8de5b0SBaptiste Daroussincontains the parsing shift/reduce/backtrack algorithm.
570c8de5b0SBaptiste Daroussin
580c8de5b0SBaptiste DaroussinVersion 3.0 innovations focus on:
590c8de5b0SBaptiste Daroussin- text position computation and propagation,
600c8de5b0SBaptiste Daroussin- industrial-strength error processing and recovery.
610c8de5b0SBaptiste Daroussin
620c8de5b0SBaptiste Daroussin
630c8de5b0SBaptiste Daroussin** Added mechanism for computing and propagating
640c8de5b0SBaptiste Daroussintext position of tokens and non-terminals.
650c8de5b0SBaptiste Daroussin
660c8de5b0SBaptiste DaroussinCompilers often need to build AST trees such that every node
670c8de5b0SBaptiste Daroussinin a tree can relate to the parsed program source it came from.
680c8de5b0SBaptiste DaroussinThe following applications are very likely to need this:
690c8de5b0SBaptiste Daroussin- debuggers that show actual source of the debugged program,
700c8de5b0SBaptiste Daroussin- source-to-source translators that want
710c8de5b0SBaptiste Daroussin  unchanged parts of the tree to generate the unchanged code.
720c8de5b0SBaptiste Daroussin
730c8de5b0SBaptiste DaroussinThe new YYPOSN mechanism added in this version of BtYacc
740c8de5b0SBaptiste Daroussinhelps you in automating the text position computation
750c8de5b0SBaptiste Daroussinand in assigning the computed text positions to the AST.
760c8de5b0SBaptiste DaroussinThis mechanism is successfully used in commercial
770c8de5b0SBaptiste Daroussinparsers and source-to-source translators.
780c8de5b0SBaptiste Daroussin
790c8de5b0SBaptiste DaroussinIn standard Yaccs every token and every non-terminal
800c8de5b0SBaptiste Daroussinhas an YYSTYPE semantic value attached to it.
810c8de5b0SBaptiste DaroussinIn this new version every token and every non-terminal
820c8de5b0SBaptiste Daroussinalso has an YYPOSN text position attached to it.
830c8de5b0SBaptiste DaroussinYYPOSN is a user-defined type that can be anything and
840c8de5b0SBaptiste Daroussinthat has a meaning of text position attached to
850c8de5b0SBaptiste Daroussintoken or non-terminal.
860c8de5b0SBaptiste Daroussin
870c8de5b0SBaptiste DaroussinIn addition to semantic value stack BtYacc now maintains
880c8de5b0SBaptiste Daroussintext position stack. Behavior of the text position stack
890c8de5b0SBaptiste Daroussinis similar to the behavior of the semantic value stack.
900c8de5b0SBaptiste Daroussin
910c8de5b0SBaptiste DaroussinIf using text position mechanism,
920c8de5b0SBaptiste Daroussinyou need to define the following:
930c8de5b0SBaptiste Daroussin
940c8de5b0SBaptiste DaroussinYYPOSN	Preprocessor variable that contains C/C++ type of
950c8de5b0SBaptiste Daroussin	the text position attached to
960c8de5b0SBaptiste Daroussin	every token and non-terminal.
970c8de5b0SBaptiste Daroussin
980c8de5b0SBaptiste Daroussinyyposn  Global variable of type YYPOSN.
990c8de5b0SBaptiste Daroussin        The lexer must assign text position of
1000c8de5b0SBaptiste Daroussin	the returned token to yyposn, just like it assigns
1010c8de5b0SBaptiste Daroussin	semantic value of the returned token to yylval.
1020c8de5b0SBaptiste Daroussin
1030c8de5b0SBaptiste DaroussinYYREDUCEPOSNFUNC
1040c8de5b0SBaptiste Daroussin	Preprocessor variable that points to function that
1050c8de5b0SBaptiste Daroussin	is called after the grammar rule reduction
1060c8de5b0SBaptiste Daroussin	to reduce text positions located on the stack.
1070c8de5b0SBaptiste Daroussin
1080c8de5b0SBaptiste Daroussin        This function is called by BtYacc to reduce text
1090c8de5b0SBaptiste Daroussin	positions. The function is called immediately after
1100c8de5b0SBaptiste Daroussin	the regular rule reduction occurs.
1110c8de5b0SBaptiste Daroussin
1120c8de5b0SBaptiste Daroussin	The function has the following prototype:
1130c8de5b0SBaptiste Daroussin	void ReducePosn(YYPOSN  &ret,
1140c8de5b0SBaptiste Daroussin			YYPOSN  *terms,
1150c8de5b0SBaptiste Daroussin			YYSTYPE *term_vals,
1160c8de5b0SBaptiste Daroussin			int      term_no,
1170c8de5b0SBaptiste Daroussin			int      stk_pos,
1180c8de5b0SBaptiste Daroussin			int      yychar,
1190c8de5b0SBaptiste Daroussin			YYPOSN  &yyposn,
1200c8de5b0SBaptiste Daroussin			UserType extra);
1210c8de5b0SBaptiste Daroussin
1220c8de5b0SBaptiste Daroussin        The function arguments are:
1230c8de5b0SBaptiste Daroussin        - ret
1240c8de5b0SBaptiste Daroussin	  Reference to the text position returned by
1250c8de5b0SBaptiste Daroussin          the rule. The function must write the computed
1260c8de5b0SBaptiste Daroussin          text position returned by the rule to ret.
1270c8de5b0SBaptiste Daroussin          This is analogue of the $$ semantic value.
1280c8de5b0SBaptiste Daroussin
1290c8de5b0SBaptiste Daroussin        - term_posns
1300c8de5b0SBaptiste Daroussin          Array of the right-hand side rule components
1310c8de5b0SBaptiste Daroussin	  YYPOSN text positions.  These are analogues of
1320c8de5b0SBaptiste Daroussin	  $1, $2, ..., $N in the text position world.
1330c8de5b0SBaptiste Daroussin
1340c8de5b0SBaptiste Daroussin        - term_vals
1350c8de5b0SBaptiste Daroussin	  Array of the right-hand side (RHS) rule components
1360c8de5b0SBaptiste Daroussin	  YYSTYPE values. These are the $1,...,$N themselves.
1370c8de5b0SBaptiste Daroussin
1380c8de5b0SBaptiste Daroussin        - term_no
1390c8de5b0SBaptiste Daroussin          Number of the components in RHS of the reduced rule.
1400c8de5b0SBaptiste Daroussin          Equal to size of arrays term_posns and term_vals.
1410c8de5b0SBaptiste Daroussin          Also equal to N in $1,...,$N in the reduced rule.
1420c8de5b0SBaptiste Daroussin
1430c8de5b0SBaptiste Daroussin        - stk_pos
1440c8de5b0SBaptiste Daroussin          YYSTYPE/YYPOSN stack position before the reduction.
1450c8de5b0SBaptiste Daroussin
1460c8de5b0SBaptiste Daroussin        - yychar
1470c8de5b0SBaptiste Daroussin          Lookahead token that immediately follows
1480c8de5b0SBaptiste Daroussin 	  the reduced RHS components.
1490c8de5b0SBaptiste Daroussin
1500c8de5b0SBaptiste Daroussin        - yyposn
1510c8de5b0SBaptiste Daroussin          YYPOSN of the token that immediately follows
1520c8de5b0SBaptiste Daroussin	  the reduced RHS components.
1530c8de5b0SBaptiste Daroussin
1540c8de5b0SBaptiste Daroussin        - extra
1550c8de5b0SBaptiste Daroussin          User-defined extra argument passed to ReducePosn.
1560c8de5b0SBaptiste Daroussin
1570c8de5b0SBaptiste Daroussin        Typically this function extracts text positions from
1580c8de5b0SBaptiste Daroussin	the right-hand side rule components and either
1590c8de5b0SBaptiste Daroussin	assigns them to the returned $$ structure/tree or
1600c8de5b0SBaptiste Daroussin        if no $$ value is returned, puts them into
1610c8de5b0SBaptiste Daroussin	the ret text position from where
1620c8de5b0SBaptiste Daroussin        it will be picked up by the later reduced rules.
1630c8de5b0SBaptiste Daroussin
1640c8de5b0SBaptiste DaroussinYYREDUCEPOSNFUNCARG
1650c8de5b0SBaptiste Daroussin	Extra user-defined argument passed to
1660c8de5b0SBaptiste Daroussin	the ReducePosn function. This argument can use
1670c8de5b0SBaptiste Daroussin	any variables defined in btyaccpa.ske.
1680c8de5b0SBaptiste Daroussin
1690c8de5b0SBaptiste Daroussin
1700c8de5b0SBaptiste Daroussin** Added code to btyaccpa.ske that automatically cleans up
1710c8de5b0SBaptiste Daroussinsemantic semantic values and text positions of tokens
1720c8de5b0SBaptiste Daroussinand non-terminals that are discarded and deleted as
1730c8de5b0SBaptiste Daroussina result of error processing.
1740c8de5b0SBaptiste Daroussin
1750c8de5b0SBaptiste DaroussinIn the previous versions the discarded token and non-terminal
1760c8de5b0SBaptiste Daroussinsemantic values were not cleaned that caused quite severe
1770c8de5b0SBaptiste Daroussinleaks.  The only way to fix it was to add garbage collection
1780c8de5b0SBaptiste Daroussinto YYSTYPE class.
1790c8de5b0SBaptiste Daroussin
1800c8de5b0SBaptiste DaroussinNow BtYacc skeleton calls delete functions for semantic
1810c8de5b0SBaptiste Daroussinvalues and positions of the discarded tokens and
1820c8de5b0SBaptiste Daroussinnon-terminals.
1830c8de5b0SBaptiste Daroussin
1840c8de5b0SBaptiste DaroussinYou need to define the following functions that BtYacc
1850c8de5b0SBaptiste Daroussincalls when it needs to delete semantic value or text position.
1860c8de5b0SBaptiste Daroussin
1870c8de5b0SBaptiste DaroussinYYDELETEVAL
1880c8de5b0SBaptiste Daroussin	User-defined function that is called by BtYacc
1890c8de5b0SBaptiste Daroussin	to delete semantic value of the token or non-terminal.
1900c8de5b0SBaptiste Daroussin
1910c8de5b0SBaptiste Daroussin	The user-defined function must have the prototype:
1920c8de5b0SBaptiste Daroussin	void DeleteYYval(YYSTYPE v, int type);
1930c8de5b0SBaptiste Daroussin	v    is semantic value to delete,
1940c8de5b0SBaptiste Daroussin	type is one of the following:
1950c8de5b0SBaptiste Daroussin	0 	discarding token
1960c8de5b0SBaptiste Daroussin	1       discarding state
1970c8de5b0SBaptiste Daroussin	2       cleaning up stack when aborting
1980c8de5b0SBaptiste Daroussin
1990c8de5b0SBaptiste DaroussinYYDELETEPOSN
2000c8de5b0SBaptiste Daroussin	User-defined function that is called by BtYacc
2010c8de5b0SBaptiste Daroussin	to delete text position of the token or non-terminal.
2020c8de5b0SBaptiste Daroussin
2030c8de5b0SBaptiste Daroussin	The user-defined function must have the prototype:
2040c8de5b0SBaptiste Daroussin	void DeleteYYposn(YYPOSN p, int type);
2050c8de5b0SBaptiste Daroussin	v    is semantic value to delete,
2060c8de5b0SBaptiste Daroussin	type is one of the following:
2070c8de5b0SBaptiste Daroussin	0 	discarding token
2080c8de5b0SBaptiste Daroussin	1       discarding state
2090c8de5b0SBaptiste Daroussin	2       cleaning up stack when aborting
2100c8de5b0SBaptiste Daroussin
2110c8de5b0SBaptiste Daroussin
2120c8de5b0SBaptiste Daroussin** User can define "detailed" syntax error processing
2130c8de5b0SBaptiste Daroussinfunction that reports an *exact* position of
2140c8de5b0SBaptiste Daroussinthe token that caused the error.
2150c8de5b0SBaptiste Daroussin
2160c8de5b0SBaptiste DaroussinIf you define preprocessor variable YYERROR_DETAILED in
2170c8de5b0SBaptiste Daroussinyour grammar then you need define the following
2180c8de5b0SBaptiste Daroussinerror processing function:
2190c8de5b0SBaptiste Daroussin
2200c8de5b0SBaptiste Daroussinvoid yyerror_detailed(char    *text,
2210c8de5b0SBaptiste Daroussin		      int      errt,
2220c8de5b0SBaptiste Daroussin		      YYSTYPE &errt_value,
2230c8de5b0SBaptiste Daroussin		      YYPOSN  &errt_posn);
2240c8de5b0SBaptiste Daroussin
2250c8de5b0SBaptiste DaroussinIt receives the following arguments:
2260c8de5b0SBaptiste Daroussintext		Error message.
2270c8de5b0SBaptiste Daroussinerrt		Code of the token that caused the error.
2280c8de5b0SBaptiste Daroussinerrt_value	Value of the token that caused the error.
2290c8de5b0SBaptiste Daroussinerrt_posn	Text position of token that caused error.
2300c8de5b0SBaptiste Daroussin
2310c8de5b0SBaptiste Daroussin
2320c8de5b0SBaptiste Daroussin** Dropped compatibility with C.
2330c8de5b0SBaptiste Daroussin
2340c8de5b0SBaptiste DaroussinCompatibility with C became increasingly difficult
2350c8de5b0SBaptiste Daroussinto maintain as new features were added to btyaccpa.ske.
2360c8de5b0SBaptiste DaroussinSo we dropped it. If anybody wants to make the new version
2370c8de5b0SBaptiste Daroussincompatible with C, we would gladly accept the changes.
2380c8de5b0SBaptiste Daroussin
2390c8de5b0SBaptiste DaroussinMeanwhile we expect that you use C++ to write grammar
2400c8de5b0SBaptiste Daroussinactions and everything else in grammar files.
2410c8de5b0SBaptiste DaroussinSince C is (in a sense) subset of C++, your C-based
2420c8de5b0SBaptiste Daroussingrammar may work if you use C++ compiler to compile it.
2430c8de5b0SBaptiste Daroussin
2440c8de5b0SBaptiste Daroussin		Version 3.0 bugs fixed
2450c8de5b0SBaptiste Daroussin		----------------------
2460c8de5b0SBaptiste Daroussin
2470c8de5b0SBaptiste DaroussinMatthias Meixner <meixner@mes.th-darmstadt.de> fixed a bug:
2480c8de5b0SBaptiste DaroussinBtYacc does not correctly handle typenames, if one typename
2490c8de5b0SBaptiste Daroussinis a prefix of another one and if this type is used after
2500c8de5b0SBaptiste Daroussinthe longer one. In this case BTYacc produces invalid code.
2510c8de5b0SBaptiste Daroussin
2520c8de5b0SBaptiste Daroussin
2530c8de5b0SBaptiste Daroussin		Version 2.1 changes
2540c8de5b0SBaptiste Daroussin		-------------------
2550c8de5b0SBaptiste Daroussin		  by Vadim Maslov
2560c8de5b0SBaptiste Daroussin
2570c8de5b0SBaptiste Daroussin** Added preprocessor statements to BtYacc that are similar
2580c8de5b0SBaptiste Daroussinin function and behavior to C/C++ preprocessor statements.
2590c8de5b0SBaptiste Daroussin
2600c8de5b0SBaptiste DaroussinThese statements are used to:
2610c8de5b0SBaptiste Daroussin
2620c8de5b0SBaptiste Daroussin- Introduce modularity into a grammar by breaking it
2630c8de5b0SBaptiste Daroussin  into several *.y files and assembling different
2640c8de5b0SBaptiste Daroussin  grammars from the *.y modules using %include and %ifdef.
2650c8de5b0SBaptiste Daroussin
2660c8de5b0SBaptiste Daroussin- Have several versions of the same grammar
2670c8de5b0SBaptiste Daroussin  by using %ifdef and $endif.
2680c8de5b0SBaptiste Daroussin
2690c8de5b0SBaptiste Daroussin- To include automatically generated grammar fragment.
2700c8de5b0SBaptiste Daroussin  For instance, we use %include to include
2710c8de5b0SBaptiste Daroussin  automatically generated list of tokens.
2720c8de5b0SBaptiste Daroussin
2730c8de5b0SBaptiste DaroussinPreprocessor statements are:
2740c8de5b0SBaptiste Daroussin
2750c8de5b0SBaptiste Daroussin%define <var-name>
2760c8de5b0SBaptiste Daroussin	Define preprocessor variable named <var-name>.
2770c8de5b0SBaptiste Daroussin
2780c8de5b0SBaptiste Daroussin%ifdef <var-name>
2790c8de5b0SBaptiste Daroussin	If preprocessor variable named <var-name>
2800c8de5b0SBaptiste Daroussin	is defined by %define, then process the text from
2810c8de5b0SBaptiste Daroussin	this %ifdef to the closing %endif.
2820c8de5b0SBaptiste Daroussin
2830c8de5b0SBaptiste Daroussin%endif
2840c8de5b0SBaptiste Daroussin	Closing bracket for %ifdef preprocessor statement.
2850c8de5b0SBaptiste Daroussin	Only one nesting level of %ifdef-%endif is allowed.
2860c8de5b0SBaptiste Daroussin
2870c8de5b0SBaptiste Daroussin%include <file-name>
2880c8de5b0SBaptiste Daroussin	Process contents of the file named <file-name>.
2890c8de5b0SBaptiste Daroussin	If <file-name> is a relative name, it is looked up
2900c8de5b0SBaptiste Daroussin        in a directory in which btyacc was started.
2910c8de5b0SBaptiste Daroussin	Only one nesting level of %include is allowed.
2920c8de5b0SBaptiste Daroussin
2930c8de5b0SBaptiste Daroussin
2940c8de5b0SBaptiste Daroussin		Version 2.0 changes
2950c8de5b0SBaptiste Daroussin		-------------------
2960c8de5b0SBaptiste Daroussin		  by Vadim Maslov
2970c8de5b0SBaptiste Daroussin
2980c8de5b0SBaptiste Daroussin
2990c8de5b0SBaptiste Daroussin** Changed 16-bit short numbers to 32-bit int numbers in
3000c8de5b0SBaptiste Daroussingrammar tables, so that huge grammar tables (tables that
3010c8de5b0SBaptiste Daroussinare larger than 32768 elements) resulting from huge
3020c8de5b0SBaptiste Daroussingrammars (Cobol grammar, for instance) can work correctly.
3030c8de5b0SBaptiste DaroussinYou need to have 32-bit integer to index table bigger than
3040c8de5b0SBaptiste Daroussin32768 elements, 16-bit integer is not enough.
3050c8de5b0SBaptiste Daroussin
3060c8de5b0SBaptiste DaroussinThe original BtYacc just generated non-working tables
3070c8de5b0SBaptiste Daroussinlarger than 32768 elements without even notifying about
3080c8de5b0SBaptiste Daroussinthe table overflow.
3090c8de5b0SBaptiste Daroussin
3100c8de5b0SBaptiste Daroussin
3110c8de5b0SBaptiste Daroussin** Make error recovery work correctly when error happens
3120c8de5b0SBaptiste Daroussinwhile processing nested conflicts. Original BtYacc could
3130c8de5b0SBaptiste Daroussininfinitely cycle in certain situations that involved error
3140c8de5b0SBaptiste Daroussinrecovery while in nested conflict.
3150c8de5b0SBaptiste Daroussin
3160c8de5b0SBaptiste DaroussinMore detailed explanation: when we have nested conflicts
3170c8de5b0SBaptiste Daroussin(conflict that happens while trial-processing another
3180c8de5b0SBaptiste Daroussinconflict), it leads btyacc into NP-complete searching of
3190c8de5b0SBaptiste Daroussinconflict tree. The ultimate goal is YYVALID operator that
3200c8de5b0SBaptiste Daroussinselects a particular branch of that tree as a valid one.
3210c8de5b0SBaptiste Daroussin
3220c8de5b0SBaptiste DaroussinIf no YYVALID is found on the tree, then error recovery
3230c8de5b0SBaptiste Daroussintakes over.  The problem with this is that error recovery
3240c8de5b0SBaptiste Daroussinis started in the same state context that exists on the
3250c8de5b0SBaptiste Daroussinlast surveyed branch of the conflict tree.  Sometimes this
3260c8de5b0SBaptiste Daroussinlast branch may be of zero length and it results in
3270c8de5b0SBaptiste Daroussinrecovering to exactly the same state as existed before
3280c8de5b0SBaptiste Daroussinentering the conflict. BtYacc cycles then.
3290c8de5b0SBaptiste Daroussin
3300c8de5b0SBaptiste DaroussinWe solved this problem by memorizing the longest path in
3310c8de5b0SBaptiste Daroussinthe conflict tree while browsing it. If we ever get into
3320c8de5b0SBaptiste Daroussinerror recovery, we restore state that existed on the
3330c8de5b0SBaptiste Daroussinlongest path.  Effectively we say: if we have an error,
3340c8de5b0SBaptiste Daroussinlet us move forward as far as we possibly could while we
3350c8de5b0SBaptiste Daroussinwere browsing the conflict tree.
3360c8de5b0SBaptiste Daroussin
3370c8de5b0SBaptiste Daroussin
3380c8de5b0SBaptiste Daroussin** Introduce YYVALID_NESTED operation in addition to
3390c8de5b0SBaptiste Daroussinsimply YYVALID.  When we have a nested conflict (conflict
3400c8de5b0SBaptiste Daroussinwhile processing in trial mode for another conflict), we
3410c8de5b0SBaptiste Daroussinwant to relate YYVALID to a particular level of conflict
3420c8de5b0SBaptiste Daroussinbeing in trial.
3430c8de5b0SBaptiste Daroussin
3440c8de5b0SBaptiste DaroussinSince we mostly anticipate only 2-level nested conflicts
3450c8de5b0SBaptiste DaroussinYYVALID_NESTED tells the parser to satisfy only the
3460c8de5b0SBaptiste Daroussininternal conflict.  Therefore, in 1-level conflict
3470c8de5b0SBaptiste Daroussinsituation YYVALID_NESTED acts like a regular YYVALID, but
3480c8de5b0SBaptiste Daroussinin 2-level conflict it is a no-op and the other YYVALID
3490c8de5b0SBaptiste Daroussinfor outer conflict will be searched for.
3500c8de5b0SBaptiste Daroussin
3510c8de5b0SBaptiste Daroussin
3520c8de5b0SBaptiste Daroussin** Improved handling of situation where /tmp directory is
3530c8de5b0SBaptiste Daroussinmissing.  Original btyacc just died quietly when /tmp
3540c8de5b0SBaptiste Daroussindirectory was missing.  We added code that states the
3550c8de5b0SBaptiste Daroussinproblem explicitly. While on UNIX /tmp directory is always
3560c8de5b0SBaptiste Daroussinpresent, it may be missing on WIN32 systems, therefore
3570c8de5b0SBaptiste Daroussindiagnosing this situation is important.
3580c8de5b0SBaptiste Daroussin
3590c8de5b0SBaptiste Daroussin
3600c8de5b0SBaptiste Daroussin	Version 1.0 changes: BackTracking
3610c8de5b0SBaptiste Daroussin	=================================
3620c8de5b0SBaptiste Daroussin		by Chris Dodd
3630c8de5b0SBaptiste Daroussin
3640c8de5b0SBaptiste DaroussinBTYACC is a modified version of yacc that supports
3650c8de5b0SBaptiste Daroussinautomatic backtracking and semantic disambiguation to
3660c8de5b0SBaptiste Daroussinparse ambiguous grammars, as well as syntactic sugar for
3670c8de5b0SBaptiste Daroussininherited attributes (which tend to introduce conflicts).
3680c8de5b0SBaptiste DaroussinWhenever a btyacc generated parser runs into a
3690c8de5b0SBaptiste Daroussinshift-reduce or reduce-reduce error in the parse table, it
3700c8de5b0SBaptiste Daroussinremembers the current parse point (yacc stack and input
3710c8de5b0SBaptiste Daroussinstream state), and goes into trial parse mode.  It then
3720c8de5b0SBaptiste Daroussincontinues parsing, ignoring most rule actions.  If it runs
3730c8de5b0SBaptiste Daroussininto an error (either through the parse table or through
3740c8de5b0SBaptiste Daroussinan action calling YYERROR), it backtracks to the most
3750c8de5b0SBaptiste Daroussinrecent conflict point and tries a different alternative.
3760c8de5b0SBaptiste DaroussinIf it finds a successful parse (reaches the end of the
3770c8de5b0SBaptiste Daroussininput or an action calls YYVALID), it backtracks to the
3780c8de5b0SBaptiste Daroussinpoint where it first entered trial parse mode, and
3790c8de5b0SBaptiste Daroussincontinues with a full parse (executing all actions),
3800c8de5b0SBaptiste Daroussinfollowing the path of the successful trial.
3810c8de5b0SBaptiste Daroussin
3820c8de5b0SBaptiste DaroussinActions in btyacc come in two flavors -- {}-actions, which
3830c8de5b0SBaptiste Daroussinare only executed when not in trial mode, and []-actions
3840c8de5b0SBaptiste Daroussinwhich are executed regardless of mode.  There are also
3850c8de5b0SBaptiste Daroussininherited attributes, which look like arguments (they are
3860c8de5b0SBaptiste Daroussinenclosed in "()") and act like []-actions.
3870c8de5b0SBaptiste Daroussin
3880c8de5b0SBaptiste DaroussinWhat this buys you:
3890c8de5b0SBaptiste Daroussin
3900c8de5b0SBaptiste Daroussin* No more lexer feedback hack.  In yacc grammars for C, a
3910c8de5b0SBaptiste Daroussinstandard hack, know as the "lexer feedback hack" is used
3920c8de5b0SBaptiste Daroussinto find typedef names.  The lexer uses semantic
3930c8de5b0SBaptiste Daroussininformation to decide if any given identifier is a
3940c8de5b0SBaptiste Daroussintypedef-name or not and returns a special token.  With
3950c8de5b0SBaptiste Daroussinbtyacc, you no longer need to do this; the lexer should
3960c8de5b0SBaptiste Daroussinjust always return an identifier.  The btyacc grammar then
3970c8de5b0SBaptiste Daroussinneeds a rule of the form:
3980c8de5b0SBaptiste Daroussin
3990c8de5b0SBaptiste Daroussintypename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]
4000c8de5b0SBaptiste Daroussin
4010c8de5b0SBaptiste DaroussinWhile the hack works adequately well for parsing C, it
4020c8de5b0SBaptiste Daroussinbecomes a nightmare when you try to parse something like
4030c8de5b0SBaptiste DaroussinC++, where treating an ID as a typedef becomes heavily
4040c8de5b0SBaptiste Daroussindependent on context.
4050c8de5b0SBaptiste Daroussin
4060c8de5b0SBaptiste Daroussin* Easy disambiguation via simple ordering.  Btyacc runs
4070c8de5b0SBaptiste Daroussinits trials via the rule "try shifting first, then try
4080c8de5b0SBaptiste Daroussinreducing by the order that the conflicting rules appear in
4090c8de5b0SBaptiste Daroussinthe input file".  This means you can deal with semantic a
4100c8de5b0SBaptiste Daroussindisambiguation rule like:
4110c8de5b0SBaptiste Daroussin    [1] If it looks like a declaration it is, otherwise
4120c8de5b0SBaptiste Daroussin    [2] If it looks like an expression it is, otherwise
4130c8de5b0SBaptiste Daroussin    [3] it is a syntax error
4140c8de5b0SBaptiste Daroussin	[Ellis&Stroustrup, Annotated C++ Reference Manual, p93]
4150c8de5b0SBaptiste Daroussin
4160c8de5b0SBaptiste DaroussinTo deal with this, you need only put all the rules for
4170c8de5b0SBaptiste Daroussindeclarations before the rules for expressions in the
4180c8de5b0SBaptiste Daroussingrammar file.
4190c8de5b0SBaptiste Daroussin
4200c8de5b0SBaptiste Daroussin* No extra cost if you do not use it.  Backtracking is
4210c8de5b0SBaptiste Daroussinonly triggered when the parse hits a shift/reduce or
4220c8de5b0SBaptiste Daroussinreduce/reduce conflict in the table.  If you have no
4230c8de5b0SBaptiste Daroussinconflicts in your grammar, there is no extra cost, other
4240c8de5b0SBaptiste Daroussinthan some extra code which will never be invoked.
4250c8de5b0SBaptiste Daroussin
4260c8de5b0SBaptiste Daroussin* C++ and ANSI C compatible parsers.  The parsers produced
4270c8de5b0SBaptiste Daroussinby btyacc can be compiled with C++ correctly.  If you
4280c8de5b0SBaptiste Daroussin"#define" YYSTYPE to be some C++ type with constructor and
4290c8de5b0SBaptiste Daroussindestructor, everything will work fine.  My favorite is
4300c8de5b0SBaptiste Daroussin"#define YYSTYPE SmartPointer", where SmartPointer is a
4310c8de5b0SBaptiste Daroussinsmart pointer type that does garbage collection on the
4320c8de5b0SBaptiste Daroussinpointed to objects.
4330c8de5b0SBaptiste Daroussin
4340c8de5b0SBaptiste DaroussinBTYACC was originally written to make it easy to write a
4350c8de5b0SBaptiste DaroussinC++ parser (my goal was to be able to use the grammar out
4360c8de5b0SBaptiste Daroussinof the back of the ARM with as few modifications as
4370c8de5b0SBaptiste Daroussinpossible).  Anyone who has ever looked at Jim Roskind
4380c8de5b0SBaptiste Daroussinpublic domain C++ yacc grammar, or the yacc-based grammar
4390c8de5b0SBaptiste Daroussinused in g++ knows how difficult this is.  BTYACC is very
4400c8de5b0SBaptiste Daroussinuseful for parsing any ambiguous grammar, particularly
4410c8de5b0SBaptiste Daroussinones that come from trying to merge two (or more) complete
4420c8de5b0SBaptiste Daroussingrammars.
4430c8de5b0SBaptiste Daroussin
4440c8de5b0SBaptiste DaroussinLimitations of the backtracking: Currently, the generated
4450c8de5b0SBaptiste Daroussinparser does NO pruning of alternate parsing paths.  To
4460c8de5b0SBaptiste Daroussinavoid an exponential explosion of possible paths (and
4470c8de5b0SBaptiste Daroussinparsing time), you need to manually tell the parser when
4480c8de5b0SBaptiste Daroussinit can throw away saved paths using YYVALID.  In practice,
4490c8de5b0SBaptiste Daroussinthis turns out to be fairly easy to do.  A C++ parser (for
4500c8de5b0SBaptiste Daroussinexample) can just put a [YYVALID;] after every complete
4510c8de5b0SBaptiste Daroussindeclaration and statement rule, corresponding to pruning
4520c8de5b0SBaptiste Daroussinthe backtracking state after seeing a ';' or '}' -- there
4530c8de5b0SBaptiste Daroussinwill never be a situation in which it is useful to
4540c8de5b0SBaptiste Daroussinbacktrack past either of these.
4550c8de5b0SBaptiste Daroussin
4560c8de5b0SBaptiste DaroussinInherited attributes in btyacc:
4570c8de5b0SBaptiste Daroussin
4580c8de5b0SBaptiste DaroussinInherited attributes look a lot like function arguments to
4590c8de5b0SBaptiste Daroussinnon-terminals, which is what they end up being in a
4600c8de5b0SBaptiste Daroussinrecursive descent parser, but NOT how they are implemented
4610c8de5b0SBaptiste Daroussinin btyacc.  Basically they are just syntactic sugar for
4620c8de5b0SBaptiste Daroussinembedded semantic actions and $0, $-1, ... in normal yacc.
4630c8de5b0SBaptiste Daroussinbtyacc gives you two big advantages besides just the
4640c8de5b0SBaptiste Daroussinsyntax:
4650c8de5b0SBaptiste Daroussin    1. it does type checking on the inherited attributes,
4660c8de5b0SBaptiste Daroussin       so you do not have to specify $<type>0 and makes sure
4670c8de5b0SBaptiste Daroussin       you give the correct number of arguments (inherited
4680c8de5b0SBaptiste Daroussin       attributes) to every use of a non-terminal.
4690c8de5b0SBaptiste Daroussin    2. It "collapses" identical actions from that are produced
4700c8de5b0SBaptiste Daroussin       from inherited attributes.  This eliminates many
4710c8de5b0SBaptiste Daroussin       potential reduce-reduce conflicts arising from
4720c8de5b0SBaptiste Daroussin       the inherited attributes.
4730c8de5b0SBaptiste Daroussin
4740c8de5b0SBaptiste DaroussinYou use inherited attributes by declaring the types of the
4750c8de5b0SBaptiste Daroussinattributes in the preamble with a type declaration and
4760c8de5b0SBaptiste Daroussindeclaring names of the attributes on the lhs of the yacc
4770c8de5b0SBaptiste Daroussinrule.  You can of course have more than one rule with the
4780c8de5b0SBaptiste Daroussinsame lhs, and you can even give them different names in
4790c8de5b0SBaptiste Daroussineach, but the type and number must be the same.
4800c8de5b0SBaptiste Daroussin
4810c8de5b0SBaptiste DaroussinHere is a small example:
4820c8de5b0SBaptiste Daroussin           /* lhs takes 2 inherited attributes */
4830c8de5b0SBaptiste Daroussin%type <t1> lhs(<t1>, <t2>)
4840c8de5b0SBaptiste Daroussin	   stuff(<t1>, <t2>)
4850c8de5b0SBaptiste Daroussin%%
4860c8de5b0SBaptiste Daroussinlhs($i1, $i2) : { $$ = $i1 }
4870c8de5b0SBaptiste Daroussin	      | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; }
4880c8de5b0SBaptiste Daroussin
4890c8de5b0SBaptiste DaroussinThis is roughly equivalent to the following yacc code:
4900c8de5b0SBaptiste Daroussinlhs :
4910c8de5b0SBaptiste Daroussin      { $$ = $<t1>-1; }
4920c8de5b0SBaptiste Daroussin    | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff
4930c8de5b0SBaptiste Daroussin      { $$ = $4; }
4940c8de5b0SBaptiste Daroussin    ;
4950c8de5b0SBaptiste Daroussin
4960c8de5b0SBaptiste DaroussinSee the file "test/t2.y" for a longer and more complete
4970c8de5b0SBaptiste Daroussinexample.  At the current time, the start symbol cannot
4980c8de5b0SBaptiste Daroussinhave any arguments.
4990c8de5b0SBaptiste Daroussin
5000c8de5b0SBaptiste DaroussinVariant parsers:
5010c8de5b0SBaptiste Daroussin
5020c8de5b0SBaptiste DaroussinBtyacc supports the -S flag to use a different parser
5030c8de5b0SBaptiste Daroussinskeleton, changing the way that the parser is called and
5040c8de5b0SBaptiste Daroussinused.  The skeleton "push.skel" is included to produce a
5050c8de5b0SBaptiste Daroussin"passive" parser that you feed tokens to (rather than
5060c8de5b0SBaptiste Daroussinhaving the parser call a separate yylex routine).  With
5070c8de5b0SBaptiste Daroussinpush.skel, yyparse is defined as follows:
5080c8de5b0SBaptiste Daroussin
5090c8de5b0SBaptiste Daroussinint yyparse(int token, YYSTYPE yylval)
5100c8de5b0SBaptiste Daroussin
5110c8de5b0SBaptiste DaroussinYou should call yyparse repeatedly with successive tokens
5120c8de5b0SBaptiste Daroussinof input.  It returns 0 if more input is needed, 1 for a
5130c8de5b0SBaptiste Daroussinsuccessful parse, and -1 for an unrecoverable parse error.
5140c8de5b0SBaptiste Daroussin
5150c8de5b0SBaptiste Daroussin
5160c8de5b0SBaptiste Daroussin	Miscellaneous Features in ver. 1.0
5170c8de5b0SBaptiste Daroussin	----------------------------------
5180c8de5b0SBaptiste Daroussin		by Chris Dodd
5190c8de5b0SBaptiste Daroussin
5200c8de5b0SBaptiste Daroussin     The -r option has been implemented.  The -r option tells
5210c8de5b0SBaptiste DaroussinYacc to put the read-only tables in y.tab.c and the code and
5220c8de5b0SBaptiste Daroussinvariables in y.code.c.  Keith Bostic asked for this option so
5230c8de5b0SBaptiste Daroussinthat :yyfix could be eliminated.
5240c8de5b0SBaptiste Daroussin
5250c8de5b0SBaptiste Daroussin     The -l and -t options have been implemented.  The -l
5260c8de5b0SBaptiste Daroussinoption tells Yacc not to include #line directives in the code
5270c8de5b0SBaptiste Daroussinit produces.  The -t option causes debugging code to be
5280c8de5b0SBaptiste Daroussinincluded in the compiled parser.
5290c8de5b0SBaptiste Daroussin
5300c8de5b0SBaptiste Daroussin     The code for error recovery has been changed to
5310c8de5b0SBaptiste Daroussinimplement the same algorithm as AT&T Yacc.  There will still
5320c8de5b0SBaptiste Daroussinbe differences in the way error recovery works because AT&T
5330c8de5b0SBaptiste DaroussinYacc uses more default reductions than Berkeley Yacc.
5340c8de5b0SBaptiste Daroussin
5350c8de5b0SBaptiste Daroussin     The environment variable TMPDIR determines the directory
5360c8de5b0SBaptiste Daroussinwhere temporary files will be created.  If TMPDIR is defined,
5370c8de5b0SBaptiste Daroussintemporary files will be created in the directory whose
5380c8de5b0SBaptiste Daroussinpathname is the value of TMPDIR.  By default, temporary files
5390c8de5b0SBaptiste Daroussinare created in /tmp.
5400c8de5b0SBaptiste Daroussin
5410c8de5b0SBaptiste Daroussin     The keywords are now case-insensitive.  For example,
5420c8de5b0SBaptiste Daroussin%nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are
5430c8de5b0SBaptiste Daroussinall equivalent.
5440c8de5b0SBaptiste Daroussin
5450c8de5b0SBaptiste Daroussin     Commas and semicolons that are not part of C code are
5460c8de5b0SBaptiste Daroussintreated as commentary.
5470c8de5b0SBaptiste Daroussin
5480c8de5b0SBaptiste Daroussin     Line-end comments, as in BCPL, are permitted.  Line-end
5490c8de5b0SBaptiste Daroussincomments begin with // and end at the next end-of-line.
5500c8de5b0SBaptiste DaroussinLine-end comments are permitted in C code; they are converted
5510c8de5b0SBaptiste Daroussinto C comments on output.
5520c8de5b0SBaptiste Daroussin
5530c8de5b0SBaptiste Daroussin     The form of y.output files has been changed to look more
5540c8de5b0SBaptiste Daroussinlike those produced by AT&T Yacc.
5550c8de5b0SBaptiste Daroussin
5560c8de5b0SBaptiste Daroussin     A new kind of declaration has been added.
5570c8de5b0SBaptiste DaroussinThe form of the declaration is
5580c8de5b0SBaptiste Daroussin
5590c8de5b0SBaptiste Daroussin	  %ident string
5600c8de5b0SBaptiste Daroussin
5610c8de5b0SBaptiste Daroussinwhere string is a sequence of characters beginning with a
5620c8de5b0SBaptiste Daroussindouble quote and ending with either a double quote or the
5630c8de5b0SBaptiste Daroussinnext end-of-line, whichever comes first.  The declaration
5640c8de5b0SBaptiste Daroussinwill cause a #ident directive to be written near the start
5650c8de5b0SBaptiste Daroussinof the output file.
5660c8de5b0SBaptiste Daroussin
5670c8de5b0SBaptiste Daroussin     If a parser has been compiled with debugging code, that
5680c8de5b0SBaptiste Daroussincode can be enabled by setting an environment variable.
5690c8de5b0SBaptiste DaroussinIf the environment variable YYDEBUG is set to 0, debugging
5700c8de5b0SBaptiste Daroussinoutput is suppressed.  If it is set to 1, debugging output
5710c8de5b0SBaptiste Daroussinis written to standard output.
5720c8de5b0SBaptiste Daroussin
5730c8de5b0SBaptiste Daroussin
5740c8de5b0SBaptiste Daroussin		Building BtYacc
5750c8de5b0SBaptiste Daroussin		---------------
5760c8de5b0SBaptiste Daroussin	by Chris Dodd and Vadim Maslov
5770c8de5b0SBaptiste Daroussin
5780c8de5b0SBaptiste DaroussinWe used GCC and GNU make to compile BtYacc both on UNIX and
5790c8de5b0SBaptiste DaroussinWIN32 paltforms.  You are welcome to try different
5800c8de5b0SBaptiste Daroussincombinations of makes and compilers.  Most likely it will
5810c8de5b0SBaptiste Daroussinwork, but it may require Makefile changes.
5820c8de5b0SBaptiste Daroussin
5830c8de5b0SBaptiste DaroussinThere is no config script.
5840c8de5b0SBaptiste DaroussinJust type "make" and it should compile.
5850c8de5b0SBaptiste Daroussin
5860c8de5b0SBaptiste DaroussinAWK. If you want to change file btyaccpa.ske (backtracking
5870c8de5b0SBaptiste Daroussinparser skeleton), you will need awk to compile it into
5880c8de5b0SBaptiste Daroussinskeleton.c file. We used GNU AWK (gawk) version 3.0.
5890c8de5b0SBaptiste Daroussin
5900c8de5b0SBaptiste DaroussinIt is known that using older versions of gawk
5910c8de5b0SBaptiste Daroussinmay create problems in compilation, because older awks
5920c8de5b0SBaptiste Daroussinhave problems with backslashes at the end of a line.
5930c8de5b0SBaptiste Daroussin
5940c8de5b0SBaptiste DaroussinFor MSDOS, there a "makefile.dos" that should do the trick.
5950c8de5b0SBaptiste DaroussinNote: makefile.dos was not tested for a long time.
5960c8de5b0SBaptiste Daroussin
5970c8de5b0SBaptiste DaroussinThe result of compilation should be a single executable called
5980c8de5b0SBaptiste Daroussin"btyacc" which you can install anywhere you like;
5990c8de5b0SBaptiste Daroussinit does not require any other files in the distribution to run.
6000c8de5b0SBaptiste Daroussin
6010c8de5b0SBaptiste Daroussin
6020c8de5b0SBaptiste Daroussin	       Legal Stuff
6030c8de5b0SBaptiste Daroussin	       -----------
6040c8de5b0SBaptiste Daroussin	by Chris Dodd and Vadim Maslov
6050c8de5b0SBaptiste Daroussin
6060c8de5b0SBaptiste DaroussinIn English: BtYacc is freeware. BtYacc is distributed with
6070c8de5b0SBaptiste Daroussinno warranty whatsoever. The author and any other contributors
6080c8de5b0SBaptiste Daroussintake no responsibility for any and all consequences of its use.
6090c8de5b0SBaptiste Daroussin
6100c8de5b0SBaptiste DaroussinIn Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS
6110c8de5b0SBaptiste DaroussinNOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE
6120c8de5b0SBaptiste DaroussinLIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL
6130c8de5b0SBaptiste DaroussinDAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR
6140c8de5b0SBaptiste DaroussinDATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY
6150c8de5b0SBaptiste DaroussinTHIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN
6160c8de5b0SBaptiste DaroussinIF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
6170c8de5b0SBaptiste DaroussinDAMAGES.
618