xref: /freebsd/contrib/byacc/README.BTYACC (revision 6ae1554a5d9b318f8ad53ccc39fa5a961403da73)
1-- $Id: README.BTYACC,v 1.2 2014/04/22 08:18:57 Tom.Shields Exp $
2
3The original README from btyacc is below.
4
5The backtracking enhancements to byacc have been merged into Thomas Dickey's
6byacc baseline.
7
8The %include and %define/%ifdef enhancements described below are not currently
9incorporated.
10
11The position management functionality ("YYPOSN", "yyposn", "YYREDUCEPOSNFUNC",
12"YYREDUCEPOSNFUNCARG" & "YYCALLREDUCEPOSN") is replaced by a bison-compatible
13"%locations" implementation.
14
15The memory management functionality ("YYDELETEVAL" & "YYDELETEPOSN") is
16replaced by a bison-compatible "%destructor" implementation.
17
18The detailed syntax error processing functionality ("YYERROR_DETAILED"
19& "yyerror_detailed()") is subsumed by the bison-compatible "yyerror()"
20implementation, as modified by the %parse-param and %locations directives.
21
22The debugging macro "YYDBPR()" in the parser skeleton is renamed
23"YYSTYPE_TOSTRING()".
24
25-------------------------------------------------------------------------------
26	     BTYACC -- backtracking yacc
27	     ===========================
28
29BTYACC was created by Chris Dodd using ideas from many
30places and lots of code from the Berkeley Yacc
31distribution, which is a public domain yacc clone put
32together by the good folks at Berkeley.  This code is
33distributed with NO WARRANTY and is public domain.
34It is certain to contain bugs, which you should
35report to: chrisd@collins.com.
36
37Vadim Maslov of Siber Systems <vadik@siber.com>
38considerably modified BTYACC to make it suitable
39for production environment.
40
41Several people have suggested bug fixes that
42were incorporated into BtYacc.
43
44See the README.BYACC files for more about
45Berkeley Yacc and other sources of info.
46
47http://www.siber.com/btyacc/ is the current home of BtYacc.
48It is provided courtesy of Siber Systems http://www.siber.com/.
49
50
51		Version 3.0 changes
52		-------------------
53		  by Vadim Maslov
54
55Changes mostly occurred in btyaccpa.ske file that
56contains the parsing shift/reduce/backtrack algorithm.
57
58Version 3.0 innovations focus on:
59- text position computation and propagation,
60- industrial-strength error processing and recovery.
61
62
63** Added mechanism for computing and propagating
64text position of tokens and non-terminals.
65
66Compilers often need to build AST trees such that every node
67in a tree can relate to the parsed program source it came from.
68The following applications are very likely to need this:
69- debuggers that show actual source of the debugged program,
70- source-to-source translators that want
71  unchanged parts of the tree to generate the unchanged code.
72
73The new YYPOSN mechanism added in this version of BtYacc
74helps you in automating the text position computation
75and in assigning the computed text positions to the AST.
76This mechanism is successfully used in commercial
77parsers and source-to-source translators.
78
79In standard Yaccs every token and every non-terminal
80has an YYSTYPE semantic value attached to it.
81In this new version every token and every non-terminal
82also has an YYPOSN text position attached to it.
83YYPOSN is a user-defined type that can be anything and
84that has a meaning of text position attached to
85token or non-terminal.
86
87In addition to semantic value stack BtYacc now maintains
88text position stack. Behavior of the text position stack
89is similar to the behavior of the semantic value stack.
90
91If using text position mechanism,
92you need to define the following:
93
94YYPOSN	Preprocessor variable that contains C/C++ type of
95	the text position attached to
96	every token and non-terminal.
97
98yyposn  Global variable of type YYPOSN.
99        The lexer must assign text position of
100	the returned token to yyposn, just like it assigns
101	semantic value of the returned token to yylval.
102
103YYREDUCEPOSNFUNC
104	Preprocessor variable that points to function that
105	is called after the grammar rule reduction
106	to reduce text positions located on the stack.
107
108        This function is called by BtYacc to reduce text
109	positions. The function is called immediately after
110	the regular rule reduction occurs.
111
112	The function has the following prototype:
113	void ReducePosn(YYPOSN  &ret,
114			YYPOSN  *terms,
115			YYSTYPE *term_vals,
116			int      term_no,
117			int      stk_pos,
118			int      yychar,
119			YYPOSN  &yyposn,
120			UserType extra);
121
122        The function arguments are:
123        - ret
124	  Reference to the text position returned by
125          the rule. The function must write the computed
126          text position returned by the rule to ret.
127          This is analogue of the $$ semantic value.
128
129        - term_posns
130          Array of the right-hand side rule components
131	  YYPOSN text positions.  These are analogues of
132	  $1, $2, ..., $N in the text position world.
133
134        - term_vals
135	  Array of the right-hand side (RHS) rule components
136	  YYSTYPE values. These are the $1,...,$N themselves.
137
138        - term_no
139          Number of the components in RHS of the reduced rule.
140          Equal to size of arrays term_posns and term_vals.
141          Also equal to N in $1,...,$N in the reduced rule.
142
143        - stk_pos
144          YYSTYPE/YYPOSN stack position before the reduction.
145
146        - yychar
147          Lookahead token that immediately follows
148 	  the reduced RHS components.
149
150        - yyposn
151          YYPOSN of the token that immediately follows
152	  the reduced RHS components.
153
154        - extra
155          User-defined extra argument passed to ReducePosn.
156
157        Typically this function extracts text positions from
158	the right-hand side rule components and either
159	assigns them to the returned $$ structure/tree or
160        if no $$ value is returned, puts them into
161	the ret text position from where
162        it will be picked up by the later reduced rules.
163
164YYREDUCEPOSNFUNCARG
165	Extra user-defined argument passed to
166	the ReducePosn function. This argument can use
167	any variables defined in btyaccpa.ske.
168
169
170** Added code to btyaccpa.ske that automatically cleans up
171semantic semantic values and text positions of tokens
172and non-terminals that are discarded and deleted as
173a result of error processing.
174
175In the previous versions the discarded token and non-terminal
176semantic values were not cleaned that caused quite severe
177leaks.  The only way to fix it was to add garbage collection
178to YYSTYPE class.
179
180Now BtYacc skeleton calls delete functions for semantic
181values and positions of the discarded tokens and
182non-terminals.
183
184You need to define the following functions that BtYacc
185calls when it needs to delete semantic value or text position.
186
187YYDELETEVAL
188	User-defined function that is called by BtYacc
189	to delete semantic value of the token or non-terminal.
190
191	The user-defined function must have the prototype:
192	void DeleteYYval(YYSTYPE v, int type);
193	v    is semantic value to delete,
194	type is one of the following:
195	0 	discarding token
196	1       discarding state
197	2       cleaning up stack when aborting
198
199YYDELETEPOSN
200	User-defined function that is called by BtYacc
201	to delete text position of the token or non-terminal.
202
203	The user-defined function must have the prototype:
204	void DeleteYYposn(YYPOSN p, int type);
205	v    is semantic value to delete,
206	type is one of the following:
207	0 	discarding token
208	1       discarding state
209	2       cleaning up stack when aborting
210
211
212** User can define "detailed" syntax error processing
213function that reports an *exact* position of
214the token that caused the error.
215
216If you define preprocessor variable YYERROR_DETAILED in
217your grammar then you need define the following
218error processing function:
219
220void yyerror_detailed(char    *text,
221		      int      errt,
222		      YYSTYPE &errt_value,
223		      YYPOSN  &errt_posn);
224
225It receives the following arguments:
226text		Error message.
227errt		Code of the token that caused the error.
228errt_value	Value of the token that caused the error.
229errt_posn	Text position of token that caused error.
230
231
232** Dropped compatibility with C.
233
234Compatibility with C became increasingly difficult
235to maintain as new features were added to btyaccpa.ske.
236So we dropped it. If anybody wants to make the new version
237compatible with C, we would gladly accept the changes.
238
239Meanwhile we expect that you use C++ to write grammar
240actions and everything else in grammar files.
241Since C is (in a sense) subset of C++, your C-based
242grammar may work if you use C++ compiler to compile it.
243
244		Version 3.0 bugs fixed
245		----------------------
246
247Matthias Meixner <meixner@mes.th-darmstadt.de> fixed a bug:
248BtYacc does not correctly handle typenames, if one typename
249is a prefix of another one and if this type is used after
250the longer one. In this case BTYacc produces invalid code.
251
252
253		Version 2.1 changes
254		-------------------
255		  by Vadim Maslov
256
257** Added preprocessor statements to BtYacc that are similar
258in function and behavior to C/C++ preprocessor statements.
259
260These statements are used to:
261
262- Introduce modularity into a grammar by breaking it
263  into several *.y files and assembling different
264  grammars from the *.y modules using %include and %ifdef.
265
266- Have several versions of the same grammar
267  by using %ifdef and $endif.
268
269- To include automatically generated grammar fragment.
270  For instance, we use %include to include
271  automatically generated list of tokens.
272
273Preprocessor statements are:
274
275%define <var-name>
276	Define preprocessor variable named <var-name>.
277
278%ifdef <var-name>
279	If preprocessor variable named <var-name>
280	is defined by %define, then process the text from
281	this %ifdef to the closing %endif.
282
283%endif
284	Closing bracket for %ifdef preprocessor statement.
285	Only one nesting level of %ifdef-%endif is allowed.
286
287%include <file-name>
288	Process contents of the file named <file-name>.
289	If <file-name> is a relative name, it is looked up
290        in a directory in which btyacc was started.
291	Only one nesting level of %include is allowed.
292
293
294		Version 2.0 changes
295		-------------------
296		  by Vadim Maslov
297
298
299** Changed 16-bit short numbers to 32-bit int numbers in
300grammar tables, so that huge grammar tables (tables that
301are larger than 32768 elements) resulting from huge
302grammars (Cobol grammar, for instance) can work correctly.
303You need to have 32-bit integer to index table bigger than
30432768 elements, 16-bit integer is not enough.
305
306The original BtYacc just generated non-working tables
307larger than 32768 elements without even notifying about
308the table overflow.
309
310
311** Make error recovery work correctly when error happens
312while processing nested conflicts. Original BtYacc could
313infinitely cycle in certain situations that involved error
314recovery while in nested conflict.
315
316More detailed explanation: when we have nested conflicts
317(conflict that happens while trial-processing another
318conflict), it leads btyacc into NP-complete searching of
319conflict tree. The ultimate goal is YYVALID operator that
320selects a particular branch of that tree as a valid one.
321
322If no YYVALID is found on the tree, then error recovery
323takes over.  The problem with this is that error recovery
324is started in the same state context that exists on the
325last surveyed branch of the conflict tree.  Sometimes this
326last branch may be of zero length and it results in
327recovering to exactly the same state as existed before
328entering the conflict. BtYacc cycles then.
329
330We solved this problem by memorizing the longest path in
331the conflict tree while browsing it. If we ever get into
332error recovery, we restore state that existed on the
333longest path.  Effectively we say: if we have an error,
334let us move forward as far as we possibly could while we
335were browsing the conflict tree.
336
337
338** Introduce YYVALID_NESTED operation in addition to
339simply YYVALID.  When we have a nested conflict (conflict
340while processing in trial mode for another conflict), we
341want to relate YYVALID to a particular level of conflict
342being in trial.
343
344Since we mostly anticipate only 2-level nested conflicts
345YYVALID_NESTED tells the parser to satisfy only the
346internal conflict.  Therefore, in 1-level conflict
347situation YYVALID_NESTED acts like a regular YYVALID, but
348in 2-level conflict it is a no-op and the other YYVALID
349for outer conflict will be searched for.
350
351
352** Improved handling of situation where /tmp directory is
353missing.  Original btyacc just died quietly when /tmp
354directory was missing.  We added code that states the
355problem explicitly. While on UNIX /tmp directory is always
356present, it may be missing on WIN32 systems, therefore
357diagnosing this situation is important.
358
359
360	Version 1.0 changes: BackTracking
361	=================================
362		by Chris Dodd
363
364BTYACC is a modified version of yacc that supports
365automatic backtracking and semantic disambiguation to
366parse ambiguous grammars, as well as syntactic sugar for
367inherited attributes (which tend to introduce conflicts).
368Whenever a btyacc generated parser runs into a
369shift-reduce or reduce-reduce error in the parse table, it
370remembers the current parse point (yacc stack and input
371stream state), and goes into trial parse mode.  It then
372continues parsing, ignoring most rule actions.  If it runs
373into an error (either through the parse table or through
374an action calling YYERROR), it backtracks to the most
375recent conflict point and tries a different alternative.
376If it finds a successful parse (reaches the end of the
377input or an action calls YYVALID), it backtracks to the
378point where it first entered trial parse mode, and
379continues with a full parse (executing all actions),
380following the path of the successful trial.
381
382Actions in btyacc come in two flavors -- {}-actions, which
383are only executed when not in trial mode, and []-actions
384which are executed regardless of mode.  There are also
385inherited attributes, which look like arguments (they are
386enclosed in "()") and act like []-actions.
387
388What this buys you:
389
390* No more lexer feedback hack.  In yacc grammars for C, a
391standard hack, know as the "lexer feedback hack" is used
392to find typedef names.  The lexer uses semantic
393information to decide if any given identifier is a
394typedef-name or not and returns a special token.  With
395btyacc, you no longer need to do this; the lexer should
396just always return an identifier.  The btyacc grammar then
397needs a rule of the form:
398
399typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]
400
401While the hack works adequately well for parsing C, it
402becomes a nightmare when you try to parse something like
403C++, where treating an ID as a typedef becomes heavily
404dependent on context.
405
406* Easy disambiguation via simple ordering.  Btyacc runs
407its trials via the rule "try shifting first, then try
408reducing by the order that the conflicting rules appear in
409the input file".  This means you can deal with semantic a
410disambiguation rule like:
411    [1] If it looks like a declaration it is, otherwise
412    [2] If it looks like an expression it is, otherwise
413    [3] it is a syntax error
414	[Ellis&Stroustrup, Annotated C++ Reference Manual, p93]
415
416To deal with this, you need only put all the rules for
417declarations before the rules for expressions in the
418grammar file.
419
420* No extra cost if you do not use it.  Backtracking is
421only triggered when the parse hits a shift/reduce or
422reduce/reduce conflict in the table.  If you have no
423conflicts in your grammar, there is no extra cost, other
424than some extra code which will never be invoked.
425
426* C++ and ANSI C compatible parsers.  The parsers produced
427by btyacc can be compiled with C++ correctly.  If you
428"#define" YYSTYPE to be some C++ type with constructor and
429destructor, everything will work fine.  My favorite is
430"#define YYSTYPE SmartPointer", where SmartPointer is a
431smart pointer type that does garbage collection on the
432pointed to objects.
433
434BTYACC was originally written to make it easy to write a
435C++ parser (my goal was to be able to use the grammar out
436of the back of the ARM with as few modifications as
437possible).  Anyone who has ever looked at Jim Roskind
438public domain C++ yacc grammar, or the yacc-based grammar
439used in g++ knows how difficult this is.  BTYACC is very
440useful for parsing any ambiguous grammar, particularly
441ones that come from trying to merge two (or more) complete
442grammars.
443
444Limitations of the backtracking: Currently, the generated
445parser does NO pruning of alternate parsing paths.  To
446avoid an exponential explosion of possible paths (and
447parsing time), you need to manually tell the parser when
448it can throw away saved paths using YYVALID.  In practice,
449this turns out to be fairly easy to do.  A C++ parser (for
450example) can just put a [YYVALID;] after every complete
451declaration and statement rule, corresponding to pruning
452the backtracking state after seeing a ';' or '}' -- there
453will never be a situation in which it is useful to
454backtrack past either of these.
455
456Inherited attributes in btyacc:
457
458Inherited attributes look a lot like function arguments to
459non-terminals, which is what they end up being in a
460recursive descent parser, but NOT how they are implemented
461in btyacc.  Basically they are just syntactic sugar for
462embedded semantic actions and $0, $-1, ... in normal yacc.
463btyacc gives you two big advantages besides just the
464syntax:
465    1. it does type checking on the inherited attributes,
466       so you do not have to specify $<type>0 and makes sure
467       you give the correct number of arguments (inherited
468       attributes) to every use of a non-terminal.
469    2. It "collapses" identical actions from that are produced
470       from inherited attributes.  This eliminates many
471       potential reduce-reduce conflicts arising from
472       the inherited attributes.
473
474You use inherited attributes by declaring the types of the
475attributes in the preamble with a type declaration and
476declaring names of the attributes on the lhs of the yacc
477rule.  You can of course have more than one rule with the
478same lhs, and you can even give them different names in
479each, but the type and number must be the same.
480
481Here is a small example:
482           /* lhs takes 2 inherited attributes */
483%type <t1> lhs(<t1>, <t2>)
484	   stuff(<t1>, <t2>)
485%%
486lhs($i1, $i2) : { $$ = $i1 }
487	      | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; }
488
489This is roughly equivalent to the following yacc code:
490lhs :
491      { $$ = $<t1>-1; }
492    | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff
493      { $$ = $4; }
494    ;
495
496See the file "test/t2.y" for a longer and more complete
497example.  At the current time, the start symbol cannot
498have any arguments.
499
500Variant parsers:
501
502Btyacc supports the -S flag to use a different parser
503skeleton, changing the way that the parser is called and
504used.  The skeleton "push.skel" is included to produce a
505"passive" parser that you feed tokens to (rather than
506having the parser call a separate yylex routine).  With
507push.skel, yyparse is defined as follows:
508
509int yyparse(int token, YYSTYPE yylval)
510
511You should call yyparse repeatedly with successive tokens
512of input.  It returns 0 if more input is needed, 1 for a
513successful parse, and -1 for an unrecoverable parse error.
514
515
516	Miscellaneous Features in ver. 1.0
517	----------------------------------
518		by Chris Dodd
519
520     The -r option has been implemented.  The -r option tells
521Yacc to put the read-only tables in y.tab.c and the code and
522variables in y.code.c.  Keith Bostic asked for this option so
523that :yyfix could be eliminated.
524
525     The -l and -t options have been implemented.  The -l
526option tells Yacc not to include #line directives in the code
527it produces.  The -t option causes debugging code to be
528included in the compiled parser.
529
530     The code for error recovery has been changed to
531implement the same algorithm as AT&T Yacc.  There will still
532be differences in the way error recovery works because AT&T
533Yacc uses more default reductions than Berkeley Yacc.
534
535     The environment variable TMPDIR determines the directory
536where temporary files will be created.  If TMPDIR is defined,
537temporary files will be created in the directory whose
538pathname is the value of TMPDIR.  By default, temporary files
539are created in /tmp.
540
541     The keywords are now case-insensitive.  For example,
542%nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are
543all equivalent.
544
545     Commas and semicolons that are not part of C code are
546treated as commentary.
547
548     Line-end comments, as in BCPL, are permitted.  Line-end
549comments begin with // and end at the next end-of-line.
550Line-end comments are permitted in C code; they are converted
551to C comments on output.
552
553     The form of y.output files has been changed to look more
554like those produced by AT&T Yacc.
555
556     A new kind of declaration has been added.
557The form of the declaration is
558
559	  %ident string
560
561where string is a sequence of characters beginning with a
562double quote and ending with either a double quote or the
563next end-of-line, whichever comes first.  The declaration
564will cause a #ident directive to be written near the start
565of the output file.
566
567     If a parser has been compiled with debugging code, that
568code can be enabled by setting an environment variable.
569If the environment variable YYDEBUG is set to 0, debugging
570output is suppressed.  If it is set to 1, debugging output
571is written to standard output.
572
573
574		Building BtYacc
575		---------------
576	by Chris Dodd and Vadim Maslov
577
578We used GCC and GNU make to compile BtYacc both on UNIX and
579WIN32 paltforms.  You are welcome to try different
580combinations of makes and compilers.  Most likely it will
581work, but it may require Makefile changes.
582
583There is no config script.
584Just type "make" and it should compile.
585
586AWK. If you want to change file btyaccpa.ske (backtracking
587parser skeleton), you will need awk to compile it into
588skeleton.c file. We used GNU AWK (gawk) version 3.0.
589
590It is known that using older versions of gawk
591may create problems in compilation, because older awks
592have problems with backslashes at the end of a line.
593
594For MSDOS, there a "makefile.dos" that should do the trick.
595Note: makefile.dos was not tested for a long time.
596
597The result of compilation should be a single executable called
598"btyacc" which you can install anywhere you like;
599it does not require any other files in the distribution to run.
600
601
602	       Legal Stuff
603	       -----------
604	by Chris Dodd and Vadim Maslov
605
606In English: BtYacc is freeware. BtYacc is distributed with
607no warranty whatsoever. The author and any other contributors
608take no responsibility for any and all consequences of its use.
609
610In Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS
611NOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE
612LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL
613DAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR
614DATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY
615THIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN
616IF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
617DAMAGES.
618