1-- $Id: README.BTYACC,v 1.1 2014/03/25 19:21:31 Tom.Shields Exp $ 2 3The original README from btyacc is below. 4 5The backtracking enhancements to byacc have been merged into Thomas Dickey's 6byacc baseline. 7 8The %include and %define/%ifdef enhancements described below are not currently 9incorporated. 10 11------------------------------------------------------------------------------- 12 BTYACC -- backtracking yacc 13 =========================== 14 15BTYACC was created by Chris Dodd using ideas from many 16places and lots of code from the Berkeley Yacc 17distribution, which is a public domain yacc clone put 18together by the good folks at Berkeley. This code is 19distributed with NO WARRANTY and is public domain. 20It is certain to contain bugs, which you should 21report to: chrisd@collins.com. 22 23Vadim Maslov of Siber Systems <vadik@siber.com> 24considerably modified BTYACC to make it suitable 25for production environment. 26 27Several people have suggested bug fixes that 28were incorporated into BtYacc. 29 30See the README.BYACC files for more about 31Berkeley Yacc and other sources of info. 32 33http://www.siber.com/btyacc/ is the current home of BtYacc. 34It is provided courtesy of Siber Systems http://www.siber.com/. 35 36 37 Version 3.0 changes 38 ------------------- 39 by Vadim Maslov 40 41Changes mostly occurred in btyaccpa.ske file that 42contains the parsing shift/reduce/backtrack algorithm. 43 44Version 3.0 innovations focus on: 45- text position computation and propagation, 46- industrial-strength error processing and recovery. 47 48 49** Added mechanism for computing and propagating 50text position of tokens and non-terminals. 51 52Compilers often need to build AST trees such that every node 53in a tree can relate to the parsed program source it came from. 54The following applications are very likely to need this: 55- debuggers that show actual source of the debugged program, 56- source-to-source translators that want 57 unchanged parts of the tree to generate the unchanged code. 58 59The new YYPOSN mechanism added in this version of BtYacc 60helps you in automating the text position computation 61and in assigning the computed text positions to the AST. 62This mechanism is successfully used in commercial 63parsers and source-to-source translators. 64 65In standard Yaccs every token and every non-terminal 66has an YYSTYPE semantic value attached to it. 67In this new version every token and every non-terminal 68also has an YYPOSN text position attached to it. 69YYPOSN is a user-defined type that can be anything and 70that has a meaning of text position attached to 71token or non-terminal. 72 73In addition to semantic value stack BtYacc now maintains 74text position stack. Behavior of the text position stack 75is similar to the behavior of the semantic value stack. 76 77If using text position mechanism, 78you need to define the following: 79 80YYPOSN Preprocessor variable that contains C/C++ type of 81 the text position attached to 82 every token and non-terminal. 83 84yyposn Global variable of type YYPOSN. 85 The lexer must assign text position of 86 the returned token to yyposn, just like it assigns 87 semantic value of the returned token to yylval. 88 89YYREDUCEPOSNFUNC 90 Preprocessor variable that points to function that 91 is called after the grammar rule reduction 92 to reduce text positions located on the stack. 93 94 This function is called by BtYacc to reduce text 95 positions. The function is called immediately after 96 the regular rule reduction occurs. 97 98 The function has the following prototype: 99 void ReducePosn(YYPOSN &ret, 100 YYPOSN *terms, 101 YYSTYPE *term_vals, 102 int term_no, 103 int stk_pos, 104 int yychar, 105 YYPOSN &yyposn, 106 UserType extra); 107 108 The function arguments are: 109 - ret 110 Reference to the text position returned by 111 the rule. The function must write the computed 112 text position returned by the rule to ret. 113 This is analogue of the $$ semantic value. 114 115 - term_posns 116 Array of the right-hand side rule components 117 YYPOSN text positions. These are analogues of 118 $1, $2, ..., $N in the text position world. 119 120 - term_vals 121 Array of the right-hand side (RHS) rule components 122 YYSTYPE values. These are the $1,...,$N themselves. 123 124 - term_no 125 Number of the components in RHS of the reduced rule. 126 Equal to size of arrays term_posns and term_vals. 127 Also equal to N in $1,...,$N in the reduced rule. 128 129 - stk_pos 130 YYSTYPE/YYPOSN stack position before the reduction. 131 132 - yychar 133 Lookahead token that immediately follows 134 the reduced RHS components. 135 136 - yyposn 137 YYPOSN of the token that immediately follows 138 the reduced RHS components. 139 140 - extra 141 User-defined extra argument passed to ReducePosn. 142 143 Typically this function extracts text positions from 144 the right-hand side rule components and either 145 assigns them to the returned $$ structure/tree or 146 if no $$ value is returned, puts them into 147 the ret text position from where 148 it will be picked up by the later reduced rules. 149 150YYREDUCEPOSNFUNCARG 151 Extra user-defined argument passed to 152 the ReducePosn function. This argument can use 153 any variables defined in btyaccpa.ske. 154 155 156** Added code to btyaccpa.ske that automatically cleans up 157semantic semantic values and text positions of tokens 158and non-terminals that are discarded and deleted as 159a result of error processing. 160 161In the previous versions the discarded token and non-terminal 162semantic values were not cleaned that caused quite severe 163leaks. The only way to fix it was to add garbage collection 164to YYSTYPE class. 165 166Now BtYacc skeleton calls delete functions for semantic 167values and positions of the discarded tokens and 168non-terminals. 169 170You need to define the following functions that BtYacc 171calls when it needs to delete semantic value or text position. 172 173YYDELETEVAL 174 User-defined function that is called by BtYacc 175 to delete semantic value of the token or non-terminal. 176 177 The user-defined function must have the prototype: 178 void DeleteYYval(YYSTYPE v, int type); 179 v is semantic value to delete, 180 type is one of the following: 181 0 discarding token 182 1 discarding state 183 2 cleaning up stack when aborting 184 185YYDELETEPOSN 186 User-defined function that is called by BtYacc 187 to delete text position of the token or non-terminal. 188 189 The user-defined function must have the prototype: 190 void DeleteYYposn(YYPOSN p, int type); 191 v is semantic value to delete, 192 type is one of the following: 193 0 discarding token 194 1 discarding state 195 2 cleaning up stack when aborting 196 197 198** User can define "detailed" syntax error processing 199function that reports an *exact* position of 200the token that caused the error. 201 202If you define preprocessor variable YYERROR_DETAILED in 203your grammar then you need define the following 204error processing function: 205 206void yyerror_detailed(char *text, 207 int errt, 208 YYSTYPE &errt_value, 209 YYPOSN &errt_posn); 210 211It receives the following arguments: 212text Error message. 213errt Code of the token that caused the error. 214errt_value Value of the token that caused the error. 215errt_posn Text position of token that caused error. 216 217 218** Dropped compatibility with C. 219 220Compatibility with C became increasingly difficult 221to maintain as new features were added to btyaccpa.ske. 222So we dropped it. If anybody wants to make the new version 223compatible with C, we would gladly accept the changes. 224 225Meanwhile we expect that you use C++ to write grammar 226actions and everything else in grammar files. 227Since C is (in a sense) subset of C++, your C-based 228grammar may work if you use C++ compiler to compile it. 229 230 Version 3.0 bugs fixed 231 ---------------------- 232 233Matthias Meixner <meixner@mes.th-darmstadt.de> fixed a bug: 234BtYacc does not correctly handle typenames, if one typename 235is a prefix of another one and if this type is used after 236the longer one. In this case BTYacc produces invalid code. 237 238 239 Version 2.1 changes 240 ------------------- 241 by Vadim Maslov 242 243** Added preprocessor statements to BtYacc that are similar 244in function and behavior to C/C++ preprocessor statements. 245 246These statements are used to: 247 248- Introduce modularity into a grammar by breaking it 249 into several *.y files and assembling different 250 grammars from the *.y modules using %include and %ifdef. 251 252- Have several versions of the same grammar 253 by using %ifdef and $endif. 254 255- To include automatically generated grammar fragment. 256 For instance, we use %include to include 257 automatically generated list of tokens. 258 259Preprocessor statements are: 260 261%define <var-name> 262 Define preprocessor variable named <var-name>. 263 264%ifdef <var-name> 265 If preprocessor variable named <var-name> 266 is defined by %define, then process the text from 267 this %ifdef to the closing %endif. 268 269%endif 270 Closing bracket for %ifdef preprocessor statement. 271 Only one nesting level of %ifdef-%endif is allowed. 272 273%include <file-name> 274 Process contents of the file named <file-name>. 275 If <file-name> is a relative name, it is looked up 276 in a directory in which btyacc was started. 277 Only one nesting level of %include is allowed. 278 279 280 Version 2.0 changes 281 ------------------- 282 by Vadim Maslov 283 284 285** Changed 16-bit short numbers to 32-bit int numbers in 286grammar tables, so that huge grammar tables (tables that 287are larger than 32768 elements) resulting from huge 288grammars (Cobol grammar, for instance) can work correctly. 289You need to have 32-bit integer to index table bigger than 29032768 elements, 16-bit integer is not enough. 291 292The original BtYacc just generated non-working tables 293larger than 32768 elements without even notifying about 294the table overflow. 295 296 297** Make error recovery work correctly when error happens 298while processing nested conflicts. Original BtYacc could 299infinitely cycle in certain situations that involved error 300recovery while in nested conflict. 301 302More detailed explanation: when we have nested conflicts 303(conflict that happens while trial-processing another 304conflict), it leads btyacc into NP-complete searching of 305conflict tree. The ultimate goal is YYVALID operator that 306selects a particular branch of that tree as a valid one. 307 308If no YYVALID is found on the tree, then error recovery 309takes over. The problem with this is that error recovery 310is started in the same state context that exists on the 311last surveyed branch of the conflict tree. Sometimes this 312last branch may be of zero length and it results in 313recovering to exactly the same state as existed before 314entering the conflict. BtYacc cycles then. 315 316We solved this problem by memorizing the longest path in 317the conflict tree while browsing it. If we ever get into 318error recovery, we restore state that existed on the 319longest path. Effectively we say: if we have an error, 320let us move forward as far as we possibly could while we 321were browsing the conflict tree. 322 323 324** Introduce YYVALID_NESTED operation in addition to 325simply YYVALID. When we have a nested conflict (conflict 326while processing in trial mode for another conflict), we 327want to relate YYVALID to a particular level of conflict 328being in trial. 329 330Since we mostly anticipate only 2-level nested conflicts 331YYVALID_NESTED tells the parser to satisfy only the 332internal conflict. Therefore, in 1-level conflict 333situation YYVALID_NESTED acts like a regular YYVALID, but 334in 2-level conflict it is a no-op and the other YYVALID 335for outer conflict will be searched for. 336 337 338** Improved handling of situation where /tmp directory is 339missing. Original btyacc just died quietly when /tmp 340directory was missing. We added code that states the 341problem explicitly. While on UNIX /tmp directory is always 342present, it may be missing on WIN32 systems, therefore 343diagnosing this situation is important. 344 345 346 Version 1.0 changes: BackTracking 347 ================================= 348 by Chris Dodd 349 350BTYACC is a modified version of yacc that supports 351automatic backtracking and semantic disambiguation to 352parse ambiguous grammars, as well as syntactic sugar for 353inherited attributes (which tend to introduce conflicts). 354Whenever a btyacc generated parser runs into a 355shift-reduce or reduce-reduce error in the parse table, it 356remembers the current parse point (yacc stack and input 357stream state), and goes into trial parse mode. It then 358continues parsing, ignoring most rule actions. If it runs 359into an error (either through the parse table or through 360an action calling YYERROR), it backtracks to the most 361recent conflict point and tries a different alternative. 362If it finds a successful parse (reaches the end of the 363input or an action calls YYVALID), it backtracks to the 364point where it first entered trial parse mode, and 365continues with a full parse (executing all actions), 366following the path of the successful trial. 367 368Actions in btyacc come in two flavors -- {}-actions, which 369are only executed when not in trial mode, and []-actions 370which are executed regardless of mode. There are also 371inherited attributes, which look like arguments (they are 372enclosed in "()") and act like []-actions. 373 374What this buys you: 375 376* No more lexer feedback hack. In yacc grammars for C, a 377standard hack, know as the "lexer feedback hack" is used 378to find typedef names. The lexer uses semantic 379information to decide if any given identifier is a 380typedef-name or not and returns a special token. With 381btyacc, you no longer need to do this; the lexer should 382just always return an identifier. The btyacc grammar then 383needs a rule of the form: 384 385typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ] 386 387While the hack works adequately well for parsing C, it 388becomes a nightmare when you try to parse something like 389C++, where treating an ID as a typedef becomes heavily 390dependent on context. 391 392* Easy disambiguation via simple ordering. Btyacc runs 393its trials via the rule "try shifting first, then try 394reducing by the order that the conflicting rules appear in 395the input file". This means you can deal with semantic a 396disambiguation rule like: 397 [1] If it looks like a declaration it is, otherwise 398 [2] If it looks like an expression it is, otherwise 399 [3] it is a syntax error 400 [Ellis&Stroustrup, Annotated C++ Reference Manual, p93] 401 402To deal with this, you need only put all the rules for 403declarations before the rules for expressions in the 404grammar file. 405 406* No extra cost if you do not use it. Backtracking is 407only triggered when the parse hits a shift/reduce or 408reduce/reduce conflict in the table. If you have no 409conflicts in your grammar, there is no extra cost, other 410than some extra code which will never be invoked. 411 412* C++ and ANSI C compatible parsers. The parsers produced 413by btyacc can be compiled with C++ correctly. If you 414"#define" YYSTYPE to be some C++ type with constructor and 415destructor, everything will work fine. My favorite is 416"#define YYSTYPE SmartPointer", where SmartPointer is a 417smart pointer type that does garbage collection on the 418pointed to objects. 419 420BTYACC was originally written to make it easy to write a 421C++ parser (my goal was to be able to use the grammar out 422of the back of the ARM with as few modifications as 423possible). Anyone who has ever looked at Jim Roskind 424public domain C++ yacc grammar, or the yacc-based grammar 425used in g++ knows how difficult this is. BTYACC is very 426useful for parsing any ambiguous grammar, particularly 427ones that come from trying to merge two (or more) complete 428grammars. 429 430Limitations of the backtracking: Currently, the generated 431parser does NO pruning of alternate parsing paths. To 432avoid an exponential explosion of possible paths (and 433parsing time), you need to manually tell the parser when 434it can throw away saved paths using YYVALID. In practice, 435this turns out to be fairly easy to do. A C++ parser (for 436example) can just put a [YYVALID;] after every complete 437declaration and statement rule, corresponding to pruning 438the backtracking state after seeing a ';' or '}' -- there 439will never be a situation in which it is useful to 440backtrack past either of these. 441 442Inherited attributes in btyacc: 443 444Inherited attributes look a lot like function arguments to 445non-terminals, which is what they end up being in a 446recursive descent parser, but NOT how they are implemented 447in btyacc. Basically they are just syntactic sugar for 448embedded semantic actions and $0, $-1, ... in normal yacc. 449btyacc gives you two big advantages besides just the 450syntax: 451 1. it does type checking on the inherited attributes, 452 so you do not have to specify $<type>0 and makes sure 453 you give the correct number of arguments (inherited 454 attributes) to every use of a non-terminal. 455 2. It "collapses" identical actions from that are produced 456 from inherited attributes. This eliminates many 457 potential reduce-reduce conflicts arising from 458 the inherited attributes. 459 460You use inherited attributes by declaring the types of the 461attributes in the preamble with a type declaration and 462declaring names of the attributes on the lhs of the yacc 463rule. You can of course have more than one rule with the 464same lhs, and you can even give them different names in 465each, but the type and number must be the same. 466 467Here is a small example: 468 /* lhs takes 2 inherited attributes */ 469%type <t1> lhs(<t1>, <t2>) 470 stuff(<t1>, <t2>) 471%% 472lhs($i1, $i2) : { $$ = $i1 } 473 | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; } 474 475This is roughly equivalent to the following yacc code: 476lhs : 477 { $$ = $<t1>-1; } 478 | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff 479 { $$ = $4; } 480 ; 481 482See the file "test/t2.y" for a longer and more complete 483example. At the current time, the start symbol cannot 484have any arguments. 485 486Variant parsers: 487 488Btyacc supports the -S flag to use a different parser 489skeleton, changing the way that the parser is called and 490used. The skeleton "push.skel" is included to produce a 491"passive" parser that you feed tokens to (rather than 492having the parser call a separate yylex routine). With 493push.skel, yyparse is defined as follows: 494 495int yyparse(int token, YYSTYPE yylval) 496 497You should call yyparse repeatedly with successive tokens 498of input. It returns 0 if more input is needed, 1 for a 499successful parse, and -1 for an unrecoverable parse error. 500 501 502 Miscellaneous Features in ver. 1.0 503 ---------------------------------- 504 by Chris Dodd 505 506 The -r option has been implemented. The -r option tells 507Yacc to put the read-only tables in y.tab.c and the code and 508variables in y.code.c. Keith Bostic asked for this option so 509that :yyfix could be eliminated. 510 511 The -l and -t options have been implemented. The -l 512option tells Yacc not to include #line directives in the code 513it produces. The -t option causes debugging code to be 514included in the compiled parser. 515 516 The code for error recovery has been changed to 517implement the same algorithm as AT&T Yacc. There will still 518be differences in the way error recovery works because AT&T 519Yacc uses more default reductions than Berkeley Yacc. 520 521 The environment variable TMPDIR determines the directory 522where temporary files will be created. If TMPDIR is defined, 523temporary files will be created in the directory whose 524pathname is the value of TMPDIR. By default, temporary files 525are created in /tmp. 526 527 The keywords are now case-insensitive. For example, 528%nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are 529all equivalent. 530 531 Commas and semicolons that are not part of C code are 532treated as commentary. 533 534 Line-end comments, as in BCPL, are permitted. Line-end 535comments begin with // and end at the next end-of-line. 536Line-end comments are permitted in C code; they are converted 537to C comments on output. 538 539 The form of y.output files has been changed to look more 540like those produced by AT&T Yacc. 541 542 A new kind of declaration has been added. 543The form of the declaration is 544 545 %ident string 546 547where string is a sequence of characters beginning with a 548double quote and ending with either a double quote or the 549next end-of-line, whichever comes first. The declaration 550will cause a #ident directive to be written near the start 551of the output file. 552 553 If a parser has been compiled with debugging code, that 554code can be enabled by setting an environment variable. 555If the environment variable YYDEBUG is set to 0, debugging 556output is suppressed. If it is set to 1, debugging output 557is written to standard output. 558 559 560 Building BtYacc 561 --------------- 562 by Chris Dodd and Vadim Maslov 563 564We used GCC and GNU make to compile BtYacc both on UNIX and 565WIN32 paltforms. You are welcome to try different 566combinations of makes and compilers. Most likely it will 567work, but it may require Makefile changes. 568 569There is no config script. 570Just type "make" and it should compile. 571 572AWK. If you want to change file btyaccpa.ske (backtracking 573parser skeleton), you will need awk to compile it into 574skeleton.c file. We used GNU AWK (gawk) version 3.0. 575 576It is known that using older versions of gawk 577may create problems in compilation, because older awks 578have problems with backslashes at the end of a line. 579 580For MSDOS, there a "makefile.dos" that should do the trick. 581Note: makefile.dos was not tested for a long time. 582 583The result of compilation should be a single executable called 584"btyacc" which you can install anywhere you like; 585it does not require any other files in the distribution to run. 586 587 588 Legal Stuff 589 ----------- 590 by Chris Dodd and Vadim Maslov 591 592In English: BtYacc is freeware. BtYacc is distributed with 593no warranty whatsoever. The author and any other contributors 594take no responsibility for any and all consequences of its use. 595 596In Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS 597NOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE 598LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL 599DAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR 600DATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY 601THIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN 602IF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH 603DAMAGES. 604