1-- $Id: README.BTYACC,v 1.2 2014/04/22 08:18:57 Tom.Shields Exp $ 2 3The original README from btyacc is below. 4 5The backtracking enhancements to byacc have been merged into Thomas Dickey's 6byacc baseline. 7 8The %include and %define/%ifdef enhancements described below are not currently 9incorporated. 10 11The position management functionality ("YYPOSN", "yyposn", "YYREDUCEPOSNFUNC", 12"YYREDUCEPOSNFUNCARG" & "YYCALLREDUCEPOSN") is replaced by a bison-compatible 13"%locations" implementation. 14 15The memory management functionality ("YYDELETEVAL" & "YYDELETEPOSN") is 16replaced by a bison-compatible "%destructor" implementation. 17 18The detailed syntax error processing functionality ("YYERROR_DETAILED" 19& "yyerror_detailed()") is subsumed by the bison-compatible "yyerror()" 20implementation, as modified by the %parse-param and %locations directives. 21 22The debugging macro "YYDBPR()" in the parser skeleton is renamed 23"YYSTYPE_TOSTRING()". 24 25------------------------------------------------------------------------------- 26 BTYACC -- backtracking yacc 27 =========================== 28 29BTYACC was created by Chris Dodd using ideas from many 30places and lots of code from the Berkeley Yacc 31distribution, which is a public domain yacc clone put 32together by the good folks at Berkeley. This code is 33distributed with NO WARRANTY and is public domain. 34It is certain to contain bugs, which you should 35report to: chrisd@collins.com. 36 37Vadim Maslov of Siber Systems <vadik@siber.com> 38considerably modified BTYACC to make it suitable 39for production environment. 40 41Several people have suggested bug fixes that 42were incorporated into BtYacc. 43 44See the README.BYACC files for more about 45Berkeley Yacc and other sources of info. 46 47http://www.siber.com/btyacc/ is the current home of BtYacc. 48It is provided courtesy of Siber Systems http://www.siber.com/. 49 50 51 Version 3.0 changes 52 ------------------- 53 by Vadim Maslov 54 55Changes mostly occurred in btyaccpa.ske file that 56contains the parsing shift/reduce/backtrack algorithm. 57 58Version 3.0 innovations focus on: 59- text position computation and propagation, 60- industrial-strength error processing and recovery. 61 62 63** Added mechanism for computing and propagating 64text position of tokens and non-terminals. 65 66Compilers often need to build AST trees such that every node 67in a tree can relate to the parsed program source it came from. 68The following applications are very likely to need this: 69- debuggers that show actual source of the debugged program, 70- source-to-source translators that want 71 unchanged parts of the tree to generate the unchanged code. 72 73The new YYPOSN mechanism added in this version of BtYacc 74helps you in automating the text position computation 75and in assigning the computed text positions to the AST. 76This mechanism is successfully used in commercial 77parsers and source-to-source translators. 78 79In standard Yaccs every token and every non-terminal 80has an YYSTYPE semantic value attached to it. 81In this new version every token and every non-terminal 82also has an YYPOSN text position attached to it. 83YYPOSN is a user-defined type that can be anything and 84that has a meaning of text position attached to 85token or non-terminal. 86 87In addition to semantic value stack BtYacc now maintains 88text position stack. Behavior of the text position stack 89is similar to the behavior of the semantic value stack. 90 91If using text position mechanism, 92you need to define the following: 93 94YYPOSN Preprocessor variable that contains C/C++ type of 95 the text position attached to 96 every token and non-terminal. 97 98yyposn Global variable of type YYPOSN. 99 The lexer must assign text position of 100 the returned token to yyposn, just like it assigns 101 semantic value of the returned token to yylval. 102 103YYREDUCEPOSNFUNC 104 Preprocessor variable that points to function that 105 is called after the grammar rule reduction 106 to reduce text positions located on the stack. 107 108 This function is called by BtYacc to reduce text 109 positions. The function is called immediately after 110 the regular rule reduction occurs. 111 112 The function has the following prototype: 113 void ReducePosn(YYPOSN &ret, 114 YYPOSN *terms, 115 YYSTYPE *term_vals, 116 int term_no, 117 int stk_pos, 118 int yychar, 119 YYPOSN &yyposn, 120 UserType extra); 121 122 The function arguments are: 123 - ret 124 Reference to the text position returned by 125 the rule. The function must write the computed 126 text position returned by the rule to ret. 127 This is analogue of the $$ semantic value. 128 129 - term_posns 130 Array of the right-hand side rule components 131 YYPOSN text positions. These are analogues of 132 $1, $2, ..., $N in the text position world. 133 134 - term_vals 135 Array of the right-hand side (RHS) rule components 136 YYSTYPE values. These are the $1,...,$N themselves. 137 138 - term_no 139 Number of the components in RHS of the reduced rule. 140 Equal to size of arrays term_posns and term_vals. 141 Also equal to N in $1,...,$N in the reduced rule. 142 143 - stk_pos 144 YYSTYPE/YYPOSN stack position before the reduction. 145 146 - yychar 147 Lookahead token that immediately follows 148 the reduced RHS components. 149 150 - yyposn 151 YYPOSN of the token that immediately follows 152 the reduced RHS components. 153 154 - extra 155 User-defined extra argument passed to ReducePosn. 156 157 Typically this function extracts text positions from 158 the right-hand side rule components and either 159 assigns them to the returned $$ structure/tree or 160 if no $$ value is returned, puts them into 161 the ret text position from where 162 it will be picked up by the later reduced rules. 163 164YYREDUCEPOSNFUNCARG 165 Extra user-defined argument passed to 166 the ReducePosn function. This argument can use 167 any variables defined in btyaccpa.ske. 168 169 170** Added code to btyaccpa.ske that automatically cleans up 171semantic semantic values and text positions of tokens 172and non-terminals that are discarded and deleted as 173a result of error processing. 174 175In the previous versions the discarded token and non-terminal 176semantic values were not cleaned that caused quite severe 177leaks. The only way to fix it was to add garbage collection 178to YYSTYPE class. 179 180Now BtYacc skeleton calls delete functions for semantic 181values and positions of the discarded tokens and 182non-terminals. 183 184You need to define the following functions that BtYacc 185calls when it needs to delete semantic value or text position. 186 187YYDELETEVAL 188 User-defined function that is called by BtYacc 189 to delete semantic value of the token or non-terminal. 190 191 The user-defined function must have the prototype: 192 void DeleteYYval(YYSTYPE v, int type); 193 v is semantic value to delete, 194 type is one of the following: 195 0 discarding token 196 1 discarding state 197 2 cleaning up stack when aborting 198 199YYDELETEPOSN 200 User-defined function that is called by BtYacc 201 to delete text position of the token or non-terminal. 202 203 The user-defined function must have the prototype: 204 void DeleteYYposn(YYPOSN p, int type); 205 v is semantic value to delete, 206 type is one of the following: 207 0 discarding token 208 1 discarding state 209 2 cleaning up stack when aborting 210 211 212** User can define "detailed" syntax error processing 213function that reports an *exact* position of 214the token that caused the error. 215 216If you define preprocessor variable YYERROR_DETAILED in 217your grammar then you need define the following 218error processing function: 219 220void yyerror_detailed(char *text, 221 int errt, 222 YYSTYPE &errt_value, 223 YYPOSN &errt_posn); 224 225It receives the following arguments: 226text Error message. 227errt Code of the token that caused the error. 228errt_value Value of the token that caused the error. 229errt_posn Text position of token that caused error. 230 231 232** Dropped compatibility with C. 233 234Compatibility with C became increasingly difficult 235to maintain as new features were added to btyaccpa.ske. 236So we dropped it. If anybody wants to make the new version 237compatible with C, we would gladly accept the changes. 238 239Meanwhile we expect that you use C++ to write grammar 240actions and everything else in grammar files. 241Since C is (in a sense) subset of C++, your C-based 242grammar may work if you use C++ compiler to compile it. 243 244 Version 3.0 bugs fixed 245 ---------------------- 246 247Matthias Meixner <meixner@mes.th-darmstadt.de> fixed a bug: 248BtYacc does not correctly handle typenames, if one typename 249is a prefix of another one and if this type is used after 250the longer one. In this case BTYacc produces invalid code. 251 252 253 Version 2.1 changes 254 ------------------- 255 by Vadim Maslov 256 257** Added preprocessor statements to BtYacc that are similar 258in function and behavior to C/C++ preprocessor statements. 259 260These statements are used to: 261 262- Introduce modularity into a grammar by breaking it 263 into several *.y files and assembling different 264 grammars from the *.y modules using %include and %ifdef. 265 266- Have several versions of the same grammar 267 by using %ifdef and $endif. 268 269- To include automatically generated grammar fragment. 270 For instance, we use %include to include 271 automatically generated list of tokens. 272 273Preprocessor statements are: 274 275%define <var-name> 276 Define preprocessor variable named <var-name>. 277 278%ifdef <var-name> 279 If preprocessor variable named <var-name> 280 is defined by %define, then process the text from 281 this %ifdef to the closing %endif. 282 283%endif 284 Closing bracket for %ifdef preprocessor statement. 285 Only one nesting level of %ifdef-%endif is allowed. 286 287%include <file-name> 288 Process contents of the file named <file-name>. 289 If <file-name> is a relative name, it is looked up 290 in a directory in which btyacc was started. 291 Only one nesting level of %include is allowed. 292 293 294 Version 2.0 changes 295 ------------------- 296 by Vadim Maslov 297 298 299** Changed 16-bit short numbers to 32-bit int numbers in 300grammar tables, so that huge grammar tables (tables that 301are larger than 32768 elements) resulting from huge 302grammars (Cobol grammar, for instance) can work correctly. 303You need to have 32-bit integer to index table bigger than 30432768 elements, 16-bit integer is not enough. 305 306The original BtYacc just generated non-working tables 307larger than 32768 elements without even notifying about 308the table overflow. 309 310 311** Make error recovery work correctly when error happens 312while processing nested conflicts. Original BtYacc could 313infinitely cycle in certain situations that involved error 314recovery while in nested conflict. 315 316More detailed explanation: when we have nested conflicts 317(conflict that happens while trial-processing another 318conflict), it leads btyacc into NP-complete searching of 319conflict tree. The ultimate goal is YYVALID operator that 320selects a particular branch of that tree as a valid one. 321 322If no YYVALID is found on the tree, then error recovery 323takes over. The problem with this is that error recovery 324is started in the same state context that exists on the 325last surveyed branch of the conflict tree. Sometimes this 326last branch may be of zero length and it results in 327recovering to exactly the same state as existed before 328entering the conflict. BtYacc cycles then. 329 330We solved this problem by memorizing the longest path in 331the conflict tree while browsing it. If we ever get into 332error recovery, we restore state that existed on the 333longest path. Effectively we say: if we have an error, 334let us move forward as far as we possibly could while we 335were browsing the conflict tree. 336 337 338** Introduce YYVALID_NESTED operation in addition to 339simply YYVALID. When we have a nested conflict (conflict 340while processing in trial mode for another conflict), we 341want to relate YYVALID to a particular level of conflict 342being in trial. 343 344Since we mostly anticipate only 2-level nested conflicts 345YYVALID_NESTED tells the parser to satisfy only the 346internal conflict. Therefore, in 1-level conflict 347situation YYVALID_NESTED acts like a regular YYVALID, but 348in 2-level conflict it is a no-op and the other YYVALID 349for outer conflict will be searched for. 350 351 352** Improved handling of situation where /tmp directory is 353missing. Original btyacc just died quietly when /tmp 354directory was missing. We added code that states the 355problem explicitly. While on UNIX /tmp directory is always 356present, it may be missing on WIN32 systems, therefore 357diagnosing this situation is important. 358 359 360 Version 1.0 changes: BackTracking 361 ================================= 362 by Chris Dodd 363 364BTYACC is a modified version of yacc that supports 365automatic backtracking and semantic disambiguation to 366parse ambiguous grammars, as well as syntactic sugar for 367inherited attributes (which tend to introduce conflicts). 368Whenever a btyacc generated parser runs into a 369shift-reduce or reduce-reduce error in the parse table, it 370remembers the current parse point (yacc stack and input 371stream state), and goes into trial parse mode. It then 372continues parsing, ignoring most rule actions. If it runs 373into an error (either through the parse table or through 374an action calling YYERROR), it backtracks to the most 375recent conflict point and tries a different alternative. 376If it finds a successful parse (reaches the end of the 377input or an action calls YYVALID), it backtracks to the 378point where it first entered trial parse mode, and 379continues with a full parse (executing all actions), 380following the path of the successful trial. 381 382Actions in btyacc come in two flavors -- {}-actions, which 383are only executed when not in trial mode, and []-actions 384which are executed regardless of mode. There are also 385inherited attributes, which look like arguments (they are 386enclosed in "()") and act like []-actions. 387 388What this buys you: 389 390* No more lexer feedback hack. In yacc grammars for C, a 391standard hack, know as the "lexer feedback hack" is used 392to find typedef names. The lexer uses semantic 393information to decide if any given identifier is a 394typedef-name or not and returns a special token. With 395btyacc, you no longer need to do this; the lexer should 396just always return an identifier. The btyacc grammar then 397needs a rule of the form: 398 399typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ] 400 401While the hack works adequately well for parsing C, it 402becomes a nightmare when you try to parse something like 403C++, where treating an ID as a typedef becomes heavily 404dependent on context. 405 406* Easy disambiguation via simple ordering. Btyacc runs 407its trials via the rule "try shifting first, then try 408reducing by the order that the conflicting rules appear in 409the input file". This means you can deal with semantic a 410disambiguation rule like: 411 [1] If it looks like a declaration it is, otherwise 412 [2] If it looks like an expression it is, otherwise 413 [3] it is a syntax error 414 [Ellis&Stroustrup, Annotated C++ Reference Manual, p93] 415 416To deal with this, you need only put all the rules for 417declarations before the rules for expressions in the 418grammar file. 419 420* No extra cost if you do not use it. Backtracking is 421only triggered when the parse hits a shift/reduce or 422reduce/reduce conflict in the table. If you have no 423conflicts in your grammar, there is no extra cost, other 424than some extra code which will never be invoked. 425 426* C++ and ANSI C compatible parsers. The parsers produced 427by btyacc can be compiled with C++ correctly. If you 428"#define" YYSTYPE to be some C++ type with constructor and 429destructor, everything will work fine. My favorite is 430"#define YYSTYPE SmartPointer", where SmartPointer is a 431smart pointer type that does garbage collection on the 432pointed to objects. 433 434BTYACC was originally written to make it easy to write a 435C++ parser (my goal was to be able to use the grammar out 436of the back of the ARM with as few modifications as 437possible). Anyone who has ever looked at Jim Roskind 438public domain C++ yacc grammar, or the yacc-based grammar 439used in g++ knows how difficult this is. BTYACC is very 440useful for parsing any ambiguous grammar, particularly 441ones that come from trying to merge two (or more) complete 442grammars. 443 444Limitations of the backtracking: Currently, the generated 445parser does NO pruning of alternate parsing paths. To 446avoid an exponential explosion of possible paths (and 447parsing time), you need to manually tell the parser when 448it can throw away saved paths using YYVALID. In practice, 449this turns out to be fairly easy to do. A C++ parser (for 450example) can just put a [YYVALID;] after every complete 451declaration and statement rule, corresponding to pruning 452the backtracking state after seeing a ';' or '}' -- there 453will never be a situation in which it is useful to 454backtrack past either of these. 455 456Inherited attributes in btyacc: 457 458Inherited attributes look a lot like function arguments to 459non-terminals, which is what they end up being in a 460recursive descent parser, but NOT how they are implemented 461in btyacc. Basically they are just syntactic sugar for 462embedded semantic actions and $0, $-1, ... in normal yacc. 463btyacc gives you two big advantages besides just the 464syntax: 465 1. it does type checking on the inherited attributes, 466 so you do not have to specify $<type>0 and makes sure 467 you give the correct number of arguments (inherited 468 attributes) to every use of a non-terminal. 469 2. It "collapses" identical actions from that are produced 470 from inherited attributes. This eliminates many 471 potential reduce-reduce conflicts arising from 472 the inherited attributes. 473 474You use inherited attributes by declaring the types of the 475attributes in the preamble with a type declaration and 476declaring names of the attributes on the lhs of the yacc 477rule. You can of course have more than one rule with the 478same lhs, and you can even give them different names in 479each, but the type and number must be the same. 480 481Here is a small example: 482 /* lhs takes 2 inherited attributes */ 483%type <t1> lhs(<t1>, <t2>) 484 stuff(<t1>, <t2>) 485%% 486lhs($i1, $i2) : { $$ = $i1 } 487 | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; } 488 489This is roughly equivalent to the following yacc code: 490lhs : 491 { $$ = $<t1>-1; } 492 | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff 493 { $$ = $4; } 494 ; 495 496See the file "test/t2.y" for a longer and more complete 497example. At the current time, the start symbol cannot 498have any arguments. 499 500Variant parsers: 501 502Btyacc supports the -S flag to use a different parser 503skeleton, changing the way that the parser is called and 504used. The skeleton "push.skel" is included to produce a 505"passive" parser that you feed tokens to (rather than 506having the parser call a separate yylex routine). With 507push.skel, yyparse is defined as follows: 508 509int yyparse(int token, YYSTYPE yylval) 510 511You should call yyparse repeatedly with successive tokens 512of input. It returns 0 if more input is needed, 1 for a 513successful parse, and -1 for an unrecoverable parse error. 514 515 516 Miscellaneous Features in ver. 1.0 517 ---------------------------------- 518 by Chris Dodd 519 520 The -r option has been implemented. The -r option tells 521Yacc to put the read-only tables in y.tab.c and the code and 522variables in y.code.c. Keith Bostic asked for this option so 523that :yyfix could be eliminated. 524 525 The -l and -t options have been implemented. The -l 526option tells Yacc not to include #line directives in the code 527it produces. The -t option causes debugging code to be 528included in the compiled parser. 529 530 The code for error recovery has been changed to 531implement the same algorithm as AT&T Yacc. There will still 532be differences in the way error recovery works because AT&T 533Yacc uses more default reductions than Berkeley Yacc. 534 535 The environment variable TMPDIR determines the directory 536where temporary files will be created. If TMPDIR is defined, 537temporary files will be created in the directory whose 538pathname is the value of TMPDIR. By default, temporary files 539are created in /tmp. 540 541 The keywords are now case-insensitive. For example, 542%nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are 543all equivalent. 544 545 Commas and semicolons that are not part of C code are 546treated as commentary. 547 548 Line-end comments, as in BCPL, are permitted. Line-end 549comments begin with // and end at the next end-of-line. 550Line-end comments are permitted in C code; they are converted 551to C comments on output. 552 553 The form of y.output files has been changed to look more 554like those produced by AT&T Yacc. 555 556 A new kind of declaration has been added. 557The form of the declaration is 558 559 %ident string 560 561where string is a sequence of characters beginning with a 562double quote and ending with either a double quote or the 563next end-of-line, whichever comes first. The declaration 564will cause a #ident directive to be written near the start 565of the output file. 566 567 If a parser has been compiled with debugging code, that 568code can be enabled by setting an environment variable. 569If the environment variable YYDEBUG is set to 0, debugging 570output is suppressed. If it is set to 1, debugging output 571is written to standard output. 572 573 574 Building BtYacc 575 --------------- 576 by Chris Dodd and Vadim Maslov 577 578We used GCC and GNU make to compile BtYacc both on UNIX and 579WIN32 paltforms. You are welcome to try different 580combinations of makes and compilers. Most likely it will 581work, but it may require Makefile changes. 582 583There is no config script. 584Just type "make" and it should compile. 585 586AWK. If you want to change file btyaccpa.ske (backtracking 587parser skeleton), you will need awk to compile it into 588skeleton.c file. We used GNU AWK (gawk) version 3.0. 589 590It is known that using older versions of gawk 591may create problems in compilation, because older awks 592have problems with backslashes at the end of a line. 593 594For MSDOS, there a "makefile.dos" that should do the trick. 595Note: makefile.dos was not tested for a long time. 596 597The result of compilation should be a single executable called 598"btyacc" which you can install anywhere you like; 599it does not require any other files in the distribution to run. 600 601 602 Legal Stuff 603 ----------- 604 by Chris Dodd and Vadim Maslov 605 606In English: BtYacc is freeware. BtYacc is distributed with 607no warranty whatsoever. The author and any other contributors 608take no responsibility for any and all consequences of its use. 609 610In Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS 611NOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE 612LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL 613DAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR 614DATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY 615THIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN 616IF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH 617DAMAGES. 618