1*497dcf4cSBaptiste Daroussin-- $Id: README.BTYACC,v 1.2 2014/04/22 08:18:57 Tom.Shields Exp $ 20c8de5b0SBaptiste Daroussin 30c8de5b0SBaptiste DaroussinThe original README from btyacc is below. 40c8de5b0SBaptiste Daroussin 50c8de5b0SBaptiste DaroussinThe backtracking enhancements to byacc have been merged into Thomas Dickey's 60c8de5b0SBaptiste Daroussinbyacc baseline. 70c8de5b0SBaptiste Daroussin 80c8de5b0SBaptiste DaroussinThe %include and %define/%ifdef enhancements described below are not currently 90c8de5b0SBaptiste Daroussinincorporated. 100c8de5b0SBaptiste Daroussin 11*497dcf4cSBaptiste DaroussinThe position management functionality ("YYPOSN", "yyposn", "YYREDUCEPOSNFUNC", 12*497dcf4cSBaptiste Daroussin"YYREDUCEPOSNFUNCARG" & "YYCALLREDUCEPOSN") is replaced by a bison-compatible 13*497dcf4cSBaptiste Daroussin"%locations" implementation. 14*497dcf4cSBaptiste Daroussin 15*497dcf4cSBaptiste DaroussinThe memory management functionality ("YYDELETEVAL" & "YYDELETEPOSN") is 16*497dcf4cSBaptiste Daroussinreplaced by a bison-compatible "%destructor" implementation. 17*497dcf4cSBaptiste Daroussin 18*497dcf4cSBaptiste DaroussinThe detailed syntax error processing functionality ("YYERROR_DETAILED" 19*497dcf4cSBaptiste Daroussin& "yyerror_detailed()") is subsumed by the bison-compatible "yyerror()" 20*497dcf4cSBaptiste Daroussinimplementation, as modified by the %parse-param and %locations directives. 21*497dcf4cSBaptiste Daroussin 22*497dcf4cSBaptiste DaroussinThe debugging macro "YYDBPR()" in the parser skeleton is renamed 23*497dcf4cSBaptiste Daroussin"YYSTYPE_TOSTRING()". 24*497dcf4cSBaptiste Daroussin 250c8de5b0SBaptiste Daroussin------------------------------------------------------------------------------- 260c8de5b0SBaptiste Daroussin BTYACC -- backtracking yacc 270c8de5b0SBaptiste Daroussin =========================== 280c8de5b0SBaptiste Daroussin 290c8de5b0SBaptiste DaroussinBTYACC was created by Chris Dodd using ideas from many 300c8de5b0SBaptiste Daroussinplaces and lots of code from the Berkeley Yacc 310c8de5b0SBaptiste Daroussindistribution, which is a public domain yacc clone put 320c8de5b0SBaptiste Daroussintogether by the good folks at Berkeley. This code is 330c8de5b0SBaptiste Daroussindistributed with NO WARRANTY and is public domain. 340c8de5b0SBaptiste DaroussinIt is certain to contain bugs, which you should 350c8de5b0SBaptiste Daroussinreport to: chrisd@collins.com. 360c8de5b0SBaptiste Daroussin 370c8de5b0SBaptiste DaroussinVadim Maslov of Siber Systems <vadik@siber.com> 380c8de5b0SBaptiste Daroussinconsiderably modified BTYACC to make it suitable 390c8de5b0SBaptiste Daroussinfor production environment. 400c8de5b0SBaptiste Daroussin 410c8de5b0SBaptiste DaroussinSeveral people have suggested bug fixes that 420c8de5b0SBaptiste Daroussinwere incorporated into BtYacc. 430c8de5b0SBaptiste Daroussin 440c8de5b0SBaptiste DaroussinSee the README.BYACC files for more about 450c8de5b0SBaptiste DaroussinBerkeley Yacc and other sources of info. 460c8de5b0SBaptiste Daroussin 470c8de5b0SBaptiste Daroussinhttp://www.siber.com/btyacc/ is the current home of BtYacc. 480c8de5b0SBaptiste DaroussinIt is provided courtesy of Siber Systems http://www.siber.com/. 490c8de5b0SBaptiste Daroussin 500c8de5b0SBaptiste Daroussin 510c8de5b0SBaptiste Daroussin Version 3.0 changes 520c8de5b0SBaptiste Daroussin ------------------- 530c8de5b0SBaptiste Daroussin by Vadim Maslov 540c8de5b0SBaptiste Daroussin 550c8de5b0SBaptiste DaroussinChanges mostly occurred in btyaccpa.ske file that 560c8de5b0SBaptiste Daroussincontains the parsing shift/reduce/backtrack algorithm. 570c8de5b0SBaptiste Daroussin 580c8de5b0SBaptiste DaroussinVersion 3.0 innovations focus on: 590c8de5b0SBaptiste Daroussin- text position computation and propagation, 600c8de5b0SBaptiste Daroussin- industrial-strength error processing and recovery. 610c8de5b0SBaptiste Daroussin 620c8de5b0SBaptiste Daroussin 630c8de5b0SBaptiste Daroussin** Added mechanism for computing and propagating 640c8de5b0SBaptiste Daroussintext position of tokens and non-terminals. 650c8de5b0SBaptiste Daroussin 660c8de5b0SBaptiste DaroussinCompilers often need to build AST trees such that every node 670c8de5b0SBaptiste Daroussinin a tree can relate to the parsed program source it came from. 680c8de5b0SBaptiste DaroussinThe following applications are very likely to need this: 690c8de5b0SBaptiste Daroussin- debuggers that show actual source of the debugged program, 700c8de5b0SBaptiste Daroussin- source-to-source translators that want 710c8de5b0SBaptiste Daroussin unchanged parts of the tree to generate the unchanged code. 720c8de5b0SBaptiste Daroussin 730c8de5b0SBaptiste DaroussinThe new YYPOSN mechanism added in this version of BtYacc 740c8de5b0SBaptiste Daroussinhelps you in automating the text position computation 750c8de5b0SBaptiste Daroussinand in assigning the computed text positions to the AST. 760c8de5b0SBaptiste DaroussinThis mechanism is successfully used in commercial 770c8de5b0SBaptiste Daroussinparsers and source-to-source translators. 780c8de5b0SBaptiste Daroussin 790c8de5b0SBaptiste DaroussinIn standard Yaccs every token and every non-terminal 800c8de5b0SBaptiste Daroussinhas an YYSTYPE semantic value attached to it. 810c8de5b0SBaptiste DaroussinIn this new version every token and every non-terminal 820c8de5b0SBaptiste Daroussinalso has an YYPOSN text position attached to it. 830c8de5b0SBaptiste DaroussinYYPOSN is a user-defined type that can be anything and 840c8de5b0SBaptiste Daroussinthat has a meaning of text position attached to 850c8de5b0SBaptiste Daroussintoken or non-terminal. 860c8de5b0SBaptiste Daroussin 870c8de5b0SBaptiste DaroussinIn addition to semantic value stack BtYacc now maintains 880c8de5b0SBaptiste Daroussintext position stack. Behavior of the text position stack 890c8de5b0SBaptiste Daroussinis similar to the behavior of the semantic value stack. 900c8de5b0SBaptiste Daroussin 910c8de5b0SBaptiste DaroussinIf using text position mechanism, 920c8de5b0SBaptiste Daroussinyou need to define the following: 930c8de5b0SBaptiste Daroussin 940c8de5b0SBaptiste DaroussinYYPOSN Preprocessor variable that contains C/C++ type of 950c8de5b0SBaptiste Daroussin the text position attached to 960c8de5b0SBaptiste Daroussin every token and non-terminal. 970c8de5b0SBaptiste Daroussin 980c8de5b0SBaptiste Daroussinyyposn Global variable of type YYPOSN. 990c8de5b0SBaptiste Daroussin The lexer must assign text position of 1000c8de5b0SBaptiste Daroussin the returned token to yyposn, just like it assigns 1010c8de5b0SBaptiste Daroussin semantic value of the returned token to yylval. 1020c8de5b0SBaptiste Daroussin 1030c8de5b0SBaptiste DaroussinYYREDUCEPOSNFUNC 1040c8de5b0SBaptiste Daroussin Preprocessor variable that points to function that 1050c8de5b0SBaptiste Daroussin is called after the grammar rule reduction 1060c8de5b0SBaptiste Daroussin to reduce text positions located on the stack. 1070c8de5b0SBaptiste Daroussin 1080c8de5b0SBaptiste Daroussin This function is called by BtYacc to reduce text 1090c8de5b0SBaptiste Daroussin positions. The function is called immediately after 1100c8de5b0SBaptiste Daroussin the regular rule reduction occurs. 1110c8de5b0SBaptiste Daroussin 1120c8de5b0SBaptiste Daroussin The function has the following prototype: 1130c8de5b0SBaptiste Daroussin void ReducePosn(YYPOSN &ret, 1140c8de5b0SBaptiste Daroussin YYPOSN *terms, 1150c8de5b0SBaptiste Daroussin YYSTYPE *term_vals, 1160c8de5b0SBaptiste Daroussin int term_no, 1170c8de5b0SBaptiste Daroussin int stk_pos, 1180c8de5b0SBaptiste Daroussin int yychar, 1190c8de5b0SBaptiste Daroussin YYPOSN &yyposn, 1200c8de5b0SBaptiste Daroussin UserType extra); 1210c8de5b0SBaptiste Daroussin 1220c8de5b0SBaptiste Daroussin The function arguments are: 1230c8de5b0SBaptiste Daroussin - ret 1240c8de5b0SBaptiste Daroussin Reference to the text position returned by 1250c8de5b0SBaptiste Daroussin the rule. The function must write the computed 1260c8de5b0SBaptiste Daroussin text position returned by the rule to ret. 1270c8de5b0SBaptiste Daroussin This is analogue of the $$ semantic value. 1280c8de5b0SBaptiste Daroussin 1290c8de5b0SBaptiste Daroussin - term_posns 1300c8de5b0SBaptiste Daroussin Array of the right-hand side rule components 1310c8de5b0SBaptiste Daroussin YYPOSN text positions. These are analogues of 1320c8de5b0SBaptiste Daroussin $1, $2, ..., $N in the text position world. 1330c8de5b0SBaptiste Daroussin 1340c8de5b0SBaptiste Daroussin - term_vals 1350c8de5b0SBaptiste Daroussin Array of the right-hand side (RHS) rule components 1360c8de5b0SBaptiste Daroussin YYSTYPE values. These are the $1,...,$N themselves. 1370c8de5b0SBaptiste Daroussin 1380c8de5b0SBaptiste Daroussin - term_no 1390c8de5b0SBaptiste Daroussin Number of the components in RHS of the reduced rule. 1400c8de5b0SBaptiste Daroussin Equal to size of arrays term_posns and term_vals. 1410c8de5b0SBaptiste Daroussin Also equal to N in $1,...,$N in the reduced rule. 1420c8de5b0SBaptiste Daroussin 1430c8de5b0SBaptiste Daroussin - stk_pos 1440c8de5b0SBaptiste Daroussin YYSTYPE/YYPOSN stack position before the reduction. 1450c8de5b0SBaptiste Daroussin 1460c8de5b0SBaptiste Daroussin - yychar 1470c8de5b0SBaptiste Daroussin Lookahead token that immediately follows 1480c8de5b0SBaptiste Daroussin the reduced RHS components. 1490c8de5b0SBaptiste Daroussin 1500c8de5b0SBaptiste Daroussin - yyposn 1510c8de5b0SBaptiste Daroussin YYPOSN of the token that immediately follows 1520c8de5b0SBaptiste Daroussin the reduced RHS components. 1530c8de5b0SBaptiste Daroussin 1540c8de5b0SBaptiste Daroussin - extra 1550c8de5b0SBaptiste Daroussin User-defined extra argument passed to ReducePosn. 1560c8de5b0SBaptiste Daroussin 1570c8de5b0SBaptiste Daroussin Typically this function extracts text positions from 1580c8de5b0SBaptiste Daroussin the right-hand side rule components and either 1590c8de5b0SBaptiste Daroussin assigns them to the returned $$ structure/tree or 1600c8de5b0SBaptiste Daroussin if no $$ value is returned, puts them into 1610c8de5b0SBaptiste Daroussin the ret text position from where 1620c8de5b0SBaptiste Daroussin it will be picked up by the later reduced rules. 1630c8de5b0SBaptiste Daroussin 1640c8de5b0SBaptiste DaroussinYYREDUCEPOSNFUNCARG 1650c8de5b0SBaptiste Daroussin Extra user-defined argument passed to 1660c8de5b0SBaptiste Daroussin the ReducePosn function. This argument can use 1670c8de5b0SBaptiste Daroussin any variables defined in btyaccpa.ske. 1680c8de5b0SBaptiste Daroussin 1690c8de5b0SBaptiste Daroussin 1700c8de5b0SBaptiste Daroussin** Added code to btyaccpa.ske that automatically cleans up 1710c8de5b0SBaptiste Daroussinsemantic semantic values and text positions of tokens 1720c8de5b0SBaptiste Daroussinand non-terminals that are discarded and deleted as 1730c8de5b0SBaptiste Daroussina result of error processing. 1740c8de5b0SBaptiste Daroussin 1750c8de5b0SBaptiste DaroussinIn the previous versions the discarded token and non-terminal 1760c8de5b0SBaptiste Daroussinsemantic values were not cleaned that caused quite severe 1770c8de5b0SBaptiste Daroussinleaks. The only way to fix it was to add garbage collection 1780c8de5b0SBaptiste Daroussinto YYSTYPE class. 1790c8de5b0SBaptiste Daroussin 1800c8de5b0SBaptiste DaroussinNow BtYacc skeleton calls delete functions for semantic 1810c8de5b0SBaptiste Daroussinvalues and positions of the discarded tokens and 1820c8de5b0SBaptiste Daroussinnon-terminals. 1830c8de5b0SBaptiste Daroussin 1840c8de5b0SBaptiste DaroussinYou need to define the following functions that BtYacc 1850c8de5b0SBaptiste Daroussincalls when it needs to delete semantic value or text position. 1860c8de5b0SBaptiste Daroussin 1870c8de5b0SBaptiste DaroussinYYDELETEVAL 1880c8de5b0SBaptiste Daroussin User-defined function that is called by BtYacc 1890c8de5b0SBaptiste Daroussin to delete semantic value of the token or non-terminal. 1900c8de5b0SBaptiste Daroussin 1910c8de5b0SBaptiste Daroussin The user-defined function must have the prototype: 1920c8de5b0SBaptiste Daroussin void DeleteYYval(YYSTYPE v, int type); 1930c8de5b0SBaptiste Daroussin v is semantic value to delete, 1940c8de5b0SBaptiste Daroussin type is one of the following: 1950c8de5b0SBaptiste Daroussin 0 discarding token 1960c8de5b0SBaptiste Daroussin 1 discarding state 1970c8de5b0SBaptiste Daroussin 2 cleaning up stack when aborting 1980c8de5b0SBaptiste Daroussin 1990c8de5b0SBaptiste DaroussinYYDELETEPOSN 2000c8de5b0SBaptiste Daroussin User-defined function that is called by BtYacc 2010c8de5b0SBaptiste Daroussin to delete text position of the token or non-terminal. 2020c8de5b0SBaptiste Daroussin 2030c8de5b0SBaptiste Daroussin The user-defined function must have the prototype: 2040c8de5b0SBaptiste Daroussin void DeleteYYposn(YYPOSN p, int type); 2050c8de5b0SBaptiste Daroussin v is semantic value to delete, 2060c8de5b0SBaptiste Daroussin type is one of the following: 2070c8de5b0SBaptiste Daroussin 0 discarding token 2080c8de5b0SBaptiste Daroussin 1 discarding state 2090c8de5b0SBaptiste Daroussin 2 cleaning up stack when aborting 2100c8de5b0SBaptiste Daroussin 2110c8de5b0SBaptiste Daroussin 2120c8de5b0SBaptiste Daroussin** User can define "detailed" syntax error processing 2130c8de5b0SBaptiste Daroussinfunction that reports an *exact* position of 2140c8de5b0SBaptiste Daroussinthe token that caused the error. 2150c8de5b0SBaptiste Daroussin 2160c8de5b0SBaptiste DaroussinIf you define preprocessor variable YYERROR_DETAILED in 2170c8de5b0SBaptiste Daroussinyour grammar then you need define the following 2180c8de5b0SBaptiste Daroussinerror processing function: 2190c8de5b0SBaptiste Daroussin 2200c8de5b0SBaptiste Daroussinvoid yyerror_detailed(char *text, 2210c8de5b0SBaptiste Daroussin int errt, 2220c8de5b0SBaptiste Daroussin YYSTYPE &errt_value, 2230c8de5b0SBaptiste Daroussin YYPOSN &errt_posn); 2240c8de5b0SBaptiste Daroussin 2250c8de5b0SBaptiste DaroussinIt receives the following arguments: 2260c8de5b0SBaptiste Daroussintext Error message. 2270c8de5b0SBaptiste Daroussinerrt Code of the token that caused the error. 2280c8de5b0SBaptiste Daroussinerrt_value Value of the token that caused the error. 2290c8de5b0SBaptiste Daroussinerrt_posn Text position of token that caused error. 2300c8de5b0SBaptiste Daroussin 2310c8de5b0SBaptiste Daroussin 2320c8de5b0SBaptiste Daroussin** Dropped compatibility with C. 2330c8de5b0SBaptiste Daroussin 2340c8de5b0SBaptiste DaroussinCompatibility with C became increasingly difficult 2350c8de5b0SBaptiste Daroussinto maintain as new features were added to btyaccpa.ske. 2360c8de5b0SBaptiste DaroussinSo we dropped it. If anybody wants to make the new version 2370c8de5b0SBaptiste Daroussincompatible with C, we would gladly accept the changes. 2380c8de5b0SBaptiste Daroussin 2390c8de5b0SBaptiste DaroussinMeanwhile we expect that you use C++ to write grammar 2400c8de5b0SBaptiste Daroussinactions and everything else in grammar files. 2410c8de5b0SBaptiste DaroussinSince C is (in a sense) subset of C++, your C-based 2420c8de5b0SBaptiste Daroussingrammar may work if you use C++ compiler to compile it. 2430c8de5b0SBaptiste Daroussin 2440c8de5b0SBaptiste Daroussin Version 3.0 bugs fixed 2450c8de5b0SBaptiste Daroussin ---------------------- 2460c8de5b0SBaptiste Daroussin 2470c8de5b0SBaptiste DaroussinMatthias Meixner <meixner@mes.th-darmstadt.de> fixed a bug: 2480c8de5b0SBaptiste DaroussinBtYacc does not correctly handle typenames, if one typename 2490c8de5b0SBaptiste Daroussinis a prefix of another one and if this type is used after 2500c8de5b0SBaptiste Daroussinthe longer one. In this case BTYacc produces invalid code. 2510c8de5b0SBaptiste Daroussin 2520c8de5b0SBaptiste Daroussin 2530c8de5b0SBaptiste Daroussin Version 2.1 changes 2540c8de5b0SBaptiste Daroussin ------------------- 2550c8de5b0SBaptiste Daroussin by Vadim Maslov 2560c8de5b0SBaptiste Daroussin 2570c8de5b0SBaptiste Daroussin** Added preprocessor statements to BtYacc that are similar 2580c8de5b0SBaptiste Daroussinin function and behavior to C/C++ preprocessor statements. 2590c8de5b0SBaptiste Daroussin 2600c8de5b0SBaptiste DaroussinThese statements are used to: 2610c8de5b0SBaptiste Daroussin 2620c8de5b0SBaptiste Daroussin- Introduce modularity into a grammar by breaking it 2630c8de5b0SBaptiste Daroussin into several *.y files and assembling different 2640c8de5b0SBaptiste Daroussin grammars from the *.y modules using %include and %ifdef. 2650c8de5b0SBaptiste Daroussin 2660c8de5b0SBaptiste Daroussin- Have several versions of the same grammar 2670c8de5b0SBaptiste Daroussin by using %ifdef and $endif. 2680c8de5b0SBaptiste Daroussin 2690c8de5b0SBaptiste Daroussin- To include automatically generated grammar fragment. 2700c8de5b0SBaptiste Daroussin For instance, we use %include to include 2710c8de5b0SBaptiste Daroussin automatically generated list of tokens. 2720c8de5b0SBaptiste Daroussin 2730c8de5b0SBaptiste DaroussinPreprocessor statements are: 2740c8de5b0SBaptiste Daroussin 2750c8de5b0SBaptiste Daroussin%define <var-name> 2760c8de5b0SBaptiste Daroussin Define preprocessor variable named <var-name>. 2770c8de5b0SBaptiste Daroussin 2780c8de5b0SBaptiste Daroussin%ifdef <var-name> 2790c8de5b0SBaptiste Daroussin If preprocessor variable named <var-name> 2800c8de5b0SBaptiste Daroussin is defined by %define, then process the text from 2810c8de5b0SBaptiste Daroussin this %ifdef to the closing %endif. 2820c8de5b0SBaptiste Daroussin 2830c8de5b0SBaptiste Daroussin%endif 2840c8de5b0SBaptiste Daroussin Closing bracket for %ifdef preprocessor statement. 2850c8de5b0SBaptiste Daroussin Only one nesting level of %ifdef-%endif is allowed. 2860c8de5b0SBaptiste Daroussin 2870c8de5b0SBaptiste Daroussin%include <file-name> 2880c8de5b0SBaptiste Daroussin Process contents of the file named <file-name>. 2890c8de5b0SBaptiste Daroussin If <file-name> is a relative name, it is looked up 2900c8de5b0SBaptiste Daroussin in a directory in which btyacc was started. 2910c8de5b0SBaptiste Daroussin Only one nesting level of %include is allowed. 2920c8de5b0SBaptiste Daroussin 2930c8de5b0SBaptiste Daroussin 2940c8de5b0SBaptiste Daroussin Version 2.0 changes 2950c8de5b0SBaptiste Daroussin ------------------- 2960c8de5b0SBaptiste Daroussin by Vadim Maslov 2970c8de5b0SBaptiste Daroussin 2980c8de5b0SBaptiste Daroussin 2990c8de5b0SBaptiste Daroussin** Changed 16-bit short numbers to 32-bit int numbers in 3000c8de5b0SBaptiste Daroussingrammar tables, so that huge grammar tables (tables that 3010c8de5b0SBaptiste Daroussinare larger than 32768 elements) resulting from huge 3020c8de5b0SBaptiste Daroussingrammars (Cobol grammar, for instance) can work correctly. 3030c8de5b0SBaptiste DaroussinYou need to have 32-bit integer to index table bigger than 3040c8de5b0SBaptiste Daroussin32768 elements, 16-bit integer is not enough. 3050c8de5b0SBaptiste Daroussin 3060c8de5b0SBaptiste DaroussinThe original BtYacc just generated non-working tables 3070c8de5b0SBaptiste Daroussinlarger than 32768 elements without even notifying about 3080c8de5b0SBaptiste Daroussinthe table overflow. 3090c8de5b0SBaptiste Daroussin 3100c8de5b0SBaptiste Daroussin 3110c8de5b0SBaptiste Daroussin** Make error recovery work correctly when error happens 3120c8de5b0SBaptiste Daroussinwhile processing nested conflicts. Original BtYacc could 3130c8de5b0SBaptiste Daroussininfinitely cycle in certain situations that involved error 3140c8de5b0SBaptiste Daroussinrecovery while in nested conflict. 3150c8de5b0SBaptiste Daroussin 3160c8de5b0SBaptiste DaroussinMore detailed explanation: when we have nested conflicts 3170c8de5b0SBaptiste Daroussin(conflict that happens while trial-processing another 3180c8de5b0SBaptiste Daroussinconflict), it leads btyacc into NP-complete searching of 3190c8de5b0SBaptiste Daroussinconflict tree. The ultimate goal is YYVALID operator that 3200c8de5b0SBaptiste Daroussinselects a particular branch of that tree as a valid one. 3210c8de5b0SBaptiste Daroussin 3220c8de5b0SBaptiste DaroussinIf no YYVALID is found on the tree, then error recovery 3230c8de5b0SBaptiste Daroussintakes over. The problem with this is that error recovery 3240c8de5b0SBaptiste Daroussinis started in the same state context that exists on the 3250c8de5b0SBaptiste Daroussinlast surveyed branch of the conflict tree. Sometimes this 3260c8de5b0SBaptiste Daroussinlast branch may be of zero length and it results in 3270c8de5b0SBaptiste Daroussinrecovering to exactly the same state as existed before 3280c8de5b0SBaptiste Daroussinentering the conflict. BtYacc cycles then. 3290c8de5b0SBaptiste Daroussin 3300c8de5b0SBaptiste DaroussinWe solved this problem by memorizing the longest path in 3310c8de5b0SBaptiste Daroussinthe conflict tree while browsing it. If we ever get into 3320c8de5b0SBaptiste Daroussinerror recovery, we restore state that existed on the 3330c8de5b0SBaptiste Daroussinlongest path. Effectively we say: if we have an error, 3340c8de5b0SBaptiste Daroussinlet us move forward as far as we possibly could while we 3350c8de5b0SBaptiste Daroussinwere browsing the conflict tree. 3360c8de5b0SBaptiste Daroussin 3370c8de5b0SBaptiste Daroussin 3380c8de5b0SBaptiste Daroussin** Introduce YYVALID_NESTED operation in addition to 3390c8de5b0SBaptiste Daroussinsimply YYVALID. When we have a nested conflict (conflict 3400c8de5b0SBaptiste Daroussinwhile processing in trial mode for another conflict), we 3410c8de5b0SBaptiste Daroussinwant to relate YYVALID to a particular level of conflict 3420c8de5b0SBaptiste Daroussinbeing in trial. 3430c8de5b0SBaptiste Daroussin 3440c8de5b0SBaptiste DaroussinSince we mostly anticipate only 2-level nested conflicts 3450c8de5b0SBaptiste DaroussinYYVALID_NESTED tells the parser to satisfy only the 3460c8de5b0SBaptiste Daroussininternal conflict. Therefore, in 1-level conflict 3470c8de5b0SBaptiste Daroussinsituation YYVALID_NESTED acts like a regular YYVALID, but 3480c8de5b0SBaptiste Daroussinin 2-level conflict it is a no-op and the other YYVALID 3490c8de5b0SBaptiste Daroussinfor outer conflict will be searched for. 3500c8de5b0SBaptiste Daroussin 3510c8de5b0SBaptiste Daroussin 3520c8de5b0SBaptiste Daroussin** Improved handling of situation where /tmp directory is 3530c8de5b0SBaptiste Daroussinmissing. Original btyacc just died quietly when /tmp 3540c8de5b0SBaptiste Daroussindirectory was missing. We added code that states the 3550c8de5b0SBaptiste Daroussinproblem explicitly. While on UNIX /tmp directory is always 3560c8de5b0SBaptiste Daroussinpresent, it may be missing on WIN32 systems, therefore 3570c8de5b0SBaptiste Daroussindiagnosing this situation is important. 3580c8de5b0SBaptiste Daroussin 3590c8de5b0SBaptiste Daroussin 3600c8de5b0SBaptiste Daroussin Version 1.0 changes: BackTracking 3610c8de5b0SBaptiste Daroussin ================================= 3620c8de5b0SBaptiste Daroussin by Chris Dodd 3630c8de5b0SBaptiste Daroussin 3640c8de5b0SBaptiste DaroussinBTYACC is a modified version of yacc that supports 3650c8de5b0SBaptiste Daroussinautomatic backtracking and semantic disambiguation to 3660c8de5b0SBaptiste Daroussinparse ambiguous grammars, as well as syntactic sugar for 3670c8de5b0SBaptiste Daroussininherited attributes (which tend to introduce conflicts). 3680c8de5b0SBaptiste DaroussinWhenever a btyacc generated parser runs into a 3690c8de5b0SBaptiste Daroussinshift-reduce or reduce-reduce error in the parse table, it 3700c8de5b0SBaptiste Daroussinremembers the current parse point (yacc stack and input 3710c8de5b0SBaptiste Daroussinstream state), and goes into trial parse mode. It then 3720c8de5b0SBaptiste Daroussincontinues parsing, ignoring most rule actions. If it runs 3730c8de5b0SBaptiste Daroussininto an error (either through the parse table or through 3740c8de5b0SBaptiste Daroussinan action calling YYERROR), it backtracks to the most 3750c8de5b0SBaptiste Daroussinrecent conflict point and tries a different alternative. 3760c8de5b0SBaptiste DaroussinIf it finds a successful parse (reaches the end of the 3770c8de5b0SBaptiste Daroussininput or an action calls YYVALID), it backtracks to the 3780c8de5b0SBaptiste Daroussinpoint where it first entered trial parse mode, and 3790c8de5b0SBaptiste Daroussincontinues with a full parse (executing all actions), 3800c8de5b0SBaptiste Daroussinfollowing the path of the successful trial. 3810c8de5b0SBaptiste Daroussin 3820c8de5b0SBaptiste DaroussinActions in btyacc come in two flavors -- {}-actions, which 3830c8de5b0SBaptiste Daroussinare only executed when not in trial mode, and []-actions 3840c8de5b0SBaptiste Daroussinwhich are executed regardless of mode. There are also 3850c8de5b0SBaptiste Daroussininherited attributes, which look like arguments (they are 3860c8de5b0SBaptiste Daroussinenclosed in "()") and act like []-actions. 3870c8de5b0SBaptiste Daroussin 3880c8de5b0SBaptiste DaroussinWhat this buys you: 3890c8de5b0SBaptiste Daroussin 3900c8de5b0SBaptiste Daroussin* No more lexer feedback hack. In yacc grammars for C, a 3910c8de5b0SBaptiste Daroussinstandard hack, know as the "lexer feedback hack" is used 3920c8de5b0SBaptiste Daroussinto find typedef names. The lexer uses semantic 3930c8de5b0SBaptiste Daroussininformation to decide if any given identifier is a 3940c8de5b0SBaptiste Daroussintypedef-name or not and returns a special token. With 3950c8de5b0SBaptiste Daroussinbtyacc, you no longer need to do this; the lexer should 3960c8de5b0SBaptiste Daroussinjust always return an identifier. The btyacc grammar then 3970c8de5b0SBaptiste Daroussinneeds a rule of the form: 3980c8de5b0SBaptiste Daroussin 3990c8de5b0SBaptiste Daroussintypename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ] 4000c8de5b0SBaptiste Daroussin 4010c8de5b0SBaptiste DaroussinWhile the hack works adequately well for parsing C, it 4020c8de5b0SBaptiste Daroussinbecomes a nightmare when you try to parse something like 4030c8de5b0SBaptiste DaroussinC++, where treating an ID as a typedef becomes heavily 4040c8de5b0SBaptiste Daroussindependent on context. 4050c8de5b0SBaptiste Daroussin 4060c8de5b0SBaptiste Daroussin* Easy disambiguation via simple ordering. Btyacc runs 4070c8de5b0SBaptiste Daroussinits trials via the rule "try shifting first, then try 4080c8de5b0SBaptiste Daroussinreducing by the order that the conflicting rules appear in 4090c8de5b0SBaptiste Daroussinthe input file". This means you can deal with semantic a 4100c8de5b0SBaptiste Daroussindisambiguation rule like: 4110c8de5b0SBaptiste Daroussin [1] If it looks like a declaration it is, otherwise 4120c8de5b0SBaptiste Daroussin [2] If it looks like an expression it is, otherwise 4130c8de5b0SBaptiste Daroussin [3] it is a syntax error 4140c8de5b0SBaptiste Daroussin [Ellis&Stroustrup, Annotated C++ Reference Manual, p93] 4150c8de5b0SBaptiste Daroussin 4160c8de5b0SBaptiste DaroussinTo deal with this, you need only put all the rules for 4170c8de5b0SBaptiste Daroussindeclarations before the rules for expressions in the 4180c8de5b0SBaptiste Daroussingrammar file. 4190c8de5b0SBaptiste Daroussin 4200c8de5b0SBaptiste Daroussin* No extra cost if you do not use it. Backtracking is 4210c8de5b0SBaptiste Daroussinonly triggered when the parse hits a shift/reduce or 4220c8de5b0SBaptiste Daroussinreduce/reduce conflict in the table. If you have no 4230c8de5b0SBaptiste Daroussinconflicts in your grammar, there is no extra cost, other 4240c8de5b0SBaptiste Daroussinthan some extra code which will never be invoked. 4250c8de5b0SBaptiste Daroussin 4260c8de5b0SBaptiste Daroussin* C++ and ANSI C compatible parsers. The parsers produced 4270c8de5b0SBaptiste Daroussinby btyacc can be compiled with C++ correctly. If you 4280c8de5b0SBaptiste Daroussin"#define" YYSTYPE to be some C++ type with constructor and 4290c8de5b0SBaptiste Daroussindestructor, everything will work fine. My favorite is 4300c8de5b0SBaptiste Daroussin"#define YYSTYPE SmartPointer", where SmartPointer is a 4310c8de5b0SBaptiste Daroussinsmart pointer type that does garbage collection on the 4320c8de5b0SBaptiste Daroussinpointed to objects. 4330c8de5b0SBaptiste Daroussin 4340c8de5b0SBaptiste DaroussinBTYACC was originally written to make it easy to write a 4350c8de5b0SBaptiste DaroussinC++ parser (my goal was to be able to use the grammar out 4360c8de5b0SBaptiste Daroussinof the back of the ARM with as few modifications as 4370c8de5b0SBaptiste Daroussinpossible). Anyone who has ever looked at Jim Roskind 4380c8de5b0SBaptiste Daroussinpublic domain C++ yacc grammar, or the yacc-based grammar 4390c8de5b0SBaptiste Daroussinused in g++ knows how difficult this is. BTYACC is very 4400c8de5b0SBaptiste Daroussinuseful for parsing any ambiguous grammar, particularly 4410c8de5b0SBaptiste Daroussinones that come from trying to merge two (or more) complete 4420c8de5b0SBaptiste Daroussingrammars. 4430c8de5b0SBaptiste Daroussin 4440c8de5b0SBaptiste DaroussinLimitations of the backtracking: Currently, the generated 4450c8de5b0SBaptiste Daroussinparser does NO pruning of alternate parsing paths. To 4460c8de5b0SBaptiste Daroussinavoid an exponential explosion of possible paths (and 4470c8de5b0SBaptiste Daroussinparsing time), you need to manually tell the parser when 4480c8de5b0SBaptiste Daroussinit can throw away saved paths using YYVALID. In practice, 4490c8de5b0SBaptiste Daroussinthis turns out to be fairly easy to do. A C++ parser (for 4500c8de5b0SBaptiste Daroussinexample) can just put a [YYVALID;] after every complete 4510c8de5b0SBaptiste Daroussindeclaration and statement rule, corresponding to pruning 4520c8de5b0SBaptiste Daroussinthe backtracking state after seeing a ';' or '}' -- there 4530c8de5b0SBaptiste Daroussinwill never be a situation in which it is useful to 4540c8de5b0SBaptiste Daroussinbacktrack past either of these. 4550c8de5b0SBaptiste Daroussin 4560c8de5b0SBaptiste DaroussinInherited attributes in btyacc: 4570c8de5b0SBaptiste Daroussin 4580c8de5b0SBaptiste DaroussinInherited attributes look a lot like function arguments to 4590c8de5b0SBaptiste Daroussinnon-terminals, which is what they end up being in a 4600c8de5b0SBaptiste Daroussinrecursive descent parser, but NOT how they are implemented 4610c8de5b0SBaptiste Daroussinin btyacc. Basically they are just syntactic sugar for 4620c8de5b0SBaptiste Daroussinembedded semantic actions and $0, $-1, ... in normal yacc. 4630c8de5b0SBaptiste Daroussinbtyacc gives you two big advantages besides just the 4640c8de5b0SBaptiste Daroussinsyntax: 4650c8de5b0SBaptiste Daroussin 1. it does type checking on the inherited attributes, 4660c8de5b0SBaptiste Daroussin so you do not have to specify $<type>0 and makes sure 4670c8de5b0SBaptiste Daroussin you give the correct number of arguments (inherited 4680c8de5b0SBaptiste Daroussin attributes) to every use of a non-terminal. 4690c8de5b0SBaptiste Daroussin 2. It "collapses" identical actions from that are produced 4700c8de5b0SBaptiste Daroussin from inherited attributes. This eliminates many 4710c8de5b0SBaptiste Daroussin potential reduce-reduce conflicts arising from 4720c8de5b0SBaptiste Daroussin the inherited attributes. 4730c8de5b0SBaptiste Daroussin 4740c8de5b0SBaptiste DaroussinYou use inherited attributes by declaring the types of the 4750c8de5b0SBaptiste Daroussinattributes in the preamble with a type declaration and 4760c8de5b0SBaptiste Daroussindeclaring names of the attributes on the lhs of the yacc 4770c8de5b0SBaptiste Daroussinrule. You can of course have more than one rule with the 4780c8de5b0SBaptiste Daroussinsame lhs, and you can even give them different names in 4790c8de5b0SBaptiste Daroussineach, but the type and number must be the same. 4800c8de5b0SBaptiste Daroussin 4810c8de5b0SBaptiste DaroussinHere is a small example: 4820c8de5b0SBaptiste Daroussin /* lhs takes 2 inherited attributes */ 4830c8de5b0SBaptiste Daroussin%type <t1> lhs(<t1>, <t2>) 4840c8de5b0SBaptiste Daroussin stuff(<t1>, <t2>) 4850c8de5b0SBaptiste Daroussin%% 4860c8de5b0SBaptiste Daroussinlhs($i1, $i2) : { $$ = $i1 } 4870c8de5b0SBaptiste Daroussin | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; } 4880c8de5b0SBaptiste Daroussin 4890c8de5b0SBaptiste DaroussinThis is roughly equivalent to the following yacc code: 4900c8de5b0SBaptiste Daroussinlhs : 4910c8de5b0SBaptiste Daroussin { $$ = $<t1>-1; } 4920c8de5b0SBaptiste Daroussin | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff 4930c8de5b0SBaptiste Daroussin { $$ = $4; } 4940c8de5b0SBaptiste Daroussin ; 4950c8de5b0SBaptiste Daroussin 4960c8de5b0SBaptiste DaroussinSee the file "test/t2.y" for a longer and more complete 4970c8de5b0SBaptiste Daroussinexample. At the current time, the start symbol cannot 4980c8de5b0SBaptiste Daroussinhave any arguments. 4990c8de5b0SBaptiste Daroussin 5000c8de5b0SBaptiste DaroussinVariant parsers: 5010c8de5b0SBaptiste Daroussin 5020c8de5b0SBaptiste DaroussinBtyacc supports the -S flag to use a different parser 5030c8de5b0SBaptiste Daroussinskeleton, changing the way that the parser is called and 5040c8de5b0SBaptiste Daroussinused. The skeleton "push.skel" is included to produce a 5050c8de5b0SBaptiste Daroussin"passive" parser that you feed tokens to (rather than 5060c8de5b0SBaptiste Daroussinhaving the parser call a separate yylex routine). With 5070c8de5b0SBaptiste Daroussinpush.skel, yyparse is defined as follows: 5080c8de5b0SBaptiste Daroussin 5090c8de5b0SBaptiste Daroussinint yyparse(int token, YYSTYPE yylval) 5100c8de5b0SBaptiste Daroussin 5110c8de5b0SBaptiste DaroussinYou should call yyparse repeatedly with successive tokens 5120c8de5b0SBaptiste Daroussinof input. It returns 0 if more input is needed, 1 for a 5130c8de5b0SBaptiste Daroussinsuccessful parse, and -1 for an unrecoverable parse error. 5140c8de5b0SBaptiste Daroussin 5150c8de5b0SBaptiste Daroussin 5160c8de5b0SBaptiste Daroussin Miscellaneous Features in ver. 1.0 5170c8de5b0SBaptiste Daroussin ---------------------------------- 5180c8de5b0SBaptiste Daroussin by Chris Dodd 5190c8de5b0SBaptiste Daroussin 5200c8de5b0SBaptiste Daroussin The -r option has been implemented. The -r option tells 5210c8de5b0SBaptiste DaroussinYacc to put the read-only tables in y.tab.c and the code and 5220c8de5b0SBaptiste Daroussinvariables in y.code.c. Keith Bostic asked for this option so 5230c8de5b0SBaptiste Daroussinthat :yyfix could be eliminated. 5240c8de5b0SBaptiste Daroussin 5250c8de5b0SBaptiste Daroussin The -l and -t options have been implemented. The -l 5260c8de5b0SBaptiste Daroussinoption tells Yacc not to include #line directives in the code 5270c8de5b0SBaptiste Daroussinit produces. The -t option causes debugging code to be 5280c8de5b0SBaptiste Daroussinincluded in the compiled parser. 5290c8de5b0SBaptiste Daroussin 5300c8de5b0SBaptiste Daroussin The code for error recovery has been changed to 5310c8de5b0SBaptiste Daroussinimplement the same algorithm as AT&T Yacc. There will still 5320c8de5b0SBaptiste Daroussinbe differences in the way error recovery works because AT&T 5330c8de5b0SBaptiste DaroussinYacc uses more default reductions than Berkeley Yacc. 5340c8de5b0SBaptiste Daroussin 5350c8de5b0SBaptiste Daroussin The environment variable TMPDIR determines the directory 5360c8de5b0SBaptiste Daroussinwhere temporary files will be created. If TMPDIR is defined, 5370c8de5b0SBaptiste Daroussintemporary files will be created in the directory whose 5380c8de5b0SBaptiste Daroussinpathname is the value of TMPDIR. By default, temporary files 5390c8de5b0SBaptiste Daroussinare created in /tmp. 5400c8de5b0SBaptiste Daroussin 5410c8de5b0SBaptiste Daroussin The keywords are now case-insensitive. For example, 5420c8de5b0SBaptiste Daroussin%nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are 5430c8de5b0SBaptiste Daroussinall equivalent. 5440c8de5b0SBaptiste Daroussin 5450c8de5b0SBaptiste Daroussin Commas and semicolons that are not part of C code are 5460c8de5b0SBaptiste Daroussintreated as commentary. 5470c8de5b0SBaptiste Daroussin 5480c8de5b0SBaptiste Daroussin Line-end comments, as in BCPL, are permitted. Line-end 5490c8de5b0SBaptiste Daroussincomments begin with // and end at the next end-of-line. 5500c8de5b0SBaptiste DaroussinLine-end comments are permitted in C code; they are converted 5510c8de5b0SBaptiste Daroussinto C comments on output. 5520c8de5b0SBaptiste Daroussin 5530c8de5b0SBaptiste Daroussin The form of y.output files has been changed to look more 5540c8de5b0SBaptiste Daroussinlike those produced by AT&T Yacc. 5550c8de5b0SBaptiste Daroussin 5560c8de5b0SBaptiste Daroussin A new kind of declaration has been added. 5570c8de5b0SBaptiste DaroussinThe form of the declaration is 5580c8de5b0SBaptiste Daroussin 5590c8de5b0SBaptiste Daroussin %ident string 5600c8de5b0SBaptiste Daroussin 5610c8de5b0SBaptiste Daroussinwhere string is a sequence of characters beginning with a 5620c8de5b0SBaptiste Daroussindouble quote and ending with either a double quote or the 5630c8de5b0SBaptiste Daroussinnext end-of-line, whichever comes first. The declaration 5640c8de5b0SBaptiste Daroussinwill cause a #ident directive to be written near the start 5650c8de5b0SBaptiste Daroussinof the output file. 5660c8de5b0SBaptiste Daroussin 5670c8de5b0SBaptiste Daroussin If a parser has been compiled with debugging code, that 5680c8de5b0SBaptiste Daroussincode can be enabled by setting an environment variable. 5690c8de5b0SBaptiste DaroussinIf the environment variable YYDEBUG is set to 0, debugging 5700c8de5b0SBaptiste Daroussinoutput is suppressed. If it is set to 1, debugging output 5710c8de5b0SBaptiste Daroussinis written to standard output. 5720c8de5b0SBaptiste Daroussin 5730c8de5b0SBaptiste Daroussin 5740c8de5b0SBaptiste Daroussin Building BtYacc 5750c8de5b0SBaptiste Daroussin --------------- 5760c8de5b0SBaptiste Daroussin by Chris Dodd and Vadim Maslov 5770c8de5b0SBaptiste Daroussin 5780c8de5b0SBaptiste DaroussinWe used GCC and GNU make to compile BtYacc both on UNIX and 5790c8de5b0SBaptiste DaroussinWIN32 paltforms. You are welcome to try different 5800c8de5b0SBaptiste Daroussincombinations of makes and compilers. Most likely it will 5810c8de5b0SBaptiste Daroussinwork, but it may require Makefile changes. 5820c8de5b0SBaptiste Daroussin 5830c8de5b0SBaptiste DaroussinThere is no config script. 5840c8de5b0SBaptiste DaroussinJust type "make" and it should compile. 5850c8de5b0SBaptiste Daroussin 5860c8de5b0SBaptiste DaroussinAWK. If you want to change file btyaccpa.ske (backtracking 5870c8de5b0SBaptiste Daroussinparser skeleton), you will need awk to compile it into 5880c8de5b0SBaptiste Daroussinskeleton.c file. We used GNU AWK (gawk) version 3.0. 5890c8de5b0SBaptiste Daroussin 5900c8de5b0SBaptiste DaroussinIt is known that using older versions of gawk 5910c8de5b0SBaptiste Daroussinmay create problems in compilation, because older awks 5920c8de5b0SBaptiste Daroussinhave problems with backslashes at the end of a line. 5930c8de5b0SBaptiste Daroussin 5940c8de5b0SBaptiste DaroussinFor MSDOS, there a "makefile.dos" that should do the trick. 5950c8de5b0SBaptiste DaroussinNote: makefile.dos was not tested for a long time. 5960c8de5b0SBaptiste Daroussin 5970c8de5b0SBaptiste DaroussinThe result of compilation should be a single executable called 5980c8de5b0SBaptiste Daroussin"btyacc" which you can install anywhere you like; 5990c8de5b0SBaptiste Daroussinit does not require any other files in the distribution to run. 6000c8de5b0SBaptiste Daroussin 6010c8de5b0SBaptiste Daroussin 6020c8de5b0SBaptiste Daroussin Legal Stuff 6030c8de5b0SBaptiste Daroussin ----------- 6040c8de5b0SBaptiste Daroussin by Chris Dodd and Vadim Maslov 6050c8de5b0SBaptiste Daroussin 6060c8de5b0SBaptiste DaroussinIn English: BtYacc is freeware. BtYacc is distributed with 6070c8de5b0SBaptiste Daroussinno warranty whatsoever. The author and any other contributors 6080c8de5b0SBaptiste Daroussintake no responsibility for any and all consequences of its use. 6090c8de5b0SBaptiste Daroussin 6100c8de5b0SBaptiste DaroussinIn Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS 6110c8de5b0SBaptiste DaroussinNOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE 6120c8de5b0SBaptiste DaroussinLIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL 6130c8de5b0SBaptiste DaroussinDAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR 6140c8de5b0SBaptiste DaroussinDATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY 6150c8de5b0SBaptiste DaroussinTHIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN 6160c8de5b0SBaptiste DaroussinIF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH 6170c8de5b0SBaptiste DaroussinDAMAGES. 618