lex.1 - OpenGrok cross reference for /freebsd/usr.bin/lex/lex.1

4 flex, lex \- fast lexical analyzer generator
7 .B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
8 .B [\-\-help \-\-version]
13 a tool for generating programs that perform pattern-matching on text.
39         managing "mini-scanners"
45     End-of-file Rules
58         flex command-line options, and the "%option"
107 .B \-ll
178     /* scanner for a toy Pascal-like language */
185     DIGIT    [0-9]
186     ID       [a-z][a-z0-9]*
206     "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
208     "{"[^}\\n]*"}"     /* eat up one-line comments */
220         ++argv, --argc;  /* skip over program name */
268 followed by zero or more letters, digits, '_', or '-' (dash).
269 The definition is taken to begin at the first non-white-space character
276     DIGIT    [0-9]
277     ID       [a-z][a-z0-9]*
283 followed by zero-or-more letters-or-digits.
293     ([0-9])+"."([0-9])*
296 and matches one-or-more digits followed by a '.' followed
297 by zero-or-more digits.
339 but its meaning is not well-defined and it may well cause compile-time
358     [abj-oZ]   a "character class" with a range in it; matches
361     [^A-Z]     a "negated character class", i.e., any character
364     [^A-Z\\n]   any character EXCEPT an uppercase letter or
377                  then the ANSI-C interpretation of \\x.
426     <<EOF>>    an end-of-file
428                an end-of-file when in start condition s1 or s2
433 operators, '-', ']', and, at the beginning of the class, '^'.
457 the string "ba" followed by zero-or-more r's.
458 To match "foo" or zero-or-more "bar"'s, use:
464 and to match zero-or-more "foo"'s-or-"bar"'s:
497 returns true - i.e., any alphabetic or numeric.
509     [[:alpha:]0-9]
510     [a-zA-Z0-9]
513 If your scanner is case-insensitive (the
514 .B \-i
523 .IP -
524 A negated character class such as the example "[^A-Z]"
529 (e.g., "[^A-Z\\n]").
535 .IP -
561 If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
570 bar-at-the-beginning-of-a-line.
628 .B -l
696 results in too much text being pushed back; instead, a run-time error results.
707 The pattern ends at the first non-escaped
759 characters to its end--these will overwrite later characters in the
775 .IP -
778 .IP -
782 .IP -
839 .I -Cf
841 .I -CF
851 .IP -
859 For example, given the input "mega-kludge"
860 the following will write "mega-mega-kludge" to the output:
864     mega-    ECHO; yymore();
868 First "mega-" is matched and echoed to the output.
870 is matched, but the previous "mega-" is still hanging around at the
875 for the "kludge" rule will actually write "mega-kludge".
892 .IP -
912     [a-z]+    ECHO;
927 .IP -
942     for ( i = yyleng - 1; i >= 0; --i )
953 of the input stream, pushing back strings must be done back-to-front.
976 to attempt to mark the input stream with an end-of-file.
977 .IP -
1021 .IP -
1032 .IP -
1039 is also called when an end-of-file is encountered.
1075 K&R-style/non-prototyped function declaration, you must terminate
1076 the definition with a semi-colon (;).
1084 an end-of-file (at which point it returns the value 0) or
1089 If the scanner reaches an end-of-file, subsequent calls are undefined
1134 will resume scanning where it left off.
1137 block-reads rather than simple
1155 global file-pointer "yyin".
1173 When the scanner receives an end-of-file indication from YY_INPUT,
1184 true (non-zero), then the scanner terminates, returning 0 to its
1199 .B \-ll
1202 Three routines are available for scanning from in-memory buffers rather
1276 exclusive start conditions make it easy to specify "mini-scanners"
1328 Also note that the special start-condition specifier
1358 referred to as the start-condition "INITIAL", so
1394 "expect-floats"
1395 it will treat it as a single token, the floating-point number
1405     expect-floats        BEGIN(expect);
1407     <expect>[0-9]+"."[0-9]+      {
1413                  * we need another "expect-number"
1420     [0-9]+      {
1447 a high-speed scanner try to match as much as possible in each rule, as
1450 Note that start-condition names are really integer values and
1480 the integer-valued
1498 Note that start conditions do not have their own name-space; %s's and %x's
1501 Finally, here's an example of how to match C-style quoted strings using
1515     <str>\\"        { /* saw closing quote - all done */
1524             /* error - unterminated string constant */
1528     <str>\\\\[0-7]{1,3} {
1535                     /* error, constant is out-of-bounds */
1540     <str>\\\\[0-9]+ {
1541             /* generate error - bad escape sequence; something
1623 The start condition stack grows dynamically and so has no built-in
1767     [a-z]+              ECHO;
1768     [^a-z\\n]*\\n?        ECHO;
1793             if ( --include_stack_ptr < 0 )
1808 scanning in-memory strings instead of files.
1822 scans a NUL-terminated string.
1853 .B base[size-2],
1868 .SH END-OF-FILE RULES
1870 actions which are to be taken when an end-of-file is
1871 encountered and yywrap() returns non-zero (i.e., indicates
1875 .IP -
1882 .IP -
1886 .IP -
1890 .IP -
1937 it could be #define'd to call a routine to convert yytext to lower-case.
1957 .B \-s),
1982 .B \-I
1984 A non-zero value
1986 value as non-interactive.
1989 .B %option always-interactive
1991 .B %option never-interactive
2002 A non-zero macro argument makes rules anchored with
2028 .IP -
2065 .B \-+
2067 .IP -
2070 .IP -
2082 Once scanning terminates because an end-of-file
2086 .IP -
2091 The switch-over to the new file is immediate
2092 (any previously buffered-up input is lost).
2099 .IP -
2105 .IP -
2110 .IP -
2122 parser-generator.
2136 .B \-d
2159     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
2166 .B \-b, --backup
2167 Generate backing-up information to
2172 can remove backing-up states.
2175 backing-up states are eliminated and
2176 .B \-Cf
2178 .B \-CF
2180 .B \-p
2186 .B \-c
2187 is a do-nothing, deprecated option included for POSIX compliance.
2189 .B \-d, \-\-debug
2195 is non-zero (which is the default),
2201     --accepting rule at line 53 ("the matched text")
2209 or reaches an end-of-file.
2211 .B \-f, \-\-full
2217 .B \-Cfr
2220 .B \-h, \-\-help
2226 .B \-?
2228 .B \-\-help
2230 .B \-h.
2232 .B \-i, \-\-case-insensitive
2236 .I case-insensitive
2246 .B \-l, \-\-lex\-compat
2255 .B \-+, -f, -F, -Cf,
2257 .B -CF
2266 .B \-n
2267 is another do-nothing, deprecated option included only for
2270 .B \-p, \-\-perf\-report
2289 .B \-I
2292 .B \-s, \-\-no\-default
2303 .B \-t, \-\-stdout
2310 .B \-v, \-\-verbose
2321 .B \-V),
2325 .B \-w, \-\-nowarn
2328 .B \-B, \-\-batch
2336 .B \-I
2339 .B \-B
2349 .B \-Cf
2351 .B \-CF
2353 .B \-B
2356 .B \-F, \-\-fast
2362 .B (-f),
2366 and a catch-all, "identifier" rule, such as in the set:
2373     [a-z]+    return TOK_ID;
2376 then you're better off using the full table representation.
2379 to detect the keywords, you're better off using
2380 .B -F.
2383 .B \-CFr
2386 .B \-+.
2388 .B \-I, \-\-interactive
2410 .B \-Cf
2412 .B \-CF
2413 table-compression options (see below).
2415 for high-performance you should be using one of these options, so if you
2418 assumes you'd rather trade off a bit of run-time performance for intuitive
2423 .B \-I
2425 .B \-Cf
2427 .B \-CF.
2434 .B \-I
2437 .B %option always-interactive
2443 .B \-B
2446 .B \-L, \-\-noline
2462 fault -- you should report these sorts of errors to the email address
2465 .B \-T, \-\-trace
2474 the form of the input and the resultant non-deterministic and deterministic
2479 .B \-V, \-\-version
2483 .B \-\-version
2485 .B \-V.
2487 .B \-7, \-\-7bit
2490 to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
2493 .B \-7
2496 .B \-8
2499 or crash if their input contains an 8-bit character.
2502 .B \-Cf
2504 .B \-CF
2506 .B \-7
2510 default behavior is to generate an 8-bit scanner unless you use the
2511 .B \-Cf
2513 .B \-CF,
2516 defaults to generating 7-bit scanners unless your site was always
2517 configured to generate 8-bit scanners (as will often be the case
2518 with non-USA sites).
2519 You can tell whether flex generated a 7-bit
2520 or an 8-bit scanner by inspecting the flag summary in the
2521 .B \-v
2525 .B \-Cfe
2527 .B \-CFe
2529 discussed below), flex still defaults to generating an 8-bit
2530 scanner, since usually with these compression options full 8-bit tables
2531 are not much more expensive than 7-bit tables.
2533 .B \-8, \-\-8bit
2536 to generate an 8-bit scanner, i.e., one which can recognize 8-bit
2539 .B \-Cf
2541 .B \-CF,
2542 as otherwise flex defaults to generating an 8-bit scanner anyway.
2545 .B \-7
2546 above for flex's default behavior and the tradeoffs between 7-bit
2547 and 8-bit scanners.
2549 .B \-+, \-\-c++
2555 .B \-C[aefFmr]
2556 controls the degree of table compression and, more generally, trade-offs
2559 .B \-Ca, \-\-align
2560 ("align") instructs flex to trade off larger tables in the
2565 than with smaller-sized units such as shortwords.
2569 .B \-Ce, \-\-ecs
2579 "[0-9]" then the digits '0', '1', ..., '9' will all be put
2583 a factor of 2-5) and are pretty cheap performance-wise (one array
2584 look-up per character scanned).
2586 .B \-Cf
2589 scanner tables should be generated -
2595 .B \-CF
2598 .B \-F
2602 .B \-+.
2604 .B \-Cm, \-\-meta-ecs
2608 .I meta-equivalence classes,
2611 Meta-equivalence
2614 array look-up per character scanned).
2616 .B \-Cr, \-\-read
2628 .B \-Cf
2630 .B \-CF.
2632 .B \-Cr
2638 .B \-Cr
2644 .B \-C
2646 equivalence classes nor meta-equivalence classes should be used.
2649 .B \-Cf
2651 .B \-CF
2653 .B \-Cm
2654 do not make sense together - there is no opportunity for meta-equivalence
2660 .B \-Cem,
2664 and meta-equivalence classes.
2666 You can trade off
2667 faster-executing scanners at the cost of larger tables with
2672           -Cem
2673           -Cm
2674           -Ce
2675           -C
2676           -C{f,F}e
2677           -C{f,F}
2678           -C{f,F}a
2687 .B \-Cfe
2691 .B \-ooutput, \-\-outputfile=FILE
2697 .B \-o
2699 .B \-t
2705 .B \-L
2709 .B \-Pprefix, \-\-prefix=STRING
2714 for all globally-visible variable and function names to instead be
2717 .B \-Pfoo
2763 provide your own (appropriately-named) version of the routine for your
2767 .B \-ll
2770 .B \-Sskeleton_file, \-\-skel=FILE
2778 .B \-X, \-\-posix\-compat
2781 .B \-\-yylineno
2784 .B \-\-yyclass=NAME
2787 .B \-\-header\-file=FILE
2790 .B \-\-tables\-file[=FILE]
2793 .B \-Dmacro[=defn]
2796 .B \-R,  \-\-reentrant
2799 .B \-\-bison\-bridge
2802 .B \-\-bison\-locations
2805 .B \-\-stdinit
2808 .B \-\-noansi\-definitions old\-style function definitions.
2810 .B \-\-noansi\-prototypes
2813 .B \-\-nounistd
2816 .B \-\-noFUNCTION
2821 scanner specification itself, rather than from the flex command-line.
2835     7bit            -7 option
2836     8bit            -8 option
2837     align           -Ca option
2838     backup          -b option
2839     batch           -B option
2840     c++             -+ option
2843     case-sensitive  opposite of -i (default)
2845     case-insensitive or
2846     caseless        -i option
2848     debug           -d option
2849     default         opposite of -s option
2850     ecs             -Ce option
2851     fast            -F option
2852     full            -f option
2853     interactive     -I option
2854     lex-compat      -l option
2855     meta-ecs        -Cm option
2856     perf-report     -p option
2857     read            -Cr option
2858     stdout          -t option
2859     verbose         -v option
2860     warn            opposite of -w option
2861                     (use "%option nowarn" for -w)
2871 .B always-interactive
2891 .B never-interactive
2896 .B always-interactive.
2921 to be compile-time constant.
2930 .B %option lex-compat.
2937 upon an end-of-file, but simply assume that there are no more
2962 Three options take string-delimited values, offset with '=':
2969 .B -oABC,
2977 .B -PXYZ.
2985 .B \-+
3001 member function that emits a run-time error (by invoking
3027 is that it generate high-performance scanners.
3031 .B \-C
3044     %option always-interactive
3046     '^' beginning-of-line operator
3057 is a quite-cheap macro; so if just putting back some excess text you
3068 .B \-b
3083     State #6 is non-accepting -
3086      out-transitions: [ o ]
3087      jam-transitions: EOF [ \\001-n  p-\\177 ]
3089     State #8 is non-accepting -
3092      out-transitions: [ a ]
3093      jam-transitions: EOF [ \\001-`  b-\\177 ]
3095     State #9 is non-accepting -
3098      out-transitions: [ r ]
3099      jam-transitions: EOF [ \\001-q  s-\\177 ]
3129 .B \-Cf
3131 .B \-CF,
3151 done using a "catch-all" rule:
3158     [a-z]+      return TOK_ID;
3283 To eliminate the back-tracking, introduce a catch-all rule:
3294     [a-z]+   |
3312     [a-z]+\\n |
3332 how it's classified, we can introduce one more catch-all rule, this
3344     [a-z]+\\n |
3345     [a-z]+   |
3350 .B \-Cf,
3394 .B \-+
3515 (if non-nil)
3543 To indicate end-of-input, return 0 characters.
3545 .B \-B
3547 .B \-I
3563 which, while NUL-terminated, may also contain "internal" NUL's if
3583 .B \-P
3605     alpha   [A-Za-z]
3606     dig     [0-9]
3607     name    ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
3608     num1    [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
3609     num2    [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
3647         while(lexer->yylex() != 0)
3653 .B \-P
3707 .B \-l
3714 .B \-l
3721 .IP -
3727 .B \-l
3733 should be maintained on a per-buffer basis, rather than a per-scanner
3738 .IP -
3745 encounters an end-of-file the normal
3748 A ``real'' end-of-file is returned by
3765 .IP -
3770 .IP -
3776 an interrupt handler which long-jumps out of the scanner, and
3781     fatal flex scanner internal error--end of buffer missed
3798 .IP -
3803 macro is done to the file-pointer
3810 .IP -
3814 .IP -
3821     NAME    [A-Z][A-Z0-9]*
3828 is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
3830 "[A-Z0-9]*".
3834 "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
3853 .B \-l
3859 .IP -
3873 .IP -
3880 .IP -
3891 .B \-l
3893 .IP -
3904 .IP -
3915 .IP -
3916 The special table-size declarations such as
3925 .IP -
3953     interactive/non-interactive scanners
3974 semi-colons, while with
3998 an identifier "catch-all" rule:
4001     [a-z]+    got_identifier();
4010 .B \-s
4017 .B \-s
4022 .I yymore_used_but_not_detected undefined -
4041 .I flex scanner jammed -
4043 .B \-s
4048 .I token too large, exceeds YYLMAX -
4061 .I scanner requires \-8 flag to
4062 .I use the character 'x' -
4063 Your scanner specification includes recognizing the 8-bit character
4065 and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
4067 .B \-Cf
4069 .B \-CF
4072 .B \-7
4075 .I flex scanner push-back overflow -
4079 both the pushed-back text and the current token in
4085 input buffer overflow, can't enlarge buffer because scanner uses REJECT -
4093 fatal flex scanner internal error--end of buffer missed -
4094 This can occur in a scanner which is reentered after a long-jump
4104 .I too many start conditions in <> construct! -
4109 .B \-ll
4119 .B -+.
4132 backing-up information for
4133 .B \-b
4148 For some trailing context rules, parts which are actually fixed-length are
4151 considered variable-length.
4173 .B \-l
4176 Pattern-matching of NUL's is substantially slower than matching other
4182 Due to both buffering of input and read-ahead, you cannot intermix
4193 .B \-v
4203 .B \-f
4205 .B \-F
4220 .I LEX \- Lexical Analyzer Generator
4224 Addison-Wesley (1986).
4225 Describes the pattern-matching techniques used by
4239 beta-testers, feedbackers, and contributors, especially Francois Pinard,
4242 Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
4273 Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strömquist,
4279 mail-archiving skills but whose contributions are appreciated all the
4287 Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to