lex.1 - OpenGrok cross reference for /freebsd/usr.bin/lex/lex.1

Lines Matching +full:default +full:- +full:input
4 flex, lex \- fast lexical analyzer generator
7 .B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
8 .B [\-\-help \-\-version]
13 a tool for generating programs that perform pattern-matching on text.
22     Format Of The Input File
27     How The Input Is Matched
35         how to control the input source
39         managing "mini-scanners"
41     Multiple Input Buffers
42         how to manipulate multiple input sources; how to
45     End-of-file Rules
46         special rules for matching the end of the input
58         flex command-line options, and the "%option"
96 the given input files, or its standard input if no file names are given,
107 .B \-ll
110 it analyzes its input for occurrences
119 input specifies a scanner which whenever it encounters the string
127 By default, any text not matched by a
131 to copy its input file to its output with each occurrence
133 In this input, there is just one rule.
161 of lines in its input (it produces no output other than the
178     /* scanner for a toy Pascal-like language */
185     DIGIT    [0-9]
186     ID       [a-z][a-z0-9]*
206     "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
208     "{"[^}\\n]*"}"     /* eat up one-line comments */
220         ++argv, --argc;  /* skip over program name */
238 .SH FORMAT OF THE INPUT FILE
241 input file consists of three sections, separated by a line with just
268 followed by zero or more letters, digits, '_', or '-' (dash).
269 The definition is taken to begin at the first non-white-space character
276     DIGIT    [0-9]
277     ID       [a-z][a-z0-9]*
283 followed by zero-or-more letters-or-digits.
293     ([0-9])+"."([0-9])*
296 and matches one-or-more digits followed by a '.' followed
297 by zero-or-more digits.
303 input contains a series of rules of the form:
322 in the input file may be skipped, too.
339 but its meaning is not well-defined and it may well cause compile-time
349 The patterns in the input are written using an extended set of regular
358     [abj-oZ]   a "character class" with a range in it; matches
361     [^A-Z]     a "negated character class", i.e., any character
364     [^A-Z\\n]   any character EXCEPT an uppercase letter or
377                  then the ANSI-C interpretation of \\x.
397                  but is then returned to the input before
415                input yourself, or explicitly use r/\\r\\n for "r$".
426     <<EOF>>    an end-of-file
428                an end-of-file when in start condition s1 or s2
433 operators, '-', ']', and, at the beginning of the class, '^'.
457 the string "ba" followed by zero-or-more r's.
458 To match "foo" or zero-or-more "bar"'s, use:
464 and to match zero-or-more "foo"'s-or-"bar"'s:
497 returns true - i.e., any alphabetic or numeric.
509     [[:alpha:]0-9]
510     [a-zA-Z0-9]
513 If your scanner is case-insensitive (the
514 .B \-i
523 .IP -
524 A negated character class such as the example "[^A-Z]"
529 (e.g., "[^A-Z\\n]").
534 input unless there's another quote in the input.
535 .IP -
561 If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
570 bar-at-the-beginning-of-a-line.
571 .SH HOW THE INPUT IS MATCHED
572 When the generated scanner is run, it analyzes its input looking
577 though it will then be returned to the input).
582 input file is chosen.
595 input is scanned for another match.
598 .I default rule
599 is executed: the next character in the input is considered matched and
603 input is:
609 which generates a scanner that simply copies its input (one character
624 in the first (definitions) section of your flex input.
625 The default is
628 .B -l
680 input.
696 results in too much text being pushed back; instead, a run-time error results.
707 The pattern ends at the first non-escaped
710 action is empty, then when the pattern is matched the input token
713 which deletes all occurrences of "zap me" from its input:
720 (It will copy all other characters in the input to the output since
721 they will be matched by the default rule.)
759 characters to its end--these will overwrite later characters in the
760 input stream).
775 .IP -
778 .IP -
782 .IP -
785 input (or a prefix of the input).
787 above in "How the Input is Matched", and
795 input file, or one which matched less text.
797 words in the input and call the routine special() whenever "frob" is seen:
809 any "frob"'s in the input would not be counted as words, since the
839 .I -Cf
841 .I -CF
851 .IP -
859 For example, given the input "mega-kludge"
860 the following will write "mega-mega-kludge" to the output:
864     mega-    ECHO; yymore();
868 First "mega-" is matched and echoed to the output.
870 is matched, but the previous "mega-" is still hanging around at the
875 for the "kludge" rule will actually write "mega-kludge".
892 .IP -
896 characters of the current token back to the input stream, where they
906 For example, on the input "foobar" the following will write out
912     [a-z]+    ECHO;
917 will cause the entire current input string to be scanned again.
919 changed how the scanner will subsequently process its input (using
925 is a macro and can only be used in the flex input file, not from
927 .IP -
931 back onto the input stream.
942     for ( i = yyleng - 1; i >= 0; --i )
953 of the input stream, pushing back strings must be done back-to-front.
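The back-to-front requirement can be illustrated with a toy model of a push-back stream. This is only a sketch, not flex's implementation: `toy_stream`, `toy_unput`, `toy_input`, and `toy_unput_string` are hypothetical names, standing in for the scanner's buffer and for `unput()` and `input()`. Pushed-back characters are assumed to behave like a stack, so the last one pushed is the first one read again.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: pushed-back characters are stacked, so the most
 * recently pushed character is the next one read. Not flex's code. */
#define PUSHBACK_MAX 64

struct toy_stream {
    char stack[PUSHBACK_MAX];
    size_t depth;
};

/* Hypothetical stand-in for unput(): push one character back.
 * Returns 0 on success, -1 if the toy buffer is full. */
static int toy_unput(struct toy_stream *s, char c)
{
    if (s->depth == PUSHBACK_MAX)
        return -1;
    s->stack[s->depth++] = c;
    return 0;
}

/* Hypothetical stand-in for input(): read the next character,
 * or return -1 when nothing has been pushed back. */
static int toy_input(struct toy_stream *s)
{
    if (s->depth == 0)
        return -1;
    return (unsigned char)s->stack[--s->depth];
}

/* Push a whole string back so it is re-read in its original order:
 * the last character goes first, the first character goes last. */
static int toy_unput_string(struct toy_stream *s, const char *text)
{
    size_t len = strlen(text);

    while (len > 0) {
        if (toy_unput(s, text[--len]) != 0)
            return -1;
    }
    return 0;
}
```

Pushing the characters front-to-back instead would make the toy stream return them reversed, which is the situation the back-to-front loop avoids.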
959 (the default), a call to
972 instead (see How The Input Is Matched).
976 to attempt to mark the input stream with an end-of-file.
977 .IP -
978 .B input()
979 reads the next character from the input stream.
990                     while ( (c = input()) != '*' &&
996                         while ( (c = input()) == '*' )
1014 .B input()
1020 .I input.)
1021 .IP -
1031 function, described below in the section Multiple Input Buffers.
1032 .IP -
1037 By default,
1039 is also called when an end-of-file is encountered.
1050 By default,
1075 K&R-style/non-prototyped function declaration, you must terminate
1076 the definition with a semi-colon (;).
1080 is called, it scans tokens from the global input file
1084 an end-of-file (at which point it returns the value 0) or
1089 If the scanner reaches an end-of-file, subsequent calls are undefined
1092 is pointed at a new input file (in which case scanning continues from
1109 to a new input file or using
1114 and because it can be used to switch input files in the middle of scanning.
1115 It can also be used to throw away the current input buffer, by calling
1136 By default (and for purposes of efficiency), the scanner uses
1137 block-reads rather than simple
1141 The nature of how it gets its input can be controlled by defining the
1154 The default YY_INPUT reads from the
1155 global file-pointer "yyin".
1158 section of the input file):
1170 This definition will change the input processing to occur
1173 When the scanner receives an end-of-file indication from YY_INPUT,
1182 to point to another input file, and scanning continues.
1184 true (non-zero), then the scanner terminates, returning 0 to its
1199 .B \-ll
1200 to obtain the default version of the routine, which always returns 1.
1202 Three routines are available for scanning from in-memory buffers rather
1207 See the discussion of them below in the section Multiple Input Buffers.
1213 global (default, stdout), which may be redefined by the user simply
1244 are declared in the definitions (first) section of the input
1274 input.
1276 exclusive start conditions make it easy to specify "mini-scanners"
1277 which scan portions of the input that are syntactically different
1328 Also note that the special start-condition specifier
1343 The default rule (to
1358 referred to as the start-condition "INITIAL", so
1391 By default it will treat it as
1394 "expect-floats"
1395 it will treat it as a single token, the floating-point number
1405     expect-floats        BEGIN(expect);
1407     <expect>[0-9]+"."[0-9]+      {
1413                  * we need another "expect-number"
1420     [0-9]+      {
1429 maintaining a count of the current input line.
1447 a high-speed scanner try to match as much as possible in each rule, as
1450 Note that start-condition names are really integer values and
1480 the integer-valued
1498 Note that start conditions do not have their own name-space; %s's and %x's
1501 Finally, here's an example of how to match C-style quoted strings using
1515     <str>\\"        { /* saw closing quote - all done */
1524             /* error - unterminated string constant */
1528     <str>\\\\[0-7]{1,3} {
1535                     /* error, constant is out-of-bounds */
1540     <str>\\\\[0-9]+ {
1541             /* generate error - bad escape sequence; something
1623 The start condition stack grows dynamically and so has no built-in
1630 .SH MULTIPLE INPUT BUFFERS
1632 require reading from several input streams.
1636 where the next input will be read from by simply writing a
1642 which requires switching the input source.
1647 input buffers.
1648 An input buffer is created by using:
1672 correctly declare input buffers in source files other than that
1696 switches the scanner's input buffer so subsequent tokens will
1705 Note also that switching input sources via either
1767     [a-z]+              ECHO;
1768     [^a-z\\n]*\\n?        ECHO;
1793             if ( --include_stack_ptr < 0 )
1807 Three routines are available for setting up input buffers for
1808 scanning in-memory strings instead of files.
1810 a new input buffer for scanning the string, and return a corresponding
1822 scans a NUL-terminated string.
1853 .B base[size-2],
1862 returns a nil pointer instead of creating a new input buffer.
1868 .SH END-OF-FILE RULES
1870 actions which are to be taken when an end-of-file is
1871 encountered and yywrap() returns non-zero (i.e., indicates
1875 .IP -
1878 to a new input file (in previous versions of flex, after doing the
1882 .IP -
1886 .IP -
1890 .IP -
1937 it could be #define'd to call a routine to convert yytext to lower-case.
1955 gives the total number of rules (including the default rule, even if
1957 .B \-s),
1979 but must be used when the scanner's input source is indeed
1982 .B \-I
1984 A non-zero value
1986 value as non-interactive.
1989 .B %option always-interactive
1991 .B %option never-interactive
2002 A non-zero macro argument makes rules anchored with
2014 By default, it is simply a "break", to separate
2028 .IP -
2043 if you don't like the default value (generally 8KB).
2049 .I input()
2059 which is the default.
2065 .B \-+
2067 .IP -
2070 .IP -
2072 is the file which by default
2079 buffers its input; use
2082 Once scanning terminates because an end-of-file
2085 at the new input file and then call the scanner again to continue scanning.
2086 .IP -
2090 at the new input file.
2091 The switch-over to the new file is immediate
2092 (any previously buffered-up input is lost).
2097 as an argument thus throws away the current input buffer and continues
2098 scanning the same input file.
2099 .IP -
2105 .IP -
2110 .IP -
2122 parser-generator.
2126 to find the next input token.
2136 .B \-d
2145 input.
2159     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
2166 .B \-b, --backup
2167 Generate backing-up information to
2170 and the input characters on which they do so.
2172 can remove backing-up states.
2175 backing-up states are eliminated and
2176 .B \-Cf
2178 .B \-CF
2180 .B \-p
2186 .B \-c
2187 is a do-nothing, deprecated option included for POSIX compliance.
2189 .B \-d, \-\-debug
2195 is non-zero (which is the default),
2201     --accepting rule at line 53 ("the matched text")
2207 default rule, reaches the end of its input buffer (or encounters
2209 or reaches an end-of-file.
2211 .B \-f, \-\-full
2217 .B \-Cfr
2220 .B \-h, \-\-help
2226 .B \-?
2228 .B \-\-help
2230 .B \-h.
2232 .B \-i, \-\-case-insensitive
2236 .I case-insensitive
2240 input patterns will
2241 be ignored, and tokens in the input will be matched regardless of case.
2246 .B \-l, \-\-lex\-compat
2255 .B \-+, -f, -F, -Cf,
2257 .B -CF
2266 .B \-n
2267 is another do-nothing, deprecated option included only for
2270 .B \-p, \-\-perf\-report
2274 input file which will cause a serious loss of performance in the resulting
2289 .B \-I
2292 .B \-s, \-\-no\-default
2294 .I default rule
2295 (that unmatched scanner input is echoed to
2298 If the scanner encounters input that does not
2303 .B \-t, \-\-stdout
2310 .B \-v, \-\-verbose
2321 .B \-V),
2323 those that are on by default.
2325 .B \-w, \-\-nowarn
2328 .B \-B, \-\-batch
2336 .B \-I
2339 .B \-B
2349 .B \-Cf
2351 .B \-CF
2353 .B \-B
2356 .B \-F, \-\-fast
2362 .B (-f),
2366 and a catch-all, "identifier" rule, such as in the set:
2372     "default" return TOK_DEFAULT;
2373     [a-z]+    return TOK_ID;
2380 .B -F.
2383 .B \-CFr
2386 .B \-+.
2388 .B \-I, \-\-interactive
2407 scanners default to
2410 .B \-Cf
2412 .B \-CF
2413 table-compression options (see below).
2415 for high-performance you should be using one of these options, so if you
2418 assumes you'd rather trade off a bit of run-time performance for intuitive
2423 .B \-I
2425 .B \-Cf
2427 .B \-CF.
2428 Thus, this option is not really needed; it is on by default for all those
2433 returns false for the scanner input, flex will revert to batch mode, even if
2434 .B \-I
2437 .B %option always-interactive
2443 .B \-B
2446 .B \-L, \-\-noline
2458 input file (if the errors are due to code in the input file), or
2462 fault -- you should report these sorts of errors to the email address
2465 .B \-T, \-\-trace
2474 the form of the input and the resultant non-deterministic and deterministic
2479 .B \-V, \-\-version
2483 .B \-\-version
2485 .B \-V.
2487 .B \-7, \-\-7bit
2490 to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
2491 characters in its input.
2493 .B \-7
2496 .B \-8
2499 or crash if their input contains an 8-bit character.
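One way to guard a 7-bit scanner against such input is to check the data before scanning it. The helper below is a minimal sketch under that assumption; `first_8bit_offset` is a hypothetical name and is not part of flex.

```c
#include <stddef.h>

/* Return the offset of the first byte with the high bit set,
 * or len if every byte in the buffer is a 7-bit character. */
static size_t first_8bit_offset(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i)
        if (buf[i] & 0x80)
            break;
    return i;
}
```

A caller could refuse the input, or report the offending offset, whenever the returned value is less than the buffer length.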
2502 .B \-Cf
2504 .B \-CF
2506 .B \-7
2510 default behavior is to generate an 8-bit scanner unless you use the
2511 .B \-Cf
2513 .B \-CF,
2516 defaults to generating 7-bit scanners unless your site was always
2517 configured to generate 8-bit scanners (as will often be the case
2518 with non-USA sites).
2519 You can tell whether flex generated a 7-bit
2520 or an 8-bit scanner by inspecting the flag summary in the
2521 .B \-v
2525 .B \-Cfe
2527 .B \-CFe
2529 discussed below), flex still defaults to generating an 8-bit
2530 scanner, since usually with these compression options full 8-bit tables
2531 are not much more expensive than 7-bit tables.
2533 .B \-8, \-\-8bit
2536 to generate an 8-bit scanner, i.e., one which can recognize 8-bit
2539 .B \-Cf
2541 .B \-CF,
2542 as otherwise flex defaults to generating an 8-bit scanner anyway.
2545 .B \-7
2546 above for flex's default behavior and the tradeoffs between 7-bit
2547 and 8-bit scanners.
2549 .B \-+, \-\-c++
2555 .B \-C[aefFmr]
2556 controls the degree of table compression and, more generally, trade-offs
2559 .B \-Ca, \-\-align
2565 than with smaller-sized units such as shortwords.
2569 .B \-Ce, \-\-ecs
2578 input is in the character class
2579 "[0-9]" then the digits '0', '1', ..., '9' will all be put
2583 a factor of 2-5) and are pretty cheap performance-wise (one array
2584 look-up per character scanned).
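The idea behind equivalence classes can be sketched in C. The code below is an illustration only, not the tables flex generates: `build_ec`, `next_state`, and `count_digit_runs` are hypothetical names, and the scanner is assumed to distinguish nothing but the class "[0-9]". Each input character is first mapped to its class with one array look-up, and the transition table is then indexed by class rather than by character, so it needs 2 columns instead of 256.

```c
#include <stddef.h>

/* Two classes suffice when the only distinction is digit vs. non-digit. */
enum { EC_OTHER = 0, EC_DIGIT = 1, EC_COUNT = 2 };

/* Fill the character-to-class map: '0'..'9' share one class. */
static void build_ec(unsigned char ec[256])
{
    int c;

    for (c = 0; c < 256; ++c)
        ec[c] = (c >= '0' && c <= '9') ? EC_DIGIT : EC_OTHER;
}

/* Transition table indexed by [state][class], not [state][char]. */
static const unsigned char next_state[2][EC_COUNT] = {
    /* state 0: outside a run of digits */ { 0, 1 },
    /* state 1: inside a run of digits  */ { 0, 1 },
};

/* Count the maximal runs of digits in a NUL-terminated string. */
static size_t count_digit_runs(const unsigned char ec[256], const char *text)
{
    size_t runs = 0;
    unsigned state = 0;

    for (; *text; ++text) {
        unsigned cls = ec[(unsigned char)*text]; /* the extra look-up */
        unsigned next = next_state[state][cls];

        if (state == 0 && next == 1)
            ++runs;
        state = next;
    }
    return runs;
}
```

The cost per character is the single `ec[...]` look-up, while every row of the transition table shrinks from one entry per character to one entry per class.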
2586 .B \-Cf
2589 scanner tables should be generated -
2595 .B \-CF
2598 .B \-F
2602 .B \-+.
2604 .B \-Cm, \-\-meta-ecs
2608 .I meta-equivalence classes,
2611 Meta-equivalence
2614 array look-up per character scanned).
2616 .B \-Cr, \-\-read
2619 use of the standard I/O library (stdio) for input.
2628 .B \-Cf
2630 .B \-CF.
2632 .B \-Cr
2636 whatever text your previous reads left in the stdio input buffer).
2638 .B \-Cr
2644 .B \-C
2646 equivalence classes nor meta-equivalence classes should be used.
2649 .B \-Cf
2651 .B \-CF
2653 .B \-Cm
2654 do not make sense together - there is no opportunity for meta-equivalence
2659 The default setting is
2660 .B \-Cem,
2664 and meta-equivalence classes.
2667 faster-executing scanners at the cost of larger tables with
2672           -Cem
2673           -Cm
2674           -Ce
2675           -C
2676           -C{f,F}e
2677           -C{f,F}
2678           -C{f,F}a
2684 during development you will usually want to use the default, maximal
2687 .B \-Cfe
2691 .B \-ooutput, \-\-outputfile=FILE
2697 .B \-o
2699 .B \-t
2705 .B \\-L
2709 .B \-Pprefix, \-\-prefix=STRING
2710 changes the default
2714 for all globally-visible variable and function names to instead be
2717 .B \-Pfoo
2722 It also changes the name of the default output file from
2763 provide your own (appropriately-named) version of the routine for your
2767 .B \-ll
2768 no longer provides one for you by default.
2770 .B \-Sskeleton_file, \-\-skel=FILE
2771 overrides the default skeleton file from which
2778 .B \-X, \-\-posix\-compat
2781 .B \-\-yylineno
2784 .B \-\-yyclass=NAME
2787 .B \-\-header\-file=FILE
2790 .B \-\-tables\-file[=FILE]
2793 .B \\-Dmacro[=defn]
2794 #define macro defn (default defn is '1').
2796 .B \-R,  \-\-reentrant
2799 .B \-\-bison\-bridge
2802 .B \-\-bison\-locations
2805 .B \-\-stdinit
2808 .B \-\-noansi\-definitions old\-style function definitions.
2810 .B \-\-noansi\-prototypes
2813 .B \-\-nounistd
2816 .B \-\-noFUNCTION
2821 scanner specification itself, rather than from the flex command-line.
2827 directive, and multiple directives in the first section of your flex input
2835     7bit            -7 option
2836     8bit            -8 option
2837     align           -Ca option
2838     backup          -b option
2839     batch           -B option
2840     c++             -+ option
2843     case-sensitive  opposite of -i (default)
2845     case-insensitive or
2846     caseless        -i option
2848     debug           -d option
2849     default         opposite of -s option
2850     ecs             -Ce option
2851     fast            -F option
2852     full            -f option
2853     interactive     -I option
2854     lex-compat      -l option
2855     meta-ecs        -Cm option
2856     perf-report     -p option
2857     read            -Cr option
2858     stdout          -t option
2859     verbose         -v option
2860     warn            opposite of -w option
2861                     (use "%option nowarn" for -w)
2864     pointer         equivalent to "%pointer" (default)
2871 .B always-interactive
2872 instructs flex to generate a scanner which always considers its input
2874 Normally, on each new input file the scanner calls
2877 the scanner's input source is interactive and thus should be read a
2883 directs flex to provide a default
2891 .B never-interactive
2892 instructs flex to generate a scanner which never considers its input
2896 .B always-interactive.
2912 instead of the default of
2921 to be compile-time constant.
2927 read from its input in the global variable
2930 .B %option lex-compat.
2937 upon an end-of-file, but simply assume that there are no more
2962 Three options take string-delimited values, offset with '=':
2969 .B -oABC,
2977 .B -PXYZ.
2985 .B \-+
3001 member function that emits a run-time error (by invoking
3015     input, unput
3027 is that it generate high-performance scanners.
3031 .B \-C
3044     %option always-interactive
3046     '^' beginning-of-line operator
3057 is a quite-cheap macro; so if just putting back some excess text you
3068 .B \-b
3072 For example, on the input
3083     State #6 is non-accepting -
3086      out-transitions: [ o ]
3087      jam-transitions: EOF [ \\001-n  p-\\177 ]
3089     State #8 is non-accepting -
3092      out-transitions: [ a ]
3093      jam-transitions: EOF [ \\001-`  b-\\177 ]
3095     State #9 is non-accepting -
3098      out-transitions: [ r ]
3099      jam-transitions: EOF [ \\001-q  s-\\177 ]
3109 at lines 2 and 3 in the input file.
3117 have to back up to simply match the 'f' (by the default rule).
3129 .B \-Cf
3131 .B \-CF,
3151 done using a "catch-all" rule:
3158     [a-z]+      return TOK_ID;
3216 This is because with long tokens the processing of most input
3283 To eliminate the back-tracking, introduce a catch-all rule:
3294     [a-z]+   |
3312     [a-z]+\\n |
3320 know that there will never be any characters in the input stream
3331 since we never expect to encounter such an input and therefore don't
3332 care how it's classified, we can introduce one more catch-all rule, this
3344     [a-z]+\\n |
3345     [a-z]+   |
3350 .B \-Cf,
3364 How the Input is Matched, dynamically resizing
3382 Note that the default input source for your scanner remains
3384 and default echoing is still done to
3394 .B \-+
3423 returns the current input line number
3470 object using the given streams for input and output.
3471 If not specified, the streams default to
3480 does for ordinary flex scanners: it scans the input stream, consuming
3515 (if non-nil)
3520 (ditto), deleting the previous input buffer if
3526 first switches the input streams via
3543 To indicate end-of-input, return 0 characters.
3545 .B \-B
3547 .B \-I
3553 the scanner might be scanning an interactive input source, you can
3563 which, while NUL-terminated, may also contain "internal" NUL's if
3569 The default version of this function writes the message to the stream
3583 .B \-P
3590 (the default).
3605     alpha   [A-Za-z]
3606     dig     [0-9]
3607     name    ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
3608     num1    [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
3609     num2    [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
3647         while(lexer->yylex() != 0)
3653 .B \-P
3697 (the default), a call to
3707 .B \-l
3714 .B \-l
3721 .IP -
3727 .B \-l
3733 should be maintained on a per-buffer basis, rather than a per-scanner
3738 .IP -
3740 .B input()
3744 .B input()
3745 encounters an end-of-file the normal
3748 A ``real'' end-of-file is returned by
3749 .B input()
3753 Input is instead controlled by defining the
3760 .B input()
3763 scanner's input other than by making an initial assignment to
3765 .IP -
3770 .IP -
3776 an interrupt handler which long-jumps out of the scanner, and
3781     fatal flex scanner internal error--end of buffer missed
3790 Note that this call will throw away any buffered input; usually this
3798 .IP -
3803 macro is done to the file-pointer
3810 .IP -
3814 .IP -
3821     NAME    [A-Z][A-Z0-9]*
3828 is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
3830 "[A-Z0-9]*".
3834 "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
3853 .B \-l
3859 .IP -
3873 .IP -
3880 .IP -
3891 .B \-l
3893 .IP -
3904 .IP -
3915 .IP -
3916 The special table-size declarations such as
3925 .IP -
3953     interactive/non-interactive scanners
3974 semi-colons, while with
3998 an identifier "catch-all" rule:
4001     [a-z]+    got_identifier();
4010 .B \-s
4012 option given but default rule can be matched
4014 that the default rule (match any single character) is the only one
4015 that will match a particular input.
4017 .B \-s
4022 .I yymore_used_but_not_detected undefined -
4041 .I flex scanner jammed -
4043 .B \-s
4044 has encountered an input string which wasn't matched by
4048 .I token too large, exceeds YYLMAX -
4053 constant (8K bytes by default).
4059 input.
4061 .I scanner requires \-8 flag to
4062 .I use the character 'x' -
4063 Your scanner specification includes recognizing the 8-bit character
4065 and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
4067 .B \-Cf
4069 .B \-CF
4072 .B \-7
4075 .I flex scanner push-back overflow -
4079 both the pushed-back text and the current token in
4085 input buffer overflow, can't enlarge buffer because scanner uses REJECT -
4087 to expand the input buffer.
4093 fatal flex scanner internal error--end of buffer missed -
4094 This can occur in a scanner which is reentered after a long-jump
4104 .I too many start conditions in <> construct! -
4109 .B \-ll
4119 .B -+.
4132 backing-up information for
4133 .B \-b
4148 For some trailing context rules, parts which are actually fixed-length are
4151 considered variable-length.
4173 .B \-l
4176 Pattern-matching of NUL's is substantially slower than matching other
4179 Dynamic resizing of the input buffer is slow, as it entails rescanning
4182 Due to both buffering of input and read-ahead, you cannot intermix
4189 .B input()
4193 .B \-v
4203 .B \-f
4205 .B \-F
4220 .I LEX \- Lexical Analyzer Generator
4224 Addison-Wesley (1986).
4225 Describes the pattern-matching techniques used by
4239 beta-testers, feedbackers, and contributors, especially Francois Pinard,
4242 Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
4273 Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strömquist,
4279 mail-archiving skills but whose contributions are appreciated all the
4287 Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to