.\"
.\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for
.\" permission to reproduce portions of its copyrighted documentation.
.\" Original documentation from The Open Group can be obtained online at
.\" http://www.opengroup.org/bookstore/.
.\"
.\" The Institute of Electrical and Electronics Engineers and The Open
.\" Group, have given us permission to reprint portions of their
.\" documentation.
.\"
.\" In the following statement, the phrase ``this text'' refers to portions
.\" of the system documentation.
.\"
.\" Portions of this text are reprinted and reproduced in electronic form
.\" in the SunOS Reference Manual, from IEEE Std 1003.1, 2004 Edition,
.\" Standard for Information Technology -- Portable Operating System
.\" Interface (POSIX), The Open Group Base Specifications Issue 6,
.\" Copyright (C) 2001-2004 by the Institute of Electrical and Electronics
.\" Engineers, Inc and The Open Group.  In the event of any discrepancy
.\" between these versions and the original IEEE and The Open Group
.\" Standard, the original IEEE and The Open Group Standard is the referee
.\" document.  The original Standard can be obtained online at
.\" http://www.opengroup.org/unix/online.html.
.\"
.\" This notice shall appear on any product containing this material.
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\"
.\" Copyright (c) 1992, X/Open Company Limited  All Rights Reserved
.\" Portions Copyright (c) 1999, Sun Microsystems, Inc.  All Rights Reserved
.\" Copyright 2017 Nexenta Systems, Inc.
.\"
.Dd August 14, 2020
.Dt REGEX 7
.Os
.Sh NAME
.Nm regex
.Nd internationalized basic and extended regular expression matching
.Sh DESCRIPTION
Regular Expressions
.Pq REs
provide a mechanism to select specific strings from a set of character strings.
The Internationalized Regular Expressions described below differ from the Simple
Regular Expressions described on the
.Xr regexp 7
manual page in the following ways:
.Bl -bullet
.It
both Basic and Extended Regular Expressions are supported
.It
the Internationalization features -- character class, equivalence class, and
multi-character collation -- are supported.
.El
.Pp
The Basic Regular Expression
.Pq BRE
notation and construction rules described in the
.Sx BASIC REGULAR EXPRESSIONS
section apply to most utilities supporting regular expressions.
Some utilities, instead, support the Extended Regular Expressions
.Pq ERE
described in the
.Sx EXTENDED REGULAR EXPRESSIONS
section; any exceptions for both cases are noted in the descriptions of the
specific utilities using regular expressions.
Both BREs and EREs are supported by the Regular Expression Matching interfaces
.Xr regcomp 3C
and
.Xr regexec 3C .
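.Pp
As a rough illustration of the two notations, the C fragment below is a
minimal sketch, not part of the original specification, that compiles one
pattern string either as a BRE or as an ERE.
It assumes the standard
.In regex.h
interfaces, where omitting REG_EXTENDED from the compile flags selects BRE
syntax.
The helper name re_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if `pattern` matches anywhere in `text`, 0 otherwise.
 * Pass cflags == 0 for a BRE or REG_EXTENDED for an ERE.
 * re_matches() is a hypothetical helper, not a libc function.
 */
static int
re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no offsets. */
    rc = regcomp(&re, pattern, cflags | REG_NOSUB);
    assert(rc == 0);    /* sketch only: a bad pattern is treated as a bug */

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0);
}
```

With this helper, the pattern a+ finds a match in the text aaa only when
compiled with REG_EXTENDED.
As a BRE the plus sign is an ordinary character, so the same pattern matches
only text that contains the literal string a+.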
.Sh BASIC REGULAR EXPRESSIONS
.Ss BREs Matching a Single Character
A BRE ordinary character, a special character preceded by a backslash, or a
period matches a single character.
A bracket expression matches a single character or a single collating element.
See
.Sx RE Bracket Expression ,
below.
.Ss BRE Ordinary Characters
An ordinary character is a BRE that matches itself: any character in the
supported character set, except for the BRE special characters listed in
.Sx BRE Special Characters ,
below.
.Pp
The interpretation of an ordinary character preceded by a backslash
.Pq Qq \e
is undefined, except for:
.Bl -enum
.It
the characters
.Qq \&) ,
.Qq \&( ,
.Qq { ,
and
.Qq }
.It
the digits 1 to 9 inclusive
.Po see
.Sx BREs Matching Multiple Characters ,
below
.Pc
.It
a character inside a bracket expression.
.El
.Ss BRE Special Characters
A BRE special character has special properties in certain contexts.
Outside those contexts, or when preceded by a backslash, such a character will
be a BRE that matches the special character itself.
The BRE special characters and the contexts in which they have their special
meaning are:
.Bl -tag -width Ds
.It Sy \&. \&[ \&\e
The period, left-bracket, and backslash are special except when used in a
bracket expression
.Po see
.Sx RE Bracket Expression ,
below
.Pc .
An expression containing a
.Qq \&[
that is not preceded by a backslash and is not part of a bracket expression
produces undefined results.
.It Sy *
The asterisk is special except when used:
.Bl -bullet
.It
in a bracket expression
.It
as the first character of an entire BRE
.Po after an initial
.Qq ^ ,
if any
.Pc
.It
as the first character of a subexpression
.Po after an initial
.Qq ^ ,
if any; see
.Sx BREs Matching Multiple Characters ,
below
.Pc .
.El
.It Sy ^
The circumflex is special when used:
.Bl -bullet
.It
as an anchor
.Po see
.Sx BRE Expression Anchoring ,
below
.Pc .
.It
as the first character of a bracket expression
.Po see
.Sx RE Bracket Expression ,
below
.Pc .
.El
.It Sy $
The dollar sign is special when used as an anchor.
.El
.Ss Periods in BREs
A period
.Pq Qq \&. ,
when used outside a bracket expression, is a BRE that matches any character in
the supported character set except NUL.
.Ss RE Bracket Expression
A bracket expression
.Po an expression enclosed in square brackets,
.Qq []
.Pc
is an RE that matches a single collating element contained in the non-empty set
of collating elements represented by the bracket expression.
.Pp
The following rules and definitions apply to bracket expressions:
.Bl -enum
.It
A
.Em bracket expression
is either a matching list expression or a non-matching list expression.
It consists of one or more expressions: collating elements, collating symbols,
equivalence classes, character classes, or range expressions
.Pq see rule 7 below .
Portable applications must not use range expressions, even though all
implementations support them.
The right-bracket
.Pq Qq \&]
loses its special meaning and represents itself in a bracket expression if it
occurs first in the list
.Po after an initial circumflex
.Pq Qq ^ ,
if any
.Pc .
Otherwise, it terminates the bracket expression, unless it appears in a
collating symbol
.Po such as
.Qq [.].]
.Pc
or is the ending right-bracket for a collating symbol, equivalence class, or
character class.
.Pp
The special characters
.Qq \&. ,
.Qq * ,
.Qq \&[ ,
.Qq \&\e
.Pq period, asterisk, left-bracket and backslash, respectively
lose their special meaning within a bracket expression.
.Pp
The character sequences
.Qq [. ,
.Qq [= ,
.Qq [:
.Pq left-bracket followed by a period, equals-sign, or colon
are special inside a bracket expression and are used to delimit collating
symbols, equivalence class expressions, and character class expressions.
These symbols must be followed by a valid expression and the matching
terminating sequence
.Qq .] ,
.Qq =]
or
.Qq :] ,
as described in the following items.
.It
A
.Em matching list expression
specifies a list that matches any one of the expressions represented in the
list.
The first character in the list must not be the circumflex.
For example,
.Qq [abc]
is an RE that matches any of the characters
.Qq a ,
.Qq b
or
.Qq c .
.It
A
.Em non-matching list expression
begins with a circumflex
.Pq Qq ^ ,
and specifies a list that matches any character or collating element except for
the expressions represented in the list after the leading circumflex.
For example,
.Qq [^abc]
is an RE that matches any character or collating element except the characters
.Qq a ,
.Qq b ,
or
.Qq c .
The circumflex will have this special meaning only when it occurs first in the
list, immediately following the left-bracket.
.It
A
.Em collating symbol
is a collating element enclosed within bracket-period
.Pq Qq [..]
delimiters.
Multi-character collating elements must be represented as collating symbols when
it is necessary to distinguish them from a list of the individual characters
that make up the multi-character collating element.
For example, if the string
.Qq ch
is a collating element in the current collation sequence with the associated
collating symbol
.Qq Aq ch ,
the expression
.Qq [[.ch.]]
will be treated as an RE matching the character sequence
.Qq ch ,
while
.Qq [ch]
will be treated as an RE matching
.Qq c
or
.Qq h .
Collating symbols will be recognized only inside bracket expressions.
This implies that the RE
.Qq [[.ch.]]*c
matches the first to fifth character in the string
.Qq chchch .
If the string is not a collating element in the current collating sequence
definition, or if the collating element has no characters associated with it,
the symbol will be treated as an invalid expression.
.It
An
.Em equivalence class expression
represents the set of collating elements belonging to an equivalence class.
Only primary equivalence classes will be recognized.
The class is expressed by enclosing any one of the collating elements in the
equivalence class within bracket-equal
.Pq Qq [==]
delimiters.
For example, if
.Qq a
and
.Qq b
belong to the same equivalence class, then
.Qq [[=a=]b] ,
.Qq [[=b=]a]
and
.Qq [[=b=]b]
will each be equivalent to
.Qq [ab] .
If the collating element does not belong to an equivalence class, the
equivalence class expression will be treated as a
.Em collating symbol .
.It
A
.Em character class expression
represents the set of characters belonging to a character class, as defined in
the
.Ev LC_CTYPE
category in the current locale.
All character classes specified in the current locale will be recognized.
A character class expression is expressed as a character class name enclosed
within bracket-colon
.Pq Qq [::]
delimiters.
.Pp
The following character class expressions are supported in all locales:
.Bl -column "[:alnum:]" "[:cntrl:]" "[:lower:]" "[:xdigit:]"
.It [:alnum:] Ta [:cntrl:] Ta [:lower:] Ta [:space:]
.It [:alpha:] Ta [:digit:] Ta [:print:] Ta [:upper:]
.It [:blank:] Ta [:graph:] Ta [:punct:] Ta [:xdigit:]
.El
.Pp
In addition, character class expressions of the form
.Qq [:name:]
are recognized in those locales where the
.Em name
keyword has been given a
.Em charclass
definition in the
.Ev LC_CTYPE
category.
.It
A
.Em range expression
represents the set of collating elements that fall between two elements in the
current collation sequence, inclusively.
It is expressed as the starting point and the ending point separated by a hyphen
.Pq Qq - .
.Pp
Range expressions must not be used in portable applications because their
behavior is dependent on the collating sequence.
Ranges will be treated according to the current collating sequence, and include
such characters that fall within the range based on that collating sequence,
regardless of character values.
This, however, means that the interpretation will differ depending on collating
sequence.
If, for instance, one collating sequence defines
.Qq \(:a
as a variant of
.Qq a ,
while another defines it as a letter following
.Qq z ,
then the expression
.Qq [\(:a-z]
is valid in the first language and invalid in the second.
.Pp
In the following, all examples assume the collation sequence specified for the
POSIX locale, unless another collation sequence is specifically defined.
.Pp
The starting range point and the ending range point must be a collating element
or collating symbol.
An equivalence class expression used as a starting or ending point of a range
expression produces unspecified results.
An equivalence class can be used portably within a bracket expression, but only
outside the range.
For example, the unspecified expression
.Qq [[=e=]-f]
should be given as
.Qq [[=e=]e-f] .
The ending range point must collate equal to or higher than the starting range
point; otherwise, the expression will be treated as invalid.
The order used is the order in which the collating elements are specified in the
current collation definition.
One-to-many mappings
.Po see
.Xr locale 7
.Pc
will not be performed.
For example, assuming that the character
.Qq eszet
.Pq \(ss
is placed in the collation sequence after
.Qq r
and
.Qq s ,
but before
.Qq t ,
and that it maps to the sequence
.Qq ss
for collation purposes, then the expression
.Qq [r-s]
matches only
.Qq r
and
.Qq s ,
but the expression
.Qq [s-t]
matches
.Qq s ,
.Qq \(ss ,
or
.Qq t .
.Pp
The interpretation of range expressions where the ending range point is also
the starting range point of a subsequent range expression
.Po for instance
.Qq [a-m-o]
.Pc
is undefined.
.Pp
The hyphen character will be treated as itself if it occurs first
.Po after an initial
.Qq ^ ,
if any
.Pc
or last in the list, or as an ending range point in a range expression.
As examples, the expressions
.Qq [-ac]
and
.Qq [ac-]
are equivalent and match any of the characters
.Qq a ,
.Qq c ,
or
.Qq - ;
.Qq [^-ac]
and
.Qq [^ac-]
are equivalent and match any characters except
.Qq a ,
.Qq c ,
or
.Qq - ;
the expression
.Qq [%--]
matches any of the characters between
.Qq %
and
.Qq -
inclusive; the expression
.Qq [--@]
matches any of the characters between
.Qq -
and
.Qq @
inclusive; and the expression
.Qq [a--@]
is invalid, because the letter
.Qq a
follows the symbol
.Qq -
in the POSIX locale.
To use a hyphen as the starting range point, it must either come first in the
bracket expression or be specified as a collating symbol, for example:
.Qq [][.-.]-0] ,
which matches either a right bracket or any character or collating element that
collates between hyphen and 0, inclusive.
.Pp
If a bracket expression must specify both
.Qq -
and
.Qq \&] ,
the
.Qq \&]
must be placed first
.Po after the
.Qq ^ ,
if any
.Pc
and the
.Qq -
last within the bracket expression.
.El
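.Pp
The placement rules above for the circumflex, hyphen, and right-bracket can be
checked with a small C sketch such as the following.
It is an illustration only, assuming the standard
.In regex.h
interfaces and the default C locale that a program starts in; the helper name
bre_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the BRE `pattern` matches anywhere in `text`, else 0.
 * bre_matches() is a hypothetical helper, not a libc function.
 * No setlocale() call is made, so the C (POSIX) locale applies.
 */
static int
bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* cflags without REG_EXTENDED selects BRE syntax. */
    rc = regcomp(&re, pattern, REG_NOSUB);
    assert(rc == 0);    /* sketch only: a bad pattern is treated as a bug */

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0);
}
```

In this sketch the anchored patterns ^[-ac]$ and ^[ac-]$ both accept a
single hyphen, ^[^-ac]$ rejects it, and ^[]a-]$ accepts a right-bracket, the
letter a, or a hyphen, because the right-bracket comes first and the hyphen
comes last in the list.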
.Pp
Note: Latin-1 characters such as
.Qq \(ga
or
.Qq ^
are not printable in some locales, for example, the
.Em ja
locale.
.Ss BREs Matching Multiple Characters
The following rules can be used to construct BREs matching multiple characters
from BREs matching a single character:
.Bl -enum
.It
The concatenation of BREs matches the concatenation of the strings matched
by each component of the BRE.
.It
A
.Em subexpression
can be defined within a BRE by enclosing it between the character pairs
.Qq \e(
and
.Qq \e) .
Such a subexpression matches whatever it would have matched without the
.Qq \e(
and
.Qq \e) ,
except that anchoring within subexpressions is optional behavior; see
.Sx BRE Expression Anchoring ,
below.
Subexpressions can be arbitrarily nested.
.It
The
.Em back-reference
expression
.Qq \e Ns Em n
matches the same
.Pq possibly empty
string of characters as was matched by a subexpression enclosed between
.Qq \e(
and
.Qq \e)
preceding the
.Qq \e Ns Em n .
The character
.Qq Em n
must be a digit from 1 to 9 inclusive, specifying the
.Em n Ns th
subexpression
.Po the one that begins with the
.Em n Ns th
.Qq \e(
and ends with the corresponding paired
.Qq \e)
.Pc .
The expression is invalid if fewer than
.Em n
subexpressions precede the
.Qq \e Ns Em n .
For example, the expression
.Qq ^\e(.*\e)\e1$
matches a line consisting of two adjacent appearances of the same string, and
the expression
.Qq \e(a\e)*\e1
fails to match
.Qq a .
The limit of nine back-references to subexpressions in the RE is based on the
use of a single digit identifier.
This does not imply that only nine subexpressions are allowed in REs.
.It
When a BRE matching a single character, a subexpression or a back-reference is
followed by the special character asterisk
.Pq Qq * ,
together with that asterisk it matches what zero or more consecutive occurrences
of the BRE would match.
For example,
.Qq [ab]*
and
.Qq [ab][ab]
are equivalent when matching the string
.Qq ab .
.It
When a BRE matching a single character, a subexpression, or a back-reference
is followed by an
.Em interval expression
of the format
.Qq \e{ Ns Em m Ns \e} ,
.Qq \e{ Ns Em m Ns ,\e}
or
.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e} ,
together with that interval expression it matches what repeated consecutive
occurrences of the BRE would match.
The values of
.Em m
and
.Em n
will be decimal integers in the range 0 <=
.Em m
<=
.Em n
<=
.Dv RE_DUP_MAX ,
where
.Em m
specifies the exact or minimum number of occurrences and
.Em n
specifies the maximum number of occurrences.
The expression
.Qq \e{ Ns Em m Ns \e}
matches exactly
.Em m
occurrences of the preceding BRE,
.Qq \e{ Ns Em m Ns ,\e}
matches at least
.Em m
occurrences and
.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e}
matches any number of occurrences between
.Em m
and
.Em n ,
inclusive.
.Pp
For example, in the string
.Qq abababccccccd ,
the BRE
.Qq c\e{3\e}
is matched by characters seven to nine, the BRE
.Qq \e(ab\e)\e{4,\e}
is not matched at all and the BRE
.Qq c\e{1,3\e}d
is matched by characters ten to thirteen.
.El
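.Pp
The interval and back-reference examples in the list above can be reproduced
with a short C sketch like the one below.
It is an illustration under stated assumptions rather than a reference
implementation: it uses the standard
.In regex.h
interfaces, and the names struct span and bre_span are hypothetical.
Offsets are zero-based and the end offset is exclusive, so
.Qq characters seven to nine
corresponds to start 6 and end 9.

```c
#include <regex.h>
#include <stddef.h>

/* Result of one search; start and end are byte offsets into the text. */
struct span {
    int found;  /* 1 if the pattern matched, 0 otherwise */
    int start;  /* offset of the first matched character */
    int end;    /* offset one past the last matched character */
};

/*
 * Search `text` for the BRE `pattern` and report where it matched.
 * bre_span() is a hypothetical helper, not a libc function.
 * A pattern that fails to compile is reported as "not found".
 */
static struct span
bre_span(const char *pattern, const char *text)
{
    struct span r = { 0, -1, -1 };
    regex_t re;
    regmatch_t m[1];

    /* cflags == 0 selects BRE syntax. */
    if (regcomp(&re, pattern, 0) != 0)
        return (r);

    if (regexec(&re, text, 1, m, 0) == 0) {
        r.found = 1;
        r.start = (int)m[0].rm_so;
        r.end = (int)m[0].rm_eo;
    }
    regfree(&re);
    return (r);
}
```

In C source the backslashes of a BRE must themselves be escaped, so the
interval example is written as a string literal with doubled backslashes
before each brace.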
.Pp
The behavior of multiple adjacent duplication symbols
.Po Qq *
and intervals
.Pc
produces undefined results.
.Ss BRE Precedence
The order of precedence is as shown in the following table:
.Bl -column "BRE Precedence (from high to low)" ""
.It Sy BRE Precedence (from high to low) Ta
.It collation-related bracket symbols Ta [= =]  [: :]  [. .]
.It escaped characters Ta \e< Ns Em special character Ns >
.It bracket expression Ta [ ]
.It subexpressions/back-references Ta \e( \e) \e Ns Em n
.It single-character-BRE duplication Ta * \e{ Ns Em m Ns \&, Ns Em n Ns \e}
.It concatenation Ta
.It anchoring Ta ^ $
.El
.Ss BRE Expression Anchoring
A BRE can be limited to matching strings that begin or end a line; this is
called
.Em anchoring .
The circumflex and dollar sign special characters will be considered BRE anchors
in the following contexts:
.Bl -enum
.It
A circumflex
.Pq Qq ^
is an anchor when used as the first character of an entire BRE.
The implementation may treat circumflex as an anchor when used as the first
character of a subexpression.
The circumflex will anchor the expression to the beginning of a string;
only sequences starting at the first character of a string will be matched by
the BRE.
For example, the BRE
.Qq ^ab
matches
.Qq ab
in the string
.Qq abcdef ,
but fails to match in the string
.Qq cdefab .
A portable BRE must escape a leading circumflex in a subexpression to match a
literal circumflex.
.It
A dollar sign
.Pq Qq $
is an anchor when used as the last character of an entire BRE.
The implementation may treat a dollar sign as an anchor when used as the last
character of a subexpression.
The dollar sign will anchor the expression to the end of the string being
matched; the dollar sign can be said to match the end-of-string following the
last character.
.It
A BRE anchored by both
.Qq ^
and
.Qq $
matches only an entire string.
For example, the BRE
.Qq ^abcdef$
matches strings consisting only of
.Qq abcdef .
.It
.Qq ^
and
.Qq $
are not special in subexpressions.
.El
.Pp
Note: The Solaris implementation does not support anchoring in BRE
subexpressions.
.Sh EXTENDED REGULAR EXPRESSIONS
The rules specified for BREs apply to Extended Regular Expressions
.Pq EREs
with the following exceptions:
.Bl -bullet
.It
The characters
.Qq | ,
.Qq + ,
and
.Qq \&?
have special meaning, as defined below.
.It
The
.Qq {
and
.Qq }
characters, when used as the duplication operator, are not preceded by
backslashes.
The constructs
.Qq \e{
and
.Qq \e}
simply match the characters
.Qq {
and
.Qq } ,
respectively.
.It
The back reference operator is not supported.
.It
Anchoring
.Pq Qq ^$
is supported in subexpressions.
.El
.Ss EREs Matching a Single Character
An ERE ordinary character, a special character preceded by a backslash, or a
period matches a single character.
A bracket expression matches a single character or a single collating element.
An
.Em ERE matching a single character
enclosed in parentheses matches the same as the ERE without parentheses would
have matched.
.Ss ERE Ordinary Characters
An
.Em ordinary character
is an ERE that matches itself.
An ordinary character is any character in the supported character set, except
for the ERE special characters listed in
.Sx ERE Special Characters ,
below.
The interpretation of an ordinary character preceded by a backslash
.Pq Qq \&\e
is undefined.
.Ss ERE Special Characters
An
.Em ERE special character
has special properties in certain contexts.
Outside those contexts, or when preceded by a backslash, such a character is an
ERE that matches the special character itself.
The extended regular expression special characters and the contexts in which
they have their special meaning are:
.Bl -tag -width Ds
.It Sy \&. \&[ \&\e \&(
The period, left-bracket, backslash, and left-parenthesis are special except
when used in a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
Outside a bracket expression, a left-parenthesis immediately followed by a
right-parenthesis produces undefined results.
.It Sy \&)
The right-parenthesis is special when matched with a preceding
left-parenthesis, both outside a bracket expression.
.It Sy * + \&? {
The asterisk, plus-sign, question-mark, and left-brace are special except when
used in a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
Any of the following uses produce undefined results:
.Bl -bullet
.It
if these characters appear first in an ERE, or immediately following a
vertical-line, circumflex or left-parenthesis
.It
if a left-brace is not part of a valid interval expression.
.El
.It Sy \&|
The vertical-line is special except when used in a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
A vertical-line appearing first or last in an ERE, or immediately following a
vertical-line or a left-parenthesis, or immediately preceding a
right-parenthesis, produces undefined results.
.It Sy ^
The circumflex is special when used:
.Bl -bullet
.It
as an anchor
.Po see
.Sx ERE Expression Anchoring ,
below
.Pc .
.It
as the first character of a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
.El
.It Sy $
The dollar sign is special when used as an anchor.
.El
.Ss Periods in EREs
A period
.Pq Qq \&. ,
when used outside a bracket expression, is an ERE that matches any character in
the supported character set except NUL.
.Ss ERE Bracket Expression
The rules for ERE Bracket Expressions are the same as for Basic Regular
Expressions; see
.Sx RE Bracket Expression ,
above.
.Ss EREs Matching Multiple Characters
The following rules will be used to construct EREs matching multiple characters
from EREs matching a single character:
.Bl -enum
.It
A
.Em concatenation of EREs
matches the concatenation of the character sequences matched by each component
of the ERE.
A concatenation of EREs enclosed in parentheses matches whatever the
concatenation without the parentheses matches.
For example, both the ERE
.Qq cd
and the ERE
.Qq (cd)
are matched by the third and fourth character of the string
.Qq abcdefabcdef .
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by the special character plus-sign
.Pq Qq + ,
together with that plus-sign it matches what one or more consecutive occurrences
of the ERE would match.
For example, the ERE
.Qq b+(bc)
matches the fourth to seventh characters in the string
.Qq acabbbcde ;
.Qq [ab]+
and
.Qq [ab][ab]*
are equivalent.
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by the special character asterisk
.Pq Qq * ,
together with that asterisk it matches what zero or more consecutive occurrences
of the ERE would match.
For example, the ERE
.Qq b*c
matches the first character in the string
.Qq cabbbcde ,
and the ERE
.Qq b*cd
matches the third to seventh characters in the string
.Qq cabbbcdebbbbbbcdbc .
And,
.Qq [ab]*
and
.Qq [ab][ab]
are equivalent when matching the string
.Qq ab .
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by the special character question-mark
.Pq Qq \&? ,
together with that question-mark it matches what zero or one consecutive
occurrences of the ERE would match.
For example, the ERE
.Qq b?c
matches the second character in the string
.Qq acabbbcde .
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by an
.Em interval expression
of the format
.Qq { Ns Em m Ns } ,
.Qq { Ns Em m Ns ,}
or
.Qq { Ns Em m Ns \&, Ns Em n Ns } ,
together with that interval expression it matches what repeated consecutive
occurrences of the ERE would match.
The values of
.Em m
and
.Em n
will be decimal integers in the range 0 <=
.Em m
<=
.Em n
<=
.Dv RE_DUP_MAX ,
where
.Em m
specifies the exact or minimum number of occurrences and
.Em n
specifies the maximum number of occurrences.
The expression
.Qq { Ns Em m Ns }
matches exactly
.Em m
occurrences of the preceding ERE,
.Qq { Ns Em m Ns ,}
matches at least
.Em m
occurrences and
.Qq { Ns Em m Ns \&, Ns Em n Ns }
matches any number of occurrences between
.Em m
and
.Em n ,
inclusive.
.El
.Pp
For example, in the string
.Qq abababccccccd ,
the ERE
.Qq c{3}
is matched by characters seven to nine and the ERE
.Qq (ab){2,}
is matched by characters one to six.
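.Pp
The character positions quoted in the ERE duplication examples above can be
confirmed with a C sketch along these lines.
It is a minimal illustration that assumes the standard
.In regex.h
interfaces with REG_EXTENDED; the names struct span and ere_span are
hypothetical.
Offsets are zero-based with an exclusive end, so
.Qq characters one to six
corresponds to start 0 and end 6.

```c
#include <regex.h>
#include <stddef.h>

/* Result of one search; start and end are byte offsets into the text. */
struct span {
    int found;  /* 1 if the pattern matched, 0 otherwise */
    int start;  /* offset of the first matched character */
    int end;    /* offset one past the last matched character */
};

/*
 * Search `text` for the ERE `pattern` and report where it matched.
 * ere_span() is a hypothetical helper, not a libc function.
 * A pattern that fails to compile is reported as "not found".
 */
static struct span
ere_span(const char *pattern, const char *text)
{
    struct span r = { 0, -1, -1 };
    regex_t re;
    regmatch_t m[1];

    /* REG_EXTENDED selects ERE syntax: unescaped + ? { } ( ) |. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return (r);

    if (regexec(&re, text, 1, m, 0) == 0) {
        r.found = 1;
        r.start = (int)m[0].rm_so;
        r.end = (int)m[0].rm_eo;
    }
    regfree(&re);
    return (r);
}
```

The reported span is that of the whole match, which begins at the earliest
possible position in the string and is the longest match starting there.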
.Pp
The behavior of multiple adjacent duplication symbols
.Po
.Qq + ,
.Qq * ,
.Qq \&?
and intervals
.Pc
produces undefined results.
.Ss ERE Alternation
Two EREs separated by the special character vertical-line
.Pq Qq |
match a string that is matched by either.
For example, the ERE
.Qq a((bc)|d)
matches the string
.Qq abc
and the string
.Qq ad .
Single characters, or expressions matching single characters, separated by the
vertical bar and enclosed in parentheses, will be treated as an ERE matching a
single character.
.Ss ERE Precedence
The order of precedence will be as shown in the following table:
.Bl -column "ERE Precedence (from high to low)" ""
.It Sy ERE Precedence (from high to low) Ta
.It collation-related bracket symbols Ta [= =]  [: :]  [. .]
.It escaped characters Ta \e< Ns Em special character Ns >
.It bracket expression Ta \&[ \&]
.It grouping Ta \&( \&)
.It single-character-ERE duplication Ta * + \&? { Ns Em m Ns \&, Ns Em n Ns }
.It concatenation Ta
.It anchoring Ta ^  $
.It alternation Ta |
.El
.Pp
For example, the ERE
.Qq abba|cde
matches either the string
.Qq abba
or the string
.Qq cde
.Po rather than the string
.Qq abbade
or
.Qq abbcde ,
because concatenation has a higher order of precedence than alternation
.Pc .
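.Pp
The effect of alternation having the lowest precedence can be observed by
looking at which part of a string is actually matched.
The C fragment below is a sketch under the assumption of the standard
.In regex.h
interfaces with REG_EXTENDED; the names struct span and ere_span are
hypothetical, and offsets are zero-based with an exclusive end.

```c
#include <regex.h>
#include <stddef.h>

/* Result of one search; start and end are byte offsets into the text. */
struct span {
    int found;  /* 1 if the pattern matched, 0 otherwise */
    int start;  /* offset of the first matched character */
    int end;    /* offset one past the last matched character */
};

/*
 * Search `text` for the ERE `pattern` and report where it matched.
 * ere_span() is a hypothetical helper, not a libc function.
 */
static struct span
ere_span(const char *pattern, const char *text)
{
    struct span r = { 0, -1, -1 };
    regex_t re;
    regmatch_t m[1];

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return (r);

    if (regexec(&re, text, 1, m, 0) == 0) {
        r.found = 1;
        r.start = (int)m[0].rm_so;
        r.end = (int)m[0].rm_eo;
    }
    regfree(&re);
    return (r);
}
```

Applied to the text abbcde, the pattern abba|cde matches only the trailing
cde (offsets 3 to 6), and applied to abbade it matches only the leading abba
(offsets 0 to 4); in neither case is the whole six-character string matched.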
.Ss ERE Expression Anchoring
An ERE can be limited to matching strings that begin or end a line; this is
called
.Em anchoring .
The circumflex and dollar sign special characters are considered ERE anchors
when used anywhere outside a bracket expression.
This has the following effects:
.Bl -enum
.It
A circumflex
.Pq Qq ^
outside a bracket expression anchors the expression or subexpression it begins
to the beginning of a string; such an expression or subexpression can match only
a sequence starting at the first character of a string.
For example, the EREs
.Qq ^ab
and
.Qq (^ab)
match
.Qq ab
in the string
.Qq abcdef ,
but fail to match in the string
.Qq cdefab ,
and the ERE
.Qq a^b
is valid, but can never match because the
.Qq a
prevents the expression
.Qq ^b
from matching starting at the first character.
.It
A dollar sign
.Pq Qq $
outside a bracket expression anchors the expression or subexpression it ends to
the end of a string; such an expression or subexpression can match only a
sequence ending at the last character of a string.
For example, the EREs
.Qq ef$
and
.Qq (ef$)
match
.Qq ef
in the string
.Qq abcdef ,
but fail to match in the string
.Qq cdefab ,
and the ERE
.Qq e$f
is valid, but can never match because the
.Qq f
prevents the expression
.Qq e$
from matching ending at the last character.
.El
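.Pp
The anchoring examples above can be exercised with a brief C sketch.
It is an illustration only, assuming the standard
.In regex.h
interfaces with REG_EXTENDED; the helper name ere_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the ERE `pattern` matches anywhere in `text`, else 0.
 * ere_matches() is a hypothetical helper, not a libc function.
 */
static int
ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no offsets. */
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);    /* sketch only: a bad pattern is treated as a bug */

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0);
}
```

Here ^ab and (^ab) both accept abcdef and reject cdefab, while ef$ and (ef$)
accept abcdef and reject cdefab.
The patterns a^b and e$f compile without error but do not match the texts ab
and ef.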
.Sh SEE ALSO
.Xr localedef 1 ,
.Xr regcomp 3C ,
.Xr attributes 7 ,
.Xr environ 7 ,
.Xr locale 7 ,
.Xr regexp 7