.\"
.\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for
.\" permission to reproduce portions of its copyrighted documentation.
.\" Original documentation from The Open Group can be obtained online at
.\" http://www.opengroup.org/bookstore/.
.\"
.\" The Institute of Electrical and Electronics Engineers and The Open
.\" Group, have given us permission to reprint portions of their
.\" documentation.
.\"
.\" In the following statement, the phrase ``this text'' refers to portions
.\" of the system documentation.
.\"
.\" Portions of this text are reprinted and reproduced in electronic form
.\" in the SunOS Reference Manual, from IEEE Std 1003.1, 2004 Edition,
.\" Standard for Information Technology -- Portable Operating System
.\" Interface (POSIX), The Open Group Base Specifications Issue 6,
.\" Copyright (C) 2001-2004 by the Institute of Electrical and Electronics
.\" Engineers, Inc and The Open Group. In the event of any discrepancy
.\" between these versions and the original IEEE and The Open Group
.\" Standard, the original IEEE and The Open Group Standard is the referee
.\" document. The original Standard can be obtained online at
.\" http://www.opengroup.org/unix/online.html.
.\"
.\" This notice shall appear on any product containing this material.
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\"
.\" Copyright (c) 1992, X/Open Company Limited All Rights Reserved
.\" Portions Copyright (c) 1999, Sun Microsystems, Inc. All Rights Reserved
.\" Copyright 2017 Nexenta Systems, Inc.
.\"
.Dd August 14, 2020
.Dt REGEX 7
.Os
.Sh NAME
.Nm regex
.Nd internationalized basic and extended regular expression matching
.Sh DESCRIPTION
Regular Expressions
.Pq REs
provide a mechanism to select specific strings from a set of character strings.
The Internationalized Regular Expressions described below differ from the Simple
Regular Expressions described on the
.Xr regexp 7
manual page in the following ways:
.Bl -bullet
.It
both Basic and Extended Regular Expressions are supported
.It
the Internationalization features -- character class, equivalence class, and
multi-character collation -- are supported.
.El
.Pp
The Basic Regular Expression
.Pq BRE
notation and construction rules described in the
.Sx BASIC REGULAR EXPRESSIONS
section apply to most utilities supporting regular expressions.
Some utilities, instead, support the Extended Regular Expressions
.Pq ERE
described in the
.Sx EXTENDED REGULAR EXPRESSIONS
section; any exceptions for both cases are noted in the descriptions of the
specific utilities using regular expressions.
Both BREs and EREs are supported by the Regular Expression Matching interfaces
.Xr regcomp 3C
and
.Xr regexec 3C .
.Sh BASIC REGULAR EXPRESSIONS
.Ss BREs Matching a Single Character
A BRE ordinary character, a special character preceded by a backslash, or a
period matches a single character.
A bracket expression matches a single character or a single collating element.
See
.Sx RE Bracket Expression ,
below.
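.Pp
As a minimal sketch of the difference between the two notations, the following
C fragment compiles one pattern either as a BRE or as an ERE through
.Xr regcomp 3C
and tests it with
.Xr regexec 3C .
The helper name
.Fn re_matches
is hypothetical and exists only for this illustration; the
.Dv REG_EXTENDED
and
.Dv REG_NOSUB
flags come from
.In regex.h .
.Bd -literal
```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any library): compile `pattern` with
 * the given regcomp() flags and report whether it matches anywhere in
 * `text`.  Pass 0 for a BRE or REG_EXTENDED for an ERE.
 * Returns 1 on a match, 0 on no match, -1 if compilation fails.
 */
int
re_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return (-1);
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0 ? 1 : 0);
}
```
.Ed
.Pp
With this sketch, the pattern
.Qq a+
compiled as a BRE should match only text containing the literal sequence
.Qq a+ ,
because the plus-sign is an ordinary character in a BRE, whereas the same
pattern compiled with
.Dv REG_EXTENDED
should match
.Qq aaa .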
.Ss BRE Ordinary Characters
An ordinary character is a BRE that matches itself: any character in the
supported character set, except for the BRE special characters listed in
.Sx BRE Special Characters ,
below.
.Pp
The interpretation of an ordinary character preceded by a backslash
.Pq Qq \e
is undefined, except for:
.Bl -enum
.It
the characters
.Qq \&) ,
.Qq \&( ,
.Qq { ,
and
.Qq }
.It
the digits 1 to 9 inclusive
.Po see
.Sx BREs Matching Multiple Characters ,
below
.Pc
.It
a character inside a bracket expression.
.El
.Ss BRE Special Characters
A BRE special character has special properties in certain contexts.
Outside those contexts, or when preceded by a backslash, such a character will
be a BRE that matches the special character itself.
The BRE special characters and the contexts in which they have their special
meaning are:
.Bl -tag -width Ds
.It Sy \&. \&[ \&\e
The period, left-bracket, and backslash are special except when used in a
bracket expression
.Po see
.Sx RE Bracket Expression ,
below
.Pc .
An expression containing a
.Qq \&[
that is not preceded by a backslash and is not part of a bracket expression
produces undefined results.
.It Sy *
The asterisk is special except when used:
.Bl -bullet
.It
in a bracket expression
.It
as the first character of an entire BRE
.Po after an initial
.Qq ^ ,
if any
.Pc
.It
as the first character of a subexpression
.Po after an initial
.Qq ^ ,
if any; see
.Sx BREs Matching Multiple Characters ,
below
.Pc .
.El
.It Sy ^
The circumflex is special when used:
.Bl -bullet
.It
as an anchor
.Po see
.Sx BRE Expression Anchoring ,
below
.Pc .
.It
as the first character of a bracket expression
.Po see
.Sx RE Bracket Expression ,
below
.Pc .
.El
.It Sy $
The dollar sign is special when used as an anchor.
.El
.Ss Periods in BREs
A period
.Pq Qq \&. ,
when used outside a bracket expression, is a BRE that matches any character in
the supported character set except NUL.
.Ss RE Bracket Expression
A bracket expression
.Po an expression enclosed in square brackets,
.Qq []
.Pc
is an RE that matches a single collating element contained in the non-empty set
of collating elements represented by the bracket expression.
.Pp
The following rules and definitions apply to bracket expressions:
.Bl -enum
.It
A
.Em bracket expression
is either a matching list expression or a non-matching list expression.
It consists of one or more expressions: collating elements, collating symbols,
equivalence classes, character classes, or range expressions
.Pq see rule 7 below .
Portable applications must not use range expressions, even though all
implementations support them.
The right-bracket
.Pq Qq \&]
loses its special meaning and represents itself in a bracket expression if it
occurs first in the list
.Po after an initial circumflex
.Pq Qq ^ ,
if any
.Pc .
Otherwise, it terminates the bracket expression, unless it appears in a
collating symbol
.Po such as
.Qq [.].]
.Pc
or is the ending right-bracket for a collating symbol, equivalence class, or
character class.
.Pp
The special characters
.Qq \&. ,
.Qq * ,
.Qq \&[ ,
.Qq \&\e
.Pq period, asterisk, left-bracket and backslash, respectively
lose their special meaning within a bracket expression.
.Pp
The character sequences
.Qq [. ,
.Qq [= ,
.Qq [:
.Pq left-bracket followed by a period, equals-sign, or colon
are special inside a bracket expression and are used to delimit collating
symbols, equivalence class expressions, and character class expressions.
These symbols must be followed by a valid expression and the matching
terminating sequence
.Qq .] ,
.Qq =]
or
.Qq :] ,
as described in the following items.
.It
A
.Em matching list expression
specifies a list that matches any one of the expressions represented in the
list.
The first character in the list must not be the circumflex.
For example,
.Qq [abc]
is an RE that matches any of the characters
.Qq a ,
.Qq b
or
.Qq c .
.It
A
.Em non-matching list expression
begins with a circumflex
.Pq Qq ^ ,
and specifies a list that matches any character or collating element except for
the expressions represented in the list after the leading circumflex.
For example,
.Qq [^abc]
is an RE that matches any character or collating element except the characters
.Qq a ,
.Qq b ,
or
.Qq c .
The circumflex will have this special meaning only when it occurs first in the
list, immediately following the left-bracket.
.It
A
.Em collating symbol
is a collating element enclosed within bracket-period
.Pq Qq [..]
delimiters.
Multi-character collating elements must be represented as collating symbols when
it is necessary to distinguish them from a list of the individual characters
that make up the multi-character collating element.
For example, if the string
.Qq ch
is a collating element in the current collation sequence with the associated
collating symbol
.Qq Aq ch ,
the expression
.Qq [[.ch.]]
will be treated as an RE matching the character sequence
.Qq ch ,
while
.Qq [ch]
will be treated as an RE matching
.Qq c
or
.Qq h .
Collating symbols will be recognized only inside bracket expressions.
This implies that the RE
.Qq [[.ch.]]*c
matches the first to fifth character in the string
.Qq chchch .
If the string is not a collating element in the current collating sequence
definition, or if the collating element has no characters associated with it,
the symbol will be treated as an invalid expression.
.It
An
.Em equivalence class expression
represents the set of collating elements belonging to an equivalence class.
Only primary equivalence classes will be recognized.
The class is expressed by enclosing any one of the collating elements in the
equivalence class within bracket-equal
.Pq Qq [==]
delimiters.
For example, if
.Qq a
and
.Qq b
belong to the same equivalence class, then
.Qq [[=a=]b] ,
.Qq [[=b=]a]
and
.Qq [[=b=]b]
will each be equivalent to
.Qq [ab] .
If the collating element does not belong to an equivalence class, the
equivalence class expression will be treated as a
.Em collating symbol .
.It
A
.Em character class expression
represents the set of characters belonging to a character class, as defined in
the
.Ev LC_CTYPE
category in the current locale.
All character classes specified in the current locale will be recognized.
A character class expression is expressed as a character class name enclosed
within bracket-colon
.Pq Qq [::]
delimiters.
.Pp
The following character class expressions are supported in all locales:
.Bl -column "[:alnum:]" "[:cntrl:]" "[:lower:]" "[:xdigit:]"
.It [:alnum:] Ta [:cntrl:] Ta [:lower:] Ta [:space:]
.It [:alpha:] Ta [:digit:] Ta [:print:] Ta [:upper:]
.It [:blank:] Ta [:graph:] Ta [:punct:] Ta [:xdigit:]
.El
.Pp
In addition, character class expressions of the form
.Qq [:name:]
are recognized in those locales where the
.Em name
keyword has been given a
.Em charclass
definition in the
.Ev LC_CTYPE
category.
.It
A
.Em range expression
represents the set of collating elements that fall between two elements in the
current collation sequence, inclusively.
It is expressed as the starting point and the ending point separated by a hyphen
.Pq Qq - .
.Pp
Range expressions must not be used in portable applications because their
behavior is dependent on the collating sequence.
Ranges will be treated according to the current collating sequence, and include
such characters that fall within the range based on that collating sequence,
regardless of character values.
This, however, means that the interpretation will differ depending on collating
sequence.
If, for instance, one collating sequence defines
.Qq \(:a
as a variant of
.Qq a ,
while another defines it as a letter following
.Qq z ,
then the expression
.Qq [\(:a-z]
is valid in the first language and invalid in the second.
.Pp
In the following, all examples assume the collation sequence specified for the
POSIX locale, unless another collation sequence is specifically defined.
.Pp
The starting range point and the ending range point must be a collating element
or collating symbol.
An equivalence class expression used as a starting or ending point of a range
expression produces unspecified results.
An equivalence class can be used portably within a bracket expression, but only
outside the range.
For example, the unspecified expression
.Qq [[=e=]-f]
should be given as
.Qq [[=e=]e-f] .
The ending range point must collate equal to or higher than the starting range
point; otherwise, the expression will be treated as invalid.
The order used is the order in which the collating elements are specified in the
current collation definition.
One-to-many mappings
.Po see
.Xr locale 7
.Pc
will not be performed.
For example, assuming that the character
.Qq eszet
is placed in the collation sequence after
.Qq r
and
.Qq s ,
but before
.Qq t ,
and that it maps to the sequence
.Qq ss
for collation purposes, then the expression
.Qq [r-s]
matches only
.Qq r
and
.Qq s ,
but the expression
.Qq [s-t]
matches
.Qq s ,
.Qq eszet ,
or
.Qq t .
.Pp
The interpretation of range expressions where the ending range point is also
the starting range point of a subsequent range expression
.Po for instance
.Qq [a-m-o]
.Pc
is undefined.
.Pp
The hyphen character will be treated as itself if it occurs first
.Po after an initial
.Qq ^ ,
if any
.Pc
or last in the list, or as an ending range point in a range expression.
As examples, the expressions
.Qq [-ac]
and
.Qq [ac-]
are equivalent and match any of the characters
.Qq a ,
.Qq c ,
or
.Qq - ;
.Qq [^-ac]
and
.Qq [^ac-]
are equivalent and match any characters except
.Qq a ,
.Qq c ,
or
.Qq - ;
the expression
.Qq [%--]
matches any of the characters between
.Qq %
and
.Qq -
inclusive; the expression
.Qq [--@]
matches any of the characters between
.Qq -
and
.Qq @
inclusive; and the expression
.Qq [a--@]
is invalid, because the letter
.Qq a
follows the symbol
.Qq -
in the POSIX locale.
To use a hyphen as the starting range point, it must either come first in the
bracket expression or be specified as a collating symbol, for example:
.Qq [][.-.]-0] ,
which matches either a right bracket or any character or collating element that
collates between hyphen and 0, inclusive.
.Pp
If a bracket expression must specify both
.Qq -
and
.Qq \&] ,
the
.Qq \&]
must be placed first
.Po after the
.Qq ^ ,
if any
.Pc
and the
.Qq -
last within the bracket expression.
.El
.Pp
Note: Latin-1 characters such as
.Qq \(ga
or
.Qq ^
are not printable in some locales, for example, the
.Em ja
locale.
.Ss BREs Matching Multiple Characters
The following rules can be used to construct BREs matching multiple characters
from BREs matching a single character:
.Bl -enum
.It
The concatenation of BREs matches the concatenation of the strings matched
by each component of the BRE.
.It
A
.Em subexpression
can be defined within a BRE by enclosing it between the character pairs
.Qq \e(
and
.Qq \e) .
Such a subexpression matches whatever it would have matched without the
.Qq \e(
and
.Qq \e) ,
except that anchoring within subexpressions is optional behavior; see
.Sx BRE Expression Anchoring ,
below.
Subexpressions can be arbitrarily nested.
.It
The
.Em back-reference
expression
.Qq \e Ns Em n
matches the same
.Pq possibly empty
string of characters as was matched by a subexpression enclosed between
.Qq \e(
and
.Qq \e)
preceding the
.Qq \e Ns Em n .
The character
.Qq Em n
must be a digit from 1 to 9 inclusive, specifying the
.Em n Ns th
subexpression
.Po the one that begins with the
.Em n Ns th
.Qq \e(
and ends with the corresponding paired
.Qq \e)
.Pc .
The expression is invalid if fewer than
.Em n
subexpressions precede the
.Qq \e Ns Em n .
For example, the expression
.Qq ^\e(.*\e)\e1$
matches a line consisting of two adjacent appearances of the same string, and
the expression
.Qq \e(a\e)*\e1
fails to match
.Qq a .
The limit of nine back-references to subexpressions in the RE is based on the
use of a single digit identifier.
This does not imply that only nine subexpressions are allowed in REs.
.It
When a BRE matching a single character, a subexpression or a back-reference is
followed by the special character asterisk
.Pq Qq * ,
together with that asterisk it matches what zero or more consecutive occurrences
of the BRE would match.
For example,
.Qq [ab]*
and
.Qq [ab][ab]
are equivalent when matching the string
.Qq ab .
.It
When a BRE matching a single character, a subexpression, or a back-reference
is followed by an
.Em interval expression
of the format
.Qq \e{ Ns Em m Ns \e} ,
.Qq \e{ Ns Em m Ns ,\e}
or
.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e} ,
together with that interval expression it matches what repeated consecutive
occurrences of the BRE would match.
The values of
.Em m
and
.Em n
will be decimal integers in the range 0 <=
.Em m
<=
.Em n
<=
.Dv RE_DUP_MAX ,
where
.Em m
specifies the exact or minimum number of occurrences and
.Em n
specifies the maximum number of occurrences.
The expression
.Qq \e{ Ns Em m Ns \e}
matches exactly
.Em m
occurrences of the preceding BRE,
.Qq \e{ Ns Em m Ns ,\e}
matches at least
.Em m
occurrences and
.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e}
matches any number of occurrences between
.Em m
and
.Em n ,
inclusive.
.Pp
For example, in the string
.Qq abababccccccd ,
the BRE
.Qq c\e{3\e}
is matched by characters seven to nine, the BRE
.Qq \e(ab\e)\e{4,\e}
is not matched at all and the BRE
.Qq c\e{1,3\e}d
is matched by characters ten to thirteen.
.El
.Pp
The behavior of multiple adjacent duplication symbols
.Po Qq *
and intervals
.Pc
produces undefined results.
.Ss BRE Precedence
The order of precedence is as shown in the following table:
.Bl -column "BRE Precedence (from high to low)" ""
.It Sy BRE Precedence (from high to low) Ta
.It collation-related bracket symbols Ta [= =] [: :] [. .]
.It escaped characters Ta \e< Ns Em special character Ns >
.It bracket expression Ta [ ]
.It subexpressions/back-references Ta \e( \e) \e Ns Em n
.It single-character-BRE duplication Ta * \e{ Ns Em m Ns \&, Ns Em n Ns \e}
.It concatenation Ta
.It anchoring Ta ^ $
.El
.Ss BRE Expression Anchoring
A BRE can be limited to matching strings that begin or end a line; this is
called
.Em anchoring .
The circumflex and dollar sign special characters will be considered BRE anchors
in the following contexts:
.Bl -enum
.It
A circumflex
.Pq Qq ^
is an anchor when used as the first character of an entire BRE.
The implementation may treat circumflex as an anchor when used as the first
character of a subexpression.
The circumflex will anchor the expression to the beginning of a string;
only sequences starting at the first character of a string will be matched by
the BRE.
For example, the BRE
.Qq ^ab
matches
.Qq ab
in the string
.Qq abcdef ,
but fails to match in the string
.Qq cdefab .
A portable BRE must escape a leading circumflex in a subexpression to match a
literal circumflex.
.It
A dollar sign
.Pq Qq $
is an anchor when used as the last character of an entire BRE.
The implementation may treat a dollar sign as an anchor when used as the last
character of a subexpression.
The dollar sign will anchor the expression to the end of the string being
matched; the dollar sign can be said to match the end-of-string following the
last character.
.It
A BRE anchored by both
.Qq ^
and
.Qq $
matches only an entire string.
For example, the BRE
.Qq ^abcdef$
matches strings consisting only of
.Qq abcdef .
.It
.Qq ^
and
.Qq $
are not special in subexpressions.
.El
.Pp
Note: The Solaris implementation does not support anchoring in BRE
subexpressions.
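.Pp
The anchoring and bracket expression rules above can be checked with a short
C sketch that reports which characters of a string a BRE matches, using the
one-based character positions found in the examples on this page.
The helper name
.Fn bre_span
is hypothetical; only
.Xr regcomp 3C ,
.Xr regexec 3C ,
and the
.Vt regmatch_t
offsets are standard, and the default POSIX locale is assumed.
.Bd -literal
```c
#include <regex.h>

/*
 * Hypothetical helper: match BRE `pattern` against `text` and store the
 * one-based positions of the first and last matched characters.
 * Returns 1 on a match, 0 on no match, -1 if compilation fails.
 * For an empty match, *last is one less than *first.
 */
int
bre_span(const char *pattern, const char *text, int *first, int *last)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    /* cflags of 0 selects Basic Regular Expression syntax. */
    if (regcomp(&re, pattern, 0) != 0)
        return (-1);
    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return (0);
    /* rm_so is a zero-based start offset; rm_eo is one past the end. */
    *first = (int)m[0].rm_so + 1;
    *last = (int)m[0].rm_eo;
    return (1);
}
```
.Ed
.Pp
Under these assumptions,
.Qq ^ab
applied to
.Qq abcdef
should report characters one to two and fail on
.Qq cdefab ,
while
.Qq [-ac]
and
.Qq [ac-]
should both report the hyphen in
.Qq x-y .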
.Sh EXTENDED REGULAR EXPRESSIONS
The rules specified for BREs apply to Extended Regular Expressions
.Pq EREs
with the following exceptions:
.Bl -bullet
.It
The characters
.Qq | ,
.Qq + ,
and
.Qq \&?
have special meaning, as defined below.
.It
The
.Qq {
and
.Qq }
characters, when used as the duplication operator, are not preceded by
backslashes.
The constructs
.Qq \e{
and
.Qq \e}
simply match the characters
.Qq {
and
.Qq } ,
respectively.
.It
The back reference operator is not supported.
.It
Anchoring
.Pq Qq ^$
is supported in subexpressions.
.El
.Ss EREs Matching a Single Character
An ERE ordinary character, a special character preceded by a backslash, or a
period matches a single character.
A bracket expression matches a single character or a single collating element.
An
.Em ERE matching a single character
enclosed in parentheses matches the same as the ERE without parentheses would
have matched.
.Ss ERE Ordinary Characters
An
.Em ordinary character
is an ERE that matches itself.
An ordinary character is any character in the supported character set, except
for the ERE special characters listed in
.Sx ERE Special Characters
below.
The interpretation of an ordinary character preceded by a backslash
.Pq Qq \&\e
is undefined.
.Ss ERE Special Characters
An
.Em ERE special character
has special properties in certain contexts.
Outside those contexts, or when preceded by a backslash, such a character is an
ERE that matches the special character itself.
The extended regular expression special characters and the contexts in which
they have their special meaning are:
.Bl -tag -width Ds
.It Sy \&. \&[ \&\e \&(
The period, left-bracket, backslash, and left-parenthesis are special except
when used in a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
Outside a bracket expression, a left-parenthesis immediately followed by a
right-parenthesis produces undefined results.
.It Sy \&)
The right-parenthesis is special when matched with a preceding
left-parenthesis, both outside a bracket expression.
.It Sy * + \&? {
The asterisk, plus-sign, question-mark, and left-brace are special except when
used in a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
Any of the following uses produce undefined results:
.Bl -bullet
.It
if these characters appear first in an ERE, or immediately following a
vertical-line, circumflex or left-parenthesis
.It
if a left-brace is not part of a valid interval expression.
.El
.It Sy \&|
The vertical-line is special except when used in a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
A vertical-line appearing first or last in an ERE, or immediately following a
vertical-line or a left-parenthesis, or immediately preceding a
right-parenthesis, produces undefined results.
.It Sy ^
The circumflex is special when used:
.Bl -bullet
.It
as an anchor
.Po see
.Sx ERE Expression Anchoring ,
below
.Pc .
.It
as the first character of a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
.El
.It Sy $
The dollar sign is special when used as an anchor.
.El
.Ss Periods in EREs
A period
.Pq Qq \&. ,
when used outside a bracket expression, is an ERE that matches any character in
the supported character set except NUL.
.Ss ERE Bracket Expression
The rules for ERE Bracket Expressions are the same as for Basic Regular
Expressions; see
.Sx RE Bracket Expression ,
above.
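.Pp
As an illustrative sketch of how the ERE special characters described above
lose their special meaning inside a bracket expression, the following C
fragment compiles patterns with
.Dv REG_EXTENDED .
The helper name
.Fn ere_matches
is hypothetical and is not part of any library; the default POSIX locale is
assumed.
.Bd -literal
```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether ERE `pattern` matches anywhere
 * in `text`.
 * Returns 1 on a match, 0 on no match, -1 if compilation fails.
 */
int
ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED selects ERE syntax; REG_NOSUB skips offset capture. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return (-1);
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0 ? 1 : 0);
}
```
.Ed
.Pp
With this sketch,
.Qq a.c
should match
.Qq abc
because the period matches any character, while
.Qq a[.]c
should match only text containing a literal period, such as
.Qq a.c .
Likewise,
.Qq a+b
should match
.Qq aab
but not
.Qq a+b ,
and
.Qq a[+]b
should do the opposite.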
.Ss EREs Matching Multiple Characters
The following rules will be used to construct EREs matching multiple characters
from EREs matching a single character:
.Bl -enum
.It
A
.Em concatenation of EREs
matches the concatenation of the character sequences matched by each component
of the ERE.
A concatenation of EREs enclosed in parentheses matches whatever the
concatenation without the parentheses matches.
For example, both the ERE
.Qq cd
and the ERE
.Qq (cd)
are matched by the third and fourth character of the string
.Qq abcdefabcdef .
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by the special character plus-sign
.Pq Qq + ,
together with that plus-sign it matches what one or more consecutive occurrences
of the ERE would match.
For example, the ERE
.Qq b+(bc)
matches the fourth to seventh characters in the string
.Qq acabbbcde ;
.Qq [ab]+
and
.Qq [ab][ab]*
are equivalent.
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by the special character asterisk
.Pq Qq * ,
together with that asterisk it matches what zero or more consecutive occurrences
of the ERE would match.
For example, the ERE
.Qq b*c
matches the first character in the string
.Qq cabbbcde ,
and the ERE
.Qq b*cd
matches the third to seventh characters in the string
.Qq cabbbcdebbbbbbcdbc .
And,
.Qq [ab]*
and
.Qq [ab][ab]
are equivalent when matching the string
.Qq ab .
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by the special character question-mark
.Pq Qq \&? ,
together with that question-mark it matches what zero or one consecutive
occurrences of the ERE would match.
For example, the ERE
.Qq b?c
matches the second character in the string
.Qq acabbbcde .
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by an
.Em interval expression
of the format
.Qq { Ns Em m Ns } ,
.Qq { Ns Em m Ns ,}
or
.Qq { Ns Em m Ns \&, Ns Em n Ns } ,
together with that interval expression it matches what repeated consecutive
occurrences of the ERE would match.
The values of
.Em m
and
.Em n
will be decimal integers in the range 0 <=
.Em m
<=
.Em n
<=
.Dv RE_DUP_MAX ,
where
.Em m
specifies the exact or minimum number of occurrences and
.Em n
specifies the maximum number of occurrences.
The expression
.Qq { Ns Em m Ns }
matches exactly
.Em m
occurrences of the preceding ERE,
.Qq { Ns Em m Ns ,}
matches at least
.Em m
occurrences and
.Qq { Ns Em m Ns \&, Ns Em n Ns }
matches any number of occurrences between
.Em m
and
.Em n ,
inclusive.
.El
.Pp
For example, in the string
.Qq abababccccccd ,
the ERE
.Qq c{3}
is matched by characters seven to nine and the ERE
.Qq (ab){2,}
is matched by characters one to six.
.Pp
The behavior of multiple adjacent duplication symbols
.Po
.Qq + ,
.Qq * ,
.Qq \&?
and intervals
.Pc
produces undefined results.
.Ss ERE Alternation
Two EREs separated by the special character vertical-line
.Pq Qq |
match a string that is matched by either.
For example, the ERE
.Qq a((bc)|d)
matches the string
.Qq abc
and the string
.Qq ad .
Single characters, or expressions matching single characters, separated by the
vertical bar and enclosed in parentheses, will be treated as an ERE matching a
single character.
.Ss ERE Precedence
The order of precedence will be as shown in the following table:
.Bl -column "ERE Precedence (from high to low)" ""
.It Sy ERE Precedence (from high to low) Ta
.It collation-related bracket symbols Ta [= =] [: :] [. .]
.It escaped characters Ta \e< Ns Em special character Ns >
.It bracket expression Ta \&[ \&]
.It grouping Ta \&( \&)
.It single-character-ERE duplication Ta * + \&? { Ns Em m Ns \&, Ns Em n Ns }
.It concatenation Ta
.It anchoring Ta ^ $
.It alternation Ta |
.El
.Pp
For example, the ERE
.Qq abba|cde
matches either the string
.Qq abba
or the string
.Qq cde
.Po rather than the string
.Qq abbade
or
.Qq abbcde ,
because concatenation has a higher order of precedence than alternation
.Pc .
.Ss ERE Expression Anchoring
An ERE can be limited to matching strings that begin or end a line; this is
called
.Em anchoring .
The circumflex and dollar sign special characters are considered ERE anchors
when used anywhere outside a bracket expression.
This has the following effects:
.Bl -enum
.It
A circumflex
.Pq Qq ^
outside a bracket expression anchors the expression or subexpression it begins
to the beginning of a string; such an expression or subexpression can match only
a sequence starting at the first character of a string.
For example, the EREs
.Qq ^ab
and
.Qq (^ab)
match
.Qq ab
in the string
.Qq abcdef ,
but fail to match in the string
.Qq cdefab ,
and the ERE
.Qq a^b
is valid, but can never match because the
.Qq a
prevents the expression
.Qq ^b
from matching starting at the first character.
.It
A dollar sign
.Pq Qq $
outside a bracket expression anchors the expression or subexpression it ends to
the end of a string; such an expression or subexpression can match only a
sequence ending at the last character of a string.
For example, the EREs
.Qq ef$
and
.Qq (ef$)
match
.Qq ef
in the string
.Qq abcdef ,
but fail to match in the string
.Qq cdefab ,
and the ERE
.Qq e$f
is valid, but can never match because the
.Qq f
prevents the expression
.Qq e$
from matching ending at the last character.
.El
.Sh SEE ALSO
.Xr localedef 1 ,
.Xr regcomp 3C ,
.Xr attributes 7 ,
.Xr environ 7 ,
.Xr locale 7 ,
.Xr regexp 7