#
f77b5b29 |
| 10-May-2024 |
Simon J. Gerraty <sjg@FreeBSD.org> |
Allow -DNO_STRICT_REGEX to restore historic regex behavior
Allow restoring the behavior of '{' as described in regex(3), i.e. only treat it as the start of a bound if it is followed by a digit.
If NO_STRICT_REGEX is not defined, the behavior introduced by commit a4a801688c909ef39cbcbc3488bc4fdbabd69d66 is retained, otherwise the previous behavior is restored.
Differential Revision: https://reviews.freebsd.org/D45134
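The historic rule described above can be sketched as a small predicate. This is only an illustration of the check, not the code in regcomp.c; the name `brace_starts_bound` is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from regcomp.c: under the historic rule,
 * a '{' opens a bound such as {2,5} only when a digit follows it;
 * otherwise it is treated as an ordinary character.
 */
static bool
brace_starts_bound(const char *p)
{
	return (p[0] == '{' && isdigit((unsigned char)p[1]) != 0);
}
```

With this rule, a pattern such as `a{2}` contains a bound, while the brace in `a{x}` is just a literal.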
|
Revision tags: release/13.3.0 |
|
#
619f455b |
| 02-Feb-2024 |
Corinna Vinschen <vinschen@redhat.com> |
regex: fix freeing g->charjump in low memory condition
computejumps() moves g->charjump to a position relative to the value of CHAR_MIN. As such, g->charjump doesn't necessarily point to the address actually allocated. While regfree() takes that into account, the low memory handling in regcomp_internal() doesn't. Fix that by freeing the actually allocated address, as in regfree().
MFC After: 2 weeks
Reviewed by: imp, jrtc27
Pull Request: https://github.com/freebsd/freebsd-src/pull/692
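The biased-pointer pattern behind this fix can be sketched as follows. This is a minimal model under stated assumptions: `struct guts_sketch`, `sketch_alloc` and `sketch_free` are hypothetical names, and the real structure holds far more state.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for the one field of the compiled regex that matters here. */
struct guts_sketch {
	int *charjump;	/* biased: valid indexes run from CHAR_MIN to CHAR_MAX */
};

/* Allocate one slot per char value, then bias the pointer by CHAR_MIN. */
static int
sketch_alloc(struct guts_sketch *g)
{
	int *base = malloc((size_t)(CHAR_MAX - CHAR_MIN + 1) * sizeof(*base));

	if (base == NULL) {
		g->charjump = NULL;
		return (-1);
	}
	/* From here on, g->charjump may differ from the address malloc() returned. */
	g->charjump = &base[-(CHAR_MIN)];
	return (0);
}

/* Undo the bias before freeing, so free() sees the address malloc() returned. */
static void
sketch_free(struct guts_sketch *g)
{
	if (g->charjump != NULL)
		free(&g->charjump[CHAR_MIN]);
	g->charjump = NULL;
}
```

Passing the biased pointer straight to free() is the kind of mistake the commit describes: where plain char is signed, that address is not the one that was allocated.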
|
#
8f7ed58a |
| 21-Dec-2023 |
Bill Sommerfeld <sommerfeld@hamachi.org> |
regex: mixed sets are misidentified as singletons
Fix "singleton" function used by regcomp() to turn character set matches into exact character matches if a character set has exactly one element.
The underlying cset representation is complex; most critically it records "small" characters (code point less than either 128 or 256 depending on locale) in a bit vector, and "wide" characters in a secondary array.
Unfortunately, the "singleton" function used to identify singleton sets treated a cset as a singleton if either the "small" or the "wide" set had exactly one element (it would then ignore the other set).
The easiest way to demonstrate this bug:
$ export LANG=C.UTF-8
$ echo 'a' | grep '[abà]'
It should match (and print "a") but instead it doesn't match because the single accented character in the set is misinterpreted as a singleton.
Reviewed by: kevans, yuripv
Obtained from: illumos
Differential Revision: https://reviews.freebsd.org/D43149
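The corrected check can be sketched on a simplified model of the set. Everything here is an assumption made for illustration: `struct cset_sketch`, `cset_add_small` and `singleton_sketch` are hypothetical, the bit vector is fixed at 128 entries, and the real cset also tracks ranges, classes and inversion.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Simplified, hypothetical model of a cset: a bit vector for "small"
 * code points (0..127 here) plus a separate array of "wide" members.
 */
struct cset_sketch {
	unsigned char bmp[16];
	const wchar_t *wides;
	size_t nwides;
};

/* Add a small code point (0..127) to the bit vector. */
static void
cset_add_small(struct cset_sketch *cs, int c)
{
	cs->bmp[c >> 3] |= (unsigned char)(1u << (c & 7));
}

/* Return the sole member of the set, or -1 if it is not a singleton. */
static long
singleton_sketch(const struct cset_sketch *cs)
{
	size_t n = cs->nwides;	/* count members across both parts */
	long found = -1;

	for (int c = 0; c < 128; c++) {
		if (cs->bmp[c >> 3] & (1u << (c & 7))) {
			n++;
			found = c;
		}
	}
	if (n != 1)
		return (-1);
	return (cs->nwides == 1 ? (long)cs->wides[0] : found);
}
```

In this model a set like [abà] has two small members and one wide member, so the combined count is three and it is not reported as a singleton.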
|
#
dc36d6f9 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
lib: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree via automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
|
Revision tags: release/14.0.0 |
|
#
559a218c |
| 01-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
libc: Purge unneeded cdefs.h
These sys/cdefs.h are not needed. Purge them. They are mostly left-over from the $FreeBSD$ removal. A few in libc are still required for macros that cdefs.h defines. Keep those.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D42385
|
#
3fb80f14 |
| 30-Aug-2023 |
Christos Zoulas <christos@NetBSD.org> |
regcomp: use unsigned char when testing for escapes
- cast GETNEXT to unsigned where it is being promoted to int to prevent sign-extension (really it would have been better for PEEK*() and GETNEXT() to return unsigned char; this would have removed a ton of (uch) casts, but it is too intrusive for now).
- fix an isalpha that should have been iswalpha
PR: 264275, 274032
Reviewed by: kevans, eugen (previous version)
Obtained from: NetBSD
Differential Revision: https://reviews.freebsd.org/D41947
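Both points can be illustrated with a short sketch. The helpers `byte_value` and `wide_is_alpha` are hypothetical and are not the macros or functions used in regcomp.c; they only show the cast and the choice of classifier.

```c
#include <wctype.h>

/* Hypothetical helper: fetch a pattern byte as a non-negative int. */
static int
byte_value(const char *p)
{
	/*
	 * Where plain char is signed, (int)*p is negative for bytes >= 0x80;
	 * casting to unsigned char first avoids the sign extension.
	 */
	return ((unsigned char)*p);
}

/* Wide characters need the wide classifier rather than isalpha(). */
static int
wide_is_alpha(wint_t wc)
{
	return (iswalpha(wc) != 0);
}
```

Comparing the result of `byte_value` against an escape character then behaves the same whether or not plain char is signed on the target platform.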
|
#
1d386b48 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
Revision tags: release/13.2.0, release/12.4.0 |
|
#
5b5fa75a |
| 04-Aug-2022 |
Ed Maste <emaste@FreeBSD.org> |
libc: drop "All rights reserved" from Foundation copyrights
This has already been done for most files that have the Foundation as the only listed copyright holder. Do it now for files that list multiple copyright holders, but have the Foundation copyright in its own section.
Sponsored by: The FreeBSD Foundation
|
Revision tags: release/13.1.0, release/12.3.0, release/13.0.0 |
|
#
d36b5dbe |
| 08-Jan-2021 |
Miod Vallat <miod@online.fr> |
libc: regex: rework unsafe pointer arithmetic
regcomp.c uses the "start + count < end" idiom to check that there are "count" bytes available in an array of char that "start" and "end" both point into.
This is fine, unless "start + count" goes beyond the last element of the array. In this case, pedantic interpretation of the C standard makes the comparison of such a pointer against "end" undefined, and optimizers from hell will happily remove as much code as possible because of this.
An example of this occurs in regcomp.c's bothcases(), which defines bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"...
Because bothcases() and p_bracket() are static functions in regcomp.c, there is a real risk of miscompilation if aggressive inlining happens.
The following diff rewrites the "start + count < end" constructs into "end - start > count". Assuming "end" and "start" are always pointing in the array (such as "bracket[3]" above), "end - start" is well-defined and can be compared without trouble.
As a bonus, MORE2() implies MORE() therefore SEETWO() can be simplified a bit.
PR: 252403
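The rewritten comparison can be sketched as a standalone helper. The name `more_than` is hypothetical; regcomp.c expresses the same test through its own macros.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true when more than `count` bytes remain between
 * start and end. Unlike "start + count < end", this never forms a pointer
 * beyond the array; "end - start" is well-defined as long as both pointers
 * point into (or one past the end of) the same array.
 */
static bool
more_than(const char *start, const char *end, ptrdiff_t count)
{
	return (end - start > count);
}
```

For the bothcases() situation described above, with a three-element array and "end" set two elements in, `more_than(bracket, bracket + 2, 5)` is simply false, and no out-of-bounds pointer is ever computed.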
|
#
4afa7dd6 |
| 05-Dec-2020 |
Kyle Evans <kevans@FreeBSD.org> |
libc: regex: retire internal EMPTBR ("Empty branch present")
It was realized just a little too late that this was a hack that belonged in individual regex(3)-using applications. It was surrounded in NOTYET and not implemented in the engine, so remove it.
|
#
6b986646 |
| 05-Dec-2020 |
Kyle Evans <kevans@FreeBSD.org> |
libregex: implement \b and \B (word boundary, not word boundary)
This is the last of the needed GNU expressions before we can unleash bsdgrep by default. \b is effectively a direction-agnostic equivalent of \< and \>, while \B matches every position that is not a transition from a non-word character to a word character or vice versa.
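The boundary test can be sketched on a plain byte string. This is an illustration only: `is_word_char` and `at_word_boundary` are hypothetical names, and treating alphanumerics plus underscore as word characters is an assumption of the sketch, not something the commit message states.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption of this sketch: word characters are alphanumerics and '_'. */
static bool
is_word_char(char c)
{
	return (isalnum((unsigned char)c) != 0 || c == '_');
}

/*
 * Position i lies between s[i - 1] and s[i], with 0 <= i <= len.
 * A \b-style boundary exists where exactly one neighbour is a word
 * character; a \B-style match is the negation of this result.
 */
static bool
at_word_boundary(const char *s, size_t len, size_t i)
{
	bool before = i > 0 && is_word_char(s[i - 1]);
	bool after = i < len && is_word_char(s[i]);

	return (before != after);
}
```

The start and end of the subject count as non-word neighbours here, so a word at either edge still has a boundary next to it.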
|
#
ca53e5ae |
| 05-Dec-2020 |
Kyle Evans <kevans@FreeBSD.org> |
libregex: implement \` and \' (begin-of-subj, end-of-subj)
These are GNU extensions, generally equivalent to ^ and $, except that the new syntax will not match the beginning of any line after the first in a multi-line subject, or the end of any line before the absolute last in a multi-line subject.
|
Revision tags: release/12.2.0 |
|
#
440cec3f |
| 12-Aug-2020 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: Rubicon Communications, LLC (netgate.com)
|
#
e383ec74 |
| 06-Aug-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r363739 through r363986.
|
#
18a1e2e9 |
| 04-Aug-2020 |
Kyle Evans <kevans@FreeBSD.org> |
libregex: Implement a subset of the GNU extensions
The entire patch-set is not yet mature enough for commit, but this usable subset is generally enough for googletest to be happy with and mostly map to some existing concepts, so they're not as invasive.
The specific changes included here are:
- Branching in BREs with \|
- \w and \W for [[:alnum:]] and [^[:alnum:]] respectively
- \s and \S for [[:space:]] and [^[:space:]] respectively
- Additional quantifiers in BREs, \? and \+ (self-explanatory)
There's some #ifdef'd out work for allowing empty branches as a match-all. This is a feature that's under assessment... future work will determine how standard this behavior is and act accordingly.
|
#
c7aa572c |
| 31-Jul-2020 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: Rubicon Communications, LLC (netgate.com)
|
#
17996960 |
| 31-Jul-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r363583 through r363738.
|
#
adeebf4c |
| 30-Jul-2020 |
Kyle Evans <kevans@FreeBSD.org> |
regex(3): Interpret many escaped ordinary characters as EESCAPE
In IEEE 1003.1-2008 [1] and earlier revisions, BRE/ERE grammar allows for any character to be escaped, but "ORD_CHAR preceded by an unescaped <backslash> character [gives undefined results]".
Historically, we've interpreted an escaped ordinary character as the ordinary character itself. This becomes problematic when some extensions give special meanings to an otherwise ordinary character (e.g. GNU's \b, \s, \w), meaning we may have two different valid interpretations of the same sequence.
To make this easier to deal with and given that the standard calls this undefined, we should throw an error (EESCAPE) if we run into this scenario to ease transition into a state where some escaped ordinaries are blessed with a special meaning -- it will either error out or have extended behavior, rather than have two entirely different versions of undefined behavior that leave the consumer of regex(3) guessing as to what behavior will be used or leaving them with false impressions.
This change bumps the symbol version of regcomp to FBSD_1.6 and provides the old escape semantics for legacy applications, just in case one has an older application that would immediately turn into a pumpkin because of an extraneous escape that's embedded or otherwise critical to its operation.
This is the final piece needed before enhancing libregex with GNU extensions and flipping the switch on bsdgrep.
[1] http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/
PR: 229925 (exp-run, courtesy of antoine)
Differential Revision: https://reviews.freebsd.org/D10510
|
Revision tags: release/11.4.0, release/12.1.0 |
|
#
668ee101 |
| 26-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r352587 through r352763.
|
#
3c787714 |
| 24-Sep-2019 |
Yuri Pankov <yuripv@FreeBSD.org> |
lib/libc/regex: fix build with REDEBUG defined
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D21760
|
Revision tags: release/11.3.0 |
|
#
e2a87ae3 |
| 20-Dec-2018 |
Yuri Pankov <yuripv@FreeBSD.org> |
regcomp: revert part of r341838 which turned out to be unrelated and caused issues with search in less.
PR: 234066
Reviewed by: pfg
Differential revision: https://reviews.freebsd.org/D18611
|
#
547bc083 |
| 12-Dec-2018 |
Yuri Pankov <yuripv@FreeBSD.org> |
regcomp: reduce size of bitmap for multibyte locales
This fixes the obscure endless loop seen with case-insensitive patterns containing characters in 128-255 range; originally found running GNU grep test suite.
Our regex implementation, being kludgy, translates each character in a case-insensitive pattern into a bracket expression containing both cases of the character. It doesn't correctly handle the case where the original character is in the bitmap and the other case is not, and falls into an endless loop going through p_bracket(), ordinary(), and bothcases().
Reducing the bitmap to 0-127 range for multibyte locales solves this as none of these characters have other case mapping outside of bitmap. We are also safe in the case when the original character outside of bitmap has other case mapping in the bitmap (there are several of those in our current ctype maps having unidirectional mapping into bitmap).
Reviewed by: bapt, kevans, pfg
Differential revision: https://reviews.freebsd.org/D18302
|
Revision tags: release/12.0.0, release/11.2.0 |
|
#
fe5bf674 |
| 22-Jan-2018 |
Kyle Evans <kevans@FreeBSD.org> |
Add missing patch from r328240
regcomp uses some libc internal collation bits that are not available in the libregex context. It's easy enough to bring in the needed parts that can work in a libregex world, so do so.
Pointy hat to: me
|
#
4f8f1c79 |
| 21-Jan-2018 |
Kyle Evans <kevans@FreeBSD.org> |
regex(3): Resolve issues with higher WARNS levels
libc is set for WARNS=2, but the incoming libregex will use WARNS=6. Sprinkle some casts and (void)bc's to alleviate the warnings that come along with the higher WARNS level.
These 'bc' parameters could be outright removed, but as of right now they will be used in some parts of libregex land. Silence the warnings instead of flip-flopping.
|
#
82725ba9 |
| 23-Nov-2017 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Merge ^/head r325999 through r326131.
|