Mirror of https://github.com/MariaDB/server.git, synced 2025-07-29 05:21:33 +03:00
Merge branch 'merge-pcre' into 10.1
@@ -8,7 +8,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.

Copyright (c) 1997-2018 University of Cambridge
Copyright (c) 1997-2019 University of Cambridge
All rights reserved


@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu

Copyright(c) 2010-2018 Zoltan Herczeg
Copyright(c) 2010-2019 Zoltan Herczeg
All rights reserved.


@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu

Copyright(c) 2009-2018 Zoltan Herczeg
Copyright(c) 2009-2019 Zoltan Herczeg
All rights reserved.


@@ -5,6 +5,49 @@ Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
development is happening in the PCRE2 10.xx series.


Version 8.43 23-February-2019
-----------------------------

1. Some time ago the config macro SUPPORT_UTF8 was changed to SUPPORT_UTF
because it also applies to UTF-16 and UTF-32. However, this change was not made
in the pcre2cpp files; consequently the C++ wrapper has from then been compiled
with a bug in it, which would have been picked up by the unit test except that
it also had its UTF8 code cut out. The bug was in a global replace when moving
forward after matching an empty string.

2. The C++ wrapper got broken a long time ago (version 7.3, August 2007) when
(*CR) was invented (assuming it was the first such start-of-pattern option).
The wrapper could never handle such patterns because it wraps patterns in
(?:...)\z in order to support end anchoring. I have hacked in some code to fix
this, that is, move the wrapping till after any existing start-of-pattern
special settings.

3. "pcre2grep" (sic) was accidentally mentioned in an error message (fix was
ported from PCRE2).

4. Typo LCC_ALL for LC_ALL fixed in pcregrep.

5. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
negative class with no characters less than 0x100 followed by a positive class
with only characters less than 0x100, the first class was incorrectly being
auto-possessified, causing incorrect match failures.

6. If the only branch in a conditional subpattern was anchored, the whole
subpattern was treated as anchored, when it should not have been, since the
assumed empty second branch cannot be anchored. Demonstrated by test patterns
such as /(?(1)^())b/ or /(?(?=^))b/.

7. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
a greater than 1 fixed quantifier. This issue was found by Yunho Kim.

8. If a pattern started with a subroutine call that had a quantifier with a
minimum of zero, an incorrect "match must start with this character" could be
recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
be the first character of a match.

9. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.


Version 8.42 20-March-2018
--------------------------

pcre/LICENCE
@@ -25,7 +25,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.

Copyright (c) 1997-2018 University of Cambridge
Copyright (c) 1997-2019 University of Cambridge
All rights reserved.


@@ -34,9 +34,9 @@ PCRE JUST-IN-TIME COMPILATION SUPPORT

Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Email domain: freemail.hu

Copyright(c) 2010-2018 Zoltan Herczeg
Copyright(c) 2010-2019 Zoltan Herczeg
All rights reserved.


@@ -45,9 +45,9 @@ STACK-LESS JUST-IN-TIME COMPILER

Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Email domain: freemail.hu

Copyright(c) 2009-2018 Zoltan Herczeg
Copyright(c) 2009-2019 Zoltan Herczeg
All rights reserved.


pcre/NEWS
@@ -1,6 +1,16 @@
News about PCRE releases
------------------------

Note that this library (now called PCRE1) is now being maintained for bug fixes
only. New projects are advised to use the new PCRE2 libraries.


Release 8.43 23-February-2019
-----------------------------

This is a bug-fix release.


Release 8.42 20-March-2018
--------------------------

@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty.

m4_define(pcre_major, [8])
m4_define(pcre_minor, [42])
m4_define(pcre_minor, [43])
m4_define(pcre_prerelease, [])
m4_define(pcre_date, [2018-03-20])
m4_define(pcre_date, [2019-02-23])

# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.

# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [3:10:2])
m4_define(libpcre16_version, [2:10:2])
m4_define(libpcre32_version, [0:10:0])
m4_define(libpcre_version, [3:11:2])
m4_define(libpcre16_version, [2:11:2])
m4_define(libpcre32_version, [0:11:0])
m4_define(libpcreposix_version, [0:6:0])
m4_define(libpcrecpp_version, [0:1:0])
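
The libtool triples in this hunk are current:revision:age, and only the revision field moves (10 to 11). Under libtool's convention that signals an implementation change with no interface change, which fits a bug-fix release. As a rough sketch, assuming the usual Linux/ELF mapping where the shared-library file suffix is (current - age).age.revision, the helper below derives that suffix. `LibtoolVersion` and `LinuxSoSuffix` are hypothetical names for illustration, not part of PCRE or libtool, and other platforms number shared libraries differently.

```cpp
#include <string>

// Hypothetical holder for a libtool current:revision:age triple.
struct LibtoolVersion {
  unsigned current;
  unsigned revision;
  unsigned age;
};

// Assumption: Linux/ELF numbering, where the file suffix is
// (current - age).age.revision. Other platforms map the triple differently.
std::string LinuxSoSuffix(const LibtoolVersion& v) {
  return std::to_string(v.current - v.age) + "." +
         std::to_string(v.age) + "." +
         std::to_string(v.revision);
}
```

Under that assumption, `LinuxSoSuffix({3, 11, 2})` gives the suffix for the new libpcre_version value, and the leading component (current - age) is unchanged from the 3:10:2 triple it replaces.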

@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.

Written by Philip Hazel
Copyright (c) 1997-2016 University of Cambridge
Copyright (c) 1997-2018 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -3300,7 +3300,7 @@ for(;;)
if ((*xclass_flags & XCL_MAP) == 0)
{
/* No bits are set for characters < 256. */
if (list[1] == 0) return TRUE;
if (list[1] == 0) return (*xclass_flags & XCL_NOT) == 0;
/* Might be an empty repeat. */
continue;
}
@@ -7645,6 +7645,8 @@ for (;; ptr++)
/* Can't determine a first byte now */

if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
zerofirstchar = firstchar;
zerofirstcharflags = firstcharflags;
continue;


@@ -8685,13 +8687,21 @@ do {
if (!is_anchored(scode, new_map, cd, atomcount)) return FALSE;
}

/* Positive forward assertions and conditions */
/* Positive forward assertion */

else if (op == OP_ASSERT || op == OP_COND)
else if (op == OP_ASSERT)
{
if (!is_anchored(scode, bracket_map, cd, atomcount)) return FALSE;
}

/* Condition; not anchored if no second branch */

else if (op == OP_COND)
{
if (scode[GET(scode,1)] != OP_ALT) return FALSE;
if (!is_anchored(scode, bracket_map, cd, atomcount)) return FALSE;
}

/* Atomic groups */

else if (op == OP_ONCE || op == OP_ONCE_NC)

@@ -9002,7 +9002,7 @@ if (exact > 1)
#ifdef SUPPORT_UTF
&& !common->utf
#endif
)
&& type != OP_ANYNL && type != OP_EXTUNI)
{
OP2(SLJIT_ADD, TMP1, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(exact));
add_jump(compiler, &backtrack->topbacktracks, CMP(SLJIT_GREATER, TMP1, 0, STR_END, 0));

@@ -80,6 +80,24 @@ static const string empty_string;
// If the user doesn't ask for any options, we just use this one
static RE_Options default_options;

// Specials for the start of patterns. See comments where start_options is used
// below. (PH June 2018)
static const char *start_options[] = {
"(*UTF8)",
"(*UTF)",
"(*UCP)",
"(*NO_START_OPT)",
"(*NO_AUTO_POSSESS)",
"(*LIMIT_RECURSION=",
"(*LIMIT_MATCH=",
"(*CRLF)",
"(*CR)",
"(*BSR_UNICODE)",
"(*BSR_ANYCRLF)",
"(*ANYCRLF)",
"(*ANY)",
"" };

void RE::Init(const string& pat, const RE_Options* options) {
pattern_ = pat;
if (options == NULL) {
@@ -135,7 +153,49 @@ pcre* RE::Compile(Anchor anchor) {
} else {
// Tack a '\z' at the end of RE. Parenthesize it first so that
// the '\z' applies to all top-level alternatives in the regexp.
string wrapped = "(?:"; // A non-counting grouping operator

/* When this code was written (for PCRE 6.0) it was enough just to
parenthesize the entire pattern. Unfortunately, when the feature of
starting patterns with (*UTF8) or (*CR) etc. was added to PCRE patterns,
this code was never updated. This bug was not noticed till 2018, long after
PCRE became obsolescent and its maintainer no longer around. Since PCRE is
frozen, I have added a hack to check for all the existing "start of
pattern" specials - knowing that no new ones will ever be added. I am not a
C++ programmer, so the code style is no doubt crude. It is also
inefficient, but is only run when the pattern starts with "(*".
PH June 2018. */

string wrapped = "";

if (pattern_.c_str()[0] == '(' && pattern_.c_str()[1] == '*') {
int kk, klen, kmat;
for (;;) { // Loop for any number of leading items

for (kk = 0; start_options[kk][0] != 0; kk++) {
klen = strlen(start_options[kk]);
kmat = strncmp(pattern_.c_str(), start_options[kk], klen);
if (kmat >= 0) break;
}
if (kmat != 0) break; // Not found

// If the item ended in "=" we must copy digits up to ")".

if (start_options[kk][klen-1] == '=') {
while (isdigit(pattern_.c_str()[klen])) klen++;
if (pattern_.c_str()[klen] != ')') break; // Syntax error
klen++;
}

// Move the item from the pattern to the start of the wrapped string.

wrapped += pattern_.substr(0, klen);
pattern_.erase(0, klen);
}
}

// Wrap the rest of the pattern.

wrapped += "(?:"; // A non-counting grouping operator
wrapped += pattern_;
wrapped += ")\\z";
re = pcre_compile(wrapped.c_str(), pcre_options,
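
The hoisting step in the hunk above can be read as the standalone sketch below. It is not the pcrecpp code: `WrapForEndAnchor` and `kStartOptions` are hypothetical names, the table is copied from `start_options` in the diff, and each item is matched by exact prefix rather than by the original's `strncmp` scan. The idea is the same: leading `(*...)` items are moved in front of the `(?:...)\z` wrapper so that they stay at the very start of the pattern.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Start-of-pattern items, copied from the start_options table in the diff.
static const char* const kStartOptions[] = {
  "(*UTF8)", "(*UTF)", "(*UCP)", "(*NO_START_OPT)", "(*NO_AUTO_POSSESS)",
  "(*LIMIT_RECURSION=", "(*LIMIT_MATCH=", "(*CRLF)", "(*CR)",
  "(*BSR_UNICODE)", "(*BSR_ANYCRLF)", "(*ANYCRLF)", "(*ANY)"
};

// Hypothetical helper: returns <leading items>(?:<rest of pattern>)\z.
std::string WrapForEndAnchor(std::string pattern) {
  std::string wrapped;
  bool found = true;
  while (found && pattern.compare(0, 2, "(*") == 0) {
    found = false;
    for (const char* opt : kStartOptions) {
      std::size_t len = std::strlen(opt);
      if (pattern.compare(0, len, opt) != 0) continue;  // not this item
      if (opt[len - 1] == '=') {
        // Items ending in '=' carry digits up to a closing ')'.
        while (len < pattern.size() &&
               std::isdigit(static_cast<unsigned char>(pattern[len]))) {
          ++len;
        }
        // Malformed item: stop hoisting and leave it inside the wrapper.
        if (len >= pattern.size() || pattern[len] != ')') break;
        ++len;
      }
      // Move the item from the pattern to the front of the result.
      wrapped += pattern.substr(0, len);
      pattern.erase(0, len);
      found = true;
      break;
    }
  }
  wrapped += "(?:";  // non-capturing group around the remainder
  wrapped += pattern;
  wrapped += ")\\z";
  return wrapped;
}
```

For instance, `WrapForEndAnchor("(*UCP)(*LIMIT_MATCH=1000)(*UTF)...")` keeps the three leading items outside the group and wraps only `...`, which mirrors the shape of the patterns that the new unit tests exercise.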
@@ -415,7 +475,7 @@ int RE::GlobalReplace(const StringPiece& rewrite,
matchend++;
}
// We also need to advance more than one char if we're in utf8 mode.
#ifdef SUPPORT_UTF8
#ifdef SUPPORT_UTF
if (options_.utf8()) {
while (matchend < static_cast<int>(str->length()) &&
((*str)[matchend] & 0xc0) == 0x80)
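
The guarded block in this hunk is what ChangeLog item 1 refers to: after an empty match, a global replace has to step forward by one whole character, and in UTF-8 mode that means also skipping continuation bytes (those of the form 10xxxxxx). With the stale SUPPORT_UTF8 macro the block was compiled out. A minimal sketch of that stepping rule, using a hypothetical free function rather than the wrapper's member code, and leaving out the CRLF handling that precedes it in the original:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: given the offset where an empty match ended, return
// the offset at which the next match attempt should start.
std::size_t NextStartAfterEmptyMatch(const std::string& s, std::size_t pos,
                                     bool utf8) {
  if (pos >= s.size()) return pos;  // already at the end of the subject
  ++pos;                            // always move at least one byte
  if (utf8) {
    // Skip UTF-8 continuation bytes so we land on a character boundary.
    while (pos < s.size() &&
           (static_cast<unsigned char>(s[pos]) & 0xc0) == 0x80) {
      ++pos;
    }
  }
  return pos;
}
```

Without the UTF-8 branch, the next attempt would start in the middle of a multi-byte character, and a replacement inserted there would split it.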

@@ -309,7 +309,7 @@ static void TestReplace() {
"@aa",
"@@@",
3 },
#ifdef SUPPORT_UTF8
#ifdef SUPPORT_UTF
{ "b*",
"bb",
"\xE3\x83\x9B\xE3\x83\xBC\xE3\x83\xA0\xE3\x81\xB8", // utf8
@@ -327,7 +327,7 @@ static void TestReplace() {
{ "", NULL, NULL, NULL, NULL, 0 }
};

#ifdef SUPPORT_UTF8
#ifdef SUPPORT_UTF
const bool support_utf8 = true;
#else
const bool support_utf8 = false;
@@ -535,7 +535,7 @@ static void TestQuoteMetaLatin1() {
}

static void TestQuoteMetaUtf8() {
#ifdef SUPPORT_UTF8
#ifdef SUPPORT_UTF
TestQuoteMeta("Pl\xc3\xa1\x63ido Domingo", pcrecpp::UTF8());
TestQuoteMeta("xyz", pcrecpp::UTF8()); // No fancy utf8
TestQuoteMeta("\xc2\xb0", pcrecpp::UTF8()); // 2-byte utf8 (degree symbol)
@@ -1178,7 +1178,7 @@ int main(int argc, char** argv) {
CHECK(re.error().empty()); // Must have no error
}

#ifdef SUPPORT_UTF8
#ifdef SUPPORT_UTF
// Check UTF-8 handling
{
printf("Testing UTF-8 handling\n");
@@ -1203,6 +1203,30 @@ int main(int argc, char** argv) {
RE re_test2("...", pcrecpp::UTF8());
CHECK(re_test2.FullMatch(utf8_string));

// PH added these tests for leading option settings

RE re_testZ0("(*CR)(*NO_START_OPT).........");
CHECK(re_testZ0.FullMatch(utf8_string));

#ifdef SUPPORT_UTF
RE re_testZ1("(*UTF8)...");
CHECK(re_testZ1.FullMatch(utf8_string));

RE re_testZ2("(*UTF)...");
CHECK(re_testZ2.FullMatch(utf8_string));

#ifdef SUPPORT_UCP
RE re_testZ3("(*UCP)(*UTF)...");
CHECK(re_testZ3.FullMatch(utf8_string));

RE re_testZ4("(*UCP)(*LIMIT_MATCH=1000)(*UTF)...");
CHECK(re_testZ4.FullMatch(utf8_string));

RE re_testZ5("(*UCP)(*LIMIT_MATCH=1000)(*ANY)(*UTF)...");
CHECK(re_testZ5.FullMatch(utf8_string));
#endif
#endif

// Check that '.' matches one byte or UTF-8 character
// according to the mode.
string ss;
@@ -1248,7 +1272,7 @@ int main(int argc, char** argv) {
CHECK(!match_sentence.FullMatch(target));
CHECK(!match_sentence_re.FullMatch(target));
}
#endif /* def SUPPORT_UTF8 */
#endif /* def SUPPORT_UTF */

printf("Testing error reporting\n");


@@ -2252,7 +2252,7 @@ if (isdirectory(pathname))
int fnlength = strlen(pathname) + strlen(nextfile) + 2;
if (fnlength > 2048)
{
fprintf(stderr, "pcre2grep: recursive filename is too long\n");
fprintf(stderr, "pcregrep: recursive filename is too long\n");
rc = 2;
break;
}
@@ -3034,7 +3034,7 @@ LC_ALL environment variable is set, and if so, use it. */
if (locale == NULL)
{
locale = getenv("LC_ALL");
locale_from = "LC_ALL";
locale_from = "LCC_ALL";
locale_from = "LC_ALL";
}

if (locale == NULL)
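
The second hunk fixes a label, not the lookup: `locale_from` records where the locale name came from so that it can be reported, and it has to name the variable that was actually read. The sketch below shows one way to keep the two in step by returning them together. It is an illustration only: `LocaleChoice` and `PickLocale` are hypothetical names, the `"--locale"` label for an explicit setting and the `LC_CTYPE` fallback are assumptions (the diff shows only the `LC_ALL` step and a further `locale == NULL` test), and the lookup is passed in so the logic can be exercised without touching the real environment.

```cpp
#include <string>

// Hypothetical result type: the chosen locale and a label for its source.
struct LocaleChoice {
  std::string value;  // locale name, empty if nothing was found
  std::string from;   // where it came from, for use in messages
};

// lookup(name) should behave like getenv: return the value or a null pointer.
template <typename Lookup>
LocaleChoice PickLocale(const char* explicit_locale, Lookup lookup) {
  // Assumption: an explicitly supplied locale wins and is labelled "--locale".
  if (explicit_locale != nullptr) return {explicit_locale, "--locale"};
  // Assumption: LC_ALL is consulted first, then LC_CTYPE.
  static const char* const kVars[] = {"LC_ALL", "LC_CTYPE"};
  for (const char* var : kVars) {
    if (const char* v = lookup(var)) return {v, var};  // label == variable read
  }
  return {"", ""};
}
```

Because the label is taken from the same string that is passed to the lookup, a mismatch like the `LCC_ALL` typo cannot arise in this form. In a real program the lookup argument could be a small lambda that forwards to `std::getenv`.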

pcre/testdata/testinput1
@@ -5742,4 +5742,19 @@ AbcdCBefgBhiBqz
/X+(?#comment)?/
>XXX<

/ (?<word> \w+ )* \. /xi
pokus.

/(?(DEFINE) (?<word> \w+ ) ) (?&word)* \./xi
pokus.

/(?(DEFINE) (?<word> \w+ ) ) ( (?&word)* ) \./xi
pokus.

/(?&word)* (?(DEFINE) (?<word> \w+ ) ) \./xi
pokus.

/(?&word)* \. (?<word> \w+ )/xi
pokus.hokus

/-- End of testinput1 --/

pcre/testdata/testinput2
@@ -4257,4 +4257,7 @@ backtracking verbs. --/
ab
aaab

/(?(?=^))b/
abc

/-- End of testinput2 --/

pcre/testdata/testinput4
@@ -727,4 +727,7 @@
/\C(\W?ſ)'?{{/8
\\C(\\W?ſ)'?{{

/[^\x{100}-\x{ffff}]*[\x80-\xff]/8
\x{99}\x{99}\x{99}

/-- End of testinput4 --/

pcre/testdata/testoutput1
@@ -9446,4 +9446,28 @@ No match
>XXX<
0: X

/ (?<word> \w+ )* \. /xi
pokus.
0: pokus.
1: pokus

/(?(DEFINE) (?<word> \w+ ) ) (?&word)* \./xi
pokus.
0: pokus.

/(?(DEFINE) (?<word> \w+ ) ) ( (?&word)* ) \./xi
pokus.
0: pokus.
1: <unset>
2: pokus

/(?&word)* (?(DEFINE) (?<word> \w+ ) ) \./xi
pokus.
0: pokus.

/(?&word)* \. (?<word> \w+ )/xi
pokus.hokus
0: pokus.hokus
1: hokus

/-- End of testinput1 --/

pcre/testdata/testoutput2
@@ -14721,4 +14721,8 @@ No need char
0: ab
1: a

/(?(?=^))b/
abc
0: b

/-- End of testinput2 --/

pcre/testdata/testoutput4
@@ -1277,4 +1277,8 @@ No match
\\C(\\W?ſ)'?{{
No match

/[^\x{100}-\x{ffff}]*[\x80-\xff]/8
\x{99}\x{99}\x{99}
0: \x{99}\x{99}\x{99}

/-- End of testinput4 --/