1
0
mirror of https://github.com/postgres/postgres.git synced 2025-05-05 09:19:17 +03:00

4 Commits

Author SHA1 Message Date
Tom Lane
4e703d6719 Make some minor improvements in the regex code.
Push some hopefully-uncontroversial bits extracted from an upcoming
patch series, to remove non-relevant clutter from the main patches.

In compact(), return immediately after setting REG_ASSERT error;
continuing the loop would just lead to assertion failure below.
(Ask me how I know.)

In parseqatom(), remove assertion that moresubs() did its job.
When moresubs actually did its job, this is redundant with that
function's final assert; but when it failed on OOM, this is an
assertion crash.  We could avoid the crash by adding a NOERR()
check before the assertion, but it seems better to subtract code
than add it.  (Note that there's a NOERR exit a few lines further
down, and nothing else between here and there requires moresubs
to have succeeded.  So we don't really need an extra error exit.)
This is a live bug in assert-enabled builds, but given the very
low likelihood of OOM in moresub's tiny allocation, I don't think
it's worth back-patching.

On the other hand, it seems worthwhile to add an assertion that
our intended v->subs[subno] target is still null by the time we
are ready to insert into it, since there's a recursion in between.

In pg_regexec, ensure we fflush any debug output on the way out,
and try to make MDEBUG messages more uniform and helpful.  (In
particular, ensure that all of them are prefixed with the subre's
id number, so one can match up entry and exit reports.)

Add some test cases in test_regex to improve coverage of lookahead
and lookbehind constraints.  Adding these now is mainly to establish
that this is indeed the existing behavior.

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-17 12:24:23 -05:00
Tom Lane
afcc8772ed Fix ancient bug in parsing of BRE-mode regular expressions.
brenext(), when parsing a '*' quantifier, forgot to return any "value"
for the token; per the equivalent case in next(), it should return
value 1 to indicate that greedy rather than non-greedy behavior is
wanted.  The result is that the compiled regexp could behave like 'x*?'
rather than the intended 'x*', if we were unlucky enough to have
a zero in v->nextvalue at this point.  That seems to happen with some
reliability if we have '.*' at the beginning of a BRE-mode regexp,
although that depends on the initial contents of a stack-allocated
struct, so it's not guaranteed to fail.

Found by Alexander Lakhin using valgrind testing.  This bug seems
to be aboriginal in Spencer's code, so back-patch all the way.

Discussion: https://postgr.es/m/16814-6c5e3edd2bdf0d50@postgresql.org
2021-01-08 12:16:00 -05:00
Tom Lane
f7a1a805cb Fix bogus link in test comments.
I apparently copied-and-pasted the wrong link in commit ca8217c10.
Point it where it was meant to go.
2021-01-06 22:09:16 -05:00
Tom Lane
ca8217c101 Add a test module for the regular expression package.
This module provides a function test_regex() that is functionally
rather like regexp_matches(), but with additional debugging-oriented
options and additional output.  The debug options are somewhat obscure;
they are chosen to match the API of the test harness that Henry Spencer
wrote way-back-when for use in Tcl.  With this, we can import all the
test cases that Spencer wrote originally, even for regex functionality
that we don't currently expose in Postgres.  This seems necessary
because we can no longer rely on Tcl to act as upstream and verify
any fixes or improvements that we make.

In addition to Spencer's tests, I added a few for lookbehind
constraints (which we added in 2015, and Tcl still hasn't absorbed)
that are modeled on his tests for lookahead constraints.  After looking
at code coverage reports, I also threw in a couple of tests to more
fully exercise our "high colormap" logic.

According to my testing, this brings the check-world coverage
for src/backend/regex/ from 71.1% to 86.7% of lines.
(coverage.postgresql.org shows a slightly different number,
which I think is because it measures a non-assert build.)

Discussion: https://postgr.es/m/2873268.1609732164@sss.pgh.pa.us
2021-01-06 10:51:14 -05:00