mirror of https://github.com/postgres/postgres.git synced 2025-07-03 20:02:46 +03:00

Fix lexing of standard multi-character operators in edge cases.

Commits c6b3c939b (which fixed the precedence of >=, <=, <> operators)
and 865f14a2d (which added support for the standard => notation for
named arguments) created a class of lexer tokens which look like
multi-character operators but which have their own token IDs distinct
from Op. However, longest-match rules meant that following any of
these tokens with another operator character, as in (1<>-1), would
cause them to be incorrectly returned as Op.
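
The longest-match effect can be sketched in a few lines of C. This is an illustrative model only, not code from the lexer: the character set in `OP_CHARS` is an assumption modelled on the lexer's operator character class, and both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumed set of operator characters (illustrative, not copied from the lexer). */
static const char OP_CHARS[] = "~!@#^&|`?+-*/%<>=";

/* Length of the longest run of operator characters at the start of s. */
static size_t operator_run_length(const char *s)
{
    return strspn(s, OP_CHARS);
}

/*
 * Returns 1 when the generic operator rule matches more input than a
 * dedicated token of token_len characters, so that longest-match picks
 * the generic rule and the dedicated token ID is lost.
 */
static int generic_rule_wins(const char *s, size_t token_len)
{
    return operator_run_length(s) > token_len;
}
```

Under these assumptions, `generic_rule_wins("<>-1", 2)` is true because the run `<>-` is three characters long, while `generic_rule_wins("<> -1", 2)` is false because the space ends the run after two characters.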

The error here isn't immediately obvious, because the parser would
usually still find the correct operator via the Op token, but there
were more subtle problems:

1. If immediately followed by a comment or by a + or - character,
   >= <= <> would be given the old precedence of Op rather than the
   correct new precedence;

2. If followed by a comment, != would be returned as Op rather than as
   NOT_EQUALS, so the operator would not be found at all;

3. If followed by a comment or by a + or - character, the => token for
   named arguments would be lexed as Op, causing the argument to be
   mis-parsed as a simple expression, usually resulting in an error.

Fix by explicitly checking for the operators in the {operator} code
block in addition to all the existing special cases there.
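
As a rough, self-contained model of that check (a sketch under stated assumptions, not the lexer's actual code): the `TOK_*` values and the function name `classify_operator` are hypothetical, and the trimming steps are simplified from the behaviour described in this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token IDs; the real ones are generated from the grammar. */
enum token {
    TOK_OP = 256,
    TOK_EQUALS_GREATER,
    TOK_GREATER_EQUALS,
    TOK_LESS_EQUALS,
    TOK_NOT_EQUALS
};

/*
 * Classify a run of operator characters. *kept receives the number of
 * characters left in the token after trimming. The text is assumed not
 * to begin with a comment start.
 */
static int classify_operator(const char *text, size_t len, size_t *kept)
{
    size_t n = len;
    size_t i;

    /* Cut the token at the first comment start. */
    for (i = 0; i + 1 < len; i++) {
        if ((text[i] == '/' && text[i + 1] == '*') ||
            (text[i] == '-' && text[i + 1] == '-')) {
            n = i;
            break;
        }
    }

    /* Drop a trailing + or - unless an earlier character is non-SQL. */
    while (n > 1 && (text[n - 1] == '+' || text[n - 1] == '-')) {
        size_t j;
        int keep = 0;

        for (j = 0; j + 1 < n; j++) {
            if (strchr("~!@#^&|`?%", text[j]) != NULL) {
                keep = 1;
                break;
            }
        }
        if (keep)
            break;
        n--;
    }
    *kept = n;

    /*
     * Only trimmed tokens need special handling: an untrimmed two-character
     * token is assumed to have been matched by its dedicated rule already.
     */
    if (n < len) {
        if (n == 1 && strchr(",()[].;:+-*/%^<>=", text[0]) != NULL)
            return text[0];
        if (n == 2) {
            if (text[0] == '=' && text[1] == '>')
                return TOK_EQUALS_GREATER;
            if (text[0] == '>' && text[1] == '=')
                return TOK_GREATER_EQUALS;
            if (text[0] == '<' && text[1] == '=')
                return TOK_LESS_EQUALS;
            if ((text[0] == '<' && text[1] == '>') ||
                (text[0] == '!' && text[1] == '='))
                return TOK_NOT_EQUALS;
        }
    }
    return TOK_OP;
}
```

In this model, the run `<>-` is trimmed to `<>` and classified as `TOK_NOT_EQUALS`, whereas `!=-` stays a three-character `TOK_OP` because the `!` allows a trailing `-`. That matches the observation above that `!=` is affected only when followed by a comment.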

Backpatch to 9.5 where the problem was introduced.

Analysis and patch by me; review by Tom Lane.
Discussion: https://postgr.es/m/87va851ppl.fsf@news-spur.riddles.org.uk
Andrew Gierth
2018-08-23 18:29:18 +01:00
parent 4854ead60a
commit 5ec70a9286
7 changed files with 221 additions and 0 deletions


@@ -335,6 +335,15 @@ identifier {ident_start}{ident_cont}*
typecast "::"
dot_dot \.\.
colon_equals ":="
/*
 * These operator-like tokens (unlike the above ones) also match the {operator}
 * rule, which means that they might be overridden by a longer match if they
 * are followed by a comment start or a + or - character. Accordingly, if you
 * add to this list, you must also add corresponding code to the {operator}
 * block to return the correct token in such cases. (This is not needed in
 * psqlscan.l since the token value is ignored there.)
 */
equals_greater "=>"
less_equals "<="
greater_equals ">="
@@ -925,6 +934,25 @@ other .
	if (nchars == 1 &&
		strchr(",()[].;:+-*/%^<>=", yytext[0]))
		return yytext[0];
	/*
	 * Likewise, if what we have left is two chars, and
	 * those match the tokens ">=", "<=", "=>", "<>" or
	 * "!=", then we must return the appropriate token
	 * rather than the generic Op.
	 */
	if (nchars == 2)
	{
		if (yytext[0] == '=' && yytext[1] == '>')
			return EQUALS_GREATER;
		if (yytext[0] == '>' && yytext[1] == '=')
			return GREATER_EQUALS;
		if (yytext[0] == '<' && yytext[1] == '=')
			return LESS_EQUALS;
		if (yytext[0] == '<' && yytext[1] == '>')
			return NOT_EQUALS;
		if (yytext[0] == '!' && yytext[1] == '=')
			return NOT_EQUALS;
	}
}
/*