mirror of https://github.com/postgres/postgres.git
This patch makes several changes that improve the consistency of representation of lists of statements. It's always been the case that the output of parse analysis is a list of Query nodes, whatever the types of the individual statements in the list. This patch brings similar consistency to the outputs of raw parsing and planning steps:

* The output of raw parsing is now always a list of RawStmt nodes; the statement-type-dependent nodes are one level down from that.

* The output of pg_plan_queries() is now always a list of PlannedStmt nodes, even for utility statements. In the case of a utility statement, "planning" just consists of wrapping a CMD_UTILITY PlannedStmt around the utility node. This list representation is now used in Portal and CachedPlan plan lists, replacing the former convention of intermixing PlannedStmts with bare utility-statement nodes.

Now, every list of statements has a consistent head-node type depending on how far along it is in processing. This allows changing many places that formerly used generic "Node *" pointers to use a more specific pointer type, thus reducing the number of IsA() tests and casts needed, as well as improving code clarity.

Also, the post-parse-analysis representation of DECLARE CURSOR is changed so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained SELECT remains a child of the DeclareCursorStmt rather than getting flipped around to be the other way. It's now true for both Query and PlannedStmt that utilityStmt is non-null if and only if commandType is CMD_UTILITY. That allows simplifying a lot of places that were testing both fields. (I think some of those were just defensive programming, but in many places it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)

Because PlannedStmt carries a canSetTag field, we're also able to get rid of some ad-hoc rules about how to reconstruct canSetTag for a bare utility statement; specifically, the assumption that a utility is canSetTag if and only if it's the only one in its list. While I see no near-term need for relaxing that restriction, it's nice to get rid of the ad-hocery.

The API of ProcessUtility() is changed so that what it's passed is the wrapper PlannedStmt, not just the bare utility statement. This will affect all users of ProcessUtility_hook, but the changes are pretty trivial; see the affected contrib modules for examples of the minimum change needed, and the sketch below. (Most compilers should give pointer-type-mismatch warnings for uncorrected code.)

There's also a change in the API of ExplainOneQuery_hook, to pass through cursorOptions instead of expecting hook functions to know what to pick. This is needed because of the DECLARE CURSOR changes, but really should have been done in 9.6; it's unlikely that any extant hook functions know about using CURSOR_OPT_PARALLEL_OK.

Finally, teach gram.y to save statement boundary locations in RawStmt nodes, and pass those through to Query and PlannedStmt nodes. This allows more intelligent handling of cases where a source query string contains multiple statements. This patch doesn't actually do anything with the information, but a follow-on patch will. (Passing this information through cleanly is the true motivation for these changes; while I think this is all good cleanup, it's unlikely we'd have bothered without this end goal.)

catversion bump because addition of location fields to struct Query affects stored rules.
This patch is by me, but it owes a good deal to Fabien Coelho, who did a lot of preliminary work on the problem and also reviewed the patch.

Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
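For readers updating an extension, here is a minimal, hedged sketch of the shape of the ProcessUtility_hook change described above. The names my_ProcessUtility and prev_ProcessUtility are hypothetical, and the parameter list is as of this commit; later releases add further parameters, so check your target version's tcop/utility.h rather than copying this verbatim.

    #include "postgres.h"
    #include "tcop/utility.h"

    static ProcessUtility_hook_type prev_ProcessUtility = NULL;

    /* Hook now receives the wrapper PlannedStmt, not the bare utility node */
    static void
    my_ProcessUtility(PlannedStmt *pstmt,       /* formerly: Node *parsetree */
                      const char *queryString,
                      ProcessUtilityContext context,
                      ParamListInfo params,
                      DestReceiver *dest,
                      char *completionTag)
    {
        /* The statement itself now lives one level down */
        Node       *parsetree = pstmt->utilityStmt;

        elog(DEBUG1, "utility statement of node type %d",
             (int) nodeTag(parsetree));

        /* Chain to the previous hook, or to the standard implementation */
        if (prev_ProcessUtility)
            prev_ProcessUtility(pstmt, queryString, context,
                                params, dest, completionTag);
        else
            standard_ProcessUtility(pstmt, queryString, context,
                                    params, dest, completionTag);
    }

The ExplainOneQuery_hook change is equally mechanical: hook implementations gain an int cursorOptions argument, which they should pass through to pg_plan_query() instead of choosing planner options themselves.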
/*-------------------------------------------------------------------------
 *
 * parser.c
 *		Main entry point/driver for PostgreSQL grammar
 *
 * Note that the grammar is not allowed to perform any table access
 * (since we need to be able to do basic parsing even while inside an
 * aborted transaction).  Therefore, the data structures returned by
 * the grammar are "raw" parsetrees that still need to be analyzed by
 * analyze.c and related files.
 *
 *
 * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *	  src/backend/parser/parser.c
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include "parser/gramparse.h"
#include "parser/parser.h"

/*
 * raw_parser
 *		Given a query in string form, do lexical and grammatical analysis.
 *
 * Returns a list of raw (un-analyzed) parse trees.  The immediate elements
 * of the list are always RawStmt nodes.
 */
List *
raw_parser(const char *str)
{
    core_yyscan_t yyscanner;
    base_yy_extra_type yyextra;
    int         yyresult;

    /* initialize the flex scanner */
    yyscanner = scanner_init(str, &yyextra.core_yy_extra,
                             ScanKeywords, NumScanKeywords);

    /* base_yylex() only needs this much initialization */
    yyextra.have_lookahead = false;

    /* initialize the bison parser */
    parser_init(&yyextra);

    /* Parse! */
    yyresult = base_yyparse(yyscanner);

    /* Clean up (release memory) */
    scanner_finish(yyscanner);

    if (yyresult)               /* error */
        return NIL;

    return yyextra.parsetree;
}
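/*
 * Illustrative sketch, added for this writeup (not part of parser.c):
 * consuming raw_parser() output as of this commit.  Each list cell holds
 * a RawStmt, whose fields are approximately
 *
 *      typedef struct RawStmt
 *      {
 *          NodeTag     type;
 *          Node       *stmt;           -- raw parse tree
 *          int         stmt_location;  -- start location, or -1 if unknown
 *          int         stmt_len;       -- length in bytes; 0 = rest of string
 *      } RawStmt;
 *
 * so a caller can walk the list without IsA() tests on the elements:
 *
 *      List       *parsetree_list = raw_parser(query_string);
 *      ListCell   *lc;
 *
 *      foreach(lc, parsetree_list)
 *      {
 *          RawStmt    *rawstmt = (RawStmt *) lfirst(lc);
 *
 *          -- rawstmt->stmt is the statement-type-specific node;
 *          -- process_one_statement() is a hypothetical callee
 *          process_one_statement(rawstmt->stmt);
 *      }
 */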

/*
 * Intermediate filter between parser and core lexer (core_yylex in scan.l).
 *
 * This filter is needed because in some cases the standard SQL grammar
 * requires more than one token lookahead.  We reduce these cases to one-token
 * lookahead by replacing tokens here, in order to keep the grammar LALR(1).
 *
 * Using a filter is simpler than trying to recognize multiword tokens
 * directly in scan.l, because we'd have to allow for comments between the
 * words.  Furthermore it's not clear how to do that without re-introducing
 * scanner backtrack, which would cost more performance than this filter
 * layer does.
 *
 * The filter also provides a convenient place to translate between
 * the core_YYSTYPE and YYSTYPE representations (which are really the
 * same thing anyway, but notationally they're different).
 */
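/*
 * Concrete example (explanatory note added for this writeup): for the
 * input "NULLS FIRST", the core lexer emits NULLS_P and then FIRST_P.
 * base_yylex() below peeks at FIRST_P and hands the grammar NULLS_LA
 * FIRST_P instead, so gram.y can tell "NULLS FIRST" apart from an
 * ordinary identifier spelled "nulls" with only one token of lookahead.
 */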
int
base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
{
    base_yy_extra_type *yyextra = pg_yyget_extra(yyscanner);
    int         cur_token;
    int         next_token;
    int         cur_token_length;
    YYLTYPE     cur_yylloc;

    /* Get next token --- we might already have it */
    if (yyextra->have_lookahead)
    {
        cur_token = yyextra->lookahead_token;
        lvalp->core_yystype = yyextra->lookahead_yylval;
        *llocp = yyextra->lookahead_yylloc;
        *(yyextra->lookahead_end) = yyextra->lookahead_hold_char;
        yyextra->have_lookahead = false;
    }
    else
        cur_token = core_yylex(&(lvalp->core_yystype), llocp, yyscanner);

    /*
     * If this token isn't one that requires lookahead, just return it.  If it
     * does, determine the token length.  (We could get that via strlen(), but
     * since we have such a small set of possibilities, hardwiring seems
     * feasible and more efficient.)
     */
    switch (cur_token)
    {
        case NOT:
            cur_token_length = 3;
            break;
        case NULLS_P:
            cur_token_length = 5;
            break;
        case WITH:
            cur_token_length = 4;
            break;
        default:
            return cur_token;
    }

    /*
     * Identify end+1 of current token.  core_yylex() has temporarily stored a
     * '\0' here, and will undo that when we call it again.  We need to redo
     * it to fully revert the lookahead call for error reporting purposes.
     */
    yyextra->lookahead_end = yyextra->core_yy_extra.scanbuf +
        *llocp + cur_token_length;
    Assert(*(yyextra->lookahead_end) == '\0');

    /*
     * Save and restore *llocp around the call.  It might look like we could
     * avoid this by just passing &lookahead_yylloc to core_yylex(), but that
     * does not work because flex actually holds onto the last-passed pointer
     * internally, and will use that for error reporting.  We need any error
     * reports to point to the current token, not the next one.
     */
    cur_yylloc = *llocp;

    /* Get next token, saving outputs into lookahead variables */
    next_token = core_yylex(&(yyextra->lookahead_yylval), llocp, yyscanner);
    yyextra->lookahead_token = next_token;
    yyextra->lookahead_yylloc = *llocp;

    *llocp = cur_yylloc;

    /* Now revert the un-truncation of the current token */
    yyextra->lookahead_hold_char = *(yyextra->lookahead_end);
    *(yyextra->lookahead_end) = '\0';

    yyextra->have_lookahead = true;

    /* Replace cur_token if needed, based on lookahead */
    switch (cur_token)
    {
        case NOT:
            /* Replace NOT by NOT_LA if it's followed by BETWEEN, IN, etc */
            switch (next_token)
            {
                case BETWEEN:
                case IN_P:
                case LIKE:
                case ILIKE:
                case SIMILAR:
                    cur_token = NOT_LA;
                    break;
            }
            break;

        case NULLS_P:
            /* Replace NULLS_P by NULLS_LA if it's followed by FIRST or LAST */
            switch (next_token)
            {
                case FIRST_P:
                case LAST_P:
                    cur_token = NULLS_LA;
                    break;
            }
            break;

        case WITH:
            /* Replace WITH by WITH_LA if it's followed by TIME or ORDINALITY */
            switch (next_token)
            {
                case TIME:
                case ORDINALITY:
                    cur_token = WITH_LA;
                    break;
            }
            break;
    }

    return cur_token;
}
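For context, the grammar side treats these substituted tokens as keywords that the scanner itself never produces. Abridged and paraphrased from gram.y (production bodies simplified; consult the actual file for the authoritative rules):

    /* these tokens are injected only by parser.c's base_yylex() */
    %token      NOT_LA NULLS_LA WITH_LA

    opt_nulls_order:
                NULLS_LA FIRST_P            { $$ = SORTBY_NULLS_FIRST; }
                | NULLS_LA LAST_P           { $$ = SORTBY_NULLS_LAST; }
                | /* EMPTY */               { $$ = SORTBY_NULLS_DEFAULT; }
            ;

Because NULLS_LA, NOT_LA, and WITH_LA are distinct tokens, the grammar stays LALR(1) even though the surface syntax ("NULLS FIRST", "NOT BETWEEN", "WITH TIME") would otherwise need two tokens of lookahead.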