mirror of
https://github.com/postgres/postgres.git
synced 2025-07-12 21:01:52 +03:00
ecpg: clean up documentation of parse.pl, and add more input checking.
README.parser is the user's manual, such as it is, for parse.pl. It's rather poorly written if you ask me; so try to improve it. (More could be written here, but this at least covers the same info in a more organized fashion.)

Also, the single solitary line of usage info in parse.pl itself was a lie. Replace.

Add some error checks that the ecpg.addons entries meet the syntax rules set forth in README.parser. One of them didn't, but accidentally worked anyway because the logic in include_addon is such that 'block' is the default behavior.

Also add a cross-check that each ecpg.addons entry is matched exactly once in the backend grammar. This exposed that there are two dead entries there --- they are dead because the %replace_types table in parse.pl causes their nonterminals to be ignored altogether. Removing them doesn't change the generated preproc.y file.

(This implies that check_rules.pl is completely worthless and should be nuked: it adds build cycles and maintenance effort while failing to reliably accomplish its one job of detecting dead rules. I'll do that separately.)

Discussion: https://postgr.es/m/2011420.1713493114@sss.pgh.pa.us
@@ -1,42 +1,77 @@
-ECPG modifies and extends the core grammar in a way that
-1) every token in ECPG is <str> type. New tokens are
-defined in ecpg.tokens, types are defined in ecpg.type
-2) most tokens from the core grammar are simply converted
-to literals concatenated together to form the SQL string
-passed to the server, this is done by parse.pl.
-3) some rules need side-effects, actions are either added
-or completely overridden (compared to the basic token
-concatenation) for them, these are defined in ecpg.addons,
-the rules for ecpg.addons are explained below.
-4) new grammar rules are needed for ECPG metacommands.
-These are in ecpg.trailer.
-5) ecpg.header contains common functions, etc. used by
-actions for grammar rules.
+ECPG's grammar (preproc.y) is built by parse.pl from the
+backend's grammar (gram.y) plus various add-on rules.
+Some notes:
 
-In "ecpg.addons", every modified rule follows this pattern:
-ECPG: dumpedtokens postfix
-where "dumpedtokens" is simply tokens from core gram.y's
-rules concatenated together. e.g. if gram.y has this:
-ruleA: tokenA tokenB tokenC {...}
-then "dumpedtokens" is "ruleAtokenAtokenBtokenC".
-"postfix" above can be:
-a) "block" - the automatic rule created by parse.pl is completely
-overridden, the code block has to be written completely as
-it were in a plain bison grammar
-b) "rule" - the automatic rule is extended on, so new syntaxes
-are accepted for "ruleA". E.g.:
-ECPG: ruleAtokenAtokenBtokenC rule
-| tokenD tokenE { action_code; }
-...
-It will be substituted with:
-ruleA: <original syntax forms and actions up to and including
-"tokenA tokenB tokenC">
-| tokenD tokenE { action_code; }
-...
-c) "addon" - the automatic action for the rule (SQL syntax constructed
-from the tokens concatenated together) is prepended with a new
-action code part. This code part is written as is's already inside
-the { ... }
+1) Most input matching core grammar productions is simply converted
+to strings and concatenated together to form the SQL string
+passed to the server. parse.pl can automatically build the
+grammar actions needed to do this.
+2) Some grammar rules need special actions that are added to or
+completely override the default token-concatenation behavior.
+This is controlled by ecpg.addons as explained below.
+3) Additional grammar rules are needed for ECPG's own commands.
+These are in ecpg.trailer, as is the "epilogue" part of preproc.y.
+4) ecpg.header contains the "prologue" part of preproc.y, including
+support functions, Bison options, etc.
+5) Additional terminals added by ECPG must be defined in ecpg.tokens.
+Additional nonterminals added by ECPG must be defined in ecpg.type.
 
-Multiple "addon" or "block" lines may appear together with the
-new code block if the code block is common for those rules.
+ecpg.header, ecpg.tokens, ecpg.type, and ecpg.trailer are just
+copied verbatim into preproc.y at appropriate points.
+
+ecpg.addons contains entries that begin with a line like
+ECPG: concattokens ruletype
+and typically have one or more following lines that are the code
+for a grammar action. Any line not starting with "ECPG:" is taken
+to be part of the code block for the preceding "ECPG:" line.
+
+"concattokens" identifies which gram.y production this entry affects.
+It is simply the target nonterminal and the tokens from the gram.y rule
+concatenated together. For example, to modify the action for a gram.y
+rule like this:
+target: tokenA tokenB tokenC {...}
+"concattokens" would be "targettokenAtokenBtokenC". If we want to
+modify a non-first alternative for a nonterminal, we still write the
+nonterminal. For example, "concattokens" should be "targettokenDtokenE"
+to affect the second alternative in:
+target: tokenA tokenB tokenC {...}
+| tokenD tokenE {...}
+
+"ruletype" is one of:
+
+a) "block" - the automatic action that parse.pl would create is
+completely overridden. Instead the entry's code block is emitted.
+The code block must include the braces ({}) needed for a Bison action.
+
+b) "addon" - the entry's code block is inserted into the generated
+action, ahead of the automatic token-concatenation code.
+In this case the code block need not contain braces, since
+it will be inserted within braces.
+
+c) "rule" - the automatic action is emitted, but then the entry's
+code block is added verbatim afterwards. This typically is
+used to add new alternatives to a nonterminal of the core grammar.
+For example, given the entry:
+ECPG: targettokenAtokenBtokenC rule
+| tokenD tokenE { custom_action; }
+what will be emitted is
+target: tokenA tokenB tokenC { automatic_action; }
+| tokenD tokenE { custom_action; }
+
+Multiple "ECPG:" entries can share the same code block, if the
+same action is needed for all. When an "ECPG:" line is immediately
+followed by another one, it is not assigned an empty code block;
+rather the next nonempty code block is assumed to apply to all
+immediately preceding "ECPG:" entries.
+
+In addition to the modifications specified by ecpg.addons,
+parse.pl contains some tables that list backend grammar
+productions to be ignored or modified.
+
+Nonterminals that construct strings (as described above) should be
+given <str> type, which is parse.pl's default assumption for
+nonterminals found in gram.y. That can be overridden at need by
+making an entry in parse.pl's %replace_types table. %replace_types
+can also be used to suppress output of a nonterminal's rules
+altogether (in which case ecpg.trailer had better provide replacement
+rules, since the nonterminal will still be referred to elsewhere).
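To illustrate the README text in the hunk above, here is a minimal Perl sketch of how a "concattokens" key is formed and how the three ruletypes shape what is emitted. It is not code from parse.pl: the helper names `concattokens` and `emit_action` are hypothetical, the `$auto` argument merely stands in for the automatic token-concatenation code, and the real script handles cases this sketch ignores.

```perl
use strict;
use warnings;

# Hypothetical helper (not in parse.pl): build the "concattokens" key for
# one alternative of a gram.y rule from its target nonterminal and tokens.
sub concattokens
{
	my ($target, @tokens) = @_;
	return join('', $target, @tokens);
}

# Hypothetical helper (not in parse.pl): show roughly what follows the
# rule's tokens for each ruletype.  $auto stands in for the automatic
# token-concatenation code; @code is the entry's code block.
sub emit_action
{
	my ($ruletype, $auto, @code) = @_;
	my $block = join(' ', @code);

	# "block": the entry's code replaces the automatic action entirely,
	# so it must supply its own braces.
	return $block if $ruletype eq 'block';

	# "addon": the entry's code goes inside the braces, ahead of the
	# automatic code.
	return "{ $block $auto }" if $ruletype eq 'addon';

	# "rule": the automatic action is kept and the entry's code is
	# appended verbatim, typically as extra alternatives.
	return "{ $auto } $block" if $ruletype eq 'rule';

	die "invalid record type $ruletype\n";
}
```

For the README's own example, `concattokens('target', qw(tokenD tokenE))` yields the key for the second alternative of `target`, and `emit_action('rule', ...)` leaves the automatic action in place before the added alternative.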
@@ -497,7 +497,7 @@ ECPG: opt_array_boundsopt_array_bounds'['']' block
 		$$.index2 = mm_strdup($3);
 		$$.str = cat_str(4, $1.str, mm_strdup("["), $3, mm_strdup("]"));
 	}
-ECPG: opt_array_bounds
+ECPG: opt_array_bounds block
 	{
 		$$.index1 = mm_strdup("-1");
 		$$.index2 = mm_strdup("-1");
@@ -510,15 +510,6 @@ ECPG: IconstICONST block
 ECPG: AexprConstNULL_P rule
 	| civar { $$ = $1; }
 	| civarind { $$ = $1; }
-ECPG: ColIdcol_name_keyword rule
-	| ECPGKeywords { $$ = $1; }
-	| ECPGCKeywords { $$ = $1; }
-	| CHAR_P { $$ = mm_strdup("char"); }
-	| VALUES { $$ = mm_strdup("values"); }
-ECPG: type_function_nametype_func_name_keyword rule
-	| ECPGKeywords { $$ = $1; }
-	| ECPGTypeName { $$ = $1; }
-	| ECPGCKeywords { $$ = $1; }
 ECPG: VariableShowStmtSHOWALL block
 	{
 		mmerror(PARSE_ERROR, ET_ERROR, "SHOW ALL is not implemented");
@@ -1,7 +1,13 @@
 #!/usr/bin/perl
 # src/interfaces/ecpg/preproc/parse.pl
-# parser generator for ecpg version 2
-# call with backend parser as stdin
+# parser generator for ecpg
+#
+# See README.parser for some explanation of what this does.
+#
+# Command-line options:
+# --srcdir: where to find ecpg-provided input files (default ".")
+# --parser: the backend gram.y file to read (required, no default)
+# --output: where to write preproc.y (required, no default)
 #
 # Copyright (c) 2007-2024, PostgreSQL Global Development Group
 #
@@ -148,6 +154,14 @@ dump_buffer('trailer');
 
 close($parserfh);
 
+# Cross-check that we don't have dead or ambiguous addon rules.
+foreach (keys %addons)
+{
+	die "addon rule $_ was never used\n" if $addons{$_}{used} == 0;
+	die "addon rule $_ was matched multiple times\n" if $addons{$_}{used} > 1;
+}
+
+
 sub main
 {
 	line: while (<$parserfh>)
@@ -487,7 +501,10 @@ sub include_addon
 	my $rec = $addons{$block};
 	return 0 unless $rec;
 
-	my $rectype = (defined $rec->{type}) ? $rec->{type} : '';
+	# Track usage for later cross-check
+	$rec->{used}++;
+
+	my $rectype = $rec->{type};
 	if ($rectype eq 'rule')
 	{
 		dump_fields($stmt_mode, $fields, ' { ');
@@ -668,10 +685,10 @@ sub dump_line
 }
 
 =top
-load addons into cache
+load ecpg.addons into %addons hash. The result is something like
 %addons = {
-stmtClosePortalStmt => { 'type' => 'block', 'lines' => [ "{", "if (INFORMIX_MODE)" ..., "}" ] },
-stmtViewStmt => { 'type' => 'rule', 'lines' => [ "| ECPGAllocateDescr", ... ] }
+stmtClosePortalStmt => { 'type' => 'block', 'lines' => [ "{", "if (INFORMIX_MODE)" ..., "}" ], 'used' => 0 },
+stmtViewStmt => { 'type' => 'rule', 'lines' => [ "| ECPGAllocateDescr", ... ], 'used' => 0 }
 }
 
 =cut
@@ -681,17 +698,25 @@ sub preload_addons
 	my $filename = $srcdir . "/ecpg.addons";
 	open(my $fh, '<', $filename) or die;
 
-	# there may be multiple lines starting ECPG: and then multiple lines of code.
-	# the code need to be add to all prior ECPG records.
-	my (@needsRules, @code, $record);
+	# There may be multiple "ECPG:" lines and then multiple lines of code.
+	# The block of code needs to be added to each of the consecutively-
+	# preceding "ECPG:" records.
+	my (@needsRules, @code);
 
-	# there may be comments before the first ECPG line, skip them
+	# there may be comments before the first "ECPG:" line, skip them
 	my $skip = 1;
 	while (<$fh>)
 	{
-		if (/^ECPG:\s(\S+)\s?(\w+)?/)
+		if (/^ECPG:\s+(\S+)\s+(\w+)\s*$/)
 		{
+			# Found an "ECPG:" line, so we're done skipping the header
 			$skip = 0;
+			# Validate record type and target
+			die "invalid record type $2 in addon rule for $1\n"
+			  unless ($2 eq 'block' or $2 eq 'addon' or $2 eq 'rule');
+			die "duplicate addon rule for $1\n" if (exists $addons{$1});
+			# If we had some preceding code lines, attach them to all
+			# as-yet-unfinished records.
 			if (@code)
 			{
 				for my $x (@needsRules)
@@ -701,20 +726,27 @@ sub preload_addons
 				@code = ();
 				@needsRules = ();
 			}
-			$record = {};
+			my $record = {};
 			$record->{type} = $2;
 			$record->{lines} = [];
-			if (exists $addons{$1}) { die "Ga! there are dups!\n"; }
+			$record->{used} = 0;
 			$addons{$1} = $record;
 			push(@needsRules, $record);
 		}
+		elsif (/^ECPG:/)
+		{
+			# Complain if preceding regex failed to match
+			die "incorrect syntax in ECPG line: $_\n";
+		}
 		else
 		{
+			# Non-ECPG line: add to @code unless we're still skipping
 			next if $skip;
 			push(@code, $_);
 		}
 	}
 	close($fh);
+	# Deal with final code block
 	if (@code)
 	{
 		for my $x (@needsRules)