ecpg: clean up documentation of parse.pl, and add more input checking.

README.parser is the user's manual, such as it is, for parse.pl. It's rather poorly written if you ask me; so try to improve it. (More could be written here, but this at least covers the same info in a more organized fashion.)

Also, the single solitary line of usage info in parse.pl itself was a lie. Replace.

Add some error checks that the ecpg.addons entries meet the syntax rules set forth in README.parser. One of them didn't, but accidentally worked anyway because the logic in include_addon is such that 'block' is the default behavior.

Also add a cross-check that each ecpg.addons entry is matched exactly once in the backend grammar. This exposed that there are two dead entries there --- they are dead because the %replace_types table in parse.pl causes their nonterminals to be ignored altogether. Removing them doesn't change the generated preproc.y file.

(This implies that check_rules.pl is completely worthless and should be nuked: it adds build cycles and maintenance effort while failing to reliably accomplish its one job of detecting dead rules. I'll do that separately.)

Discussion: https://postgr.es/m/2011420.1713493114@sss.pgh.pa.us
2025-07-12 21:01:52 +03:00 · 2024-10-14 13:29:36 -04:00
parent 7be4ba4a9d
commit 00b0e7204d
3 changed files with 121 additions and 63 deletions
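The checks described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the code in parse.pl: the helper names load_addons and check_usage are hypothetical, the input is passed as a list of lines rather than read from ecpg.addons, and the step that would normally bump the "used" counter (include_addon during grammar processing) is simulated by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse ecpg.addons-style lines into a hash keyed by
# "concattokens".  Each record carries its type, code lines, and a usage count.
sub load_addons
{
	my @lines = @_;
	my (%addons, @pending, @code);
	my $skip = 1;    # ignore anything before the first "ECPG:" line

	# Attach the accumulated code block to every record still waiting for one.
	my $flush = sub {
		return unless @code;
		push(@{ $_->{lines} }, @code) for @pending;
		@code    = ();
		@pending = ();
	};

	foreach my $line (@lines)
	{
		if ($line =~ /^ECPG:\s+(\S+)\s+(\w+)\s*$/)
		{
			my ($target, $type) = ($1, $2);
			$skip = 0;
			die "invalid record type $type in addon rule for $target\n"
			  unless $type eq 'block' or $type eq 'addon' or $type eq 'rule';
			die "duplicate addon rule for $target\n" if exists $addons{$target};
			$flush->();
			my $record = { type => $type, lines => [], used => 0 };
			$addons{$target} = $record;
			push(@pending, $record);
		}
		elsif ($line =~ /^ECPG:/)
		{
			# Strict regex failed, e.g. the record type is missing
			die "incorrect syntax in ECPG line: $line\n";
		}
		else
		{
			next if $skip;
			push(@code, $line);
		}
	}
	$flush->();
	return \%addons;
}

# Hypothetical cross-check: every entry must have been matched exactly once.
sub check_usage
{
	my ($addons) = @_;
	foreach my $key (sort keys %$addons)
	{
		die "addon rule $key was never used\n" if $addons->{$key}{used} == 0;
		die "addon rule $key was matched multiple times\n"
		  if $addons->{$key}{used} > 1;
	}
	return 1;
}

# Two consecutive "ECPG:" lines share the code block that follows them.
my $addons = load_addons(
	"/* header comment, skipped */\n",
	"ECPG: targettokenAtokenBtokenC block\n",
	"ECPG: targettokenDtokenE block\n",
	"\t{ shared_action(); }\n");
$addons->{$_}{used}++ for keys %$addons;    # pretend each was matched once
check_usage($addons);
print "ok\n";
```

With the stricter pattern, a line such as "ECPG: opt_array_bounds" (no record type) is rejected instead of silently falling through to the 'block' behavior, which is the case the ecpg.addons hunk below corrects.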
--- a/src/interfaces/ecpg/preproc/README.parser
+++ b/src/interfaces/ecpg/preproc/README.parser
@@ -1,42 +1,77 @@
-ECPG modifies and extends the core grammar in a way that
-1) every token in ECPG is <str> type. New tokens are
-   defined in ecpg.tokens, types are defined in ecpg.type
-2) most tokens from the core grammar are simply converted
-   to literals concatenated together to form the SQL string
-   passed to the server, this is done by parse.pl.
-3) some rules need side-effects, actions are either added
-   or completely overridden (compared to the basic token
-   concatenation) for them, these are defined in ecpg.addons,
-   the rules for ecpg.addons are explained below.
-4) new grammar rules are needed for ECPG metacommands.
-   These are in ecpg.trailer.
-5) ecpg.header contains common functions, etc. used by
-   actions for grammar rules.
-In "ecpg.addons", every modified rule follows this pattern:
-       ECPG: dumpedtokens postfix
-where "dumpedtokens" is simply tokens from core gram.y's
-rules concatenated together. e.g. if gram.y has this:
-       ruleA: tokenA tokenB tokenC {...}
-then "dumpedtokens" is "ruleAtokenAtokenBtokenC".
-"postfix" above can be:
-a) "block" - the automatic rule created by parse.pl is completely
-    overridden, the code block has to be written completely as
-    it were in a plain bison grammar
-b) "rule" - the automatic rule is extended on, so new syntaxes
-    are accepted for "ruleA". E.g.:
-      ECPG: ruleAtokenAtokenBtokenC rule
-         | tokenD tokenE { action_code; }
-         ...
-   It will be substituted with:
-     ruleA: <original syntax forms and actions up to and including
-                   "tokenA tokenB tokenC">
-            | tokenD tokenE { action_code; }
-            ...
-c) "addon" - the automatic action for the rule (SQL syntax constructed
-   from the tokens concatenated together) is prepended with a new
-   action code part. This code part is written as is's already inside
-   the { ... }
-Multiple "addon" or "block" lines may appear together with the
-new code block if the code block is common for those rules.
+ECPG's grammar (preproc.y) is built by parse.pl from the
+backend's grammar (gram.y) plus various add-on rules.
+Some notes:
+1) Most input matching core grammar productions is simply converted
+   to strings and concatenated together to form the SQL string
+   passed to the server.  parse.pl can automatically build the
+   grammar actions needed to do this.
+2) Some grammar rules need special actions that are added to or
+   completely override the default token-concatenation behavior.
+   This is controlled by ecpg.addons as explained below.
+3) Additional grammar rules are needed for ECPG's own commands.
+   These are in ecpg.trailer, as is the "epilogue" part of preproc.y.
+4) ecpg.header contains the "prologue" part of preproc.y, including
+   support functions, Bison options, etc.
+5) Additional terminals added by ECPG must be defined in ecpg.tokens.
+   Additional nonterminals added by ECPG must be defined in ecpg.type.
+ecpg.header, ecpg.tokens, ecpg.type, and ecpg.trailer are just
+copied verbatim into preproc.y at appropriate points.
+ecpg.addons contains entries that begin with a line like
+      ECPG: concattokens ruletype
+and typically have one or more following lines that are the code
+for a grammar action.  Any line not starting with "ECPG:" is taken
+to be part of the code block for the preceding "ECPG:" line.
+"concattokens" identifies which gram.y production this entry affects.
+It is simply the target nonterminal and the tokens from the gram.y rule
+concatenated together.  For example, to modify the action for a gram.y
+rule like this:
+     target: tokenA tokenB tokenC {...}
+"concattokens" would be "targettokenAtokenBtokenC".  If we want to
+modify a non-first alternative for a nonterminal, we still write the
+nonterminal.  For example, "concattokens" should be "targettokenDtokenE"
+to affect the second alternative in:
+     target: tokenA tokenB tokenC {...}
+             | tokenD tokenE {...}
+"ruletype" is one of:
+a) "block" - the automatic action that parse.pl would create is
+   completely overridden.  Instead the entry's code block is emitted.
+   The code block must include the braces ({}) needed for a Bison action.
+b) "addon" - the entry's code block is inserted into the generated
+   action, ahead of the automatic token-concatenation code.
+   In this case the code block need not contain braces, since
+   it will be inserted within braces.
+c) "rule" - the automatic action is emitted, but then the entry's
+   code block is added verbatim afterwards.  This typically is
+   used to add new alternatives to a nonterminal of the core grammar.
+   For example, given the entry:
+     ECPG: targettokenAtokenBtokenC rule
+         | tokenD tokenE { custom_action; }
+   what will be emitted is
+     target: tokenA tokenB tokenC { automatic_action; }
+         | tokenD tokenE { custom_action; }
+Multiple "ECPG:" entries can share the same code block, if the
+same action is needed for all.  When an "ECPG:" line is immediately
+followed by another one, it is not assigned an empty code block;
+rather the next nonempty code block is assumed to apply to all
+immediately preceding "ECPG:" entries.
+In addition to the modifications specified by ecpg.addons,
+parse.pl contains some tables that list backend grammar
+productions to be ignored or modified.
+Nonterminals that construct strings (as described above) should be
+given <str> type, which is parse.pl's default assumption for
+nonterminals found in gram.y.  That can be overridden at need by
+making an entry in parse.pl's %replace_types table.  %replace_types
+can also be used to suppress output of a nonterminal's rules
+altogether (in which case ecpg.trailer had better provide replacement
+rules, since the nonterminal will still be referred to elsewhere).
--- a/src/interfaces/ecpg/preproc/ecpg.addons
+++ b/src/interfaces/ecpg/preproc/ecpg.addons
@@ -497,7 +497,7 @@ ECPG: opt_array_boundsopt_array_bounds'['']' block
 			$$.index2 = mm_strdup($3);
 		$$.str = cat_str(4, $1.str, mm_strdup("["), $3, mm_strdup("]"));
 	}
-ECPG: opt_array_bounds
+ECPG: opt_array_bounds block
 	{
 		$$.index1 = mm_strdup("-1");
 		$$.index2 = mm_strdup("-1");
@@ -510,15 +510,6 @@ ECPG: IconstICONST block
 ECPG: AexprConstNULL_P rule
 	| civar							{ $$ = $1; }
 	| civarind						{ $$ = $1; }
-ECPG: ColIdcol_name_keyword rule
-	| ECPGKeywords					{ $$ = $1; }
-	| ECPGCKeywords					{ $$ = $1; }
-	| CHAR_P						{ $$ = mm_strdup("char"); }
-	| VALUES						{ $$ = mm_strdup("values"); }
-ECPG: type_function_nametype_func_name_keyword rule
-	| ECPGKeywords					{ $$ = $1; }
-	| ECPGTypeName					{ $$ = $1; }
-	| ECPGCKeywords					{ $$ = $1; }
 ECPG: VariableShowStmtSHOWALL block
 	{
 		mmerror(PARSE_ERROR, ET_ERROR, "SHOW ALL is not implemented");
--- a/src/interfaces/ecpg/preproc/parse.pl
+++ b/src/interfaces/ecpg/preproc/parse.pl
@@ -1,7 +1,13 @@
 #!/usr/bin/perl
 # src/interfaces/ecpg/preproc/parse.pl
-# parser generator for ecpg version 2
-# call with backend parser as stdin
+# parser generator for ecpg
+#
+# See README.parser for some explanation of what this does.
+#
+# Command-line options:
+#   --srcdir: where to find ecpg-provided input files (default ".")
+#   --parser: the backend gram.y file to read (required, no default)
+#   --output: where to write preproc.y (required, no default)
+#
 # Copyright (c) 2007-2024, PostgreSQL Global Development Group
 #
@@ -148,6 +154,14 @@ dump_buffer('trailer');
 close($parserfh);
+# Cross-check that we don't have dead or ambiguous addon rules.
+foreach (keys %addons)
+{
+	die "addon rule $_ was never used\n" if $addons{$_}{used} == 0;
+	die "addon rule $_ was matched multiple times\n" if $addons{$_}{used} > 1;
+}
 sub main
 {
  line: while (<$parserfh>)
@@ -487,7 +501,10 @@ sub include_addon
 	my $rec = $addons{$block};
 	return 0 unless $rec;
-	my $rectype = (defined $rec->{type}) ? $rec->{type} : '';
+	# Track usage for later cross-check
+	$rec->{used}++;
+	my $rectype = $rec->{type};
 	if ($rectype eq 'rule')
 	{
 		dump_fields($stmt_mode, $fields, ' { ');
@@ -668,10 +685,10 @@ sub dump_line
 }
 =top
-	load addons into cache
+	load ecpg.addons into %addons hash.  The result is something like
 	%addons = {
-		stmtClosePortalStmt => { 'type' => 'block', 'lines' => [ "{", "if (INFORMIX_MODE)" ..., "}" ] },
+		stmtClosePortalStmt => { 'type' => 'block', 'lines' => [ "{", "if (INFORMIX_MODE)" ..., "}" ], 'used' => 0 },
-		stmtViewStmt => { 'type' => 'rule', 'lines' => [ "| ECPGAllocateDescr", ... ] }
+		stmtViewStmt => { 'type' => 'rule', 'lines' => [ "| ECPGAllocateDescr", ... ], 'used' => 0 }
 	}
 =cut
@@ -681,17 +698,25 @@ sub preload_addons
 	my $filename = $srcdir . "/ecpg.addons";
 	open(my $fh, '<', $filename) or die;
-	# there may be multiple lines starting ECPG: and then multiple lines of code.
-	# the code need to be add to all prior ECPG records.
-	my (@needsRules, @code, $record);
+	# There may be multiple "ECPG:" lines and then multiple lines of code.
+	# The block of code needs to be added to each of the consecutively-
+	# preceding "ECPG:" records.
+	my (@needsRules, @code);
-	# there may be comments before the first ECPG line, skip them
+	# there may be comments before the first "ECPG:" line, skip them
 	my $skip = 1;
 	while (<$fh>)
 	{
-		if (/^ECPG:\s(\S+)\s?(\w+)?/)
+		if (/^ECPG:\s+(\S+)\s+(\w+)\s*$/)
 		{
+			# Found an "ECPG:" line, so we're done skipping the header
 			$skip = 0;
+			# Validate record type and target
+			die "invalid record type $2 in addon rule for $1\n"
+			  unless ($2 eq 'block' or $2 eq 'addon' or $2 eq 'rule');
+			die "duplicate addon rule for $1\n" if (exists $addons{$1});
+			# If we had some preceding code lines, attach them to all
+			# as-yet-unfinished records.
 			if (@code)
 			{
 				for my $x (@needsRules)
@@ -701,20 +726,27 @@ sub preload_addons
 				@code = ();
 				@needsRules = ();
 			}
-			$record = {};
+			my $record = {};
 			$record->{type} = $2;
 			$record->{lines} = [];
-			if (exists $addons{$1}) { die "Ga! there are dups!\n"; }
+			$record->{used} = 0;
 			$addons{$1} = $record;
 			push(@needsRules, $record);
 		}
+		elsif (/^ECPG:/)
+		{
+			# Complain if preceding regex failed to match
+			die "incorrect syntax in ECPG line: $_\n";
+		}
 		else
 		{
+			# Non-ECPG line: add to @code unless we're still skipping
 			next if $skip;
 			push(@code, $_);
 		}
 	}
 	close($fh);
+	# Deal with final code block
 	if (@code)
 	{
 		for my $x (@needsRules)