1
0
mirror of https://github.com/postgres/postgres.git synced 2025-10-25 13:17:41 +03:00
Commit Graph

489 Commits

Author SHA1 Message Date
Daniel Gustafsson
aa12781b0d Improve publication error messages
Commit 81d5995b4b introduced more fine-grained errormessages for
incorrect relkinds for publication, while unlogged and temporary
tables were reported with using the same message.  This provides
separate error messages for these types of relpersistence.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CALj2ACW9S=AswyQHjtO6WMcsergMkCBTtzXGrM8DX26DzfeTLQ@mail.gmail.com
2021-11-17 14:40:38 +01:00
Tom Lane
f8abb0f5e1 postgres_fdw: suppress casts on constants in limited cases.
When deparsing an expression of the form "remote_var OP constant",
we'd normally apply a cast to the constant to make sure that the
remote parser thinks it's of the same type we do.  However, doing
so is often not necessary, and it causes problems if the user has
intentionally declared the local column as being of a different
type than the remote column.  A plausible use-case for that is
using text to represent a type that's an enum on the remote side.
A comparison on such a column will get shipped as "var = 'foo'::text",
which blows up on the remote side because there's no enum = text
operator.  But if we simply leave off the explicit cast, the
comparison will do exactly what the user wants.

It's possible to do this without major risk of semantic problems, by
relying on the longstanding parser heuristic that "if one operand of
an operator is of type unknown, while the other one has a known type,
assume that the unknown operand is also of that type".  Hence, this
patch leaves off the cast only if (a) the operator inputs have the same
type locally; (b) the constant will print as a string literal or NULL,
both of which are initially taken as type unknown; and (c) the non-Const
input is a plain foreign Var.  Rule (c) guarantees that the remote
parser will know the type of the non-Const input; moreover, it means
that if this cast-omission does cause any semantic surprises, that can
only happen in cases where the local column has a different type than
the remote column.  That wasn't guaranteed to work anyway, and this
patch should represent a net usability gain for such cases.

One point that I (tgl) remain slightly uncomfortable with is that we
will ignore an implicit RelabelType when deciding if the non-Const input
is a plain Var.  That makes it a little squishy to argue that the remote
should resolve the Const as being of the same type as its Var, because
then our Const is not the same type as our Var.  However, if we don't do
that, then this hack won't work as desired if the user chooses to use
varchar rather than text to represent some remote column.  That seems
useful, so do it like this for now.  We might have to give up the
RelabelType-ignoring bit if any problems surface.

Dian Fay, with review and kibitzing by me

Discussion: https://postgr.es/m/C9LU294V7K4F.34LRRDU449O45@lamia
2021-11-12 11:50:47 -05:00
Fujii Masao
5fedf7417b Improve HINT message that FDW reports when there are no valid options.
The foreign data wrapper's validator function provides a HINT message with
list of valid options for the object specified in CREATE or ALTER command,
when the option given in the command is invalid. Previously
postgresql_fdw_validator() and the validator functions for postgres_fdw and
dblink_fdw worked in that way even there were no valid options in the object,
which could lead to the HINT message with empty list (because there were
no valid options). For example, ALTER FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (format 'csv') reported the following ERROR and HINT messages.
This behavior was confusing.

    ERROR: invalid option "format"
    HINT: Valid options in this context are:

There is no such issue in file_fdw. The validator function for file_fdw
reports the HINT message "There are no valid options in this context."
instead in that case.

This commit improves postgresql_fdw_validator() and the validator functions
for postgres_fdw and dblink_fdw so that they do likewise. For example,
this change causes the above ALTER FOREIGN DATA WRAPPER command to
report the following messages.

    ERROR:  invalid option "nonexistent"
    HINT:  There are no valid options in this context.

Author: Kosei Masumura
Reviewed-by: Bharath Rupireddy, Fujii Masao
Discussion: https://postgr.es/m/557d06cebe19081bfcc83ee2affc98d3@oss.nttdata.com
2021-10-27 00:46:52 +09:00
Etsuro Fujita
8c7be86883 postgres_fdw: Move comments about elog level in (sub)abort cleanup.
The comments were misplaced when adding postgres_fdw.  Fix that by
moving the comments to more appropriate functions.

Author: Etsuro Fujita
Backpatch-through: 9.6
Discussion: https://postgr.es/m/CAPmGK164sAXQtC46mDFyu6d-T25Mzvh5qaRNkit06VMmecYnOA%40mail.gmail.com
2021-10-13 19:00:00 +09:00
Etsuro Fujita
972c7c6567 postgres_fdw: Fix comments in connection.c.
Commit 27e1f1456 missed updating some comments.

Reviewed-by: Bharath Rupireddy
Backpatch-through: 14
Discussion: https://postgr.es/m/CAPmGK15Q2Nm6U%2Ba_GwskrWFEVBZ9_3VKOvRrprGufpx91M_3Sw%40mail.gmail.com
2021-10-07 18:15:00 +09:00
Tom Lane
3071bbfe44 Fix null-pointer crash in postgres_fdw's conversion_error_callback.
Commit c7b7311f6 adjusted conversion_error_callback to always use
information from the query's rangetable, to avoid doing catalog lookups
in an already-failed transaction.  However, as a result of the utterly
inadequate documentation for make_tuple_from_result_row, I failed to
realize that fsstate could be NULL in some contexts.  That led to a
crash if we got a conversion error in such a context.  Fix by falling
back to the previous coding when fsstate is NULL.  Improve the
commentary, too.

Per report from Andrey Borodin.  Back-patch to 9.6, like the previous
patch.

Discussion: https://postgr.es/m/08916396-55E4-4D68-AB3A-BD6066F9E5C0@yandex-team.ru
2021-10-06 15:50:24 -04:00
Fujii Masao
85c6961128 postgres_fdw: Refactor transaction rollback code to avoid code duplication.
In postgres_fdw, pgfdw_xact_callback() and pgfdw_subxact_callback()
callback functions do almost the same thing to rollback remote toplevel-
and sub-transaction. But previously their such rollback logics were
implemented separately in each function and in different way. Which
could decrease the readability and maintainability of the code.

To fix the issue, this commit creates the common function to rollback
remote transactions, and makes those callback functions use it. Which
allows us to avoid unnecessary code duplication.

Author: Fujii Masao
Reviewed-by: Zhihong Yu, Bharath Rupireddy
Discussion: https://postgr.es/m/62fbb63a-d46c-fb47-a56d-f6be1909aa44@oss.nttdata.com
2021-09-22 23:47:36 +09:00
Noah Misch
b073c3ccd0 Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.
This switches the default ACL to what the documentation has recommended
since CVE-2018-1058.  Upgrades will carry forward any old ownership and
ACL.  Sites that declined the 2018 recommendation should take a fresh
look.  Recipes for commissioning a new database cluster from scratch may
need to create a schema, grant more privileges, etc.  Out-of-tree test
suites may require such updates.

Reviewed by Peter Eisentraut.

Discussion: https://postgr.es/m/20201031163518.GB4039133@rfd.leadboat.com
2021-09-09 23:38:09 -07:00
Peter Eisentraut
639a86e36a Remove Value node struct
The Value node struct is a weird construct.  It is its own node type,
but most of the time, it actually has a node type of Integer, Float,
String, or BitString.  As a consequence, the struct name and the node
type don't match most of the time, and so it has to be treated
specially a lot.  There doesn't seem to be any value in the special
construct.  There is very little code that wants to accept all Value
variants but nothing else (and even if it did, this doesn't provide
any convenient way to check it), and most code wants either just one
particular node type (usually String), or it accepts a broader set of
node types besides just Value.

This change removes the Value struct and node type and replaces them
by separate Integer, Float, String, and BitString node types that are
proper node types and structs of their own and behave mostly like
normal node types.

Also, this removes the T_Null node tag, which was previously also a
possible variant of Value but wasn't actually used outside of the
Value contained in A_Const.  Replace that by an isnull field in
A_Const.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5ba6bc5b-3f95-04f2-2419-f8ddb4c046fb@enterprisedb.com
2021-09-09 08:36:53 +02:00
Fujii Masao
98dbef90eb postgres_fdw: Revert unstable tests for postgres_fdw.application_name.
Commit 449ab63505 added the tests that check that postgres_fdw.application_name
GUC works as expected. But they were unstable and caused some buildfarm
members to report the failure. This commit reverts those unstable tests.

Reported-by: Tom Lane as per buildfarm
Discussion: https://postgr.es/m/3220909.1631054766@sss.pgh.pa.us
2021-09-08 16:28:43 +09:00
Fujii Masao
449ab63505 postgres_fdw: Allow application_name of remote connection to be set via GUC.
This commit adds postgres_fdw.application_name GUC which specifies
a value for application_name configuration parameter used
when postgres_fdw establishes a connection to a foreign server.
This GUC setting always overrides application_name option of
the foreign server object. This GUC is useful when we want to
specify our own application_name per remote connection.

Previously application_name of a remote connection could be set
basically only via options of a server object. But which meant that
every session connecting to the same foreign server basically
should use the same application_name. Also if we want to change
the setting, we had to execute "ALTER SERVER ... OPTIONS ..." command.
It was inconvenient.

Author: Hayato Kuroda
Reviewed-by: Masahiro Ikeda, Fujii Masao
Discussion: https://postgr.es/m/TYCPR01MB5870D1E8B949DAF6D3B84E02F5F29@TYCPR01MB5870.jpnprd01.prod.outlook.com
2021-09-07 12:27:30 +09:00
Tom Lane
2dc53fe2a7 Refactor postgresImportForeignSchema to avoid code duplication.
Avoid repeating fragments of the query we're building, along the
same lines as recent cleanup in pg_dump.  I got annoyed about this
because aa769f80e broke my pending patch to change postgres_fdw's
collation handling, due to each of us having incompletely done
this same refactoring.  Let's finish that job in hopes of having
a more stable base.
2021-09-01 16:21:13 -04:00
Etsuro Fujita
aa769f80ed postgres_fdw: Fix issues with generated columns in foreign tables.
postgres_fdw imported generated columns from the remote tables as plain
columns, and caused failures like "ERROR: cannot insert a non-DEFAULT
value into column "foo"" when inserting into the foreign tables, as it
tried to insert values into the generated columns.  To fix, we do the
following under the assumption that generated columns in a postgres_fdw
foreign table are defined so that they represent generated columns in
the underlying remote table:

* Send DEFAULT for the generated columns to the foreign server on insert
  or update, not generated column values computed on the local server.
* Add to postgresImportForeignSchema() an option "import_generated" to
  include column generated expressions in the definitions of foreign
  tables imported from a foreign server.  The option is true by default.

The assumption seems reasonable, because that would make a query of the
postgres_fdw foreign table return values for the generated columns that
are consistent with the generated expression.

While here, fix another issue in postgresImportForeignSchema(): it tried
to include column generated expressions as column default expressions in
the foreign table definitions when the import_default option was enabled.

Per bug #16631 from Daniel Cherniy.  Back-patch to v12 where generated
columns were added.

Discussion: https://postgr.es/m/16631-e929fe9db0ffc7cf%40postgresql.org
2021-08-05 20:00:00 +09:00
Etsuro Fujita
a8ed9bd59d Fix oversight in commit 1ec7fca859.
I failed to account for the possibility that when
ExecAppendAsyncEventWait() notifies multiple async-capable nodes using
postgres_fdw, a preceding node might invoke process_pending_request() to
process a pending asynchronous request made by a succeeding node.  In
that case the succeeding node should produce a tuple to return to the
parent Append node from tuples fetched by process_pending_request() when
notified.  Repair.

Per buildfarm via Michael Paquier.  Back-patch to v14, like the previous
commit.

Thanks to Tom Lane for testing.

Discussion: https://postgr.es/m/YQP0UPT8KmPiHTMs%40paquier.xyz
2021-08-02 12:45:00 +09:00
Tom Lane
5d44fff01e In postgres_fdw, allow CASE expressions to be pushed to the remote server.
This is simple enough except for the need to check whether CaseTestExpr
nodes have a collation that is not derived from a remote Var.  For that,
examine the CASE's "arg" expression and then pass that info down into the
recursive examination of the WHEN expressions.

Alexander Pyhalov, reviewed by Gilles Darold and myself

Discussion: https://postgr.es/m/fda09032e90d85d9b726a41e03f9097f@postgrespro.ru
2021-07-30 13:39:48 -04:00
Etsuro Fujita
1ec7fca859 postgres_fdw: Fix handling of pending asynchronous requests.
A pending asynchronous request is handled by process_pending_request(),
which previously not only processed an in-progress remote query but
performed ExecForeignScan() to produce a tuple to return to the local
server asynchronously from the result of the remote query.  But that led
to a server crash when executing a query or led to an "InstrStartNode
called twice in a row" or "InstrEndLoop called on running node" failure
when doing EXPLAIN ANALYZE of it, in cases where the plan tree for it
contained multiple async-capable nodes accessing the same
initplan/subplan that contained multiple async-capable nodes scanning
the same foreign tables as for the parent async-capable nodes, as
reported by Andrey Lepikhov.  The reason is that the second step in
process_pending_request() invoked when executing the initplan/subplan
for one of the parent async-capable nodes caused recursive execution of
the initplan/subplan for another of the parent async-capable nodes.

To fix, split process_pending_request() into the two steps and postpone
the second step until ForeignAsyncConfigureWait() is called for each of
the pending asynchronous requests.  Also, in ExecAppendAsyncEventWait()
we assumed that FDWs would register at least one wait event in a
WaitEventSet created there when they were called from
ForeignAsyncConfigureWait() in that function, but allow FDWs to register
zero wait events in the WaitEventSet; modify ExecAppendAsyncEventWait()
to just return in that case.

Oversight in commit 27e1f1456.  Back-patch to v14 where that commit went
in.

Andrey Lepikhov and Etsuro Fujita

Discussion: https://postgr.es/m/fe5eaa19-1704-e4a4-76ee-3b9d37ade399@postgrespro.ru
2021-07-30 17:00:00 +09:00
Fujii Masao
0e1275fb07 Avoid using ambiguous word "non-negative" in error messages.
The error messages using the word "non-negative" are confusing
because it's ambiguous about whether it accepts zero or not.
This commit improves those error messages by replacing it with
less ambiguous word like "greater than zero" or
"greater than or equal to zero".

Also this commit added the note about the word "non-negative" to
the error message style guide, to help writing the new error messages.

When postgres_fdw option fetch_size was set to zero, previously
the error message "fetch_size requires a non-negative integer value"
was reported. This error message was outright buggy. Therefore
back-patch to all supported versions where such buggy error message
could be thrown.

Reported-by: Hou Zhijie
Author: Bharath Rupireddy
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/OS0PR01MB5716415335A06B489F1B3A8194569@OS0PR01MB5716.jpnprd01.prod.outlook.com
2021-07-28 01:20:16 +09:00
Tom Lane
48c5c90682 Use the "pg_temp" schema alias in EXPLAIN and related output.
This patch causes EXPLAIN output to refer to objects that are in
the current session's temp schema with the "pg_temp" schema alias
rather than that schema's actual name.  This is useful for our own
testing purposes since it will stabilize EXPLAIN VERBOSE output
for such cases, allowing us to use that in regression tests.
It should be less confusing for end users too.

Since ruleutils.c needs to change behavior for this, the change
also leaks into a few other users of ruleutils.c, for example
pg_get_viewdef().  AFAICS that won't cause any problems.
We did find that aggressively trying to change this behavior
across-the-board would cause issues, but as long as "pg_temp"
only appears within generated SQL text, I think it'll be fine.

Along the way, make get_namespace_name_or_temp conform to the
same API as get_namespace_name, ie that it returns a palloc'd
string or NULL.  The current behavior hasn't caused any bugs
since no callers attempt to pfree the result, but if it gets
more widespread usage that could become a problem.

Amul Sul, reviewed and extended by me

Discussion: https://postgr.es/m/CAAJ_b97W=QaGmag9AhWNbmx3uEYsNkXWL+OVW1_E1D3BtgWvtw@mail.gmail.com
2021-07-27 12:03:16 -04:00
David Rowley
83f4fcc655 Change the name of the Result Cache node to Memoize
"Result Cache" was never a great name for this node, but nobody managed
to come up with another name that anyone liked enough.  That was until
David Johnston mentioned "Node Memoization", which Tom Lane revised to
just "Memoize".  People seem to like "Memoize", so let's do the rename.

Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us
Backpatch-through: 14, where Result Cache was introduced
2021-07-14 12:43:58 +12:00
Tom Lane
d68a003912 Rename debug_invalidate_system_caches_always to debug_discard_caches.
The name introduced by commit 4656e3d66 was agreed to be unreasonably
long.  To match this change, rename initdb's recently-added
--clobber-cache option to --discard-caches.

Discussion: https://postgr.es/m/1374320.1625430433@sss.pgh.pa.us
2021-07-13 15:01:01 -04:00
Tom Lane
b9734c13f1 Fix crash in postgres_fdw for provably-empty remote UPDATE/DELETE.
In 86dc90056, I'd written find_modifytable_subplan with the assumption
that if the immediate child of a ModifyTable is a Result, it must be
a projecting Result with a subplan.  However, if the UPDATE or DELETE
has a provably-constant-false WHERE clause, that's not so: we'll
generate a dummy subplan with a childless Result.  Add the missing
null-check so we don't crash on such cases.

Per report from Alexander Pyhalov.

Discussion: https://postgr.es/m/b9a6f53549456b2f3e2fd150dcd79d72@postgrespro.ru
2021-07-07 15:21:25 -04:00
Fujii Masao
d854720df6 postgres_fdw: Tighten up allowed values for batch_size, fetch_size options.
Previously the values such as '100$%$#$#', '9,223,372,' were accepted and
treated as valid integers for postgres_fdw options batch_size and fetch_size.
Whereas this is not the case with fdw_startup_cost and fdw_tuple_cost options
for which an error is thrown. This was because endptr was not used
while converting strings to integers using strtol.

This commit changes the logic so that it uses parse_int function
instead of strtol as it serves the purpose by returning false in case
if it is unable to convert the string to integer. Note that
this function also rounds off the values such as '100.456' to 100 and
'100.567' or '100.678' to 101.

While on this, use parse_real for fdw_startup_cost and fdw_tuple_cost options.

Since parse_int and parse_real are being used for reloptions and GUCs,
it is more appropriate to use in postgres_fdw rather than using strtol
and strtod directly.

Back-patch to v14.

Author: Bharath Rupireddy
Reviewed-by: Ashutosh Bapat, Tom Lane, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/CALj2ACVMO6wY5Pc4oe1OCgUOAtdjHuFsBDw8R5uoYR86eWFQDA@mail.gmail.com
2021-07-07 11:13:40 +09:00
Tom Lane
c7b7311f61 Avoid doing catalog lookups in postgres_fdw's conversion_error_callback.
As in 50371df26, this is a bad idea since the callback can't really
know what error is being thrown and thus whether or not it is safe
to attempt catalog accesses.  Rather than pushing said accesses into
the mainline code where they'd usually be a waste of cycles, we can
look at the query's rangetable instead.

This change does mean that we'll be printing query aliases (if any
were used) rather than the table or column's true name.  But that
doesn't seem like a bad thing: it's certainly a more useful definition
in self-join cases, for instance.  In any case, it seems unlikely that
any applications would be depending on this detail, so it seems safe
to change.

Patch by me.  Original complaint by Andres Freund; Bharath Rupireddy
noted the connection to conversion_error_callback.

Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de
2021-07-06 12:36:12 -04:00
Tom Lane
8021770909 Further stabilize postgres_fdw test.
The queries involving ft1_nopw don't stably return the same row
anymore.  I surmise that an autovacuum hitting "S 1"."T 1"
right after the updates introduced by f61db909d/5843659d0 freed
some space, changing where subsequent insertions get stored.
It's only by good luck that these results were stable before,
though, since a LIMIT without ORDER BY isn't well defined,
and it's not like we've ever treated that table as append-only
in this test script.

Since we only really care whether these commands succeed or not,
just replace "SELECT *" with "SELECT 1".

Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2021-06-23%2019%3A52%3A08
2021-06-24 15:02:13 -04:00
Tom Lane
5843659d09 Stabilize test case added by commit f61db909d.
Buildfarm members ayu and tern have sometimes shown a different
plan than expected for this query.  I'd been unable to reproduce
that before today, but I finally realized what is happening.
If there is a concurrent open transaction (probably an autovacuum
run in the buildfarm, but this can also be arranged manually),
then the index entries for the rows removed by the DELETE a few
lines up are not killed promptly, causing a change in the planner's
estimate of the extremal value of ft2.c1, which moves the rowcount
estimate for "c1 > 1100" by enough to change the join plan from
nestloop to hash.

To fix, change the query condition to "c1 > 1000", causing the
hash plan to be preferred whether or not a concurrent open
transaction exists.  Since this UPDATE is tailored to be a no-op,
nothing else changes.

Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2021-06-09%2022%3A45%3A48
Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2021-06-13%2022%3A38%3A18
Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-06-20%2004%3A55%3A36
2021-06-20 11:48:44 -04:00
Tomas Vondra
99cea49d65 Fix copying data into slots with FDW batching
Commit b676ac443b optimized handling of tuple slots with bulk inserts
into foreign tables, so that the slots are initialized only once and
reused for all batches. The data was however copied into the slots only
after the initialization, inserting duplicate values when the slot gets
reused. Fixed by moving the ExecCopySlot outside the init branch.

The existing postgres_fdw tests failed to catch this due to inserting
data into foreign tables without unique indexes, and then checking only
the number of inserted rows. This adds a new test with both a unique
index and a check of inserted values.

Reported-by: Alexander Pyhalov
Discussion: https://postgr.es/m/7a8cf8d56b3d18e5c0bccd6cd42d04ac%40postgrespro.ru
2021-06-16 23:49:25 +02:00
Tomas Vondra
cb92703384 Adjust batch size in postgres_fdw to not use too many parameters
The FE/BE protocol identifies parameters with an Int16 index, which
limits the maximum number of parameters per query to 65535. With
batching added to postges_fdw this limit is much easier to hit, as
the whole batch is essentially a single query, making this error much
easier to hit.

The failures are a bit unpredictable, because it also depends on the
number of columns in the query. So instead of just failing, this patch
tweaks the batch_size to not exceed the maximum number of parameters.

Reported-by: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com
2021-06-08 20:28:31 +02:00
Etsuro Fujita
f3baaf28a6 Fix rescanning of async-aware Append nodes.
In cases where run-time pruning isn't required, the synchronous and
asynchronous subplans for an async-aware Append node determined using
classify_matching_subplans() should be re-used when rescanning the node,
but the previous code re-determined them using that function repeatedly
each time when rescanning the node, leading to incorrect results in a
normal build and an Assert failure in an Assert-enabled build as that
function doesn't assume that it's called repeatedly in such cases.  Fix
the code as mentioned above.

My oversight in commit 27e1f1456.

While at it, initialize async-related pointers/variables to NULL/zero
explicitly in ExecInitAppend() and ExecReScanAppend(), just to be sure.
(The variables would have been set to zero before we get to the latter
function, but let's do so.)

Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAPmGK16Q4B2_KY%2BJH7rb7wQbw54AUprp7TMekGTd2T1B62yysQ%40mail.gmail.com
2021-06-07 12:45:00 +09:00
Tom Lane
f61db909df Fix postgres_fdw failure with whole-row Vars of type RECORD.
Commit 86dc90056 expects that FDWs can cope with whole-row Vars for
their tables, even if the Vars are marked with vartype RECORDOID.
Previously, whole-row Vars generated by the planner had vartype equal
to the relevant table's rowtype OID.  (The point behind this change is
to enable sharing of resjunk columns across inheritance child tables.)

It turns out that postgres_fdw fails to cope with this, though through
bad fortune none of its test cases exposed that.  Things mostly work,
but when we try to read back a value of such a Var, the expected
rowtype is not available to record_in().  Fortunately, it's not
difficult to hack up the tupdesc that controls this process to
substitute the foreign table's rowtype for RECORDOID.  Thus we can
solve the runtime problem while still sharing the resjunk column
with other tables.

Per report from Alexander Pyhalov.

Discussion: https://postgr.es/m/7817fb9ebd6661cdf9b67dec6e129a78@postgrespro.ru
2021-06-04 20:07:08 -04:00
Tom Lane
889592344c Fix planner's row-mark code for inheritance from a foreign table.
Commit 428b260f8 broke planning of cases where row marks are needed
(SELECT FOR UPDATE, etc) and one of the query's tables is a foreign
table that has regular table(s) as inheritance children.  We got the
reverse case right, but apparently were thinking that foreign tables
couldn't be inheritance parents.  Not so; so we need to be able to
add a CTID junk column while adding a new child, not only a wholerow
junk column.

Back-patch to v12 where the faulty code came in.

Amit Langote

Discussion: https://postgr.es/m/CA+HiwqEmo3FV1LAQ4TVyS2h1WM=kMkZUmbNuZSCnfHvMcUcPeA@mail.gmail.com
2021-06-02 14:38:14 -04:00
Etsuro Fujita
a784859f44 Prevent asynchronous execution of direct foreign-table modifications.
Commits 27e1f1456 and 86dc90056, which were independently discussed,
cause a crash when executing an inherited foreign UPDATE/DELETE query
with asynchronous execution enabled, where children of an Append node
that is the direct/indirect child of the ModifyTable node are rewritten
so as to modify foreign tables directly by postgresPlanDirectModify();
as in that case the direct modifications are executed asynchronously,
which is not currently supported by asynchronous execution.  Fix by
disabling asynchronous execution of the direct modifications in that
function.

Author: Etsuro Fujita
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/CAPmGK158e9sJOfuWxfn%2B0ynrspXQU3JhNjSCbaoeSzMvnga%2Bbw%40mail.gmail.com
2021-05-13 20:00:00 +09:00
Tom Lane
def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Etsuro Fujita
a363bc6da9 Fix EXPLAIN ANALYZE for async-capable nodes.
EXPLAIN ANALYZE for an async-capable ForeignScan node associated with
postgres_fdw is done just by using instrumentation for ExecProcNode()
called from the node's callbacks, causing the following problems:

1) If the remote table to scan is empty, the node is incorrectly
   considered as "never executed" by the command even if the node is
   executed, as ExecProcNode() isn't called from the node's callbacks at
   all in that case.
2) The command fails to collect timings for things other than
   ExecProcNode() done in the node, such as creating a cursor for the
   node's remote query.

To fix these problems, add instrumentation for async-capable nodes, and
modify postgres_fdw accordingly.

My oversight in commit 27e1f1456.

While at it, update a comment for the AsyncRequest struct in execnodes.h
and the documentation for the ForeignAsyncRequest API in fdwhandler.sgml
to match the code in ExecAsyncAppendResponse() in nodeAppend.c, and fix
typos in comments in nodeAppend.c.

Per report from Andrey Lepikhov, though I didn't use his patch.

Reviewed-by: Andrey Lepikhov
Discussion: https://postgr.es/m/2eb662bb-105d-fc20-7412-2f027cc3ca72%40postgrespro.ru
2021-05-12 14:00:00 +09:00
Tomas Vondra
c6a01d9249 Copy the INSERT query in postgres_fdw
When executing the INSERT with batching, we may need to rebuild the
query when the batch size changes, in which case we pfree the current
string. We must not release the original string, stored in fdw_private,
because that may be needed in EXPLAIN ANALYZE. So make copy of the SQL,
but only for INSERT queries.

Reported-by: Pavel Stehule
Discussion: https://postgr.es/m/CAFj8pRCL_Rjw-MCR6J7VX9OF7MR6PA5K8qUbrMvprW_e-aHkfQ%40mail.gmail.com
2021-05-07 22:29:43 +02:00
Tom Lane
1273a15bf9 Disable cache clobber to avoid breaking postgres_fdw termination test.
Commit 93f414614 improved a pre-existing test case so that it would
show whether or not termination of the "remote" worker process happened.
This soon exposed that, when debug_invalidate_system_caches_always
(nee CLOBBER_CACHE_ALWAYS) is enabled, no such termination occurs.
That's because cache invalidation forces postgres_fdw connections
to be dropped at end of transaction, so that there's no worker to
terminate.  There's a race condition as to whether the worker will
manage to get out of the BackendStatusArray before we look, but at
least on buildfarm member hyrax, it's failed twice in two attempts.

Rather than re-lobotomizing the test, let's fix this by transiently
disabling debug_invalidate_system_caches_always.  (Hooray for that
being just a GUC nowadays, rather than a compile-time option.)
If this proves not to be enough to make the test stable, we can
do the other thing instead.

Discussion: https://postgr.es/m/3854538.1620081771@sss.pgh.pa.us
2021-05-04 13:36:26 -04:00
Fujii Masao
8e9ea08bae Don't pass "ONLY" options specified in TRUNCATE to foreign data wrapper.
Commit 8ff1c94649 allowed TRUNCATE command to truncate foreign tables.
Previously the information about "ONLY" options specified in TRUNCATE
command were passed to the foreign data wrapper. Then postgres_fdw
constructed the TRUNCATE command to issue the remote server and
included "ONLY" options in it based on the passed information.

On the other hand, "ONLY" options specified in SELECT, UPDATE or DELETE
have no effect when accessing or modifying the remote table, i.e.,
are not passed to the foreign data wrapper. So it's inconsistent to
make only TRUNCATE command pass the "ONLY" options to the foreign data
wrapper. Therefore this commit changes the TRUNCATE command so that
it doesn't pass the "ONLY" options to the foreign data wrapper,
for the consistency with other statements. Also this commit changes
postgres_fdw so that it always doesn't include "ONLY" options in
the TRUNCATE command that it constructs.

Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi, Justin Pryzby, Zhihong Yu
Discussion: https://postgr.es/m/551ed8c1-f531-818b-664a-2cecdab99cd8@oss.nttdata.com
2021-04-27 14:41:27 +09:00
Etsuro Fujita
bb684c82f7 Minor code cleanup in asynchronous execution support.
This is cleanup for commit 27e1f1456:

* ExecAppendAsyncEventWait(), which was modified a bit further by commit
  a8af856d3, duplicated the same nevents calculation.  Simplify the code
  a little bit to avoid the duplication.  Update comments there.
* Add an assertion to ExecAppendAsyncRequest().
* Update a comment about merging the async_capable options from input
  relations in merge_fdw_options(), per complaint from Kyotaro Horiguchi.
* Add a comment for fetch_more_data_begin().

Author: Etsuro Fujita
Discussion: https://postgr.es/m/CAPmGK1637W30Wx3MnrReewhafn6F_0J76mrJGoFXFnpPq4QfvA%40mail.gmail.com
2021-04-23 12:00:00 +09:00
Michael Paquier
93f4146144 Simplify tests of postgres_fdw terminating connections
The tests introduced in 32a9c0b for connections broken and
re-established rely on pg_terminate_backend() for their logic.  When
these were introduced, this function simply sent a signal to a backend
without waiting for the operation to complete, and the tests repeatedly
looked at pg_stat_activity to check if the operation was completed or
not.  Since aaf0432, it is possible to define a timeout to make
pg_terminate_backend() wait for a certain duration, so make use of it,
with a timeout reasonably large enough (3min) to give enough room for
the tests to pass even on slow machines.

Some measurements show that the tests of postgres_fdw are much faster
with this change.  For example, on my laptop, they now take 4s instead
of 6s.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACXGY_EfGrMTjKjHy2zi-u1u9rdeioU_fro0T6Jo8t56KQ@mail.gmail.com
2021-04-14 14:23:53 +09:00
Fujii Masao
8ff1c94649 Allow TRUNCATE command to truncate foreign tables.
This commit introduces new foreign data wrapper API for TRUNCATE.
It extends TRUNCATE command so that it accepts foreign tables as
the targets to truncate and invokes that API. Also it extends postgres_fdw
so that it can issue TRUNCATE command to foreign servers, by adding
new routine for that TRUNCATE API.

The information about options specified in TRUNCATE command, e.g.,
ONLY, CACADE, etc is passed to FDW via API. The list of foreign tables to
truncate is also passed to FDW. FDW truncates the foreign data sources
that the passed foreign tables specify, based on those information.
For example, postgres_fdw constructs TRUNCATE command using them
and issues it to the foreign server.

For performance, TRUNCATE command invokes the FDW routine for
TRUNCATE once per foreign server that foreign tables to truncate belong to.

Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao
Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com
Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com
2021-04-08 20:56:08 +09:00
Peter Eisentraut
5c55dc8b47 libpq: Set Server Name Indication (SNI) for SSL connections
By default, have libpq set the TLS extension "Server Name Indication" (SNI).

This allows an SNI-aware SSL proxy to route connections.  (This
requires a proxy that is aware of the PostgreSQL protocol, not just
any SSL proxy.)

In the future, this could also allow the server to use different SSL
certificates for different host specifications.  (That would require
new server functionality.  This would be the client-side functionality
for that.)

Since SNI makes the host name appear in cleartext in the network
traffic, this might be undesirable in some cases.  Therefore, also add
a libpq connection option "sslsni" to turn it off.

Discussion: https://www.postgresql.org/message-id/flat/7289d5eb-62a5-a732-c3b9-438cee2cb709%40enterprisedb.com
2021-04-07 15:11:41 +02:00
Fujii Masao
a3740c48eb postgres_fdw: Allow partitions specified in LIMIT TO to be imported.
Commit f49bcd4ef3 disallowed postgres_fdw to import table partitions.
Because all data can be accessed through the partitioned table which
is the root of the partitioning hierarchy, importing only partitioned
table should allow access to all the data without creating extra objects.
This is a reasonable default when importing a whole schema. But there
may be the case where users want to explicitly import one of
a partitioned tables' partitions.

For that use case, this commit allows postgres_fdw to import tables or
foreign tables which are partitions of some other table only when they
are explicitly specified in LIMIT TO clause.  It doesn't change
the behavior that any partitions not specified in LIMIT TO are
automatically excluded in IMPORT FOREIGN SCHEMA command.

Author: Matthias van de Meent
Reviewed-by: Bernd Helmle, Amit Langote, Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/CAEze2Whwg4i=mzApMe+PXxCEfgoZmHGqdqQFW7J4bmj_5p6t1A@mail.gmail.com
2021-04-07 02:32:10 +09:00
Fujii Masao
b1be3074ac postgres_fdw: Add option to control whether to keep connections open.
This commit adds a new option keep_connections that controls
whether postgres_fdw keeps the connections to the foreign server open
so that the subsequent queries can re-use them. This option can only be
specified for a foreign server. The default is on. If set to off,
all connections to the foreign server will be discarded
at the end of transaction. Closed connections will be re-established
when they are necessary by future queries using a foreign table.

This option is useful, for example, when users want to prevent
the connections from eating up the foreign servers connections
capacity.

Author: Bharath Rupireddy
Reviewed-by: Alexey Kondratov, Vignesh C, Fujii Masao
Discussion: https://postgr.es/m/CALj2ACVvrp5=AVp2PupEm+nAC8S4buqR3fJMmaCoc7ftT0aD2A@mail.gmail.com
2021-04-02 19:45:42 +09:00
David Rowley
9eacee2e62 Add Result Cache executor node (take 2)
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-02 14:10:56 +13:00
David Rowley
28b3e3905c Revert b6002a796
This removes "Add Result Cache executor node".  It seems that something
weird is going on with the tracking of cache hits and misses as
highlighted by many buildfarm animals.  It's not yet clear what the
problem is as other parts of the plan indicate that the cache did work
correctly, it's just the hits and misses that were being reported as 0.

This is especially a bad time to have the buildfarm so broken, so
reverting before too many more animals go red.

Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com
2021-04-01 13:33:23 +13:00
David Rowley
b6002a796d Add Result Cache executor node
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-01 12:32:22 +13:00
Tom Lane
8998e3cafa Silence compiler warning in non-assert builds.
Per buildfarm.
2021-03-31 16:50:45 -04:00
Tom Lane
86dc90056d Rework planning and execution of UPDATE and DELETE.
This patch makes two closely related sets of changes:

1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns.  The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree.  The disadvantage of course
is an extra fetch of each tuple to be updated.  However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost.  At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.

2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that.  To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information.  This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.

Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy.  Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD.  (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)

There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.

FDW authors should note several API changes:

* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query.  Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.

* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable.  See postgres_fdw for sample code.

* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-31 11:52:37 -04:00
Etsuro Fujita
27e1f14563 Add support for asynchronous execution.
This implements asynchronous execution, which runs multiple parts of a
non-parallel-aware Append concurrently rather than serially to improve
performance when possible.  Currently, the only node type that can be
run concurrently is a ForeignScan that is an immediate child of such an
Append.  In the case where such ForeignScans access data on different
remote servers, this would run those ForeignScans concurrently, and
overlap the remote operations to be performed simultaneously, so it'll
improve the performance especially when the operations involve
time-consuming ones such as remote join and remote aggregation.

We may extend this to other node types such as joins or aggregates over
ForeignScans in the future.

This also adds the support for postgres_fdw, which is enabled by the
table-level/server-level option "async_capable".  The default is false.

Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself.  This commit
is mostly based on the patch proposed by Robert Haas, but also uses
stuff from the patch proposed by Kyotaro Horiguchi and from the patch
proposed by Thomas Munro.  Reviewed by Kyotaro Horiguchi, Konstantin
Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and
others.

Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com
Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com
2021-03-31 18:45:00 +09:00
Amit Kapila
13cb5bd846 Remove extra semicolon in postgres_fdw tests.
Author: Suraj Kharage
Reviewed-by: Bharath Rupireddy, Vignesh C
Discussion: https://postgr.es/m/CAF1DzPWRfxUeH-wShz7P_pK5Tx6M_nEK+TkS8gn5ngvg07Q5=g@mail.gmail.com
2021-03-31 10:36:39 +05:30
David Rowley
ed934d4fa3 Allow estimate_num_groups() to pass back further details about the estimation
Here we add a new output parameter to estimate_num_groups() to allow it to
inform the caller of additional, possibly useful information about the
estimation.

The new output parameter is a struct that currently contains just a single
field with a set of flags.  This was done rather than having the flags as
an output parameter to allow future fields to be added without having to
change the signature of the function at a later date when we want to pass
back further information that might not be suitable to store in the flags
field.

It seems reasonable that one day in the future that the planner would want
to know more about the estimation. For example, how many individual sets
of statistics was the estimation generated from?  The planner may want to
take that into account if we ever want to consider risks as well as costs
when generating plans.

For now, there's only 1 flag we set in the flags field.  This is to
indicate if the estimation fell back on using the hard-coded constants in
any part of the estimation. Callers may like to change their behavior if
this is set, and this gives them the ability to do so.  Callers may pass
the flag pointer as NULL if they have no interest in obtaining any
additional information about the estimate.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.  Additionally, we're also not
making any changes to add support for clauselist_selectivity() and
clauselist_selectivity_ext().  However, if this is required in the future
then the same struct being added here should be fine to use as a new
output argument for those functions too.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com
2021-03-30 20:52:46 +13:00