mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

pgbench: Use COPY for client-side data generation

This commit switches the client-side data generation from INSERT queries
to COPY for the two tables pgbench_branches and pgbench_tellers.
pgbench_accounts was already using COPY.

COPY is a better interface for bulk loading or high latency connections
(this point can be countered with the option for server-side data
generation, still client-side is the default), and measurements have
proved that using it for these two other tables can lead to improvements
during initialization.  I did not notice slowdowns at large scale
factors on a local setup either, as most of the work happens for the
accounts table.

Previously COPY was only used for the pgbench_accounts table because the
amount of data was much larger than the two other tables.  The code is
refactored so that all three tables use the same code path to execute
the COPY queries, with a callback to build data rows.
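
As a rough illustration of the shared code path described above, here is a
minimal sketch in C. It is not the code from this commit: the names
row_builder, build_branch_row, build_teller_row and populate_table are
hypothetical, and a plain FILE stream stands in for the COPY data that
pgbench would send over a libpq connection. Only the shape of the design is
shown: one loop drives the COPY, and a per-table callback formats each row.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Hypothetical callback type: format one COPY text row for index "curr". */
typedef int (*row_builder) (char *buf, size_t len, int64_t curr);

/* Assumed layout: bid, bbalance, filler (NULL is written as \N). */
static int
build_branch_row(char *buf, size_t len, int64_t curr)
{
	return snprintf(buf, len, "%" PRId64 "\t0\t\\N\n", curr + 1);
}

/* Assumed layout: tid, bid, tbalance, filler, with 10 tellers per branch. */
static int
build_teller_row(char *buf, size_t len, int64_t curr)
{
	return snprintf(buf, len, "%" PRId64 "\t%" PRId64 "\t0\t\\N\n",
					curr + 1, curr / 10 + 1);
}

/*
 * Single code path shared by all tables: emit the COPY command, then one
 * row per index using the supplied callback, then the end-of-data marker.
 * Returns the number of rows written, or -1 if a row did not fit.
 */
static int64_t
populate_table(FILE *out, const char *table, int64_t total, row_builder build)
{
	char		row[128];
	int64_t		i;

	fprintf(out, "copy %s from stdin\n", table);
	for (i = 0; i < total; i++)
	{
		int			n = build(row, sizeof(row), i);

		if (n < 0 || (size_t) n >= sizeof(row))
			return -1;
		fputs(row, out);
	}
	fputs("\\.\n", out);
	return total;
}
```

With this shape, supporting another table only requires one more row
callback, for example populate_table(stdout, "pgbench_tellers", 10,
build_teller_row).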

Author: Tristan Partin
Discussion: https://postgr.es/m/CSTU5P82ONZ1.19XFUGHMXHBRY@c3po
Michael Paquier
2023-07-24 13:48:22 +09:00
parent 29836df323
commit e35cc3b3f2
2 changed files with 98 additions and 66 deletions


@@ -231,10 +231,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
 extensively through a <command>COPY</command>.
 <command>pgbench</command> uses the FREEZE option with version 14 or later
 of <productname>PostgreSQL</productname> to speed up
-subsequent <command>VACUUM</command>, unless partitions are enabled.
-Using <literal>g</literal> causes logging to print one message
-every 100,000 rows while generating data for the
-<structname>pgbench_accounts</structname> table.
+subsequent <command>VACUUM</command>, except on the
+<literal>pgbench_accounts</literal> table if partitions are
+enabled. Using <literal>g</literal> causes logging to
+print one message every 100,000 rows while generating data for all
+tables.
 </para>
 <para>
 With <literal>G</literal> (server-side data generation),