mirror of https://github.com/postgres/postgres.git synced 2025-12-24 06:01:07 +03:00

Set random seed for pgbench.

Setting the random seed can improve the reproducibility of tests in some
cases. The patch offers three seed providers: time (the default), a strong
random generator (if available), and an unsigned constant. The seed can be
set from the command line or an environment variable.

Author: Fabien Coelho
Reviewed by: Chapman Flack
Discussion: https://www.postgresql.org/message-id/flat/20160407082711.q7iq3ykffqxcszkv@alap3.anarazel.de
Teodor Sigaev
2018-03-26 18:26:27 +03:00
parent 530bcf7581
commit 64f85894ad
4 changed files with 170 additions and 12 deletions
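
The three seed providers named in the commit message can be told apart with a small classifier. The following is a minimal sketch, not the code from this patch: the names `seed_kind` and `classify_seed` are hypothetical, and the choice to reject signs, whitespace, and out-of-range values is an assumption about what "unsigned decimal integer" should mean.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; the real patch may structure this differently. */
typedef enum
{
    SEED_TIME,      /* "time", or no value given (the documented default) */
    SEED_RAND,      /* "rand": ask a strong random source */
    SEED_CONSTANT,  /* an unsigned decimal integer */
    SEED_INVALID    /* anything else */
} seed_kind;

/*
 * Classify a SEED string as accepted by --random-seed or
 * PGBENCH_RANDOM_SEED. On SEED_CONSTANT, *value receives the number.
 */
static seed_kind
classify_seed(const char *spec, unsigned int *value)
{
    const char *p;
    unsigned long v;

    if (spec == NULL || strcmp(spec, "time") == 0)
        return SEED_TIME;
    if (strcmp(spec, "rand") == 0)
        return SEED_RAND;

    /* Assumption: only plain digits count as an unsigned decimal integer. */
    if (*spec == '\0')
        return SEED_INVALID;
    for (p = spec; *p != '\0'; p++)
    {
        if (!isdigit((unsigned char) *p))
            return SEED_INVALID;
    }

    errno = 0;
    v = strtoul(spec, NULL, 10);
    if (errno == ERANGE || v > UINT_MAX)
        return SEED_INVALID;

    *value = (unsigned int) v;
    return SEED_CONSTANT;
}
```

A caller would pass the option argument, or the value of `PGBENCH_RANDOM_SEED` when the option is absent, and then seed the generator from the current time, from a strong random source, or from `*value`, depending on the result.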


@@ -679,6 +679,43 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
<varlistentry>
<term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
<listitem>
<para>
Set random generator seed. Seeds the system random number generator,
which then produces a sequence of initial generator states, one for
each thread.
Values for <replaceable>SEED</replaceable> may be:
<literal>time</literal> (the default, the seed is based on the current time),
<literal>rand</literal> (use a strong random source, failing if none
is available), or an unsigned decimal integer value.
The random generator is invoked explicitly from a pgbench script
(<literal>random...</literal> functions) or implicitly (for instance option
<option>--rate</option> uses it to schedule transactions).
When explicitly set, the value used for seeding is shown on the terminal.
Any value allowed for <replaceable>SEED</replaceable> may also be
provided through the environment variable
<literal>PGBENCH_RANDOM_SEED</literal>.
To ensure that the provided seed impacts all possible uses, put this option
first or use the environment variable.
</para>
<para>
Setting the seed explicitly allows reproducing a <command>pgbench</command>
run exactly, as far as random numbers are concerned.
As the random state is managed per thread, an identical invocation yields the
exact same <command>pgbench</command> run only if there is one
client per thread and there are no external or data dependencies.
From a statistical viewpoint, reproducing runs exactly is a bad idea because
it can hide the performance variability or improve performance unduly,
e.g., by hitting the same pages as a previous run.
However, it may also be of great help for debugging, for instance
re-running a tricky case which leads to an error.
Use wisely.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
<listitem>
@@ -883,6 +920,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<entry>seed used in hash functions by default</entry>
</row>
<row>
<entry> <literal>random_seed</literal> </entry>
<entry>random generator seed (unless overwritten with <option>-D</option>)</entry>
</row>
<row>
<entry> <literal>scale</literal> </entry>
<entry>current scale factor</entry>