Set random seed for pgbench.

Setting random could increase reproducibility of test in some cases. Patch suggests three providers for seed: time (default), strong random generator (if available) and unsigned constant. Seed could be set from command line or enviroment variable. Author: Fabien Coelho Reviewed by: Chapman Flack Discussion: https://www.postgresql.org/message-id/flat/20160407082711.q7iq3ykffqxcszkv@alap3.anarazel.de
2025-12-24 06:01:07 +03:00 · 2018-03-26 18:26:27 +03:00
parent 530bcf7581
commit 64f85894ad
4 changed files with 170 additions and 12 deletions
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -679,6 +679,43 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </listitem>
     </varlistentry>

+     <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or an unsigned decimal integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitly set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
     <varlistentry>
      <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
      <listitem>
@@ -883,6 +920,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
       <entry>seed used in hash functions by default</entry>
      </row>

+      <row>
+       <entry> <literal>random_seed</literal> </entry>
+       <entry>random generator seed (unless overwritten with <option>-D</option>)</entry>
+      </row>
+
      <row>
       <entry> <literal>scale</literal> </entry>
       <entry>current scale factor</entry>