mirror of https://github.com/postgres/postgres.git synced 2025-08-06 18:42:54 +03:00

Add approximated Zipfian-distributed random generator to pgbench.

The generator helps make tests closer to real-world workloads.

Author: Alik Khilazhev
Reviewed-By: Fabien COELHO
Discussion: https://www.postgresql.org/message-id/flat/BF3B6F54-68C3-417A-BFAB-FB4D66F2B410@postgrespro.ru
Teodor Sigaev
2017-12-14 14:30:22 +03:00
parent 538d114f6d
commit 1fcd0adeb3
5 changed files with 263 additions and 3 deletions

@@ -1092,6 +1092,14 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<entry><literal>random_gaussian(1, 10, 2.5)</literal></entry>
<entry>an integer between <literal>1</literal> and <literal>10</literal></entry>
</row>
<row>
<entry><literal><function>random_zipfian(<replaceable>lb</replaceable>, <replaceable>ub</replaceable>, <replaceable>parameter</replaceable>)</function></literal></entry>
<entry>integer</entry>
<entry>Zipfian-distributed random integer in <literal>[lb, ub]</literal>,
see below</entry>
<entry><literal>random_zipfian(1, 10, 1.5)</literal></entry>
<entry>an integer between <literal>1</literal> and <literal>10</literal></entry>
</row>
<row>
<entry><literal><function>sqrt(<replaceable>x</replaceable>)</function></literal></entry>
<entry>double</entry>
@@ -1173,6 +1181,27 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
of the Box-Muller transform.
</para>
</listitem>
<listitem>
<para>
<literal>random_zipfian</literal> generates an approximated bounded Zipfian
distribution. For <replaceable>parameter</replaceable> in (0, 1), an
approximate algorithm is taken from
"Quickly Generating Billion-Record Synthetic Databases",
Jim Gray et al., SIGMOD 1994. For <replaceable>parameter</replaceable>
in (1, 1000), a rejection method is used, based on
"Non-Uniform Random Variate Generation", Luc Devroye, pp. 550-551,
Springer 1986. The distribution is not defined when the parameter's
value is 1.0. Drawing performance is poor for parameter values
close to and above 1.0 and on a small range.
</para>
<para>
<replaceable>parameter</replaceable>
defines how skewed the distribution is. The larger the <replaceable>parameter</replaceable>, the more
frequently values closer to the beginning of the interval are drawn.
The closer to 0 <replaceable>parameter</replaceable> is,
the flatter (more uniform) the access distribution.
</para>
</listitem>
</itemizedlist>
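As an illustration of the rejection method that the documentation above attributes to Devroye for parameter values above 1, here is a minimal Python sketch. It is not the pgbench C implementation: the function name `zipfian_draw`, its argument order, the upper limit of 1000 on the parameter, and the log-space guard against overflow are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def zipfian_draw(rng, lb, ub, s):
    """Draw an integer in [lb, ub], skewed toward lb, for 1 < s <= 1000.

    Hypothetical helper for illustration only. It follows the rejection
    scheme for the zeta (Zipf) distribution described by Devroye and
    truncates it to the range by rejecting out-of-range candidates.
    """
    if ub < lb:
        raise ValueError("ub must not be smaller than lb")
    if not (1.0 < s <= 1000.0):
        raise ValueError("this sketch only handles 1 < s <= 1000")

    n = ub - lb + 1                  # number of distinct values
    b = 2.0 ** (s - 1.0)
    log_limit = math.log(n + 1.0)    # candidates above n are rejected

    while True:
        u = 1.0 - rng.random()       # uniform in (0, 1], so log(u) is finite
        v = rng.random()

        # Candidate x = floor(u ** (-1 / (s - 1))), computed in log space
        # so that a tiny u with s close to 1 cannot overflow.
        log_x = -math.log(u) / (s - 1.0)
        if log_x > log_limit:
            continue                 # out of range, draw again
        x = math.floor(math.exp(log_x))
        if x > n:
            continue                 # rounding pushed it out of range

        # Acceptance test of the rejection method.
        t = (1.0 + 1.0 / x) ** (s - 1.0)
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return lb + x - 1


if __name__ == "__main__":
    rng = random.Random(42)
    counts = Counter(zipfian_draw(rng, 1, 10, 1.5) for _ in range(10000))
    for value in sorted(counts):
        print(value, counts[value])
```

Running the script prints how often each value in `[1, 10]` was drawn, so the skew toward the start of the interval can be inspected directly. The loop also hints at why drawing can be slow: when `s` is close to 1 or the range is small, a large share of candidates fall outside the range and are rejected.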
<para>