mirror of
https://github.com/postgres/postgres.git
synced 2025-12-19 17:02:53 +03:00
Move pg_test_timing from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
This commit is contained in:
@@ -194,6 +194,7 @@ Complete list of usable sgml source files in this directory.
|
||||
<!ENTITY pgRestore SYSTEM "pg_restore.sgml">
|
||||
<!ENTITY pgRewind SYSTEM "pg_rewind.sgml">
|
||||
<!ENTITY pgtestfsync SYSTEM "pgtestfsync.sgml">
|
||||
<!ENTITY pgtesttiming SYSTEM "pgtesttiming.sgml">
|
||||
<!ENTITY pgupgrade SYSTEM "pgupgrade.sgml">
|
||||
<!ENTITY postgres SYSTEM "postgres-ref.sgml">
|
||||
<!ENTITY postmaster SYSTEM "postmaster.sgml">
|
||||
|
||||
300
doc/src/sgml/ref/pgtesttiming.sgml
Normal file
300
doc/src/sgml/ref/pgtesttiming.sgml
Normal file
@@ -0,0 +1,300 @@
|
||||
<!-- doc/src/sgml/ref/pgtesttiming.sgml -->
|
||||
|
||||
<refentry id="pgtesttiming">
|
||||
<indexterm zone="pgtesttiming">
|
||||
<primary>pg_test_timing</primary>
|
||||
</indexterm>
|
||||
|
||||
<refmeta>
|
||||
<refentrytitle><application>pg_test_timing</application></refentrytitle>
|
||||
<manvolnum>1</manvolnum>
|
||||
<refmiscinfo>Application</refmiscinfo>
|
||||
</refmeta>
|
||||
|
||||
<refnamediv>
|
||||
<refname>pg_test_timing</refname>
|
||||
<refpurpose>measure timing overhead</refpurpose>
|
||||
</refnamediv>
|
||||
|
||||
<refsynopsisdiv>
|
||||
<cmdsynopsis>
|
||||
<command>pg_test_timing</command>
|
||||
<arg rep="repeat"><replaceable>option</replaceable></arg>
|
||||
</cmdsynopsis>
|
||||
</refsynopsisdiv>
|
||||
|
||||
<refsect1>
|
||||
<title>Description</title>
|
||||
|
||||
<para>
|
||||
<application>pg_test_timing</> is a tool to measure the timing overhead
|
||||
on your system and confirm that the system time never moves backwards.
|
||||
Systems that are slow to collect timing data can give less accurate
|
||||
<command>EXPLAIN ANALYZE</command> results.
|
||||
</para>
|
||||
</refsect1>
|
||||
|
||||
<refsect1>
|
||||
<title>Options</title>
|
||||
|
||||
<para>
|
||||
<application>pg_test_timing</application> accepts the following
|
||||
command-line options:
|
||||
|
||||
<variablelist>
|
||||
|
||||
<varlistentry>
|
||||
<term><option>-d <replaceable class="parameter">duration</replaceable></option></term>
|
||||
<term><option>--duration=<replaceable class="parameter">duration</replaceable></option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Specifies the test duration, in seconds. Longer durations
|
||||
give slightly better accuracy, and are more likely to discover
|
||||
problems with the system clock moving backwards. The default
|
||||
test duration is 3 seconds.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><option>-V</></term>
|
||||
<term><option>--version</></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Print the <application>pg_test_timing</application> version and exit.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><option>-?</></term>
|
||||
<term><option>--help</></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Show help about <application>pg_test_timing</application> command line
|
||||
arguments, and exit.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
</variablelist>
|
||||
</para>
|
||||
|
||||
</refsect1>
|
||||
|
||||
<refsect1>
|
||||
<title>Usage</title>
|
||||
|
||||
<refsect2>
|
||||
<title>Interpreting results</title>
|
||||
|
||||
<para>
|
||||
Good results will show most (>90%) individual timing calls take less than
|
||||
one microsecond. Average per loop overhead will be even lower, below 100
|
||||
nanoseconds. This example from an Intel i7-860 system using a TSC clock
|
||||
source shows excellent performance:
|
||||
|
||||
<screen>
|
||||
Testing timing overhead for 3 seconds.
|
||||
Per loop time including overhead: 35.96 nsec
|
||||
Histogram of timing durations:
|
||||
< usec % of total count
|
||||
1 96.40465 80435604
|
||||
2 3.59518 2999652
|
||||
4 0.00015 126
|
||||
8 0.00002 13
|
||||
16 0.00000 2
|
||||
</screen>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Note that different units are used for the per loop time than the
|
||||
histogram. The loop can have resolution within a few nanoseconds (nsec),
|
||||
while the individual timing calls can only resolve down to one microsecond
|
||||
(usec).
|
||||
</para>
|
||||
|
||||
</refsect2>
|
||||
<refsect2>
|
||||
<title>Measuring executor timing overhead</title>
|
||||
|
||||
<para>
|
||||
When the query executor is running a statement using
|
||||
<command>EXPLAIN ANALYZE</command>, individual operations are timed as well
|
||||
as showing a summary. The overhead of your system can be checked by
|
||||
counting rows with the <application>psql</application> program:
|
||||
|
||||
<screen>
|
||||
CREATE TABLE t AS SELECT * FROM generate_series(1,100000);
|
||||
\timing
|
||||
SELECT COUNT(*) FROM t;
|
||||
EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
|
||||
</screen>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The i7-860 system measured runs the count query in 9.8 ms while
|
||||
the <command>EXPLAIN ANALYZE</command> version takes 16.6 ms, each
|
||||
processing just over 100,000 rows. That 6.8 ms difference means the timing
|
||||
overhead per row is 68 ns, about twice what pg_test_timing estimated it
|
||||
would be. Even that relatively small amount of overhead is making the fully
|
||||
timed count statement take almost 70% longer. On more substantial queries,
|
||||
the timing overhead would be less problematic.
|
||||
</para>
|
||||
|
||||
</refsect2>
|
||||
|
||||
<refsect2>
|
||||
<title>Changing time sources</title>
|
||||
<para>
|
||||
On some newer Linux systems, it's possible to change the clock source used
|
||||
to collect timing data at any time. A second example shows the slowdown
|
||||
possible from switching to the slower acpi_pm time source, on the same
|
||||
system used for the fast results above:
|
||||
|
||||
<screen>
|
||||
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
|
||||
tsc hpet acpi_pm
|
||||
# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
|
||||
# pg_test_timing
|
||||
Per loop time including overhead: 722.92 nsec
|
||||
Histogram of timing durations:
|
||||
< usec % of total count
|
||||
1 27.84870 1155682
|
||||
2 72.05956 2990371
|
||||
4 0.07810 3241
|
||||
8 0.01357 563
|
||||
16 0.00007 3
|
||||
</screen>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
In this configuration, the sample <command>EXPLAIN ANALYZE</command> above
|
||||
takes 115.9 ms. That's 1061 nsec of timing overhead, again a small multiple
|
||||
of what's measured directly by this utility. That much timing overhead
|
||||
means the actual query itself is only taking a tiny fraction of the
|
||||
accounted for time, most of it is being consumed in overhead instead. In
|
||||
this configuration, any <command>EXPLAIN ANALYZE</command> totals involving
|
||||
many timed operations would be inflated significantly by timing overhead.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
FreeBSD also allows changing the time source on the fly, and it logs
|
||||
information about the timer selected during boot:
|
||||
|
||||
<screen>
|
||||
# dmesg | grep "Timecounter"
|
||||
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
|
||||
Timecounter "i8254" frequency 1193182 Hz quality 0
|
||||
Timecounters tick every 10.000 msec
|
||||
Timecounter "TSC" frequency 2531787134 Hz quality 800
|
||||
# sysctl kern.timecounter.hardware=TSC
|
||||
kern.timecounter.hardware: ACPI-fast -> TSC
|
||||
</screen>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Other systems may only allow setting the time source on boot. On older
|
||||
Linux systems the "clock" kernel setting is the only way to make this sort
|
||||
of change. And even on some more recent ones, the only option you'll see
|
||||
for a clock source is "jiffies". Jiffies are the older Linux software clock
|
||||
implementation, which can have good resolution when it's backed by fast
|
||||
enough timing hardware, as in this example:
|
||||
|
||||
<screen>
|
||||
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
|
||||
jiffies
|
||||
$ dmesg | grep time.c
|
||||
time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
|
||||
time.c: Detected 2400.153 MHz processor.
|
||||
$ pg_test_timing
|
||||
Testing timing overhead for 3 seconds.
|
||||
Per timing duration including loop overhead: 97.75 ns
|
||||
Histogram of timing durations:
|
||||
< usec % of total count
|
||||
1 90.23734 27694571
|
||||
2 9.75277 2993204
|
||||
4 0.00981 3010
|
||||
8 0.00007 22
|
||||
16 0.00000 1
|
||||
32 0.00000 1
|
||||
</screen></para>
|
||||
|
||||
</refsect2>
|
||||
|
||||
<refsect2>
|
||||
<title>Clock hardware and timing accuracy</title>
|
||||
|
||||
<para>
|
||||
Collecting accurate timing information is normally done on computers using
|
||||
hardware clocks with various levels of accuracy. With some hardware the
|
||||
operating systems can pass the system clock time almost directly to
|
||||
programs. A system clock can also be derived from a chip that simply
|
||||
provides timing interrupts, periodic ticks at some known time interval. In
|
||||
either case, operating system kernels provide a clock source that hides
|
||||
these details. But the accuracy of that clock source and how quickly it can
|
||||
return results varies based on the underlying hardware.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Inaccurate time keeping can result in system instability. Test any change
|
||||
to the clock source very carefully. Operating system defaults are sometimes
|
||||
made to favor reliability over best accuracy. And if you are using a virtual
|
||||
machine, look into the recommended time sources compatible with it. Virtual
|
||||
hardware faces additional difficulties when emulating timers, and there are
|
||||
often per operating system settings suggested by vendors.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The Time Stamp Counter (TSC) clock source is the most accurate one available
|
||||
on current generation CPUs. It's the preferred way to track the system time
|
||||
when it's supported by the operating system and the TSC clock is
|
||||
reliable. There are several ways that TSC can fail to provide an accurate
|
||||
timing source, making it unreliable. Older systems can have a TSC clock that
|
||||
varies based on the CPU temperature, making it unusable for timing. Trying
|
||||
to use TSC on some older multicore CPUs can give a reported time that's
|
||||
inconsistent among multiple cores. This can result in the time going
|
||||
backwards, a problem this program checks for. And even the newest systems
|
||||
can fail to provide accurate TSC timing with very aggressive power saving
|
||||
configurations.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Newer operating systems may check for the known TSC problems and switch to a
|
||||
slower, more stable clock source when they are seen. If your system
|
||||
supports TSC time but doesn't default to that, it may be disabled for a good
|
||||
reason. And some operating systems may not detect all the possible problems
|
||||
correctly, or will allow using TSC even in situations where it's known to be
|
||||
inaccurate.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The High Precision Event Timer (HPET) is the preferred timer on systems
|
||||
where it's available and TSC is not accurate. The timer chip itself is
|
||||
programmable to allow up to 100 nanosecond resolution, but you may not see
|
||||
that much accuracy in your system clock.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Advanced Configuration and Power Interface (ACPI) provides a Power
|
||||
Management (PM) Timer, which Linux refers to as the acpi_pm. The clock
|
||||
derived from acpi_pm will at best provide 300 nanosecond resolution.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Timers used on older PC hardware include the 8254 Programmable Interval
|
||||
Timer (PIT), the real-time clock (RTC), the Advanced Programmable Interrupt
|
||||
Controller (APIC) timer, and the Cyclone timer. These timers aim for
|
||||
millisecond resolution.
|
||||
</para>
|
||||
</refsect2>
|
||||
</refsect1>
|
||||
|
||||
<refsect1>
|
||||
<title>See Also</title>
|
||||
|
||||
<simplelist type="inline">
|
||||
<member><xref linkend="sql-explain"></member>
|
||||
</simplelist>
|
||||
</refsect1>
|
||||
</refentry>
|
||||
Reference in New Issue
Block a user