diff --git a/doc/src/sgml/pgtesttiming.sgml b/doc/src/sgml/pgtesttiming.sgml
index 586611f6f41..4e124516d8a 100644
--- a/doc/src/sgml/pgtesttiming.sgml
+++ b/doc/src/sgml/pgtesttiming.sgml
@@ -29,7 +29,7 @@
 pg_test_timing is a tool to measure the timing overhead on your system and
 confirm that the system time never moves backwards.
- Systems that are slow to collect timing data can give less accurate
+ Systems that are slow to collect timing data can give less accurate
 EXPLAIN ANALYZE results.
@@ -68,11 +68,10 @@ Interpreting results
- Good results will show most (>90%) individual timing calls take less
- than one microsecond. Average per loop overhead will be even lower,
- below 100 nanoseconds. This example from an Intel i7-860 system using
- a TSC clock source shows excellent performance:
-
+ Good results will show most (>90%) individual timing calls take less than
+ one microsecond. Average per loop overhead will be even lower, below 100
+ nanoseconds. This example from an Intel i7-860 system using a TSC clock
+ source shows excellent performance:
 Testing timing overhead for 3 seconds.
@@ -85,12 +84,13 @@ Histogram of timing durations:
 2: 2999652 3.59518%
 1: 80435604 96.40465%
+
- Note that different units are used for the per loop time than the
- histogram. The loop can have resolution within a few nanoseconds
- (nsec), while the individual timing calls can only resolve down to
- one microsecond (usec).
+ Note that different units are used for the per loop time than for the
+ histogram. The loop can have resolution within a few nanoseconds (nsec),
+ while the individual timing calls can only resolve down to one microsecond
+ (usec).
@@ -98,11 +98,10 @@ Histogram of timing durations:
 Measuring executor timing overhead
- When the query executor is running a statement using
- EXPLAIN ANALYZE, individual operations are
- timed as well as showing a summary. The overhead of your system
- can be checked by counting rows with the psql program:
-
+ When the query executor is running a statement using
+ EXPLAIN ANALYZE, individual operations are timed, and a summary
+ is shown as well. The overhead of your system can be checked by
+ counting rows with the psql program:
 CREATE TABLE t AS SELECT * FROM generate_series(1,100000);
@@ -110,16 +109,16 @@ CREATE TABLE t AS SELECT * FROM generate_series(1,100000);
 SELECT COUNT(*) FROM t;
 EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
+
- The i7-860 system measured runs the count query in 9.8 ms while
- the EXPLAIN ANALYZE version takes 16.6 ms,
- each processing just over 100,000 rows. That 6.8 ms difference
- means the timing overhead per row is 68 ns, about twice what
- pg_test_timing estimated it would be. Even that relatively
- small amount of overhead is making the fully timed count statement
- take almost 70% longer. On more substantial queries, the
- timing overhead would be less problematic.
+ The i7-860 system measured here runs the count query in 9.8 ms while
+ the EXPLAIN ANALYZE version takes 16.6 ms, each
+ processing just over 100,000 rows. That 6.8 ms difference means the timing
+ overhead per row is 68 ns, about twice what pg_test_timing estimated it
+ would be. Even that relatively small amount of overhead is making the fully
+ timed count statement take almost 70% longer. On more substantial queries,
+ the timing overhead would be less problematic.
@@ -127,14 +126,13 @@ EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
 Changing time sources
- On some newer Linux systems, it's possible to change the clock
- source used to collect timing data at any time. A second example
- shows the slowdown possible from switching to the slower acpi_pm
- time source, on the same system used for the fast results above:
-
+ On some newer Linux systems, it's possible to change the clock source used
+ to collect timing data at any time. A second example shows the slowdown
+ possible from switching to the slower acpi_pm time source, on the same
+ system used for the fast results above:
-# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
+# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
 tsc hpet acpi_pm
 # echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
 # pg_test_timing
@@ -147,45 +145,43 @@ Histogram of timing durations:
 2: 2990371 72.05956%
 1: 1155682 27.84870%
-
-
- In this configuration, the sample EXPLAIN ANALYZE
- above takes 115.9 ms. That's 1061 nsec of timing overhead, again
- a small multiple of what's measured directly by this utility.
- That much timing overhead means the actual query itself is only
- taking a tiny fraction of the accounted for time, most of it
- is being consumed in overhead instead. In this configuration,
- any EXPLAIN ANALYZE totals involving many
- timed operations would be inflated significantly by timing overhead.
- FreeBSD also allows changing the time source on the fly, and
- it logs information about the timer selected during boot:
+ In this configuration, the sample EXPLAIN ANALYZE above
+ takes 115.9 ms. That's 1061 ns of timing overhead per row, again a small
+ multiple of what's measured directly by this utility. That much timing
+ overhead means the actual query itself is only taking a tiny fraction of
+ the accounted-for time; most of it is being consumed in overhead instead.
+ In this configuration, any EXPLAIN ANALYZE totals involving
+ many timed operations would be inflated significantly by timing overhead.
+
+ FreeBSD also allows changing the time source on the fly, and it logs
+ information about the timer selected during boot:
+
-dmesg | grep "Timecounter"
+dmesg | grep "Timecounter"
 sysctl kern.timecounter.hardware=TSC
+
- Other systems may only allow setting the time source on boot.
- On older Linux systems the "clock" kernel setting is the only way
- to make this sort of change. And even on some more recent ones,
- the only option you'll see for a clock source is "jiffies". Jiffies
- are the older Linux software clock implementation, which can have
- good resolution when it's backed by fast enough timing hardware,
- as in this example:
-
+ Other systems may only allow setting the time source on boot. On older
+ Linux systems the "clock" kernel setting is the only way to make this sort
+ of change. And even on some more recent ones, the only option you'll see
+ for a clock source is "jiffies". Jiffies are the older Linux software clock
+ implementation, which can have good resolution when it's backed by fast
+ enough timing hardware, as in this example:
-$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
-jiffies
+$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
+jiffies
 $ dmesg | grep time.c
 time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
 time.c: Detected 2400.153 MHz processor.
-$ ./pg_test_timing
+$ pg_test_timing
 Testing timing overhead for 3 seconds.
 Per timing duration including loop overhead: 97.75 ns
 Histogram of timing durations:
@@ -197,76 +193,74 @@
 2: 2993204 9.75277%
 1: 27694571 90.23734%
+
+
 Clock hardware and timing accuracy
- Collecting accurate timing information is normally done on computers
- using hardware clocks with various levels of accuracy. With some
- hardware the operating systems can pass the system clock time almost
- directly to programs. A system clock can also be derived from a chip
- that simply provides timing interrupts, periodic ticks at some known
- time interval. In either case, operating system kernels provide
- a clock source that hides these details. But the accuracy of that
- clock source and how quickly it can return results varies based
- on the underlying hardware.
+ Collecting accurate timing information is normally done on computers using
+ hardware clocks with various levels of accuracy. With some hardware the
+ operating system can pass the system clock time almost directly to
+ programs. A system clock can also be derived from a chip that simply
+ provides timing interrupts, periodic ticks at some known time interval. In
+ either case, operating system kernels provide a clock source that hides
+ these details. But the accuracy of that clock source and how quickly it can
+ return results varies based on the underlying hardware.
- Inaccurate time keeping can result in system instability. Test
- any change to the clock source very carefully. Operating system
- defaults are sometimes made to favor reliability over best
- accuracy. And if you are using a virtual machine, look into the
- recommended time sources compatible with it. Virtual hardware
- faces additional difficulties when emulating timers, and there
- are often per operating system settings suggested by vendors.
+ Inaccurate timekeeping can result in system instability. Test any change
+ to the clock source very carefully. Operating system defaults are sometimes
+ made to favor reliability over best accuracy. And if you are using a virtual
+ machine, look into the recommended time sources compatible with it. Virtual
+ hardware faces additional difficulties when emulating timers, and there are
+ often per-operating-system settings suggested by vendors.
- The Time Stamp Counter (TSC) clock source is the most accurate one
- available on current generation CPUs. It's the preferred way to track
- the system time when it's supported by the operating system and the
- TSC clock is reliable. There are several ways that TSC can fail
- to provide an accurate timing source, making it unreliable. Older
- systems can have a TSC clock that varies based on the CPU
- temperature, making it unusable for timing. Trying to use TSC on some
- older multi-core CPUs can give a reported time that's inconsistent
- among multiple cores. This can result in the time going backwards, a
- problem this program checks for. And even the newest systems can
- fail to provide accurate TSC timing with very aggressive power saving
- configurations.
+ The Time Stamp Counter (TSC) clock source is the most accurate one available
+ on current generation CPUs. It's the preferred way to track the system time
+ when it's supported by the operating system and the TSC clock is
+ reliable. There are several ways that TSC can fail to provide an accurate
+ timing source, making it unreliable. Older systems can have a TSC clock that
+ varies based on the CPU temperature, making it unusable for timing. Trying
+ to use TSC on some older multi-core CPUs can give a reported time that's
+ inconsistent among multiple cores. This can result in the time going
+ backwards, a problem this program checks for. And even the newest systems
+ can fail to provide accurate TSC timing with very aggressive power saving
+ configurations.
- Newer operating systems may check for the known TSC problems and
- switch to a slower, more stable clock source when they are seen.
- If your system supports TSC time but doesn't default to that, it
- may be disabled for a good reason. And some operating systems may
- not detect all the possible problems correctly, or will allow using
- TSC even in situations where it's known to be inaccurate.
+ Newer operating systems may check for the known TSC problems and switch to a
+ slower, more stable clock source when they are seen. If your system
+ supports TSC time but doesn't default to that, it may be disabled for a good
+ reason. And some operating systems may not detect all the possible problems
+ correctly, or will allow using TSC even in situations where it's known to be
+ inaccurate.
- The High Precision Event Timer (HPET) is the preferred timer on
- systems where it's available and TSC is not accurate. The timer
- chip itself is programmable to allow up to 100 nanosecond resolution,
- but you may not see that much accuracy in your system clock.
+ The High Precision Event Timer (HPET) is the preferred timer on systems
+ where it's available and TSC is not accurate. The timer chip itself is
+ programmable to allow up to 100 nanosecond resolution, but you may not see
+ that much accuracy in your system clock.
- Advanced Configuration and Power Interface (ACPI) provides a
- Power Management (PM) Timer, which Linux refers to as the acpi_pm.
- The clock derived from acpi_pm will at best provide 300 nanosecond
- resolution.
+ Advanced Configuration and Power Interface (ACPI) provides a Power
+ Management (PM) Timer, which Linux refers to as the acpi_pm. The clock
+ derived from acpi_pm will at best provide 300 nanosecond resolution.
- Timers used on older PC hardware including the 8254 Programmable
- Interval Timer (PIT), the real-time clock (RTC), the Advanced
- Programmable Interrupt Controller (APIC) timer, and the Cyclone
- timer. These timers aim for millisecond resolution.
+ Timers used on older PC hardware include the 8254 Programmable Interval
+ Timer (PIT), the real-time clock (RTC), the Advanced Programmable Interrupt
+ Controller (APIC) timer, and the Cyclone timer. These timers aim for
+ millisecond resolution.
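+
+ For example, the per-row overhead figures quoted earlier on this page can
+ be reproduced with a quick calculation in psql. This is only a sketch of
+ the arithmetic; the runtimes are the example measurements shown above, and
+ the column aliases are illustrative:
+
+-- per-row overhead in ns:
+-- (timed runtime - untimed runtime, in ms) * 1000000 ns/ms / 100000 rows
+SELECT (16.6 - 9.8) * 1000000 / 100000 AS tsc_overhead_ns;      -- 68
+SELECT (115.9 - 9.8) * 1000000 / 100000 AS acpi_pm_overhead_ns; -- 1061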