mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

Change floating-point output format for improved performance.

Previously, floating-point output was done by rounding to a specific
decimal precision; by default, to 6 or 15 decimal digits (losing
information) or as requested using extra_float_digits. Drivers that
wanted exact float values, and applications like pg_dump that must
preserve values exactly, set extra_float_digits=3 (or sometimes 2 for
historical reasons, though this isn't enough for float4).
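
To illustrate why 2 extra digits falls short for float4 (a sketch
only, not part of this patch): FLT_DIG + 2 gives 8 significant digits,
but float4 values between 10 and 16 are spaced more closely than
8-digit decimals, so some of them cannot survive a round trip. The
Python below emulates float4 with the struct module; all helper names
are hypothetical.

```python
import struct

def to_float4(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def round_trips(x, digits):
    """True if float4 value x survives output with `digits` significant
    digits followed by input as float4."""
    text = "%.*g" % (digits, x)
    return to_float4(float(text)) == x

def first_failure(digits, start=10.0, count=1000):
    """Scan `count` consecutive float4 values upward from `start` and
    return the first one that does not round-trip, or None."""
    step = 2.0 ** -20  # spacing of float4 values in [8, 16)
    for i in range(count):
        x = start + i * step
        if not round_trips(x, digits):
            return x
    return None

print(first_failure(8))  # FLT_DIG + 2: some value is lost
print(first_failure(9))  # FLT_DIG + 3: None, every value survives
```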

Unfortunately, decimal rounded output is slow enough to become a
noticeable bottleneck when dealing with large result sets or COPY of
large tables when many floating-point values are involved.

Floating-point output can be done much faster when the output is not
rounded to a specific decimal length, but rather is chosen as the
shortest decimal representation that is closer to the original float
value than to any other value representable in the same precision. The
recently published Ryu algorithm by Ulf Adams is both relatively
simple and remarkably fast.
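
As a rough illustration of the difference (not the Ryu code itself):
Python's repr() also produces the shortest decimal string that reads
back to the same double, though it uses a different algorithm, so it
serves here as a stand-in. The function names are hypothetical.

```python
def rounded(x, digits=15):
    """Old style: round to a fixed number of significant digits."""
    return "%.*g" % (digits, x)

def shortest(x):
    """Stand-in for shortest round-trip output (Python's repr, not Ryu)."""
    return repr(x)

x = 0.1 + 0.2

# 15 digits hides the low-order bits, so the text reads back as a
# different double.
print(rounded(x), float(rounded(x)) == x)

# The shortest form uses as many digits as needed (at most 17 for a
# double) and always reads back exactly.
print(shortest(x), float(shortest(x)) == x)

# Values with a short exact-enough form stay short.
print(shortest(0.1))
```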

Accordingly, change float4out/float8out to output shortest decimal
representations if extra_float_digits is greater than 0, and make that
the new default. Applications that need rounded output can set
extra_float_digits back to 0 or below, and take the resulting
performance hit.
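
The selection rule can be modelled roughly as follows. This is a
sketch in Python, not the C code in this patch: float8_text is a
hypothetical name, repr() stands in for the Ryu output, the clamp to
one digit is an assumption of the sketch, and the exact text layout
differs from what float8out produces.

```python
DBL_DIG = 15  # standard decimal precision of an IEEE double

def float8_text(x, extra_float_digits=1):
    """Model of the output mode chosen by extra_float_digits."""
    if extra_float_digits > 0:
        # Any positive setting selects shortest round-trip output; the
        # digit count then depends only on the value.
        return repr(x)
    # Zero or negative: round to DBL_DIG reduced by the setting.
    # Clamping to at least one digit is an assumption of this sketch.
    ndig = max(1, DBL_DIG + extra_float_digits)
    return "%.*g" % (ndig, x)

x = 0.1 + 0.2
print(float8_text(x, 1))        # shortest form
print(float8_text(x, 3))        # same output as with 1
print(float8_text(x, 0))        # rounded to 15 digits
print(float8_text(1 / 3, -2))   # rounded to 13 digits
```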

We make one concession to portability for systems with buggy
floating-point input: we do not output decimal values that fall
exactly halfway between adjacent representable binary values (which
would rely on the reader doing round-to-nearest-even correctly). This
is known to be a problem at least for VS2013 on Windows.
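
To show what an exact midpoint looks like (an illustrative sketch with
a hypothetical helper name): the decimal 9007199254740993 lies exactly
halfway between two adjacent doubles, so a reader must apply
round-to-nearest-even to pick the right one. Python's float() does
this correctly; a reader that does not could return the other
neighbor.

```python
from fractions import Fraction

def is_exact_midpoint(decimal_text, lo, hi):
    """True if the decimal string lies exactly halfway between lo and hi."""
    return Fraction(decimal_text) == (Fraction(lo) + Fraction(hi)) / 2

lo = 2.0 ** 53   # 9007199254740992, even significand
hi = lo + 2.0    # 9007199254740994, the next representable double
text = "9007199254740993"

print(is_exact_midpoint(text, lo, hi))  # this is the tie case
print(float(text) == lo)                # a correct reader picks the even neighbor
```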

Our version of the Ryu code originates from
https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the
following (significant) modifications:

 - Output format is changed to use fixed-point notation for small
   exponents, as printf would, and also to use lowercase 'e', a
   minimum of 2 exponent digits, and a mandatory sign on the exponent,
   to keep the formatting as close as possible to previous output.

 - The output of exact midpoint values is disabled as noted above.

 - The integer fast-path code is changed somewhat (since we have
   fixed-point output and the upstream did not).

 - Our project style has been largely applied to the code with the
   exception of C99 declaration-after-statement, which has been
   retained as an exception to our present policy.

 - Most of upstream's debugging and conditionals are removed, and we
   use our own configure tests to determine things like uint128
   availability.
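
The exponent formatting described in the first item above can be
sketched as below. This is Python for illustration only, with
hypothetical function names; it does not reproduce the rule that
decides when fixed-point notation is used instead.

```python
def format_exponent(exp10):
    """Lowercase 'e', mandatory sign, at least two exponent digits."""
    return "e%+03d" % exp10

def scientific(mantissa_digits, exp10):
    """Join an already-formatted mantissa with the exponent suffix."""
    return mantissa_digits + format_exponent(exp10)

print(scientific("1.5", 20))
print(scientific("1", -7))
print(scientific("2.25", 100))
```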

Changing the float output format obviously affects a number of
regression tests. This patch uses an explicit setting of
extra_float_digits=0 for test output that is not expected to be
exactly reproducible (e.g. due to numerical instability or differing
algorithms for transcendental functions).

Conversions from floats to numeric are unchanged by this patch. These
may appear in index expressions and it is not yet clear whether any
change should be made, so that can be left for another day.

This patch assumes that the only supported floating point format is
now IEEE format, and the documentation is updated to reflect that.

Code by me, adapting the work of Ulf Adams and other contributors.

References:
https://dl.acm.org/citation.cfm?id=3192369

Reviewed-by: Tom Lane, Andres Freund, Donald Dong
Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
This commit is contained in:
Andrew Gierth
2019-02-13 15:20:33 +00:00
parent f397e08599
commit 02ddd49932
50 changed files with 5466 additions and 368 deletions

View File

@ -7871,16 +7871,37 @@ SET XML OPTION { DOCUMENT | CONTENT };
</term>
<listitem>
<para>
This parameter adjusts the number of digits displayed for
This parameter adjusts the number of digits used for textual output of
floating-point values, including <type>float4</type>, <type>float8</type>,
and geometric data types. The parameter value is added to the
standard number of digits (<literal>FLT_DIG</literal> or <literal>DBL_DIG</literal>
as appropriate). The value can be set as high as 3, to include
partially-significant digits; this is especially useful for dumping
float data that needs to be restored exactly. Or it can be set
negative to suppress unwanted digits.
See also <xref linkend="datatype-float"/>.
and geometric data types.
</para>
<para>
If the value is 1 (the default) or above, float values are output in
shortest-precise format; see <xref linkend="datatype-float"/>. The
actual number of digits generated depends only on the value being
output, not on the value of this parameter. At most 17 digits are
required for <type>float8</type> values, and 9 for <type>float4</type>
values. This format is both fast and precise, preserving the original
binary float value exactly when correctly read. For historical
compatibility, values up to 3 are permitted.
</para>
<para>
If the value is zero or negative, then the output is rounded to a
given decimal precision. The precision used is the standard number of
digits for the type (<literal>FLT_DIG</literal>
or <literal>DBL_DIG</literal> as appropriate) reduced according to the
value of this parameter. (For example, specifying -1 will cause float4
values to be output rounded to 5 significant digits, and float8 values
rounded to 14 digits.) This format is slower and does not preserve all
the bits of the binary float value, but may be more human-readable.
</para>
<note>
<para>
The meaning of this parameter, and its default value, changed
in <productname>PostgreSQL</productname> 12;
see <xref linkend="datatype-float"/> for further discussion.
</para>
</note>
</listitem>
</varlistentry>

View File

@ -671,13 +671,12 @@ FROM generate_series(-3.5, 3.5, 1) as x;
</indexterm>
<para>
The data types <type>real</type> and <type>double
precision</type> are inexact, variable-precision numeric types.
In practice, these types are usually implementations of
<acronym>IEEE</acronym> Standard 754 for Binary Floating-Point
Arithmetic (single and double precision, respectively), to the
extent that the underlying processor, operating system, and
compiler support it.
The data types <type>real</type> and <type>double precision</type> are
inexact, variable-precision numeric types. On all currently supported
platforms, these types are implementations of <acronym>IEEE</acronym>
Standard 754 for Binary Floating-Point Arithmetic (single and double
precision, respectively), to the extent that the underlying processor,
operating system, and compiler support it.
</para>
<para>
@ -715,24 +714,57 @@ FROM generate_series(-3.5, 3.5, 1) as x;
</para>
<para>
On most platforms, the <type>real</type> type has a range of at least
1E-37 to 1E+37 with a precision of at least 6 decimal digits. The
<type>double precision</type> type typically has a range of around
1E-307 to 1E+308 with a precision of at least 15 digits. Values that
are too large or too small will cause an error. Rounding might
take place if the precision of an input number is too high.
Numbers too close to zero that are not representable as distinct
from zero will cause an underflow error.
On all currently supported platforms, the <type>real</type> type has a
range of around 1E-37 to 1E+37 with a precision of at least 6 decimal
digits. The <type>double precision</type> type has a range of around
1E-307 to 1E+308 with a precision of at least 15 digits. Values that are
too large or too small will cause an error. Rounding might take place if
the precision of an input number is too high. Numbers too close to zero
that are not representable as distinct from zero will cause an underflow
error.
</para>
<para>
By default, floating point values are output in text form in their
shortest precise decimal representation; the decimal value produced is
closer to the true stored binary value than to any other value
representable in the same binary precision. (However, the output value is
currently never <emphasis>exactly</emphasis> midway between two
representable values, in order to avoid a widespread bug where input
routines do not properly respect the round-to-even rule.) This value will
use at most 17 significant decimal digits for <type>float8</type>
values, and at most 9 digits for <type>float4</type> values.
</para>
<note>
<para>
The <xref linkend="guc-extra-float-digits"/> setting controls the
number of extra significant digits included when a floating point
value is converted to text for output. With the default value of
<literal>0</literal>, the output is the same on every platform
supported by PostgreSQL. Increasing it will produce output that
more accurately represents the stored value, but may be unportable.
This shortest-precise output format is much faster to generate than the
historical rounded format.
</para>
</note>
<para>
For compatibility with output generated by older versions
of <productname>PostgreSQL</productname>, and to allow the output
precision to be reduced, the <xref linkend="guc-extra-float-digits"/>
parameter can be used to select rounded decimal output instead. Setting a
value of 0 restores the previous default of rounding the value to 6
(for <type>float4</type>) or 15 (for <type>float8</type>)
significant decimal digits. Setting a negative value reduces the number
of digits further; for example -2 would round output to 4 or 13 digits
respectively.
</para>
<para>
Any value of <xref linkend="guc-extra-float-digits"/> greater than 0
selects the shortest-precise format.
</para>
<note>
<para>
Applications that wanted precise values have historically had to set
<xref linkend="guc-extra-float-digits"/> to 3 to obtain them. For maximum
compatibility between versions, they should continue to do so.
</para>
</note>
@ -751,9 +783,7 @@ FROM generate_series(-3.5, 3.5, 1) as x;
</literallayout>
These represent the IEEE 754 special values
<quote>infinity</quote>, <quote>negative infinity</quote>, and
<quote>not-a-number</quote>, respectively. (On a machine whose
floating-point arithmetic does not follow IEEE 754, these values
will probably not work as expected.) When writing these values
<quote>not-a-number</quote>, respectively. When writing these values
as constants in an SQL command, you must put quotes around them,
for example <literal>UPDATE table SET x = '-Infinity'</literal>. On input,
these strings are recognized in a case-insensitive manner.
@ -786,17 +816,6 @@ FROM generate_series(-3.5, 3.5, 1) as x;
<type>double precision</type>.
</para>
<note>
<para>
The assumption that <type>real</type> and
<type>double precision</type> have exactly 24 and 53 bits in the
mantissa respectively is correct for IEEE-standard floating point
implementations. On non-IEEE platforms it might be off a little, but
for simplicity the same ranges of <replaceable>p</replaceable> are used
on all platforms.
</para>
</note>
</sect2>
<sect2 id="datatype-serial">