📝 add documentation for numbers

2025-07-29 23:01:16 +03:00 · 2021-04-28 20:33:05 +02:00
parent a34e011e24
commit cdfe865486
20 changed files with 400 additions and 83 deletions
--- a/doc/mkdocs/docs/features/iterators.md
+++ b/doc/mkdocs/docs/features/iterators.md
@ -10,7 +10,7 @@ As for other containers, `begin()` returns an iterator to the first value and `e

 ### Iteration order for objects

-When iterating over objects, values are ordered with respect to the `object_comparator_t` type which defaults to `std::less`. See the [types documentation](types.md#key-order) for more information.
+When iterating over objects, values are ordered with respect to the `object_comparator_t` type which defaults to `std::less`. See the [types documentation](types/index.md#key-order) for more information.

 ??? example

--- a/doc/mkdocs/docs/features/types/index.md
+++ b/doc/mkdocs/docs/features/types/index.md
@ -1,4 +1,4 @@
-# Types
+# Overview

 This page gives an overview how JSON values are stored and how this can be configured.

@ -107,7 +107,7 @@ using binary_t = nlohmann::byte_container_with_subtype<BinaryType>;

 ## Objects

-[RFC 7159](http://rfc7159.net/rfc7159) describes JSON objects as follows:
+[RFC 8259](https://tools.ietf.org/html/rfc8259) describes JSON objects as follows:

 > An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.

@ -135,11 +135,11 @@ The choice of `object_t` influences the behavior of the JSON class. With the def

 ### Key order

-The order name/value pairs are added to the object is *not* preserved by the library. Therefore, iterating an object may return name/value pairs in a different order than they were originally stored. In fact, keys will be traversed in alphabetical order as `std::map` with `std::less` is used by default. Please note this behavior conforms to [RFC 7159](http://rfc7159.net/rfc7159), because any order implements the specified "unordered" nature of JSON objects.
+The order name/value pairs are added to the object is *not* preserved by the library. Therefore, iterating an object may return name/value pairs in a different order than they were originally stored. In fact, keys will be traversed in alphabetical order as `std::map` with `std::less` is used by default. Please note this behavior conforms to [RFC 8259](https://tools.ietf.org/html/rfc8259), because any order implements the specified "unordered" nature of JSON objects.

 ### Limits

-[RFC 7159](http://rfc7159.net/rfc7159) specifies:
+[RFC 8259](https://tools.ietf.org/html/rfc8259) specifies:

 > An implementation may set limits on the maximum depth of nesting.

@ -152,7 +152,7 @@ Objects are stored as pointers in a `basic_json` type. That is, for any access t

 ## Arrays

-[RFC 7159](http://rfc7159.net/rfc7159) describes JSON arrays as follows:
+[RFC 8259](https://tools.ietf.org/html/rfc8259) describes JSON arrays as follows:

 > An array is an ordered sequence of zero or more values.

@ -169,7 +169,7 @@ std::vector<

 ### Limits

-[RFC 7159](http://rfc7159.net/rfc7159) specifies:
+[RFC 8259](https://tools.ietf.org/html/rfc8259) specifies:

 > An implementation may set limits on the maximum depth of nesting.

@ -182,7 +182,7 @@ Arrays are stored as pointers in a `basic_json` type. That is, for any access to

 ## Strings

-[RFC 7159](http://rfc7159.net/rfc7159) describes JSON strings as follows:
+[RFC 8259](https://tools.ietf.org/html/rfc8259) describes JSON strings as follows:

 > A string is a sequence of zero or more Unicode characters.

@ -198,7 +198,7 @@ Strings are stored in UTF-8 encoding. Therefore, functions like `std::string::si

 ### String comparison

-[RFC 7159](http://rfc7159.net/rfc7159) states:
+[RFC 8259](https://tools.ietf.org/html/rfc8259) states:

 > Software implementations are typically required to test names of object members for equality. Implementations that transform the textual representation into sequences of Unicode code units and then perform the comparison numerically, code unit by code unit, are interoperable in the sense that implementations will agree in all cases on equality or inequality of two strings. For example, implementations that compare strings with escaped characters unconverted may incorrectly find that `"a\\b"` and `"a\u005Cb"` are not equal.

@ -211,7 +211,7 @@ String values are stored as pointers in a `basic_json` type. That is, for any ac

 ## Booleans

-[RFC 7159](http://rfc7159.net/rfc7159) implicitly describes a boolean as a type which differentiates the two literals `true` and `false`.
+[RFC 8259](https://tools.ietf.org/html/rfc8259) implicitly describes a boolean as a type which differentiates the two literals `true` and `false`.

 ### Default type

@ -223,7 +223,9 @@ Boolean values are stored directly inside a `basic_json` type.

 ## Numbers

-[RFC 7159](http://rfc7159.net/rfc7159) describes numbers as follows:
+See the [number handling](number_handling.md) article for a detailed discussion on how numbers are handled by this library.
+
+[RFC 8259](https://tools.ietf.org/html/rfc8259) describes numbers as follows:

 > The representation of numbers is similar to that used in most programming languages. A number is represented in base 10 using decimal digits. It contains an integer component that may be prefixed with an optional minus sign, which may be followed by a fraction part and/or an exponent part. Leading zeros are not allowed. (...) Numeric values that cannot be represented in the grammar below (such as Infinity and NaN) are not permitted.

@ -242,7 +244,7 @@ With the default values for *NumberFloatType* (`#!cpp double`), the default valu

 ### Limits

-[RFC 7159](http://rfc7159.net/rfc7159) specifies:
+[RFC 8259](https://tools.ietf.org/html/rfc8259) specifies:

 > An implementation may set limits on the range and precision of numbers.

@ -250,13 +252,13 @@ When the default type is used, the maximal integer number that can be stored is

 When the default type is used, the maximal unsigned integer number that can be stored is `#!c 18446744073709551615` (`UINT64_MAX`) and the minimal integer number that can be stored is `#!c 0`. Integer numbers that are out of range will yield over/underflow when used in a constructor. During deserialization, too large or small integer numbers will be automatically be stored as `number_integer_t` or `number_float_t`.

-[RFC 7159](http://rfc7159.net/rfc7159) further states:
+[RFC 8259](https://tools.ietf.org/html/rfc8259) further states:

 > Note that when such software is used, numbers that are integers and are in the range $[-2^{53}+1, 2^{53}-1]$ are interoperable in the sense that implementations will agree exactly on their numeric values.

 As this range is a subrange of the exactly supported range [`INT64_MIN`, `INT64_MAX`], this class's integer type is interoperable.

-[RFC 7159](http://rfc7159.net/rfc7159) states:
+[RFC 8259](https://tools.ietf.org/html/rfc8259) states:

 > This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision.

--- a/doc/mkdocs/docs/features/types/number_handling.md
+++ b/doc/mkdocs/docs/features/types/number_handling.md
@ -0,0 +1,313 @@
+# Number Handling
+
+This document describes how the library handles numbers.
+
+## Background
+
+This section briefly summarizes what the JSON specification says about numbers.
+
+### JSON number syntax
+
+JSON defines the syntax of numbers as follows:
+
+!!! quote "[RFC 8259](https://tools.ietf.org/html/rfc8259#section-6), Section 6"
+
+    The representation of numbers is similar to that used in most
+    programming languages.  A number is represented in base 10 using
+    decimal digits.  It contains an integer component that may be
+    prefixed with an optional minus sign, which may be followed by a
+    fraction part and/or an exponent part.  Leading zeros are not
+    allowed.
+
+    A fraction part is a decimal point followed by one or more digits.
+    
+    An exponent part begins with the letter E in uppercase or lowercase,
+    which may be followed by a plus or minus sign.  The E and optional
+    sign are followed by one or more digits.
+
+The following railroad diagram from [json.org](https://json.org) visualizes the number syntax:
+
+![Syntax for JSON numbers](../../images/json_syntax_number.png)
+
+### Number interoperability
+
+On number interoperability, the following remarks are made:
+
+!!! quote "[RFC 8259](https://tools.ietf.org/html/rfc8259#section-6), Section 6"
+
+    This specification allows implementations to set limits on the range
+    and precision of numbers accepted.  Since software that implements
+    IEEE 754 binary64 (double precision) numbers [IEEE754] is generally
+    available and widely used, good interoperability can be achieved by
+    implementations that expect no more precision or range than these
+    provide, in the sense that implementations will approximate JSON
+    numbers within the expected precision.  A JSON number such as 1E400
+    or 3.141592653589793238462643383279 may indicate potential
+    interoperability problems, since it suggests that the software that
+    created it expects receiving software to have greater capabilities
+    for numeric magnitude and precision than is widely available.
+    
+    Note that when such software is used, numbers that are integers and
+    are in the range $[-2^{53}+1, 2^{53}-1]$ are interoperable in the
+    sense that implementations will agree exactly on their numeric
+    values.
+
+## Library implementation
+
+This section describes how the above number specification is implemented by this library.
+
+### Number storage
+
+In the default [`json`](../../api/json.md) type, numbers are stored as `#!c std::uint64_t`, `#!c std::int64_t`, or
+`#!c double`. The integer types `#!c std::uint64_t` and `#!c std::int64_t` are used only if they can store the
+number without loss of precision. If this is impossible (e.g., if the number is too large), the number is stored as
+`#!c double`.
+
+!!! info "Notes"
+
+    - Numbers with a fraction part or an exponent part (scientific notation) are always stored as `#!c double`.
+    - The number types can be changed, see [Template number types](#template-number-types). 
+    - As of version 3.9.1, the conversion is realized by
+      [`std::strtoull`](https://en.cppreference.com/w/cpp/string/byte/strtoul),
+      [`std::strtoll`](https://en.cppreference.com/w/cpp/string/byte/strtol), and
+      [`std::strtod`](https://en.cppreference.com/w/cpp/string/byte/strtof), respectively.
+
+!!! example "Examples"
+
+    - Integer `#!c -12345678912345789123456789` is smaller than `#!c INT64_MIN` and will be stored as floating-point
+      number `#!c -1.2345678912345788e+25`.
+    - The number `#!c 1E3` uses scientific notation and will therefore be stored as floating-point number `#!c 1000.0`.
+
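+The storage rule above can be sketched with the standard conversion functions named in the notes. This is a minimal
+illustration under stated assumptions, not the library's parser: `number_kind` and `classify_number` are hypothetical
+names, the token is assumed to be a syntactically valid JSON number, and `#!c long long` is assumed to be 64 bits wide.
+
+```cpp
+#include <cerrno>
+#include <cstdlib>
+#include <string>
+
+// Hypothetical names for illustration; not part of the library's API.
+enum class number_kind { unsigned_integer, signed_integer, floating_point };
+
+// Decide how a syntactically valid JSON number token would be stored.
+number_kind classify_number(const std::string& token)
+{
+    // A fraction part or an exponent part always selects the floating-point type.
+    if (token.find_first_of(".eE") != std::string::npos)
+    {
+        return number_kind::floating_point;
+    }
+
+    errno = 0;
+    if (token[0] == '-')
+    {
+        // Negative integers: fall back to floating-point if the value is out of range.
+        std::strtoll(token.c_str(), nullptr, 10);
+        return errno == ERANGE ? number_kind::floating_point : number_kind::signed_integer;
+    }
+
+    // Non-negative integers: fall back to floating-point if the value is out of range.
+    std::strtoull(token.c_str(), nullptr, 10);
+    return errno == ERANGE ? number_kind::floating_point : number_kind::unsigned_integer;
+}
+```
+
+With this sketch, `classify_number("1E3")` and `classify_number("-12345678912345789123456789")` both select the
+floating-point type, which matches the examples above.
+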
+### Number limits
+
+- Any 64-bit signed or unsigned integer can be stored without loss of precision.
+- Numbers exceeding the limits of `#!c double` (i.e., numbers that after conversion via
+[`std::strtod`](https://en.cppreference.com/w/cpp/string/byte/strtof) do not satisfy
+[`std::isfinite`](https://en.cppreference.com/w/cpp/numeric/math/isfinite), such as `#!c 1E400`) will throw exception
+[`json.exception.out_of_range.406`](../../home/exceptions.md#jsonexceptionout_of_range406) during parsing.
+- Floating-point numbers are rounded to the nearest number representable as `double`. For instance,
+`#!c 3.141592653589793238462643383279` is stored as [`0x400921fb54442d18`](https://float.exposed/0x400921fb54442d18).
+This is the same behavior as the code `#!c double x = 3.141592653589793238462643383279;`.
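+
+The range check described in this list can be reproduced with the standard library alone. The following is a sketch
+rather than the library's code: `to_finite_double` is a hypothetical helper that returns `#!c false` in the cases
+where the library is described as throwing `out_of_range.406`.
+
+```cpp
+#include <cmath>
+#include <cstdlib>
+#include <string>
+
+// Hypothetical helper: convert a number token and report whether the result is finite.
+bool to_finite_double(const std::string& token, double& result)
+{
+    // std::strtod rounds to a representable double; on overflow it returns +/-HUGE_VAL.
+    result = std::strtod(token.c_str(), nullptr);
+    return std::isfinite(result);
+}
+```
+
+Calling it with `"1E400"` returns `#!c false`, whereas `"3.141592653589793238462643383279"` is accepted and rounded
+to the same value as the literal `#!c 3.141592653589793`.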
+
+!!! success "Interoperability"
+
+    - The library is interoperable with respect to the specification, because its supported range $[-2^{63}, 2^{64}-1]$ is
+      larger than the described range $[-2^{53}+1, 2^{53}-1]$.
+    - All integers outside the range $[-2^{63}, 2^{64}-1]$, as well as floating-point numbers, are stored as `double`.
+      This also concurs with the specification above.
+
+### Number serialization
+
+- Integer numbers are serialized as is; that is, no scientific notation is used.
+- Floating-point numbers are serialized as specified by the `#!c %g` printf modifier with
+  [`std::numeric_limits<double>::max_digits10`](https://en.cppreference.com/w/cpp/types/numeric_limits/max_digits10)
+  significant digits. The rationale is to use the shortest representation while still allowing round-tripping.
+
+!!! hint "Notes regarding precision of floating-point numbers"
+
+    As described above, floating-point numbers are rounded to the nearest double and serialized with the shortest
+    representation to allow round-tripping. This can yield confusing examples:
+
+    - The serialization can have fewer decimal places than the input: `#!c 2555.5599999999999` will be serialized as
+      `#!c 2555.56`. The reverse can also be true.
+    - The serialization can be in scientific notation even if the input is not: `#!c 0.0000972439793401814` will be 
+      serialized as `#!c 9.72439793401814e-05`. The reverse can also be true: `#!c 12345E-5` will be serialized as
+      `#!c 0.12345`.
+    - Conversions from `#!c float` to `#!c double` can also introduce rounding errors:
+        ```cpp
+        float f = 0.3;
+        json j = f;
+        std::cout << j << '\n';
+        ```
+        yields `#!c 0.30000001192092896`.
+
+    All examples here can be reproduced by passing the original double value to
+
+    ```cpp
+    std::printf("%.*g\n", std::numeric_limits<double>::max_digits10, double_value);
+    ```
+
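+The `#!c float` example from the notes above can be checked without the library. This is a small sketch built around
+the `printf` call shown there; `format_max_digits10` is a hypothetical helper name and not part of the library.
+
+```cpp
+#include <cstdio>
+#include <limits>
+#include <string>
+
+// Hypothetical helper: format a double with max_digits10 significant digits using %g.
+std::string format_max_digits10(double value)
+{
+    char buffer[64];
+    std::snprintf(buffer, sizeof(buffer), "%.*g",
+                  std::numeric_limits<double>::max_digits10, value);
+    return buffer;
+}
+```
+
+Passing `#!c 0.3f` widens the `#!c float` to `#!c double` without changing its value, so the result is
+`#!c "0.30000001192092896"`: the digits show the rounding that already happened when `#!c 0.3` was stored as a
+`#!c float`.
+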
+#### NaN handling
+
+NaN (not-a-number) cannot be expressed with the number syntax described above and is in fact explicitly excluded:
+
+!!! quote "[RFC 8259](https://tools.ietf.org/html/rfc8259#section-6), Section 6"
+
+    Numeric values that cannot be represented in the grammar below (such
+    as Infinity and NaN) are not permitted.
+
+That is, there is no way to *parse* a NaN value. However, NaN values can be stored in a JSON value by assignment.
+
+This library serializes NaN values as `#!js null`. This corresponds to the behavior of JavaScript's
+[`JSON.stringify`](https://www.w3schools.com/js/js_json_stringify.asp) function.
+
+!!! example
+
+    The following example shows how a NaN value is stored in a `json` value.
+
+    ```cpp
+    int main()
+    {
+        double val = std::numeric_limits<double>::quiet_NaN();
+        std::cout << "val=" << val << std::endl;
+        json j = val;
+        std::cout << "j=" << j.dump() << std::endl;
+        val = j;
+        std::cout << "val=" << val << std::endl;
+    }
+    ```
+    
+    output:
+    
+    ```
+    val=nan
+    j=null
+    val=nan
+    ```
+
+### Number comparison
+
+Floating-point numbers inside JSON values are compared with `#!c json::number_float_t::operator==` which is
+`#!c double::operator==` by default.
+
+!!! example "Alternative comparison functions"
+
+    To compare floating-point numbers while respecting an epsilon, an alternative
+    [comparison function](https://github.com/mariokonrad/marnav/blob/master/include/marnav/math/floatingpoint.hpp#L34-#L39)
+    could be used, for instance
+    
+    ```cpp
+    template<typename T, typename = typename std::enable_if<std::is_floating_point<T>::value, T>::type>
+    inline bool is_same(T a, T b, T epsilon = std::numeric_limits<T>::epsilon()) noexcept
+    {
+        return std::abs(a - b) <= epsilon;
+    }
+    ```
+    Alternatively, you can define your own equality function like this:
+    
+    ```cpp
+    bool my_equal(const_reference lhs, const_reference rhs)
+    {
+        const auto lhs_type = lhs.type();
+        const auto rhs_type = rhs.type();
+        if (lhs_type == rhs_type)
+        {
+            switch(lhs_type)
+            {
+                // self_defined case
+                case value_t::number_float:
+                    return std::abs(lhs - rhs) <= std::numeric_limits<float>::epsilon();
+        
+                // other cases remain the same with the original
+                ...
+            }
+        }
+        ...
+    }
+    ```
+    
+    (see [#703](https://github.com/nlohmann/json/issues/703) for more information.)
+    
+!!! note
+
+    NaN values never compare equal to themselves or to other NaN values. See [#514](https://github.com/nlohmann/json/issues/514).
+
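+If two NaN values should be treated as equal in application code, the comparison has to handle them explicitly. The
+following sketch works on plain `#!c double` values; `equal_or_both_nan` is a hypothetical helper and not part of the
+library.
+
+```cpp
+#include <cmath>
+
+// Hypothetical helper: behaves like operator==, but treats two NaN values as equal.
+bool equal_or_both_nan(double a, double b)
+{
+    // a == b is always false if either operand is NaN, so NaN is checked separately.
+    return a == b || (std::isnan(a) && std::isnan(b));
+}
+```
+
+Whether two NaN values should count as equal depends on the application; the default comparison follows IEEE 754,
+where they do not.
+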
+### Number conversion
+
+Just like the C++ language itself, the `get` family of functions allows conversions between unsigned and signed
+integers, and between floating-point values and integers. This behavior may be surprising.
+
+!!! warning "Unconditional number conversions"
+
+    ```cpp hl_lines="3"
+    double d = 42.3;                          // non-integer double value 42.3
+    json jd = d;                              // stores double value 42.3
+    std::int64_t i = jd.get<std::int64_t>();  // now i==42; no warning or error is produced
+    ```
+
+    Note the last line will throw a [`json.exception.type_error.302`](../../home/exceptions.md#jsonexceptiontype_error302)
+    exception if `jd` is not a numerical type, for instance a string.
+
+The rationale is twofold:
+
+1. JSON does not define a number type or precision (see [above](#background)).
+2. C++ also allows silent conversions between number types.
+
+!!! success "Conditional number conversion"
+
+    The problem shown above can be avoided by explicitly checking the nature of the value with members such as
+    [`is_number_integer()`](../../api/basic_json/is_number_integer.md) or
+    [`is_number_unsigned()`](../../api/basic_json/is_number_unsigned.md):
+
+    ```cpp hl_lines="2"
+    // check if jd is really integer-valued
+    if (jd.is_number_integer())
+    {
+        // if so, do the conversion and use i
+        std::int64_t i = jd.get<std::int64_t>();
+        // ...
+    }
+    else
+    {
+        // otherwise, take appropriate action
+        // ...
+    }
+    ```
+
+    Note this approach also has the advantage that it can react to non-numerical JSON value types such as strings.
+
+    (Example taken from [#777](https://github.com/nlohmann/json/issues/777#issuecomment-459968458).)
+
+### Determine number types
+
+As the example in [Number conversion](#number-conversion) shows, there are different functions to determine the type of
+the stored number:
+
+- [`is_number()`](../../api/basic_json/is_number.md) returns `#!c true` for any number type
+- [`is_number_integer()`](../../api/basic_json/is_number_integer.md) returns `#!c true` for signed and unsigned integers
+- [`is_number_unsigned()`](../../api/basic_json/is_number_unsigned.md) returns `#!c true` for unsigned integers only
+- [`is_number_float()`](../../api/basic_json/is_number_float.md) returns `#!c true` for floating-point numbers
+- [`type_name()`](../../api/basic_json/type_name.md) returns `#!c "number"` for any number type
+- [`type()`](../../api/basic_json/type.md) returns a different enumerator of
+  [`value_t`](../../api/basic_json/value_t.md) for each number type
+
+| function | unsigned integer | signed integer | floating-point | string |
+| -------- | ---------------- | -------------- | -------------- | ------ |
+| [`is_number()`](../../api/basic_json/is_number.md) | `#!c true` | `#!c true` | `#!c true` | `#!c false` |
+| [`is_number_integer()`](../../api/basic_json/is_number_integer.md) | `#!c true` | `#!c true` | `#!c false` | `#!c false` |
+| [`is_number_unsigned()`](../../api/basic_json/is_number_unsigned.md) | `#!c true` | `#!c false` | `#!c false` | `#!c false` |
+| [`is_number_float()`](../../api/basic_json/is_number_float.md) | `#!c false` | `#!c false` | `#!c true` | `#!c false` |
+| [`type_name()`](../../api/basic_json/type_name.md) | `#!c "number"` | `#!c "number"` | `#!c "number"` | `#!c "string"` |
+| [`type()`](../../api/basic_json/type.md) | `number_unsigned` | `number_integer` | `number_float` | `string` |
+
+### Template number types
+
+The number types can be changed with template parameters.
+
+| position | number type | default type | possible values |
+| -------- | ----------- | ------------ | --------------- |
+| 5        | signed integers | `#!c std::int64_t` | `#!c std::int32_t`, `#!c std::int16_t`, etc. |
+| 6        | unsigned integers | `#!c std::uint64_t` | `#!c std::uint32_t`, `#!c std::uint16_t`, etc. |
+| 7        | floating-point | `#!c double` | `#!c float`, `#!c long double` |
+
+!!! info "Constraints on number types"
+
+    - The type for signed integers must be convertible from `#!c long long`. The type for floating-point numbers is used
+      in case of overflow.
+    - The type for unsigned integers must be convertible from `#!c unsigned long long`.  The type for floating-point
+      numbers is used in case of overflow.
+    - The types for signed and unsigned integers must be distinct, see
+      [#2573](https://github.com/nlohmann/json/issues/2573).
+    - Only `#!c double`, `#!c float`, and `#!c long double` are supported for floating-point numbers.
+
+!!! example
+
+    A `basic_json` type that uses `#!c long double` as floating-point type.
+
+    ```cpp hl_lines="2"
+    using json_ld = nlohmann::basic_json<std::map, std::vector, std::string, bool,
+                                         std::int64_t, std::uint64_t, long double>;
+    ```
+
+    Note values should then be parsed with `json_ld::parse` rather than `json::parse` as the latter would parse
+    floating-point values to `#!c double` before then converting them to `#!c long double`.