mirror of
https://github.com/postgres/postgres.git
synced 2025-04-24 10:47:04 +03:00
878 lines
34 KiB
Plaintext
878 lines
34 KiB
Plaintext
<!-- doc/src/sgml/json.sgml -->
|
|
|
|
<sect1 id="datatype-json">
|
|
<title><acronym>JSON</acronym> Types</title>
|
|
|
|
<indexterm zone="datatype-json">
|
|
<primary>JSON</primary>
|
|
</indexterm>
|
|
|
|
<indexterm zone="datatype-json">
|
|
<primary>JSONB</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
JSON data types are for storing JSON (JavaScript Object Notation)
|
|
data, as specified in <ulink url="https://tools.ietf.org/html/rfc7159">RFC
|
|
7159</ulink>. Such data can also be stored as <type>text</type>, but
|
|
the JSON data types have the advantage of enforcing that each
|
|
stored value is valid according to the JSON rules. There are also
|
|
assorted JSON-specific functions and operators available for data stored
|
|
in these data types; see <xref linkend="functions-json"/>.
|
|
</para>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</productname> offers two types for storing JSON
|
|
data: <type>json</type> and <type>jsonb</type>. To implement effective query
|
|
mechanisms for these data types, <productname>PostgreSQL</productname>
|
|
also provides the <type>jsonpath</type> data type described in
|
|
<xref linkend="datatype-jsonpath"/>.
|
|
</para>
|
|
|
|
<para>
|
|
The <type>json</type> and <type>jsonb</type> data types
|
|
accept <emphasis>almost</emphasis> identical sets of values as
|
|
input. The major practical difference is one of efficiency. The
|
|
<type>json</type> data type stores an exact copy of the input text,
|
|
which processing functions must reparse on each execution; while
|
|
<type>jsonb</type> data is stored in a decomposed binary format that
|
|
makes it slightly slower to input due to added conversion
|
|
overhead, but significantly faster to process, since no reparsing
|
|
is needed. <type>jsonb</type> also supports indexing, which can be a
|
|
significant advantage.
|
|
</para>
|
|
|
|
<para>
|
|
Because the <type>json</type> type stores an exact copy of the input text, it
|
|
will preserve semantically-insignificant white space between tokens, as
|
|
well as the order of keys within JSON objects. Also, if a JSON object
|
|
within the value contains the same key more than once, all the key/value
|
|
pairs are kept. (The processing functions consider the last value as the
|
|
operative one.) By contrast, <type>jsonb</type> does not preserve white
|
|
space, does not preserve the order of object keys, and does not keep
|
|
duplicate object keys. If duplicate keys are specified in the input,
|
|
only the last value is kept.
|
|
</para>
|
|
|
|
<para>
|
|
In general, most applications should prefer to store JSON data as
|
|
<type>jsonb</type>, unless there are quite specialized needs, such as
|
|
legacy assumptions about ordering of object keys.
|
|
</para>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</productname> allows only one character set
|
|
encoding per database. It is therefore not possible for the JSON
|
|
types to conform rigidly to the JSON specification unless the database
|
|
encoding is UTF8. Attempts to directly include characters that
|
|
cannot be represented in the database encoding will fail; conversely,
|
|
characters that can be represented in the database encoding but not
|
|
in UTF8 will be allowed.
|
|
</para>
|
|
|
|
<para>
|
|
RFC 7159 permits JSON strings to contain Unicode escape sequences
|
|
denoted by <literal>\u<replaceable>XXXX</replaceable></literal>. In the input
|
|
function for the <type>json</type> type, Unicode escapes are allowed
|
|
regardless of the database encoding, and are checked only for syntactic
|
|
correctness (that is, that four hex digits follow <literal>\u</literal>).
|
|
However, the input function for <type>jsonb</type> is stricter: it disallows
|
|
Unicode escapes for non-ASCII characters (those above <literal>U+007F</literal>)
|
|
unless the database encoding is UTF8. The <type>jsonb</type> type also
|
|
rejects <literal>\u0000</literal> (because that cannot be represented in
|
|
<productname>PostgreSQL</productname>'s <type>text</type> type), and it insists
|
|
that any use of Unicode surrogate pairs to designate characters outside
|
|
the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes
|
|
are converted to the equivalent ASCII or UTF8 character for storage;
|
|
this includes folding surrogate pairs into a single character.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
Many of the JSON processing functions described
|
|
in <xref linkend="functions-json"/> will convert Unicode escapes to
|
|
regular characters, and will therefore throw the same types of errors
|
|
just described even if their input is of type <type>json</type>
|
|
not <type>jsonb</type>. The fact that the <type>json</type> input function does
|
|
not make these checks may be considered a historical artifact, although
|
|
it does allow for simple storage (without processing) of JSON Unicode
|
|
escapes in a non-UTF8 database encoding. In general, it is best to
|
|
avoid mixing Unicode escapes in JSON with a non-UTF8 database encoding,
|
|
if possible.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
When converting textual JSON input into <type>jsonb</type>, the primitive
|
|
types described by <acronym>RFC</acronym> 7159 are effectively mapped onto
|
|
native <productname>PostgreSQL</productname> types, as shown
|
|
in <xref linkend="json-type-mapping-table"/>.
|
|
Therefore, there are some minor additional constraints on what
|
|
constitutes valid <type>jsonb</type> data that do not apply to
|
|
the <type>json</type> type, nor to JSON in the abstract, corresponding
|
|
to limits on what can be represented by the underlying data type.
|
|
Notably, <type>jsonb</type> will reject numbers that are outside the
|
|
range of the <productname>PostgreSQL</productname> <type>numeric</type> data
|
|
type, while <type>json</type> will not. Such implementation-defined
|
|
restrictions are permitted by <acronym>RFC</acronym> 7159. However, in
|
|
practice such problems are far more likely to occur in other
|
|
implementations, as it is common to represent JSON's <type>number</type>
|
|
primitive type as IEEE 754 double precision floating point
|
|
(which <acronym>RFC</acronym> 7159 explicitly anticipates and allows for).
|
|
When using JSON as an interchange format with such systems, the danger
|
|
of losing numeric precision compared to data originally stored
|
|
by <productname>PostgreSQL</productname> should be considered.
|
|
</para>
|
|
|
|
<para>
|
|
Conversely, as noted in the table there are some minor restrictions on
|
|
the input format of JSON primitive types that do not apply to
|
|
the corresponding <productname>PostgreSQL</productname> types.
|
|
</para>
|
|
|
|
<table id="json-type-mapping-table">
|
|
<title>JSON Primitive Types and Corresponding <productname>PostgreSQL</productname> Types</title>
|
|
<tgroup cols="3">
|
|
<thead>
|
|
<row>
|
|
<entry>JSON primitive type</entry>
|
|
<entry><productname>PostgreSQL</productname> type</entry>
|
|
<entry>Notes</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry><type>string</type></entry>
|
|
<entry><type>text</type></entry>
|
|
<entry><literal>\u0000</literal> is disallowed, as are non-ASCII Unicode
|
|
escapes if database encoding is not UTF8</entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>number</type></entry>
|
|
<entry><type>numeric</type></entry>
|
|
<entry><literal>NaN</literal> and <literal>infinity</literal> values are disallowed</entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>boolean</type></entry>
|
|
<entry><type>boolean</type></entry>
|
|
<entry>Only lowercase <literal>true</literal> and <literal>false</literal> spellings are accepted</entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>null</type></entry>
|
|
<entry>(none)</entry>
|
|
<entry>SQL <literal>NULL</literal> is a different concept</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<sect2 id="json-keys-elements">
|
|
<title>JSON Input and Output Syntax</title>
|
|
<para>
|
|
The input/output syntax for the JSON data types is as specified in
|
|
<acronym>RFC</acronym> 7159.
|
|
</para>
|
|
<para>
|
|
The following are all valid <type>json</type> (or <type>jsonb</type>) expressions:
|
|
<programlisting>
|
|
-- Simple scalar/primitive value
|
|
-- Primitive values can be numbers, quoted strings, true, false, or null
|
|
SELECT '5'::json;
|
|
|
|
-- Array of zero or more elements (elements need not be of same type)
|
|
SELECT '[1, 2, "foo", null]'::json;
|
|
|
|
-- Object containing pairs of keys and values
|
|
-- Note that object keys must always be quoted strings
|
|
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;
|
|
|
|
-- Arrays and objects can be nested arbitrarily
|
|
SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
As previously stated, when a JSON value is input and then printed without
|
|
any additional processing, <type>json</type> outputs the same text that was
|
|
input, while <type>jsonb</type> does not preserve semantically-insignificant
|
|
details such as whitespace. For example, note the differences here:
|
|
<programlisting>
|
|
SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json;
|
|
json
|
|
-------------------------------------------------
|
|
{"bar": "baz", "balance": 7.77, "active":false}
|
|
(1 row)
|
|
|
|
SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb;
|
|
jsonb
|
|
--------------------------------------------------
|
|
{"bar": "baz", "active": false, "balance": 7.77}
|
|
(1 row)
|
|
</programlisting>
|
|
One semantically-insignificant detail worth noting is that
|
|
in <type>jsonb</type>, numbers will be printed according to the behavior of the
|
|
underlying <type>numeric</type> type. In practice this means that numbers
|
|
entered with <literal>E</literal> notation will be printed without it, for
|
|
example:
|
|
<programlisting>
|
|
SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb;
|
|
json | jsonb
|
|
-----------------------+-------------------------
|
|
{"reading": 1.230e-5} | {"reading": 0.00001230}
|
|
(1 row)
|
|
</programlisting>
|
|
However, <type>jsonb</type> will preserve trailing fractional zeroes, as seen
|
|
in this example, even though those are semantically insignificant for
|
|
purposes such as equality checks.
|
|
</para>
|
|
|
|
<para>
|
|
For the list of built-in functions and operators available for
|
|
constructing and processing JSON values, see <xref linkend="functions-json"/>.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="json-doc-design">
|
|
<title>Designing JSON Documents Effectively</title>
|
|
<para>
|
|
Representing data as JSON can be considerably more flexible than
|
|
the traditional relational data model, which is compelling in
|
|
environments where requirements are fluid. It is quite possible
|
|
for both approaches to co-exist and complement each other within
|
|
the same application. However, even for applications where maximal
|
|
flexibility is desired, it is still recommended that JSON documents
|
|
have a somewhat fixed structure. The structure is typically
|
|
unenforced (though enforcing some business rules declaratively is
|
|
possible), but having a predictable structure makes it easier to write
|
|
queries that usefully summarize a set of <quote>documents</quote> (datums)
|
|
in a table.
|
|
</para>
|
|
<para>
|
|
JSON data is subject to the same concurrency-control
|
|
considerations as any other data type when stored in a table.
|
|
Although storing large documents is practicable, keep in mind that
|
|
any update acquires a row-level lock on the whole row.
|
|
Consider limiting JSON documents to a
|
|
manageable size in order to decrease lock contention among updating
|
|
transactions. Ideally, JSON documents should each
|
|
represent an atomic datum that business rules dictate cannot
|
|
reasonably be further subdivided into smaller datums that
|
|
could be modified independently.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="json-containment">
|
|
<title><type>jsonb</type> Containment and Existence</title>
|
|
<indexterm>
|
|
<primary>jsonb</primary>
|
|
<secondary>containment</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>jsonb</primary>
|
|
<secondary>existence</secondary>
|
|
</indexterm>
|
|
<para>
|
|
Testing <firstterm>containment</firstterm> is an important capability of
|
|
<type>jsonb</type>. There is no parallel set of facilities for the
|
|
<type>json</type> type. Containment tests whether
|
|
one <type>jsonb</type> document has contained within it another one.
|
|
These examples return true except as noted:
|
|
</para>
|
|
<programlisting>
|
|
-- Simple scalar/primitive values contain only the identical value:
|
|
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;
|
|
|
|
-- The array on the right side is contained within the one on the left:
|
|
SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;
|
|
|
|
-- Order of array elements is not significant, so this is also true:
|
|
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;
|
|
|
|
-- Duplicate array elements don't matter either:
|
|
SELECT '[1, 2, 3]'::jsonb @> '[1, 2, 2]'::jsonb;
|
|
|
|
-- The object with a single pair on the right side is contained
|
|
-- within the object on the left side:
|
|
SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb": true}'::jsonb @> '{"version": 9.4}'::jsonb;
|
|
|
|
-- The array on the right side is <emphasis>not</emphasis> considered contained within the
|
|
-- array on the left, even though a similar array is nested within it:
|
|
SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb; -- yields false
|
|
|
|
-- But with a layer of nesting, it is contained:
|
|
SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb;
|
|
|
|
-- Similarly, containment is not reported here:
|
|
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb; -- yields false
|
|
|
|
-- A top-level key and an empty object is contained:
|
|
SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"foo": {}}'::jsonb;
|
|
</programlisting>
|
|
|
|
<para>
|
|
The general principle is that the contained object must match the
|
|
containing object as to structure and data contents, possibly after
|
|
discarding some non-matching array elements or object key/value pairs
|
|
from the containing object.
|
|
But remember that the order of array elements is not significant when
|
|
doing a containment match, and duplicate array elements are effectively
|
|
considered only once.
|
|
</para>
|
|
|
|
<para>
|
|
As a special exception to the general principle that the structures
|
|
must match, an array may contain a primitive value:
|
|
</para>
|
|
<programlisting>
|
|
-- This array contains the primitive string value:
|
|
SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb;
|
|
|
|
-- This exception is not reciprocal -- non-containment is reported here:
|
|
SELECT '"bar"'::jsonb @> '["bar"]'::jsonb; -- yields false
|
|
</programlisting>
|
|
|
|
<para>
|
|
<type>jsonb</type> also has an <firstterm>existence</firstterm> operator, which is
|
|
a variation on the theme of containment: it tests whether a string
|
|
(given as a <type>text</type> value) appears as an object key or array
|
|
element at the top level of the <type>jsonb</type> value.
|
|
These examples return true except as noted:
|
|
</para>
|
|
<programlisting>
|
|
-- String exists as array element:
|
|
SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';
|
|
|
|
-- String exists as object key:
|
|
SELECT '{"foo": "bar"}'::jsonb ? 'foo';
|
|
|
|
-- Object values are not considered:
|
|
SELECT '{"foo": "bar"}'::jsonb ? 'bar'; -- yields false
|
|
|
|
-- As with containment, existence must match at the top level:
|
|
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar'; -- yields false
|
|
|
|
-- A string is considered to exist if it matches a primitive JSON string:
|
|
SELECT '"foo"'::jsonb ? 'foo';
|
|
</programlisting>
|
|
|
|
<para>
|
|
JSON objects are better suited than arrays for testing containment or
|
|
existence when there are many keys or elements involved, because
|
|
unlike arrays they are internally optimized for searching, and do not
|
|
need to be searched linearly.
|
|
</para>
|
|
|
|
<tip>
|
|
<para>
|
|
Because JSON containment is nested, an appropriate query can skip
|
|
explicit selection of sub-objects. As an example, suppose that we have
|
|
a <structfield>doc</structfield> column containing objects at the top level, with
|
|
most objects containing <literal>tags</literal> fields that contain arrays of
|
|
sub-objects. This query finds entries in which sub-objects containing
|
|
both <literal>"term":"paris"</literal> and <literal>"term":"food"</literal> appear,
|
|
while ignoring any such keys outside the <literal>tags</literal> array:
|
|
<programlisting>
|
|
SELECT doc->'site_name' FROM websites
|
|
WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}';
|
|
</programlisting>
|
|
One could accomplish the same thing with, say,
|
|
<programlisting>
|
|
SELECT doc->'site_name' FROM websites
|
|
WHERE doc->'tags' @> '[{"term":"paris"}, {"term":"food"}]';
|
|
</programlisting>
|
|
but that approach is less flexible, and often less efficient as well.
|
|
</para>
|
|
|
|
<para>
|
|
On the other hand, the JSON existence operator is not nested: it will
|
|
only look for the specified key or array element at top level of the
|
|
JSON value.
|
|
</para>
|
|
</tip>
|
|
|
|
<para>
|
|
The various containment and existence operators, along with all other
|
|
JSON operators and functions are documented
|
|
in <xref linkend="functions-json"/>.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="json-indexing">
|
|
<title><type>jsonb</type> Indexing</title>
|
|
<indexterm>
|
|
<primary>jsonb</primary>
|
|
<secondary>indexes on</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
GIN indexes can be used to efficiently search for
|
|
keys or key/value pairs occurring within a large number of
|
|
<type>jsonb</type> documents (datums).
|
|
Two GIN <quote>operator classes</quote> are provided, offering different
|
|
performance and flexibility trade-offs.
|
|
</para>
|
|
<para>
|
|
The default GIN operator class for <type>jsonb</type> supports queries with
|
|
top-level key-exists operators <literal>?</literal>, <literal>?&</literal>
|
|
and <literal>?|</literal> operators and path/value-exists operator
|
|
<literal>@></literal>.
|
|
(For details of the semantics that these operators
|
|
implement, see <xref linkend="functions-jsonb-op-table"/>.)
|
|
An example of creating an index with this operator class is:
|
|
<programlisting>
|
|
CREATE INDEX idxgin ON api USING GIN (jdoc);
|
|
</programlisting>
|
|
The non-default GIN operator class <literal>jsonb_path_ops</literal>
|
|
supports indexing the <literal>@></literal> operator only.
|
|
An example of creating an index with this operator class is:
|
|
<programlisting>
|
|
CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Consider the example of a table that stores JSON documents
|
|
retrieved from a third-party web service, with a documented schema
|
|
definition. A typical document is:
|
|
<programlisting>
|
|
{
|
|
"guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a",
|
|
"name": "Angela Barton",
|
|
"is_active": true,
|
|
"company": "Magnafone",
|
|
"address": "178 Howard Place, Gulf, Washington, 702",
|
|
"registered": "2009-11-07T08:53:22 +08:00",
|
|
"latitude": 19.793713,
|
|
"longitude": 86.513373,
|
|
"tags": [
|
|
"enim",
|
|
"aliquip",
|
|
"qui"
|
|
]
|
|
}
|
|
</programlisting>
|
|
We store these documents in a table named <structname>api</structname>,
|
|
in a <type>jsonb</type> column named <structfield>jdoc</structfield>.
|
|
If a GIN index is created on this column,
|
|
queries like the following can make use of the index:
|
|
<programlisting>
|
|
-- Find documents in which the key "company" has value "Magnafone"
|
|
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';
|
|
</programlisting>
|
|
However, the index could not be used for queries like the
|
|
following, because though the operator <literal>?</literal> is indexable,
|
|
it is not applied directly to the indexed column <structfield>jdoc</structfield>:
|
|
<programlisting>
|
|
-- Find documents in which the key "tags" contains key or array element "qui"
|
|
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';
|
|
</programlisting>
|
|
Still, with appropriate use of expression indexes, the above
|
|
query can use an index. If querying for particular items within
|
|
the <literal>"tags"</literal> key is common, defining an index like this
|
|
may be worthwhile:
|
|
<programlisting>
|
|
CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));
|
|
</programlisting>
|
|
Now, the <literal>WHERE</literal> clause <literal>jdoc -> 'tags' ? 'qui'</literal>
|
|
will be recognized as an application of the indexable
|
|
operator <literal>?</literal> to the indexed
|
|
expression <literal>jdoc -> 'tags'</literal>.
|
|
(More information on expression indexes can be found in <xref
|
|
linkend="indexes-expressional"/>.)
|
|
</para>
|
|
<para>
|
|
Also, GIN index supports <literal>@@</literal> and <literal>@?</literal>
|
|
operators, which perform <literal>jsonpath</literal> matching.
|
|
<programlisting>
|
|
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"';
|
|
</programlisting>
|
|
<programlisting>
|
|
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] ? (@ == "qui")';
|
|
</programlisting>
|
|
GIN index extracts statements of following form out of
|
|
<literal>jsonpath</literal>: <literal>accessors_chain = const</literal>.
|
|
Accessors chain may consist of <literal>.key</literal>,
|
|
<literal>[*]</literal> and <literal>[index]</literal> accessors.
|
|
<literal>jsonb_ops</literal> additionally supports <literal>.*</literal>
|
|
and <literal>.**</literal> statements.
|
|
</para>
|
|
<para>
|
|
Another approach to querying is to exploit containment, for example:
|
|
<programlisting>
|
|
-- Find documents in which the key "tags" contains array element "qui"
|
|
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';
|
|
</programlisting>
|
|
A simple GIN index on the <structfield>jdoc</structfield> column can support this
|
|
query. But note that such an index will store copies of every key and
|
|
value in the <structfield>jdoc</structfield> column, whereas the expression index
|
|
of the previous example stores only data found under
|
|
the <literal>tags</literal> key. While the simple-index approach is far more
|
|
flexible (since it supports queries about any key), targeted expression
|
|
indexes are likely to be smaller and faster to search than a simple
|
|
index.
|
|
</para>
|
|
|
|
<para>
|
|
Although the <literal>jsonb_path_ops</literal> operator class supports
|
|
only queries with the <literal>@></literal>, <literal>@@</literal>
|
|
and <literal>@?</literal> operators, it has notable
|
|
performance advantages over the default operator
|
|
class <literal>jsonb_ops</literal>. A <literal>jsonb_path_ops</literal>
|
|
index is usually much smaller than a <literal>jsonb_ops</literal>
|
|
index over the same data, and the specificity of searches is better,
|
|
particularly when queries contain keys that appear frequently in the
|
|
data. Therefore search operations typically perform better
|
|
than with the default operator class.
|
|
</para>
|
|
|
|
<para>
|
|
The technical difference between a <literal>jsonb_ops</literal>
|
|
and a <literal>jsonb_path_ops</literal> GIN index is that the former
|
|
creates independent index items for each key and value in the data,
|
|
while the latter creates index items only for each value in the
|
|
data.
|
|
<footnote>
|
|
<para>
|
|
For this purpose, the term <quote>value</quote> includes array elements,
|
|
though JSON terminology sometimes considers array elements distinct
|
|
from values within objects.
|
|
</para>
|
|
</footnote>
|
|
Basically, each <literal>jsonb_path_ops</literal> index item is
|
|
a hash of the value and the key(s) leading to it; for example to index
|
|
<literal>{"foo": {"bar": "baz"}}</literal>, a single index item would
|
|
be created incorporating all three of <literal>foo</literal>, <literal>bar</literal>,
|
|
and <literal>baz</literal> into the hash value. Thus a containment query
|
|
looking for this structure would result in an extremely specific index
|
|
search; but there is no way at all to find out whether <literal>foo</literal>
|
|
appears as a key. On the other hand, a <literal>jsonb_ops</literal>
|
|
index would create three index items representing <literal>foo</literal>,
|
|
<literal>bar</literal>, and <literal>baz</literal> separately; then to do the
|
|
containment query, it would look for rows containing all three of
|
|
these items. While GIN indexes can perform such an AND search fairly
|
|
efficiently, it will still be less specific and slower than the
|
|
equivalent <literal>jsonb_path_ops</literal> search, especially if
|
|
there are a very large number of rows containing any single one of the
|
|
three index items.
|
|
</para>
|
|
|
|
<para>
|
|
A disadvantage of the <literal>jsonb_path_ops</literal> approach is
|
|
that it produces no index entries for JSON structures not containing
|
|
any values, such as <literal>{"a": {}}</literal>. If a search for
|
|
documents containing such a structure is requested, it will require a
|
|
full-index scan, which is quite slow. <literal>jsonb_path_ops</literal> is
|
|
therefore ill-suited for applications that often perform such searches.
|
|
</para>
|
|
|
|
<para>
|
|
<type>jsonb</type> also supports <literal>btree</literal> and <literal>hash</literal>
|
|
indexes. These are usually useful only if it's important to check
|
|
equality of complete JSON documents.
|
|
The <literal>btree</literal> ordering for <type>jsonb</type> datums is seldom
|
|
of great interest, but for completeness it is:
|
|
<synopsis>
|
|
<replaceable>Object</replaceable> > <replaceable>Array</replaceable> > <replaceable>Boolean</replaceable> > <replaceable>Number</replaceable> > <replaceable>String</replaceable> > <replaceable>Null</replaceable>
|
|
|
|
<replaceable>Object with n pairs</replaceable> > <replaceable>object with n - 1 pairs</replaceable>
|
|
|
|
<replaceable>Array with n elements</replaceable> > <replaceable>array with n - 1 elements</replaceable>
|
|
</synopsis>
|
|
Objects with equal numbers of pairs are compared in the order:
|
|
<synopsis>
|
|
<replaceable>key-1</replaceable>, <replaceable>value-1</replaceable>, <replaceable>key-2</replaceable> ...
|
|
</synopsis>
|
|
Note that object keys are compared in their storage order;
|
|
in particular, since shorter keys are stored before longer keys, this
|
|
can lead to results that might be unintuitive, such as:
|
|
<programlisting>
|
|
{ "aa": 1, "c": 1} > {"b": 1, "d": 1}
|
|
</programlisting>
|
|
Similarly, arrays with equal numbers of elements are compared in the
|
|
order:
|
|
<synopsis>
|
|
<replaceable>element-1</replaceable>, <replaceable>element-2</replaceable> ...
|
|
</synopsis>
|
|
Primitive JSON values are compared using the same
|
|
comparison rules as for the underlying
|
|
<productname>PostgreSQL</productname> data type. Strings are
|
|
compared using the default database collation.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Transforms</title>
|
|
|
|
<para>
|
|
Additional extensions are available that implement transforms for the
|
|
<type>jsonb</type> type for different procedural languages.
|
|
</para>
|
|
|
|
<para>
|
|
The extensions for PL/Perl are called <literal>jsonb_plperl</literal> and
|
|
<literal>jsonb_plperlu</literal>. If you use them, <type>jsonb</type>
|
|
values are mapped to Perl arrays, hashes, and scalars, as appropriate.
|
|
</para>
|
|
|
|
<para>
|
|
The extensions for PL/Python are called <literal>jsonb_plpythonu</literal>,
|
|
<literal>jsonb_plpython2u</literal>, and
|
|
<literal>jsonb_plpython3u</literal> (see <xref
|
|
linkend="plpython-python23"/> for the PL/Python naming convention). If you
|
|
use them, <type>jsonb</type> values are mapped to Python dictionaries,
|
|
lists, and scalars, as appropriate.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="datatype-jsonpath">
|
|
<title>jsonpath Type</title>
|
|
|
|
<indexterm zone="datatype-jsonpath">
|
|
<primary>jsonpath</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The <type>jsonpath</type> type implements support for the SQL/JSON path language
|
|
in <productname>PostgreSQL</productname> to effectively query JSON data.
|
|
It provides a binary representation of the parsed SQL/JSON path
|
|
expression that specifies the items to be retrieved by the path
|
|
engine from the JSON data for further processing with the
|
|
SQL/JSON query functions.
|
|
</para>
|
|
|
|
<para>
|
|
The SQL/JSON path language is fully integrated into the SQL engine:
|
|
the semantics of its predicates and operators generally follow SQL.
|
|
At the same time, to provide a most natural way of working with JSON data,
|
|
SQL/JSON path syntax uses some of the JavaScript conventions:
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Dot <literal>.</literal> is used for member access.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Square brackets <literal>[]</literal> are used for array access.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
SQL/JSON arrays are 0-relative, unlike regular SQL arrays that start from 1.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
An SQL/JSON path expression is an SQL character string literal,
|
|
so it must be enclosed in single quotes when passed to an SQL/JSON
|
|
query function. Following the JavaScript
|
|
conventions, character string literals within the path expression
|
|
must be enclosed in double quotes. Any single quotes within this
|
|
character string literal must be escaped with a single quote
|
|
by the SQL convention.
|
|
</para>
|
|
|
|
<para>
|
|
A path expression consists of a sequence of path elements,
|
|
which can be the following:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Path literals of JSON primitive types:
|
|
Unicode text, numeric, true, false, or null.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Path variables listed in <xref linkend="type-jsonpath-variables"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Accessor operators listed in <xref linkend="type-jsonpath-accessors"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<type>jsonpath</type> operators and methods listed
|
|
in <xref linkend="functions-sqljson-path-operators"/>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Parentheses, which can be used to provide filter expressions
|
|
or define the order of path evaluation.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<para>
|
|
For details on using <type>jsonpath</type> expressions with SQL/JSON
|
|
query functions, see <xref linkend="functions-sqljson-path"/>.
|
|
</para>
|
|
|
|
<table id="type-jsonpath-variables">
|
|
<title><type>jsonpath</type> Variables</title>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Variable</entry>
|
|
<entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry><literal>$</literal></entry>
|
|
<entry>A variable representing the JSON text to be queried
|
|
(the <firstterm>context item</firstterm>).
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>$varname</literal></entry>
|
|
<entry>
|
|
A named variable. Its value can be set by the parameter
|
|
<parameter>vars</parameter> of several JSON processing functions.
|
|
See <xref linkend="functions-json-processing-table"/> and
|
|
its notes for details.
|
|
<!-- TODO: describe PASSING clause once implemented !-->
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>@</literal></entry>
|
|
<entry>A variable representing the result of path evaluation
|
|
in filter expressions.
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<table id="type-jsonpath-accessors">
|
|
<title><type>jsonpath</type> Accessors</title>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Accessor Operator</entry>
|
|
<entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>
|
|
<para>
|
|
<literal>.<replaceable>key</replaceable></literal>
|
|
</para>
|
|
<para>
|
|
<literal>."$<replaceable>varname</replaceable>"</literal>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
Member accessor that returns an object member with
|
|
the specified key. If the key name is a named variable
|
|
starting with <literal>$</literal> or does not meet the
|
|
JavaScript rules of an identifier, it must be enclosed in
|
|
double quotes as a character string literal.
|
|
</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>
|
|
<literal>.*</literal>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
Wildcard member accessor that returns the values of all
|
|
members located at the top level of the current object.
|
|
</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>
|
|
<literal>.**</literal>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
Recursive wildcard member accessor that processes all levels
|
|
of the JSON hierarchy of the current object and returns all
|
|
the member values, regardless of their nesting level. This
|
|
is a <productname>PostgreSQL</productname> extension of
|
|
the SQL/JSON standard.
|
|
</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>
|
|
<literal>.**{<replaceable>level</replaceable>}</literal>
|
|
</para>
|
|
<para>
|
|
<literal>.**{<replaceable>start_level</replaceable> to
|
|
<replaceable>end_level</replaceable>}</literal>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
Same as <literal>.**</literal>, but with a filter over nesting
|
|
levels of JSON hierarchy. Nesting levels are specified as integers.
|
|
Zero level corresponds to the current object. To access the lowest
|
|
nesting level, you can use the <literal>last</literal> keyword.
|
|
This is a <productname>PostgreSQL</productname> extension of
|
|
the SQL/JSON standard.
|
|
</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>
|
|
<literal>[<replaceable>subscript</replaceable>, ...]</literal>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
Array element accessor.
|
|
<literal><replaceable>subscript</replaceable></literal> can be
|
|
given in two forms: <literal><replaceable>index</replaceable></literal>
|
|
or <literal><replaceable>start_index</replaceable> to <replaceable>end_index</replaceable></literal>.
|
|
The first form returns a single array element by its index. The second
|
|
form returns an array slice by the range of indexes, including the
|
|
elements that correspond to the provided
|
|
<replaceable>start_index</replaceable> and <replaceable>end_index</replaceable>.
|
|
</para>
|
|
<para>
|
|
The specified <replaceable>index</replaceable> can be an integer, as
|
|
well as an expression returning a single numeric value, which is
|
|
automatically cast to integer. Zero index corresponds to the first
|
|
array element. You can also use the <literal>last</literal> keyword
|
|
to denote the last array element, which is useful for handling arrays
|
|
of unknown length.
|
|
</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<para>
|
|
<literal>[*]</literal>
|
|
</para>
|
|
</entry>
|
|
<entry>
|
|
<para>
|
|
Wildcard array element accessor that returns all array elements.
|
|
</para>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
</sect2>
|
|
</sect1>
|