Update type-coercion discussions to reflect current reality.

2025-07-28 23:42:10 +03:00 · 2000-12-17 05:55:26 +00:00
parent c501e9aabe
commit 7b9fe96812
2 changed files with 204 additions and 143 deletions
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@ -1,4 +1,4 @@
-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/func.sgml,v 1.43 2000/12/16 19:33:23 tgl Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/func.sgml,v 1.44 2000/12/17 05:55:26 tgl Exp $ -->
 <chapter id="functions">
 <title>Functions and Operators</title>
@ -770,7 +770,7 @@
  <para>
   There are two separate approaches to pattern matching provided by
-   <productname>Postgres</productname>:  The <acronym>SQL</acronym>
+   <productname>Postgres</productname>:  the <acronym>SQL</acronym>
   <function>LIKE</function> operator and
   <acronym>POSIX</acronym>-style regular expressions.
  </para>
@ -2562,8 +2562,9 @@ END
  </informalexample>
  <para>
-   The data types of all possible <replaceable>result</replaceable>
+   The data types of all the <replaceable>result</replaceable>
-   expressions must match.
+   expressions must be coercible to a single output type.
   See <xref linkend="typeconv-union-case"> for more detail.
  </para>
 <synopsis>
--- a/doc/src/sgml/typeconv.sgml
+++ b/doc/src/sgml/typeconv.sgml
@ -12,16 +12,17 @@ evaluating mixed-type expressions.
 In many cases a user will not need
 to understand the details of the type conversion mechanism.
 However, the implicit conversions done by <productname>Postgres</productname>
-can affect the apparent results of a query, and these results
+can affect the results of a query.  When necessary, these results
 can be tailored by a user or programmer
 using <emphasis>explicit</emphasis> type coercion.
 </para>
 <para>
 This chapter introduces the <productname>Postgres</productname>
- type conversion mechanisms and conventions.
+type conversion mechanisms and conventions.
 Refer to the relevant sections in the User's Guide and Programmer's Guide
-for more information on specific data types and allowed functions and operators.
+for more information on specific data types and allowed functions and
 operators.
 </para>
 <para>
@ -43,12 +44,13 @@ mixed-type expressions to be meaningful, even with user-defined types.
 </para>
 <para>
-The <productname>Postgres</productname> scanner/parser decodes lexical elements
+The <productname>Postgres</productname> scanner/parser decodes lexical
-into only five fundamental categories: integers, floats, strings, names, and keywords.
+elements into only five fundamental categories: integers, floats, strings,
-Most extended types are first tokenized into strings. The <acronym>SQL</acronym>
+names, and keywords.  Most extended types are first tokenized into
-language definition allows specifying type names with strings, and this mechanism
+strings. The <acronym>SQL</acronym> language definition allows specifying type
-is used by <productname>Postgres</productname>
+names with strings, and this mechanism can be used in
-to start the parser down the correct path. For example, the query
+<productname>Postgres</productname> to start the parser down the correct
 path. For example, the query
 <programlisting>
 tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value";
@ -59,8 +61,9 @@ tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value";
 </programlisting>
 has two strings, of type <type>text</type> and <type>point</type>.
-If a type is not specified, then the placeholder type <type>unknown</type>
+If a type is not specified for a string, then the placeholder type
-is assigned initially, to be resolved in later stages as described below.
+<firstterm>unknown</firstterm> is assigned initially, to be resolved in later
 stages as described below.
 </para>
 <para>
@ -88,9 +91,13 @@ Function calls
 </term>
 <listitem>
 <para>
-Much of the <productname>Postgres</productname> type system is built around a rich set of
+Much of the <productname>Postgres</productname> type system is built around a
-functions. Function calls have one or more arguments which, for any specific query,
+rich set of functions. Function calls have one or more arguments which, for
-must be matched to the functions available in the system catalog.
+any specific query, must be matched to the functions available in the system
 catalog.  Since <productname>Postgres</productname> permits function
 overloading, the function name alone does not uniquely identify the function
 to be called --- the parser must select the right function based on the data
 types of the supplied arguments.
 </para>
 </listitem>
 </varlistentry>
@ -100,19 +107,23 @@ Query targets
 </term>
 <listitem>
 <para>
-<acronym>SQL</acronym> INSERT statements place the results of query into a table. The expressions
+<acronym>SQL</acronym> INSERT and UPDATE statements place the results of
-in the query must be matched up with, and perhaps converted to, the target columns of the insert.
+expressions into a table. The expressions in the query must be matched up
 with, and perhaps converted to, the types of the target columns.
 </para>
 </listitem>
 </varlistentry>
 <varlistentry>
 <term>
-UNION queries
+UNION and CASE constructs
 </term>
 <listitem>
 <para>
-Since all select results from a UNION SELECT statement must appear in a single set of columns, the types
+Since all select results from a UNION SELECT statement must appear in a single
 set of columns, the types of the results
 of each SELECT clause must be matched up and converted to a uniform set.
 Similarly, the result expressions of a CASE construct must be coerced to
 a common type so that the CASE expression as a whole has a known output type.
 </para>
 </listitem>
 </varlistentry>
@ -129,7 +140,7 @@ conventions for the <acronym>SQL92</acronym> standard native types such as
 <para>
 The <productname>Postgres</productname> parser uses the convention that all
 type conversion functions take a single argument of the source type and are
-named with the same name as the target type. Any function meeting this
+named with the same name as the target type. Any function meeting these
 criteria is considered to be a valid conversion function, and may be used
 by the parser as such. This simple assumption gives the parser the power
 to explore type conversion possibilities without hardcoding, allowing
@ -139,19 +150,16 @@ extended user-defined types to use these same features transparently.
 <para>
 An additional heuristic is provided in the parser to allow better guesses
 at proper behavior for <acronym>SQL</acronym> standard types. There are
-five categories of types defined: boolean, string, numeric, geometric,
+several basic <firstterm>type categories</firstterm> defined: boolean,
 numeric, string, bitstring, datetime, timespan, geometric, network,
 and user-defined. Each category, with the exception of user-defined, has
-a "preferred type" which is used to resolve ambiguities in candidates.
+a <firstterm>preferred type</firstterm> which is preferentially selected
-Each "user-defined" type is its own "preferred type", so ambiguous
+when there is ambiguity.
-expressions (those with multiple candidate parsing solutions)
+In the user-defined category, each type is its own preferred type.
-with only one user-defined type can resolve to a single best choice, while those with
+Ambiguous expressions (those with multiple candidate parsing solutions)
-multiple user-defined types will remain ambiguous and throw an error.
+can often be resolved when there are multiple possible built-in types, but
-</para>
+they will raise an error when there are multiple choices for user-defined
-
+types.
 <para>
 Ambiguous expressions which have candidate solutions within only one type category are
 likely to resolve, while ambiguous expressions with candidates spanning multiple
 categories are likely to throw an error and ask for clarification from the user.
 </para>
 <sect2>
@ -207,12 +215,8 @@ should use this new function and will no longer do the implicit conversion using
 <sect1 id="typeconv-oper">
 <title>Operators</title>
 <sect2>
 <title>Conversion Procedure</title>
 <procedure>
-<title>Operator Evaluation</title>
+<title>Operator Type Resolution</title>
 <step performance="required">
 <para>
@ -222,15 +226,10 @@ Check for an exact match in the pg_operator system catalog.
 <substeps>
 <step performance="optional">
 <para>
-If one argument of a binary operator is <type>unknown</type>,
+If one argument of a binary operator is <type>unknown</type> type,
-then assume it is the same type as the other argument.
+then assume it is the same type as the other argument for this check.
-</para>
+Other cases involving <type>unknown</type> will never find a match at
-</step>
+this step.
 <step performance="required">
 <para>
 Reverse the arguments, and look for an exact match with an operator which
 points to itself as being commutative.
 If found, then reverse the arguments in the parse tree and use this operator.
 </para>
 </step>
 </substeps>
@ -241,46 +240,63 @@ If found, then reverse the arguments in the parse tree and use this operator.
 Look for the best match.
 </para>
 <substeps>
-<step performance="optional">
+<step performance="required">
 <para>
-Make a list of all operators of the same name.
+Make a list of all operators of the same name for which the input types
 match or can be coerced to match.  (<type>unknown</type> literals are
 assumed to be coercible to anything for this purpose.)  If there is only
 one, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
 <para>
-If only one operator is in the list, use it if the input type can be coerced,
+Run through all candidates and keep those with the most exact matches
-and throw an error if the type cannot be coerced.
+on input types.  Keep all candidates if none have any exact matches.
 If only one candidate remains, use it; else continue to the next step.
 </para>
 <step performance="required">
 <para>
 Run through all candidates and keep those with the most exact or
 binary-compatible matches on input types.  Keep all candidates if none have
 any exact or binary-compatible matches.
 If only one candidate remains, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
 <para>
-Keep all operators with the most explicit matches for types. Keep all if there
+Run through all candidates and keep those which accept preferred types at
-are no explicit matches and move to the next step.
+the most positions where type coercion will be required.
-If only one candidate remains, use it if the type can be coerced.
+Keep all candidates if none accept preferred types.
 If only one candidate remains, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
 <para>
-If any input arguments are "unknown", categorize the input candidates as
+If any input arguments are "unknown", check the type categories accepted
-boolean, numeric, string, geometric, or user-defined. If there is a mix of
+at those argument positions by the remaining candidates.  At each position,
-categories, or more than one user-defined type, throw an error because
+select "string"
-the correct choice cannot be deduced without more clues.
+category if any candidate accepts that category (this bias towards string
-If only one category is present, then assign the "preferred type"
+is appropriate since an unknown-type literal does look like a string).
-to the input column which had been previously "unknown".
+Otherwise, if all the remaining candidates accept the same type category,
 select that category; otherwise raise an error because
 the correct choice cannot be deduced without more clues.  Also note whether
 any of the candidates accept a preferred datatype within the selected category.
 Now discard operator candidates that do not accept the selected type category;
 furthermore, if any candidate accepts a preferred type at a given argument
 position, discard candidates that accept non-preferred types for that
 argument.
 </para>
 </step>
 <step performance="required">
 <para>
-Choose the candidate with the most exact type matches, and which matches
+If only one candidate remains, use it.  If no candidate or more than one
-the "preferred type" for each column category from the previous step.
+candidate remains,
-If there is still more than one candidate, or if there are none,
+then raise an error.
 then throw an error.
 </para>
 </step>
 </substeps>
 </step>
 </procedure>
 </sect2>
 <sect2>
 <title>Examples</title>
@ -372,17 +388,12 @@ tgl=> SELECT 'abc' || 'def' AS "Unspecified";
 <para>
 In this case there is no initial hint for which type to use, since no types
 are specified in the query. So, the parser looks for all candidate operators
-and finds that all arguments for all the candidates are string types. It chooses
+and finds that there are candidates accepting both string-category and
-the "preferred type" for strings, <type>text</type>, for this query.
+bitstring-category inputs.  Since string category is preferred when available,
 that category is selected, and then the 
 "preferred type" for strings, <type>text</type>, is used as the specific
 type to resolve the unknown literals to.
 </para>
 <note>
 <para>
 If a user defines a new type and defines an operator "<literal>||</literal>" to work
 with it, then this query would no longer succeed as written. The parser would
 now have candidate types from two categories, and could not decide which to use.
 </para>
 </note>
 </sect3>
 <sect3>
@ -423,11 +434,13 @@ will try to oblige.
 <title>Functions</title>
 <procedure>
-<title>Function Evaluation</title>
+<title>Function Call Type Resolution</title>
 <step performance="required">
 <para>
 Check for an exact match in the pg_proc system catalog.
 (Cases involving <type>unknown</type> will never find a match at
 this step.)
 </para></step>
 <step performance="required">
 <para>
@ -436,38 +449,63 @@ Look for the best match.
 <substeps>
 <step performance="required">
 <para>
-Make a list of all functions of the same name with the same number of arguments.
+Make a list of all functions of the same name with the same number of
-</para></step>
+arguments for which the input types
 match or can be coerced to match.  (<type>unknown</type> literals are
 assumed to be coercible to anything for this purpose.)  If there is only
 one, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
 <para>
-If only one function is in the list, use it if the input types can be coerced,
+Run through all candidates and keep those with the most exact matches
-and throw an error if the types cannot be coerced.
+on input types.  Keep all candidates if none have any exact matches.
-</para></step>
+If only one candidate remains, use it; else continue to the next step.
 </para>
 <step performance="required">
 <para>
-Keep all functions with the most explicit matches for types. Keep all if there
+Run through all candidates and keep those with the most exact or
-are no explicit matches and move to the next step.
+binary-compatible matches on input types.  Keep all candidates if none have
-If only one candidate remains, use it if the type can be coerced.
+any exact or binary-compatible matches.
-</para></step>
+If only one candidate remains, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
 <para>
-If any input arguments are "unknown", categorize the input candidate arguments as
+Run through all candidates and keep those which accept preferred types at
-boolean, numeric, string, geometric, or user-defined. If there is a mix of
+the most positions where type coercion will be required.
-categories, or more than one user-defined type, throw an error because
+Keep all candidates if none accept preferred types.
-the correct choice cannot be deduced without more clues.
+If only one candidate remains, use it; else continue to the next step.
-If only one category is present, then assign the "preferred type"
+</para>
-to the input column which had been previously "unknown".
+</step>
 </para></step>
 <step performance="required">
 <para>
-Choose the candidate with the most exact type matches, and which matches
+If any input arguments are "unknown", check the type categories accepted
-the "preferred type" for each column category from the previous step.
+at those argument positions by the remaining candidates.  At each position,
-If there is still more than one candidate, or if there are none,
+select "string"
-then throw an error.
+category if any candidate accepts that category (this bias towards string
-</para></step>
+is appropriate since an unknown-type literal does look like a string).
 Otherwise, if all the remaining candidates accept the same type category,
 select that category; otherwise raise an error because
 the correct choice cannot be deduced without more clues.  Also note whether
 any of the candidates accept a preferred datatype within the selected category.
 Now discard operator candidates that do not accept the selected type category;
 furthermore, if any candidate accepts a preferred type at a given argument
 position, discard candidates that accept non-preferred types for that
 argument.
 </para>
 </step>
 <step performance="required">
 <para>
 If only one candidate remains, use it.  If no candidate or more than one
 candidate remains,
 then raise an error.
 </para>
 </step>
 </substeps>
 </step>
 </procedure>
 <sect2>
 <title>Examples</title>
@ -539,10 +577,10 @@ tgl=> select substr(text(varchar '1234'), 3);
 </para>
 <note>
 <para>
-There are some heuristics in the parser to optimize the relationship between the
+Actually, the parser is aware that <type>text</type> and <type>varchar</type>
-<type>char</type>, <type>varchar</type>, and <type>text</type> types.
+are "binary compatible", meaning that one can be passed to a function that
-For this case, <function>substr</function> is called directly with the <type>varchar</type> string
+accepts the other without doing any physical conversion.  Therefore, no
-rather than inserting an explicit conversion call.
+explicit type conversion call is really inserted in this case.
 </para>
 </note>
@ -564,6 +602,8 @@ tgl=> select substr(text(1234), 3);
     34
 (1 row)
 </programlisting>
 This succeeds because there is a conversion function text(int4) in the
 system catalog.
 </para>
 </sect3>
 </sect2>
@ -573,7 +613,7 @@ tgl=> select substr(text(1234), 3);
 <title>Query Targets</title>
 <procedure>
-<title>Target Evaluation</title>
+<title>Query Target Type Resolution</title>
 <step performance="required">
 <para>
@ -581,15 +621,21 @@ Check for an exact match with the target.
 </para></step>
 <step performance="required">
 <para>
-Try to coerce the expression directly to the target type if necessary.
+Otherwise, try to coerce the expression to the target type.  This will succeed
 if the two types are known binary-compatible, or if there is a conversion
 function.  If the expression is an unknown-type literal, the contents of
 the literal string will be fed to the input conversion routine for the target
 type.
 </para></step>
 <step performance="required">
 <para>
 If the target is a fixed-length type (e.g. <type>char</type> or <type>varchar</type>
-declared with a length) then try to find a sizing function of the same name
+declared with a length) then try to find a sizing function for the target
-as the type taking two arguments, the first the type name and the second an
+type.  A sizing function is a function of the same name as the type,
-integer length.
+taking two arguments of which the first is that type and the second is an
 integer, and returning the same type.  If one is found, it is applied,
 passing the column's declared length as the second parameter.
 </para></step>
 </procedure>
@ -615,30 +661,60 @@ tgl=> SELECT * FROM vv;
 abcd
 (1 row)
 </programlisting>
 What's really happened here is that the two unknown literals are resolved
 to text by default, allowing the <literal>||</literal> operator to be
 resolved as text concatenation.  Then the text result of the operator
 is coerced to varchar to match the target column type.  (But, since the
 parser knows that text and varchar are binary-compatible, this coercion
 is implicit and does not insert any real function call.)  Finally, the
 sizing function <literal>varchar(varchar,int4)</literal> is found in the system
 catalogs and applied to the operator's result and the stored column length.
 This type-specific function performs the desired truncation.
 </para>
 </sect3>
 </sect2>
 </sect1>
-<sect1 id="typeconv-union">
+<sect1 id="typeconv-union-case">
-<title>UNION Queries</title>
+<title>UNION and CASE Constructs</title>
 <para>
-The UNION construct is somewhat different in that it must match up
+The UNION and CASE constructs must match up possibly dissimilar types to
-possibly dissimilar types to become a single result set.
+become a single result set.  The resolution algorithm is applied separately to
 each output column of a UNION.  CASE uses the identical algorithm to match
 up its result expressions.
 </para>
 <procedure>
-<title>UNION Evaluation</title>
+<title>UNION and CASE Type Resolution</title>
 <step performance="required">
 <para>
-Check for identical types for all results.
+If all inputs are of type <type>unknown</type>, resolve as type
 <type>text</type> (the preferred type for string category).
 Otherwise, ignore the <type>unknown</type> inputs while choosing the type.
 </para></step>
 <step performance="required">
 <para>
-Coerce each result from the UNION clauses to match the type of the
+If the non-unknown inputs are not all of the same type category, raise an
-first SELECT clause or the target column.
+error.
 </para></step>
 <step performance="required">
 <para>
 If one or more non-unknown inputs are of a preferred type in that category,
 resolve as that type.
 </para></step>
 <step performance="required">
 <para>
 Otherwise, resolve as the type of the first non-unknown input.
 </para></step>
 <step performance="required">
 <para>
 Coerce all inputs to the selected type.
 </para></step>
 </procedure>
@ -657,6 +733,7 @@ tgl=> SELECT text 'a' AS "Text" UNION SELECT 'b';
 b
 (2 rows)
 </programlisting>
 Here, the unknown-type literal 'b' will be resolved as type text.
 </para>
 </sect3>
@ -679,43 +756,26 @@ tgl=> SELECT 1.2 AS "Float8" UNION SELECT 1;
 <title>Transposed UNION</title>
 <para>
-The types of the union are forced to match the types of
+Here the output type of the union is forced to match the type of
 the first/top clause in the union:
 <programlisting>
 tgl=> SELECT 1 AS "All integers"
-tgl-> UNION SELECT '2.2'::float4
+tgl-> UNION SELECT '2.2'::float4;
 tgl-> UNION SELECT 3.3;
 All integers
 --------------
            1
            2
-            3
+(2 rows)
 (3 rows)
 </programlisting>
 </para>
 <para>
-An alternate parser strategy could be to choose the "best" type of the bunch, but
+Since float4 is not a preferred type, the parser sees no reason to select it
-this is more difficult because of the nice recursion technique used in the
+over int4, and instead falls back on the use-the-first-alternative rule.
-parser. However, the "best" type is used when selecting <emphasis>into</emphasis>
+This example demonstrates that the preferred-type mechanism doesn't encode
-a table:
+as much information as we'd like.  Future versions of
-
+<productname>Postgres</productname> may support a more general notion of
-<programlisting>
+type preferences.
 tgl=> CREATE TABLE ff (f float);
 CREATE
 tgl=> INSERT INTO ff
 tgl-> SELECT 1
 tgl-> UNION SELECT '2.2'::float4
 tgl-> UNION SELECT 3.3;
 INSERT 0 3
 tgl=> SELECT f AS "Floating point" from ff;
  Floating point
 ------------------
                1
 2.20000004768372
              3.3
 (3 rows)
 </programlisting>
 </para>
 </sect3>
 </sect2>