mirror of
https://github.com/postgres/postgres.git
synced 2025-05-28 05:21:27 +03:00
1738 lines
56 KiB
Plaintext
1738 lines
56 KiB
Plaintext
<!--
|
|
$Header: /cvsroot/pgsql/doc/src/sgml/xfunc.sgml,v 1.39 2001/10/26 19:58:12 tgl Exp $
|
|
-->
|
|
|
|
<chapter id="xfunc">
|
|
<title id="xfunc-title">Extending <acronym>SQL</acronym>: Functions</title>
|
|
|
|
<sect1 id="xfunc-intro">
|
|
<title>Introduction</title>
|
|
|
|
<comment>
|
|
Historically, functions were perhaps considered a tool for creating
|
|
types. Today, few people build their own types but many write
|
|
their own functions. This introduction ought to be changed to
|
|
reflect this.
|
|
</comment>
|
|
|
|
<para>
|
|
As it turns out, part of defining a new type is the
|
|
definition of functions that describe its behavior.
|
|
Consequently, while it is possible to define a new
|
|
function without defining a new type, the reverse is
|
|
not true. We therefore describe how to add new functions
|
|
to <productname>Postgres</productname> before describing
|
|
how to add new types.
|
|
</para>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</productname> provides four kinds of
|
|
functions:
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
query language functions
|
|
(functions written in <acronym>SQL</acronym>)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
procedural language
|
|
functions (functions written in, for example, <application>PL/Tcl</> or <application>PL/pgSQL</>)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
internal functions
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
C language functions
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<para>
|
|
Every kind
|
|
of function can take a base type, a composite type or
|
|
some combination as arguments (parameters). In addition,
|
|
every kind of function can return a base type or
|
|
a composite type. It's easiest to define <acronym>SQL</acronym>
|
|
functions, so we'll start with those. Examples in this section
|
|
can also be found in <filename>funcs.sql</filename>
|
|
and <filename>funcs.c</filename> in the tutorial directory.
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="xfunc-sql">
|
|
<title>Query Language (<acronym>SQL</acronym>) Functions</title>
|
|
|
|
<para>
|
|
SQL functions execute an arbitrary list of SQL statements, returning
|
|
the results of the last query in the list. In the simple (non-set)
|
|
case, the first row of the last query's result will be returned.
|
|
(Bear in mind that <quote>the first row</quote> is not well-defined
|
|
unless you use <literal>ORDER BY</>.) If the last query happens
|
|
to return no rows at all, NULL will be returned.
|
|
</para>
|
|
|
|
<para>
|
|
Alternatively, an SQL function may be declared to return a set,
|
|
by specifying the function's return type
|
|
as <literal>SETOF</literal> <replaceable>sometype</>. In this case
|
|
all rows of the last query's result are returned. Further details
|
|
appear below.
|
|
</para>
|
|
|
|
<para>
|
|
The body of an SQL function should be a list of one or more SQL
|
|
statements separated by semicolons. Note that because the syntax
|
|
of the <command>CREATE FUNCTION</command> command requires the body of the
|
|
function to be enclosed in single quotes, single quote marks
|
|
(<literal>'</>) used
|
|
in the body of the function must be escaped, by writing two single
|
|
quotes (<literal>''</>) or a backslash (<literal>\'</>) where each
|
|
quote is desired.
|
|
</para>
|
|
|
|
<para>
|
|
Arguments to the SQL function may be referenced in the function
|
|
body using the syntax <literal>$<replaceable>n</></>: $1 refers to
|
|
the first argument, $2 to the second, and so on. If an argument
|
|
is of a composite type, then the <quote>dot notation</quote>,
|
|
e.g., <literal>$1.emp</literal>, may be used to access attributes
|
|
of the argument.
|
|
</para>
|
|
|
|
<sect2>
|
|
<title>Examples</title>
|
|
|
|
<para>
|
|
To illustrate a simple SQL function, consider the following,
|
|
which might be used to debit a bank account:
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION tp1 (integer, numeric) RETURNS integer AS '
|
|
UPDATE bank
|
|
SET balance = balance - $2
|
|
WHERE accountno = $1;
|
|
SELECT 1;
|
|
' LANGUAGE SQL;
|
|
</programlisting>
|
|
|
|
A user could execute this function to debit account 17 by $100.00 as
|
|
follows:
|
|
|
|
<programlisting>
|
|
SELECT tp1(17, 100.0);
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
In practice one would probably like a more useful result from the
|
|
function than a constant <quote>1</>, so a more likely definition
|
|
is
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION tp1 (integer, numeric) RETURNS numeric AS '
|
|
UPDATE bank
|
|
SET balance = balance - $2
|
|
WHERE accountno = $1;
|
|
SELECT balance FROM bank WHERE accountno = $1;
|
|
' LANGUAGE SQL;
|
|
</programlisting>
|
|
|
|
which adjusts the balance and returns the new balance.
|
|
</para>
|
|
|
|
<para>
|
|
Any collection of commands in the <acronym>SQL</acronym>
|
|
language can be packaged together and defined as a function.
|
|
The commands can include data modification (i.e.,
|
|
<command>INSERT</command>, <command>UPDATE</command>, and
|
|
<command>DELETE</command>) as well
|
|
as <command>SELECT</command> queries. However, the final command
|
|
must be a <command>SELECT</command> that returns whatever is
|
|
specified as the function's return type.
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION clean_EMP () RETURNS integer AS '
|
|
DELETE FROM EMP
|
|
WHERE EMP.salary <= 0;
|
|
SELECT 1 AS ignore_this;
|
|
' LANGUAGE SQL;
|
|
|
|
SELECT clean_EMP();
|
|
</programlisting>
|
|
|
|
<screen>
|
|
x
|
|
---
|
|
1
|
|
</screen>
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title><acronym>SQL</acronym> Functions on Base Types</title>
|
|
|
|
<para>
|
|
The simplest possible <acronym>SQL</acronym> function has no arguments and
|
|
simply returns a base type, such as <type>integer</type>:
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION one() RETURNS integer AS '
|
|
SELECT 1 as RESULT;
|
|
' LANGUAGE SQL;
|
|
|
|
SELECT one();
|
|
</programlisting>
|
|
|
|
<screen>
|
|
one
|
|
-----
|
|
1
|
|
</screen>
|
|
</para>
|
|
|
|
<para>
|
|
Notice that we defined a column alias within the function body for the result of the function
|
|
(with the name <literal>RESULT</>), but this column alias is not visible
|
|
outside the function. Hence, the result is labelled <literal>one</>
|
|
instead of <literal>RESULT</>.
|
|
</para>
|
|
|
|
<para>
|
|
It is almost as easy to define <acronym>SQL</acronym> functions
|
|
that take base types as arguments. In the example below, notice
|
|
how we refer to the arguments within the function as <literal>$1</>
|
|
and <literal>$2</>:
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION add_em(integer, integer) RETURNS integer AS '
|
|
SELECT $1 + $2;
|
|
' LANGUAGE SQL;
|
|
|
|
SELECT add_em(1, 2) AS answer;
|
|
</programlisting>
|
|
|
|
<screen>
|
|
answer
|
|
--------
|
|
3
|
|
</screen>
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title><acronym>SQL</acronym> Functions on Composite Types</title>
|
|
|
|
<para>
|
|
When specifying functions with arguments of composite
|
|
types, we must not only specify which
|
|
argument we want (as we did above with <literal>$1</> and <literal>$2</literal>) but
|
|
also the attributes of that argument. For example, suppose that
|
|
<type>EMP</type> is a table containing employee data, and therefore
|
|
also the name of the composite type of each row of the table. Here
|
|
is a function <function>double_salary</function> that computes what your
|
|
salary would be if it were doubled:
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION double_salary(EMP) RETURNS integer AS '
|
|
SELECT $1.salary * 2 AS salary;
|
|
' LANGUAGE SQL;
|
|
|
|
SELECT name, double_salary(EMP) AS dream
|
|
FROM EMP
|
|
WHERE EMP.cubicle ~= point '(2,1)';
|
|
</programlisting>
|
|
|
|
<screen>
|
|
name | dream
|
|
------+-------
|
|
Sam | 2400
|
|
</screen>
|
|
</para>
|
|
|
|
<para>
|
|
Notice the use of the syntax <literal>$1.salary</literal>
|
|
to select one field of the argument row value. Also notice
|
|
how the calling SELECT command uses a table name to denote
|
|
the entire current row of that table as a composite value.
|
|
</para>
|
|
|
|
<para>
|
|
It is also possible to build a function that returns a composite type.
|
|
(However, as we'll see below, there are some
|
|
unfortunate restrictions on how the function may be used.)
|
|
This is an example of a function
|
|
that returns a single <type>EMP</type> row:
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION new_emp() RETURNS EMP AS '
|
|
SELECT text ''None'' AS name,
|
|
1000 AS salary,
|
|
25 AS age,
|
|
point ''(2,2)'' AS cubicle;
|
|
' LANGUAGE SQL;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
In this case we have specified each of the attributes
|
|
with a constant value, but any computation or expression
|
|
could have been substituted for these constants.
|
|
Note two important things about defining the function:
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
The target list order must be exactly the same as
|
|
that in which the columns appear in the table associated
|
|
with the composite type.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
You must typecast the expressions to match the
|
|
definition of the composite type, or you will get errors like this:
|
|
<screen>
|
|
<computeroutput>
|
|
ERROR: function declared to return emp returns varchar instead of text at column 1
|
|
</computeroutput>
|
|
</screen>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<para>
|
|
In the present release of <productname>PostgreSQL</productname>
|
|
there are some unpleasant restrictions on how functions returning
|
|
composite types can be used. Briefly, when calling a function that
|
|
returns a row, we cannot retrieve the entire row. We must either
|
|
project a single attribute out of the row or pass the entire row into
|
|
another function. (Trying to display the entire row value will yield
|
|
a meaningless number.) For example,
|
|
|
|
<programlisting>
|
|
SELECT name(new_emp());
|
|
</programlisting>
|
|
|
|
<screen>
|
|
name
|
|
------
|
|
None
|
|
</screen>
|
|
</para>
|
|
|
|
<para>
|
|
This example makes use of the
|
|
function notation for projecting attributes. The simple way
|
|
to explain this is that we can usually use the
|
|
notations <literal>attribute(table)</> and <literal>table.attribute</>
|
|
interchangeably:
|
|
|
|
<programlisting>
|
|
--
|
|
-- this is the same as:
|
|
-- SELECT EMP.name AS youngster FROM EMP WHERE EMP.age < 30
|
|
--
|
|
SELECT name(EMP) AS youngster
|
|
FROM EMP
|
|
WHERE age(EMP) < 30;
|
|
</programlisting>
|
|
|
|
<screen>
|
|
youngster
|
|
-----------
|
|
Sam
|
|
</screen>
|
|
</para>
|
|
|
|
<para>
|
|
The reason why, in general, we must use the function
|
|
syntax for projecting attributes of function return
|
|
values is that the parser just doesn't understand
|
|
the dot syntax for projection when combined
|
|
with function calls.
|
|
|
|
<screen>
|
|
SELECT new_emp().name AS nobody;
|
|
ERROR: parser: parse error at or near "."
|
|
</screen>
|
|
</para>
|
|
|
|
<para>
|
|
Another way to use a function returning a row result is to declare a
|
|
second function accepting a rowtype parameter, and pass the function
|
|
result to it:
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION getname(emp) RETURNS text AS
|
|
'SELECT $1.name;'
|
|
LANGUAGE SQL;
|
|
</programlisting>
|
|
|
|
<screen>
|
|
SELECT getname(new_emp());
|
|
getname
|
|
---------
|
|
None
|
|
(1 row)
|
|
</screen>
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title><acronym>SQL</acronym> Functions Returning Sets</title>
|
|
|
|
<para>
|
|
As previously mentioned, an SQL function may be declared as
|
|
returning <literal>SETOF</literal> <replaceable>sometype</>.
|
|
In this case the function's final SELECT query is executed to
|
|
completion, and each row it outputs is returned as an element
|
|
of the set.
|
|
</para>
|
|
|
|
<para>
|
|
Functions returning sets may only be called in the target list
|
|
of a SELECT query. For each row that the SELECT generates by itself,
|
|
the function returning set is invoked, and an output row is generated
|
|
for each element of the function's result set. An example:
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION listchildren(text) RETURNS SETOF text AS
|
|
'SELECT name FROM nodes WHERE parent = $1'
|
|
LANGUAGE SQL;
|
|
</programlisting>
|
|
|
|
<screen>
|
|
SELECT * FROM nodes;
|
|
name | parent
|
|
-----------+--------
|
|
Top |
|
|
Child1 | Top
|
|
Child2 | Top
|
|
Child3 | Top
|
|
SubChild1 | Child1
|
|
SubChild2 | Child1
|
|
(6 rows)
|
|
|
|
SELECT listchildren('Top');
|
|
listchildren
|
|
--------------
|
|
Child1
|
|
Child2
|
|
Child3
|
|
(3 rows)
|
|
|
|
SELECT name, listchildren(name) FROM nodes;
|
|
name | listchildren
|
|
--------+--------------
|
|
Top | Child1
|
|
Top | Child2
|
|
Top | Child3
|
|
Child1 | SubChild1
|
|
Child1 | SubChild2
|
|
(5 rows)
|
|
</screen>
|
|
|
|
Notice that no output row appears for Child2, Child3, etc.
|
|
This happens because listchildren() returns an empty set
|
|
for those inputs, so no output rows are generated.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="xfunc-pl">
|
|
<title>Procedural Language Functions</title>
|
|
|
|
<para>
|
|
Procedural languages aren't built into the <productname>PostgreSQL</productname> server; they are offered
|
|
by loadable modules. Please refer to the documentation of the
|
|
procedural language in question for details about the syntax and how the function body
|
|
is interpreted for each language.
|
|
</para>
|
|
|
|
<para>
|
|
There are currently four procedural languages available in the
|
|
standard <productname>PostgreSQL</productname> distribution:
|
|
<application>PL/pgSQL</application>, <application>PL/Tcl</application>,
|
|
<application>PL/Perl</application>, and <application>PL/Python</application>. Other languages can be
|
|
defined by users. Refer to <xref linkend="xplang"> for more
|
|
information. The basics of developing a new procedural language are covered in <xref linkend="xfunc-plhandler">.
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="xfunc-internal">
|
|
<title>Internal Functions</title>
|
|
|
|
<para>
|
|
Internal functions are functions written in C that have been statically
|
|
linked into the <productname>PostgreSQL</productname> server.
|
|
The <quote>body</quote> of the function definition
|
|
specifies the C-language name of the function, which need not be the
|
|
same as the name being declared for SQL use.
|
|
(For reasons of backwards compatibility, an empty body
|
|
is accepted as meaning that the C-language function name is the
|
|
same as the SQL name.)
|
|
</para>
|
|
|
|
<para>
|
|
Normally, all internal functions present in the
|
|
backend are declared during the initialization of the database cluster (<command>initdb</command>),
|
|
but a user could use <command>CREATE FUNCTION</command>
|
|
to create additional alias names for an internal function.
|
|
Internal functions are declared in <command>CREATE FUNCTION</command>
|
|
with language name <literal>internal</literal>. For instance, to
|
|
create an alias for the <function>sqrt</function> function:
|
|
<programlisting>
|
|
CREATE FUNCTION square_root(double precision) RETURNS double precision
|
|
AS 'dsqrt'
|
|
LANGUAGE INTERNAL
|
|
WITH (isStrict);
|
|
</programlisting>
|
|
(Most internal functions expect to be declared <quote>strict</quote>.)
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
Not all <quote>predefined</quote> functions are
|
|
<quote>internal</quote> in the above sense. Some predefined
|
|
functions are written in SQL.
|
|
</para>
|
|
</note>
|
|
</sect1>
|
|
|
|
<sect1 id="xfunc-c">
|
|
<title>C Language Functions</title>
|
|
|
|
<para>
|
|
User-defined functions can be written in C (or a language that can
|
|
be made compatible with C, such as C++). Such functions are
|
|
compiled into dynamically loadable objects (also called shared
|
|
libraries) and are loaded by the server on demand. The dynamic
|
|
loading feature is what distinguishes <quote>C language</> functions
|
|
from <quote>internal</> functions --- the actual coding conventions
|
|
are essentially the same for both. (Hence, the standard internal
|
|
function library is a rich source of coding examples for user-defined
|
|
C functions.)
|
|
</para>
|
|
|
|
<para>
|
|
Two different calling conventions are currently used for C functions.
|
|
The newer <quote>version 1</quote> calling convention is indicated by writing
|
|
a <literal>PG_FUNCTION_INFO_V1()</literal> macro call for the function,
|
|
as illustrated below. Lack of such a macro indicates an old-style
|
|
("version 0") function. The language name specified in <command>CREATE FUNCTION</command>
|
|
is <literal>C</literal> in either case. Old-style functions are now deprecated
|
|
because of portability problems and lack of functionality, but they
|
|
are still supported for compatibility reasons.
|
|
</para>
|
|
|
|
<sect2 id="xfunc-c-dynload">
|
|
<title>Dynamic Loading</title>
|
|
|
|
<para>
|
|
The first time a user-defined function in a particular
|
|
loadable object file is called in a backend session,
|
|
the dynamic loader loads that object file into memory so that the
|
|
function can be called. The <command>CREATE FUNCTION</command>
|
|
for a user-defined C function must therefore specify two pieces of
|
|
information for the function: the name of the loadable
|
|
object file, and the C name (link symbol) of the specific function to call
|
|
within that object file. If the C name is not explicitly specified then
|
|
it is assumed to be the same as the SQL function name.
|
|
</para>
|
|
|
|
<para>
|
|
The following algorithm is used to locate the shared object file
|
|
based on the name given in the <command>CREATE FUNCTION</command>
|
|
command:
|
|
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>
|
|
If the name is an absolute path, the given file is loaded.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
If the name starts with the string <literal>$libdir</literal>,
|
|
that part is replaced by the PostgreSQL package library directory
|
|
name, which is determined at build time.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
If the name does not contain a directory part, the file is
|
|
searched for in the path specified by the configuration variable
|
|
<varname>dynamic_library_path</varname>.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Otherwise (the file was not found in the path, or it contains a
|
|
non-absolute directory part), the dynamic loader will try to
|
|
take the name as given, which will most likely fail. (It is
|
|
unreliable to depend on the current working directory.)
|
|
</para>
|
|
</listitem>
|
|
</orderedlist>
|
|
|
|
If this sequence does not work, the platform-specific shared
|
|
library file name extension (often <filename>.so</filename>) is
|
|
appended to the given name and this sequence is tried again. If
|
|
that fails as well, the load will fail.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
The user id the <application>PostgreSQL</application> server runs
|
|
as must be able to traverse the path to the file you intend to
|
|
load. Making the file or a higher-level directory not readable
|
|
and/or not executable by the <quote>postgres</quote> user is a
|
|
common mistake.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
In any case, the file name that is given in the
|
|
<command>CREATE FUNCTION</command> command is recorded literally
|
|
in the system catalogs, so if the file needs to be loaded again
|
|
the same procedure is applied.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
<application>PostgreSQL</application> will not compile a C function
|
|
automatically. The object file must be compiled before it is referenced
|
|
in a <command>CREATE
|
|
FUNCTION</> command. See <xref linkend="dfunc"> for additional
|
|
information.
|
|
</para>
|
|
</note>
|
|
|
|
<note>
|
|
<para>
|
|
After it is used for the first time, a dynamically loaded object
|
|
file is retained in memory. Future calls in the same session to the
|
|
function(s) in that file will only incur the small overhead of a symbol
|
|
table lookup. If you need to force a reload of an object file, for
|
|
example after recompiling it, use the <command>LOAD</> command or
|
|
begin a fresh session.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
It is recommended to locate shared libraries either relative to
|
|
<literal>$libdir</literal> or through the dynamic library path.
|
|
This simplifies version upgrades if the new installation is at a
|
|
different location. The actual directory that
|
|
<literal>$libdir</literal> stands for can be found out with the
|
|
command <literal>pg_config --pkglibdir</literal>.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
Before <application>PostgreSQL</application> release 7.2, only exact
|
|
absolute paths to object files could be specified in <command>CREATE
|
|
FUNCTION</>. This approach is now deprecated since it makes the
|
|
function definition unnecessarily unportable. It's best to specify
|
|
just the shared library name with no path nor extension, and let
|
|
the search mechanism provide that information instead.
|
|
</para>
|
|
</note>
|
|
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Base Types in C-Language Functions</title>
|
|
|
|
<para>
|
|
<xref linkend="xfunc-c-type-table"> gives the C type required for
|
|
parameters in the C functions that will be loaded into Postgres.
|
|
The <quote>Defined In</quote> column gives the header file that
|
|
needs to be included to get the type definition. (The actual
|
|
definition may be in a different file that is included by the
|
|
listed file. It is recommended that users stick to the defined
|
|
interface.) Note that you should always include
|
|
<filename>postgres.h</filename> first in any source file, because
|
|
it declares a number of things that you will need anyway.
|
|
</para>
|
|
|
|
<table tocentry="1" id="xfunc-c-type-table">
|
|
<title>Equivalent C Types
|
|
for Built-In <productname>PostgreSQL</productname> Types</title>
|
|
<titleabbrev>Equivalent C Types</titleabbrev>
|
|
<tgroup cols="3">
|
|
<thead>
|
|
<row>
|
|
<entry>
|
|
SQL Type
|
|
</entry>
|
|
<entry>
|
|
C Type
|
|
</entry>
|
|
<entry>
|
|
Defined In
|
|
</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry><type>abstime</type></entry>
|
|
<entry><type>AbsoluteTime</type></entry>
|
|
<entry><filename>utils/nabstime.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>boolean</type></entry>
|
|
<entry><type>bool</type></entry>
|
|
<entry><filename>postgres.h</filename> (maybe compiler built-in)</entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>box</type></entry>
|
|
<entry><type>BOX*</type></entry>
|
|
<entry><filename>utils/geo-decls.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>bytea</type></entry>
|
|
<entry><type>bytea*</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>"char"</type></entry>
|
|
<entry><type>char</type></entry>
|
|
<entry>(compiler built-in)</entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>character</type></entry>
|
|
<entry><type>BpChar*</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>cid</type></entry>
|
|
<entry><type>CommandId</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>date</type></entry>
|
|
<entry><type>DateADT</type></entry>
|
|
<entry><filename>utils/date.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>smallint</type> (<type>int2</type>)</entry>
|
|
<entry><type>int2</type> or <type>int16</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>int2vector</type></entry>
|
|
<entry><type>int2vector*</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>integer</type> (<type>int4</type>)</entry>
|
|
<entry><type>int4</type> or <type>int32</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>real</type> (<type>float4</type>)</entry>
|
|
<entry><type>float4*</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>double precision</type> (<type>float8</type>)</entry>
|
|
<entry><type>float8*</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>interval</type></entry>
|
|
<entry><type>Interval*</type></entry>
|
|
<entry><filename>utils/timestamp.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>lseg</type></entry>
|
|
<entry><type>LSEG*</type></entry>
|
|
<entry><filename>utils/geo-decls.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>name</type></entry>
|
|
<entry><type>Name</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>oid</type></entry>
|
|
<entry><type>Oid</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>oidvector</type></entry>
|
|
<entry><type>oidvector*</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>path</type></entry>
|
|
<entry><type>PATH*</type></entry>
|
|
<entry><filename>utils/geo-decls.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>point</type></entry>
|
|
<entry><type>POINT*</type></entry>
|
|
<entry><filename>utils/geo-decls.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>regproc</type></entry>
|
|
<entry><type>regproc</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>reltime</type></entry>
|
|
<entry><type>RelativeTime</type></entry>
|
|
<entry><filename>utils/nabstime.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>text</type></entry>
|
|
<entry><type>text*</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>tid</type></entry>
|
|
<entry><type>ItemPointer</type></entry>
|
|
<entry><filename>storage/itemptr.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>time</type></entry>
|
|
<entry><type>TimeADT</type></entry>
|
|
<entry><filename>utils/date.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>time with time zone</type></entry>
|
|
<entry><type>TimeTzADT</type></entry>
|
|
<entry><filename>utils/date.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>timestamp</type></entry>
|
|
<entry><type>Timestamp*</type></entry>
|
|
<entry><filename>utils/timestamp.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>tinterval</type></entry>
|
|
<entry><type>TimeInterval</type></entry>
|
|
<entry><filename>utils/nabstime.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>varchar</type></entry>
|
|
<entry><type>VarChar*</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>xid</type></entry>
|
|
<entry><type>TransactionId</type></entry>
|
|
<entry><filename>postgres.h</filename></entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para>
|
|
Internally, <productname>Postgres</productname> regards a
|
|
base type as a <quote>blob of memory</quote>. The user-defined
|
|
functions that you define over a type in turn define the
|
|
way that <productname>Postgres</productname> can operate
|
|
on it. That is, <productname>Postgres</productname> will
|
|
only store and retrieve the data from disk and use your
|
|
user-defined functions to input, process, and output the data.
|
|
Base types can have one of three internal formats:
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
pass by value, fixed-length
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
pass by reference, fixed-length
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
pass by reference, variable-length
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<para>
|
|
By-value types can only be 1, 2 or 4 bytes in length
|
|
(also 8 bytes, if <literal>sizeof(Datum)</literal> is 8 on your machine).
|
|
You should be careful
|
|
to define your types such that they will be the same
|
|
size (in bytes) on all architectures. For example, the
|
|
<literal>long</literal> type is dangerous because it
|
|
is 4 bytes on some machines and 8 bytes on others, whereas
|
|
<type>int</type> type is 4 bytes on most
|
|
Unix machines. A reasonable implementation of
|
|
the <type>int4</type> type on Unix
|
|
machines might be:
|
|
|
|
<programlisting>
|
|
/* 4-byte integer, passed by value */
|
|
typedef int int4;
|
|
</programlisting>
|
|
|
|
<productname>PostgreSQL</productname> automatically figures
|
|
things out so that the integer types really have the size they
|
|
advertise.
|
|
</para>
|
|
|
|
<para>
|
|
On the other hand, fixed-length types of any size may
|
|
be passed by-reference. For example, here is a sample
|
|
implementation of a <productname>PostgreSQL</productname> type:
|
|
|
|
<programlisting>
|
|
/* 16-byte structure, passed by reference */
|
|
typedef struct
|
|
{
|
|
double x, y;
|
|
} Point;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Only pointers to such types can be used when passing
|
|
them in and out of <productname>Postgres</productname> functions.
|
|
To return a value of such a type, allocate the right amount of
|
|
memory with <literal>palloc()</literal>, fill in the allocated memory,
|
|
and return a pointer to it. (Alternatively, you can return an input
|
|
value of the same type by returning its pointer. <emphasis>Never</>
|
|
modify the contents of a pass-by-reference input value, however.)
|
|
</para>
|
|
|
|
<para>
|
|
Finally, all variable-length types must also be passed
|
|
by reference. All variable-length types must begin
|
|
with a length field of exactly 4 bytes, and all data to
|
|
be stored within that type must be located in the memory
|
|
immediately following that length field. The
|
|
length field is the total length of the structure
|
|
(i.e., it includes the size of the length field
|
|
itself). We can define the text type as follows:
|
|
|
|
<programlisting>
|
|
typedef struct {
|
|
int4 length;
|
|
char data[1];
|
|
} text;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Obviously, the data field shown here is not long enough to hold
|
|
all possible strings; it's impossible to declare such
|
|
a structure in <acronym>C</acronym>. When manipulating
|
|
variable-length types, we must be careful to allocate
|
|
the correct amount of memory and initialize the length field.
|
|
For example, if we wanted to store 40 bytes in a text
|
|
structure, we might use a code fragment like this:
|
|
|
|
<programlisting>
|
|
#include "postgres.h"
|
|
...
|
|
char buffer[40]; /* our source data */
|
|
...
|
|
text *destination = (text *) palloc(VARHDRSZ + 40);
|
|
destination->length = VARHDRSZ + 40;
|
|
memmove(destination->data, buffer, 40);
|
|
...
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Now that we've gone over all of the possible structures
|
|
for base types, we can show some examples of real functions.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Version-0 Calling Conventions for C-Language Functions</title>
|
|
|
|
<para>
|
|
We present the <quote>old style</quote> calling convention first --- although
|
|
this approach is now deprecated, it's easier to get a handle on
|
|
initially. In the version-0 method, the arguments and result
|
|
of the C function are just declared in normal C style, but being
|
|
careful to use the C representation of each SQL data type as shown
|
|
above.
|
|
</para>
|
|
|
|
<para>
|
|
Here are some examples:
|
|
|
|
<programlisting>
|
|
#include "postgres.h"
|
|
#include <string.h>
|
|
|
|
/* By Value */
|
|
|
|
int
|
|
add_one(int arg)
|
|
{
|
|
return arg + 1;
|
|
}
|
|
|
|
/* By Reference, Fixed Length */
|
|
|
|
float8 *
|
|
add_one_float8(float8 *arg)
|
|
{
|
|
float8 *result = (float8 *) palloc(sizeof(float8));
|
|
|
|
*result = *arg + 1.0;
|
|
|
|
return result;
|
|
}
|
|
|
|
Point *
|
|
makepoint(Point *pointx, Point *pointy)
|
|
{
|
|
Point *new_point = (Point *) palloc(sizeof(Point));
|
|
|
|
new_point->x = pointx->x;
|
|
new_point->y = pointy->y;
|
|
|
|
return new_point;
|
|
}
|
|
|
|
/* By Reference, Variable Length */
|
|
|
|
text *
|
|
copytext(text *t)
|
|
{
|
|
/*
|
|
* VARSIZE is the total size of the struct in bytes.
|
|
*/
|
|
text *new_t = (text *) palloc(VARSIZE(t));
|
|
VARATT_SIZEP(new_t) = VARSIZE(t);
|
|
/*
|
|
* VARDATA is a pointer to the data region of the struct.
|
|
*/
|
|
memcpy((void *) VARDATA(new_t), /* destination */
|
|
(void *) VARDATA(t), /* source */
|
|
VARSIZE(t)-VARHDRSZ); /* how many bytes */
|
|
return new_t;
|
|
}
|
|
|
|
text *
|
|
concat_text(text *arg1, text *arg2)
|
|
{
|
|
int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
|
|
text *new_text = (text *) palloc(new_text_size);
|
|
|
|
VARATT_SIZEP(new_text) = new_text_size;
|
|
memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
|
|
memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
|
|
VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
|
|
return new_text;
|
|
}
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Supposing that the above code has been prepared in file
|
|
<filename>funcs.c</filename> and compiled into a shared object,
|
|
we could define the functions to <productname>Postgres</productname>
|
|
with commands like this:
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION add_one(int4) RETURNS int4
|
|
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE 'c'
|
|
WITH (isStrict);
|
|
|
|
-- note overloading of SQL function name add_one()
|
|
CREATE FUNCTION add_one(float8) RETURNS float8
|
|
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs',
|
|
'add_one_float8'
|
|
LANGUAGE 'c' WITH (isStrict);
|
|
|
|
CREATE FUNCTION makepoint(point, point) RETURNS point
|
|
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE 'c'
|
|
WITH (isStrict);
|
|
|
|
CREATE FUNCTION copytext(text) RETURNS text
|
|
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE 'c'
|
|
WITH (isStrict);
|
|
|
|
CREATE FUNCTION concat_text(text, text) RETURNS text
|
|
AS '<replaceable>PGROOT</replaceable>/tutorial/funcs' LANGUAGE 'c'
|
|
WITH (isStrict);
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Here <replaceable>PGROOT</replaceable> stands for the full path to
|
|
the <productname>Postgres</productname> source tree. (Better style would
|
|
be to use just <literal>'funcs'</> in the <literal>AS</> clause,
|
|
after having added <replaceable>PGROOT</replaceable><literal>/tutorial</>
|
|
to the search path. In any case, we may omit the system-specific
|
|
extension for a shared library, commonly <literal>.so</literal> or
|
|
<literal>.sl</literal>.)
|
|
</para>
|
|
|
|
<para>
|
|
Notice that we have specified the functions as <quote>strict</quote>,
|
|
meaning that
|
|
the system should automatically assume a NULL result if any input
|
|
value is NULL. By doing this, we avoid having to check for NULL inputs
|
|
in the function code. Without this, we'd have to check for NULLs
|
|
explicitly, for example by checking for a null pointer for each
|
|
pass-by-reference argument. (For pass-by-value arguments, we don't
|
|
even have a way to check!)
|
|
</para>
|
|
|
|
<para>
|
|
Although this calling convention is simple to use,
|
|
it is not very portable; on some architectures there are problems
|
|
with passing smaller-than-int data types this way. Also, there is
|
|
no simple way to return a NULL result, nor to cope with NULL arguments
|
|
in any way other than making the function strict. The version-1
|
|
convention, presented next, overcomes these objections.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Version-1 Calling Conventions for C-Language Functions</title>
|
|
|
|
<para>
|
|
The version-1 calling convention relies on macros to suppress most
|
|
of the complexity of passing arguments and results. The C declaration
|
|
of a version-1 function is always
|
|
<programlisting>
|
|
Datum funcname(PG_FUNCTION_ARGS)
|
|
</programlisting>
|
|
In addition, the macro call
|
|
<programlisting>
|
|
PG_FUNCTION_INFO_V1(funcname);
|
|
</programlisting>
|
|
must appear in the same source file (conventionally it's written
|
|
just before the function itself). This macro call is not needed
|
|
for <literal>internal</>-language functions, since Postgres currently
|
|
assumes all internal functions are version-1. However, it is
|
|
<emphasis>required</emphasis> for dynamically-loaded functions.
|
|
</para>
|
|
|
|
<para>
|
|
In a version-1 function, each actual argument is fetched using a
|
|
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
|
|
macro that corresponds to the argument's datatype, and the result
|
|
is returned using a
|
|
<function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
|
|
macro for the return type.
|
|
</para>
|
|
|
|
<para>
|
|
Here we show the same functions as above, coded in version-1 style:
|
|
|
|
<programlisting>
|
|
#include "postgres.h"
|
|
#include <string.h>
|
|
#include "fmgr.h"
|
|
|
|
/* By Value */
|
|
|
|
PG_FUNCTION_INFO_V1(add_one);
|
|
|
|
Datum
|
|
add_one(PG_FUNCTION_ARGS)
|
|
{
|
|
int32 arg = PG_GETARG_INT32(0);
|
|
|
|
PG_RETURN_INT32(arg + 1);
|
|
}
|
|
|
|
/* By Reference, Fixed Length */
|
|
|
|
PG_FUNCTION_INFO_V1(add_one_float8);
|
|
|
|
Datum
|
|
add_one_float8(PG_FUNCTION_ARGS)
|
|
{
|
|
/* The macros for FLOAT8 hide its pass-by-reference nature */
|
|
float8 arg = PG_GETARG_FLOAT8(0);
|
|
|
|
PG_RETURN_FLOAT8(arg + 1.0);
|
|
}
|
|
|
|
PG_FUNCTION_INFO_V1(makepoint);
|
|
|
|
Datum
|
|
makepoint(PG_FUNCTION_ARGS)
|
|
{
|
|
/* Here, the pass-by-reference nature of Point is not hidden */
|
|
Point *pointx = PG_GETARG_POINT_P(0);
|
|
Point *pointy = PG_GETARG_POINT_P(1);
|
|
Point *new_point = (Point *) palloc(sizeof(Point));
|
|
|
|
new_point->x = pointx->x;
|
|
new_point->y = pointy->y;
|
|
|
|
PG_RETURN_POINT_P(new_point);
|
|
}
|
|
|
|
/* By Reference, Variable Length */
|
|
|
|
PG_FUNCTION_INFO_V1(copytext);
|
|
|
|
Datum
|
|
copytext(PG_FUNCTION_ARGS)
|
|
{
|
|
text *t = PG_GETARG_TEXT_P(0);
|
|
/*
|
|
* VARSIZE is the total size of the struct in bytes.
|
|
*/
|
|
text *new_t = (text *) palloc(VARSIZE(t));
|
|
VARATT_SIZEP(new_t) = VARSIZE(t);
|
|
/*
|
|
* VARDATA is a pointer to the data region of the struct.
|
|
*/
|
|
memcpy((void *) VARDATA(new_t), /* destination */
|
|
(void *) VARDATA(t), /* source */
|
|
VARSIZE(t)-VARHDRSZ); /* how many bytes */
|
|
PG_RETURN_TEXT_P(new_t);
|
|
}
|
|
|
|
PG_FUNCTION_INFO_V1(concat_text);
|
|
|
|
Datum
|
|
concat_text(PG_FUNCTION_ARGS)
|
|
{
|
|
text *arg1 = PG_GETARG_TEXT_P(0);
|
|
text *arg2 = PG_GETARG_TEXT_P(1);
|
|
int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
|
|
text *new_text = (text *) palloc(new_text_size);
|
|
|
|
VARATT_SIZEP(new_text) = new_text_size;
|
|
memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
|
|
memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
|
|
VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
|
|
PG_RETURN_TEXT_P(new_text);
|
|
}
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
The <command>CREATE FUNCTION</command> commands are the same as
|
|
for the version-0 equivalents.
|
|
</para>
|
|
|
|
<para>
|
|
At first glance, the version-1 coding conventions may appear to
|
|
be just pointless obscurantism. However, they do offer a number
|
|
of improvements, because the macros can hide unnecessary detail.
|
|
An example is that in coding add_one_float8, we no longer need to
|
|
be aware that float8 is a pass-by-reference type. Another
|
|
example is that the GETARG macros for variable-length types hide
|
|
the need to deal with fetching <quote>toasted</quote> (compressed or
|
|
out-of-line) values. The old-style <function>copytext</function>
|
|
and <function>concat_text</function> functions shown above are
|
|
actually wrong in the presence of toasted values, because they
|
|
don't call <function>pg_detoast_datum()</function> on their
|
|
inputs. (The handler for old-style dynamically-loaded functions
|
|
currently takes care of this detail, but it does so less
|
|
efficiently than is possible for a version-1 function.)
|
|
</para>
|
|
|
|
<para>
|
|
One big improvement in version-1 functions is better handling of NULL
|
|
inputs and results. The macro <function>PG_ARGISNULL(n)</function>
|
|
allows a function to test whether each input is NULL (of course, doing
|
|
this is only necessary in functions not declared <quote>strict</>).
|
|
As with the
|
|
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function> macros,
|
|
the input arguments are counted beginning at zero. Note that one
|
|
should refrain from executing
|
|
<function>PG_GETARG_<replaceable>xxx</replaceable>()</function> until
|
|
one has verified that the argument isn't NULL.
|
|
To return a NULL result, execute <function>PG_RETURN_NULL()</function>;
|
|
this works in both strict and non-strict functions.
|
|
</para>
|
|
|
|
<para>
|
|
The version-1 function call conventions make it possible to
|
|
return <quote>set</quote> results and implement trigger functions and
|
|
procedural-language call handlers. Version-1 code is also more
|
|
portable than version-0, because it does not break ANSI C restrictions
|
|
on function call protocol. For more details see
|
|
<filename>src/backend/utils/fmgr/README</filename> in the source
|
|
distribution.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Composite Types in C-Language Functions</title>
|
|
|
|
<para>
|
|
Composite types do not have a fixed layout like C
|
|
structures. Instances of a composite type may contain
|
|
null fields. In addition, composite types that are
|
|
part of an inheritance hierarchy may have different
|
|
fields than other members of the same inheritance hierarchy.
|
|
Therefore, <productname>Postgres</productname> provides
|
|
a procedural interface for accessing fields of composite types
|
|
from C. As <productname>Postgres</productname> processes
|
|
a set of rows, each row will be passed into your
|
|
function as an opaque structure of type <literal>TUPLE</literal>.
|
|
Suppose we want to write a function to answer the query
|
|
|
|
<programlisting>
|
|
SELECT name, c_overpaid(emp, 1500) AS overpaid
|
|
FROM emp
|
|
WHERE name = 'Bill' OR name = 'Sam';
|
|
</programlisting>
|
|
|
|
In the query above, we can define c_overpaid as:
|
|
|
|
<programlisting>
|
|
#include "postgres.h"
|
|
#include "executor/executor.h" /* for GetAttributeByName() */
|
|
|
|
bool
|
|
c_overpaid(TupleTableSlot *t, /* the current row of EMP */
|
|
int32 limit)
|
|
{
|
|
bool isnull;
|
|
int32 salary;
|
|
|
|
salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
|
|
if (isnull)
|
|
return (false);
|
|
return salary > limit;
|
|
}
|
|
|
|
/* In version-1 coding, the above would look like this: */
|
|
|
|
PG_FUNCTION_INFO_V1(c_overpaid);
|
|
|
|
Datum
|
|
c_overpaid(PG_FUNCTION_ARGS)
|
|
{
|
|
TupleTableSlot *t = (TupleTableSlot *) PG_GETARG_POINTER(0);
|
|
int32 limit = PG_GETARG_INT32(1);
|
|
bool isnull;
|
|
int32 salary;
|
|
|
|
salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
|
|
if (isnull)
|
|
PG_RETURN_BOOL(false);
|
|
/* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary */
|
|
|
|
PG_RETURN_BOOL(salary > limit);
|
|
}
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
<function>GetAttributeByName</function> is the
|
|
<productname>Postgres</productname> system function that
|
|
returns attributes out of the current row. It has
|
|
three arguments: the argument of type <type>TupleTableSlot*</type> passed into
|
|
the function, the name of the desired attribute, and a
|
|
return parameter that tells whether the attribute
|
|
is null. <function>GetAttributeByName</function> returns a Datum
|
|
value that you can convert to the proper datatype by using the
|
|
appropriate <function>DatumGet<replaceable>XXX</replaceable>()</function> macro.
|
|
</para>
|
|
|
|
<para>
|
|
The following query lets <productname>Postgres</productname>
|
|
know about the <function>c_overpaid</function> function:
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION c_overpaid(emp, int4)
|
|
RETURNS bool
|
|
AS '<replaceable>PGROOT</replaceable>/tutorial/obj/funcs'
|
|
LANGUAGE 'c';
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
While there are ways to construct new rows or modify
|
|
existing rows from within a C function, these
|
|
are far too complex to discuss in this manual.
|
|
Consult the backend source code for examples.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Writing Code</title>
|
|
|
|
<para>
|
|
We now turn to the more difficult task of writing
|
|
programming language functions. Be warned: this section
|
|
of the manual will not make you a programmer. You must
|
|
have a good understanding of <acronym>C</acronym>
|
|
(including the use of pointers and the malloc memory manager)
|
|
before trying to write <acronym>C</acronym> functions for
|
|
use with <productname>Postgres</productname>. While it may
|
|
be possible to load functions written in languages other
|
|
than <acronym>C</acronym> into <productname>Postgres</productname>,
|
|
this is often difficult (when it is possible at all)
|
|
because other languages, such as <acronym>FORTRAN</acronym>
|
|
and <acronym>Pascal</acronym> often do not follow the same
|
|
<firstterm>calling convention</firstterm>
|
|
as <acronym>C</acronym>. That is, other
|
|
languages do not pass argument and return values
|
|
between functions in the same way. For this reason, we
|
|
will assume that your programming language functions
|
|
are written in <acronym>C</acronym>.
|
|
</para>
|
|
|
|
<para>
|
|
The basic rules for building <acronym>C</acronym> functions
|
|
are as follows:
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Use <literal>pg_config --includedir-server</literal> to find
|
|
out where the PostgreSQL server header files are installed on
|
|
your system (or the system that your users will be running
|
|
on). This option is new with PostgreSQL 7.2. For PostgreSQL
|
|
7.1 you should use the option <option>--includedir</option>.
|
|
(<command>pg_config</command> will exit with a non-zero status
|
|
if it encounters an unknown option.) For releases prior to
|
|
7.1 you will have to guess, but since that was before the
|
|
current calling conventions were introduced, it is unlikely
|
|
that you want to support those releases.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
When allocating memory, use the
|
|
<productname>Postgres</productname> routines
|
|
<function>palloc</function> and <function>pfree</function>
|
|
instead of the corresponding <acronym>C</acronym> library
|
|
routines <function>malloc</function> and
|
|
<function>free</function>. The memory allocated by
|
|
<function>palloc</function> will be freed automatically at the
|
|
end of each transaction, preventing memory leaks.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Always zero the bytes of your structures using
|
|
<function>memset</function> or <function>bzero</function>.
|
|
Several routines (such as the hash access method, hash join
|
|
and the sort algorithm) compute functions of the raw bits
|
|
contained in your structure. Even if you initialize all
|
|
fields of your structure, there may be several bytes of
|
|
alignment padding (holes in the structure) that may contain
|
|
garbage values.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Most of the internal <productname>Postgres</productname> types
|
|
are declared in <filename>postgres.h</filename>, while the function
|
|
manager interfaces (<symbol>PG_FUNCTION_ARGS</symbol>, etc.)
|
|
are in <filename>fmgr.h</filename>, so you will need to
|
|
include at least these two files. For portability reasons it's best
|
|
to include <filename>postgres.h</filename> <emphasis>first</>,
|
|
before any other system or user header files.
|
|
Including <filename>postgres.h</filename> will also include
|
|
<filename>elog.h</filename> and <filename>palloc.h</filename>
|
|
for you.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Symbol names defined within object files must not conflict
|
|
with each other or with symbols defined in the
|
|
<productname>PostgreSQL</productname> server executable. You
|
|
will have to rename your functions or variables if you get
|
|
error messages to this effect.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Compiling and linking your object code so that
|
|
it can be dynamically loaded into
|
|
<productname>Postgres</productname>
|
|
always requires special flags.
|
|
See <xref linkend="dfunc">
|
|
for a detailed explanation of how to do it for
|
|
your particular operating system.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</sect2>
|
|
|
|
&dfunc;
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="xfunc-overload">
|
|
<title>Function Overloading</title>
|
|
|
|
<para>
|
|
More than one function may be defined with the same SQL name, so long
|
|
as the arguments they take are different. In other words,
|
|
function names can be <firstterm>overloaded</firstterm>. When a
|
|
query is executed, the server will determine which function to
|
|
call from the data types and the number of the provided arguments.
|
|
Overloading can also be used to simulate functions with a variable
|
|
number of arguments, up to a finite maximum number.
|
|
</para>
|
|
|
|
<para>
|
|
A function may also have the same name as an attribute. In the case
|
|
that there is an ambiguity between a function on a complex type and
|
|
an attribute of the complex type, the attribute will always be used.
|
|
</para>
|
|
|
|
<para>
|
|
When creating a family of overloaded functions, one should be
|
|
careful not to create ambiguities. For instance, given the
|
|
functions
|
|
<programlisting>
|
|
CREATE FUNCTION test(int, real) RETURNS ...
|
|
CREATE FUNCTION test(smallint, double precision) RETURNS ...
|
|
</programlisting>
|
|
it is not immediately clear which function would be called with
|
|
some trivial input like <literal>test(1, 1.5)</literal>. The
|
|
currently implemented resolution rules are described in the
|
|
<citetitle>User's Guide</citetitle>, but it is unwise to design a
|
|
system that subtly relies on this behavior.
|
|
</para>
|
|
|
|
<para>
|
|
When overloading C language functions, there is an additional
|
|
constraint: The C name of each function in the family of
|
|
overloaded functions must be different from the C names of all
|
|
other functions, either internal or dynamically loaded. If this
|
|
rule is violated, the behavior is not portable. You might get a
|
|
run-time linker error, or one of the functions will get called
|
|
(usually the internal one). The alternative form of the
|
|
<literal>AS</> clause for the SQL <command>CREATE
|
|
FUNCTION</command> command decouples the SQL function name from
|
|
the function name in the C source code. E.g.,
|
|
<programlisting>
|
|
CREATE FUNCTION test(int) RETURNS int
|
|
AS '<replaceable>filename</>', 'test_1arg'
|
|
LANGUAGE C;
|
|
CREATE FUNCTION test(int, int) RETURNS int
|
|
AS '<replaceable>filename</>', 'test_2arg'
|
|
LANGUAGE C;
|
|
</programlisting>
|
|
The names of the C functions here reflect one of many possible conventions.
|
|
</para>
|
|
|
|
<para>
|
|
Prior to <productname>PostgreSQL</productname> 7.0, this
|
|
alternative syntax did not exist. There is a trick to get around
|
|
the problem, by defining a set of C functions with different names
|
|
and then define a set of identically-named SQL function wrappers
|
|
that take the appropriate argument types and call the matching C
|
|
function.
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="xfunc-plhandler">
|
|
<title>Procedural Language Handlers</title>
|
|
|
|
<para>
|
|
All calls to functions that are written in a language other than
|
|
the current <quote>version 1</quote> interface for compiled
|
|
languages (this includes functions in user-defined procedural languages,
|
|
functions written in SQL, and functions using the version 0 compiled
|
|
language interface), go through a <firstterm>call handler</firstterm>
|
|
function for the specific language. It is the responsibility of
|
|
the call handler to execute the function in a meaningful way, such
|
|
as by interpreting the supplied source text. This section
|
|
describes how a language call handler can be written. This is not
|
|
a common task, in fact, it has only been done a handful of times
|
|
in the history of <productname>PostgreSQL</productname>, but the
|
|
topic naturally belongs in this chapter, and the material might
|
|
give some insight into the extensible nature of the
|
|
<productname>PostgreSQL</productname> system.
|
|
</para>
|
|
|
|
<para>
|
|
The call handler for a procedural language is a
|
|
<quote>normal</quote> function, which must be written in a
|
|
compiled language such as C and registered with
|
|
<productname>PostgreSQL</productname> as taking no arguments and
|
|
returning the <type>opaque</type> type, a placeholder for
|
|
unspecified or undefined types. This prevents the call handler
|
|
from being called directly as a function from queries. (However,
|
|
arguments may be supplied in the actual call to the handler when a
|
|
function in the language offered by the handler is to be
|
|
executed.)
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
In <productname>PostgreSQL</productname> 7.1 and later, call
|
|
handlers must adhere to the <quote>version 1</quote> function
|
|
manager interface, not the old-style interface.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
The call handler is called in the same way as any other function:
|
|
It receives a pointer to a
|
|
<structname>FunctionCallInfoData</structname> struct containing
|
|
argument values and information about the called function, and it
|
|
is expected to return a <type>Datum</type> result (and possibly
|
|
set the <structfield>isnull</structfield> field of the
|
|
<structname>FunctionCallInfoData</structname> struct, if it wishes
|
|
to return an SQL NULL result). The difference between a call
|
|
handler and an ordinary callee function is that the
|
|
<structfield>flinfo->fn_oid</structfield> field of the
|
|
<structname>FunctionCallInfoData</structname> struct will contain
|
|
the OID of the actual function to be called, not of the call
|
|
handler itself. The call handler must use this field to determine
|
|
which function to execute. Also, the passed argument list has
|
|
been set up according to the declaration of the target function,
|
|
not of the call handler.
|
|
</para>
|
|
|
|
<para>
|
|
It's up to the call handler to fetch the
|
|
<classname>pg_proc</classname> entry and to analyze the argument
|
|
and return types of the called procedure. The AS clause from the
|
|
<command>CREATE FUNCTION</command> of the procedure will be found
|
|
in the <literal>prosrc</literal> attribute of the
|
|
<classname>pg_proc</classname> table entry. This may be the source
|
|
text in the procedural language itself (like for PL/Tcl), a
|
|
path name to a file, or anything else that tells the call handler
|
|
what to do in detail.
|
|
</para>
|
|
|
|
<para>
|
|
Often, the same function is called many times per SQL statement.
|
|
A call handler can avoid repeated lookups of information about the
|
|
called function by using the
|
|
<structfield>flinfo->fn_extra</structfield> field. This will
|
|
initially be NULL, but can be set by the call handler to point at
|
|
information about the PL function. On subsequent calls, if
|
|
<structfield>flinfo->fn_extra</structfield> is already non-NULL
|
|
then it can be used and the information lookup step skipped. The
|
|
call handler must be careful that
|
|
<structfield>flinfo->fn_extra</structfield> is made to point at
|
|
memory that will live at least until the end of the current query,
|
|
since an <structname>FmgrInfo</structname> data structure could be
|
|
kept that long. One way to do this is to allocate the extra data
|
|
in the memory context specified by
|
|
<structfield>flinfo->fn_mcxt</structfield>; such data will
|
|
normally have the same lifespan as the
|
|
<structname>FmgrInfo</structname> itself. But the handler could
|
|
also choose to use a longer-lived context so that it can cache
|
|
function definition information across queries.
|
|
</para>
|
|
|
|
<para>
|
|
When a PL function is invoked as a trigger, no explicit arguments
|
|
are passed, but the
|
|
<structname>FunctionCallInfoData</structname>'s
|
|
<structfield>context</structfield> field points at a
|
|
<structname>TriggerData</structname> node, rather than being NULL
|
|
as it is in a plain function call. A language handler should
|
|
provide mechanisms for PL functions to get at the trigger
|
|
information.
|
|
</para>
|
|
|
|
<para>
|
|
This is a template for a PL handler written in C:
|
|
<programlisting>
|
|
#include "postgres.h"
|
|
#include "executor/spi.h"
|
|
#include "commands/trigger.h"
|
|
#include "utils/elog.h"
|
|
#include "fmgr.h"
|
|
#include "access/heapam.h"
|
|
#include "utils/syscache.h"
|
|
#include "catalog/pg_proc.h"
|
|
#include "catalog/pg_type.h"
|
|
|
|
PG_FUNCTION_INFO_V1(plsample_call_handler);
|
|
|
|
Datum
|
|
plsample_call_handler(PG_FUNCTION_ARGS)
|
|
{
|
|
Datum retval;
|
|
|
|
if (CALLED_AS_TRIGGER(fcinfo))
|
|
{
|
|
/*
|
|
* Called as a trigger procedure
|
|
*/
|
|
TriggerData *trigdata = (TriggerData *) fcinfo->context;
|
|
|
|
retval = ...
|
|
}
|
|
else {
|
|
/*
|
|
* Called as a function
|
|
*/
|
|
|
|
retval = ...
|
|
}
|
|
|
|
return retval;
|
|
}
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Only a few thousand lines of code have to be added instead of the
|
|
dots to complete the call handler. See <xref linkend="xfunc-c">
|
|
for information on how to compile it into a loadable module.
|
|
</para>
|
|
|
|
<para>
|
|
The following commands then register the sample procedural
|
|
language:
|
|
<programlisting>
|
|
CREATE FUNCTION plsample_call_handler () RETURNS opaque
|
|
AS '/usr/local/pgsql/lib/plsample'
|
|
LANGUAGE C;
|
|
CREATE LANGUAGE plsample
|
|
HANDLER plsample_call_handler;
|
|
</programlisting>
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode:sgml
|
|
sgml-omittag:nil
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
-->
|