mirror of
https://github.com/postgres/postgres.git
synced 2025-05-02 11:44:50 +03:00
<!-- $PostgreSQL: pgsql/doc/src/sgml/plpython.sgml,v 1.33 2006/09/03 22:23:58 tgl Exp $ -->

<chapter id="plpython">
 <title>PL/Python - Python Procedural Language</title>

 <indexterm zone="plpython"><primary>PL/Python</></>
 <indexterm zone="plpython"><primary>Python</></>

 <para>
  The <application>PL/Python</application> procedural language allows
  <productname>PostgreSQL</productname> functions to be written in the
  <ulink url="http://www.python.org">Python language</ulink>.
 </para>

 <para>
  To install PL/Python in a particular database, use
  <literal>createlang plpythonu <replaceable>dbname</></literal>.
 </para>

  <tip>
   <para>
    If a language is installed into <literal>template1</>, all subsequently
    created databases will have the language installed automatically.
   </para>
  </tip>

 <para>
  As of <productname>PostgreSQL</productname> 7.4, PL/Python is only
  available as an <quote>untrusted</> language (meaning it does not
  offer any way of restricting what users can do in it).  It has
  therefore been renamed to <literal>plpythonu</>.  The trusted
  variant <literal>plpython</> may become available again in the future,
  if a new secure execution mechanism is developed in Python.
 </para>

 <note>
  <para>
   Users of source packages must specially enable the build of
   PL/Python during the installation process.  (Refer to the
   installation instructions for more information.)  Users of binary
   packages might find PL/Python in a separate subpackage.
  </para>
 </note>

 <sect1 id="plpython-funcs">
  <title>PL/Python Functions</title>

  <para>
   Functions in PL/Python are declared via the standard <xref
   linkend="sql-createfunction" endterm="sql-createfunction-title">
   syntax:

<programlisting>
CREATE FUNCTION <replaceable>funcname</replaceable> (<replaceable>argument-list</replaceable>)
  RETURNS <replaceable>return-type</replaceable>
AS $$
  # PL/Python function body
$$ LANGUAGE plpythonu;
</programlisting>
  </para>

  <para>
   The body of a function is simply a Python script.  When the function
   is called, all of its arguments are passed as elements of the list
   <varname>args[]</varname>; named arguments are also passed as ordinary
   variables to the Python script.  The result is returned from the Python
   code in the usual way, with <literal>return</literal> or
   <literal>yield</literal> (in the case of a set-returning function).
  </para>

  <para>
   For example, a function to return the greater of two integers can be
   defined as:

<programlisting>
CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if a > b:
    return a
  return b
$$ LANGUAGE plpythonu;
</programlisting>

   The Python code that is given as the body of the function definition
   is transformed into a Python function.  For example, the above results in

<programlisting>
def __plpython_procedure_pymax_23456():
  if a > b:
    return a
  return b
</programlisting>

   assuming that 23456 is the OID assigned to the function by
   <productname>PostgreSQL</productname>.
  </para>
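  <para>
   The wrapping step can be imitated in plain Python to see why the
   arguments are usable as bare names.  This is a rough, hypothetical
   model, not the actual C implementation of
   <application>PL/Python</application>: <function>build_procedure</function>
   is an illustrative helper that indents the body under a
   <literal>def</literal> named after the function and its OID, then
   executes that definition in a fresh dictionary of globals holding the
   arguments, both by name and in the <varname>args</varname> list.
<programlisting>
def build_procedure(name, oid, body, argnames, argvalues):
    """Hypothetical model of PL/Python's wrapping step (illustration only)."""
    funcname = "__plpython_procedure_%s_%d" % (name, oid)
    # Indent every line of the body so that it becomes the body of a def
    indented = "\n".join("  " + line
                         for line in body.strip("\n").splitlines())
    source = "def %s():\n%s\n" % (funcname, indented)
    env = {"args": list(argvalues)}        # every argument, by position
    env.update(zip(argnames, argvalues))   # named arguments, by name
    exec(source, env)                      # defines funcname inside env
    return env[funcname]


PYMAX_BODY = """
if a > b:
    return a
return b
"""

proc = build_procedure("pymax", 23456, PYMAX_BODY, ["a", "b"], [3, 7])
print(proc())  # 7
</programlisting>
   The generated function takes no parameters, so the body can reach its
   inputs only through those globals.  This model builds a new function for
   every call; it says nothing about how or when the real server compiles
   and caches the procedure.
  </para>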

  <para>
   The <productname>PostgreSQL</> function parameters are available in
   the global <varname>args</varname> list.  In the
   <function>pymax</function> example, <varname>args[0]</varname> contains
   whatever was passed in as the first argument and
   <varname>args[1]</varname> contains the second argument's value.
   Alternatively, one can use named parameters as shown in the example above.
   This greatly simplifies the reading and writing of
   <application>PL/Python</application> code.
  </para>

  <para>
   If an SQL null value<indexterm><primary>null value</primary><secondary
   sortas="PL/Python">PL/Python</secondary></indexterm> is passed to a
   function, the argument value will appear as <symbol>None</symbol> in
   Python.  The above function definition will return the wrong answer for null
   inputs.  We could add <literal>STRICT</literal> to the function definition
   to make <productname>PostgreSQL</productname> do something more reasonable:
   if a null value is passed, the function will not be called at all,
   but will just return a null result automatically.  Alternatively,
   we could check for null inputs in the function body:

<programlisting>
CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if (a is None) or (b is None):
    return None
  if a > b:
    return a
  return b
$$ LANGUAGE plpythonu;
</programlisting>

   As shown above, to return an SQL null value from a PL/Python
   function, return the value <symbol>None</symbol>.  This can be done whether
   the function is strict or not.
  </para>
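  <para>
   Both versions of <function>pymax</function> can be tried as ordinary
   Python functions, with <symbol>None</symbol> standing in for an SQL null;
   the names <function>pymax_unchecked</function> and
   <function>pymax</function> in this standalone sketch are illustrative.
   Under Python 2, comparing <symbol>None</symbol> with a number does not
   raise an error (<symbol>None</symbol> sorts before every number), so the
   unchecked version quietly returns 5 for both <literal>(None, 5)</literal>
   and <literal>(5, None)</literal>, which is the wrong answer mentioned
   above.  Python 3 raises <literal>TypeError</literal> for the same
   comparison instead.
<programlisting>
def pymax_unchecked(a, b):
    # The first pymax body: no test for None
    if a > b:
        return a
    return b


def pymax(a, b):
    # The null-aware body: any None input gives None (SQL null)
    if (a is None) or (b is None):
        return None
    if a > b:
        return a
    return b


print(pymax(3, 5))     # 5
print(pymax(None, 5))  # None
</programlisting>
  </para>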

  <para>
   Composite-type arguments are passed to the function as Python mappings.  The
   element names of the mapping are the attribute names of the composite type.
   If an attribute in the passed row has the null value, it has the value
   <symbol>None</symbol> in the mapping.  Here is an example:

<programlisting>
CREATE TABLE employee (
  name text,
  salary integer,
  age integer
);

CREATE FUNCTION overpaid (e employee)
  RETURNS boolean
AS $$
  if e["salary"] > 200000:
    return True
  if (e["age"] < 30) and (e["salary"] > 100000):
    return True
  return False
$$ LANGUAGE plpythonu;
</programlisting>
  </para>
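  <para>
   Because the row arrives as an ordinary mapping, the body of
   <function>overpaid</function> can be exercised outside the server with a
   plain <literal>dict</literal>.  In this sketch the explicit
   <symbol>None</symbol> tests, and the choice to return null when the salary
   is unknown, are illustrative additions that are not part of the example
   above; they show how a null attribute appears (its key is present, with
   the value <symbol>None</symbol>) and avoid relying on how a particular
   Python version compares <symbol>None</symbol> with numbers.
<programlisting>
def overpaid(e):
    # e maps the attribute names of the employee type to their values;
    # a null attribute is present with the value None
    if e["salary"] is None:
        return None                    # salary unknown: return SQL null
    if e["salary"] > 200000:
        return True
    if (e["age"] is not None) and (e["age"] < 30) and (e["salary"] > 100000):
        return True
    return False


print(overpaid({"name": "Ann", "salary": 250000, "age": 45}))   # True
print(overpaid({"name": "Bob", "salary": 150000, "age": None}))  # False
</programlisting>
  </para>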

  <para>
   There are multiple ways to return row or composite types from a Python
   function.  The following examples assume we have:

<programlisting>
CREATE TABLE named_value (
  name text,
  value integer
);
</programlisting>

   or

<programlisting>
CREATE TYPE named_value AS (
  name text,
  value integer
);
</programlisting>

   <variablelist>
    <varlistentry>
     <term>Sequence types (tuple or list), but not <literal>set</literal> (because
     it is not indexable)</term>
     <listitem>
      <para>
       A returned sequence object must have the same number of items as the
       composite result type has fields.  The item with index 0 is assigned to
       the first field of the composite type, 1 to the second, and so on.  For
       example:

<programlisting>
CREATE FUNCTION make_pair (name text, value integer)
  RETURNS named_value
AS $$
  return [ name, value ]
  # or alternatively, as tuple: return ( name, value )
$$ LANGUAGE plpythonu;
</programlisting>

       To return an SQL null in any column, insert <symbol>None</symbol> at
       the corresponding position.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>Mapping (dictionary)</term>
     <listitem>
      <para>
       The value for each column of the composite type is retrieved from the
       mapping with the column name as key.  Example:

<programlisting>
CREATE FUNCTION make_pair (name text, value integer)
  RETURNS named_value
AS $$
  return { "name": name, "value": value }
$$ LANGUAGE plpythonu;
</programlisting>

       Additional dictionary key/value pairs are ignored.  Missing keys are
       treated as errors; to return an SQL null value for any column, insert
       <symbol>None</symbol> with the corresponding column name as the key.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>Object (any object providing method <literal>__getattr__</literal>)</term>
     <listitem>
      <para>
       Example:

<programlisting>
CREATE FUNCTION make_pair (name text, value integer)
  RETURNS named_value
AS $$
  class named_value:
    def __init__ (self, n, v):
      self.name = n
      self.value = v
  return named_value(name, value)

  # or simply
  class nv: pass
  nv.name = name
  nv.value = value
  return nv
$$ LANGUAGE plpythonu;
</programlisting>
      </para>
     </listitem>
    </varlistentry>
   </variablelist>
  </para>
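  <para>
   The three rules can be summarized in a plain-Python sketch.
   <function>to_row</function> is a hypothetical helper, not part of
   <application>PL/Python</application>: given the returned object and the
   ordered column names of the result type (here those of
   <literal>named_value</literal>), it produces the row values in column
   order.  A <literal>dict</literal> stands in for any mapping, and the
   exception types are choices of this sketch; the server reports its own
   error messages.
<programlisting>
def to_row(result, fields):
    """Hypothetical model of the composite-return rules (illustration only).

    fields is the ordered list of column names of the result type.
    """
    if isinstance(result, (tuple, list)):
        # Sequence: item i goes to column i, and the lengths must agree
        if len(result) != len(fields):
            raise ValueError("length of returned sequence does not match "
                             "number of columns")
        return tuple(result)
    if isinstance(result, dict):
        # Mapping: looked up by column name; extra keys are ignored and
        # a missing key raises KeyError
        return tuple(result[name] for name in fields)
    # Any other object: attributes named after the columns
    return tuple(getattr(result, name) for name in fields)


FIELDS = ["name", "value"]              # columns of named_value


class nv:
    pass
nv.name = "answer"
nv.value = 42

print(to_row(["answer", 42], FIELDS))                             # ('answer', 42)
print(to_row({"name": "answer", "value": None, "x": 1}, FIELDS))  # ('answer', None)
print(to_row(nv, FIELDS))                                         # ('answer', 42)
</programlisting>
  </para>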

  <para>
   If you do not provide a return value, Python returns the default
   <symbol>None</symbol>.  <application>PL/Python</application> translates
   Python's <symbol>None</symbol> into the SQL null value.
  </para>

  <para>
   A <application>PL/Python</application> function can also return sets of
   scalar or composite types.  There are several ways to achieve this because
   the returned object is internally turned into an iterator.  The following
   examples assume the composite type:

<programlisting>
CREATE TYPE greeting AS (
  how text,
  who text
);
</programlisting>

   Currently known iterable types are:
   <variablelist>
    <varlistentry>
     <term>Sequence types (tuple, list, set)</term>
     <listitem>
      <para>
<programlisting>
CREATE FUNCTION greet (how text)
  RETURNS SETOF greeting
AS $$
  # return tuple containing lists as composite types
  # all other combinations work also
  return ( [ how, "World" ], [ how, "PostgreSQL" ], [ how, "PL/Python" ] )
$$ LANGUAGE plpythonu;
</programlisting>
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>Iterator (any object providing <symbol>__iter__</symbol> and
      <symbol>next</symbol> methods)</term>
     <listitem>
      <para>
<programlisting>
CREATE FUNCTION greet (how text)
  RETURNS SETOF greeting
AS $$
  class producer:
    def __init__ (self, how, who):
      self.how = how
      self.who = who
      self.ndx = -1

    def __iter__ (self):
      return self

    def next (self):
      self.ndx += 1
      if self.ndx == len(self.who):
        raise StopIteration
      return ( self.how, self.who[self.ndx] )

  return producer(how, [ "World", "PostgreSQL", "PL/Python" ])
$$ LANGUAGE plpythonu;
</programlisting>
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>Generator (<literal>yield</literal>)</term>
     <listitem>
      <para>
<programlisting>
CREATE FUNCTION greet (how text)
  RETURNS SETOF greeting
AS $$
  for who in [ "World", "PostgreSQL", "PL/Python" ]:
    yield ( how, who )
$$ LANGUAGE plpythonu;
</programlisting>

       <warning>
        <para>
         Currently, due to Python
         <ulink url="http://sourceforge.net/tracker/index.php?func=detail&aid=1483133&group_id=5470&atid=105470">bug #1483133</ulink>,
         some debug versions of Python 2.4
         (configured and compiled with option <literal>--with-pydebug</literal>)
         are known to crash the <productname>PostgreSQL</productname> server.
         Unpatched versions of Fedora 4 contain this bug.
         It does not happen in production versions of Python or on patched
         versions of Fedora 4.
        </para>
       </warning>
      </para>
     </listitem>
    </varlistentry>
   </variablelist>

   Whenever new iterable types are added to the Python language,
   <application>PL/Python</application> is ready to use them.
  </para>
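  <para>
   All three forms end up being consumed the same way: by iterating over the
   returned object.  This plain-Python sketch reuses the
   <literal>producer</literal> class and the generator from the examples,
   with one change to <literal>producer</literal>: Python 3 names the
   iterator method <symbol>__next__</symbol>, so the sketch defines
   <symbol>__next__</symbol> and keeps <symbol>next</symbol> as an alias for
   Python 2.  <function>rows</function> is a hypothetical helper standing in
   for the server, collecting one tuple per returned item.
<programlisting>
class producer:
    # The iterator class from the example above, usable on Python 2 and 3
    def __init__(self, how, who):
        self.how = how
        self.who = who
        self.ndx = -1

    def __iter__(self):
        return self

    def __next__(self):
        self.ndx += 1
        if self.ndx == len(self.who):
            raise StopIteration
        return (self.how, self.who[self.ndx])

    next = __next__          # Python 2 name of the iterator method


def greet_generator(how):
    for who in ["World", "PostgreSQL", "PL/Python"]:
        yield (how, who)


def rows(result):
    # Hypothetical stand-in for the server: iterate whatever was returned
    # and turn each item into a row tuple
    return [tuple(item) for item in iter(result)]


how = "Hello"
from_sequence = rows(([how, "World"], [how, "PostgreSQL"], [how, "PL/Python"]))
from_iterator = rows(producer(how, ["World", "PostgreSQL", "PL/Python"]))
from_generator = rows(greet_generator(how))
print(from_sequence == from_iterator == from_generator)  # True
</programlisting>
   Returning a Python <literal>set</literal> works the same way, but its
   items must be hashable (tuples rather than lists), and the rows come back
   in no particular order.
  </para>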

  <para>
   The global dictionary <varname>SD</varname> is available to store
   data between function calls.  This variable is private static data:
   each function has its own <varname>SD</varname>.
   The global dictionary <varname>GD</varname> is public data,
   available to all Python functions within a session.  Use with
   care.<indexterm><primary>global data</><secondary>in
   PL/Python</></indexterm>
  </para>

  <para>
   Each function gets its own execution environment in the
   Python interpreter, so that global data and function arguments from
   <function>myfunc</function> are not available to
   <function>myfunc2</function>.  The exception is the data in the
   <varname>GD</varname> dictionary, as mentioned above.
  </para>
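  <para>
   The visibility rules for <varname>SD</varname> and
   <varname>GD</varname> can be modeled in plain Python.  In
   <application>PL/Python</application> both dictionaries are simply present
   as globals; in this sketch they are bound as default arguments instead,
   so that it runs outside the server, and <function>remember</function> and
   <function>recall</function> are hypothetical function names.
<programlisting>
# Hypothetical model: one GD per session, one SD per function
GD = {}
SD_remember = {}
SD_recall = {}


def remember(value, SD=SD_remember, GD=GD):
    SD["last"] = value      # seen only by later calls of remember
    GD["shared"] = value    # seen by every function in the session
    return value


def recall(SD=SD_recall, GD=GD):
    # recall has its own, empty SD; only GD carries remember's data
    return SD.get("last"), GD.get("shared")


remember(42)
print(recall())  # (None, 42)
</programlisting>
  </para>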

 </sect1>

 <sect1 id="plpython-trigger">
  <title>Trigger Functions</title>

  <indexterm zone="plpython-trigger">
   <primary>trigger</primary>
   <secondary>in PL/Python</secondary>
  </indexterm>

  <para>
   When a function is used as a trigger, the dictionary
   <literal>TD</literal> contains trigger-related values.  The trigger
   rows are in <literal>TD["new"]</> and/or <literal>TD["old"]</>
   depending on the trigger event.  <literal>TD["event"]</> contains
   the event as a string (<literal>INSERT</>, <literal>UPDATE</>,
   <literal>DELETE</>, or <literal>UNKNOWN</>).
   <literal>TD["when"]</> contains one of <literal>BEFORE</>,
   <literal>AFTER</>, or <literal>UNKNOWN</>.
   <literal>TD["level"]</> contains one of <literal>ROW</>,
   <literal>STATEMENT</>, or <literal>UNKNOWN</>.
   <literal>TD["name"]</> contains the trigger name,
   <literal>TD["table_name"]</> contains the name of the table on which the trigger occurred,
   <literal>TD["table_schema"]</> contains the schema of the table on which the trigger occurred,
   and <literal>TD["relid"]</> contains the OID of the table on
   which the trigger occurred.  If the <command>CREATE TRIGGER</> command
   included arguments, they are available in <literal>TD["args"][0]</> to
   <literal>TD["args"][(<replaceable>n</>-1)]</>.
  </para>

  <para>
   If <literal>TD["when"]</literal> is <literal>BEFORE</>, you may
   return <literal>None</literal> or <literal>"OK"</literal> from the
   Python function to indicate that the row is unmodified,
   <literal>"SKIP"</> to abort the event, or <literal>"MODIFY"</> to
   indicate that you have modified the row in <literal>TD["new"]</>.
  </para>
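  <para>
   The return-value convention can be illustrated with the body of a
   hypothetical <literal>BEFORE INSERT</literal> row trigger on the
   <literal>employee</literal> table, written as a plain Python function that
   receives <literal>TD</literal> as a parameter (in the server,
   <literal>TD</literal> is a global) so that it can be tried with a
   hand-built dictionary.  In the server, the same statements without the
   <literal>def</literal> line would form the body of a function declared
   with <literal>RETURNS trigger</literal> in <literal>LANGUAGE
   plpythonu</literal>.  Trimming names and skipping rows without a salary
   are arbitrary rules chosen for the example.
<programlisting>
def before_insert_trigger(TD):
    """Body of a hypothetical BEFORE INSERT row trigger on employee."""
    if TD["when"] != "BEFORE" or TD["level"] != "ROW" or TD["event"] != "INSERT":
        return None            # not the case handled here: row unmodified
    new = TD["new"]
    if new["salary"] is None:
        return "SKIP"          # abort the event: the row is not inserted
    if new["name"] is not None:
        new["name"] = new["name"].strip()
        return "MODIFY"        # the row in TD["new"] has been changed
    return "OK"                # row unmodified


TD = {"when": "BEFORE", "level": "ROW", "event": "INSERT",
      "new": {"name": "  Ada ", "salary": 5000, "age": 36}}
print(before_insert_trigger(TD))  # MODIFY
print(TD["new"]["name"])          # Ada
</programlisting>
  </para>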

 </sect1>

 <sect1 id="plpython-database">
  <title>Database Access</title>

  <para>
   The PL/Python language module automatically imports a Python module
   called <literal>plpy</literal>.  The functions and constants in
   this module are available to you in the Python code as
   <literal>plpy.<replaceable>foo</replaceable></literal>.  At present
   <literal>plpy</literal> implements the functions
   <literal>plpy.debug(<replaceable>msg</>)</literal>,
   <literal>plpy.log(<replaceable>msg</>)</literal>,
   <literal>plpy.info(<replaceable>msg</>)</literal>,
   <literal>plpy.notice(<replaceable>msg</>)</literal>,
   <literal>plpy.warning(<replaceable>msg</>)</literal>,
   <literal>plpy.error(<replaceable>msg</>)</literal>, and
   <literal>plpy.fatal(<replaceable>msg</>)</literal>.<indexterm><primary>elog</><secondary>in PL/Python</></indexterm>
   <function>plpy.error</function> and
   <function>plpy.fatal</function> actually raise a Python exception
   which, if uncaught, propagates out to the calling query, causing
   the current transaction or subtransaction to be aborted.
   <literal>raise plpy.ERROR(<replaceable>msg</>)</literal> and
   <literal>raise plpy.FATAL(<replaceable>msg</>)</literal> are
   equivalent to calling
   <function>plpy.error</function> and
   <function>plpy.fatal</function>, respectively.
   The other functions only generate messages of different
   priority levels.
   Whether messages of a particular priority are reported to the client,
   written to the server log, or both is controlled by the
   <xref linkend="guc-log-min-messages"> and
   <xref linkend="guc-client-min-messages"> configuration
   variables.  See <xref linkend="runtime-config"> for more information.
  </para>

  <para>
   Additionally, the <literal>plpy</literal> module provides two
   functions called <function>execute</function> and
   <function>prepare</function>.  Calling
   <function>plpy.execute</function> with a query string and an
   optional limit argument causes that query to be run and the result
   to be returned in a result object.  The result object emulates a
   list or dictionary object and can be accessed by row number and
   column name.  It has these additional methods:
   <function>nrows</function>, which returns the number of rows
   returned by the query, and <function>status</function>, which returns the
   <function>SPI_execute()</function> return value.  The result object
   can be modified.
  </para>

  <para>
   For example,
<programlisting>
rv = plpy.execute("SELECT * FROM my_table", 5)
</programlisting>
   returns up to 5 rows from <literal>my_table</literal>.  If
   <literal>my_table</literal> has a column
   <literal>my_column</literal>, it would be accessed as
<programlisting>
foo = rv[i]["my_column"]
</programlisting>
  </para>
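  <para>
   A function body that reads query results might look like the following
   sketch, which computes the average salary from the
   <literal>employee</literal> table defined earlier.  It is a hypothetical
   example: <literal>plpy</literal> is passed in as a parameter only so that
   the code can run outside the server, and <literal>FakeResult</literal> and
   <literal>FakePlpy</literal> are minimal stand-ins that emulate just the
   behavior described above (access by row number and column name, plus
   <function>nrows</function>); they are not part of
   <application>PL/Python</application>.  Inside a real PL/Python function,
   <literal>plpy</literal> is available without any parameter.
<programlisting>
def average_salary(plpy):
    """Body of a hypothetical PL/Python function; plpy is a parameter here
    only so that a stand-in can be supplied outside the server."""
    rv = plpy.execute("SELECT salary FROM employee")
    salaries = [rv[i]["salary"] for i in range(rv.nrows())
                if rv[i]["salary"] is not None]   # skip SQL nulls
    if not salaries:
        return None                               # no data: SQL null
    return sum(salaries) / float(len(salaries))


class FakeResult(list):
    """Stand-in result object: a list of row dictionaries with nrows()."""
    def nrows(self):
        return len(self)


class FakePlpy(object):
    """Stand-in for plpy that serves fixed rows and records queries."""
    def __init__(self, rows):
        self.rows = rows
        self.queries = []

    def execute(self, query, limit=0):
        self.queries.append(query)                # limit is ignored here
        return FakeResult(self.rows)


print(average_salary(FakePlpy([{"salary": 100}, {"salary": None},
                               {"salary": 300}])))  # 200.0
</programlisting>
  </para>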

  <para>
   <indexterm><primary>preparing a query</><secondary>in PL/Python</></indexterm>
   The second function, <function>plpy.prepare</function>, prepares
   the execution plan for a query.  It is called with a query string
   and a list of parameter types, if you have parameter references in
   the query.  For example:
<programlisting>
plan = plpy.prepare("SELECT last_name FROM my_users WHERE first_name = $1", [ "text" ])
</programlisting>
   <literal>text</literal> is the type of the variable you will be
   passing for <literal>$1</literal>.  After preparing a statement, you
   use the function <function>plpy.execute</function> to run it:
<programlisting>
rv = plpy.execute(plan, [ "name" ], 5)
</programlisting>
   The third argument is the limit and is optional.
  </para>

  <para>
   When you prepare a plan using the PL/Python module it is
   automatically saved.  Read the SPI documentation (<xref
   linkend="spi">) for a description of what this means.
   In order to make effective use of this across function calls
   one needs to use one of the persistent storage dictionaries
   <literal>SD</literal> or <literal>GD</literal> (see
   <xref linkend="plpython-funcs">).  For example:
<programlisting>
CREATE FUNCTION usesavedplan() RETURNS trigger AS $$
  if "plan" in SD:
    plan = SD["plan"]
  else:
    plan = plpy.prepare("SELECT 1")
    SD["plan"] = plan
  # rest of function
$$ LANGUAGE plpythonu;
</programlisting>
  </para>
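  <para>
   Putting <function>plpy.prepare</function>,
   <function>plpy.execute</function>, and <literal>SD</literal> together,
   this sketch prepares the <literal>my_users</literal> query from the
   earlier example once and reuses the saved plan on later calls.  It is a
   hypothetical plain-Python model: <literal>plpy</literal> and
   <literal>SD</literal> are parameters only so that it runs outside the
   server, and <literal>FakePlpy</literal> (which counts calls to
   <function>prepare</function>) and <literal>FakeResult</literal> are
   stand-ins, not part of <application>PL/Python</application>.
<programlisting>
def last_name_for(first_name, plpy, SD):
    """Body of a hypothetical function; plpy and SD are parameters only so
    that the sketch runs outside the server."""
    if "plan" in SD:
        plan = SD["plan"]                      # reuse the saved plan
    else:
        plan = plpy.prepare(
            "SELECT last_name FROM my_users WHERE first_name = $1", ["text"])
        SD["plan"] = plan                      # keep it for later calls
    rv = plpy.execute(plan, [first_name], 1)   # at most one row
    if rv.nrows() == 0:
        return None
    return rv[0]["last_name"]


class FakeResult(list):
    def nrows(self):
        return len(self)


class FakePlpy(object):
    """Stand-in for plpy that counts prepare() calls."""
    def __init__(self, users):
        self.users = users                     # first_name -> last_name
        self.prepare_calls = 0

    def prepare(self, query, argtypes):
        self.prepare_calls += 1
        return (query, argtypes)               # opaque plan object

    def execute(self, plan, params, limit=0):
        last = self.users.get(params[0])
        return FakeResult([] if last is None else [{"last_name": last}])


plpy = FakePlpy({"Ada": "Lovelace"})
SD = {}
print(last_name_for("Ada", plpy, SD))   # Lovelace
print(last_name_for("Alan", plpy, SD))  # None
print(plpy.prepare_calls)               # 1
</programlisting>
  </para>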

 </sect1>

<![IGNORE[
 <!-- NOT CURRENTLY SUPPORTED -->

 <sect1 id="plpython-trusted">
  <title>Restricted Environment</title>

  <para>
   The current version of <application>PL/Python</application>
   functions as a trusted language only; access to the file system and
   other local resources is disabled.  Specifically,
   <application>PL/Python</application> uses the Python restricted
   execution environment, further restricts it to prevent the use of
   the file <function>open</> call, and allows only modules from a
   specific list to be imported.  Presently, that list includes:
   <literal>array</>, <literal>bisect</>, <literal>binascii</>,
   <literal>calendar</>, <literal>cmath</>, <literal>codecs</>,
   <literal>errno</>, <literal>marshal</>, <literal>math</>, <literal>md5</>,
   <literal>mpz</>, <literal>operator</>, <literal>pcre</>,
   <literal>pickle</>, <literal>random</>, <literal>re</>, <literal>regex</>,
   <literal>sre</>, <literal>sha</>, <literal>string</>, <literal>StringIO</>,
   <literal>struct</>, <literal>time</>, <literal>whrandom</>, and
   <literal>zlib</>.
  </para>
 </sect1>

]]>

</chapter>