Mirror of https://github.com/postgres/postgres.git, synced 2025-08-05 07:41:25 +03:00
Update README with proposed new method for determining calling convention
of user-defined functions (forget 'C' vs 'newC', instead require an info function to be present for new-style functions). Also update some other out-of-date commentary.
@@ -1,4 +1,4 @@
-Proposal for function-manager redesign 24-May-2000
+Proposal for function-manager redesign 19-Nov-2000
 --------------------------------------
 
 We know that the existing mechanism for calling Postgres functions needs
@@ -24,10 +24,6 @@ can be done on an incremental file-by-file basis --- we won't need a
 written in the old style can be left in place indefinitely, to provide
 backward compatibility for user-written C functions.
 
-Note that neither the old function manager nor the redesign are intended
-to handle functions that accept or return sets. Those sorts of functions
-need to be handled by special querytree structures.
-
 
 Changes in pg_proc (system data about a function)
 -------------------------------------------------
@@ -37,7 +33,8 @@ This is a boolean value which will be TRUE if the function is "strict",
 that is it always returns NULL when any of its inputs are NULL. The
 function manager will check this field and skip calling the function when
 it's TRUE and there are NULL inputs. This allows us to remove explicit
-NULL-value tests from many functions that currently need them. A function
+NULL-value tests from many functions that currently need them (not to
+mention fixing many more that need them but don't have them). A function
 that is not marked "strict" is responsible for checking whether its inputs
 are NULL or not. Most builtin functions will be marked "strict".
 
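The strictness rule in this hunk can be pictured as a small dispatcher. This is a minimal sketch, not the backend's code. `Datum`, `SketchFmgrInfo`, `SketchCallInfo`, `sketch_call` and `sketch_int_add` are simplified, hypothetical stand-ins. The dispatcher tests `fn_strict` once, before the call, so the body of a strict function never has to look at `argnull[]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for Datum, FmgrInfo and the
 * call-info struct; the real definitions in fmgr.h carry more fields. */
typedef uintptr_t Datum;
#define SKETCH_MAX_ARGS 8

typedef struct SketchCallInfo SketchCallInfo;
typedef Datum (*SketchFunction)(SketchCallInfo *fcinfo);

typedef struct
{
    SketchFunction fn_addr;    /* function to be called */
    short          fn_nargs;   /* number of arguments expected */
    bool           fn_strict;  /* "strict": NULL in => NULL out */
} SketchFmgrInfo;

struct SketchCallInfo
{
    const SketchFmgrInfo *flinfo;
    bool  isnull;                     /* true if the result is NULL */
    short nargs;
    Datum arg[SKETCH_MAX_ARGS];
    bool  argnull[SKETCH_MAX_ARGS];
};

/* Counts how often the underlying function body actually runs. */
int sketch_calls = 0;

/* A strict example function: it never inspects argnull[], because the
 * dispatcher guarantees that it only sees non-NULL inputs. */
Datum sketch_int_add(SketchCallInfo *fcinfo)
{
    sketch_calls++;
    return fcinfo->arg[0] + fcinfo->arg[1];
}

/* Dispatch a call, skipping a strict function if any input is NULL. */
Datum sketch_call(SketchCallInfo *fcinfo)
{
    const SketchFmgrInfo *flinfo = fcinfo->flinfo;

    assert(fcinfo->nargs >= 0 && fcinfo->nargs <= SKETCH_MAX_ARGS);
    if (flinfo->fn_strict)
    {
        for (int i = 0; i < fcinfo->nargs; i++)
        {
            if (fcinfo->argnull[i])
            {
                fcinfo->isnull = true;   /* NULL result, function not called */
                return (Datum) 0;
            }
        }
    }
    fcinfo->isnull = false;              /* callee may set it to true */
    return flinfo->fn_addr(fcinfo);
}
```

With `argnull[1]` set, `sketch_call` reports a NULL result and `sketch_int_add` is never entered. A non-strict function would instead be called and would have to inspect `argnull[]` itself.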
@@ -67,7 +64,9 @@ typedef struct
     Oid         fn_oid;     /* OID of function (NOT of handler, if any) */
     short       fn_nargs;   /* 0..FUNC_MAX_ARGS, or -1 if variable arg count */
     bool        fn_strict;  /* function is "strict" (NULL in => NULL out) */
+    bool        fn_retset;  /* function returns a set (over multiple calls) */
     void       *fn_extra;   /* extra space for use by handler */
+    MemoryContext fn_mcxt;  /* memory context to store fn_extra in */
 } FmgrInfo;
 
 For an ordinary built-in function, fn_addr is just the address of the C
@@ -79,8 +78,9 @@ to denote a not-yet-initialized FmgrInfo struct. fn_extra will always
 be NULL when an FmgrInfo is first filled by the function lookup code, but
 a function handler could set it to avoid making repeated lookups of its
 own when the same FmgrInfo is used repeatedly during a query.) fn_nargs
-is the number of arguments expected by the function, and fn_strict is
-its strictness flag.
+is the number of arguments expected by the function, fn_strict is its
+strictness flag, and fn_retset shows whether it returns a set; all of
+these values come from the function's pg_proc entry.
 
 FmgrInfo already exists in the current code, but has fewer fields. This
 change should be transparent at the source-code level.
@@ -109,15 +109,17 @@ context is NULL for an "ordinary" function call, but may point to additional
 info when the function is called in certain contexts. (For example, the
 trigger manager will pass information about the current trigger event here.)
 If context is used, it should point to some subtype of Node; the particular
-kind of context can then be indicated by the node type field. (A callee
-should always check the node type before assuming it knows what kind of
-context is being passed.) fmgr itself puts no other restrictions on the use
-of this field.
+kind of context is indicated by the node type field. (A callee should
+always check the node type before assuming it knows what kind of context is
+being passed.) fmgr itself puts no other restrictions on the use of this
+field.
 
 resultinfo is NULL when calling any function from which a simple Datum
 result is expected. It may point to some subtype of Node if the function
-returns more than a Datum. Like the context field, resultinfo is a hook
-for expansion; fmgr itself doesn't constrain the use of the field.
+returns more than a Datum. (For example, resultinfo is used when calling a
+function that returns a set, as discussed below.) Like the context field,
+resultinfo is a hook for expansion; fmgr itself doesn't constrain the use
+of the field.
 
 nargs, arg[], and argnull[] hold the arguments being passed to the function.
 Notice that all the arguments passed to a function (as well as its result
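The following is a hedged sketch of the "check the node type first" rule for the context field. `SketchNode`, `SketchTriggerData`, `SKETCH_IS_A` and `sketch_trigger_event` are hypothetical simplifications; the backend has its own Node types and an `IsA()` macro. The only point illustrated is that every node subtype begins with a tag, and the callee tests that tag before casting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node tags (not the backend's NodeTag values). */
typedef enum { T_SketchTriggerData = 1, T_SketchReturnSetInfo = 2 } SketchNodeTag;

/* Every node subtype starts with the tag, so a node pointer can be
 * inspected before it is cast to the concrete type. */
typedef struct { SketchNodeTag type; } SketchNode;

typedef struct
{
    SketchNodeTag type;        /* always T_SketchTriggerData */
    const char   *event_name;  /* hypothetical payload */
} SketchTriggerData;

#define SKETCH_IS_A(nodeptr, tag) \
    ((nodeptr) != NULL && ((const SketchNode *) (nodeptr))->type == (tag))

/* Returns the trigger event name if the function was called by the
 * trigger manager, or NULL for an ordinary call or any other context. */
const char *sketch_trigger_event(SketchNode *context)
{
    if (!SKETCH_IS_A(context, T_SketchTriggerData))
        return NULL;            /* never assume the kind of context */
    return ((SketchTriggerData *) context)->event_name;
}
```

The same pattern applies to resultinfo. A function that needs a particular result node checks its tag first, and treats NULL or an unexpected tag as "not called in that way".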
@@ -257,27 +259,15 @@ types. Modules or header files that define specialized SQL datatypes
 (eg, timestamp) should define appropriate macros for those types, so that
 functions manipulating the types can be coded in the standard style.
 
-For non-primitive data types (particularly variable-length types) it
-probably won't be very practical to hide the pass-by-reference nature of
-the data type, so the PG_GETARG and PG_RETURN macros for those types
-probably won't do more than DatumGetPointer/PointerGetDatum plus the
-appropriate typecast. Functions returning such types will need to
-palloc() their result space explicitly. I recommend naming the GETARG
-and RETURN macros for such types to end in "_P", as a reminder that they
+For non-primitive data types (particularly variable-length types) it won't
+be very practical to hide the pass-by-reference nature of the data type,
+so the PG_GETARG and PG_RETURN macros for those types won't do much more
+than DatumGetPointer/PointerGetDatum plus the appropriate typecast (but see
+TOAST discussion, below). Functions returning such types will need to
+palloc() their result space explicitly. I recommend naming the GETARG and
+RETURN macros for such types to end in "_P", as a reminder that they
 produce or take a pointer. For example, PG_GETARG_TEXT_P yields "text *".
 
-For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
-data value. There might be a few cases where the still-toasted value is
-wanted, but I am having a hard time coming up with examples. For the
-moment I'd say that any such code could use a lower-level macro that is
-just ((struct varlena *) DatumGetPointer(fcinfo->arg[n])).
-
-Note: the above examples assume that arguments will be counted starting at
-zero. We could have the ARG macros subtract one from the argument number,
-so that arguments are counted starting at one. I'm not sure if that would be
-more or less confusing. Does anyone have a strong feeling either way about
-it?
-
 When a function needs to access fcinfo->flinfo or one of the other auxiliary
 fields of FunctionCallInfo, it should just do it. I doubt that providing
 syntactic-sugar macros for these cases is useful.
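For a pass-by-reference type, a `_P` macro is little more than a cast wrapped around DatumGetPointer/PointerGetDatum. The sketch below makes several assumptions. It uses a simplified text layout and hypothetical names (`SketchText`, `SKETCH_GETARG_TEXT_P`, `SKETCH_RETURN_TEXT_P`, `sketch_text_upper`). `malloc` stands in for `palloc`, and TOAST is ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Datum;

/* Hypothetical stand-ins for DatumGetPointer/PointerGetDatum. */
#define SketchDatumGetPointer(X) ((void *) (X))
#define SketchPointerGetDatum(X) ((Datum) (X))

/* Simplified variable-length "text": a length word followed by bytes. */
typedef struct
{
    int  len;        /* number of bytes in data[] */
    char data[];     /* flexible array member */
} SketchText;

typedef struct
{
    int   nargs;
    Datum arg[8];
} SketchCallInfo;

/* The _P suffix reminds the caller that these yield/take a pointer. */
#define SKETCH_GETARG_TEXT_P(n) ((SketchText *) SketchDatumGetPointer(fcinfo->arg[n]))
#define SKETCH_RETURN_TEXT_P(x) return SketchPointerGetDatum(x)

/* Upper-cases its argument into freshly allocated result space;
 * the input is never modified. */
Datum sketch_text_upper(SketchCallInfo *fcinfo)
{
    SketchText *in = SKETCH_GETARG_TEXT_P(0);
    SketchText *out = malloc(sizeof(SketchText) + (size_t) in->len);

    assert(out != NULL);
    out->len = in->len;
    for (int i = 0; i < in->len; i++)
        out->data[i] = (char) toupper((unsigned char) in->data[i]);
    SKETCH_RETURN_TEXT_P(out);
}
```

The result is a new allocation rather than the input pointer. That matches the text's point that functions returning such types must allocate their result space explicitly.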
@@ -319,10 +309,6 @@ that this style of coding cannot pass a NULL input value nor cope with
 a NULL result (it couldn't before, either!). We can make the helper
 routines elog an error if they see that the function returns a NULL.
 
-(Note: direct calls like this will have to be changed at the same time
-that their called routines are changed to the new style. But that will
-still be a lot less of a constraint than a "big bang" conversion.)
-
 When invoking a function that has a known argument signature, we have
 usually written either
         result = fmgr(targetfuncOid, ... args ... );
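A direct-call helper of the kind this hunk mentions might look like the following. `sketch_direct_call2` and the two callees are hypothetical. The sketch runs outside the backend, so where a real helper would elog an error on a NULL result, this one returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;

typedef struct
{
    bool  isnull;     /* set by the callee to signal a NULL result */
    short nargs;
    Datum arg[8];
    bool  argnull[8];
} SketchCallInfo;

typedef Datum (*SketchFunction)(SketchCallInfo *fcinfo);

/* Hypothetical helper for calling a function with two known, non-NULL
 * arguments. This calling style has no way to hand back a NULL result,
 * so the helper treats one as an error; a backend helper would
 * elog(ERROR) here instead of returning false. */
bool sketch_direct_call2(SketchFunction fn, Datum a1, Datum a2, Datum *result)
{
    SketchCallInfo fcinfo = { false, 2, { a1, a2 }, { false, false } };
    Datum r = fn(&fcinfo);

    if (fcinfo.isnull)
        return false;     /* "function returned NULL" */
    *result = r;
    return true;
}

/* Two example callees. */
Datum sketch_int_mul(SketchCallInfo *fcinfo)
{
    return fcinfo->arg[0] * fcinfo->arg[1];
}

Datum sketch_always_null(SketchCallInfo *fcinfo)
{
    fcinfo->isnull = true;
    return 0;
}
```

The argument NULL flags are always false here. That mirrors the limitation stated in the text: this style of call cannot pass a NULL input.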
@@ -349,6 +335,68 @@ have to change in the first step of implementation, but they can
 continue to support the same external appearance.
 
 
+Support for TOAST-able data types
+---------------------------------
+
+For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
+data value. There might be a few cases where the still-toasted value is
+wanted, but the vast majority of cases want the de-toasted result, so
+that will be the default. To get the argument value without causing
+de-toasting, use PG_GETARG_RAW_VARLENA_P(n).
+
+Some functions require a modifiable copy of their input values. In these
+cases, it's silly to do an extra copy step if we copied the data anyway
+to de-TOAST it. Therefore, each toastable datatype has an additional
+fetch macro, for example PG_GETARG_TEXT_P_COPY(n), which delivers a
+guaranteed-fresh copy, combining this with the detoasting step if possible.
+
+There is also a PG_FREE_IF_COPY(ptr,n) macro, which pfree's the given
+pointer if and only if it is different from the original value of the n'th
+argument. This can be used to free the de-toasted value of the n'th
+argument, if it was actually de-toasted. Currently, doing this is not
+necessary for the majority of functions because the core backend code
+releases temporary space periodically, so that memory leaked in function
+execution isn't a big problem. However, as of 7.1 memory leaks in
+functions that are called by index searches will not be cleaned up until
+end of transaction. Therefore, functions that are listed in pg_amop or
+pg_amproc should be careful not to leak detoasted copies, and so these
+functions do need to use PG_FREE_IF_COPY() for toastable inputs.
+
+A function should never try to re-TOAST its result value; it should just
+deliver an untoasted result that's been palloc'd in the current memory
+context. When and if the value is actually stored into a tuple, the
+tuple toaster will decide whether toasting is needed.
+
+
+Functions accepting or returning sets
+-------------------------------------
+
+As of 7.1, Postgres has limited support for functions returning sets;
+this is presently handled only in SELECT output expressions, and the
+behavior is to generate a separate output tuple for each set element.
+There is no direct support for functions accepting sets; instead, the
+function will be called multiple times, once for each element of the
+input set. This behavior will very likely be changed in future releases,
+but here is how it works now:
+
+If a function is marked in pg_proc as returning a set, then it is called
+with fcinfo->resultinfo pointing to a node of type ReturnSetInfo. A
+function that desires to return a set should raise an error "called in
+context that does not accept a set result" if resultinfo is NULL or does
+not point to a ReturnSetInfo node. ReturnSetInfo contains a single field
+"isDone", which should be set to one of these values:
+
+    ExprSingleResult      /* expression does not return a set */
+    ExprMultipleResult    /* this result is an element of a set */
+    ExprEndResult         /* there are no more elements in the set */
+
+A function returning set returns one set element per call, setting
+fcinfo->resultinfo->isDone to ExprMultipleResult for each element.
+After all elements have been returned, the next call should set
+isDone to ExprEndResult and return a null result. (Note it is possible
+to return an empty set by doing this on the first call.)
+
+
 Notes about function handlers
 -----------------------------
 
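The copy-freeing rule from the TOAST section can be sketched as follows, with heavy simplification. A real TOASTed value is compressed or stored out of line. Here a boolean flag in a hypothetical `SketchVarlena` merely marks a value that needs a de-toasted copy, and `sketch_free` counts frees so the behavior can be observed. The other names (`sketch_detoast`, `SKETCH_GETARG_VARLENA_P`, `SKETCH_FREE_IF_COPY`, `sketch_varlena_len`) are also made up. The essential point is that the free-if-copy macro compares against the original argument pointer, so it never frees the caller's datum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Datum;

/* Hypothetical variable-length value. Real TOAST data is compressed or
 * stored out of line; here a flag just marks "needs de-toasting". */
typedef struct
{
    bool toasted;
    int  len;
    char data[];
} SketchVarlena;

typedef struct
{
    short nargs;
    Datum arg[8];
} SketchCallInfo;

/* Returns v itself if it is plain, else a freshly allocated plain copy
 * (malloc stands in for palloc). */
SketchVarlena *sketch_detoast(SketchVarlena *v)
{
    SketchVarlena *copy;

    if (!v->toasted)
        return v;
    copy = malloc(sizeof(SketchVarlena) + (size_t) v->len);
    assert(copy != NULL);
    copy->toasted = false;
    copy->len = v->len;
    memcpy(copy->data, v->data, (size_t) v->len);
    return copy;
}

int sketch_frees = 0;

/* Counting wrapper around free(), so the effect is observable. */
void sketch_free(void *p)
{
    sketch_frees++;
    free(p);
}

#define SKETCH_GETARG_VARLENA_P(n) \
    sketch_detoast((SketchVarlena *) fcinfo->arg[n])

/* Free ptr if and only if it differs from the original n'th argument. */
#define SKETCH_FREE_IF_COPY(ptr, n) \
    do { \
        if ((void *) (ptr) != (void *) fcinfo->arg[n]) \
            sketch_free(ptr); \
    } while (0)

/* A function that must not leak (say, an index support function):
 * returns the length of its argument. */
Datum sketch_varlena_len(SketchCallInfo *fcinfo)
{
    SketchVarlena *v = SKETCH_GETARG_VARLENA_P(0);
    int len = v->len;

    SKETCH_FREE_IF_COPY(v, 0);
    return (Datum) len;
}
```

A plain argument passes through without a copy and without a free. A "toasted" argument is copied once and that copy is released before returning.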
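Here is a minimal sketch of the ReturnSetInfo protocol from the sets section, using hypothetical simplified types. The enum values mirror the names listed in the text. Two parts are this sketch's own choices: keeping the iteration state in `fn_extra`, and using `sketch_last_error` in place of the elog the text calls for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t Datum;

/* Hypothetical, simplified node machinery. */
typedef enum { T_SketchReturnSetInfo = 1, T_SketchOther = 2 } SketchNodeTag;
typedef struct { SketchNodeTag type; } SketchNode;

typedef enum
{
    ExprSingleResult,      /* expression does not return a set */
    ExprMultipleResult,    /* this result is an element of a set */
    ExprEndResult          /* there are no more elements in the set */
} SketchExprDoneCond;

typedef struct
{
    SketchNodeTag      type;    /* T_SketchReturnSetInfo */
    SketchExprDoneCond isDone;  /* set by the function on every call */
} SketchReturnSetInfo;

typedef struct { void *fn_extra; } SketchFmgrInfo;

typedef struct
{
    SketchFmgrInfo *flinfo;
    SketchNode     *resultinfo;
    bool            isnull;
    Datum           arg[8];
} SketchCallInfo;

const char *sketch_last_error = NULL;

/* Hypothetical set-returning function: yields 1, 2, ..., arg[0],
 * one element per call. */
Datum sketch_series(SketchCallInfo *fcinfo)
{
    SketchNode *ri = fcinfo->resultinfo;
    SketchReturnSetInfo *rsi;
    int *next;

    if (ri == NULL || ri->type != T_SketchReturnSetInfo)
    {
        /* the backend would elog(ERROR, ...) instead */
        sketch_last_error = "called in context that does not accept a set result";
        fcinfo->isnull = true;
        return (Datum) 0;
    }
    rsi = (SketchReturnSetInfo *) ri;

    next = fcinfo->flinfo->fn_extra;      /* state kept across calls */
    if (next == NULL)
    {
        next = malloc(sizeof(int));        /* backend: allocate in fn_mcxt */
        assert(next != NULL);
        *next = 1;
        fcinfo->flinfo->fn_extra = next;
    }

    if ((Datum) *next > fcinfo->arg[0])
    {
        free(next);                        /* set exhausted: drop state */
        fcinfo->flinfo->fn_extra = NULL;
        rsi->isDone = ExprEndResult;
        fcinfo->isnull = true;             /* and return a null result */
        return (Datum) 0;
    }

    rsi->isDone = ExprMultipleResult;
    fcinfo->isnull = false;
    return (Datum) (*next)++;
}
```

With `arg[0] == 0`, the first call already sets ExprEndResult, which is the empty-set case noted in the text. The caller keeps calling until it sees ExprEndResult.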
@@ -361,49 +409,91 @@ function is invoked many times. (fn_extra can only be used as a hint,
 since callers are not required to re-use an FmgrInfo struct.
 But in performance-critical paths they normally will do so.)
 
-Issue: in what context should a handler allocate memory that it intends
-to use for fn_extra data? The current palloc context when the handler
-is actually called might be considerably shorter-lived than the FmgrInfo
-struct, which would lead to dangling-pointer problems at the next use
-of the FmgrInfo. Perhaps FmgrInfo should also store a memory context
-identifier that the handler could use to allocate space of the right
-lifespan. (Having fmgr_info initialize this to CurrentMemoryContext
-should work in nearly all cases, though a few places might have to
-set it differently.) At the moment I have not done this, since the
-existing PL handlers only need to set fn_extra to point at long-lived
-structures (data in their own caches) and don't really care which
-context the FmgrInfo is in anyway.
+If the handler wants to allocate memory to hold fn_extra data, it should
+NOT do so in CurrentMemoryContext, since the current context may well be
+much shorter-lived than the context where the FmgrInfo is. Instead,
+allocate the memory in context flinfo->fn_mcxt, or in a long-lived cache
+context. fn_mcxt normally points at the context that was
+CurrentMemoryContext at the time the FmgrInfo structure was created;
+in any case it is required to be a context at least as long-lived as the
+FmgrInfo itself.
 
-Are there any other things needed by the call handlers for PL/pgsql and
-other languages?
-
-During the conversion process, support for old-style builtin functions
-and old-style user-written C functions will be provided by appropriate
-function handlers. For example, the handler for old-style builtins
-looks roughly like fmgr_c() used to.
-
 
-System table updates
---------------------
+Telling the difference between old- and new-style functions
+-----------------------------------------------------------
 
-In the initial phase, two new entries will be added to pg_language
-for language types "newinternal" and "newC", corresponding to
-builtin and dynamically-loaded functions having the new calling
-convention.
+During the conversion process, we carried two different pg_language
+entries, "internal" and "newinternal", for internal functions. The
+function manager used the language code to distinguish which calling
+convention to use. (Old-style internal functions were supported via
+a function handler.) As of Nov. 2000, no old-style internal functions
+remain, so we can drop support for them. We will remove the old "internal"
+pg_language entry and rename "newinternal" to "internal".
 
-There will also be a change to pg_proc to add the new "proisstrict"
-column.
+The interim solution for dynamically-loaded compiled functions has been
+similar: two pg_language entries "C" and "newC". This naming convention
+is not desirable for the long run, and yet we cannot stop supporting
+old-style user functions. Instead, it seems better to use just one
+pg_language entry "C", and require the dynamically-loaded library to
+provide additional information that identifies new-style functions.
+This avoids compatibility problems --- for example, existing dump
+scripts will identify PL language handlers as being in language "C",
+which would be wrong under the "newC" convention. Also, this approach
+should generalize more conveniently for future extensions to the function
+interface specification.
 
-Then pg_proc entries will be changed from language code "internal" to
-"newinternal" piecemeal, as the associated routines are rewritten.
-(This will imply several rounds of forced initdbs as the contents of
-pg_proc change, but I think we can live with that.)
+Given a dynamically loaded function named "foo" (note that the name being
+considered here is the link-symbol name, not the SQL-level function name),
+the function manager will look for another function in the same dynamically
+loaded library named "pg_finfo_foo". If this second function does not
+exist, then foo is assumed to be called old-style, thus ensuring backwards
+compatibility with existing libraries. If the info function does exist,
+it is expected to have the signature
 
-The old language names "internal" and "C" will continue to refer to
-functions with the old calling convention. We should deprecate
-old-style functions because of their portability problems, but the
-support for them will only be one small function handler routine,
-so we can leave them in place for as long as necessary.
+    Pg_finfo_record * pg_finfo_foo (void);
 
-The expected calling convention for PL call handlers will need to change
-all-at-once, but fortunately there are not very many of them to fix.
+The info function will be called by the fmgr, and must return a pointer
+to a Pg_finfo_record struct. (The returned struct will typically be a
+statically allocated constant in the dynamic-link library.) The current
+definition of the struct is just
+
+    typedef struct {
+        int api_version;
+    } Pg_finfo_record;
+
+where api_version is 0 to indicate old-style or 1 to indicate new-style
+calling convention. In future releases, additional fields may be defined
+after api_version, but these additional fields will only be used if
+api_version is greater than 1.
+
+These details will be hidden from the author of a dynamically loaded
+function by using a macro. To define a new-style dynamically loaded
+function named foo, write
+
+    PG_FUNCTION_INFO_V1(foo);
+
+    Datum
+    foo(PG_FUNCTION_ARGS)
+    {
+        ...
+    }
+
+The function itself is written using the same conventions as for new-style
+internal functions; you just need to add the PG_FUNCTION_INFO_V1() macro.
+Note that old-style and new-style functions can be intermixed in the same
+library, depending on whether or not you write a PG_FUNCTION_INFO_V1() for
+each one.
+
+The SQL declaration for a dynamically-loaded function is CREATE FUNCTION
+foo ... LANGUAGE 'C' regardless of whether it is old- or new-style.
+
+New-style dynamic functions will be invoked directly by fmgr, and will
+therefore have the same performance as internal functions after the initial
+pg_proc lookup overhead. Old-style dynamic functions will be invoked via
+a handler, and will therefore have a small performance penalty.
+
+To allow old-style dynamic functions to work safely on toastable datatypes,
+the handler for old-style functions will automatically detoast toastable
+arguments before passing them to the old-style function. A new-style
+function is expected to take care of toasted arguments by using the
+standard argument access macros defined above.
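The fn_mcxt rule in the function-handler notes can be illustrated with a toy memory context that frees everything it owns on reset. `SketchMemoryContext`, `sketch_alloc`, `sketch_reset`, `SketchFmgrInfo` and `sketch_handler` are hypothetical. The handler puts per-call scratch space in the short-lived current context, but puts its fn_extra cache in `flinfo->fn_mcxt`. As a result, resetting the per-call context does not leave fn_extra dangling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy memory context: remembers its chunks so a reset frees them all,
 * loosely imitating the backend's memory contexts. */
typedef struct
{
    void  *chunks[16];
    size_t nchunks;
} SketchMemoryContext;

void *sketch_alloc(SketchMemoryContext *cxt, size_t size)
{
    void *p;

    assert(cxt->nchunks < 16);
    p = malloc(size);
    assert(p != NULL);
    cxt->chunks[cxt->nchunks++] = p;
    return p;
}

void sketch_reset(SketchMemoryContext *cxt)
{
    for (size_t i = 0; i < cxt->nchunks; i++)
        free(cxt->chunks[i]);
    cxt->nchunks = 0;
}

typedef struct
{
    void                *fn_extra;  /* handler's cached data, or NULL */
    SketchMemoryContext *fn_mcxt;   /* at least as long-lived as this struct */
} SketchFmgrInfo;

/* Hypothetical per-function cache a PL handler might build. */
typedef struct
{
    int lookup_id;
} SketchHandlerCache;

int sketch_catalog_lookups = 0;

/* Does the expensive lookup once per FmgrInfo. Scratch space goes in
 * the current (short-lived) context; the cache goes in fn_mcxt. */
int sketch_handler(SketchFmgrInfo *flinfo, SketchMemoryContext *current)
{
    SketchHandlerCache *cache = flinfo->fn_extra;

    (void) sketch_alloc(current, 32);   /* per-call scratch: fine to lose */
    if (cache == NULL)
    {
        cache = sketch_alloc(flinfo->fn_mcxt, sizeof(SketchHandlerCache));
        cache->lookup_id = ++sketch_catalog_lookups;
        flinfo->fn_extra = cache;
    }
    return cache->lookup_id;
}
```

If the cache were allocated in `current` instead, the second call after `sketch_reset` would read freed memory through fn_extra. That is the dangling-pointer problem the text warns about.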
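The lookup rule for dynamically loaded functions is sketched below, with a toy symbol table standing in for the dynamic loader. `SketchSymbol`, `sketch_lookup`, `sketch_calling_convention` and the `sketch_foo`/`sketch_bar`/`sketch_baz` symbols are hypothetical. Only the `pg_finfo_` prefix and the `Pg_finfo_record` layout come from the text. Reporting an unrecognized api_version as -1 is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    int api_version;
} Pg_finfo_record;

typedef Pg_finfo_record *(*SketchInfoFunc)(void);
typedef void (*SketchGenericFn)(void);

/* Toy stand-in for a loaded library's symbol table; a real fmgr would
 * ask the dynamic loader for the symbol instead. */
typedef struct
{
    const char     *name;   /* link-symbol name */
    SketchGenericFn addr;
} SketchSymbol;

SketchGenericFn sketch_lookup(const SketchSymbol *syms, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(syms[i].name, name) == 0)
            return syms[i].addr;
    return NULL;
}

/* Returns 0 for old-style, 1 for new-style, -1 if unrecognized. */
int sketch_calling_convention(const SketchSymbol *syms, size_t n, const char *funcname)
{
    char infoname[128];
    SketchGenericFn info;
    Pg_finfo_record *rec;
    int len = snprintf(infoname, sizeof infoname, "pg_finfo_%s", funcname);

    assert(len > 0 && (size_t) len < sizeof infoname);
    info = sketch_lookup(syms, n, infoname);
    if (info == NULL)
        return 0;                 /* no info function: assume old-style */

    rec = ((SketchInfoFunc) info)();
    if (rec->api_version == 0 || rec->api_version == 1)
        return rec->api_version;
    return -1;                    /* a backend would more likely raise an error */
}

/* Example library: foo is new-style, bar is old-style, and baz claims an
 * api_version this sketch does not know. */
void sketch_foo(void) {}
void sketch_bar(void) {}
void sketch_baz(void) {}

Pg_finfo_record *pg_finfo_foo(void)
{
    static Pg_finfo_record my_finfo = { 1 };
    return &my_finfo;
}

Pg_finfo_record *pg_finfo_baz(void)
{
    static Pg_finfo_record my_finfo = { 2 };
    return &my_finfo;
}

const SketchSymbol sketch_library[] = {
    { "foo", sketch_foo },
    { "bar", sketch_bar },
    { "baz", sketch_baz },
    { "pg_finfo_foo", (SketchGenericFn) pg_finfo_foo },
    { "pg_finfo_baz", (SketchGenericFn) pg_finfo_baz },
};
const size_t sketch_library_len = sizeof sketch_library / sizeof sketch_library[0];
```

Because a missing info function simply means "old-style", existing libraries keep working unchanged. That is the backward-compatibility property the text relies on.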
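This sketch shows one way a macro like PG_FUNCTION_INFO_V1 could expand. It uses a hypothetical name, `SKETCH_FUNCTION_INFO_V1`, together with a stand-in `SKETCH_FUNCTION_ARGS`, so as not to claim it is the real definition. The macro emits `pg_finfo_<name>()`, which returns a statically allocated record with api_version 1. It ends with a prototype of the function itself, so the semicolon written after the macro call completes an ordinary declaration.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Datum;

typedef struct
{
    int api_version;
} Pg_finfo_record;

/* Hypothetical stand-ins for FunctionCallInfo and PG_FUNCTION_ARGS. */
typedef struct
{
    Datum arg[8];
} SketchCallInfo;
#define SKETCH_FUNCTION_ARGS SketchCallInfo *fcinfo

/* Emits pg_finfo_<funcname>() returning a static record with
 * api_version 1, then a prototype for funcname itself, which absorbs
 * the semicolon written after the macro call. */
#define SKETCH_FUNCTION_INFO_V1(funcname) \
    Pg_finfo_record *pg_finfo_##funcname(void); \
    Pg_finfo_record *pg_finfo_##funcname(void) \
    { \
        static Pg_finfo_record my_finfo = { 1 }; \
        return &my_finfo; \
    } \
    Datum funcname(SKETCH_FUNCTION_ARGS)

SKETCH_FUNCTION_INFO_V1(foo);

Datum
foo(SKETCH_FUNCTION_ARGS)
{
    return fcinfo->arg[0] + 1;    /* placeholder body */
}
```

The function body is unchanged from the new-style internal conventions. Only the one-line macro call differs, and that call is what lets the lookup above recognize `foo` as new-style.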