Before this change FunctionCallInfoData, the struct arguments etc for
V1 function calls are stored in, always had space for
FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two
arrays. For nearly every function call 100 arguments is far more than
needed, therefore wasting memory. Arg and argnull being two separate
arrays also guarantees that to access a single argument, two
cachelines have to be touched.
Change the layout so there's a single variable-length array with pairs
of value / isnull. That drastically reduces memory consumption for
most function calls (on x86-64 a two argument function now uses
64bytes, previously 936 bytes), and makes it very likely that argument
value and its nullness are on the same cacheline.
Arguments are stored in a new NullableDatum struct, which, due to
padding, needs more memory per argument than before. But as usually
far fewer arguments are stored, and individual arguments are cheaper
to access, that's still a clear win. It's likely that there's other
places where conversion to NullableDatum arrays would make sense,
e.g. TupleTableSlots, but that's for another commit.
Because the function call information is now variable-length
allocations have to take the number of arguments into account. For
heap allocations that can be done with SizeForFunctionCallInfoData(),
for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro
that helps to allocate an appropriately sized and aligned variable.
Some places with stack allocation function call information don't know
the number of arguments at compile time, and currently variably sized
stack allocations aren't allowed in postgres. Therefore allow for
FUNC_MAX_ARGS space in these cases. They're not that common, so for
now that seems acceptable.
Because of the need to allocate FunctionCallInfo of the appropriate
size, older extensions may need to update their code. To avoid subtle
breakages, the FunctionCallInfoData struct has been renamed to
FunctionCallInfoBaseData. Most code only references FunctionCallInfo,
so that shouldn't cause much collateral damage.
This change is also a prerequisite for more efficient expression JIT
compilation (by allocating the function call information on the stack,
allowing LLVM to optimize it away); previously the size of the call
information caused problems inside LLVM's optimizer.
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
For any interesting JIT target, fields inside structs need to be
accessed. b96d550e contains infrastructure for syncing the definition
of types between postgres C code and runtime code generation with
LLVM. But that doesn't sync the number or names of fields inside
structs, just the types (including padding etc).
One option would be to hardcode the offset numbers in the JIT code,
but that'd be hard to keep in sync. Instead add macros indicating the
field offset to the fields that need to be accessed. Not pretty, but
manageable.
Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
This has a performance benefit on own, although not hugely so. The
primary benefit is that it will allow for to JIT tuple deforming and
comparator invocations.
Large parts of this were previously committed (773aec7aa), but the
commit contained an omission around cross-type comparisons and was
thus reverted.
Author: Andres Freund
Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de
This reverts commit 773aec7aa98abd38d6d9435913bb8e14e392c274.
There's an unresolved issue in the reverted commit: It only creates
one comparator function, but in for the nodeSubplan.c case we need
more (c.f. FindTupleHashEntry vs LookupTupleHashEntry calls in
nodeSubplan.c).
This isn't too difficult to fix, but it's not entirely trivial
either. The fact that the issue only causes breakage on 32bit systems
shows that the current test coverage isn't that great. To avoid
turning half the buildfarm red till those two issues are addressed,
revert.
Previously aggregate transition and combination functions were invoked
by special case code in nodeAgg.c, evaluating input and filters
separately using the expression evaluation machinery. That turns out
to not be great for performance for several reasons:
- repeated expression evaluations have some cost
- the transition functions invocations are poorly predicted, as
commonly there are multiple aggregates in a query, resulting in the
same call-stack invoking different functions.
- filter and input computation had to be done separately
- the special case code made it hard to implement JITing of the whole
transition function invocation
Address this by building one large expression that computes input,
evaluates filters, and invokes transition functions.
This leads to moderate speedups in queries bottlenecked by aggregate
computations, and enables large speedups for similar cases once JITing
is done.
There's potential for further improvement:
- It'd be nice if we could simplify the somewhat expensive
aggstate->all_pergroups lookups.
- right now there's still an advance_transition_function invocation in
nodeAgg.c, leading to some code duplication.
Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
This allows us to add stack-depth checks the first time an executor
node is called, and skip that overhead on following
calls. Additionally it yields a nice speedup.
While it'd probably have been a good idea to have that check all
along, it has become more important after the new expression
evaluation framework in b8d7f053c5c2bf2a7e - there's no stack depth
check in common paths anymore now. We previously relied on
ExecEvalExpr() being executed somewhere.
We should move towards that model for further routines, but as this is
required for v10, it seems better to only do the necessary (which
already is quite large).
Author: Andres Freund, Tom Lane
Reported-By: Julien Rouhaud
Discussion:
https://postgr.es/m/22833.1490390175@sss.pgh.pa.ushttps://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels. This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.
ExecReScan's second parameter is now unused, so it's removed.
bits indicating which optional capabilities can actually be exercised
at runtime. This will allow Sort and Material nodes, and perhaps later
other nodes, to avoid unnecessary overhead in common cases.
This commit just adds the infrastructure and arranges to pass the correct
flag values down to plan nodes; none of the actual optimizations are here
yet. I'm committing this separately in case anyone wants to measure the
added overhead. (It should be negligible.)
Simon Riggs and Tom Lane
look at the actual aggregate transition datatypes and the actual overhead
needed by nodeAgg.c, instead of using pessimistic round numbers.
Per a discussion with Michael Tiemann.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
to plan nodes, not vice-versa. All executor state nodes now inherit from
struct PlanState. Copying of plan trees has been simplified by not
storing a list of SubPlans in Plan nodes (eliminating duplicate links).
The executor still needs such a list, but it can build it during
ExecutorStart since it has to scan the plan tree anyway.
No initdb forced since no stored-on-disk structures changed, but you
will need a full recompile because of node-numbering changes.
entries, per pghackers discussion. This fixes aggregates to live in
namespaces, and also simplifies/speeds up lookup in parse_func.c.
Also, add a 'proimplicit' flag to pg_proc that controls whether a type
coercion function may be invoked implicitly, or only explicitly. The
current settings of these flags are more permissive than I would like,
but we will need to debate and refine the behavior; for now, I avoided
breaking regression tests as much as I could.