mirror of
https://github.com/postgres/postgres.git
synced 2025-12-01 12:18:01 +03:00
Window functions such as row_number() always return a value higher than the previously returned value for tuples in any given window partition. Traditionally queries such as; SELECT * FROM ( SELECT *, row_number() over (order by c) rn FROM t ) t WHERE rn <= 10; were executed fairly inefficiently. Neither the query planner nor the executor knew that once rn made it to 11 that nothing further would match the outer query's WHERE clause. It would blindly continue until all tuples were exhausted from the subquery. Here we implement means to make the above execute more efficiently. This is done by way of adding a pg_proc.prosupport function to various of the built-in window functions and adding supporting code to allow the support function to inform the planner if the window function is monotonically increasing, monotonically decreasing, both or neither. The planner is then able to make use of that information and possibly allow the executor to short-circuit execution by way of adding a "run condition" to the WindowAgg to allow it to determine if some of its execution work can be skipped. This "run condition" is not like a normal filter. These run conditions are only built using quals comparing values to monotonic window functions. For monotonic increasing functions, quals making use of the btree operators for <, <= and = can be used (assuming the window function column is on the left). You can see here that once such a condition becomes false that a monotonic increasing function could never make it subsequently true again. For monotonically decreasing functions the >, >= and = btree operators for the given type can be used for run conditions. The best-case situation for this is when there is a single WindowAgg node without a PARTITION BY clause. Here when the run condition becomes false the WindowAgg node can simply return NULL. No more tuples will ever match the run condition. It's a little more complex when there is a PARTITION BY clause. In this case, we cannot return NULL as we must still process other partitions. To speed this case up we pull tuples from the outer plan to check if they're from the same partition and simply discard them if they are. When we find a tuple belonging to another partition we start processing as normal again until the run condition becomes false or we run out of tuples to process. When there are multiple WindowAgg nodes to evaluate then this complicates the situation. For intermediate WindowAggs we must ensure we always return all tuples to the calling node. Any filtering done could lead to incorrect results in WindowAgg nodes above. For all intermediate nodes, we can still save some work when the run condition becomes false. We've no need to evaluate the WindowFuncs anymore. Other WindowAgg nodes cannot reference the value of these and these tuples will not appear in the final result anyway. The savings here are small in comparison to what can be saved in the top-level WingowAgg, but still worthwhile. Intermediate WindowAgg nodes never filter out tuples, but here we change WindowAgg so that the top-level WindowAgg filters out tuples that don't match the intermediate WindowAgg node's run condition. Such filters appear in the "Filter" clause in EXPLAIN for the top-level WindowAgg node. Here we add prosupport functions to allow the above to work for; row_number(), rank(), dense_rank(), count(*) and count(expr). It appears technically possible to do the same for min() and max(), however, it seems unlikely to be useful enough, so that's not done here. Bump catversion Author: David Rowley Reviewed-by: Andy Fan, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqvp3At8++yF8ij06sdcoo1S_b2YoaT9D4Nf+MObzsrLQ@mail.gmail.com
src/backend/nodes/README
Node Structures
===============
Andrew Yu (11/94)
Introduction
------------
The current node structures are plain old C structures. "Inheritance" is
achieved by convention. No additional functions will be generated. Functions
that manipulate node structures reside in this directory.
FILES IN THIS DIRECTORY (src/backend/nodes/)
General-purpose node manipulation functions:
copyfuncs.c - copy a node tree
equalfuncs.c - compare two node trees
outfuncs.c - convert a node tree to text representation
readfuncs.c - convert text representation back to a node tree
makefuncs.c - creator functions for some common node types
nodeFuncs.c - some other general-purpose manipulation functions
Specialized manipulation functions:
bitmapset.c - Bitmapset support
list.c - generic list support
params.c - Param support
tidbitmap.c - TIDBitmap support
value.c - support for value nodes
FILES IN src/include/nodes/
Node definitions:
nodes.h - define node tags (NodeTag)
primnodes.h - primitive nodes
parsenodes.h - parse tree nodes
pathnodes.h - path tree nodes and planner internal structures
plannodes.h - plan tree nodes
execnodes.h - executor nodes
memnodes.h - memory nodes
pg_list.h - generic list
Steps to Add a Node
-------------------
Suppose you want to define a node Foo:
1. Add a tag (T_Foo) to the enum NodeTag in nodes.h. (If you insert the
tag in a way that moves the numbers associated with existing tags,
you'll need to recompile the whole tree after doing this. It doesn't
force initdb though, because the numbers never go to disk.)
2. Add the structure definition to the appropriate include/nodes/???.h file.
If you intend to inherit from, say a Plan node, put Plan as the first field
of your struct definition.
3. If you intend to use copyObject, equal, nodeToString or stringToNode,
add an appropriate function to copyfuncs.c, equalfuncs.c, outfuncs.c
and readfuncs.c accordingly. (Except for frequently used nodes, don't
bother writing a creator function in makefuncs.c) The header comments
in those files give general rules for whether you need to add support.
4. Add cases to the functions in nodeFuncs.c as needed. There are many
other places you'll probably also need to teach about your new node
type. Best bet is to grep for references to one or two similar existing
node types to find all the places to touch.
Historical Note
---------------
Prior to the current simple C structure definitions, the Node structures
used a pseudo-inheritance system which automatically generated creator and
accessor functions. Since every node inherited from LispValue, the whole thing
was a mess. Here's a little anecdote:
LispValue definition -- class used to support lisp structures
in C. This is here because we did not want to totally rewrite
planner and executor code which depended on lisp structures when
we ported postgres V1 from lisp to C. -cim 4/23/90