Window functions were not supported when all tables were optimized
away by the min/max optimization. As a result, queries that used
window functions with min/max aggregation over the whole table
returned wrong result sets.
The patch fixed this problem.
The function Item::split_sum_func2() incorrectly processed function
items that contained window functions but were not window functions
themselves and were used as arguments of other functions.
The bug was not visible in current HEAD. Introduced a test case to catch
regressions. Also improved error messages regarding DISTINCT usage in
window functions.
Window functions need to be computed after applying the HAVING clause.
An optimization that we have for regular, non-window-function cases is
to apply HAVING only while sending the rows to the client. This
allows rows that should be filtered out to be stored in the temporary
table used to store aggregation results.
This behaviour is undesirable for window functions, as we have to
compute window functions on the result set after HAVING is applied.
Storing extra rows in the table leads to wrong values, as the frame
bounds might capture rows that are to be filtered out afterwards.
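
A minimal sketch of why the order matters, in Python rather than server
code. The row data, the HAVING predicate and the helper running_sum are
all hypothetical; the window function is modelled as a running sum.

```python
def running_sum(rows, key):
    """Model of SUM(key) OVER (ROWS UNBOUNDED PRECEDING) on a row list."""
    total, out = 0, []
    for row in rows:
        total += row[key]
        out.append(total)
    return out

# Hypothetical result of a GROUP BY: one row per group.
grouped = [{"g": 1, "s": 10}, {"g": 2, "s": 3}, {"g": 3, "s": 20}]

def having(row):
    # Stands in for "HAVING s > 5".
    return row["s"] > 5

# Correct order: apply HAVING first, then compute the window function.
kept = [r for r in grouped if having(r)]
correct = running_sum(kept, "s")

# Wrong order: the temp table still holds the row with s = 3, so the
# frame captures it, and HAVING is only applied when sending rows.
unfiltered = running_sum(grouped, "s")
wrong = [v for r, v in zip(grouped, unfiltered) if having(r)]
```

Here correct is [10, 30] while wrong is [10, 33]: the row that HAVING
removes has already leaked into the running sum.
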
These are different bugs, but the fix is the same:
if window functions are used over implicit grouping then
execution now follows the general path, calling
the function set in JOIN::first_select.
Due to this bug many queries that contained a window function
with MIN/MAX aggregation returned wrong results.
Calculation of a MIN/MAX aggregate function uses cache objects
and a comparator object that are created and set up in
Item_sum_hybrid::fix_fields() by a call of Item_sum_hybrid::setup_hybrid().
The latter binds the objects to the first argument of the
MIN/MAX function. Meanwhile, window functions perform aggregation
over fields of a temporary table, so the binding must be done to
these fields instead. The earliest moment when the objects used in
MIN/MAX functions can be set up is after all calls of the method
split_sum_func().
This patch introduces this late setup, but only for aggregate
functions used in window functions.
Probably it makes sense to use this late setup for all MIN/MAX
objects.
This patch complements the patch for bug 11138.
Without this patch some table-less queries with window functions
could cause crashes due to a memory overwrite.
The method Item_sum::print did not print opening '(' after the name
of simple window functions (like rank, dense_rank etc).
As a result, view definitions with such window functions
were written to .frm files in an invalid form.
Using window functions over results of implicit groupings
required special handling in JOIN::make_aggr_tables_info.
The patch made sure that the result of implicit grouping
was written into a temporary table properly.
If a window function with aggregation is over the result
set of a grouping query then the argument of the aggregate
function from the window function is allowed to be an
aggregate function itself.
This bug happens due to a conflict in the construct window_spec
(win_ref conflicts with the non-reserved key word ROWS).
The standard SQL-2003 says that ROWS is a reserved key word.
Made this key word reserved in our grammar and removed
the conflict.
There was no implementation of the virtual method print()
for the Item_window_func class. As a result, for a view
containing a window function, an invalid view definition could
be written to the .frm file. When a query that referred to
this view was executed, a syntax error was reported.
Fix window function expressions such as win_func() <operator> expr.
The problem was found in 2 places.
First, when we have complex expressions containing window functions, we
can only compute their final value _after_ we have computed the window
function's values. These values must be stored within the temporary
table that we are using, before sending them off.
This is done by performing an extra copy_funcs call before the final
end_send() call.
Second, such expressions need to have their inner arguments changed
such that the references within those arguments point to fields within
the temporary table.
Ex: sum(t.a) over (order by t.b) + sum(t.a) over (order by t.b)
Before this fix, t.a pointed to the original table's a field. In order
to compute the sum function's value correctly, it needs to point to the
copy of this field inside the temp table.
This is done by calling split_sum_func for each argument in the
expression in turn.
The win.test results have also been updated as they contained wrong
values for such a use case.
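
The two steps can be sketched as follows. This is an illustration in
Python, not the server code: the function evaluate, the "win" column and
the sample rows are hypothetical, and the values of b are assumed to be
distinct so that SUM(a) OVER (ORDER BY b) is a plain running sum.

```python
def evaluate(rows):
    """Model of: SUM(a) OVER (ORDER BY b) + SUM(a) OVER (ORDER BY b)."""
    # Copy the source rows into a "temp table" sorted by b. From here
    # on, every reference to a must use the copy in the temp table.
    temp = sorted((dict(r) for r in rows), key=lambda r: r["b"])

    # Step 1: compute the window function and store its value in its
    # own temp table column.
    running = 0
    for r in temp:
        running += r["a"]
        r["win"] = running

    # Step 2: only now can the enclosing expression be evaluated, just
    # before the rows are sent off.
    return [r["win"] + r["win"] for r in temp]

rows = [{"a": 3, "b": 3}, {"a": 1, "b": 1}, {"a": 2, "b": 2}]
result = evaluate(rows)
```

For the sample rows, result is [2, 6, 12]. Evaluating the addition before
step 1 has run would read window function values that are not yet there.
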
When specifying a RANGE type frame that exceeds the partition size, both
for the top and bottom cursors we end up removing more rows than added
to the aggregate function. This happens because our TOP range cursor,
which removes values from the aggregate function, would be allowed to breach
partition boundaries, while the BOTTOM range cursor would not.
To prevent this from happening, force the TOP range cursor to only move
within the current partition, as does the BOTTOM range cursor.
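
A rough model of the two cursors, written in Python with hypothetical
names (range_sum, top, bottom); it is not the Frame_* classes themselves.
Both cursors are limited to the half-open partition [start, end).

```python
def range_sum(values, start, end, n):
    """SUM(v) OVER (ORDER BY v RANGE BETWEEN n PRECEDING AND n FOLLOWING)
    for the partition values[start:end], sorted in ascending order."""
    total = 0
    top = start      # first row that is still inside the frame
    bottom = start   # first row that has not been added yet
    out = []
    for cur in range(start, end):
        # BOTTOM cursor: adds rows, never walks past the partition end.
        while bottom < end and values[bottom] <= values[cur] + n:
            total += values[bottom]
            bottom += 1
        # TOP cursor: removes rows, limited to the same partition.
        while top < end and values[top] < values[cur] - n:
            total -= values[top]
            top += 1
        out.append(total)
    return out

# Two partitions stored back to back: rows 0..2 and rows 3..4.
values = [1, 2, 3, 10, 11]
first = range_sum(values, 0, 3, 100)   # frame exceeds the partition
second = range_sum(values, 3, 5, 100)
narrow = range_sum(values, 0, 3, 1)
```

With a frame wider than the partition, every row sees the whole partition
and nothing else: first is [6, 6, 6] and second is [21, 21]. With n = 1
the first partition gives [3, 6, 5].
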
When join output is just one row, we still need to compute window
function values for it. We could skip invoking filesort for it,
but it doesn't seem to be worth it to do such optimization.
Make Frame_range_current_row_bottom take partition bounds into account.
Other frame bounds that could potentially hit the end of partition are
Frame_range_n_bottom, Frame_n_rows_following, Frame_unbounded_following,
and they all already had end-of-partition protection.
To simplify the code, factored out end-of-partition checks into
class Partition_read_cursor.
This bug revealed a serious problem: if the same partition list
was used in two window specifications then the temporary table created
to calculate window functions contained fields for two identical
partitions. This problem was fixed as well.
The bug was caused by a weird behaviour in test_if_group_changed, not
returning true when testing for the first time after initializing
the Cached_item list.
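
The expected contract can be illustrated with a small Python model. The
class GroupTracker is hypothetical and only mirrors the behaviour
described above; it is not how Cached_item is implemented.

```python
class GroupTracker:
    """Remembers the last partition key and reports when it changes."""

    _UNSET = object()   # marker for "nothing cached yet"

    def __init__(self):
        self._last = self._UNSET

    def changed(self, key):
        # The very first test after initialization must report a
        # change, otherwise the first partition is never started.
        is_new = self._last is self._UNSET or key != self._last
        self._last = key
        return is_new

tracker = GroupTracker()
flags = [tracker.changed(k) for k in [1, 1, 2, 2, 3]]
```

Here flags is [True, False, True, False, True]: the first row opens a
partition even though there is no previous key to compare against.
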
n=0 in "ROWS 0 PRECEDING" is valid, add handling for it:
- Adjust the assert
- Bottom bound of 'ROWS 0 PRECEDING' is actually looking at the current
row, that is, it needs to process partition's first row directly in
Frame_n_rows_preceding::next_partition().
- Added testcases
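
A small Python model of a frame whose bottom bound is n PRECEDING. The
function name and the use of None for SQL NULL are assumptions made for
this sketch only.

```python
def sum_up_to_n_preceding(values, n):
    """SUM(v) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND n PRECEDING)
    for one partition; None stands for NULL on an empty frame."""
    assert n >= 0   # n == 0 is a valid offset
    total = None
    out = []
    for i in range(len(values)):
        j = i - n   # row that enters the frame at this step
        if j >= 0:
            total = values[j] if total is None else total + values[j]
        out.append(total)
    return out

zero = sum_up_to_n_preceding([1, 2, 3], 0)
one = sum_up_to_n_preceding([1, 2, 3], 1)
```

With n = 0 the partition's first row is added right away, so zero is
[1, 3, 6]; with n = 1 the frame of the first row is empty and one is
[None, 1, 3].
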
Window functions need to have their own column in the work (temp) table,
like aggregate functions do.
They don't need val_int() -> val_int_result() conversion though, so they
should be wrapped with Item_direct_ref, not Item_aggregate_ref.
- When window functions are present, JOIN::simple_order should be set
to FALSE. (Otherwise, the optimizer may attempt to do a "pre-sorting"
on the first join_tab, which can work in some cases but generally
doesn't.)
- filesort tries to read only the table fields that it requires. A window
function requires its temp table field. In order to pass this info
to filesort, added an implementation of
Item_window_func::register_field_in_read_map.
"The sort order for the sub-sequence of window functions starting
from the element marked by SORTORDER_CHANGE_FLAG up to the next
element marked by SORTORDER_CHANGE_FLAG must be taken from the
last element of the sub-sequence (not from the first one)."
- Rename Window_funcs_computation to Window_funcs_computation_step
- Introduce Window_func_sort which invokes filesort and then
invokes computation of all window functions that use this ordering.
- Expose Window functions' sort operations in EXPLAIN|ANALYZE FORMAT=JSON
bubble_sort assumed
that the call-back comparison function returns a positive
number when arg1 < arg2, and a negative number when arg1 > arg2.
This is not in line with other implementations of sorting
algorithms.
Changed bubble_sort: now a negative result from the comparison
function means that arg1 < arg2, and positive result means
that arg1 > arg2.
Changed accordingly all call-back functions that are used as
parameters in the call of bubble_sort.
Added a test case to check the proper sorting of window functions.
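
The new convention, sketched in Python. This bubble_sort is a stand-alone
illustration, not the server's template function; the comparison function
by_value is hypothetical.

```python
from functools import cmp_to_key

def bubble_sort(items, cmp):
    """Sort using cmp(a, b): negative means a < b, positive a > b."""
    items = list(items)
    n = len(items)
    swapped = True
    while swapped:
        swapped = False
        for i in range(n - 1):
            # Swap only when the left element is greater than the right.
            if cmp(items[i], items[i + 1]) > 0:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        n -= 1   # the largest remaining element is now in place
    return items

def by_value(a, b):
    return a - b

data = [3, 1, 2, 5, 4]
ours = bubble_sort(data, by_value)
# Python's own sort accepts the same call-back convention.
reference = sorted(data, key=cmp_to_key(by_value))
```

Because the sign convention now matches the common one, the same
call-back can be handed to either sort and both return [1, 2, 3, 4, 5].
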
- Hook window function computation into the right location.
- Add a testcase which shows that HAVING is now checked before
the window function computation step.