This is a subtask of MCOL-4525 Implement select_handler=AUTO.
Server performs outer join to inner join conversion using simplify_joins()
in sql/sql_select.cc, by updating the TABLE_LIST::outer_join variable.
In order to perform this conversion, permanent changes are made in some
cases to the SELECT_LEX::JOIN::conds and/or TABLE_LIST::on_expr.
This is undesirable for MCOL-4525 which will attemp to fallback and execute
the query inside the server, in case the query execution fails in ColumnStore
using the select_handler.
For a query such as:
SELECT * FROM t1 LEFT JOIN t2 ON expr1 LEFT JOIN t3 ON expr2
In some cases, server can update the original SELECT_LEX::JOIN::conds
and/or TABLE_LIST::on_expr and create new Item_cond_and objects
(e.g. with 2 Item's expr1 and expr2 in Item_cond_and::list).
Instead of making changes to the original query structs, we use
gp_walk_info::tableOnExprList and gp_walk_info::condList. 2 Item's,
expr1 and expr2, in the condList, mean Item_cond_and(expr1, expr2), and
hence avoid permanent transformations to the SELECT_LEX.
We also define a new member variable
ha_columnstore_select_handler::tableOuterJoinMap
which saves the original TABLE_LIST::outer_join values before they are
updated. This member variable will be used later on to restore to the original
state of TABLE_LIST::outer_join in case of a query fallback to server execution.
The original simplify_joins() implementation in the server also performs a
flattening of the JOIN nest, however we don't perform this operation in
convertOuterJoinToInnerJoin() since it is not required for ColumnStore.
Main theme of the patch is to fix joins processing in the plugin
code. We now use SELECT_LEX::top_join_list and process the nested
joins recursively, instead of SELECT_LEX::table_list struct which
we earlier used to build the join filters. The earlier approach
did not process certain nested join ON expressions, causing certain
queries to incorrectly error out such as that described in MCOL-4680.
In addition, some legacy code is also removed.
We earlier leveraged the server functionality provided by
Item_in_subselect::create_in_to_exists_cond and
Item_in_subselect::inject_in_to_exists_cond
to create and inject the in-to-exists predicate into an IN
subquery's JOIN struct. With this patch, we leave the IN subquery's
JOIN unaltered and instead directly perform this predicate creation
and injection into ColumnStore's select execution plan.
For CHAR/VARCHAR/TEXT fields, the buffer size of a field represents
the field size in bytes, which can be bigger than the field size in
number of characters, for multi-byte character sets such as utf8,
utf8mb4 etc. The buffer also contains a byte length prefix which can be
up to 65532 bytes for a VARCHAR field, and much higher for a TEXT
field (we process a maximum byte length for a TEXT field which fits in
4 bytes, which is 2^32 - 1 = 4GB!).
There is also special processing for a TEXT field defined with a default
length like so:
CREATE TABLE cs1 (a TEXT CHARACTER SET utf8)
Here, the byte length is a fixed 65535, irrespective of the character
set used. This is different from a case such as:
CREATE TABLE cs1 (a TEXT(65535) CHARACTER SET utf8), where the byte length
for the field will be 65535*3.
The change for MCOL-4264 erroneously added the "lock_type" member
to cal_connection_info, which is shared between multiple tables.
So some tables that were opened for write erroneously identified
themselves as read only.
Moving the member to ha_mcs instead.
Problem:
When processing cross-engine queries like:
update cstab1 set a=100 where a not in (select a from innotab1 where a=11);
delete from innotab1 where a not in (select a from cstab1 where a=1);
the ColumnStore plugin erroneously executed the whole query inside
ColumnStore.
Fix:
- Adding a new member cal_connection_info::lock_type and setting it
inside ha_mcs_impl_external_lock() to the value passed in the parameter
"lock_type".
- Adding a method cal_connection_info::isReadOnly() to test
if the last table lock made in ha_mcs_impl_external_lock()
for done for reading.
- Adding a new condition checking cal_connection_info::isReadOnly() inside
ha_mcs_impl_rnd_init(). If the current table was locked last time for reading,
then doUpdateDelete() should not be executed.
For certain queries, such as:
update cs1 set i = 41 where i = 42 or (i is null and 42 is null);
the SELECT_LEX.where does not contain the required where conditions.
Server sends the where conditions in the call to cond_push(), so
we are storing them in a handler data member, condStack, and later
push them down to getSelectPlan() for UPDATES/DELETEs.
Original SH implementation sends the result set back to the client
thus it can't be used in INSERT..SELECT, SELECT INTO OUTFILE,CREATE
TABLE AS SELECT etc.
CLX-77 feature has been backported into MDB to enable SH to run
query part of the mentioned queries.
Direct update/delete executed doUpdateDelete as well as the regular
execution route for doUpdateDelete.
This patch only executes doUpdateDelete the first time and direct
update/delete collects the counts.
Disabled 4th if block in buildOuterJoin to handle non-optimized MDB query
structures.
Broke getSelectPlan into pieces: processFrom, processWhere.
MCOL-3593 UNION processing depends on two flags isUnion that comes as
arg of getSelectPlan and unionSel that is a local variable in
getSelectPlan. Modularization of getSelectPlan broke the mechanizm.
This patch is supposed to partially fix it.
MCOL-3593 Removed unused if condition from buildOuterJoin that allows
unsupported construct subquery in ON expression.
Fixed an improper if condition that ignors tableMap entries w/o condition
in external_lock thus external_lock doesn't clean up when the query
finishes.
Fixed wrong logging for queries processed in tableMode. Now rnd_init
properly sends queryText down to ExeMgr to be logged.
MCOL-3593 Unused attribute FromSubQuery::fFromSub was removed.
getSelectPlan has been modularized into: setExecutionParams,
processFrom, processWhere. SELECT, HAVING, GROUP BY, ORDER BY
still lives in getSelectPlan.
Copied optimization function simplify_joins_ into our pushdown
code to provide the plugin code with some rewrites from MDB it
expects.
The columnstore_processing_handlers_fallback session variable
has been removed thus CS can't fallback from SH to partial
execution paths, e.g. DH, GBH or plugin API.
MCOL-3602 Moved MDB optimizer rewrites into a separate file.
Add SELECT_LEX::optimize_unflattened_subqueries() call to fix IN
into EXISTS rewrite for semi-JOINs with subqueries.
disable_indices_for_CEJ() add index related hints to disable
index access methods in Cross Engine Joins.
create_SH() now flattens JOIN that has both physical tables and
views. This fixes most of views related tests in the regression.
This patch fixes:
MCOL-3557 - Row Based Replication events to ColumnStore tables will no
longer cause MariaDB to crash, it will error instead.
MCOL-3556 - Remove the Columnstore.xml variable to turn on ColumnStore
tables applying replication events and instead make it a system variable
that can be set in my.cnf called "columnstore_replication_slave". This
allows it to be set per-UM.