MDEV-27106 added REMOTE_TABLE, REMOTE_DATABASE, REMOTE_SERVER spider
table options. In this commit, we add all remaining options for table
params that are not marked to be deprecated.
All these options are parsed as strings from sql statements and have
string values at the sql level, so that we can determine whether it is
specified by checking its nullness.
The string values are further parsed by Spider into their actual types
in the SPIDER_SHARE, including string list, bounded nonnegative int,
bounded nonnegative int list, nonnegative longlong, boolean, and key
hints. Except for string lists, all other types are validated during
this parsing process.
Most of the options are backward compatible, i.e. they accept any
values that is accepted by there corresponding param parser. The only
exception is the index hint IDX which corresponds to the idxNNN param
name. For example,
'idx000 "f PRIMARY", idx001 "u k1"'
translates to
IDX="f PRIMARY u k1".
We include a test with all options specified, and tests involving
spider table options of all actual types.
Any table options, if present, will cause comments to be ignored with
a warning. The warning can be disabled by setting a new spider
global/session system variable spider_suppress_comment_ignored_warning
to 1.
Another global/session variable introduced is spider_ignore_comments,
which if set to 1, will cause COMMENT and CONNECTION strings to be
ignored unconditionally, whether or not table options are specified.
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
The system variable spider_disable_group_by_handler, if on, will
disable the spider group by handler (gbh), and such disablement serves
as workaround to bugs caused by gbh, labelled with spider-gbh on jira,
including MDEV-26247, MDEV-28998, MDEV-29163, MDEV-30392, MDEV-31645.
Tests for these tickets are added accordingly with the workaround in
place.
The direct aggregate mechanism sems to be only intended to work when
otherwise a full table scan query will be executed from the spider
node and the aggregation done at the spider node too. Typically this
happens in sub_select(). In the test spider.direct_aggregate_part
direct aggregate allows to send COUNT statements directly to the data
nodes and adds up the results at the spider node, instead of iterating
over the rows one by one at the spider node.
By contrast, the group by handler (GBH) typically sends aggregated
queries directly to data nodes, in which case DA does not improve the
situation here.
That is why we should fix it by disabling DA when GBH is used.
There are other reasons supporting this change. First, the creation of
GBH results in a call to change_to_use_tmp_fields() (as opposed to
setup_copy_fields()) which causes the spider DA function
spider_db_fetch_for_item_sum_funcs() to work on wrong items. Second,
the spider DA function only calls direct_add() on the items, and the
follow-up add() needs to be called by the sql layer code. In
do_select(), after executing the query with the GBH, it seems that the
required add() would not necessarily be called.
Disabling DA when GBH is used does fix the bug. There are a few
other things included in this commit to improve the situation with
spider DA:
1. Add a session variable that allows user to disable DA completely,
this will help as a temporary measure if/when further bugs with DA
emerge.
2. Move the increment of direct_aggregate_count to the spider DA
function. Currently this is done in rather bizarre and random
locations.
3. Fix the spider_db_mbase_row creation so that the last of its row
field (sentinel) is NULL. The code is already doing a null check, but
somehow the sentinel field is on an invalid address, causing the
segfaults. With a correct implementation of the row creation, we can
avoid such segfaults.
Extracted out common subroutines, gave more meaningful names etc,
added comments etc.
Also:
- Documented active servers load balancing reads, and other fields in
SPIDER_SHARE etc.
- Removed commented out code
- Documented and refactored self-reference check
- Removed some unnecessary functions
- Renamed unhelpful roop_count
- Refactored spider_get_{sts,crd}, where we turn get_type into an enum
- Cleaned up spider_mbase_handler::show_table_status() and
spider_mbase_handler::show_index()
Delete the deprecated variable, spider_use_handler and related code.
Spider now does not supports accessing data nodes via handler
statements. Thus, the notion of SQL kinds are no longer useful.
We too discard it.
"#ifdef WITH_PARTITION_STORAGE_ENGINE ... #endif" appears frequently
in the Spider code base. However, there is no need to maintain such
ifdefs because Spider is disabled if the partitioning engine is disabled.
Remove the dead-code, in Spider, which is related to the Spider's
HandlerSocket support. The code has been disabled for a long time
and it is unlikely that the code will be enabled.
Add the following parameter.
- spider_sync_sql_mode
Local sql_mode synchronous existence to remote server.
0 : It doesn't synchronize.
1 : It synchronizes.
The default value is 1
Add the following parameters.
- spider_remote_wait_timeout
Set remote wait_timeout at connecting for improvement performance of
connection if you know.
-1,0 : No set.
1 or more : Seconds.
The default value is -1
- spider_wait_timeout
The wait time to remote servers.
-1,0 : No set.
1 or more : Seconds.
The default value is 604800
Add a system variable spider_slave_trx_isolation.
- spider_slave_trx_isolation
The transaction isolation level when Spider table is used by slave SQL thread.
-1 : OFF
0 : READ UNCOMMITTED
1 : READ COMMITTED
2 : REPEATABLE READ
3 : SERIALIZABLE
The default value is -1
Miscellaneous Spider typos
Change default value of the followings
quick_mode 0 -> 3
quick_page_size 100 -> 1024
Add the following parameter for limiting result page size by byte
- quick_page_byte(qpb)
Number of bytes in a page when acquisition one by one.
When quick_mode is 1 or 2, Spider stores at least 1 record even if
quick_page_byte is smaller than 1 record. When quick_mode is 3,
quick_page_byte is used for judging using temporary table.
That is given to priority when server parameter spider_quick_page_byte
is set.
The default value is 10485760
Fix "out of sync" issue at using quick_mode = 1 or 2
Add a system variable spider_slave_trx_isolation.
- spider_slave_trx_isolation
The transaction isolation level when Spider table is used by slave SQL thread.
-1 : OFF
0 : READ UNCOMMITTED
1 : READ COMMITTED
2 : REPEATABLE READ
3 : SERIALIZABLE
The default value is -1
Miscellaneous Spider typos
Change default value of the followings
quick_mode 0 -> 3
quick_page_size 100 -> 1024
Add the following parameter for limiting result page size by byte
- quick_page_byte(qpb)
Number of bytes in a page when acquisition one by one.
When quick_mode is 1 or 2, Spider stores at least 1 record even if
quick_page_byte is smaller than 1 record. When quick_mode is 3,
quick_page_byte is used for judging using temporary table.
That is given to priority when server parameter spider_quick_page_byte
is set.
The default value is 10485760
Fix "out of sync" issue at using quick_mode = 1 or 2
The problem occurred because the Spider node was incorrectly handling
timestamp values sent to and received from the data nodes.
The problem has been corrected as follows:
- Added logic to set and maintain the UTC time zone on the data nodes.
To prevent timestamp ambiguity, it is necessary for the data nodes to use
a time zone such as UTC which does not have daylight savings time.
- Removed the spider_sync_time_zone configuration variable, which did not
solve the problem and which interfered with the solution.
- Added logic to convert to the UTC time zone all timestamp values sent to
and received from the data nodes. This is done for both unique and
non-unique timestamp columns. It is done for WHERE clauses, applying to
SELECT, UPDATE and DELETE statements, and for UPDATE columns.
- Disabled Spider's use of direct update when any of the columns to update is
a timestamp column. This is necessary to prevent false duplicate key value
errors.
- Added a new test spider.timestamp to thoroughly test Spider's handling of
timestamp values.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Merged:
Commit 97cc9d3 on branch bb-10.3-MDEV-16246
The problem occurred because the Spider node was incorrectly handling
timestamp values sent to and received from the data nodes.
The problem has been corrected as follows:
- Added logic to set and maintain the UTC time zone on the data nodes.
To prevent timestamp ambiguity, it is necessary for the data nodes to use
a time zone such as UTC which does not have daylight savings time.
- Removed the spider_sync_time_zone configuration variable, which did not
solve the problem and which interfered with the solution.
- Added logic to convert to the UTC time zone all timestamp values sent to
and received from the data nodes. This is done for both unique and
non-unique timestamp columns. It is done for WHERE clauses, applying to
SELECT, UPDATE and DELETE statements, and for UPDATE columns.
- Disabled Spider's use of direct update when any of the columns to update is
a timestamp column. This is necessary to prevent false duplicate key value
errors.
- Added a new test spider.timestamp to thoroughly test Spider's handling of
timestamp values.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit 97cc9d3 on branch bb-10.3-MDEV-16246
The remote users need the SUPER privilege because by default Spider sends a
'SET SQL_LOG_OFF' statement to the data nodes. This is controlled by the
spider_internal_sql_log_off configuration setting on the Spider node, which
can only be set to 0 or 1, with a default value of 1.
I have fixed the problem by changing this configuration setting so that if it
is NOT SET, which is the most likely case, the Spider node DOES NOT SEND the
'SET SQL_LOG_OFF' statement to the data nodes. However if the
spider_internal_sql_log_off setting IS EXPLICITLY SET to either 0 or 1, then
the Spider node DOES SEND the 'SET SQL_LOG_OFF' statement, requiring a remote
user with the SUPER privilege. The Spider documentation will be updated to
reflect this change.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit 72f0efa on branch bb-10.3-MDEV-15697
The remote users need the SUPER privilege because by default Spider sends a
'SET SQL_LOG_OFF' statement to the data nodes. This is controlled by the
spider_internal_sql_log_off configuration setting on the Spider node, which
can only be set to 0 or 1, with a default value of 1.
I have fixed the problem by changing this configuration setting so that if it
is NOT SET, which is the most likely case, the Spider node DOES NOT SEND the
'SET SQL_LOG_OFF' statement to the data nodes. However if the
spider_internal_sql_log_off setting IS EXPLICITLY SET to either 0 or 1, then
the Spider node DOES SEND the 'SET SQL_LOG_OFF' statement, requiring a remote
user with the SUPER privilege. The Spider documentation will be updated to
reflect this change.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.