Fix partitioning for trx_id-versioned tables.
`partition by hash`, `range` and others now work.
`partition by system_time` is forbidden.
Currently we cannot use row_start and row_end in `partition by`, because
insertion of versioned field is done by engine's handler, as well as
row_start/row_end's value set up, which is a transaction id -- so it's
also forbidden.
The drawback is that it's now impossible to use `partition by key()`
without parameters for such tables, because it references row_start and
row_end implicitly.
* add handler::vers_can_native()
* drop Table_scope_and_contents_source_st::vers_native()
* drop partition_element::find_engine_flag as unused
* forbid versioning partitioning for trx_id as not supported
* adopt vers tests for trx_id partitioning
* forbid any row_end referencing in `partition by` clauses,
including implicit `by key()`
This patch contains a full implementation of the optimization
that allows to use in-memory rowid / primary filters built for range
conditions over indexes. In many cases usage of such filters reduce
the number of disk seeks spent for fetching table rows.
In this implementation the choice of what possible filter to be applied
(if any) is made purely on cost-based considerations.
This implementation re-achitectured the partial implementation of
the feature pushed by Galina Shalygina in the commit
8d5a11122c32f4d9eb87536886c6e893377bdd07.
Besides this patch contains a better implementation of the generic
handler function handler::multi_range_read_info_const() that
takes into account gaps between ranges when calculating the cost of
range index scans. It also contains some corrections of the
implementation of the handler function records_in_range() for MyISAM.
This patch supports the feature for InnoDB and MyISAM.
We don't have many bits left, no need to add another InnoDB-specific flag.
Instead, we say that HA_REQUIRE_PRIMARY_KEY does not apply to SEQUENCE.
Meaning, if the engine declares HA_CAN_TABLES_WITHOUT_ROLLBACK (required
for SEQUENCE) it *must* support tables without a primary key.
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.
main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
Fixed by adding table flag HA_WANTS_PRIMARY_KEY, which is like
HA_REQUIRE_PRIMARY_KEY but tells SQL upper layer that the storage engine
internally can handle tables without primary keys (for example for
sequences or trough user variables)
When using buffered sort in `UPDATE`, keyread is used. In this case,
`TABLE::update_virtual_field` should be aborted, but it actually isn't,
because it is called not with a top-level handler, but with the one that
is actually going to access the disk. Here the problemm is issued with
partitioning, so the solution is to recursively mark for keyread all the
underlying partition handlers.
* ha_partition: update keyread state for child partitions
Closes#800
The problem occurs in 10.2 and earlier releases of MariaDB Server because the
Partition Engine was not pushing the engine conditions to the underlying
storage engine of each partition. This caused Spider to return the first 5
rows in the table with the data provided by the customer. 2 of the 5 rows
did not qualify the WHERE clause, so they were removed from the result set by
the server.
To fix the problem, I have back-ported support for engine condition pushdown
in the Partition Engine from MariaDB Server 10.3.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit eb2ca3d on branch bb-10.2-MDEV-16912
The problem occurs in 10.2 and earlier releases of MariaDB Server because the
Partition Engine was not pushing the engine conditions to the underlying
storage engine of each partition. This caused Spider to return the first 5
rows in the table with the data provided by the customer. 2 of the 5 rows
did not qualify the WHERE clause, so they were removed from the result set by
the server.
To fix the problem, I have back-ported support for engine condition pushdown
in the Partition Engine from MariaDB Server 10.3.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
The problem occurred because the Spider node was incorrectly handling
timestamp values sent to and received from the data nodes.
The problem has been corrected as follows:
- Added logic to set and maintain the UTC time zone on the data nodes.
To prevent timestamp ambiguity, it is necessary for the data nodes to use
a time zone such as UTC which does not have daylight savings time.
- Removed the spider_sync_time_zone configuration variable, which did not
solve the problem and which interfered with the solution.
- Added logic to convert to the UTC time zone all timestamp values sent to
and received from the data nodes. This is done for both unique and
non-unique timestamp columns. It is done for WHERE clauses, applying to
SELECT, UPDATE and DELETE statements, and for UPDATE columns.
- Disabled Spider's use of direct update when any of the columns to update is
a timestamp column. This is necessary to prevent false duplicate key value
errors.
- Added a new test spider.timestamp to thoroughly test Spider's handling of
timestamp values.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit 97cc9d3 on branch bb-10.3-MDEV-16246
Main reason was to make it easier to print the above structures in
a debugger. Additional benefits is that I was able to use same
defines for both structures, which simplifes some code.
Most of the code is just removing Alter_info:: and Alter_inplace_info::
from alter table flags.
Following renames was done:
HA_ALTER_FLAGS -> alter_table_operations
CHANGE_CREATE_OPTION -> ALTER_CHANGE_CREATE_OPTION
Alter_info::ADD_INDEX -> ALTER_ADD_INDEX
DROP_INDEX -> ALTER_DROP_INDEX
ADD_UNIQUE_INDEX -> ALTER_ADD_UNIQUE_INDEX
DROP_UNIQUE_INDEx -> ALTER_DROP_UNIQUE_INDEX
ADD_PK_INDEX -> ALTER_ADD_PK_INDEX
DROP_PK_INDEX -> ALTER_DROP_PK_INDEX
Alter_info:ALTER_ADD_COLUMN -> ALTER_PARSE_ADD_COLUMN
Alter_info:ALTER_DROP_COLUMN -> ALTER_PARSE_DROP_COLUMN
Alter_inplace_info::ADD_INDEX -> ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX
Alter_inplace_info::DROP_INDEX -> ALTER_DROP_NON_UNIQUE_NON_PRIM_INDEX
Other things:
- Added typedef alter_table_operatons for alter table flags
- DROP CHECK CONSTRAINT can now be done online
- Added checks for Aria tables in alter_table_online.test
- alter_table_flags now takes an ulonglong as argument.
- Don't support online operations if checksum option is used.
- sql_lex.cc doesn't add ALTER_ADD_INDEX if index is not created
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider,spider, sphinx and connect for now.
Now we don't open partitions if it was explicitly cpecified.
ha_partition::m_opened_partition bitmap added to track
partitions that were actually opened.
Includes Spider patches
- 062_mariadb-10.2.0.direct_join_1and3.diff
- 063_mariadb-10.2.0.direct_join_for_single_partition.diff
- Test cases from Kentoku
Allows Spider to push full joins to the Spider engine trough the
create_group_by interface.
Other things:
- Increased MYSQL_VERSION_ID to check for 10211 (latest 10.2 version)
- Fix for const_table at calling create_group_by().
Original author: Kentoku SHIBA
Add support for direct update and direct delete requests for spider.
A direct update/delete request handles all qualified rows in a single
operation rather than one row at a time.
Contains Spiral patches:
006_mariadb-10.2.0.direct_update_rows.diff MDEV-7704
008_mariadb-10.2.0.partition_direct_update.diff MDEV-7706
010_mariadb-10.2.0.direct_update_rows2.diff MDEV-7708
011_mariadb-10.2.0.aggregate.diff MDEV-7709
027_mariadb-10.2.0.force_bulk_update.diff MDEV-7724
061_mariadb-10.2.0.mariadb-10.1.8.diff MDEV-12870
- The differences compared to the original patches:
- Most of the parameters of the new functions are unnecessary. The
unnecessary parameters have been removed.
- Changed bit positions for new handler flags upon consideration of
handler flags not needed by other Spiral patches and handler flags
merged from MySQL.
- Added info_push() (Was originally part of bulk access patch)
- Didn't include code related to handler socket
- Added HA_CAN_DIRECT_UPDATE_AND_DELETE
Original author: Kentoku SHIBA
First reviewer: Jacob Mathew
Second reviewer: Michael Widenius
- In Spider, calling cmp_ref() can be very expensive. In ha_partition.cc
we don't anymore sort rows according to position for the Spider
engine.
- Removed Spider specific call info(HA_EXTRA_STARTING_ORDERED_INDEX_SCAN)
from handle_ordered_index_scan(). It's caused performance issues and
does not change results for queries with ORDER BY.
- The visible effect of this patch is that for some storage engines,
rows may be returned in a different order if there is no ORDER BY clause.
- Based in Spiral Patch 052:
052_mariadb-10.2.0.add_partition_skip_pk_sort_for_non_clustered_index
MDEV-7748
- The major difference from original patch is that there is no variable to
get the old behaviour.
Other things:
- Optimized ha_partition::cmp_ref() and cmp_part_ids() to make them
simpler and faster.
- Changed arguments to cmp_key_part_id() to be same as
cmp_key_rowid_part_id to simplify code.
Original author: Kentoku SHIBA
First reviewer: Jacob Mathew
Second reviewer: Michael Widenius
Contains Spiral patches:
022_mariadb-10.2.0.auto_increment.diff MDEV-7720
030: 030_mariadb-10.2.0.partition_auto_inc_init.diff MDEV-7726
These patches have the following differences compared to the original
patches:
- Added the new #defines for the feature in spd_environ.h instead of in
handler.h because these #defines are needed by Spider and are not needed
by the server.
- Cleaned up code related to the removed variable m_need_info_for_auto_inc
. Changed variable assignment in lock_auto_increment() and
unlock_auto_increment() so that the assignments are done under locks.
- Added a test case.
- Added test result changes resulting from a bug that was fixed by these
patches.
Original author: Kentoku SHIBA
First reviewer: Jacob Mathew
Second reviewer: Michael Widenius
Contains Spiral patches:
007_mariadb-10.2.0.partition_fulltext.diff MDEV-7705
038_mariadb-10.2.0.partition_fulltext2.diff MDEV-7734
This commit has the following differences compared to the original
patches:
- Added necessary full text search cleanup at the storage engine layer
that was omitted in the original patch.
- Added test case.
- A lot of code cleanups to make the code notable smaller.
- Changed SQL code to use ha_ft_end() instead of ft_end()
Original author: Kentoku SHIBA
First reviewer: Jacob Mathew
Second reviewer: Michael Widenius
Other things:
- Cleanup of allocated bitmaps done in open(), which
simplifies init_partition_bitmaps()
- Add needed defines in ha_spider.cc to enable new spider code
- Fixed some DBUG_PRINT() to be consistent with normal code
- Removed end space
- The changes in test cases partition_innodb, partition_range,
partition_pruning etc are becasue partitions can now more exactly
calculate the number of rows in a range.
Contains spider patches:
014,015,023,033,035,037,040,042,044,045,049,050,051,053,059
Added Spider patches:
003_mariadb-10.0.15.vp.diff
060_mariadb-10.2.0.partition_reset_top_table_fields.diff
- Support HA_EXTRA_ADD_CHILDREN_LIST,HA_EXTRA_ATTACH_CHILDREN,
HA_EXTRA_IS_ATTACHED_CHILDREN and HA_EXTRA_DETACH_CHILDREN
in partition handler for handlers that has HA_CAN_MULTISTEPL_MERGE flag
- Added HA_CAN_MULTISTEPL_MERGE to MERGE handler.
- Added handler::get_child_handlers()
- Change m_num_lock to contain total number of locks. This was needed as
we now adjust number of locks when extra(HA_EXTRA_ATTACH_CHILDREN) is
called.
013_mariadb-10.0.15.vp_handler.diff
034_mariadb-10.0.15.vp_handler2.diff
005_mariadb-10.0.15.hs.diff
041_mariadb-10.0.15.vp_handler2.diff
+ Fixes from Kentoku
+ Added handler/suite.pm and handler/suite.opt to be able to run test cases
in spider/handler
Don't write to a temporary file, use String.
Remove strange one-liner "helpers", use String methods.
Don't use current_thd, don't allocate memory for 1-byte strings, etc.
* based on RANGE pruning by COLUMNS (sys_trx_end) condition
* removed DEFAULT; AS OF NOW is always last; current VERSIONING as last non-empty (or first empty)
* Min/Max stats in TABLE_SHARE
* ALTER TABLE ADD PARTITION adds before AS OF NOW partition
* one `AS OF NOW`, multiple `VERSIONING` partitions;
* rotation of `VERSIONING` partitions by record count, time period;
* rotation is multi-threaded;
* conventional subpartitions as bottom level for versioned partitions;
* `DEFAULT` keyword selects first `VERSIONING` partition;
* ALTER TABLE ADD/DROP partition;
* REBUILD PARTITION basic operation.
Benefits of this patch:
- Removed a lot of calls to strlen(), especially for field_string
- Strings generated by parser are now const strings, less chance of
accidently changing a string
- Removed a lot of calls with LEX_STRING as parameter (changed to pointer)
- More uniform code
- Item::name_length was not kept up to date. Now fixed
- Several bugs found and fixed (Access to null pointers,
access of freed memory, wrong arguments to printf like functions)
- Removed a lot of casts from (const char*) to (char*)
Changes:
- This caused some ABI changes
- lex_string_set now uses LEX_CSTRING
- Some fucntions are now taking const char* instead of char*
- Create_field::change and after changed to LEX_CSTRING
- handler::connect_string, comment and engine_name() changed to LEX_CSTRING
- Checked printf() related calls to find bugs. Found and fixed several
errors in old code.
- A lot of changes from LEX_STRING to LEX_CSTRING, especially related to
parsing and events.
- Some changes from LEX_STRING and LEX_STRING & to LEX_CSTRING*
- Some changes for char* to const char*
- Added printf argument checking for my_snprintf()
- Introduced null_clex_str, star_clex_string, temp_lex_str to simplify
code
- Added item_empty_name and item_used_name to be able to distingush between
items that was given an empty name and items that was not given a name
This is used in sql_yacc.yy to know when to give an item a name.
- select table_name."*' is not anymore same as table_name.*
- removed not used function Item::rename()
- Added comparision of item->name_length before some calls to
my_strcasecmp() to speed up comparison
- Moved Item_sp_variable::make_field() from item.h to item.cc
- Some minimal code changes to avoid copying to const char *
- Fixed wrong error message in wsrep_mysql_parse()
- Fixed wrong code in find_field_in_natural_join() where real_item() was
set when it shouldn't
- ER_ERROR_ON_RENAME was used with extra arguments.
- Removed some (wrong) ER_OUTOFMEMORY, as alloc_root will already
give the error.
TODO:
- Check possible unsafe casts in plugin/auth_examples/qa_auth_interface.c
- Change code to not modify LEX_CSTRING for database name
(as part of lower_case_table_names)