1
0
mirror of https://github.com/MariaDB/server.git synced 2025-11-09 11:41:36 +03:00
Commit Graph

2051 Commits

Author SHA1 Message Date
Marko Mäkelä
7924158496 MDEV-19780 Remove the TokuDB storage engine
The TokuDB storage engine has been deprecated by upstream
Percona Server 8.0 in favor of MyRocks and will not be available
in subsequent major upstream releases.

Let us remove it from MariaDB Server as well.
MyRocks is actively maintained, and it can be used instead.
2020-05-14 10:11:47 +03:00
Sergei Golubchik
13038e4705 Merge branch '10.4' into 10.5 2020-05-09 20:43:36 +02:00
Sergei Petrunia
8d85715d50 MDEV-21794: Optimizer flag rowid_filter leads to long query
Rowid Filter check is just like Index Condition Pushdown check: before
we check the filter, we must check if we have walked out of the range
we are scanning. (If we did, we should return, and not continue the scan).

Consequences of this:
- Rowid filtering doesn't work for keys that have partially-covered
  blob columns (just like Index Condition Pushdown)
- The rowid filter function has three return values: CHECK_POS (passed)
  CHECK_NEG (filtered out), CHECK_OUT_OF_RANGE.

All of the above is implemented in this patch
2020-05-07 12:27:17 +02:00
Sergei Golubchik
67aaf51cf9 cleanup: ha_external_unlock() helper
as mentioned in f9f33b85be and generally to make it
easier to talk about
2020-05-05 19:41:12 +02:00
Marko Mäkelä
496d0372ef Merge 10.4 into 10.5 2020-04-29 15:40:51 +03:00
Marko Mäkelä
0632b8034b Merge 10.3 into 10.4 2020-04-29 09:05:15 +03:00
Thirunarayanan Balathandayuthapani
c755974775 MDEV-19611 INPLACE ALTER does not fail on bad implicit default value
- Inplace alter shouldn't set default date column as '0000-00-00' when
table is not empty. So mysql_inplace_alter_table() copied
alter_ctx.error_if_not_empty to a new field of Alter_inplace_info.
In ha_innobase::check_if_supported_inplace_alter() should check the
error_if_not_empty flag and return INPLACE_NOT_SUPPORTED if the table
is not empty
2020-04-28 11:59:32 +05:30
Monty
eca5c2c67f Added support for more functions when using partitioned S3 tables
MDEV-22088 S3 partitioning support

All ALTER PARTITION commands should now work on S3 tables except

REBUILD PARTITION
TRUNCATE PARTITION
REORGANIZE PARTITION

In addition, PARTIONED S3 TABLES can also be replicated.
This is achived by storing the partition tables .frm and .par file on S3
for partitioned shared (S3) tables.

The discovery methods are enchanced by allowing engines that supports
discovery to also support of the partitioned tables .frm and .par file

Things in more detail

- The .frm and .par files of partitioned tables are stored in S3 and kept
  in sync.
- Added hton callback create_partitioning_metadata to inform handler
  that metadata for a partitoned file has changed
- Added back handler::discover_check_version() to be able to check if
  a table's or a part table's definition has changed.
- Added handler::check_if_updates_are_ignored(). Needed for partitioning.
- Renamed rebind() -> rebind_psi(), as it was before.
- Changed CHF_xxx hadnler flags to an enum
- Changed some checks from using table->file->ht to use
  table->file->partition_ht() to get discovery to work with partitioning.
- If TABLE_SHARE::init_from_binary_frm_image() fails, ensure that we
  don't leave any .frm or .par files around.
- Fixed that writefrm() doesn't leave unusable .frm files around
- Appended extension to path for writefrm() to be able to reuse to function
  for creating .par files.
- Added DBUG_PUSH("") to a a few functions that caused a lot of not
  critical tracing.
2020-04-19 17:33:51 +03:00
Marko Mäkelä
af91266498 Merge 10.3 into 10.4
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
2020-04-16 12:12:26 +03:00
Marko Mäkelä
84db10f27b Merge 10.2 into 10.3 2020-04-15 09:56:03 +03:00
Sergei Golubchik
fcd84da5f1 MDEV-22218 InnoDB: Failing assertion: node->pcur->rel_pos == BTR_PCUR_ON upon LOAD DATA with NO_BACKSLASH_ESCAPES in SQL_MODE and unique blob in table
`inited == NONE` at the initialization time does not always mean
that it'll be `NONE` later, at the execution time. Use a more complex
caller-specific logic to decide whether to create a cloned lookup handler.

Besides LOAD (as in the original bug report) make sure that all
prepare_for_insert() invocations are covered by tests. Add tests for
CREATE ... SELECT, multi-UPDATE, and multi-DELETE.

Don't enable write cache with long uniques.
2020-04-12 22:10:57 +02:00
Vlad Lesin
5836191c8f MDEV-21168: Active XA transactions stop slave from working after backup
was restored.

Optionally rollback prepared XA's on "mariabackup --prepare".

The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for
slaves.
2020-04-07 15:05:38 +03:00
Nikita Malyavin
259fb1cbed MDEV-16978 Application-time periods: WITHOUT OVERLAPS
* The overlaps check is implemented on a handler level per row command.
  It creates a separate cursor (actually, another handler instance) and
  caches it inside the original handler, when ha_update_row or
  ha_insert_row is issued. Cursor closes on unlocking the handler.

* Containing the same key in index means unique constraint violation
  even in usual terms. So we fetch left and right neighbours and check
  that they have same key prefix, excluding from the key only the period part.
  If it doesnt match, then there's no such neighbour, and the check passes.
  Otherwise, we check if this neighbour intersects with the considered key.

* The check does not introduce new error and fails with ER_DUPP_KEY error.
  This might break REPLACE workflow and should be fixed separately
2020-03-31 17:42:34 +02:00
Sergei Golubchik
0515577d12 cleanup: prepare "update_handler" for WITHOUT OVERLAPS
* rename to a generic name
* move remaning initializations from query exec to prepare time
* simplify/unify key handling in open_table_from_share and delayed
* remove dead code
* move tests where they belong
2020-03-31 17:42:34 +02:00
Sergei Golubchik
27bf97aa00 cleanup: dead code, comments, avoid current_thd 2020-03-31 17:42:33 +02:00
Monty
eb483c5181 Updated optimizer costs in multi_range_read_info_const() and sql_select.cc
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
  not 1.0.  In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
  TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
  keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
  The default keyread_time() was optimized for blocks and not suitable for
  HEAP. The effect was the HEAP prefered table scans over ranges for btree
  indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
  Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
  we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
  as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
  HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
  This was to ensure that handler::keyread_time() doesn't give
  higher cost for heap tables than for normal tables. One effect of
  this is that heap and derived tables stored in heap will prefer
  key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
  multi_range_read_info_const(). All index cost calculation is now
  done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
  so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
2020-03-27 03:58:32 +02:00
Monty
f36ca142f7 Added page_range to records_in_range() to improve range statistics
Prototype change:
-  virtual ha_rows records_in_range(uint inx, key_range *min_key,
-                                   key_range *max_key)
+  virtual ha_rows records_in_range(uint inx, const key_range *min_key,
+                                   const key_range *max_key,
+                                   page_range *res)

The handler can ignore the page_range parameter. In the case the handler
updates the parameter, the optimizer can deduce the following:
- If previous range's last key is on the same block as next range's first
  key
- If the current key range is in one block
- We can also assume that the first and last block read are cached!
  This can be used for a better calculation of IO seeks when we
  estimate the cost of a range index scan.

The parameter is fully implemented for MyISAM, Aria and InnoDB.
A separate patch will update handler::multi_range_read_info_const() to
take the benefits of this change and also remove the double
records_in_range() calls that are not anymore needed.
2020-03-27 03:54:45 +02:00
Monty
37393bea23 Replace handler::primary_key_is_clustered() with handler::pk_is_clustering_key()
This was done to both simplify the code and also to be easier to handle
storage engines that are clustered on some other index than the primary
key.

As pk_is_clustering_key() and is_clustering_key now are using only
index_flags, these where removed from all storage engines.
2020-03-24 21:00:04 +02:00
Monty
91ab42a823 Clean up and speed up interfaces for binary row logging
MDEV-21605 Clean up and speed up interfaces for binary row logging
MDEV-21617 Bug fix for previous version of this code

The intention is to have as few 'if' as possible in ha_write() and
related functions. This is done by pre-calculating once per statement the
row_logging state for all tables.

Benefits are simpler and faster code both when binary logging is disabled
and when it's enabled.

Changes:
- Added handler->row_logging to make it easy to check it table should be
  row logged. This also made it easier to disabling row logging for system,
  internal and temporary tables.
- The tables row_logging capabilities are checked once per "statements
  that updates tables" in THD::binlog_prepare_for_row_logging() which
  is called when needed from THD::decide_logging_format().
- Removed most usage of tmp_disable_binlog(), reenable_binlog() and
  temporary saving and setting of thd->variables.option_bits.
- Moved checks that can't change during a statement from
  check_table_binlog_row_based() to check_table_binlog_row_based_internal()
- Removed flag row_already_logged (used by sequence engine)
- Moved binlog_log_row() to a handler::
- Moved write_locked_table_maps() to THD::binlog_write_table_maps() as
  most other related binlog functions are in THD.
- Removed binlog_write_table_map() and binlog_log_row_internal() as
  they are now obsolete as 'has_transactions()' is pre-calculated in
  prepare_for_row_logging().
- Remove 'is_transactional' argument from binlog_write_table_map() as this
  can now be read from handler.
- Changed order of 'if's in handler::external_lock() and wsrep_mysqld.h
  to first evaluate fast and likely cases before more complex ones.
- Added error checking in ha_write_row() and related functions if
  binlog_log_row() failed.
- Don't clear check_table_binlog_row_based_result in
  clear_cached_table_binlog_row_based_flag() as it's not needed.
- THD::clear_binlog_table_maps() has been replaced with
  THD::reset_binlog_for_next_statement()
- Added 'MYSQL_OPEN_IGNORE_LOGGING_FORMAT' flag to open_and_lock_tables()
  to avoid calculating of binary log format for internal opens. This flag
  is also used to avoid reading statistics tables for internal tables.
- Added OPTION_BINLOG_LOG_OFF as a simple way to turn of binlog temporary
  for create (instead of using THD::sql_log_bin_off.
- Removed flag THD::sql_log_bin_off (not needed anymore)
- Speed up THD::decide_logging_format() by remembering if blackhole engine
  is used and avoid a loop over all tables if it's not used
  (the common case).
- THD::decide_logging_format() is not called anymore if no tables are used
  for the statement. This will speed up pure stored procedure code with
  about 5%+ according to some simple tests.
- We now get annotated events on slave if a CREATE ... SELECT statement
  is transformed on the slave from statement to row logging.
- In the original code, the master could come into a state where row
  logging is enforced for all future events if statement could be used.
  This is now partly fixed.

Other changes:
- Ensure that all tables used by a statement has query_id set.
- Had to restore the row_logging flag for not used tables in
  THD::binlog_write_table_maps (not normal scenario)
- Removed injector::transaction::use_table(server_id_type sid, table tbl)
  as it's not used.
- Cleaned up set_slave_thread_options()
- Some more DBUG_ENTER/DBUG_RETURN, code comments and minor indentation
  changes.
- Ensure we only call THD::decide_logging_format_low() once in
  mysql_insert() (inefficiency).
- Don't annotate INSERT DELAYED
- Removed zeroing pos_in_table_list in THD::open_temporary_table() as it's
  already 0
2020-03-24 21:00:03 +02:00
Monty
4ef437558a Improve update handler (long unique keys on blobs)
MDEV-21606 Improve update handler (long unique keys on blobs)
MDEV-21470 MyISAM and Aria start_bulk_insert doesn't work with long unique
MDEV-21606 Bug fix for previous version of this code
MDEV-21819 2 Assertion `inited == NONE || update_handler != this'

- Move update_handler from TABLE to handler
- Move out initialization of update handler from ha_write_row() to
  prepare_for_insert()
- Fixed that INSERT DELAYED works with update handler
- Give an error if using long unique with an autoincrement column
- Added handler function to check if table has long unique hash indexes
- Disable write cache in MyISAM and Aria when using update_handler as
  if cache is used, the row will not be inserted until end of statement
  and update_handler would not find conflicting rows.
- Removed not used handler argument from
  check_duplicate_long_entries_update()
- Syntax cleanups
  - Indentation fixes
  - Don't use single character indentifiers for arguments
2020-03-24 21:00:02 +02:00
Monty
736998cb75 Cleanups & indentation changes
- Only indentation changes in sql_rename.cc
- Ignore some WSREP error messages when there isn't a internet connection
- Force restart of stat_tables_part.test to make result stable
- Fixed compiler warnings in CONNECT
2020-03-24 21:00:02 +02:00
Monty
6a9e24d046 Added support for replication for S3
MDEV-19964 S3 replication support

Added new configure options:
s3_slave_ignore_updates
"If the slave has shares same S3 storage as the master"

s3_replicate_alter_as_create_select
"When converting S3 table to local table, log all rows in binary log"

This allows on to configure slaves to have the S3 storage shared or
independent from the master.

Other thing:
Added new session variable '@@sql_if_exists' to force IF_EXIST to DDL's.
2020-03-24 21:00:02 +02:00
Sergey Vojtovich
da82e75901 handler::rebind()
- rename PFS specific rebind_psi() to generic rebind()
- call rebind independently of PFS compilation status
- allow rebind() return an error
2020-03-24 20:47:41 +02:00
Monty
305cffebab merge 10.4 to 10.5 2020-03-18 12:00:38 +02:00
Monty
1242eb3d32 Removed double records_in_range calls from multi_range_read_info_const
This was to remove a performance regression between 10.3 and 10.4
In 10.5 we will have a better implementation of records_in_range
that will enable us to get more statistics.
This change was not done in 10.4 because the 10.5 will be part of
a larger change that is not suitable for the GA 10.4 version

Other things:
- Changed default handler block_size to 8192 to fix things statistics
  for engines that doesn't set the block size.
- Fixed a bug in spider when using multiple part const ranges
  (Patch from Kentoku)
2020-03-17 02:16:48 +02:00
Andrei Elkin
c8ae357341 MDEV-742 XA PREPAREd transaction survive disconnect/server restart
Lifted long standing limitation to the XA of rolling it back at the
transaction's
connection close even if the XA is prepared.

Prepared XA-transaction is made to sustain connection close or server
restart.
The patch consists of

    - binary logging extension to write prepared XA part of
      transaction signified with
      its XID in a new XA_prepare_log_event. The concusion part -
      with Commit or Rollback decision - is logged separately as
      Query_log_event.
      That is in the binlog the XA consists of two separate group of
      events.

      That makes the whole XA possibly interweaving in binlog with
      other XA:s or regular transaction but with no harm to
      replication and data consistency.

      Gtid_log_event receives two more flags to identify which of the
      two XA phases of the transaction it represents. With either flag
      set also XID info is added to the event.

      When binlog is ON on the server XID::formatID is
      constrained to 4 bytes.

    - engines are made aware of the server policy to keep up user
      prepared XA:s so they (Innodb, rocksdb) don't roll them back
      anymore at their disconnect methods.

    - slave applier is refined to cope with two phase logged XA:s
      including parallel modes of execution.

This patch does not address crash-safe logging of the new events which
is being addressed by MDEV-21469.

CORNER CASES: read-only, pure myisam, binlog-*, @@skip_log_bin, etc

Are addressed along the following policies.
1. The read-only at reconnect marks XID to fail for future
   completion with ER_XA_RBROLLBACK.

2. binlog-* filtered XA when it changes engine data is regarded as
   loggable even when nothing got cached for binlog.  An empty
   XA-prepare group is recorded. Consequent Commit-or-Rollback
   succeeds in the Engine(s) as well as recorded into binlog.

3. The same applies to the non-transactional engine XA.

4. @@skip_log_bin=OFF does not record anything at XA-prepare
   (obviously), but the completion event is recorded into binlog to
   admit inconsistency with slave.

The following actions are taken by the patch.

At XA-prepare:
   when empty binlog cache - don't do anything to binlog if RO,
   otherwise write empty XA_prepare (assert(binlog-filter case)).

At Disconnect:
   when Prepared && RO (=> no binlogging was done)
     set Xid_cache_element::error := ER_XA_RBROLLBACK
     *keep* XID in the cache, and rollback the transaction.

At XA-"complete":
   Discover the error, if any don't binlog the "complete",
   return the error to the user.

Kudos
-----
Alexey Botchkov took to drive this work initially.
Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of
good recommendations.
Sergei Voitovich made a magnificent review and improvements to the code.
They all deserve a bunch of thanks for making this work done!
2020-03-14 22:45:48 +02:00
Sergei Golubchik
91d1588d30 Merge branch 'github/10.5' into 10.5 2020-03-14 09:52:35 +01:00
Marko Mäkelä
d82ac8d374 MDEV-21907: Fix some -Wconversion outside InnoDB
Some .c and .cc files are compiled as part of Mariabackup.
Enabling -Wconversion for InnoDB would also enable it for
Mariabackup. The .h files are being included during
InnoDB or Mariabackup compilation.

Notably, GCC 5 (but not GCC 4 or 6 or later versions)
would report -Wconversion for x|=y when the type is
unsigned char. So, we will either write x=(uchar)(x|y)
or disable the -Wconversion warning for GCC 5.

bitmap_set_bit(), bitmap_flip_bit(), bitmap_clear_bit(), bitmap_is_set():
Always implement as inline functions.
2020-03-12 19:44:52 +02:00
Oleksandr Byelkin
fad47df995 Merge branch '10.4' into 10.5 2020-03-11 17:52:49 +01:00
Sergei Golubchik
cbede21d0d cleanup: pass trxid by value 2020-03-10 19:24:23 +01:00
Sergei Golubchik
0d837e8153 perfschema table io instrumentation related changes 2020-03-10 19:24:23 +01:00
Eugene Kosov
7ccc1710a0 cleanup: key parts comparison
Engine specific code moved to engine.
2020-02-18 22:53:28 +03:00
Marko Mäkelä
28c89b7151 Merge 10.4 into 10.5 2019-12-16 07:47:17 +02:00
Oleksandr Byelkin
a15234bf4b Merge branch '10.3' into 10.4 2019-12-09 15:09:41 +01:00
Faustin Lammler
2df2238cb8 Lintian complains on spelling error
The lintian check complains on spelling error:
https://salsa.debian.org/mariadb-team/mariadb-10.3/-/jobs/95739
2019-12-02 12:41:13 +02:00
Aleksey Midenkov
8ed646f071 Merge 10.4 into 10.5 2019-12-02 13:35:54 +03:00
Kentoku
e066723a41 MDEV-18973 CLIENT_FOUND_ROWS wrong in spider
Get count from last_used_con->info
Contributed by willhan at Tencent Games
2019-11-29 23:23:57 +09:00
Aleksey Midenkov
5130f5206c MDEV-20480 Obsolete internal parser for FK in InnoDB
Currently InnoDB uses internal parser for adding foreign keys. Remove
internal parser and use data parsed by SQL parser (sql_yacc) for
adding foreign keys.

- create_table_info_t::create_foreign_keys() replacement for
  dict_create_foreign_constraints_low();
- Pass constraint name via Foreign_key object.

Temporary until MDEV-20865:

- Pass alter_info as part of create_info.
2019-11-20 13:18:31 +03:00
Vladislav Vaintroub
7e08dd85d6 MDEV-16264 prerequisite patch, ha_preshutdown.
This is a prerequisite patch required to remove Innodb's
thd_destructor_proxy thread.

The patch implement pre-shutdown functionality for handlers.

A storage engine might need to perform some work after all user
connections are shut down, but before killing off the plugins.

The reason is that an SE could still be using some of the
server infrastructure. In case of Innodb this would be purge threads,
that call into the server to calculate results of virtual function,
acquire MDL locks on tables, or possibly also use the audit plugins.
2019-11-15 16:50:22 +01:00
Sergei Petrunia
68ed3a81f2 MDEV-20854: ANALYZE for statements: not clear where the time is spent
Count the "gap" time between table accesses and display it as
r_other_time_ms in the "table" element.

* The advantage of this approach is that it doesn't add any new
  my_timer_cycles() calls.
* The disadvantage is that the definition of what is done during
  "other time" is not that clear: it includes checking the WHERE
  (for this table), constructing index lookup tuple (for the next table)
  writing to GROUP BY temporary table (as we dont account for that time
  separately [yet], etc)
2019-11-12 14:40:00 +03:00
Marko Mäkelä
d04f2de80a Merge 10.4 into 10.5 2019-10-11 08:41:36 +03:00
Marko Mäkelä
09afd3da1a Merge 10.3 into 10.4 2019-10-10 21:30:40 +03:00
Aleksey Midenkov
3c78d1b640 Fix Mroonga compilation
Fixed warnings: -Woverloaded-virtual, -Winconsistent-missing-override

Caused by c9cba59749
2019-10-10 11:13:05 +03:00
Aleksey Midenkov
c9cba59749 MDEV-17333 Assertion in update_auto_increment() upon exotic LOAD
While `handler::next_insert_id` is restored on duplicate key errors
`part_share->next_auto_inc_val` is not restored which causes
discrepancy.
2019-10-10 00:20:34 +03:00
Aleksey Midenkov
58fdf5b2fa MDEV-16144 Default TIMESTAMP clause for SELECT from versioned
1. Removed TIMESTAMP/TRANSACTION unit auto-detection in favor of default TIMESTAMP.

Reasons:

1.1. rare practical use and doubtful advantage of such auto-detection;
1.2. it conflicts with MDEV-16226 (TRX_ID-based versioned tables performance improvement).

Needless check_unit membership removed.

2. SQL: versioning type handling refactoring

Vers_type_handler hierarchy stores versioning properties of type.

virtual Type_handler::vers() accesses specialization of
Vers_type_handler for specific type.

virtual Vers_type_handler::kind() returns versioning kind
(timestamp/trx_id).

Removed Type_handler::Vers_history_point_check_unit() in favor of
Type_handler::vers().

Renames:

require_timestamp() -> require_timestamp_error()
require_trx_id() -> require_trx_id_error()

EDIT by Alexander Barkov (@abarkov):

check_sys_fields() moved to Vers_type_handler::check_sys_fields()
2019-09-30 14:05:09 +03:00
Marko Mäkelä
d28686ada6 Merge 10.4 into 10.5 2019-09-12 16:36:46 +03:00
Marko Mäkelä
60c04be659 Merge 10.3 into 10.4 2019-09-12 12:16:40 +03:00
Nikita Malyavin
f6a7730c45 MDEV-16490: It's possible to make a system versioned table without any versioning field
* do not allow versioned table to be without versioned (non-system) fields
* prohibit changing field versioning, when removing table versioning
* handle CREATE...SELECT as well
2019-09-09 20:14:47 +03:00
Marko Mäkelä
4081b7b27a Merge 10.4 into 10.5 2019-09-06 17:16:40 +03:00
Sergei Golubchik
244f0e6dd8 Merge branch '10.3' into 10.4 2019-09-06 11:53:10 +02:00