1
0
mirror of https://github.com/MariaDB/server.git synced 2025-11-08 00:28:29 +03:00
Commit Graph

4584 Commits

Author SHA1 Message Date
Monty
6f65e08277 MDEV-33118 optimizer_adjust_secondary_key_costs variable
optimizer-adjust_secondary_key_costs is added to provide 2 small
adjustments to the 10.x optimizer cost model. This can be used in the
case where the optimizer wrongly uses a secondary key instead of a
clustered primary key.

The reason behind this change is that MariaDB 10.x does not take into
account that for engines like InnoDB, that scanning a primary key can be
up to 7x faster than scanning a secondary key + read the row data trough
the primary key.

The different values for optimizer_adjust_secondary_key_costs are:

optimizer_adjust_secondary_key_costs=0
- No changes to current model

optimizer_adjust_secondary_key_costs=1
- Ensure that the cost of of secondary indexes has a cost of at
  least 5x times the cost of a clustered primary key (if one exists).
  This disables part of the worst_seek optimization described below.

optimizer_adjust_secondary_key_costs=2
- Disable "worst_seek optimization" and adjust filter cost slightly
  (add cost of 1 if filter is used).

The idea behind 'worst_seek optimization' is that we limit the
cost for all non clustered ref access to the least of:
- best-rows-by-range (or all rows in no range found) / 10
- scan-time-table (roughly number of file blocks to scan table) * 3

In addition we also do not try to use rowid_filter if number of rows
estimated for 'ref' access is less than the worst_seek limitation.

The idea is that worst_seek is trying to take into account that if
we do a lot of accesses through a key, this is likely to be cached.
However it only does this for secondary keys, and not for clustered
keys or index only reads.

The effect of the worst_seek are:
- In some cases 'ref' will have a much lower cost than range or using
  a clustered key.
- Some possible rowid filters for secondary keys will be ignored.

When implementing optimizer_adjust_secondary_key_costs=2, I noticed
that there is a slightly different costs for how ref+filter and
range+filter are calculated.  This caused a lot of range and
range+filter to change to ref+filter, which is not good as
range+filter provides the optimizer a better estimate of how many
accepted rows there will be in the result set.
Adding a extra small cost (1 seek) when using filter mitigated the
above problems in almost all cases.

This patch should not be applied to MariaDB 11.0 as worst_seeks is
removed in 11.0 and the cost calculation for clustered keys, secondary
keys, index scan and filter is more exact.

Test case changes for --optimizer-adjust_secondary_key_costs=1
(Fix secondary key costs to be 5x of primary key):

- stat_tables_innodb:
  - Complex change (probably ok as number of rows are really small)
    - ref over 1 row changed to range over 10 rows with join buffer
    - ref over 5 rows changed to eq_ref
    - secondary ref over 1 row changed to ref of primary key over 4 rows
    - Change of key to use longer key with index pushdown (a little
      bit worse but not significant).
  - Change to use secondary (1 row) -> primary (4 rows)
- rowid_filter_innodb:
  - index_merge (2 rows) & ref (1) -> all (23 rows) -> primary eq_ref.

Test case changes for --optimizer-adjust_secondary_key_costs=2
(remove of worst_seeks & adjust filter cost):

- stat_tables_innodb:
  - Join order change (probably ok as number of rows are really small)
  - ref (5 rows) & ref(1 row) changed to range (10 rows & join buffer)
    & eq_ref.
- selectivity_innodb:
  - ref -> ref|filter  (ok)
- rowid_filter_innodb:
  - ref -> ref|filter (ok)
  - range|filter (64 rows) changed to ref|filter (128 rows).
    ok as ref|filter outputs wrong number of rows in explain.
- range, range_mrr_icp:
  -ref (500 rows -> ALL (1000 rows) (ok)
- select_pkeycache, select, select_jcl6:
  - ref|filter (2 rows) -> ref (2 rows) (ok)
- selectivity:
  - ref -> ref_filter (ok)
- range:
  - Change of 'filtered' but no stat or plan change (ok)
- selectivity:
 - ref -> ref+filter (ok)
 - Change of filtered but no plan change (ok)
- join_nested_jcl6:
  - range -> ref|filter (ok as only 2 rows)
- subselect3, subselect3_jcl6:
  - ref_or_null (4 rows) -> ALL (10 rows) (ok)
  - Index_subquery (4 rows) -> ALL (10 rows)  (ok)
- partition_mrr_myisam, partition_mrr_aria and partition_mrr_innodb:
  - Uses ALL instead of REF for a key value that is the same for > 50%
    of rows.  (good)
order_by_innodb:
  - range (200 rows) -> ref (20 rows)+filesort (ok)
- subselect_sj2_mat:
  - One test changed. One ALL removed and replaced with eq_ref. Likely
    to be better.
- join_cache:
  - Changed ref over 60% of the rows to use hash join (ok)
- opt_tvc:
  - Changed to use eq_ref instead of ref with plan change (probably ok)
- opt_trace:
  - No worst/max seeks clipping (good).
  - Almost double range_scan_time and index_scan_time (ok).
- rowid_filter:
  - ref -> ref|filtered (ok)
  - range|filter (77 rows) changed to ref|filter (151 rows).  Proably
    ok as ref|filter outputs wrong number of rows in explain.

Reviewer: Sergei Petrunia <sergey@mariadb.com>
2024-01-23 13:03:11 +02:00
Michael Widenius
7af50e4df4 MDEV-32551: "Read semi-sync reply magic number error" warnings on master
rpl_semi_sync_slave_enabled_consistent.test and the first part of
the commit message comes from Brandon Nesterenko.

A test to show how to induce the "Read semi-sync reply magic number
error" message on a primary. In short, if semi-sync is turned on
during the hand-shake process between a primary and replica, but
later a user negates the rpl_semi_sync_slave_enabled variable while
the replica's IO thread is running; if the io thread exits, the
replica can skip a necessary call to kill_connection() in
repl_semisync_slave.slave_stop() due to its reliance on a global
variable. Then, the replica will send a COM_QUIT packet to the
primary on an active semi-sync connection, causing the magic number
error.

The test in this patch exits the IO thread by forcing an error;
though note a call to STOP SLAVE could also do this, but it ends up
needing more synchronization. That is, the STOP SLAVE command also
tries to kill the VIO of the replica, which makes a race with the IO
thread to try and send the COM_QUIT before this happens (which would
need more debug_sync to get around). See THD::awake_no_mutex for
details as to the killing of the replica’s vio.

Notes:
- The MariaDB documentation does not make it clear that when one
  enables semi-sync replication it does not matter if one enables
  it first in the master or slave. Any order works.

Changes done:
- The rpl_semi_sync_slave_enabled variable is now a default value for
  when semisync is started. The variable does not anymore affect
  semisync if it is already running. This fixes the original reported
  bug.  Internally we now use repl_semisync_slave.get_slave_enabled()
  instead of rpl_semi_sync_slave_enabled. To check if semisync is
  active on should check the @@rpl_semi_sync_slave_status variable (as
  before).
- The semisync protocol conflicts in the way that the original
  MySQL/MariaDB client-server protocol was designed (client-server
  send and reply packets are strictly ordered and includes a packet
  number to allow one to check if a packet is lost). When using
  semi-sync the master and slave can send packets at 'any time', so
  packet numbering does not work. The 'solution' has been that each
  communication starts with packet number 1, but in some cases there
  is still a chance that the packet number check can fail.  Fixed by
  adding a flag (pkt_nr_can_be_reset) in the NET struct that one can
  use to signal that packet number checking should not be done. This
  is flag is set when semi-sync is used.
- Added Master_info::semi_sync_reply_enabled to allow one to configure
  some slaves with semisync and other other slaves without semisync.
  Removed global variable semi_sync_need_reply that would not work
  with multi-master.
- Repl_semi_sync_master::report_reply_packet() can now recognize
  the COM_QUIT packet from semisync slave and not give a
  "Read semi-sync reply magic number error" error for this case.
  The slave will be removed from the Ack listener.
- On Windows, don't stop semisync Ack listener just because one
  slave connection is using socket_id > FD_SETSIZE.
- Removed busy loop in Ack_receiver::run() by using
 "Self-pipe trick" to signal new slave and stop Ack_receiver.
- Changed some Repl_semi_sync_slave functions that always returns 0
  from int to void.
- Added Repl_semi_sync_slave::slave_reconnect().
- Removed dummy_function Repl_semi_sync_slave::reset_slave().
- Removed some duplicate semisync notes from the error log.
- Add test of "if (get_slave_enabled() && semi_sync_need_reply)"
  before calling Repl_semi_sync_slave::slave_reply().
  (Speeds up the code as we can skip all initializations).
- If epl_semisync_slave.slave_reply() fails, we disable semisync
  for that connection.
- We do not call semisync.switch_off() if there are no active slaves.
  Instead we check in Repl_semi_sync_master::commit_trx() if there are
  no active threads. This simplices the code.
- Changed assert() to DBUG_ASSERT() to ensure that the DBUG log is
  flushed in case of asserts.
- Removed the internal rpl_semi_sync_slave_status as it is not needed
  anymore. The @@rpl_semi_sync_slave_status status variable is now
  mapped to rpl_semi_sync_enabled.
- Removed rpl_semi_sync_slave_enabled  as it is not needed anymore.
  Repl_semi_sync_slave::get_slave_enabled() contains the active status.
- Added checking that we do not add a slave twice with
  Ack_receiver::add_slave(). This could happen with old code.
- Removed Repl_semi_sync_master::check_and_switch() as it is not
  needed anymore.
- Ensure that when we call Ack_receiver::remove_slave() that the slave
  is removed from the listener before function returns.
- Call listener.listen_on_sockets() outside of mutex for better
  performance and less contested mutex.
- Ensure that listening is ignoring newly added slaves when checking for
  responses.
- Fixed the master ack_receiver listener is not killed if there are no
  connected slaves (and thus stop semisync handling of future
  connections). This could happen if all slaves sockets where would be
  marked as unreliable.
- Added unlink() to base_ilist_iterator and remove() to
  I_List_iterator. This enables us to remove 'dead' slaves in
  Ack_recever::run().
- kill_zombie_dump_threads() now does killing of dump threads properly.
  - It can now kill several threads (should be impossible but could
    happen if IO slaves reconnects very fast).
  - We now wait until the dump thread is done before starting the
    dump.
- Added an error if kill_zombie_dump_threads() fails.
- Set thd->variables.server_id before calling
  kill_zombie_dump_threads(). This simplies the code.
- Added a lot of comments both in code and tests.
- Removed DBUG_EVALUATE_IF "failed_slave_start" as it is not used.

Test changes:
- rpl.rpl_session_var2 added which runs rpl.rpl_session_var test with
  semisync enabled.
- Some timings changed slight with startup of slave which caused
  rpl_binlog_dump_slave_gtid_state_info.text to fail as it checked the
  error log file before the slave had started properly. Fixed by
  adding wait_for_pattern_in_file.inc that allows waiting for the
  pattern to appear in the log file.
- Tests have been updated so that we first set
  rpl_semi_sync_master_enabled on the master and then set
  rpl_semi_sync_slave_enabled on the slaves (this is according to how
  the MariaDB documentation document how to setup semi-sync).
- Error text "Master server does not have semi-sync enabled" has been
  replaced with "Master server does not support semi-sync" for the
  case when the master supports semi-sync but semi-sync is not
  enabled.

Other things:
- Some trivial cleanups in Repl_semi_sync_master::update_sync_header().
- We should in 11.3 changed the default value for
  rpl-semi-sync-master-wait-no-slave from TRUE to FALSE as the TRUE
  does not make much sense as default. The main difference with using
  FALSE is that we do not wait for semisync Ack if there are no slave
  threads.  In the case of TRUE we wait once, which did not bring any
  notable benefits except slower startup of master configured for
  using semisync.

Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com>

This solves the problem reported in MDEV-32960 where a new
slave may not be registered in time and the master disables
semi sync because of that.
2024-01-23 13:03:11 +02:00
Marko Mäkelä
3a96eba25f Merge 10.5 into 10.6 2024-01-17 13:35:05 +02:00
Alexander Barkov
fa3171df08 MDEV-27666 User variable not parsed as geometry variable in geometry function
Adding GEOMETRY type user variables.
2024-01-16 18:53:23 +04:00
Marko Mäkelä
8bd5a3de7f Merge 10.5 into 10.6 2024-01-03 14:24:47 +02:00
Marko Mäkelä
3a3a4f044f Merge 10.4 into 10.5 2024-01-03 12:07:51 +02:00
Alexander Barkov
9695974e4b MDEV-33019 The database part is not case sensitive in SP names
Problem:

sp_cache erroneously looked up fully qualified SP names (e.g. `DB`.`SP`),
in case insensitive style. It was wrong, because only the "name"
part is always case insensitive, while the "db" part should be compared
according to lower_case_table_names (case sensitively for 0,
case insensitively for 1 and 2).

Fix:

Adding a "casedn_name" parameter make_qname() to tell
if the name part should be lower cased:
  `DB1`.`SP` -> "DB1.SP"  (when casedn_name=false)
  `DB1`.`SP` -> "DB1.sp"  (when casedn_name=true)
and using make_qname() with casedn_name=true when creating
sp_cache hash lookup keys.

Details:

As a result, it now works as follows:
- sp_head::m_db is converted to lower case if lower_case_table_names>0
  during the sp_name initialization phase. So when make_qname() is called,
  sp_head::m_db is already normalized. There are no changes in here.

- The initialization phase of sp_head when creating sp_head::m_qname
  now calls make_qname() with casedn_name=true,
  so sp_head::m_name gets written to sp_head::m_qname in lower case.

- sp_cache_lookup() now also calls make_qname() with casedn_name=true,
  so sp_head::m_name gets written to the temporary lookup key in lower case.

- sp_cache::m_hashtable now uses case sensitive comparison
2023-12-27 13:41:42 +04:00
Alexander Barkov
916caac2a5 MDEV-33019 The database part is not case sensitive in SP names
Part#1 A non-functional change

Changing the signature of Identifier_chain2::make_qname() from

  bool make_qname(MEM_ROOT *mem_root, LEX_CSTRING *dst) const;

to

  LEX_CSTRING make_qname(MEM_ROOT *mem_root) const;

Now the result is returned as LEX_CSTRING from the function rather than
is passed as a parameter.
The return value {NULL,0} means "EOM".
2023-12-27 13:22:49 +04:00
Alexander Barkov
371bf4abc6 A 11.3->10.4 backport for MDEV-31991 Split class Database_qualified_name
This is a requirement step to fix and merge easier
  MDEV-33019 The database part is not case sensitive in SP names

The original MDEV-31991 commit commend:

- Moving some of Database_qualified_name methods into a new class
  Identifier_chain2.

- Changing the data type of the following variables from
  Database_qualified_name to Identifier_chain2:

  * q_pkg_proc in LEX::call_statement_start()
  * q_pkg_func in LEX::make_item_func_call_generic()

Rationale:

The data type of Database_qualified_name::m_db will be changed
to Lex_ident_db soon. So Database_qualified_name won't be able
to store the `pkg.routine` part of `db.pkg.routine` any more,
because `pkg` must not depend on lower-case-table-names.
2023-12-27 13:02:58 +04:00
Sergei Golubchik
e95bba9c58 Merge branch '10.5' into 10.6 2023-12-17 11:20:43 +01:00
Sergei Golubchik
98a39b0c91 Merge branch '10.4' into 10.5 2023-12-02 01:02:50 +01:00
Monty
dc1165419a Do not use MEM_ROOT in set_killed_no_mutex()
The reason for this change are the following:
- If we call set_killed() from one thread to kill another thread with
  a message, there may be concurrent usage of the MEM_ROOT which is
  not supported (this could cause memory corruption).
  We do not currently have code that does this, but the API allows this
  and it is better to be fix the issue before it happens.
- The per thread memory tracking does not work if one thread uses
  another threads MEM_ROOT.
- set_killed() can be called if a MEM_ROOT allocation fails.  In this case
  it is not good to try to allocate more memory from potentially the same
  MEM_ROOT.

Fix is to use my_malloc() instead of mem_root for killed messages.
2023-11-27 19:08:14 +02:00
Dmitry Shulga
5064750fbf MDEV-32466: Potential memory leak on executing of create view statement
This patch is actually follow-up for the task
  MDEV-23902: MariaDB crash on calling function
to use correct query arena for a statement. In case invocation of
a function is in progress use its call arena, else use current
query arena that can be either a statement or a regular query arena.
2023-11-24 16:26:12 +07:00
Oleksandr Byelkin
b83c379420 Merge branch '10.5' into 10.6 2023-11-08 15:57:05 +01:00
Kristian Nielsen
2a4c573339 MDEV-32728: Wrong mutex usage 'LOCK_thd_data' and 'wait_mutex'
Checking for kill with thd_kill_level() or check_killed() runs apc
requests, which takes the LOCK_thd_kill mutex. But this is dangerous,
as checking for kill needs to be called while holding many different
mutexes, and can lead to cyclic mutex dependency and deadlock.

But running apc is only "best effort", so skip running the apc if the
LOCK_thd_kill is not available. The apc will then be run on next check
of kill signal.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-11-08 14:50:43 +01:00
Oleksandr Byelkin
6cfd2ba397 Merge branch '10.4' into 10.5 2023-11-08 12:59:00 +01:00
Oleksandr Byelkin
fefd6d5559 MDEV-32656: ASAN errors in base_list_iterator::next / setup_table_map upon 2nd execution of PS
Correctly supress error issuing when saving value in field for comporison
2023-11-08 12:08:23 +01:00
Monty
2447172afb Ensure that process "State" is properly cleaned after query execution
In some cases "SHOW PROCESSLIST" could show "Reset for next command"
as State, even if the previous query had finished properly.

Fixed by clearing State after end of command and also setting the State
for the "Connect" command.

Other things:
- Changed usage of 'thd->set_command(COM_SLEEP)' to
  'thd->mark_connection_idle()'.
- Changed thread_state_info() to return "" instead of NULL. This is
  just a safety measurement and in line with the logic of the
  rest of the function.
2023-11-07 10:07:30 +02:00
Alexey Botchkov
3a8eb405e7 MDEV-27832 disable binary logging for SQL SERVICE.
Binary logging is now disabled for the queries run by SQL SERVICE.
The binlogging can be turned on with the 'SET SQL_LOG_BIN=On' query.

Conflicts:
	sql/sql_prepare.cc

Conflicts:
	sql/sql_prepare.cc
2023-11-05 23:35:31 +04:00
Alexey Botchkov
1fa196a559 MDEV-27595 Backport SQL service, introduced by MDEV-19275.
The SQL SERVICE backported into the 10.4.
2023-11-05 23:35:31 +04:00
Oleksandr Byelkin
df93b4f259 Fix MDEV-30820 problem found by Monty 2023-11-02 07:03:32 +01:00
Alexander Barkov
d2d657e722 MDEV-31187 Add class Sql_mode_save_for_frm_handling 2023-10-23 13:44:31 +04:00
Sergei Golubchik
f293b2b211 cleanup 2023-10-17 14:32:05 +02:00
Monty
fdcb443e62 Remember first error in Dummy_error_handler
Use Dummy_error_handler in open_stat_tables() to ignore all errors
when opening statistics tables.
2023-10-10 11:12:26 +03:00
Monty
d4347177c7 Change SEL_ARG::MAX_SEL_ARGS to a user defined variable optimizer_max_sel_args
This allows a user to to change the default value of MAX_SEL_ARGS (16000)
in the rare case where they neeed more generated SEL_ARGS (as part of
the range optimizer)
2023-10-03 08:25:31 +03:00
Monty
4e9322e2ff MDEV-32203 Raise notes when an index cannot be used on data type mismatch
Raise notes if indexes cannot be used:
- in case of data type or collation mismatch (diferent error messages).
- in case if a table field was replaced to something else
  (e.g. Item_func_conv_charset) during a condition rewrite.

Added option to write warnings and notes to the slow query log for
slow queries.

New variables added/changed:

- note_verbosity, with is a set of the following options:
  basic            - All old notes
  unusable_keys    - Print warnings about keys that cannot be used
                     for select, delete or update.
  explain          - Print unusable_keys warnings for EXPLAIN querys.

The default is 'basic,explain'. This means that for old installations
the only notable new behavior is that one will get notes about
unusable keys when one does an EXPLAIN for a query. One can turn all
of all notes by either setting note_verbosity to "" or setting sql_notes=0.

- log_slow_verbosity has a new option 'warnings'. If this is set
  then warnings and notes generated are printed in the slow query log
  (up to log_slow_max_warnings times per statement).

- log_slow_max_warnings   - Max number of warnings written to
                            slow query log.

Other things:
- One can now use =ALL for any 'set' variable to set all options at once.
  For example using "note_verbosity=ALL" in a config file or
  "SET @@note_verbosity=ALL' in SQL.
- mysqldump will in the future use @@note_verbosity=""' instead of
  @sql_notes=0 to disable notes.
- Added "enum class Data_type_compatibility" and changing the return type
  of all Field::can_optimize*() methods from "bool" to this new data type.

Reviewer & Co-author: Alexander Barkov <bar@mariadb.com>
- The code that prints out the notes comes mainly from Alexander
2023-10-03 08:25:31 +03:00
Monty
e3b36b8f1b MDEV-31957 Concurrent ALTER and ANALYZE collecting statistics can result in stale statistical data
Example of what causes the problem:
T1: ANALYZE TABLE starts to collect statistics
T2: ALTER TABLE starts by deleting statistics for all changed fields,
    then creates a temp table and copies data to it.
T1: ANALYZE ends and writes to the statistics tables.
T2: ALTER TABLE renames temp table in place of the old table.

Now the statistics from analyze matches the old deleted tables.

Fixed by waiting to delete old statistics until ALTER TABLE is
the only one using the old table and ensure that rename of columns
can handle swapping of column names.

rename_columns_in_stat_table() (former rename_column_in_stat_tables())
now takes a list of columns to rename. It uses the following algorithm
to update column_stats to be able to handle circular renames

- While there are columns to be renamed and it is the first loop or
  last rename loop did change something.
  - Loop over all columns to be renamed
    - Change column name in column_stat
      - If fail because of duplicate key
      - If this is first change attempt for this column
         - Change column name to a temporary column name
         - If there was a conflicting row, replace it with the current row.
    else
     - Remove entry from column list

- Loop over all remaining columns in the list
 - Remove the conflicting row
 - Change column from temporary name to final name in column_stat

Other things:
- Don't flush tables for every operation. Only flush when all updates
  are done.
- Rename of columns was not handled in case of ALGORITHM=copy (old bug).
  - Fixed that we do not collect statistics for hidden hash columns
    used by UNIQUE constraint on long values.
  - Fixed that we do not collect statistics for blob columns referred by
    generated virtual columns. This was achieved by storing the fields for
    which we want to have statistics in table->has_value_set instead of
    in table->read_set.
- Rename of indexes was not handled for persistent statistics.
  - This is now handled similar as rename of columns. Renamed columns
    are now stored in 'rename_stat_indexes' and handled in
    Alter_info::delete_statistics() together with drooped indexes.
- ALTER TABLE .. ADD INDEX may instead of creating a new index rename
  an existing generated foreign key index. This was not reflected in
  the index_stats table because this was handled in
  mysql_prepare_create_table instead instead of in the mysql_alter() code.
  Fixed by adding a call in mysql_prepare_create_table() to drop the
  changed index.
  I also had to change the code that 'marked the index' to be ignored
  with code that would not destroy the original index name.

Reviewer: Sergei Petrunia <sergey@mariadb.com>
2023-10-03 08:25:30 +03:00
Jan Lindström
f57deb314f MDEV-31660 : Assertion `client_state.transaction().active() in wsrep_append_key
At the moment we cannot support
wsrep_forced_binlog_format=[MIXED|STATEMENT]
during CREATE TABLE AS SELECT.
Statement will use ROW instead and give
a warning.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-09-29 12:54:04 +02:00
Marko Mäkelä
8096139b3a Merge 10.5 into 10.6 2023-09-19 10:47:26 +03:00
Marko Mäkelä
6c05edfdcd Merge 10.4 into 10.5 2023-09-19 10:20:09 +03:00
Dmitry Shulga
68353dc92a MDEV-23902: MariaDB crash on calling function
On creation of a VIEW that depends on a stored routine an instance of
the class Item_func_sp is allocated on a memory root of SP statement.
It happens since mysql_make_view() calls the method
  THD::activate_stmt_arena_if_needed()
before parsing definition of the view.
On the other hand, when sp_head's rcontext is created an instance of
the class Field referenced by the data member
  Item_func_sp::result_field
is allocated on the Item_func_sp's Query_arena (call arena) that set up
inside the method
  Item_sp::execute_impl
just before calling the method
  sp_head::execute_function()

On return from the method sp_head::execute_function() all items allocated
on the Item_func_sp's Query_arena are released and its memory root is freed
(see implementation of the method Item_sp::execute_impl). As a consequence,
the pointer
  Item_func_sp::result_field
references to the deallocated memory. Later, when the method
  sp_head::execute
cleans up items allocated for just executed SP instruction the method
  Item_func_sp::cleanup is invoked and tries to delete an object referenced
by data member Item_func_sp::result_field that points to already deallocated
memory, that results in a server abnormal termination.

To fix the issue the current active arena shouldn't be switched to
a statement arena inside the function mysql_make_view() that invoked indirectly
by the method sp_head::rcontext_create. It is implemented by introducing
the new Query_arena's state STMT_SP_QUERY_ARGUMENTS that is set when explicit
Query_arena is created for placing SP arguments and other caller's side items
used during SP execution. Then the method THD::activate_stmt_arena_if_needed()
checks Query_arena's state and returns immediately without switching to
statement's arena.
2023-09-19 08:57:36 +07:00
Marko Mäkelä
0dd25f28f7 Merge 10.5 into 10.6 2023-09-11 14:46:39 +03:00
Marko Mäkelä
f8f7d9de2c Merge 10.4 into 10.5 2023-09-11 11:29:31 +03:00
Kristian Nielsen
e937a64d46 MDEV-10356: rpl.rpl_parallel_temptable failure due to incorrect commit optimization of temptables
The problem was that parallel replication of temporary tables using
statement-based binlogging could overlap the COMMIT in one thread with a DML
or DROP TEMPORARY TABLE in another thread using the same temporary table.
Temporary tables are not safe for concurrent access, so this caused
reference to freed memory and possibly other nastiness.

The fix is to disable the optimisation with overlapping commits of one
transaction with the start of a later transaction, when temporary tables are
in use. Then the following event groups will be blocked from starting until
the one using temporary tables is completed.

This also fixes occasional test failures of rpl.rpl_parallel_temptable seen
in Buildbot.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-09-07 14:40:05 +02:00
Oleksandr Byelkin
5ea5291d97 Merge branch '10.5' into 10.6 2023-08-04 07:52:54 +02:00
Sergei Golubchik
ab1191c039 cleanup: key->key_create_info.check_for_duplicate_indexes -> key->old
mark old keys in the ALTER TABLE with the `old` flag, not with
the `key_create_info.check_for_duplicate_indexes`.

This allows to mark old foreign keys too.
2023-08-01 22:43:16 +02:00
Sergei Golubchik
b8233b38da cleanup: put db/table_name into Alter_info
also, prefer Lex_table_name and Lex_ident over LEX_CSTRING
2023-08-01 22:43:16 +02:00
Sergei Golubchik
383baa812e cleanup: invert return code 2023-08-01 22:42:24 +02:00
Oleksandr Byelkin
6bf8483cac Merge branch '10.5' into 10.6 2023-08-01 15:08:52 +02:00
Oleksandr Byelkin
7564be1352 Merge branch '10.4' into 10.5 2023-07-26 16:02:57 +02:00
Sergei Petrunia
6e484c3bd9 MDEV-31577: Make ANALYZE FORMAT=JSON print innodb stats
ANALYZE FORMAT=JSON output now includes table.r_engine_stats which
has the engine statistics. Only non-zero members are printed.

Internally: EXPLAIN data structures Explain_table_acccess and
Explain_update now have handler* handler_for_stats pointer.
It is used to read statistics from handler_for_stats->handler_stats.

The following applies only to 10.9+, backport doesn't use it:

Explain data structures exist after the tables are closed. We avoid
walking invalid pointers using this:
- SQL layer calls Explain_query::notify_tables_are_closed() before
  closing tables.
- After that call, printing of JSON output is disabled. Non-JSON output
  can be printed but we don't access handler_for_stats when doing that.
2023-07-21 16:50:11 +03:00
Alexander Barkov
400c101332 MDEV-30662 SQL/PL package body does not appear in I_S.ROUTINES.ROUTINE_DEFINITION
- Moving the code from a public function trim_whitespaces()
  to the class Lex_cstring as methods. This code may
  be useful in other contexts, and also this code becomes
  visible inside sql_class.h

- Adding a helper method THD::strmake_lex_cstring_trim_whitespaces()

- Unifying the way how CREATE PROCEDURE/CREATE FUNCTION and
  CREATE PACKAGE/CREATE PACKAGE BODY work:

  a) Now CREATE PACKAGE/CREATE PACKAGE BODY also calls
  Lex->sphead->set_body_start() to remember the cpp body start inside
  an sp_head member.

  b) adding a "const char *cpp_body_end" parameter to
  sp_head::set_stmt_end().

  These changes made it possible to reuse sp_head::set_stmt_end() inside
  LEX::create_package_finalize() and remove the duplucate code.

- Renaming sp_head::m_body_begin to m_cpp_body_begin and adding a comment
  to make it clear that this member is used only during parsing, and
  points to a fragment inside the cpp buffer.

- Changed sp_head::set_body_start() and sp_head::set_stmt_end()
  to skip the calls related to "body_utf8" in cases when m_parent is not NULL.
  A non-NULL m_parent means that we're inside a package routine.
  "body_utf8" in such case belongs not to the current sphead itself,
  but to parent (the package) sphead.
  So an sphead instance of a package routine should neither initialize,
  nor finalize, nor change in any other ways the "body_utf8" related
  members of Lex_input_stream, and should not take over or copy "body_utf8"
  data from Lex_input_stream to "this".
2023-07-14 13:26:26 +04:00
Kristian Nielsen
5d61442c85 MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica
The problem is that when a worker thread is (user) killed in
wait_for_prior_commit, the event group may complete out-of-order since the
wait for prior commit was aborted by the kill.

This fix ensures that event groups will always complete in-order, even
in the error case. This is done in finish_event_group() by doing an
extra wait_for_prior_commit(), if necessary, that ignores kills.

This fix supersedes the fix for MDEV-30780, so the earlier fix for
that is reverted in this patch.

Also fix that an error from wait_for_prior_commit() inside
finish_event_group() would not signal the error to
wakeup_subsequent_commits().

Based on earlier work by Brandon Nesterenko and Andrei Elkin, with
some changes to simplify the semantics of wait_for_prior_commit() and
make the code more robust to future changes.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-07-12 09:41:32 +02:00
Brandon Nesterenko
9808ebe195 MDEV-30978: On slave XA COMMIT/XA ROLLBACK fail to return an error in read-only mode
Where a read-only server permits writes through replication, it
should not permit user connections to commit/rollback XA
transactions prepared via replication. The bug reported in
MDEV-30978 shows that this can happen. This is because there is no
read only check in the XA transaction logic, the most relevant one
occurs in ha_commit_trans() for normal statements/transactions.

This patch extends the XA transaction logic to check the read only
status of the server before performing an XA COMMIT or ROLLBACK.

Reviewed By:
Andrei Elkin <andrei.elkin@mariadb.com>
2023-07-11 07:49:44 -06:00
Monty
99bd226059 MDEV-31558 Add InnoDB engine information to the slow query log
The new statistics is enabled by adding the "engine", "innodb" or "full"
option to --log-slow-verbosity

Example output:

 # Pages_accessed: 184  Pages_read: 95  Pages_updated: 0  Old_rows_read: 1
 # Pages_read_time: 17.0204  Engine_time: 248.1297

Page_read_time is time doing physical reads inside a storage engine.
(Writes cannot be tracked as these are usually done in the background).
Engine_time is the time spent inside the storage engine for the full
duration of the read/write/update calls. It uses the same code as
'analyze statement' for calculating the time spent.

The engine statistics is done with a generic interface that should be
easy for any engine to use. It can also easily be extended to provide
even more statistics.

Currently only InnoDB has counters for Pages_% and Undo_% status.
Engine_time works for all engines.

Implementation details:

class ha_handler_stats holds all engine stats.  This class is included
in handler and THD classes.
While a query is running, all statistics is updated in the handler. In
close_thread_tables() the statistics is added to the THD.

handler::handler_stats is a pointer to where statistics should be
collected. This is set to point to handler::active_handler_stats if
stats are requested. If not, it is set to 0.
handler_stats has also an element, 'active' that is 1 if stats are
requested. This is to allow engines to avoid doing any 'if's while
updating the statistics.

Cloned or partition tables have the pointer set to the base table if
status are requested.

There is a small performance impact when using --log-slow-verbosity=engine:
- All engine calls in 'select' will be timed.
- IO calls for InnoDB reads will be timed.
- Incrementation of counters are done on local variables and accesses
  are inline, so these should have very little impact.
- Statistics has to be reset for each statement for the THD and each
  used handler. This is only 40 bytes, which should be neglectable.
- For partition tables we have to loop over all partitions to update
  the handler_status as part of table_init(). Can be optimized in the
  future to only do this is log-slow-verbosity changes. For this to work
  we have to update handler_status for all opened partitions and
  also for all partitions opened in the future.

Other things:
- Added options 'engine' and 'full' to log-slow-verbosity.
- Some of the new files in the test suite comes from Percona server, which
  has similar status information.
- buf_page_optimistic_get(): Do not increment any counter, since we are
  only validating a pointer, not performing any buf_pool.page_hash lookup.
- Added THD argument to save_explain_data_intern().
- Switched arguments for save_explain_.*_data() to have
  always THD first (generates better code as other functions also have THD
  first).
2023-07-07 12:53:18 +03:00
Brandon Nesterenko
8ed88e3455 Revert "MDEV-13915: STOP SLAVE takes very long time on a busy system"
This reverts commit 0a99d457b3
because it should go into only 10.5+
2023-06-06 08:11:38 -06:00
Brandon Nesterenko
0a99d457b3 MDEV-13915: STOP SLAVE takes very long time on a busy system
The problem is that a parallel replica would not immediately stop
running/queued transactions when issued STOP SLAVE. That is, it
allowed the current group of transactions to run, and sometimes the
transactions which belong to the next group could be started and run
through commit after STOP SLAVE was issued too, if the last group
had started committing. This would lead to long periods to wait for
all waiting transactions to finish.

This patch updates a parallel replica to try and abort immediately
and roll-back any ongoing transactions. The exception to this is any
transactions which are non-transactional (e.g. those modifying
sequences or non-transactional tables), and any prior transactions,
will be run to completion.

The specifics are as follows:

 1. A new stage was added to SHOW PROCESSLIST output for the SQL
Thread when it is waiting for a replica thread to either rollback or
finish its transaction before stopping. This stage presents as
“Waiting for worker thread to stop”

 2. Worker threads which error or are killed no longer perform GCO
cleanup if there is a concurrently running prior transaction. This
is because a worker thread scheduled to run in a future GCO could be
killed and incorrectly perform cleanup of the active GCO.

 3. Refined cases when the FL_TRANSACTIONAL flag is added to GTID
binlog events to disallow adding it to transactions which modify
both transactional and non-transactional engines when the binlogging
configuration allow the modifications to exist in the same event,
i.e. when using binlog_direct_non_trans_update == 0 and
binlog_format == statement.

 4. A few existing MTR tests relied on the completion of certain
transactions after issuing STOP SLAVE, and were re-recorded
(potentially with added synchronizations) under the new rollback
behavior.

Reviewed By
===========
Andrei Elkin <andrei.elkin@mariadb.com>
2023-06-05 10:03:06 -06:00
Teemu Ollakka
f307160218 MDEV-29293 MariaDB stuck on starting commit state
This commit contains a merge from 10.5-MDEV-29293-squash
into 10.6.

Although the bug MDEV-29293 was not reproducible with 10.6,
the fix contains several improvements for wsrep KILL query and
BF abort handling, and addresses the following issues:

* MDEV-30307 KILL command issued inside a transaction is
  problematic for galera replication:
  This commit will remove KILL TOI replication, so Galera side
  transaction context is not lost during KILL.
* MDEV-21075 KILL QUERY maintains nodes data consistency but
  breaks GTID sequence: This is fixed as well as KILL does not
  use TOI, and thus does not change GTID state.
* MDEV-30372 Assertion in wsrep-lib state: This was caused by
  BF abort or KILL when local transaction was in the middle
  of group commit. This commit disables THD::killed handling
  during commit, so the problem is avoided.
* MDEV-30963 Assertion failure !lock.was_chosen_as_deadlock_victim
  in trx0trx.h:1065: The assertion happened when the victim was
  BF aborted via MDL while it was committing. This commit changes
  MDL BF aborts so that transactions which are committing cannot
  be BF aborted via MDL. The RQG grammar attached in the issue
  could not reproduce the crash anymore.

Original commit message from 10.5 fix:

    MDEV-29293 MariaDB stuck on starting commit state

    The problem seems to be a deadlock between KILL command execution
    and BF abort issued by an applier, where:
    * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data.
    * Applier has innodb side global lock mutex and victim trx mutex.
    * KILL is calling innobase_kill_query, and is blocked by innodb
      global lock mutex.
    * Applier is in wsrep_innobase_kill_one_trx and is blocked by
      victim's LOCK_thd_kill.

    The fix in this commit removes the TOI replication of KILL command
    and makes KILL execution less intrusive operation. Aborting the
    victim happens now by using awake_no_mutex() and ha_abort_transaction().
    If the KILL happens when the transaction is committing, the
    KILL operation is postponed to happen after the statement
    has completed in order to avoid KILL to interrupt commit
    processing.

    Notable changes in this commit:
    * wsrep client connections's error state may remain sticky after
      client connection is closed. This error message will then pop
      up for the next client session issuing first SQL statement.
      This problem raised with test galera.galera_bf_kill.
      The fix is to reset wsrep client error state, before a THD is
      reused for next connetion.
    * Release THD locks in wsrep_abort_transaction when locking
      innodb mutexes. This guarantees same locking order as with applier
      BF aborting.
    * BF abort from MDL was changed to do BF abort on server/wsrep-lib
      side first, and only then do the BF abort on InnoDB side. This
      removes the need to call back from InnoDB for BF aborts which originate
      from MDL and simplifies the locking.
    * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h.
      The manipulation of the wsrep_aborter can be done solely on
      server side. Moreover, it is now debug only variable and
      could be excluded from optimized builds.
    * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
      fine grained locking for SR BF abort which may require locking
      of victim LOCK_thd_kill. Added explicit call for
      wsrep_thd_kill_LOCK/UNLOCK where appropriate.
    * Wsrep-lib was updated to version which allows external
      locking for BF abort calls.

    Changes to MTR tests:
    * Disable galera_bf_abort_group_commit. This test is going to
      be removed (MDEV-30855).
    * Make galera_var_retry_autocommit result more readable by echoing
      cases and expectations into result. Only one expected result for
      reap to verify that server returns expected status for query.
    * Record galera_gcache_recover_manytrx as result file was incomplete.
      Trivial change.
    * Make galera_create_table_as_select more deterministic:
      Wait until CTAS execution has reached MDL wait for multi-master
      conflict case. Expected error from multi-master conflict is
      ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
      wsrep transaction when it is waiting for MDL, query gets interrupted
      instead of BF aborted. This should be addressed in separate task.
    * A new test galera_bf_abort_registering to check that registering trx gets
      BF aborted through MDL.
    * A new test galera_kill_group_commit to verify correct behavior
      when KILL is executed while the transaction is committing.

    Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
    Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com>

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-05-22 00:42:05 +02:00
Teemu Ollakka
3f59bbeeae MDEV-29293 MariaDB stuck on starting commit state
The problem seems to be a deadlock between KILL command execution
and BF abort issued by an applier, where:
* KILL has locked victim's LOCK_thd_kill and LOCK_thd_data.
* Applier has innodb side global lock mutex and victim trx mutex.
* KILL is calling innobase_kill_query, and is blocked by innodb
  global lock mutex.
* Applier is in wsrep_innobase_kill_one_trx and is blocked by
  victim's LOCK_thd_kill.

The fix in this commit removes the TOI replication of KILL command
and makes KILL execution less intrusive operation. Aborting the
victim happens now by using awake_no_mutex() and ha_abort_transaction().
If the KILL happens when the transaction is committing, the
KILL operation is postponed to happen after the statement
has completed in order to avoid KILL to interrupt commit
processing.

Notable changes in this commit:
* wsrep client connections's error state may remain sticky after
  client connection is closed. This error message will then pop
  up for the next client session issuing first SQL statement.
  This problem raised with test galera.galera_bf_kill.
  The fix is to reset wsrep client error state, before a THD is
  reused for next connetion.
* Release THD locks in wsrep_abort_transaction when locking
  innodb mutexes. This guarantees same locking order as with applier
  BF aborting.
* BF abort from MDL was changed to do BF abort on server/wsrep-lib
  side first, and only then do the BF abort on InnoDB side. This
  removes the need to call back from InnoDB for BF aborts which originate
  from MDL and simplifies the locking.
* Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h.
  The manipulation of the wsrep_aborter can be done solely on
  server side. Moreover, it is now debug only variable and
  could be excluded from optimized builds.
* Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
  fine grained locking for SR BF abort which may require locking
  of victim LOCK_thd_kill. Added explicit call for
  wsrep_thd_kill_LOCK/UNLOCK where appropriate.
* Wsrep-lib was updated to version which allows external
  locking for BF abort calls.

Changes to MTR tests:
* Disable galera_bf_abort_group_commit. This test is going to
  be removed (MDEV-30855).
* Record galera_gcache_recover_manytrx as result file was incomplete.
  Trivial change.
* Make galera_create_table_as_select more deterministic:
  Wait until CTAS execution has reached MDL wait for multi-master
  conflict case. Expected error from multi-master conflict is
  ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
  wsrep transaction when it is waiting for MDL, query gets interrupted
  instead of BF aborted. This should be addressed in separate task.
* A new test galera_kill_group_commit to verify correct behavior
  when KILL is executed while the transaction is committing.

Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com>
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-05-22 00:39:43 +02:00
Teemu Ollakka
6966d7fe4b MDEV-29293 MariaDB stuck on starting commit state
This is a backport from 10.5.

The problem seems to be a deadlock between KILL command execution
and BF abort issued by an applier, where:
* KILL has locked victim's LOCK_thd_kill and LOCK_thd_data.
* Applier has innodb side global lock mutex and victim trx mutex.
* KILL is calling innobase_kill_query, and is blocked by innodb
  global lock mutex.
* Applier is in wsrep_innobase_kill_one_trx and is blocked by
  victim's LOCK_thd_kill.

The fix in this commit removes the TOI replication of KILL command
and makes KILL execution less intrusive operation. Aborting the
victim happens now by using awake_no_mutex() and ha_abort_transaction().
If the KILL happens when the transaction is committing, the
KILL operation is postponed to happen after the statement
has completed in order to avoid KILL to interrupt commit
processing.

Notable changes in this commit:
* wsrep client connections's error state may remain sticky after
  client connection is closed. This error message will then pop
  up for the next client session issuing first SQL statement.
  This problem raised with test galera.galera_bf_kill.
  The fix is to reset wsrep client error state, before a THD is
  reused for next connetion.
* Release THD locks in wsrep_abort_transaction when locking
  innodb mutexes. This guarantees same locking order as with applier
  BF aborting.
* BF abort from MDL was changed to do BF abort on server/wsrep-lib
  side first, and only then do the BF abort on InnoDB side. This
  removes the need to call back from InnoDB for BF aborts which originate
  from MDL and simplifies the locking.
* Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h.
  The manipulation of the wsrep_aborter can be done solely on
  server side. Moreover, it is now debug only variable and
  could be excluded from optimized builds.
* Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
  fine grained locking for SR BF abort which may require locking
  of victim LOCK_thd_kill. Added explicit call for
  wsrep_thd_kill_LOCK/UNLOCK where appropriate.
* Wsrep-lib was updated to version which allows external
  locking for BF abort calls.

Changes to MTR tests:
* Disable galera_bf_abort_group_commit. This test is going to
  be removed (MDEV-30855).
* Record galera_gcache_recover_manytrx as result file was incomplete.
  Trivial change.
* Make galera_create_table_as_select more deterministic:
  Wait until CTAS execution has reached MDL wait for multi-master
  conflict case. Expected error from multi-master conflict is
  ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
  wsrep transaction when it is waiting for MDL, query gets interrupted
  instead of BF aborted. This should be addressed in separate task.
* A new test galera_kill_group_commit to verify correct behavior
  when KILL is executed while the transaction is committing.

Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com>
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-05-22 00:33:37 +02:00