mirror of https://github.com/MariaDB/server.git synced 2025-12-24 11:21:21 +03:00
Commit Graph

11952 Commits

Author SHA1 Message Date
Alice Sherepa
451573fab1 MDEV-21360 debug_dbug pre-test value restoration issues 2020-01-15 18:06:24 +01:00
Jan Lindström
800d1f3010 Disable usually failing Galera tests until a real fix is found. 2020-01-15 14:55:42 +02:00
Daniele Sciascia
7d31321464 MDEV-19803 Long semaphore wait error on galera.MW-388
The long semaphore wait appeared to be caused by the following
pattern in the MTR test:

```
SET DEBUG_SYNC = "now SIGNAL wsrep_after_certification_continue";
SET DEBUG_SYNC = "now SIGNAL signal.wsrep_apply_cb";
```

Raising two signals back to back caused the second signal to
overwrite the first before the first was consumed by the waiting thread.
As a result, one thread stayed stuck until the debug sync point timed
out.
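The overwrite can be modeled as a single-slot mailbox in which each SIGNAL replaces whatever signal is still pending. This is a minimal C sketch of that behavior, not the DEBUG_SYNC implementation; `struct sync_slot`, `raise_signal()` and `try_consume()` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical single-slot signal store: only one pending signal is kept. */
struct sync_slot {
  char pending[64];   /* name of the last raised signal, "" if none */
};

/* Raising a signal overwrites whatever signal was still pending. */
static void raise_signal(struct sync_slot *s, const char *name)
{
  strncpy(s->pending, name, sizeof s->pending - 1);
  s->pending[sizeof s->pending - 1] = '\0';
}

/* A waiter consumes the slot only if it holds the signal it waits for;
   otherwise it keeps waiting (until the debug sync timeout). */
static bool try_consume(struct sync_slot *s, const char *name)
{
  if (strcmp(s->pending, name) != 0)
    return false;
  s->pending[0] = '\0';
  return true;
}
```

With the two SET statements above, only `signal.wsrep_apply_cb` survives in the slot, so a thread waiting for `wsrep_after_certification_continue` keeps waiting until its timeout.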
2020-01-14 09:11:35 +02:00
Eugene Kosov
56529a7d7f MDEV-21454 Show actual mismatching values in mismatch error messages from row_import::match_table_columns()
Patch by Hartmut Holzgraefe
2020-01-10 22:50:19 +07:00
Sujatha
41cde4fe22 MDEV-18514: Assertion `!writer.checksum_len || writer.remains == 0' failed
Analysis:
========
'max_binlog_cache_size' is configured and a huge transaction is executed. When
the size of the transaction's events exceeds 'max_binlog_cache_size', the event
cannot be written to the binary log cache and a cache write error is raised.
Upon the cache write error the statement is rolled back, and the transaction cache
should be truncated to the position it had before the failing statement. The truncate
operation should reset the cache to that earlier valid position and flush the new
changes. Even though the flush succeeds, the cache write error flag is still
set. The truncate code interprets that flag as a cache flush
failure and returns early without updating the write cache parameters,
which leaves the cache in an invalid state. When a COMMIT statement is later executed in this
session, it tries to flush the contents of the transaction cache to the binary log.
Since the cache holds partial events, the cache write operation triggers the
'writer.remains' assertion.

Fix:
===
The binlog truncate function resets the cache to a specified size. As the first step
of truncation, clear the cache write error flag that was raised during the earlier
execution. With this change, new errors that surface during cache truncation can be
clearly identified.
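The ordering of the fix can be sketched as follows. `struct txn_cache`, `cache_flush()` and `cache_truncate()` are hypothetical stand-ins, not the server's IO_CACHE API; the point is only that the stale flag is cleared before the truncate path checks it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the transaction's binlog cache. */
struct txn_cache {
  size_t write_pos;   /* bytes currently in the cache */
  bool   write_error; /* set when a write exceeded max_binlog_cache_size */
};

/* Hypothetical flush: returns false only on a real flush failure. */
static bool cache_flush(struct txn_cache *c)
{
  (void) c;
  return true;
}

/* Reset the cache to a statement boundary. Clearing write_error first
   means a later check of the flag reflects only errors raised during
   the truncation itself. */
static bool cache_truncate(struct txn_cache *c, size_t pos)
{
  c->write_error = false;          /* the fix: drop the stale error */
  c->write_pos = pos;
  if (!cache_flush(c) || c->write_error)
    return false;                  /* genuine truncate/flush failure */
  return true;
}
```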
2020-01-09 12:45:05 +05:30
Sujatha
8317f77ccc Merge branch '10.1' into 10.2
MDEV-18046: Assortment of crashes, assertion failures and ASAN errors in mysql_show_binlog_events

Problem:
========
SHOW BINLOG EVENTS FROM <pos> reports the following assertion failure when ASAN is enabled.

uint32 binlog_get_uncompress_len(const char*):
  Assertion `(buf[0] & 0xe0) == 0x80' failed

Fix:
===
**Part11: Converted debug assert to error handler code**
2020-01-07 21:29:07 +05:30
Sujatha
a6dd827a4d MDEV-18046: Assortment of crashes, assertion failures and ASAN errors in mysql_show_binlog_events
Problem:
========
SHOW BINLOG EVENTS FROM <pos> causes a variety of failures, some of which are
listed below. It is not a race condition issue, but there is some
non-determinism in it.

Analysis:
========
"show binlog events from <pos>" code considers the user given position as a
valid event start position. The code starts reading data from this event start
position onwards and tries to map it to a set of known events. Each event has
a specific event structure and asserts have been added to ensure that read
event data satisfies the event specific requirements. When a random position
is supplied to "show binlog events command" the event structure specific
checks will fail and they result in assert.

Fix:
====
The fix is split into different parts. Each part addresses either an ASAN
issue or an assert/crash.

**Part1: Checksum based position validation when checksum is enabled**


Use the checksum to validate the very first event read at the user-specified
position. If there is a checksum mismatch, report an appropriate error for the
invalid event.
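A random offset almost never lines up with a real event boundary, so the stored checksum will not match the computed one. The check can be sketched as below, assuming the binlog_checksum=CRC32 layout (a 4-byte little-endian CRC-32 appended to each event). `event_checksum_ok()` is a hypothetical helper, and the server uses its own CRC routine rather than this bitwise loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as in zlib). */
static uint32_t crc32_calc(const unsigned char *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= buf[i];
    for (int k = 0; k < 8; k++)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return crc ^ 0xFFFFFFFFu;
}

/* True if the last 4 bytes of the event (little-endian) equal the CRC-32
   of the bytes before them. Assumes event_len covers the whole event,
   including its trailing checksum. */
static bool event_checksum_ok(const unsigned char *ev, size_t event_len)
{
  if (event_len < 4)
    return false;
  size_t body = event_len - 4;
  uint32_t stored = (uint32_t) ev[body]
                  | (uint32_t) ev[body + 1] << 8
                  | (uint32_t) ev[body + 2] << 16
                  | (uint32_t) ev[body + 3] << 24;
  return crc32_calc(ev, body) == stored;
}
```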
2020-01-07 18:27:05 +05:30
Marko Mäkelä
82187a1221 MDEV-21429 TRUNCATE and OPTIMIZE are being refused due to "row size too large"
By default (innodb_strict_mode=ON), InnoDB attempts to guarantee
at DDL time that any INSERT to the table can succeed.
MDEV-19292 recently revised the "row size too large" check in InnoDB.
The check is still somewhat inaccurate;
that should be addressed in MDEV-20194.

Note: If a table contains multiple long string columns so that each column
is part of a column prefix index, then an UPDATE that attempts to modify
all those columns at once may fail, because the undo log record might
not fit in a single undo log page (of innodb_page_size). In the worst case,
the undo log record would grow by about 3KiB for each updated column.

The DDL-time check (since the InnoDB Plugin for MySQL 5.1) is optional
in the sense that when the maximum B-tree record size or undo log
record size would be exceeded, the DML operation will fail and the
transaction will be properly rolled back.

create_table_info_t::row_size_is_acceptable(): Add the parameter
'bool strict' so that innodb_strict_mode=ON can be overridden during
TRUNCATE, OPTIMIZE and ALTER TABLE...FORCE (when the storage format
is not changing).

create_table_info_t::create_table(): Perform a sloppy check for
TRUNCATE TABLE (create_fk=false).

prepare_inplace_alter_table_dict(): Perform a sloppy check for
simple operations.

trx_is_strict(): Remove. The function became unused in
commit 98694ab0cb (MDEV-20949).
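The effect of the new 'strict' parameter can be sketched as a three-way verdict. This is a hypothetical model, not the InnoDB function: `row_size_check()` and `truncate_check()` are illustrative names, and the size limit is passed in rather than derived from innodb_page_size.

```c
#include <stdbool.h>
#include <stddef.h>

enum row_size_verdict { ROW_SIZE_OK, ROW_SIZE_WARN, ROW_SIZE_ERROR };

/* Hypothetical DDL-time check. 'strict' follows innodb_strict_mode for
   CREATE TABLE, but is false for TRUNCATE, OPTIMIZE and
   ALTER TABLE ... FORCE when the storage format is unchanged. */
static enum row_size_verdict
row_size_check(size_t max_rec_size, size_t limit, bool strict)
{
  if (max_rec_size <= limit)
    return ROW_SIZE_OK;
  /* Too large: reject only in strict mode; otherwise warn, and any DML
     that really produces an oversized record fails and rolls back. */
  return strict ? ROW_SIZE_ERROR : ROW_SIZE_WARN;
}

/* Hypothetical caller: the sloppy check used for TRUNCATE TABLE. */
static enum row_size_verdict
truncate_check(size_t max_rec_size, size_t limit)
{
  return row_size_check(max_rec_size, limit, false);
}
```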
2020-01-07 11:02:12 +02:00
Jan Lindström
5824e9f8df MDEV-13569: wsrep_info.plugin failed in buildbot with "no nodes coming from prim view
Modify the configuration so that all nodes are part of the Galera cluster,
i.e. wsrep_on=ON. Add missing wait conditions.

Test changes only.
2020-01-07 08:57:30 +02:00
Jan Lindström
17b1b8118a MDEV-21189 : Dropping partition with 'wsrep_OSU_method=RSU' and 'SESSION sql_log_bin = 0' cases the galera node to hang
Test cleanup. Best practice for using RSU, is to isolate the node
up-front, so this test did not reflect real world scenario
2019-12-23 12:57:22 +02:00
Marko Mäkelä
73985d8301 Merge 10.1 into 10.2 2019-12-23 07:14:51 +02:00
Jan Lindström
c3824766c5 Fortify galera_partition test. 2019-12-18 10:02:57 +02:00
Jan Lindström
e9259d50f0 MDEV-21335 : Galera test failure on suite wsrep
Problem was that wsrep_on was OFF.
2019-12-18 10:02:57 +02:00
Anel Husakovic
8129ff1440 PR #1127 and PR #1150
PR#1127: Fix is_check_constraints.result to be compatible with 10.3

The patch follows the original patch for MDEV-14474
(1edd09c325), not the one that was merged into the server
(d526679efd).
This patch includes:
- Rename `is_check_constraint` to `is_check_constraints` in tests
and results
- Per review, change the order of fields in the IS check_constraints table by adding
the column `table_name` before `constraint_name`. According to the 2006 standard
there is no `table_name` column.
- The original patch and the one in `10.3` support the embedded server; this patch
does not. After the merge, `10.3` will not support it either.
- Don't use patch c8b8b01b61 to change the length of the `CHECK_CLAUSE` field

PR#1150: MDEV-18440: Information_schema.check_constraints possible data leak

This patch is an extension of PR 1127 and includes:
- Check for table grants
- Additional test according to the MDEV specification

Signed-off-by: Vicențiu Ciorbaru <vicentiu@mariadb.org>
2019-12-13 16:38:14 +02:00
Vlad Lesin
beec9c0e19 MDEV-21255: Deadlock of parallel slave and mariabackup (with failed log
copy thread)

mariabackup hangs waiting until innodb redo log thread read log till certain
LSN, and it waits under FTWRL. If there is redo log read error in the thread,
it is finished, and main thread knows nothing about it, what leads to hanging.
As it hangs under FTWRL, slave threads on server side can be blocked due
to MDL lock conflict.

The fix is to finish mariabackup with error message on innodb redo log read
failure.
2019-12-12 13:28:30 +03:00
Marko Mäkelä
f2d3b2eede Cleanup test sys_vars.innodb_buffer_pool_size_basic
When using huge pages, the innodb_buffer_pool_size cannot necessarily
be restored. Simplify things by restarting the server.
2019-12-10 17:01:36 +02:00
Marko Mäkelä
41e6a154ec MDEV-14482 - Cache line contention on ut_rnd_interval()
The InnoDB RNG maintains global state, causing otherwise unnecessary bus
traffic. Even worse, this is cross-mutex traffic: different
mutexes suffer from contention.

A fixed delay of 4 was verified to give the best throughput in the OLTP update
index and read-write benchmarks on Intel Broadwell (2/20/40) and
ARM (1/46/46).

This is a backport of ce04790065 from
MariaDB Server 10.3.
2019-12-10 17:01:36 +02:00
Marko Mäkelä
b1f2d3a8c8 MDEV-21256: Replace the 64-bit LCG with a 32-bit Galois LFSR
We should not need anywhere near 32 bits of entropy, so we might
just limit ourselves to a 32-bit random number generator.

Also, it might be cheaper to use exclusive-or, bit shifting and
conditional jumps, instead of multiplication and addition.

We use relaxed atomic operations on the global random number generator
state in an attempt to silence any warnings about race conditions.
There is an obvious race condition between the load and store in
ut_rnd_gen(), but we do not think that it matters much that the
state of the random number generator could 'stutter'.
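A minimal C11 sketch of a 32-bit Galois LFSR step using relaxed atomics. The tap mask 0x82F63B78 (the bit-reversed CRC-32C polynomial) is an assumption chosen for illustration, and no claim is made about the period; `rnd_seed()` and `rnd_gen()` are hypothetical names, not ut_rnd_gen().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Global generator state. Relaxed atomics avoid data-race warnings;
   the load/store pair is still racy, so concurrent callers may get the
   same value ("stutter"), which is acceptable for this use. */
static _Atomic uint32_t rnd_state = 1;

static void rnd_seed(uint32_t seed)
{
  atomic_store_explicit(&rnd_state, seed ? seed : 1u, memory_order_relaxed);
}

/* One step of a right-shifting Galois LFSR: shift right, and if the bit
   shifted out was 1, XOR in the tap mask. No multiplication is needed.
   Because the mask has bit 31 set, a nonzero state never becomes zero. */
static uint32_t rnd_gen(void)
{
  const uint32_t taps = 0x82F63B78u; /* assumed mask, for illustration */
  uint32_t x = atomic_load_explicit(&rnd_state, memory_order_relaxed);
  uint32_t lsb = x & 1u;
  x >>= 1;
  if (lsb)
    x ^= taps;
  atomic_store_explicit(&rnd_state, x, memory_order_relaxed);
  return x;
}
```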

This change seems to make 'uncompress_ops' nondeterministic
in innodb_zip.cmp_per_index after the restart. It looks like
there is an inherent race condition in the test, because the
table could be opened for InnoDB statistics recalculation
already before innodb_cmp_per_index_enabled was set. We might
end up having uncompress_ops anywhere between 0 and 9, or perhaps
even more. Let us remove that part of the test.
2019-12-10 16:59:34 +02:00
Jan Lindström
59e14b9684 MDEV-21189: Dropping partition with 'wsrep_OSU_method=RSU' and 'SESSION sql_log_bin = 0' cases the galera node to hang
Found two bugs

(1) have_committing_connections was missing mutex unlock on one
exit case. As this function is called on a loop it caused mutex
lock when we already owned the mutex. This could cause hang.

(2) wsrep_RSU_begin did set up error code when partition to
be dropped could not be MDL-locked because of concurrent
operations but wrong error code was propagated to upper layer
causing error to be ignored. This could have also caused
the hang.
2019-12-09 08:14:39 +02:00
Jan Lindström
2b7e461cc0 MDEV-21209 : mysql_tzinfo_to_sql's Galera checks do not work
The wsrep_on parameter can be visible even when wsrep_on is set OFF,
so we also need to check variable_value from I_S.
2019-12-05 12:41:13 +02:00
Jan Lindström
c9b9eb3315 MDEV-18497 : CTAS async replication from mariadb master crashes galera nodes (#1410)
In MariaDB 10.2, the master could have been configured so that there
are extra annotate events. When we peek at the next event type for CTAS, we
need to skip annotate events.
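Peeking past annotate events amounts to a simple forward scan. The event codes and `peek_next_type()` below are hypothetical; the server's real event type enumeration and reading code differ.

```c
#include <stddef.h>

/* Hypothetical event type codes; not the server's Log_event_type values. */
enum ev_type { EV_QUERY, EV_ANNOTATE_ROWS, EV_TABLE_MAP, EV_WRITE_ROWS,
               EV_XID, EV_NONE };

/* Return the type of the first event after position i that is not an
   annotate event, or EV_NONE if the stream ends first. */
static enum ev_type peek_next_type(const enum ev_type *evs, size_t n, size_t i)
{
  for (size_t j = i + 1; j < n; j++)
    if (evs[j] != EV_ANNOTATE_ROWS)
      return evs[j];
  return EV_NONE;
}
```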
2019-12-04 11:46:37 +02:00
Oleksandr Byelkin
f7d35ffc76 Galera test fix after merge. 2019-12-03 20:39:16 +01:00
Oleksandr Byelkin
f8b5e147da Merge branch '10.1' into 10.2 2019-12-03 14:45:06 +01:00
Jan Lindström
88073dae79 MDEV-21198 : Galera test failure on galera_var_notify_cmd
Add proper wsrep sync wait.
2019-12-03 08:04:46 +02:00
Jan Lindström
c6ed37b88a MDEV-21182: Galera test failure on MW-284
galera_2nodes.cnf did not contain wsrep_on=1 in the correct places. Fixed
restart options to use the correct configuration.
2019-11-30 13:52:49 +02:00
HF
3fb0fe400c MENT-510 Failing test(s): perfschema.threads_insert_delayed.
orig_test_id should be set properly.
Also fixed sporadic test failure.
2019-11-29 21:25:52 +00:00
Vlad Lesin
bd11bd63cc MDEV-18310: Aria engine: Undo phase failed with "Got error 121 when
executing undo undo_key_delete" upon startup on datadir restored from
incremental backup

aria_log* files were not copied on --prepare --incremental-dir step from
incremental to destination backup directory.
2019-11-29 17:01:12 +03:00
Alexey Botchkov
bfa6db38cd MENT-510 Failing test(s): perfschema.threads_insert_delayed.
The thread_id of the INSERT DELAYED thread should not be set to 0.
2019-11-27 09:31:47 +04:00
Alexey Botchkov
0e403db2c8 MENT-237 Audit to show INSERT DELAYED for the executing user.
Add notifications about the user and connection that actually
did the DELAYED insert.
2019-11-27 09:23:00 +04:00
seppo
38839854b7 MDEV-19572 async slave node fails to apply MyISAM only writes (#1418)
The problem happens when a MariaDB master replicates writes only for non-InnoDB
tables (e.g. writes to MyISAM tables). The async slave node in the Galera cluster
can apply these writes successfully, but at the end it writes the GTID position to
the mysql.gtid_slave_pos table. The mysql.gtid_slave_pos table uses the InnoDB engine,
and this write makes the InnoDB handlerton part of the replicated "transaction".
Note that the wsrep patch identifies that the write to gtid_slave_pos should not be
replicated and skips appending wsrep keys for it. However, as InnoDB was present
in the transaction, and there are replication events (for the MyISAM table) in the
transaction cache but no appended keys, wsrep raises an error, which stops the slave
thread.

The fix is simply not to treat it as an error if an async slave tries to replicate a write
set with binlog events but no keys. We just skip wsrep replication and return successfully.
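The decision can be sketched as below. `struct writeset` and `replicate()` are hypothetical; keeping the keyless case an error for threads other than the async slave is an assumption of this sketch, since the commit only describes the async slave case.

```c
#include <stdbool.h>
#include <stddef.h>

enum repl_result { REPL_OK, REPL_SKIPPED, REPL_ERROR };

/* Hypothetical view of a transaction about to be replicated. */
struct writeset {
  size_t n_binlog_events;  /* events in the transaction cache */
  size_t n_keys;           /* wsrep keys appended for certification */
  bool   from_async_slave; /* applied by an async replication slave */
};

static enum repl_result replicate(const struct writeset *ws)
{
  if (ws->n_binlog_events > 0 && ws->n_keys == 0) {
    /* Async slave applying non-InnoDB writes: the only InnoDB write was
       to mysql.gtid_slave_pos, which appends no keys. Skip, don't fail. */
    if (ws->from_async_slave)
      return REPL_SKIPPED;
    return REPL_ERROR;     /* assumption: still an error elsewhere */
  }
  return REPL_OK;          /* normal certification/replication path */
}
```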

This commit also contains an mtr test which forces the mysql.gtid_slave_pos table to use
the InnoDB engine, and executes a MyISAM-only write through async replication.

There is an additional fix for declaring the IO and background slave threads as non-wsrep.
These threads should not write anything for wsrep replication, and this is just a safeguard
to make sure nothing leaks into the cluster from these slave threads.
2019-11-26 08:49:50 +02:00
Eugene Kosov
899c5bd5aa MDEV-20832 Don't print "row size too large" warnings in error log if innodb_strict_mode=OFF and log_warnings<=2
create_table_info_t::row_size_is_acceptable(): add condition for log writing
2019-11-20 19:48:03 +07:00
Marko Mäkelä
b80df9eba2 MDEV-21069 Crash on DROP TABLE if the data file is corrupted
buf_read_ibuf_merge_pages(): Discard any page numbers that are
outside the current bounds of the tablespace, by invoking the
function ibuf_delete_recs() that was introduced in MDEV-20934.
This could avoid an infinite change buffer merge loop on
innodb_fast_shutdown=0, because normally the change buffer merge
would only be attempted if a page was successfully loaded into
the buffer pool.
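Discarding out-of-range page numbers before reading can be sketched as a partition of the buffered page list. `split_merge_pages()` is a hypothetical helper; in the commit, the out-of-range pages are handed to ibuf_delete_recs().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split of buffered page numbers for one tablespace.
   Pages below space_size are read so the merge can happen; pages past
   the end of the file could never be loaded, so they are collected for
   deletion of their change buffer entries instead. */
static size_t split_merge_pages(const uint32_t *pages, size_t n,
                                uint32_t space_size,
                                uint32_t *to_read,
                                uint32_t *to_discard, size_t *n_discard)
{
  size_t n_read = 0;
  *n_discard = 0;
  for (size_t i = 0; i < n; i++) {
    if (pages[i] < space_size)
      to_read[n_read++] = pages[i];
    else
      to_discard[(*n_discard)++] = pages[i];
  }
  return n_read;
}
```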

dict_drop_index_tree(): Add the parameter trx_t*.
To prevent the DROP TABLE crash, do not invoke btr_free_if_exists()
if the entire .ibd file will be dropped. Thus, we will avoid a crash
if the BTR_SEG_LEAF or BTR_SEG_TOP of the index is corrupted,
and we will also avoid unnecessarily accessing the to-be-dropped
tablespace via the buffer pool.

In MariaDB 10.2, we disable the DROP TABLE fix if innodb_safe_truncate=0,
because the backup-unsafe MySQL 5.7 WL#6501 form of TRUNCATE TABLE
requires that the individual pages be freed inside the tablespace.
2019-11-19 00:07:06 +02:00
Jan Lindström
c6b097ab37 Remove excessive sleep from test. 2019-11-18 15:22:01 +02:00
seppo
5c68343db7 MDEV-18497 CTAS async replication from mariadb master crashes galera nodes (#1410)
This PR contains an mtr test for reproducing a failure when replicating a CREATE TABLE ... SELECT statement (CTAS) through asynchronous MariaDB replication to a MariaDB Galera cluster.
The problem happens when the CTAS replication contains a CREATE TABLE statement followed by row events that populate the table. In that situation, the Galera node operating as the MariaDB replication slave first replicates only the CREATE TABLE part into the cluster, and then performs another replication containing both the CREATE TABLE and the row events. This causes all other nodes to fail on the duplicate CREATE TABLE attempt and crash.

The PR also contains a fix that identifies when a CTAS has been replicated and scans further in the async replication stream to see whether row events follow. The slave node then replicates either a single TOI event, if the CTAS table is empty, or, if the CTAS table contains rows, a single bundled write set with the CREATE TABLE and row events to the Galera cluster.
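The TOI-versus-bundle choice can be sketched as a look-ahead over the events of the CTAS transaction. The event kinds and `plan_ctas()` are hypothetical, and the sketch assumes the transaction ends with a commit event.

```c
#include <stddef.h>

/* Hypothetical event kinds and replication plans. */
enum ev_kind { EV_CREATE, EV_ANNOTATE, EV_TABLE_MAP, EV_ROWS, EV_COMMIT };
enum ctas_plan { CTAS_SINGLE_TOI, CTAS_BUNDLED_WRITESET };

/* After the CREATE TABLE part of a CTAS at index i, scan ahead to the end
   of the transaction for row events. Rows present: replicate CREATE and
   rows as one write set. No rows: replicate the CREATE alone as TOI. */
static enum ctas_plan plan_ctas(const enum ev_kind *evs, size_t n, size_t i)
{
  for (size_t j = i + 1; j < n && evs[j] != EV_COMMIT; j++) {
    if (evs[j] == EV_ROWS)
      return CTAS_BUNDLED_WRITESET;
  }
  return CTAS_SINGLE_TOI;
}
```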

This fix should keep the master server's GTIDs for CTAS replication in sync with the GTIDs in the Galera cluster.
2019-11-18 15:18:00 +02:00
Eugene Kosov
98694ab0cb MDEV-20949 Stop issuing 'row size' error on DML
Move row size check to early CREATE/ALTER TABLE phase. Stop checking
on table open.

dict_index_add_to_cache(): remove parameter 'strict', stop checking row size

dict_index_t::record_size_info_t: this is a result of row size check operation

create_table_info_t::row_size_is_acceptable(): performs row size check.
Issues error or warning. Writes first overflow field to InnoDB log.

create_table_info_t::create_table(): add row size check

dict_index_t::record_size_info(): this is a refactored version
of dict_index_t::rec_potentially_too_big(). The new version does not change the global
state of the program but returns all relevant info. Its callers
decide how to handle row size overflow.

dict_index_t::rec_potentially_too_big(): removed
2019-11-13 22:00:55 +07:00
Marko Mäkelä
2350066e63 Merge 10.1 into 10.2 2019-11-12 14:36:37 +02:00
Sujatha
7df07c7666 MDEV-20953: binlog_encryption.rpl_corruption failed in buildbot due to wrong error code
Problem:
========
CURRENT_TEST: binlog_encryption.rpl_corruption

mysqltest: In included file "./include/wait_for_slave_io_error.inc":
...
At line 72: Slave stopped with wrong error code
**** Slave stopped with wrong error code: 1743 (expected 1595,1913) ****

Analysis:
========
The test emulates corruption at various stages of replication, for
example in the binlog file, in the network and in the relay log. It verifies that all
corruption cases are handled with appropriate error messages.

The test cases which emulate network failure expect the following errors:
--ER_SLAVE_RELAY_LOG_WRITE_FAILURE (1595)
--ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE (1743)

Ideally the test should expect error codes 1595 and 1743.
But the test actually waits on the incorrect error codes 1595,1913.

Fix:
===
Added the appropriate error code for 'ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE'.
Replaced 1913 with 1743.
2019-11-12 16:31:08 +05:30
Andrei Elkin
40e65e878e rpl_semi_sync_gtid_reconnect results merge 2019-11-11 21:12:14 +02:00
Andrei Elkin
26fd880d5e manual merge 10.1->10.2 2019-11-11 16:03:43 +02:00
Andrei Elkin
13db50fc03 MDEV-19376 Repl_semi_sync_master::commit_trx assertion failure: ... || !m_active_tranxs->is_tranx_end_pos(trx_wait_binlog_name, trx_wait_binlog_pos)
The assert indicates that the current transaction was left uncleaned in
the semisync master's cache when it was signaled to proceed upon receiving
its ack.

The reason for the missed cleanup turns out to be a flaw in the gtid
connect mode.
The binlog file *name* of the last received event, as submitted by the
connecting slave, was adopted into
{{Repl_semi_sync_master::m_reply_file_name}} as part of semisync
initialization.

Notice that the initialization still refines the position part of the
submitted binlog coordinates of the last received event.
The master-side binlog filename:pos refinement is
specific to the gtid connect mode, for the purpose of computing the latest
binlog file to resume feeding the slave from.
Effectively, in the gtid connect mode the computed resumption filename:pos
may be smaller, in which case a new transaction committed after the connect
may be logged with a filename:pos that is also less than the
submitted coordinates, and that triggers the assert.

Fixed by making the semisync initialization use the refined filename:pos.
It is guaranteed to be less than any newly generated transaction's binlog:pos.
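The invariant relies on ordering binlog coordinates by file name first, then by position. A hypothetical comparison helper, assuming file names that share a base name and a fixed-width numeric suffix (so `strcmp()` orders them correctly):

```c
#include <string.h>

/* Hypothetical coordinate compare: <0, 0 or >0 like strcmp().
   File names are compared first; positions break ties within a file. */
static int coord_cmp(const char *file_a, unsigned long long pos_a,
                     const char *file_b, unsigned long long pos_b)
{
  int c = strcmp(file_a, file_b);
  if (c != 0)
    return c < 0 ? -1 : 1;
  return (pos_a > pos_b) - (pos_a < pos_b);
}
```

In these terms, the assert fires when a newly committed transaction compares less than the coordinate cached at initialization; initializing from the refined coordinate keeps every new transaction above it.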
2019-11-10 16:16:37 +02:00
Marko Mäkelä
8688ef22c2 Merge 10.1 to 10.2 2019-11-06 10:18:51 +02:00
Marko Mäkelä
d7a2401750 MDEV-20934 Infinite loop on innodb_fast_shutdown=0 with inconsistent change buffer
Due to a data corruption bug that may have occurred a long time earlier
(possibly involving physical backup and MySQL Bug #69122, which was
addressed in commit f166ec71b7)
it seems possible that the InnoDB change buffer might end up containing
entries, while no buffered changes exist according to the change buffer
bitmap pages in the .ibd files.

ibuf_delete_recs(): New function, to be invoked on slow shutdown only.
Remove all buffered changes for a specific page.

ibuf_merge_or_delete_for_page(): If the change buffer bitmap is clean
and a slow shutdown is in progress, invoke ibuf_delete_recs().
We do not want to do that during normal operation, due to the additional
overhead that is involved. The bitmap page should be consistent with
the change buffer in the first place.
2019-11-06 08:48:48 +02:00
Thirunarayanan Balathandayuthapani
5c3bbbd845 MDEV-20987 InnoDB fails to start when fts table has FK relation
InnoDB: Assertion failure in file .../dict/dict0dict.cc line ...
InnoDB: Failing assertion: table->can_be_evicted

This fixes a regression that was caused by the fix of MDEV-20621
(commit a41d429765).
MySQL 5.6 (and MariaDB 10.0) introduced eviction of tables from
the InnoDB data dictionary cache. Tables that are connected to
FOREIGN KEY constraints or FULLTEXT INDEX are exempt from eviction.
With the problematic change, a table that was already exempt
from eviction due to a FOREIGN KEY would trigger the assertion if
a FULLTEXT INDEX was also defined on it.

dict_load_table(): Only prevent eviction if table->can_be_evicted holds.
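The regression and its fix can be sketched with a hypothetical table object. `prevent_eviction()` models a helper that asserts the table is still evictable; the order in which the server handles FOREIGN KEY and FULLTEXT INDEX at this point is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table object. */
struct table { bool can_be_evicted; };

/* Assumes the table is currently evictable, so calling it a second time
   on the same table trips the assertion (the reported failure). */
static void prevent_eviction(struct table *t)
{
  assert(t->can_be_evicted);
  t->can_be_evicted = false;
}

/* Loading a table that has both a FOREIGN KEY and a FULLTEXT INDEX:
   both want to pin the table, so the second request is guarded. */
static void load_table(struct table *t, bool has_fk, bool has_fts)
{
  if (has_fk)
    prevent_eviction(t);
  if (has_fts && t->can_be_evicted) /* the fix: only if still evictable */
    prevent_eviction(t);
}
```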
2019-11-06 08:12:00 +02:00
Robert Bindar
6f86150ab3 MDEV-17896 Assertion `pfs->get_refcount() > 0' failed
Unfortunate DROP TEMPORARY..IF EXISTS on a regular table may allow
subsequent CREATE TABLE statements to steal away the PFS_table_share
instance from the dropped table.
2019-11-01 11:10:04 +02:00
Sergei Golubchik
fd6dfb3b54 Merge branch 'github/10.1' into 10.2 2019-10-30 23:38:05 +01:00
Sergei Golubchik
5392b4a32c MDEV-20354 All but last insert ignored in InnoDB tables when table locked
mysql_insert() first opens all affected tables (which implicitly
starts a transaction in InnoDB), then the stat tables.
A failure to open a stat table caused open_tables() to abort
the current stmt transaction (trans_rollback_stmt()). So, from the
server's point of view, the subsequent ha_write_row() calls happened outside
of a transaction, and the server didn't bother to commit them.

The server has a mechanism to prevent a transaction being
unexpectedly committed or rolled back in the middle of a statement -
if an operation takes place _in a sub-statement_ it cannot change
the transaction state. Operations on stat tables are exactly that -
they are not allowed to change a transaction state. Put them in
a sub-statement to make sure they don't.
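A minimal model of the sub-statement guard. `struct session`, `rollback_stmt()` and `open_stat_tables()` are hypothetical and only mirror the idea that transaction state cannot change inside a sub-statement.

```c
#include <stdbool.h>

/* Hypothetical session state. */
struct session {
  int  in_sub_stmt;   /* > 0 while running a sub-statement */
  bool stmt_trx_open; /* statement transaction in progress */
};

/* Rolling back the statement transaction is refused inside a
   sub-statement, so a failure there cannot end the outer transaction. */
static bool rollback_stmt(struct session *s)
{
  if (s->in_sub_stmt > 0)
    return false;        /* leave transaction state untouched */
  s->stmt_trx_open = false;
  return true;
}

/* Open the stat tables as a sub-statement; a failure to open them
   (simulated by open_ok == false) no longer aborts the INSERT. */
static void open_stat_tables(struct session *s, bool open_ok)
{
  s->in_sub_stmt++;
  if (!open_ok)
    (void) rollback_stmt(s);
  s->in_sub_stmt--;
}
```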
2019-10-30 23:14:44 +01:00
Oleksandr Byelkin
36f67a7dff Merge branch '10.1' into 10.2 2019-10-30 21:33:01 +01:00
Marko Mäkelä
d03a59c6ff XtraDB 5.6.45-86.1 2019-10-30 13:21:36 +02:00
Marko Mäkelä
d1e6b0bcff MDEV-20927 Duplicate key with auto increment
Apply the changes to InnoDB and XtraDB that had been
inadvertently skipped in the merge
commit ae476868a5

That merge failure sabotaged part of MDEV-20127:
>Revert a problematic auto_increment_increment 'fix' from 2014.
>This involves replacing the MDEV-8827 fix and in 10.1,
>removing some WSREP instrumentation.

The code changes were re-merged manually by executing the following:

 # Get the parent of the problematic merge.
git checkout ae476868a5394041a00e75a29c7d45917e8dfae8^
 # Perform the merge again.
git merge ae476868a5394041a00e75a29c7d45917e8dfae8^2
 # Get the conflict resolution from that merge.
git checkout ae476868a5 .
 # Note: Any changes to these files were removed (empty diff)!
git diff HEAD storage/{innobase,xtradb}/handler/ha_innodb.cc
 # Apply the code changes:
git diff cf40393471b10ca68cc1d2804c22ab9203900978^2..MERGE_HEAD \
storage/{innobase,xtradb}/handler/ha_innodb.cc|
patch -p1
2019-10-30 13:21:36 +02:00
Jan Lindström
cd1c10859d Fix test cases that use debug galera library.
Changes to be committed:
	modified:   mysql-test/suite/galera/r/MW-369.result
	modified:   mysql-test/suite/galera/r/MW-402.result
	modified:   mysql-test/suite/galera/r/galera#500.result
	modified:   mysql-test/suite/galera/r/galera_gcs_fragment.result
	modified:   mysql-test/suite/galera/r/mysql-wsrep#332.result
2019-10-30 10:14:56 +02:00