This failure was caused because of several bugs:
- Someone had removed s3-slave-ignore-updates=1 from slave.cnf, which
caused the slave to remove files that the master was working on.
- Bug in ha_partition::change_partitions() that didn't reset m_new_file
in case of errors. This caused crashes in ha_maria::extra() as the
maria handler was called on files that was already closed.
- In ma_pagecache there was a bug that when one got a read error one a
big block (s3 block), it left the flag PCBLOCK_BIG_READ on for the page
which cased an assert when the page where flushed.
- Flush all cached tables in case of ignored ALTER TABLE
Note that when merging code from 10.3, that fixes the partition bug, use
the code from this patch instead.
Changes to ma_pagecache.cc written or reviewed by Sanja
This commit fixed the problems with S3 after the "DROP TABLE FORCE" changes.
It also fixes all failing replication S3 tests.
A slave is delayed if it is trying to execute replicated queries on a
table that is already converted to S3 by the master later in the binlog.
Fixes for replication events on S3 tables for delayed slaves:
- INSERT and INSERT ... SELECT and CREATE TABLE are ignored but written
to the binary log. UPDATE & DELETE will be fixed in a future commit.
Other things:
- On slaves with --s3-slave-ignore-updates set, allow S3 tables to be
opened in read-write mode. This was done to be able to
ignore-but-replicate queries like insert. Without this change any
open of an S3 table failed with 'Table is read only' which is too
early to be able to replicate the original query.
- Errors are now printed if handler::extra() call fails in
wait_while_tables_are_used().
- Error message for row changes are changed from HA_ERR_WRONG_COMMAND
to HA_ERR_TABLE_READONLY.
- Disable some maria_extra() calls for S3 tables. This could cause
S3 tables to fail in some cases.
- Added missing thr_lock_delete() to ma_open() in case of failure.
- Removed from mysql_prepare_insert() the not needed argument 'table'.
Temporarily disable failing S3 tests to allow the suite to run in buildbot:
s3.partition and s3.partition_move are disabled due to MDEV-23648.
While the problem is supposedly generic, the tests don't fail with Amazon
setup, so they are disabled conditionally for emulator (MinIO).
s3.replication_partition is disabled due to MDEV-23730,
it fails both with Amazon and MinIO.
s3.replication_mixed and s3.replication_stmt are disabled due to MDEV-23770,
they also fail both with Amazon and the emulator.
* use the environment variable HA_S3_SO, not a literal ha_s3 in cnf files
* make ConfigFactory to support empty option values
* update no_s3.result after MDEV-11412
When converting a table (test.s3_table) from S3 to another engine, the
following will be logged to the binary log:
DROP TABLE IF EXISTS test.t1;
CREATE OR REPLACE TABLE test.t1 (...) ENGINE=new_engine
INSERT rows to test.t1 in binary-row-log-format
The bug is that the above statements are logged one by one to the binary
log. This means that a fast slave, configured to use the same S3 storage
as the master, would be able to execute the DROP and CREATE from the
binary log before the master has finished the ALTER TABLE.
In this case the slave would ignore the DROP (as it's on a S3 table) but
it will stop on CREATE of the local tale, as the table is still exists in
S3. The REPLACE part will be ignored by the slave as it can't touch the
S3 table.
The fix is to ensure that all the above statements is written to binary
log AFTER the table has been deleted from S3.
MDEV-22088 S3 partitioning support
All ALTER PARTITION commands should now work on S3 tables except
REBUILD PARTITION
TRUNCATE PARTITION
REORGANIZE PARTITION
In addition, PARTIONED S3 TABLES can also be replicated.
This is achived by storing the partition tables .frm and .par file on S3
for partitioned shared (S3) tables.
The discovery methods are enchanced by allowing engines that supports
discovery to also support of the partitioned tables .frm and .par file
Things in more detail
- The .frm and .par files of partitioned tables are stored in S3 and kept
in sync.
- Added hton callback create_partitioning_metadata to inform handler
that metadata for a partitoned file has changed
- Added back handler::discover_check_version() to be able to check if
a table's or a part table's definition has changed.
- Added handler::check_if_updates_are_ignored(). Needed for partitioning.
- Renamed rebind() -> rebind_psi(), as it was before.
- Changed CHF_xxx hadnler flags to an enum
- Changed some checks from using table->file->ht to use
table->file->partition_ht() to get discovery to work with partitioning.
- If TABLE_SHARE::init_from_binary_frm_image() fails, ensure that we
don't leave any .frm or .par files around.
- Fixed that writefrm() doesn't leave unusable .frm files around
- Appended extension to path for writefrm() to be able to reuse to function
for creating .par files.
- Added DBUG_PUSH("") to a a few functions that caused a lot of not
critical tracing.
MDEV-19964 S3 replication support
Added new configure options:
s3_slave_ignore_updates
"If the slave has shares same S3 storage as the master"
s3_replicate_alter_as_create_select
"When converting S3 table to local table, log all rows in binary log"
This allows on to configure slaves to have the S3 storage shared or
independent from the master.
Other thing:
Added new session variable '@@sql_if_exists' to force IF_EXIST to DDL's.
Changes:
- maria_create() now uses a bit in the parameter flags to check if table
should be encrypted instead of using maria_encrypted_tables.
- Don't encrypt tables that are to be converted to S3
- Added encrypted flag to ARIA_TABLE_CAPABILITIES
- maria_chk --description now prints if table is encrypted. Other
operations is not allowed on encrypted tables.
MDEV-19591
Assertion `!table->pos_in_locked_tables' failed in tc_release_table upon
altering table into S3 under lock.
The problem was that thd->open_tables->pos_in_locked_tables was not reset
when alter table failed to reopen a locked table.
There was a bug in the page cache that didn't take into account that
another thread could be waiting for a page to be read by read_big_block().
Fixed by releasing all waiters in read_big_block()
There are two options when coping S3 tables with mysqldump
(there is startup option --copy_s3_tables, boolean, default no)
1) Ignore all tables with engine S3, as the data is already safe in S3 and any
computer where you restore the backup will automatically discover the S3 table.
2) Copy the table as a normal table with the following 2 changes:
- Change ENGINE=S3 to ENGINE=ARIA;
- After copy add to log 'ALTER TABLE table_name ENGINE=S3'
The error occured because aria_copy_to_s3() function tried to copy .frm file
of partition, but partition does not have it's own .frm file. The same is true
for aria_rename_s3().
To fix this issue the new parameter was added to those two functions to specify
if .frm file must be copied or not. The parameter is set to 'false' for
partitions.
Also there was other issue with EXCHANGE PARTITION. Briefly, there is the
following sequence of operations(see exchange_name_with_ddl_log() for details):
1) rename swap table to temporary table,
2) rename partition to swap table,
3) rename temporary table to partition.
On step (1) .frm file is renamed too. On step (2) the swap table does not
have .frm file, as partition does not have it. On step (3) partition will have
.frm file, because it will be renamed from temporary table. All of this causes
error on different stages of the table access. To fix it, .frm is not touched
at all for s3 during EXCHANGE PARTITION operation. This is implemented in
ha_s3::rename_table() by additional checking of
current_thd->lex->alter_info.partition_flags(see also ALTER_PARTITION_EXCHANGE
in sql_yacc.yy).
A read-only storage engine that stores it's data in (aws) S3
To store data in S3 one could use ALTER TABLE:
ALTER TABLE table_name ENGINE=S3
libmarias3 integration done by Sergei Golubchik
libmarias3 created by Andrew Hutchings