handler::clone() call did not work with read only tables like S3.
It gave a wrong error message (out of memory instead of a permission
error) and aborted the query.
The issue was that the clone call had a wrong parameter to ha_open().
This now fixed. I also changed the clone call to provide the correct
error message if things fails.
This patch fixes an 'out of memory' error when using the S3 engine
for queries that could use multiple indexes together to find the matching
rows, like the following:
SELECT * FROM t1 WHERE key1 = 99 OR key2 = 2
This commit fixes a bug where Aria tables are used in
(master->slave1->slave2) and a backup is taken on slave2. In this case
it is possible that the replication position in the backup, stored in
mysql.gtid_slave_pos, will be wrong. This will lead to replication
errors if one is trying to use the backup as a new slave.
Analyze:
Replicated row events are committed with trans_commit_stmt() and
thd->transaction->all.ha_list != 0.
This means that backup_commit_lock is not taken for Aria tables,
which means the rows are committed and binary logged on the slave
under BLOCK_COMMIT which should not happen.
This issue does not occur on the master as thd->transaction->all.ha_list
is == 0 under AUTO_COMMIT, which sets 'is_real_trans' and 'rw_trans'
which in turn causes backup_commit_lock to be taken.
Fixed by checking in ha_check_and_coalesce_trx_read_only() if all handlers
supports rollback and if not, then wait for BLOCK_COMMIT also for
statement commit.
The main purpose of this allow one to use the --read-only
option to ensure that no one can issue a query that can
block replication.
The --read-only option can now take 4 different values:
0 No read only (as before).
1 Blocks changes for users without the 'READ ONLY ADMIN'
privilege (as before).
2 Blocks in addition LOCK TABLES and SELECT IN SHARE MODE
for not 'READ ONLY ADMIN' users.
3 Blocks in addition 'READ_ONLY_ADMIN' users for all the
previous statements.
read_only is changed to an enum and one can use the following
names for the lock levels:
OFF, ON, NO_LOCK, NO_LOCK_NO_ADMIN
Too keep things compatible with older versions config files, one can
still use values FALSE and TRUE, which are mapped to OFF and ON.
The main visible changes are:
- 'show variables like "read_only"' now returns a string
instead of a number.
- Error messages related to read_only violations now contains
the current value off readonly.
Other things:
- is_read_only_ctx() renamed to check_read_only_with_error()
- Moved TL_READ_SKIP_LOCKED to it's logical place
Reviewed by: Sergei Golubchik <serg@mariadb.org>
There is a need in MDEV-25292 to have both C_ALTER_TABLE and
select_field_count in one call. Semantically creation mode and field
count are two different things. Making creation mode negative
constants and field count positive variable into one parameter seems
to be a lazy hack for not making the second parameter.
select_count does not make sense without alter_info->create_list, so
the natural way is to hold it in Alter_info too. select_count is now
stored in member select_field_count.
Merged and updated by: Monty
To check the rows, the table needs to be opened. To that end, and like
MDEV-36038, we force COPY algorithm on ALTER TABLE ... SEQUENCE=1.
This also results in checking the sequence state / metadata.
The table structure was already validated before this patch.
7544fd4cae had to make use of a static array to avoid memory
use-after-free or leak.
Instead, let us make a function returning String, this is the only way
to automatically manage the memory after the function returned.
To make it all correct, move constructor is added. Normally, it is
expected, that the constructor will be elided upon return of an object
by value, but if something goes different, or -fno-elide-constructors is
used, we can have a problem. So this was a move constructor avoids
copy elision-related UB.
dbug_print_row returning char* is still there for convenient use in a
debugger.
Currently execution of commit in one phase proceeds to commit by
engines when binlog_commit() does not succeed.
There are two issues with that:
1. absence of binlog_rollback() or lower-level
`binlog_cache_data::reset()` along the following execution of the
failing statement eventually will raise an assert on non-empty binlog
cache, find in the MDEV description
# --error assert(sql/log.cc:1712(binlog_close_connection))
# --disconnect default
2. engines, including ones that are rollback capable, commit in this
particular error situation.
Both effects can be observed with a new mtr test that would fail when run on
a BASE of this commit.
The BASE has to include MDEV-35207 et all fixes because the test is written
with CREATE-TABLE-SELECTs.
A new test file verifies the new behaviour to rollback including
cases with a side effect of modified non-transactional engine which
expose another MDEV-36027 (TODO: fix).
Allows index condition pushdown for reverse ordered scans, a previously
disabled feature due to poor performance. This patch adds a new
API to the handler class called set_end_range which allows callers to
tell the handler what the end of the index range will be when scanning.
Combined with a pushed index condition, the handler can scan the index
efficiently and not read beyond the end of the given range. When
checking if the pushed index condition matches, the handler will also
check if scanning has reached the end of the provided range and stop if
so.
If we instead only enabled ICP for reverse ordered scans without
also calling this new API, then the handler would perform unnecessary
index condition checks. In fact this would continue until the end of
the index is reached.
These changes are agnostic of storage engine. That is, any storage
engine that supports index condition pushdown will inhereit this new
behavior as it is implemented in the SQL and storage engine
API layers.
The partitioned tables storage meta-engine (ha_partition) adds an
override of set_end_range which recursively calls set_end_range on its
child storage engine (handler) implementations.
This commit updates the test made in an earlier commit to show that
ICP matches happen for the reverse ordered case.
This patch is based on changes written by Olav Sandstaa in
MySQL commit da1d92fd46071cd86de61058b6ea39fd9affcd87
The problem with MariaDB waiting was fixed earlier.
However the server still gives the old error,in case of disk full,
that includes "waiting for someone to free some space" even if
there is now wait.
This commit changes the error message for the non waiting case to:
Disk got full writing 'db.table' (Errcode: 28 "No space left on device")
Disk got full writing 'test.t1' (Errcode: 28 "No space left on device")Disk got full writing 'test.t1' (Errcode: 28 "No space left on device")Disk got full writing 'test.t1' (Errcode: 28 "No space left on device")
Problem was that initial GTID was set on wsrep_before_prepare
out-of-order. In practice GTID was set to same as previous
executed transaction GTID. In recovery valid GTID was found
from prepared transaction and this transaction is committed
leading to fact that same GTID was executed twice.
This is fixed by setting invalid GTID at wsrep_before_prepare
and later in wsrep_before_commit actual correct GTID is set
and this setting is done while we are in commit monitor i.e.
assigment is done in order of replication.
In recovery if prepared transaction is found we check its
GTID, if it is invalid transaction will be rolled back
and if it is valid it will be committed.
Initialize gtid seqno from recovered seqno when
bootstrapping a new cluster.
Added two test cases for both mariabackup and rsync SST methods
to show that GTIDs remain consistent on cluster and that
all expected rows are in the table.
Added tests for wsrep GTID recovery with binlog on and off.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Propagate discard/import tablespace request to hlindexes.
Let FLUSH TABLES ... FOR EXPORT open/lock hlindexes, so that InnoDB
prepares hlindexes for export.
Moved reset_hlindexes() to external_lock(F_UNLCK), so that hlindexes
are available for export until UNLOCK TABLES.
Closes#3631
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
(checks are done in a different order)
Wsrep_commit_empty happens too early when wsrep is disabled. Let the
cleanup happen at end of statement.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Problem was that in case of INSERT DELAYED thd->query() is
freed before we call trans_rollback where WSREP_DEBUG
could access thd->query() in wsrep_thd_query().
Fix is to reset thd->query() to NULL in delayed_insert
destructor after it is freed. There is already
null guard at wsrep_thd_query().
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
InnoDB transactions may be reused after committed:
- when taken from the transaction pool
- during a DDL operation execution
In this case wsrep flag on trx object is cleared, which may cause wrong
execution logic afterwards (wsrep-related hooks are not run).
Make trx->wsrep flag initialize from THD object only once on InnoDB transaction
start and don't change it throughout the transaction's lifetime.
The flag is reset at commit time as before.
Unconditionally set wsrep=OFF for THD objects that represent InnoDB background
threads.
Make Wsrep_schema::store_view() operate in its own transaction.
Fix streaming replication transactions' fragments rollback to not switch
THD->wsrep value during transaction's execution
(use THD->wsrep_ignore_table as a workaround).
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>