DROP USER looks for sessions by the do-be-dropped user and if found:
* fails with ER_CANNOT_USER in Oracle mode
* continues with ER_ACTIVE_CONNECTIONS_FOR_USER_TO_DROP warning otherwise
Every user being dropped is marked with flag that disallow establishing
a new connections on behalf this user.
let's always disconnect a user connection before dropping the said user.
MariaDB is traditionally very tolerant to active connections
of the dropped user, which isn't the case for most other databases.
Let's avoid unintentionally spreading incompatible behavior
and disconnect before drop.
Except in cases when the test specifically tests such a behavior.
Remove one of the major sources of race condiitons in mariadb-test.
Normally, mariadb_close() sends COM_QUIT to the server and immediately
disconnects. In mariadb-test it means the test can switch to another
connection and sends queries to the server before the server even
started parsing the COM_QUIT packet and these queries can see the
connection as fully active, as it didn't reach dispatch_command yet.
This is a major source of instability in tests and many - but not all,
still less than a half - tests employ workarounds. The correct one
is a pair count_sessions.inc/wait_until_count_sessions.inc.
Also very popular was wait_until_disconnected.inc, which was completely
useless, because it verifies that the connection is closed, and after
disconnect it always is, it didn't verify whether the server processed
COM_QUIT. Sadly the placebo was as widely used as the real thing.
Let's fix this by making mariadb-test `disconnect` command _to wait_ for
the server to confirm. This makes almost all workarounds redundant.
In some cases count_sessions.inc/wait_until_count_sessions.inc is still
needed, though, as only `disconnect` command is changed:
* after external tools, like `exec $MYSQL`
* after failed `connect` command
* replication, after `STOP SLAVE`
* Federated/CONNECT/SPIDER/etc after `DROP TABLE`
and also in some XA tests, because an XA transaction is dissociated from
the THD very late, after the server has closed the client connection.
Collateral cleanups: fix comments, remove some redundant statements:
* DROP IF EXISTS if nothing is known to exist
* DROP table/view before DROP DATABASE
* REVOKE privileges before DROP USER
etc
Before MySQL 4.0.18, user-specified constraint names were ignored.
Starting with MySQL 4.0.18, the specified constraint name was
prepended with the schema name and '/'. Now we are transforming
into a format where the constraint name is prepended with the
dict_table_t::name and the impossible UTF-8 sequence 0xff.
Generated constraint names will be ASCII decimal numbers.
On upgrade, old FOREIGN KEY constraint names will be displayed
without any schema name prefix. They will be updated to the new
format on DDL operations.
dict_foreign_t::sql_id(): Return the SQL constraint name
without any schemaname/tablename\377 or schemaname/ prefix.
row_rename_table_for_mysql(), dict_table_rename_in_cache():
Simplify the logic: Just rename constraints to the new format.
dict_table_get_foreign_id(): Replaces dict_table_get_highest_foreign_id().
innobase_get_foreign_key_info(): Let my_error() refer to erroneous
anonymous constraints as "(null)".
row_delete_constraint(): Try to drop all 3 constraint name variants.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
The main purpose of this allow one to use the --read-only
option to ensure that no one can issue a query that can
block replication.
The --read-only option can now take 4 different values:
0 No read only (as before).
1 Blocks changes for users without the 'READ ONLY ADMIN'
privilege (as before).
2 Blocks in addition LOCK TABLES and SELECT IN SHARE MODE
for not 'READ ONLY ADMIN' users.
3 Blocks in addition 'READ_ONLY_ADMIN' users for all the
previous statements.
read_only is changed to an enum and one can use the following
names for the lock levels:
OFF, ON, NO_LOCK, NO_LOCK_NO_ADMIN
Too keep things compatible with older versions config files, one can
still use values FALSE and TRUE, which are mapped to OFF and ON.
The main visible changes are:
- 'show variables like "read_only"' now returns a string
instead of a number.
- Error messages related to read_only violations now contains
the current value off readonly.
Other things:
- is_read_only_ctx() renamed to check_read_only_with_error()
- Moved TL_READ_SKIP_LOCKED to it's logical place
Reviewed by: Sergei Golubchik <serg@mariadb.org>
MDEV-36563 Assertion `!mysql_bin_log.is_open()' failed in
THD::mark_tmp_table_as_free_for_reuse
The purpose of this commit is to ensure that creation and changes of
temporary tables are properly and predicable logged to the binary
log. It also fixes some bugs where ROW logging was used in MIXED mode,
when STATEMENT would be a better (and expected) choice.
In this comment STATEMENT stands for logging to binary log in
STATEMENT format, MIXED stands for MIXED binlog format and ROW for ROW
binlog format.
New rules for logging of temporary tables
- CREATE of temporary tables are now by default binlogged only if
STATEMENT binlog format is used. If it is binlogged, 1 is stored in
TABLE_SHARE->table_creation_was_logged. The user can change this
behavior by setting create_temporary_table_binlog_formats to
MIXED,STATEMENT in which case the create is logged in statement
format also in MIXED mode (as before).
- Changes to temporary tables are only binlogged if and only if
the CREATE was logged. The logging happens under STATEMENT or MIXED.
If binlog_format=ROW, temporary table changes are not binlogged. A
temporary table that are changed under ROW are marked as 'not up to
date in binlog' and no future row changes are logged. Any usage of
this temporary table will force row logging of other tables in any
future statements using the temporary table to be row logged.
- DROP TEMPORARY is binlogged only of the CREATE was binlogged.
Changes done:
- Row logging is forced for any statement using temporary tables that
are not up to date in the binary log.
(Before the row logging was forced if the user has a temporary table)
- If there is any changes to the temporary table that is not binlogged,
the table is marked as not up to date.
- TABLE_SHARE->table_creation_was_logged has a new definition for
temporary tables:
0 Table creating was not logged to binary log
1 Table creating was logged to binary log and table is up to date.
2 Table creating was logged to binary log but some changes where
not logged to binary log.
Table is not up to date in binary log is defined as value 0 or 2.
- If a multi-table-update or multi-table-delete fails then
all updated temporary tables are marked as not up to date.
- Enforce row logging if the query is using temporary tables
that are not up to date.
Before row logging was enforced if the user had any
temporary tables.
- When dropping temporary tables use IF EXISTS. This ensures
that slave will not stop if it had crashed and lost the
temporary tables.
- Remove comment and version from DROP /*!4000 TEMPORARY.. generated when
a connection closes that has open temporary tables. Added 'generated by
server' at the end of the DROP.
Bugs fixed:
- When using temporary tables with commands that forced row based,
like INSERT INTO temporary_table VALUES (UUID()), this was never
logged which causes the temporary table to be inconsistent on
master and slave.
- Used binlog format is now clearly defined. It is now only depending
on the current binlog_format and the tables used.
Before it was depending on the user had ANY temporary tables and
the state of 'current_stmt_binlog_format' set by previous queries.
This also caused temporary tables to be logged to binary log in
some cases.
- CREATE TABLE t1 LIKE not_logged_temporary_table caused replication
to stop.
- Rename of not binlogged temporary tables where binlogged to binary log
which caused replication to stop.
Changes in behavior:
- By default create_temporary_table_binlog_formats=STATEMENT, which
means that CREATE TEMPORARY is not logged to binary log under MIXED
binary logging. This can be changed by setting
create_temporary_table_binlog_formats to MIXED,STATEMENT.
- Using temporary tables that was not logged to the binary log will
cause any query using them for updating other tables to be logged in
ROW format. Before all queries was logged in ROW format if the user had
any temporary tables, even if they were not used by the query.
- Generated DROP TEMPORARY TABLE is now always using IF EXISTS and
has a "generated by server" comment in the binary log.
The consequences of the above is that manipulations of a lot of rows
through temporary tables will by default be be slower in mixed mode.
For example:
BEGIN;
CREATE TEMPORARY TABLE tmp AS SELECT a, b, c FROM
large_table1 JOIN large_table2 ON ...;
INSERT INTO other_table SELECT b, c FROM tmp WHERE a <100;
DROP TEMPORARY TABLE tmp;
COMMIT;
By default this will create a huge entry in the binary log, compared
to just a few hundred bytes in statement mode. However the change in
this commit will make usage of temporary tables more reliable and
predicable and is thus worth it. Using statement mode or
create_temporary_table_binlog_formats can be used to avoid this issue.
MDEV-27037 added functionality to warn users that a specified
stop-position or stop-datetime was never reached. It only worked for
local files though. The patch in MDEV-35528 changed the
implementation for stop-datetime to work to provide the warning also
when using --read-from-remote-server. The PR for that MDEV (#3670)
was limited to only the stop-datetime field.
This patch updates the --stop-position warning to also work with
--read-from-remote-server.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Jimmy Hu <jimmy.hu@mariadb.com>
Currently execution of commit in one phase proceeds to commit by
engines when binlog_commit() does not succeed.
There are two issues with that:
1. absence of binlog_rollback() or lower-level
`binlog_cache_data::reset()` along the following execution of the
failing statement eventually will raise an assert on non-empty binlog
cache, find in the MDEV description
# --error assert(sql/log.cc:1712(binlog_close_connection))
# --disconnect default
2. engines, including ones that are rollback capable, commit in this
particular error situation.
Both effects can be observed with a new mtr test that would fail when run on
a BASE of this commit.
The BASE has to include MDEV-35207 et all fixes because the test is written
with CREATE-TABLE-SELECTs.
A new test file verifies the new behaviour to rollback including
cases with a side effect of modified non-transactional engine which
expose another MDEV-36027 (TODO: fix).
* automatically disable ps2 and cursor protocol when the
select statement returns no result set
* remove manual {disable|enable}_{ps2|cursor}_protocol from around
`select ... into` in tests
* other misc collateral test cleanups
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
(checks are done in a different order)
We now allow multitable queries with order by and limit, such as:
delete t1.*, t2.* from t1, t2 order by t1.id desc limit 3;
To predict what rows will be deleted, run the equivalent select:
select t1.*, t2.* from t1, t2 order by t1.id desc limit 3;
Additionally, index hints are now supported with single table delete statements:
delete from t2 use index(xid) order by (id) limit 2;
This approach changes the multi_delete SELECT result interceptor to use a temporary
table to collect row ids pertaining to the rows that will be deleted, rather than
directly deleting rows from the target table(s). Row ids are collected during
send_data, then read during send_eof to delete target rows. In the event that the
temporary table created in memory is not big enough for all matching rows, it is
converted to an aria table.
Other changes:
- Deleting from a sequence now affects zero rows instead of emitting an error
Limitations:
- The federated connector does not create implicit row ids, so we to use a key
when conditionally deleting. See the change in federated_maybe_16324629.test
InnoDB is limited to row-logging when transaction isolation level is
READ COMMITTED or READ UNCOMMITTED. This limitation is enforced by
a check within ha_innobase::external_lock().
InnoDB also expects number of "successful external locks" to match
number of "external unlocks", which is controlled by the assertion
in subject.
However "unlock" was called even after failing "lock" for high-level
indexes.
Fixed by calling "unlock" only after "successful lock".
mysql_prepare_create_table: Extract a Key initialization part that
relates to length calculation and long unique index designation.
append_system_key_parts call also moves there.
Move this initialization before the duplicate elimination.
Extract WITHOUT OVERPLAPS check into a separate function. It had to be moved
earlier in the code to preserve the order of the error checks, as in the tests.
Fix regression introduced by commits 9588526 which attempted to address
MDEV-27037. With the regression, mariadb-binlog cannot process multiple
log files when --stop-datetime is specified.
The change is to keep recording timestamp of last processed event, and
after all log files are processed, if the last recorded timestamp has not
reached specified --stop-datetime, it will emit a warning. This applies
when processing local log files, or log files from remote servers.
All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new license. I
am contributing on behalf of my employer Amazon Web Services, Inc.
Co-authored-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
MDEV-35350 consolidated two methods that MTR tests
would wait until a file had certain content
written to it, which were only available in 10.6+.
This patch only backports the functionality to
10.5 in case some test wants to use it (nothing
uses it in 10.5 at present).
The cleanup bc46f1a7d9 from 10.6 is also
backported so SEARCH_TYPE doesn't need to be
accounted for in the new search_pattern_in_file.inc
logic.
Changing the return type of the following functions:
- CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), NOW()
- SYSDATE()
- FROM_UNIXTIME()
from DATETIME to TIMESTAMP.
Note, the old function NOW() returning DATETIME is still available
as LOCALTIMESTAMP or LOCALTIMESTAMP(), e.g.:
SELECT
LOCALTIMESTAMP, -- DATETIME
CURRENT_TIMESTAMP; -- TIMESTAMP
The change in the functions return data type fixes some problems
that occurred near a DST change:
- Problem #1
INSERT INTO t1 (timestamp_field) VALUES (CURRENT_TIMESTAMP);
INSERT INTO t1 (timestamp_field) VALUES (COALESCE(CURRENT_TIMESTAMP));
could result into two different values inserted.
- Problem #2
INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526));
INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526+3600));
could result into two equal TIMESTAMP values near a DST change.
Additional changes:
- FROM_UNIXTIME(0) now returns SQL NULL instead of '1970-01-01 00:00:00'
(assuming time_zone='+00:00')
- UNIX_TIMESTAMP('1970-01-01 00:00:00') now returns SQL NULL instead of 0
(assuming time_zone='+00:00'
These additional changes are needed for consistency with TIMESTAMP fields,
which cannot store '1970-01-01 00:00:00 +00:00'
for large transaction
Description
===========
When a transaction commits, it copies the binlog events from
binlog cache to binlog file. Very large transactions
(eg. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.
The solution in this patch is to rename the binlog cache file to
a binlog file instead of copy, if the commiting transaction has
large binlog cache. Rename is a very fast operation, it doesn't
block other transactions a long time.
Design
======
* binlog_large_commit_threshold
type: ulonglong
scope: global
dynamic: yes
default: 128MB
Only the binlog cache temporary files large than 128MB are
renamed to binlog file.
* #binlog_cache_files directory
To support rename, all binlog cache temporary files are managed
as normal files now. `#binlog_cache_files` directory is in the same
directory with binlog files. It is created at server startup if it doesn't
exist. Otherwise, all files in the directory is deleted at startup.
The temporary files are named with ML_ prefix and the memorary address
of the binlog_cache_data object which guarantees it is unique.
* Reserve space
To supprot rename feature, It must reserve enough space at the
begin of the binlog cache file. The space is required for
Format description, Gtid list, checkpoint and Gtid events when
renaming it to a binlog file.
Since binlog_cache_data's cache_log is directly accessed by binlog log,
online alter and wsrep. It is not easy to update all the code. Thus
binlog cache will not reserve space if it is not session binlog cache or
wsrep session is enabled.
- m_file_reserved_bytes
Stores the bytes reserved at the begin of the cache file.
It is initialized in write_prepare() and cleared by reset().
The reserved file header is hide to callers. Thus there is no
change for callers. E.g.
- get_byte_position() still get the length of binlog data
written to the cache, but not the file length.
- truncate(0) will truncate the file to m_file_reserved_bytes but not 0.
- write_prepare()
write_prepare() is called everytime when anything is being written
into the cache. It will call init_file_reserved_bytes() to create
the cache file (if it doesn't exist) and reserve suitable space if
the data written exceeds buffer's size.
* Binlog_commit_by_rotate
It is used to encapsulate the code for remaing a binlog cache
tempoary file to binlog file.
- should_commit_by_rotate()
it is called by write_transaction_to_binlog_events() to check if
a binlog cache should be rename to a binlog file.
- commit()
That is the entry to rename a binlog cache and commit the
transaction. Both rename and commit are protected by LOCK_log,
Thus not other transactions can write anything into the renamed
binlog before it.
Rename happens in a rotation. After the new binlog file is generated,
replace_binlog_file() is called to:
- copy data from the new binlog file to its binlog cache file.
- write gtid event.
- rename the binlog cache file to binlog file.
After that the rotation will continue to succeed. Then the transaction
is committed in a seperated group itself. Its cache file will be
detached and cache log will be reset before calling
trx_group_commit_with_engines(). Thus only Xid event be written.
RESET MASTER waits for storage engines to reply to a binlog checkpoint
requests. If this response is delayed for a long time for some reason, then
RESET MASTER can hang.
Fix this by forcing a log sync in all engines just before waiting for the
checkpoint reply.
(Waiting for old checkpoint responses is needed to preserve durability of
any commits that were synced to disk in the to-be-deleted binlog but not yet
synced in the engine.)
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Updated tests: cases with bugs or which cannot be run
with the cursor-protocol were excluded with
"--disable_cursor_protocol"/"--enable_cursor_protocol"
Fix for v.10.5
crash recovery
Summary
=======
When doing server recovery, the active transactions will be rolled
back by InnoDB background rollback thread automatically. The
prepared transactions will be committed or rolled back accordingly
by binlog recovery. Binlog recovery is done in main thread before
the server can provide service to users. If there is a big
transaction to rollback, the server will not available for a long
time.
This patch provides a way to rollback the prepared transactions
asynchronously. Thus the rollback will not block server startup.
Design
======
- Handler::recover_rollback_by_xid()
This patch provides a new handler interface to rollback transactions
in recover phase. InnoDB just set the transaction's state to active.
Then the transaction will be rolled back by the background rollback
thread.
- Handler::signal_tc_log_recover_done()
This function is called after tc log is opened(typically binlog opened)
has done. When this function is called, all transactions will be rolled
back have been reverted to ACTIVE state. Thus it starts rollback thread
to rollback the transactions.
- Background rollback thread
With this patch, background rollback thread is defered to run until binlog
recovery is finished. It is started by innobase_tc_log_recovery_done().