includes:
* remove some remnants of "Bug#14521864: MYSQL 5.1 TO 5.5 BUGS PARTITIONING"
* introduce LOCK_share, now LOCK_ha_data is strictly for engines
* rea_create_table() always creates .par file (even in "frm-only" mode)
* fix a 5.6 bug, temp file leak on dummy ALTER TABLE
revno: 4559
committer: Marc Alff <marc.alff@oracle.com>
branch nick: mysql-5.6-bug14741537-v4
timestamp: Thu 2012-11-08 22:40:31 +0100
message:
Bug#14741537 - MYSQL 5.6, GTID AND PERFORMANCE_SCHEMA
Before this fix, statements using performance_schema tables:
- were marked as unsafe for replication,
- did cause warnings during execution,
- were written to the binlog, either in STATEMENT or ROW format.
When using replication with the new GTID feature,
unsafe warnings are elevated to errors,
which prevents to use both the performance_schema and GTID together.
The root cause of the problem is not related to raising warnings/errors
in some special cases, but deeper: statements involving the performance
schema should not even be written to the binary log in the first place,
because the content of the performance schema tables is 'local' to a server
instance, and may differ greatly between nodes in a replication
topology.
In particular, the DBA should be able to configure (INSERT, UPDATE, DELETE)
or flush (TRUNCATE) performance schema tables on one node,
without affecting other nodes.
This fix introduces the concept of a 'non-replicated' or 'local' table,
and adjusts the replication logic to ignore tables that are not replicated
when deciding if or how to log a statement to the binlog.
Note that while this issue was detected using the performance_schema,
other tables are also affected by the same problem.
This fix define as 'local' the following tables, which are then never
replicated:
- performance_schema.*
- mysql.general_log
- mysql.slow_log
- mysql.slave_relay_log_info
- mysql.slave_master_info
- mysql.slave_worker_info
Existing behavior for information_schema.* is unchanged by this fix,
to limit the scope of changes.
Coding wise, this fix implements the following changes:
1)
Performance schema tables are not using any replication flags,
since performance schema tables are not replicated.
2)
In open_table_from_share(),
tables with no replication capabilities (performance_schema.*),
tables with TABLE_CATEGORY_LOG (logs)
and tables with TABLE_CATEGORY_RPL_INFO (replication)
are marked as non replicated, with TABLE::no_replicate
3)
A new THD member, THD::m_binlog_filter_state,
indicate if the current statement is written to the binlog
(normal cases for most statements), or is to be discarded
(because the statements affects non replicated tables).
4)
In THD::decide_logging_format(), the replication logic
is changed to take into account non replicated tables.
Statements that affect only non replicated tables are
executed normally (no warning or errors), but not written
to the binlog.
Statements that affect (i.e., write to) a replicated table
while also using (i.e., reading from or writing to) a non replicated table
are executed normally in MIXED and ROW binlog format,
and cause a new error in STATEMENT binlog format.
THD::decide_logging_format() uses THD::m_binlog_filter_state
to indicate if a statement is to be ignored, when writing to
the binlog.
5)
In THD::binlog_query(), statements marked as ignored
are not written to the binary log.
6)
For row based replication, the existing test for 'table->no_replicate',
has been moved from binlog_log_row() to check_table_binlog_row_based().
-Note 1031 Table storage engine for 't1' doesn't have this option
+Note 1031 Table storage engine for 'InnoDB' doesn't have this option
They were caused by a change in MariaDB which changed ER_ILLEGAL_HA message
text to be like:
"Storage engine InnoDB of the table `test`.`t1` doesn't have this option"
Some the error calls were changed to pass new parameters, but some were left
to be old. Also the error text in errmsg-ut8.txt was not changed.
Implement facility for the commit in one thread to wait for the commit of
another to complete first. The wait is done in a way that does not hinder
that a waiter and a waitee can group commit together with a single fsync()
in both binlog and InnoDB. The wait is done efficiently with respect to
locking.
The patch was originally made to support TaoBao parallel replication with
in-order commit; now it will be adapted to also be used for parallel
replication of group-committed transactions.
A waiter THD registers itself with a prior waitee THD. The waiter will then
complete its commit at the earliest in the same group commit of the waitee
(when using binlog). The wait can also be done explicitly by the waitee.
1. DROP DATABASE should use ha_discover_table_names(), not look at .frm files.
2. filename_to_tablename() also encodes temp file names #sql- -> #mysql50##sql
3. no special treatment for #sql- files, no TABLE_LIST::internal_tmp_table
4. discover also table file names, that start from #
-Added test and extra code to ensure we don't leave keyread on for a handler table.
-Create on disk temporary files always with long data pointers if SQL_SMALL_RESULT is not used. This ensures that we can handle temporary files bigger than 4G.
mysql-test/include/default_mysqld.cnf:
Run test suite with smaller aria keybuffer size
mysql-test/suite/maria/maria3.result:
Run test suite with smaller aria keybuffer size
mysql-test/suite/sys_vars/r/aria_pagecache_buffer_size_basic.result:
Run test suite with smaller aria keybuffer size
sql/handler.cc:
Disable key read (extra safety if something went wrong)
sql/multi_range_read.cc:
Ensure we have don't leave keyread on for secondary_file
sql/opt_range.cc:
Simplify code with mark_columns_used_by_index_no_reset()
Ensure that read_keys_and_merge() disableds keyread if it enables it
sql/opt_subselect.cc:
Remove not anymore used argument for create_internal_tmp_table()
sql/sql_derived.cc:
Remove not anymore used argument for create_internal_tmp_table()
sql/sql_select.cc:
Use 'enable_keyread()' instead of calling HA_EXTRA_RESET. (Makes debugging easier)
Create on disk temporary files always with long data pointers if SQL_SMALL_RESULT is not used. This ensures that we can handle temporary files bigger than 4G.
Remove not anymore used argument for create_internal_tmp_table()
More DBUG
sql/sql_select.h:
Remove not anymore used argument for create_internal_tmp_table()
* print "table doesn't exist in engine" when a table doesn't exist in the engine,
instead of "file not found" (if no file was involved)
* print a complete filename that cannot be found ('t1.MYI', not 't1')
* it's not an error for a DROP if a table doesn't exist in the engine (or some table
files cannot be found) - if the DROP succeeded regardless
DOWNGRADED FROM 5.6.11 TO 5.6.10
Problem was new syntax not accepted by previous version.
Fixed by adding version comment of /*!50531 around the
new syntax.
Like this in the .frm file:
'PARTITION BY KEY /*!50611 ALGORITHM = 2 */ () PARTITIONS 3'
and also changing the output from SHOW CREATE TABLE to:
CREATE TABLE t1 (a INT)
/*!50100 PARTITION BY KEY */ /*!50611 ALGORITHM = 1 */ /*!50100 ()
PARTITIONS 3 */
It will always add the ALGORITHM into the .frm for KEY [sub]partitioned
tables, but for SHOW CREATE TABLE it will only add it in case it is the non
default ALGORITHM = 1.
Also notice that for 5.5, it will say /*!50531 instead of /*!50611, which
will make upgrade from 5.5 > 5.5.31 to 5.6 < 5.6.11 fail!
If one downgrades an fixed version to the same major version (5.5 or 5.6) the
bug 14521864 will be visible again, but unless the .frm is updated, it will
work again when upgrading again.
Also fixed so that the .frm does not get updated version
if a single partition check passes.
Due to an internal change in the server code in between 5.1 and 5.5
(wl#2649) the hash function used in KEY partitioning changed
for numeric and date/time columns (from binary hash calculation
to character based hash calculation).
Also enum/set changed from latin1 ci based hash calculation to
binary hash between 5.1 and 5.5. (bug#11759782).
These changes makes KEY [sub]partitioned tables on any of
the affected column types incompatible with 5.5 and above,
since the calculation of partition id differs.
Also since InnoDB asserts that a deleted row was previously
read (positioned), the server asserts on delete of a row that
is in the wrong partition.
The solution for this situation is:
1) The partitioning engine will check that delete/update will go to the
partition the row was read from and give an error otherwise, consisting
of the rows partitioning fields. This will avoid asserts in InnoDB and
also alert the user that there is a misplaced row. A detailed error
message will be given, including an entry to the error log consisting
of both table name, partition and row content (PK if exists, otherwise
all partitioning columns).
2) A new optional syntax for KEY () partitioning in 5.5 is allowed:
[SUB]PARTITION BY KEY [ALGORITHM = N] (list_of_cols)
Where N = 1 uses the same hashing as 5.1 (Numeric/date/time fields uses
binary hashing, ENUM/SET uses charset hashing) N = 2 uses the same
hashing as 5.5 (Numeric/date/time fields uses charset hashing,
ENUM/SET uses binary hashing). If not set on CREATE/ALTER it will
default to 2.
This new syntax should probably be ignored by NDB.
3) Since there is a demand for avoiding scanning through the full
table, during upgrade the ALTER TABLE t PARTITION BY ... command is
considered a no-op (only .frm change) if everything except ALGORITHM
is the same and ALGORITHM was not set before, which allows manually
upgrading such table by something like:
ALTER TABLE t PARTITION BY KEY ALGORITHM = 1 () or
ALTER TABLE t PARTITION BY KEY ALGORITHM = 2 ()
4) Enhanced partitioning with CHECK/REPAIR to also check for/repair
misplaced rows. (Also works for ALTER TABLE t CHECK/REPAIR PARTITION)
CHECK FOR UPGRADE:
If the .frm version is < 5.5.3
and uses KEY [sub]partitioning
and an affected column type
then it will fail with an message:
KEY () partitioning changed, please run:
ALTER TABLE `test`.`t1` PARTITION BY KEY ALGORITHM = 1 (a)
PARTITIONS 12
(i.e. current partitioning clause, with the addition of
ALGORITHM = 1)
CHECK without FOR UPGRADE:
if MEDIUM (default) or EXTENDED options are given:
Scan all rows and verify that it is in the correct partition.
Fail for the first misplaced row.
REPAIR:
if default or EXTENDED (i.e. not QUICK/USE_FRM):
Scan all rows and every misplaced row is moved into its correct
partitions.
5) Updated mysqlcheck (called by mysql_upgrade) to handle the
new output from CHECK FOR UPGRADE, to run the ALTER statement
instead of running REPAIR.
This will allow mysql_upgrade (or CHECK TABLE t FOR UPGRADE) to upgrade
a KEY [sub]partitioned table that has any affected field type
and a .frm version < 5.5.3 to ALGORITHM = 1 without rebuild.
Also notice that if the .frm has a version of >= 5.5.3 and ALGORITHM
is not set, it is not possible to know if it consists of rows from
5.1 or 5.5! In these cases I suggest that the user does:
(optional)
LOCK TABLE t WRITE;
SHOW CREATE TABLE t;
(verify that it has no ALGORITHM = N, and to be safe, I would suggest
backing up the .frm file, to be used if one need to change to another
ALGORITHM = N, without needing to rebuild/repair)
ALTER TABLE t <old partitioning clause, but with ALGORITHM = N>;
which should set the ALGORITHM to N (if the table has rows from
5.1 I would suggest N = 1, otherwise N = 2)
CHECK TABLE t;
(here one could use the backed up .frm instead and change to a new N
and run CHECK again and see if it passes)
and if there are misplaced rows:
REPAIR TABLE t;
(optional)
UNLOCK TABLES;