mirror of https://github.com/MariaDB/server.git synced 2025-11-10 23:02:54 +03:00
Commit Graph

2335 Commits

Author SHA1 Message Date
Igor Babaev
fd386e39cd MDEV-18689 Simple query with extra brackets stopped working
Parentheses around table names and derived tables should be allowed
in FROM clauses and some other contexts, as they were in earlier versions.

Returned test queries that used such parentheses in 10.3 to their
original form. Adjusted test results accordingly.
2019-05-06 11:14:39 -07:00
Vladislav Vaintroub
60bd353bdf Fixes for atomic writes on Windows.
Windows performs atomic writes, as long as they are aligned to the
sector size and are a multiple of it. This is documented in MSDN.

Fix the innodb.doublewrite test to always use the doublewrite buffer
(even if atomic writes are autodetected).
2019-05-06 11:32:17 +00:00
Marko Mäkelä
d3dcec5d65 Merge 10.3 into 10.4 2019-05-05 15:06:44 +03:00
Marko Mäkelä
447b8ba164 Merge 10.2 into 10.3 2019-04-29 17:54:10 +03:00
Marko Mäkelä
bdd6e33f00 MDEV-13626: Add a test case 2019-04-29 15:11:06 +03:00
Marko Mäkelä
e6bdf77e4b Merge 10.3 into 10.4
In is_eits_usable(), we disable an assertion that fails due to
MDEV-19334.
2019-04-25 16:05:20 +03:00
Marko Mäkelä
acf6f92aa9 Merge 10.2 into 10.3 2019-04-25 09:05:52 +03:00
Marko Mäkelä
bc145193c1 Merge 10.1 into 10.2 2019-04-25 09:04:09 +03:00
Marko Mäkelä
bfb0726fc2 Merge 5.5 into 10.1 2019-04-24 12:03:11 +03:00
Thirunarayanan Balathandayuthapani
d5da8ae04d MDEV-15772 Potential list overrun during XA recovery
InnoDB could return the same list again and again if the buffer
passed to trx_recover_for_mysql() is smaller than the number of
transactions that InnoDB recovered in XA PREPARE state.

We introduce the transaction state TRX_PREPARED_RECOVERED, which
is like TRX_PREPARED, but will be set during trx_recover_for_mysql()
so that each transaction will only be returned once.

Because init_server_components() is invoking ha_recover() twice,
we must reset the state of the transactions back to TRX_PREPARED
after returning the complete list, so that repeated traversals
will see the complete list again, instead of seeing an empty list.
Without this tweak, the test main.tc_heuristic_recover would hang
in MariaDB 10.1.
2019-04-24 11:46:14 +03:00
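A minimal sketch of the idea in d5da8ae04d above, as a toy Python model rather than InnoDB code: each prepared transaction is handed out once by marking it, and the marks are cleared once nothing is left to report. The names `TrxState` and `recover_batch` are hypothetical, and resetting on the first empty batch is an assumption made for illustration.

```python
from enum import Enum, auto


class TrxState(Enum):
    ACTIVE = auto()
    PREPARED = auto()
    PREPARED_RECOVERED = auto()  # like PREPARED, but already returned once


def recover_batch(trx_states, capacity):
    """Return up to `capacity` ids of PREPARED transactions, each only once.

    trx_states maps a transaction id to its TrxState and is updated in place.
    When nothing is left to report, PREPARED_RECOVERED is reset to PREPARED
    so that a later traversal sees the complete list again.
    """
    batch = []
    for trx_id, state in trx_states.items():
        if len(batch) == capacity:
            break
        if state is TrxState.PREPARED:
            trx_states[trx_id] = TrxState.PREPARED_RECOVERED
            batch.append(trx_id)
    if not batch:
        # Complete list was returned: undo the marking (assumed trigger).
        for trx_id, state in trx_states.items():
            if state is TrxState.PREPARED_RECOVERED:
                trx_states[trx_id] = TrxState.PREPARED
    return batch
```

With a buffer smaller than the number of prepared transactions, repeated calls walk through the list instead of returning the same prefix again and again.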
Marko Mäkelä
e5aa8ea525 MDEV-18139 ALTER IGNORE ... ADD FOREIGN KEY causes bogus error
dict_create_foreign_constraints_low(): Tolerate the keywords
IGNORE and ONLINE between the keywords ALTER and TABLE.

We should really remove the hacky FOREIGN KEY constraint parser
from InnoDB.
2019-04-23 17:56:43 +03:00
Marko Mäkelä
7b216ceb90 Avoid DROP DATABASE test
DROP DATABASE would internally execute DROP TABLE on every contained
table and finally remove the directory. In InnoDB, DROP TABLE is
sometimes executed in the background. The table would be renamed to
a name that starts with #sql. The existence of these files would
prevent DROP DATABASE from succeeding.

CREATE OR REPLACE DATABASE can internally execute DROP DATABASE if
the directory already exists. This could fail due to the InnoDB
background DROP TABLE, possibly due to some tables that were
leftovers from earlier tests.
2019-04-18 14:28:39 +03:00
Marko Mäkelä
e7029e864f Merge 10.3 into 10.4 2019-04-17 15:59:30 +03:00
Marko Mäkelä
250799f961 Merge 10.2 into 10.3 2019-04-17 15:26:17 +03:00
Marko Mäkelä
169c00994b MDEV-12699 Improve crash recovery of corrupted data pages
InnoDB crash recovery used to read every data page for which
redo log exists. This is unnecessary for those pages that are
initialized by the redo log. If a newly created page is corrupted,
recovery could unnecessarily fail. It would suffice to reinitialize
the page based on the redo log records.

To add insult to injury, InnoDB crash recovery could hang if it
encountered a corrupted page. We will also fix that problem.
InnoDB would normally refuse to start up if it encounters a
corrupted page on recovery, but that can be overridden by
setting innodb_force_recovery=1.

Data pages are completely initialized by the records
MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS.
MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE,
which notifies that a page has been freed and its contents
can be discarded (filled with zeroes).

The record MLOG_INDEX_LOAD notifies that redo logging has
been re-enabled after being disabled. We can avoid loading
the page if all buffered redo log records predate the
MLOG_INDEX_LOAD record.

For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD
records were written before commit aa3f7a107c.
Hence, we will skip these optimizations for tables whose
name starts with FTS_.

This is joint work with Thirunarayanan Balathandayuthapani.

fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the
latest recovered MLOG_INDEX_LOAD record for a tablespace.

mlog_init: Page initialization operations discovered during
redo log scanning. FIXME: This really belongs in recv_sys->addr_hash,
and should be removed in MDEV-19176.

recv_addr_state: Add the new state RECV_WILL_NOT_READ to
indicate that according to mlog_init, the page will be
initialized based on redo log record contents.

recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state
if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS
as page initialization. This works around bugs in the crash
recovery of ROW_FORMAT=COMPRESSED tables.

recv_mark_log_index_load(): Process an MLOG_INDEX_LOAD record
by resetting the state to RECV_NOT_PROCESSED and by updating
file_name_t::enable_lsn.

recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn
to fil_space_t::enable_lsn.

recv_recover_page(): Add the parameter init_lsn, to ignore
any log records that precede the page initialization.
Add DBUG output about skipped operations.

buf_page_create(): Initialize FIL_PAGE_LSN, so that
recv_recover_page() will not wrongly skip applying
the page-initialization record due to the field containing
some newer LSN as a leftover from a different page.
Do not invoke ibuf_merge_or_delete_for_page() during
crash recovery.

recv_apply_hashed_log_recs(): Remove some unnecessary lookups.
Note if a corrupted page was found during recovery.
After invoking buf_page_create(), do invoke
ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge()
in the last recovery batch.

ibuf_merge_or_delete_for_page(): Relax a debug assertion.

innobase_start_or_create_for_mysql(): Abort startup if
a corrupted page was found during recovery. Corrupted pages
will not be flagged if innodb_force_recovery is set.
However, the recv_sys->found_corrupt_fs flag can be set
regardless of innodb_force_recovery if file names are found
to be incorrect (for example, multiple files with the same
tablespace ID).
2019-04-17 13:58:41 +03:00
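The core idea of 169c00994b above can be sketched as a toy model (not the InnoDB data structures): a page that the redo log initializes need not be read from disk, and any buffered records older than the initialization can be ignored. The record kinds "INIT" and "WRITE" and the function `plan_recovery` are hypothetical names chosen for this illustration.

```python
def plan_recovery(records):
    """Decide per page whether it must be read and which records to apply.

    records: list of (lsn, page_id, kind) tuples, kind in {"INIT", "WRITE"}.
    Returns {page_id: (must_read, [lsn, ...])}.
    """
    # Latest page-initialization LSN seen for each page.
    init_lsn = {}
    for lsn, page_id, kind in records:
        if kind == "INIT":
            init_lsn[page_id] = max(lsn, init_lsn.get(page_id, 0))

    plan = {}
    for lsn, page_id, kind in sorted(records):
        start = init_lsn.get(page_id)
        # A page with an initialization record will not be read from disk.
        must_read, to_apply = plan.setdefault(page_id, (start is None, []))
        # Records that precede the initialization are skipped.
        if start is None or lsn >= start:
            to_apply.append(lsn)
    return plan
```

In this model a corrupted on-disk copy of an initialized page is never touched, which is why recovery no longer needs to fail on it.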
Marko Mäkelä
376bf4ede5 MDEV-19241 InnoDB fails to write MLOG_INDEX_LOAD upon completing ALTER TABLE
Similar to what was done in commit aa3f7a107c
for FULLTEXT INDEX, we must ensure that MLOG_INDEX_LOAD records will always
be written if redo logging was disabled.

row_merge_build_indexes(): Invoke row_merge_write_redo() also when
online operation is not being executed or an error occurs.
In case of an error, invoke flush_observer->interrupted() so that
the pages will not be flushed but merely evicted from the buffer pool.
Before resuming redo logging, it is crucial for the correctness of
mariabackup and InnoDB crash recovery to flush or evict all affected pages
and to write MLOG_INDEX_LOAD records.
2019-04-17 13:58:22 +03:00
Marko Mäkelä
d9d79e4d01 MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large
With the MDEV-15562 instant DROP COLUMN, clustered index records
will contain traces of dropped columns, as follows:

In ROW_FORMAT=REDUNDANT, dropped columns will be stored as 0 bytes,
but they will consume 1 or 2 bytes per column in the record header.

In ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC, dropped columns will
be stored as NULL if allowed. This will consume 1 bit per nullable
column.

In ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC, dropped NOT NULL columns
will be stored as 0 bytes if allowed. This will consume 1 byte per
NOT NULL variable-length column. Fixed-length columns will be stored
using the fixed number of bytes.

The metadata record will be 20 bytes larger than user records, because
it will contain a metadata BLOB pointer.

We must refuse ALGORITHM=INSTANT (and require a table rebuild) if
the metadata record would grow too big to fit in the index page.

If SQL_MODE includes STRICT_TRANS_TABLES or STRICT_ALL_TABLES, we should
refuse ALGORITHM=INSTANT if the maximum length of user records would
exceed the maximum size of an index page, similar to what
row_create_index_for_mysql() does during CREATE TABLE. This limit
would kick in when the default values for any instantly added columns
in the metadata record are NULL or short, but the allowed maximum values
are long.

instant_alter_column_possible(): Add the parameter "bool strict" to
enable checks for the user record size, and always check the metadata
record size.
2019-04-16 17:21:17 +03:00
Marko Mäkelä
7896503686 Merge 10.3 into 10.4 2019-04-12 12:45:06 +03:00
Eugene Kosov
4dc10ec68d MDEV-19236 Improve error message for ER_ALTER_OPERATION_NOT_SUPPORTED_REASON_COLUMN_TYPE
Remove the sometimes misleading word INPLACE from the error message.
2019-04-12 12:28:09 +03:00
Marko Mäkelä
edd1a53a55 Merge 10.3 into 10.4 2019-04-08 22:00:07 +03:00
Marko Mäkelä
9ba0865b87 Merge 10.2 into 10.3 2019-04-08 21:38:13 +03:00
Marko Mäkelä
7362f11554 Require --big-test for innodb.undo_truncate_recover 2019-04-08 21:33:49 +03:00
Marko Mäkelä
d8303c3ee7 Merge 10.3 into 10.4 2019-04-08 08:22:34 +03:00
Marko Mäkelä
cc492bfd4f Merge 10.2 into 10.3 2019-04-07 11:49:50 +03:00
Marko Mäkelä
867617a976 MDEV-18309: InnoDB reports bogus errors about missing #sql-*.ibd on startup
This is a follow-up to MDEV-18733. As part of that fix, we made
dict_check_sys_tables() skip tables that would be dropped by
row_mysql_drop_garbage_tables().

DICT_ERR_IGNORE_DROP: A new mode in which no attempt is made
to open the file.

dict_load_tablespace(): Do not try to load the tablespace if
DICT_ERR_IGNORE_DROP has been specified.

row_mysql_drop_garbage_tables(): Pass the DICT_ERR_IGNORE_DROP mode.

fil_space_for_table_exists_in_mem(): Remove a parameter.
The only caller that passed print_error_if_does_not_exist=true
was row_drop_single_table_tablespace().
2019-04-07 10:57:38 +03:00
Marko Mäkelä
02d9b048a2 Merge 10.3 into 10.4 2019-04-05 11:41:03 +03:00
Marko Mäkelä
d5a2bc6a0f Merge 10.2 into 10.3 2019-04-04 19:41:12 +03:00
Marko Mäkelä
cad56fbaba MDEV-18733 MariaDB slow start after crash recovery
If InnoDB crash recovery was needed, the InnoDB function srv_start()
would invoke extra validation, reading something from every InnoDB
data file. This should be unnecessary now that MDEV-14717 made
RENAME operations crash-safe inside InnoDB (which can be
disabled in MariaDB 10.2 by setting innodb_safe_truncate=OFF).

dict_check_sys_tables(): Skip tables that would be dropped by
row_mysql_drop_garbage_tables(). Perform extra validation only
if innodb_safe_truncate=OFF, innodb_force_recovery=0 and
crash recovery was needed.

dict_load_table_one(): Validate the root page of the table.
In this way, we can deny access to corrupted or mismatching tables
not only after crash recovery, but also after a clean shutdown.
2019-04-03 19:56:03 +03:00
Eugene Kosov
3a3d5ba235 MDEV-13301 Optimize DROP INDEX, ADD INDEX into RENAME INDEX
Just rename the index in the data dictionary and in the InnoDB cache when
possible. Introduce ALTER_INDEX_RENAME for that purpose so that engines can
optimize such operations.

Unused code guarded by the macro MYSQL_RENAME_INDEX was removed.

compare_keys_but_name(): compare index definitions except for index names

Alter_inplace_info::rename_keys:
ha_innobase_inplace_ctx::rename_keys: vector of renamed indexes

fill_alter_inplace_info(): fills Alter_inplace_info::rename_keys
2019-04-03 18:36:33 +02:00
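As a rough illustration of the matching described in 3a3d5ba235 above: a dropped index and an added index whose definitions are equal apart from the name can be treated as a rename. This is a simplified sketch, not the server's code; `find_renamed_indexes` and the representation of a definition as a plain tuple are assumptions.

```python
def find_renamed_indexes(dropped, added):
    """Pair each dropped index with an added index of identical definition.

    dropped, added: dicts mapping index name -> definition, where a
    definition is any value comparable with == (for example a tuple of
    column names plus a uniqueness flag). Returns [(old_name, new_name)].
    """
    renames = []
    unmatched = dict(added)  # each added index may be used at most once
    for old_name, definition in dropped.items():
        match = next(
            (name for name, other in unmatched.items() if other == definition),
            None,
        )
        if match is not None:
            renames.append((old_name, match))
            del unmatched[match]
    return renames
```

Any dropped or added index left unpaired would still have to be handled as a real DROP INDEX or ADD INDEX.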
Marko Mäkelä
5c3ff5cb93 Merge 10.3 into 10.4 2019-04-02 11:04:54 +03:00
Marko Mäkelä
f9ab7b473a MDEV-18623 Assertion after DROP FULLTEXT INDEX and removing NOT NULL
instant_alter_column_possible(): Do not support instantaneous removal
of NOT NULL if the table needs to be rebuilt for removing the hidden
FTS_DOC_ID column. This is not ideal and should ultimately be fixed
properly in MDEV-17459.
2019-04-02 11:03:44 +03:00
Marko Mäkelä
43c20542dd MDEV-19030: Assertion failed in rec_init_offsets() after DROP COLUMN
This basically is a duplicate of MDEV-18219, proving that the
assertion was not relaxed correctly.

dict_index_t::in_instant_init: A debug flag that will only be set in
btr_cur_instant_init_low() in order to suppress the assertion failure
in rec_init_offsets() for that code path only.
2019-04-02 11:03:44 +03:00
Marko Mäkelä
517486963b Adjust tests after commit b5615eff0d 2019-04-02 11:03:28 +03:00
Marko Mäkelä
7b42d892de Merge 10.2 into 10.3 2019-04-02 09:19:34 +03:00
Marko Mäkelä
bce380f2a5 Merge 10.1 into 10.2 2019-04-02 09:14:15 +03:00
Michael Widenius
b5615eff0d Write information about restart in .result
Idea comes from MySQL which does something similar
2019-04-01 19:47:24 +03:00
Marko Mäkelä
10dd290b4b MDEV-17380 innodb_flush_neighbors=ON should be ignored on SSD
For tablespaces that do not reside on spinning storage, it does
not make sense to attempt to write nearby pages when writing out
dirty pages from the InnoDB buffer pool. It is actually detrimental
to performance and to the life span of flash ROM storage.

With this change, MariaDB will detect whether an InnoDB file resides
on solid-state storage. The detection has been implemented for Linux
and Microsoft Windows. For other systems, we will err on the safe side
and assume that files reside on SSD.

As part of this change, we will reduce the number of fstat() calls
when opening data files on POSIX systems and slightly clean up some
file I/O code.

FIXME: os_is_sparse_file_supported() on POSIX works in a destructive
manner. Thus, we can only invoke it when creating files, not when
opening them.

For diagnostics, we introduce the column ON_SSD to the table
INFORMATION_SCHEMA.INNODB_TABLESPACES_SCRUBBING. The table
INNODB_SYS_TABLESPACES might seem more appropriate, but its purpose
is to reflect the contents of the InnoDB system table SYS_TABLESPACES,
which we would like to remove at some point.

On Microsoft Windows, querying StorageDeviceSeekPenaltyProperty
sometimes returns ERROR_GEN_FAILURE instead of ERROR_INVALID_FUNCTION
or ERROR_NOT_SUPPORTED. We will also silently ignore this error,
and assume that the file does not reside on SSD.

On Linux, the detection will be based on the files
/sys/block/*/queue/rotational and /sys/block/*/dev.
Especially for USB storage, it is possible that
/sys/block/*/queue/rotational will wrongly report 1 instead of 0.

fil_node_t::on_ssd: Whether the InnoDB data file resides on
solid-state storage.

fil_system_t::ssd: Collection of Linux block devices that reside on
non-rotational storage.

fil_system_t::create(): Detect ssd on Linux based on the contents
of /sys/block/*/queue/rotational and /sys/block/*/dev.

fil_system_t::is_ssd(dev_t): Determine if a Linux block device is
non-rotational. Partitions will be identified with the containing
block device by assuming that the least significant 4 bits of the
minor number identify a partition, and that the "partition number"
of the entire device is 0.
2019-04-01 12:00:56 +03:00
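The Linux detection described in 10dd290b4b above could look roughly like the following sketch. It only mirrors the description in the commit message: the sysfs files are the ones named there, while the function names, the `sysfs_root` parameter and the error handling are assumptions made for illustration.

```python
import glob
import os


def non_rotational_devices(sysfs_root="/sys/block"):
    """Collect (major, minor) of block devices whose queue/rotational is 0."""
    devices = set()
    for dev_file in glob.glob(os.path.join(sysfs_root, "*", "dev")):
        rotational_file = os.path.join(
            os.path.dirname(dev_file), "queue", "rotational")
        try:
            with open(rotational_file) as f:
                rotational = f.read().strip()
            with open(dev_file) as f:
                # The dev file holds "major:minor", for example "8:16".
                major, minor = (int(p) for p in f.read().strip().split(":"))
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
        if rotational == "0":
            devices.add((major, minor))
    return devices


def is_ssd(major, minor, ssd_devices):
    """Identify a partition with its containing device.

    Assumes, as the commit message does, that the least significant
    4 bits of the minor number identify the partition and that the
    whole device has "partition number" 0.
    """
    return (major, minor & ~15) in ssd_devices
```

As the commit message notes, the rotational flag may be wrong for some devices (USB storage in particular), so the result is a hint rather than a guarantee.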
Marko Mäkelä
349560d5d5 Merge 10.2 into 10.3 2019-03-27 13:27:04 +02:00
Marko Mäkelä
1e9c2b2305 Merge 10.1 into 10.2 2019-03-27 12:26:11 +02:00
Marko Mäkelä
a6585d5ce9 Merge 10.0 into 10.1 2019-03-27 11:56:08 +02:00
Marko Mäkelä
1933cf98e8 Merge 5.5 into 10.0 2019-03-26 14:13:46 +02:00
Marko Mäkelä
8b480df63e Merge 10.3 into 10.4 2019-03-25 17:18:15 +02:00
Marko Mäkelä
dbc0d576a3 Merge 10.2 into 10.3 2019-03-25 16:14:39 +02:00
Marko Mäkelä
525e79b057 MDEV-19022: InnoDB fails to cleanup useless B-tree pages
The test case for reproducing MDEV-14126 demonstrates that InnoDB can
end up with an index tree where a non-leaf page has only one child page.

The test case innodb.innodb_bug14676111 demonstrates that such pages
are sometimes unavoidable, because InnoDB does not implement any sort
of B-tree rotation.

But, there is no reason to allow a root page with only one child page.

btr_cur_node_ptr_delete(): Replaces btr_node_ptr_delete().

btr_page_get_father(): Declare globally.

btr_discard_only_page_on_level(): Declare with ATTRIBUTE_COLD.
It turns out that this function is not covered by the
innodb.innodb_bug14676111 test case after all.

btr_discard_page(): If the root page ends up having only one child
page, shrink the tree by invoking btr_lift_page_up().
2019-03-25 16:03:24 +02:00
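The effect of 525e79b057 above on the tree shape can be pictured with a toy tree. This sketch is not how InnoDB pages work: here the single child simply becomes the new root, which only illustrates the resulting height. The `Node` class and both functions are hypothetical.

```python
class Node:
    """Toy B-tree page: leaf pages have no children."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def shrink_root(root):
    """Drop root levels for as long as the root has exactly one child."""
    while len(root.children) == 1:
        root = root.children[0]
    return root


def height(node):
    """Number of levels from this node down to the leaf level."""
    return 1 if not node.children else 1 + height(node.children[0])
```

A non-root page with a single child is left alone in this model, matching the observation that such pages are sometimes unavoidable without B-tree rotation.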
Chris Calender
d8b7e76c37 Fix for MDEV-18276, typo in error message + all other occurrences of refering 2019-03-23 00:00:47 +04:00
Marko Mäkelä
50a8fc5298 MDEV-18224 MTR's internal check of the test case 'innodb.recovery_shutdown' failed due to extra #sql-ib*.ibd files
The test innodb.recovery_shutdown would occasionally fail,
because recovered incomplete transactions would be conflicting
with DROP TABLE, causing the background drop table queue to be invoked.

Add a slow shutdown before dropping the tables, so that the
recovered transactions will be rolled back. Starting with MDEV-14705,
normal shutdown would abort the rollback of recovered transactions.
2019-03-22 11:16:06 +02:00
Eugene Kosov
8dffaaef72 MDEV-12836 Avoid table rebuild when removing of auto_increment settings
Field::is_equal(): treat old type and new one without AUTO_INCREMENT as equal

Closes #1208
2019-03-20 19:18:21 +01:00
Marko Mäkelä
514b305dfb Merge 10.3 into 10.4
The MDEV-17262 commit 26432e49d3
was skipped. In Galera 4, the implementation would seem to require
changes to the streaming replication.

In the tests archive.rnd_pos and main.profiling, disable_ps_protocol
for SHOW STATUS and SHOW PROFILE commands until MDEV-18974
has been fixed.
2019-03-20 10:41:32 +02:00
Daniel Black
de51acd037 MDEV-18726: innodb buffer pool size not consistent with large pages
Rather than adding a small extra amount to the size of chunks, keep them
at the specified size. The rest of the chunk initialization code
adapts to this small size reduction. This has been done in the general
case, not just for large pages, to keep it simple.

The chunk size is controlled by innodb-buffer-pool-chunk-size. In the
code, increasing this by the length of a descriptor table makes things
difficult with large pages. With innodb-buffer-pool-chunk-size set to 2M,
the code before this commit would have added a small extra amount to this
value when it tried to allocate the chunk. While not normally a problem,
it is one with large pages: the allocation requires additional space, a
whole extra large page. With a number of pools, or with 1G or 16G large
pages, this is quite significant.

By removing this additional amount, DBAs can set
innodb-buffer-pool-chunk-size to the large page size, or a multiple of
it, and actually get that amount allocated. Previously they had to fudge
a value less.

The innodb.test results show how this is fudged over a number of tests. With
this change the values are just between 488 and 500 depending on architecture
and build options.

Tested with --large-pages --innodb-buffer-pool-size=256M
--innodb-buffer-pool-chunk-size=2M on x86_64 with 2M default large page
size. Breaking before buf_pool init, one large page had been allocated in
MyISAM; by the end of the function, 128 huge pages were allocated as
expected. A further 16 pages were allocated for a 32M log buffer, and
during startup 1 page was allocated briefly to the redo log.
2019-03-18 21:49:53 +02:00
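The arithmetic behind de51acd037 above can be checked with a short sketch. The sizes come from the test described in the commit message; the value of `extra` is a hypothetical stand-in for the small per-chunk amount that used to be added, and `large_pages_needed` is an illustrative helper, not a server function.

```python
MiB = 1024 * 1024


def large_pages_needed(alloc_bytes, large_page_bytes):
    """Whole large pages needed to back one allocation (rounded up)."""
    return -(-alloc_bytes // large_page_bytes)


large_page = 2 * MiB
chunk = 2 * MiB            # innodb-buffer-pool-chunk-size=2M
chunks = (256 * MiB) // chunk  # innodb-buffer-pool-size=256M
extra = 4096               # hypothetical extra amount, illustration only

# Before: every chunk spilled over into a second large page.
pages_before = chunks * large_pages_needed(chunk + extra, large_page)
# After: every chunk fits exactly one large page.
pages_after = chunks * large_pages_needed(chunk, large_page)
# The 32M log buffer from the same test.
log_buffer_pages = large_pages_needed(32 * MiB, large_page)
```

Under these assumptions `pages_after` is 128 and `log_buffer_pages` is 16, matching the counts reported in the commit message, while `pages_before` is twice as large however small `extra` is.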
Sergei Golubchik
b64fde8f38 Merge branch '10.2' into 10.3 2019-03-17 13:06:41 +01:00