mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-09-11 05:52:26 +03:00

Author	SHA1	Message	Date
Marko Mäkelä	50fa94ea2b	Merge 10.7 into 10.8	2022-02-23 16:42:59 +02:00
Marko Mäkelä	d3e06dbbe3	MDEV-27924 page_zip_copy_recs() corrupts ROW_FORMAT=COMPRESSED block descriptor In commit `aaef2e1d8c` (MDEV-27058) we failed to introduce a special copy constructor that would preserve the "page_zip_des_t::fix" field that only exists there in order to avoid alignment loss on 64-bit systems. page_zip_copy_recs(): Invoke the special copy constructor. The block descriptor corruption causes assertion failures when running ./mtr --suite=innodb_zip while InnoDB has been built with UNIV_ZIP_COPY. Normally, calls to page_zip_copy_recs() occur very rarely on page splits.	2022-02-23 11:34:52 +02:00
Marko Mäkelä	358921ce32	MDEV-26938 Support descending indexes internally in InnoDB This is loosely based on the InnoDB changes in mysql/mysql-server@97fd8b1b69 that I had developed in 2015 or 2016. For each B-tree key field, we will allow a flag ASC/DESC to be associated. When PRIMARY KEY fields are internally appended to secondary indexes, the ASC/DESC attribute will be inherited, so that covering index scans will work as expected. Note: Until the subsequent commit, the DESC attribute will be ignored (no HA_REVERSE_SORT flag will be written to .frm files). dict_field_t::descending: A new flag to denote descending order. cmp_data(), cmp_dfield_dfield(): Add a new parameter descending. cmp_dtuple_rec(), cmp_dtuple_rec_with_match(): Add a parameter "index". dtuple_coll_eq(): Replaces dtuple_coll_cmp(). cmp_dfield_dfield_eq_prefix(): Replaces cmp_dfield_dfield_like_prefix(). dict_index_t::is_btree(): Check whether the index is a regular B-tree index (not SPATIAL, FULLTEXT, the ibuf.index, or a corrupted index). btr_cur_search_to_nth_level_func(): Only attempt to use the adaptive hash index if index->is_btree(). This function may also be invoked on ibuf.index, and cmp_dtuple_rec_with_match_bytes() will no longer work on ibuf.index because it assumes that the index and record fields exactly match. The ibuf.index is a special variadic index tree. Thanks to Thirunarayanan Balathandayuthapani for fixing some bugs: MDEV-27439, MDEV-27374/MDEV-27445.	2022-01-26 18:43:05 +01:00
Marko Mäkelä	5d54fd611f	Cleanup: Replace ut_crc32c(x,y) with my_crc32c(0,x,y)	2022-01-21 16:13:04 +02:00
Marko Mäkelä	aaef2e1d8c	MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete().
Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.	2021-11-18 17:47:19 +02:00
Marko Mäkelä	6841d1afdd	Merge 10.5 into 10.6	2021-11-16 17:15:13 +02:00
Marko Mäkelä	ebb15f986f	MDEV-27059 page_zip_dir_insert() may corrupt ROW_FORMAT=COMPRESSED tables In commit `7ae21b18a6` (MDEV-12353) the recovery of ROW_FORMAT=COMPRESSED tables was changed. Changes would be logged in a physical format for the compressed page image, so that the page need not be decompressed or compressed during recovery. page_zip_write_rec(): Log any update of the delete-mark flag in the ROW_FORMAT=COMPRESSED page. page_zip_dir_insert(): Copy the delete-mark flag. A delete-marked record may be inserted by btr_cur_pessimistic_update() via btr_cur_insert_if_possible(), page_cur_tuple_insert(), page_cur_insert_rec_zip(). In the observed scenario, it was a ROLLBACK. Presumably, the test case involved repeated DELETE and INSERT of the same key, or updating a key back and forth. This change alone might make the adjustment in page_zip_write_rec() redundant, but we play it safe because we failed to create a minimal test case for this scenario.	2021-11-16 17:13:15 +02:00
Marko Mäkelä	73f5cbd0b6	Merge 10.5 into 10.6	2021-10-21 16:06:34 +03:00
Marko Mäkelä	a0fda162eb	Fix GCC 11.2.0 -m32 (IA-32) warnings page_create_low(): Fix -Warray-bounds log_buffer_extend(): Fix -Wstringop-overflow	2021-10-21 15:31:21 +03:00
Marko Mäkelä	e696e9e63f	Merge 10.3 into 10.4	2021-08-25 07:30:47 +03:00
Michael Widenius	497b694936	Fixed compile errors when compiling with HAVE_valgrind	2021-08-24 23:05:21 +03:00
Oleksandr Byelkin	6efb5e9f5e	Merge branch '10.5' into 10.6	2021-08-02 10:11:41 +02:00
Oleksandr Byelkin	ae6bdc6769	Merge branch '10.4' into 10.5	2021-07-31 23:19:51 +02:00
Oleksandr Byelkin	7841a7eb09	Merge branch '10.3' into 10.4	2021-07-31 22:59:58 +02:00
Marko Mäkelä	b50ea90063	Merge 10.2 into 10.3	2021-07-22 18:57:54 +03:00
Marko Mäkelä	124dc0d85b	MDEV-25361 fixup: Fix integer type mismatch InnoDB tablespace identifiers and page numbers are 32-bit numbers. Let us use a 32-bit type for them in innochecksum. The changes in commit `1918bdf32c` broke the build on 32-bit Windows. Thanks to Vicențiu Ciorbaru for an initial version of this fixup.	2021-07-22 17:53:43 +03:00
Marko Mäkelä	641f09398f	Merge 10.5 into 10.6	2021-07-22 10:11:08 +03:00
Marko Mäkelä	82d5994520	MDEV-26110: Do not rely on alignment on static allocation It is implementation-defined whether alignment requirements that are larger than std::max_align_t (typically 8 or 16 bytes) will be honored by the compiler and linker. It turns out that on IBM AIX, both alignas() and MY_ALIGNED() only guarantee alignment up to 16 bytes. For some data structures, specifying alignment to the CPU cache line size (typically 64 or 128 bytes) is a mere performance optimization, and we do not really care whether the requested alignment is guaranteed. But, for the correct operation of direct I/O, we do require that the buffers be aligned at a block size boundary. field_ref_zero: Define as a pointer, not an array. For innochecksum, we can make this point to unaligned memory; for anything else, we will allocate an aligned buffer from the heap. This buffer will be used for overwriting freed data pages when innodb_immediate_scrub_data_uncompressed=ON. And exactly that code hit an assertion failure on AIX, in the test innodb.innodb_scrub. log_sys.checkpoint_buf: Define as a pointer to aligned memory that is allocated from heap. log_t::file::write_header_durable(): Reuse log_sys.checkpoint_buf instead of trying to allocate an aligned buffer from the stack.	2021-07-22 10:05:13 +03:00
Marko Mäkelä	65f1a42788	Merge 10.5 into 10.6	2021-06-09 16:50:58 +03:00
Marko Mäkelä	3c97097f11	Merge 10.4 into 10.5	2021-06-04 10:07:29 +03:00
Monty	fa0bbff032	Fixed that compile-pentium64-valgrind-max works - Removed Tokudb (no need to test this anymore with valgrind) - Added __attribute__((unused)) to a few places to be able to compile even if valgrind/memcheck.h is not installed. Reviewer: Marko Mäkelä <marko.makela@mariadb.com>	2021-06-02 18:54:49 +03:00
Marko Mäkelä	a722ee88f3	Merge 10.5 into 10.6	2021-06-01 11:39:38 +03:00
Marko Mäkelä	139333a6cc	MDEV-25745: Not applying INSERT_REUSE_REDUNDANT page_apply_insert_redundant(): Correct a condition that would occasionally fail when recovering changes for the change buffer tree (where extra_size and data_size can vary wildly). This was broken in commit `138cbec5f2` (MDEV-21724).	2021-05-31 15:44:11 +03:00
Marko Mäkelä	49e2c8f0a6	MDEV-25743: Unnecessary copying of table names in InnoDB dictionary Many InnoDB data dictionary cache operations require that the table name be copied so that it will be NUL terminated. (For example, SYS_TABLES.NAME is not guaranteed to be NUL-terminated.) dict_table_t::is_garbage_name(): Check if a name belongs to the background drop table queue. dict_check_if_system_table_exists(): Remove. dict_sys_t::load_sys_tables(): Load the non-hard-coded system tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL on startup. dict_sys_t::create_or_check_sys_tables(): Replaces dict_create_or_check_foreign_constraint_tables() and dict_create_or_check_sys_virtual(). dict_sys_t::load_table(): Replaces dict_table_get_low() and dict_load_table(). dict_sys_t::find_table(): Renamed from get_table(). dict_sys_t::sys_tables_exist(): Check whether all the non-hard-coded tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL exist. trx_t::has_stats_table_lock(): Moved to dict0stats.cc. Some error messages will now report table names in the internal databasename/tablename format, instead of `databasename`.`tablename`.	2021-05-21 18:03:40 +03:00
Marko Mäkelä	d2e2d32933	Merge 10.5 into 10.6	2021-04-14 12:32:27 +03:00
Marko Mäkelä	6c3e860cbf	Merge 10.4 into 10.5	2021-04-14 11:35:39 +03:00
Marko Mäkelä	5008171b05	Merge 10.3 into 10.4	2021-04-14 10:33:59 +03:00
Marko Mäkelä	b8c8692fd9	MDEV-24620 ASAN heap-buffer-overflow in btr_pcur_restore_position() Between btr_pcur_store_position() and btr_pcur_restore_position() it is possible that purge empties a table and enlarges index->n_core_fields and index->n_core_null_bytes. Therefore, we must cache index->n_core_fields in btr_pcur_t::old_n_core_fields so that btr_pcur_t::old_rec can be parsed correctly. Unfortunately, this is a huge change, because we will replace "bool leaf" parameters with "ulint n_core" (passing index->n_core_fields, or 0 for non-leaf pages). For special cases where we know that index->is_instant() cannot hold, we may also pass index->n_fields.	2021-04-13 10:28:13 +03:00
Marko Mäkelä	e538cb095f	Merge 10.5 into 10.6	2021-03-27 18:03:03 +02:00
Marko Mäkelä	80459bcbd4	Merge 10.4 into 10.5	2021-03-27 17:37:42 +02:00
Marko Mäkelä	7ae37ff74f	Merge 10.3 into 10.4	2021-03-27 17:12:28 +02:00
Marko Mäkelä	3157fa182a	Merge 10.2 into 10.3	2021-03-27 16:11:26 +02:00
Marko Mäkelä	356c149603	Merge 10.5 into 10.6	2021-03-26 11:50:32 +02:00
Daniel Black	bcb9ca4105	MEM_CHECK_DEFINED: replace HAVE_valgrind The change of HAVE_valgrind_or_MSAN to HAVE_valgrind in `af784385b4` was incorrect. In my_valgrind.h when clang exists (hence no __has_feature(memory_sanitizer)), and -DWITH_VALGRIND=1, but without memcheck.h, we end up with MEM_CHECK_DEFINED being empty. If we are also doing a CMAKE_BUILD_TYPE=Debug, this results in a number of [-Werror,-Wunused-variable] errors because MEM_CHECK_DEFINED is empty. With MEM_CHECK_DEFINED empty, there are no uses of the fixed field and innodb variables in this patch. So we stop using HAVE_valgrind as a catchall and use the name HAVE_CHECK_MEM to indicate that a CHECK_MEM_DEFINED function exists. Reviewer: Monty Corrects: `af784385b4`	2021-03-26 07:58:49 +11:00
Marko Mäkelä	0f8caadc96	MDEV-22653: Remove the useless parameter innodb_simulate_comp_failures The debug parameter innodb_simulate_comp_failures injected compression failures for ROW_FORMAT=COMPRESSED tables, breaking the pre-existing logic that I had implemented in the InnoDB Plugin for MySQL 5.1 to prevent compressed page overflows. A much better check is already achieved by defining UNIV_ZIP_COPY at the compilation time. (Only UNIV_ZIP_DEBUG is part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON.)	2021-03-22 18:12:44 +02:00
Marko Mäkelä	a43ff483fa	Merge 10.5 into 10.6	2021-03-11 20:20:07 +02:00
Marko Mäkelä	549a70d7f0	MDEV-25031 Not applying INSERT_*_REDUNDANT due to corruption on page page_apply_insert_redundant(): Replace a too strict condition hdr_c > pextra_size. It turns out that page_cur_insert_rec_low() is not even computing the extra_size of cur->rec when it is trying to reuse header bytes of the preceding record.	2021-03-11 14:21:28 +02:00
Marko Mäkelä	7a4fbb55b0	MDEV-25105 Remove innodb_checksum_algorithm values none,innodb,... Historically, InnoDB supported a buggy page checksum algorithm that did not compute a checksum over the full page. Later, well before MySQL 4.1 introduced .ibd files and the innodb_file_per_table option, the algorithm was corrected and the first 4 bytes of each page were redefined to be a checksum. The original checksum was so slow that an option to disable page checksum was introduced for benchmarketing purposes. The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set extension, which includes instructions for faster computation of CRC-32C. In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was implemented to make use of that. As that option was changed to be the default in MySQL 5.7, a bug was found on big-endian platforms and some work-around code was added to weaken that checksum further. MariaDB disables that work-around by default since MDEV-17958. Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER and ARM and also for IA-32/AMD64, making use of carry-less multiplication where available. Long story short, innodb_checksum_algorithm=crc32 is faster and more secure than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb. It should have removed any need to use innodb_checksum_algorithm=none. The setting innodb_checksum_algorithm=crc32 is the default in MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5, MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default. It is even faster and more secure. The default settings in MariaDB do allow old data files to be read, no matter if a worse checksum algorithm had been used. (Unfortunately, before innodb_checksum_algorithm=full_crc32, the data files did not identify which checksum algorithm is being used.)
The non-default settings innodb_checksum_algorithm=strict_crc32 or innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C checksums. The incompatibility with old data files is why they are not the default. The newest servers not to support innodb_checksum_algorithm=crc32 were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life. A valid reason for using innodb_checksum_algorithm=innodb could have been the ability to downgrade. If it is really needed, data files can be converted with an older version of the innochecksum utility. Because there is no good reason to allow data files to be written with insecure checksums, we will reject those option values: innodb_checksum_algorithm=none, innodb_checksum_algorithm=innodb, innodb_checksum_algorithm=strict_none, innodb_checksum_algorithm=strict_innodb. Furthermore, the following innochecksum options will be removed, because only strict crc32 will be supported: innochecksum --strict-check=crc32, innochecksum -C crc32, innochecksum --write=crc32, innochecksum -w crc32. If a user wishes to convert a data file to use a different checksum (so that it might be used with the no-longer-supported MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE nor system tablespace format changes that were made in MariaDB 10.3), then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or MySQL 5.7 can be used. Reviewed by: Thirunarayanan Balathandayuthapani	2021-03-11 12:46:18 +02:00
Marko Mäkelä	ff5d306e29	MDEV-21452: Replace ib_mutex_t with mysql_mutex_t SHOW ENGINE INNODB MUTEX functionality is completely removed, as are the InnoDB latching order checks. We will enforce innodb_fatal_semaphore_wait_threshold only for dict_sys.mutex and lock_sys.mutex. dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex. lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex. FIXME: srv_sys should be removed altogether; it is duplicating tpool functionality. fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must not hold fil_system.mutex. fil_close_all_files(): To prevent SAFE_MUTEX warnings for fil_space_destroy_crypt_data(), we must not hold fil_system.mutex while invoking fil_space_free_low() on a detached tablespace.	2020-12-15 17:56:18 +02:00
Marko Mäkelä	3dfeae0e22	Cleanup: Fix Intel compiler warnings about sign conversions	2020-11-25 11:32:49 +02:00
Marko Mäkelä	a8de8f261d	Merge 10.2 into 10.3	2020-10-28 10:01:50 +02:00
Thirunarayanan Balathandayuthapani	3ba8f619e4	MDEV-23370 innodb_fts.innodb_fts_misc failed in buildbot, server crashed in dict_table_autoinc_destroy This issue is caused by MDEV-22456 `ad6171b91c`. Fix involves the backported version of 10.4 patch MDEV-22778 `5f2628d1ee` and a few parts of MDEV-17441 (`e9a5f288f2`). dict_table_t::stats_latch_created: Removed. dict_table_t::stats_latch: Make a value member and always lock it for simplicity, even for a stats cloned table. zip_pad_info_t::mutex_created: Removed. zip_pad_info_t::mutex: Make a value member instead of a pointer. os0once.h: Removed. dict_table_remove_from_cache_low(): Ensure that fts_free() is always called, even if dict_mem_table_free() is deferred until btr_search_lazy_free(). InnoDB would always initialize zip_pad_info_t::mutex and dict_table_t::autoinc_mutex, even for tables that are not in ROW_FORMAT=COMPRESSED nor include any AUTO_INCREMENT column.	2020-10-25 15:53:17 +05:30
Marko Mäkelä	d5d8756de3	Merge 10.4 into 10.5	2020-08-20 12:52:44 +03:00
Marko Mäkelä	2fa9f8c53a	Merge 10.3 into 10.4	2020-08-20 11:01:47 +03:00
Eugene Kosov	90c8d773ed	MDEV-21251 CHECK TABLE fails to check info_bits of records btr_validate_index(): do not stop checking after some level failed. That way it'll become possible to see errors in leaf pages even when upper layers are corrupted too. page_validate(): check info_bits and status_bits more	2020-08-15 23:05:09 +03:00
Marko Mäkelä	cf87f3e08c	Merge 10.4 into 10.5	2020-08-14 11:33:35 +03:00
Marko Mäkelä	2f7b37b021	Merge 10.3 into 10.4, except MDEV-22543 Also, fix GCC -Og -Wmaybe-uninitialized in run_backup_stage()	2020-08-13 18:48:41 +03:00
Marko Mäkelä	4bd56a697f	Merge 10.2 into 10.3	2020-08-13 18:18:25 +03:00
Marko Mäkelä	182e2d4a6c	Merge 10.1 into 10.2	2020-08-13 07:38:35 +03:00
Marko Mäkelä	efd8af535a	MDEV-19526 heap number overflow on innodb_page_size=64k InnoDB only reserves 13 bits for the heap number in the record header, limiting the heap number to be at most 8191. But, when using innodb_page_size=64k and secondary index records of 7 bytes each, it is possible to exceed the maximum heap number. btr_cur_optimistic_insert(): Let the operation fail if the maximum number of records would be exceeded. page_mem_alloc_heap(): Move to the same compilation unit with the only caller, and let the operation fail if the maximum heap number has been allocated already.	2020-08-12 18:21:53 +03:00
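
The arithmetic behind MDEV-19526 (commit `efd8af535a`) can be sketched as follows. This is only an illustration, not InnoDB code: the names `heap_no_bits`, `max_heap_no`, `records_upper_bound()` and `may_overflow_heap_no()` are hypothetical, and the sketch assumes that the 7 bytes quoted in the commit message are the whole per-record footprint. It also ignores page headers, the page directory and the page trailer, so it yields an optimistic upper bound on the record count rather than the exact number.

```cpp
#include <cstdint>

// From the commit message: 13 bits are reserved for the heap number.
constexpr unsigned heap_no_bits = 13;
constexpr std::uint32_t max_heap_no = (1U << heap_no_bits) - 1;

// Hypothetical helper (not an InnoDB function): an optimistic upper bound on
// how many records of record_bytes each fit in a page of page_size bytes.
// Assumption: all page overhead (headers, directory, trailer) is ignored.
constexpr std::uint32_t records_upper_bound(std::uint32_t page_size,
                                            std::uint32_t record_bytes)
{
  return page_size / record_bytes;
}

// Hypothetical helper: true if the upper bound exceeds the largest heap
// number, that is, if an overflow cannot be ruled out by size alone.
constexpr bool may_overflow_heap_no(std::uint32_t page_size,
                                    std::uint32_t record_bytes)
{
  return records_upper_bound(page_size, record_bytes) > max_heap_no;
}

static_assert(max_heap_no == 8191, "13 bits give heap numbers 0..8191");
static_assert(records_upper_bound(64 * 1024, 7) == 9362, "65536 / 7");
static_assert(may_overflow_heap_no(64 * 1024, 7), "64k page, 7-byte records");
static_assert(!may_overflow_heap_no(16 * 1024, 7), "16384 / 7 = 2340");
```

Because the bound is optimistic, a `false` result rules an overflow out, while a `true` result only says that the record count alone does not rule it out. Under these assumptions, a 16k page stays far below the limit, whereas a 64k page does not.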

449 Commits