1
0
mirror of https://github.com/MariaDB/server.git synced 2025-05-11 13:21:44 +03:00

32154 Commits

Author SHA1 Message Date
Marko Mäkelä
f77329ace9 Bug#13721257 RACE CONDITION IN UPDATES OR INSERTS OF WIDE RECORDS
This bug was originally filed and fixed as Bug#12612184. The original
fix was buggy, and it was patched by Bug#12704861. Also that patch was
buggy (potentially breaking crash recovery), and both fixes were
reverted.

This fix was not ported to the built-in InnoDB of MySQL 5.1, because
the function signatures of many core functions are different from
InnoDB Plugin and later versions. The block allocation routines and
their callers would have to changed so that they handle block
descriptors instead of page frames.

When a record is updated so that its size grows, non-updated columns
can be selected for external (off-page) storage. The bug is that the
initially inserted updated record contains an all-zero BLOB pointer to
the field that was not updated. Only after the BLOB pages have been
allocated and written, the valid pointer can be written to the record.

Between the release of the page latch in mtr_commit(mtr) after
btr_cur_pessimistic_update() and the re-latching of the page in
btr_pcur_restore_position(), other threads can see the invalid BLOB
pointer consisting of 20 zero bytes. Moreover, if the system crashes
at this point, the situation could persist after crash recovery, and
the contents of the non-updated column would be permanently lost.

The problem is amplified by the ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED that were introduced in
innodb_file_format=barracuda in InnoDB Plugin, but the bug does exist
in all InnoDB versions.

The fix is as follows. After a pessimistic B-tree operation that needs
to write out off-page columns, allocate the pages for these columns in
the mini-transaction that performed the B-tree operation (btr_mtr),
but write the pages in a separate mini-transaction (blob_mtr). Do
mtr_commit(blob_mtr) before mtr_commit(btr_mtr). A quirk: Do not reuse
pages that were previously freed in btr_mtr. Only write the off-page
columns to 'fresh' pages.

In this way, crash recovery will see redo log entries for blob_mtr
before any redo log entry for btr_mtr. It will apply the BLOB page
writes to pages that were marked free at that point. If crash recovery
fails to see all of the btr_mtr redo log, there will be some
unreachable BLOB data in free pages, but the B-tree will be in a
consistent state.

btr_page_alloc_low(): Renamed from btr_page_alloc(). Add the parameter
init_mtr. Return an allocated block, or NULL. If init_mtr!=mtr but
the page was already X-latched in mtr, do not initialize the page.

btr_page_alloc(): Wrapper for btr_page_alloc_for_ibuf() and
btr_page_alloc_low().

btr_page_free(): Add a debug assertion that the page was a B-tree page.

btr_lift_page_up(): Return the father block.

btr_compress(), btr_cur_compress_if_useful(): Add the parameter ibool
adjust, for adjusting the cursor position.

btr_cur_pessimistic_update(): Preserve the cursor position when
big_rec will be written and the new flag BTR_KEEP_POS_FLAG is defined.
Remove a duplicate rec_get_offsets() call. Keep the X-latch on
index->lock when big_rec is needed.

btr_store_big_rec_extern_fields(): Replace update_inplace with
an operation code, and local_mtr with btr_mtr. When not doing a
fresh insert and btr_mtr has freed pages, put aside any pages that
were previously X-latched in btr_mtr, and free the pages after
writing out all data. The data must be written to 'fresh' pages,
because btr_mtr will be committed and written to the redo log after
the BLOB writes have been written to the redo log.

btr_blob_op_is_update(): Check if an operation passed to
btr_store_big_rec_extern_fields() is an update or insert-by-update.

fseg_alloc_free_page_low(), fsp_alloc_free_page(),
fseg_alloc_free_extent(), fseg_alloc_free_page_general(): Add the
parameter init_mtr. Return an allocated block, or NULL. If
init_mtr!=mtr but the page was already X-latched in mtr, do not
initialize the page.

xdes_get_descriptor_with_space_hdr(): Assert that the file space
header is being X-latched.

fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page().

fsp_page_create(): New function, for allocating, X-latching and
potentially initializing a page. If init_mtr!=mtr but the page was
already X-latched in mtr, do not initialize the page.

fsp_free_page(): Add ut_ad(0) to the error outcomes.

fsp_free_page(), fseg_free_page_low(): Increment mtr->n_freed_pages.

fsp_alloc_seg_inode_page(), fseg_create_general(): Assert that the
page was not previously X-latched in the mini-transaction. A file
segment or inode page should never be allocated in the middle of an
mini-transaction that frees pages, such as btr_cur_pessimistic_delete().

fseg_alloc_free_page_low(): If the hinted page was allocated, skip the
check if the tablespace should be extended. Return NULL instead of
FIL_NULL on failure. Remove the flag frag_page_allocated. Instead,
return directly, because the page would already have been initialized.

fseg_find_free_frag_page_slot() would return ULINT_UNDEFINED on error,
not FIL_NULL. Correct a bogus assertion.

fseg_alloc_free_page(): Redefine as a wrapper macro around
fseg_alloc_free_page_general().

buf_block_buf_fix_inc(): Move the definition from the buf0buf.ic to
buf0buf.h, so that it can be called from other modules.

mtr_t: Add n_freed_pages (number of pages that have been freed).

page_rec_get_nth_const(), page_rec_get_nth(): The inverse function of
page_rec_get_n_recs_before(), get the nth record of the record
list. This is faster than iterating the linked list. Refactored from
page_get_middle_rec().

trx_undo_rec_copy(): Add a debug assertion for the length.

trx_undo_add_page(): Return a block descriptor or NULL instead of a
page number or FIL_NULL.

trx_undo_report_row_operation(): Add debug assertions.

trx_sys_create_doublewrite_buf(): Assert that each page was not
previously X-latched.

page_cur_insert_rec_zip_reorg(): Make use of page_rec_get_nth().

row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG, so that
the repositioning of the cursor can be avoided.

row_ins_index_entry_low(): Add DEBUG_SYNC points before and after
writing off-page columns. If inserting by updating a delete-marked
record, do not reposition the cursor or commit the mini-transaction
before writing the off-page columns.

row_build(): Tighten a debug assertion about null BLOB pointers.

row_upd_clust_rec(): Add DEBUG_SYNC points before and after writing
off-page columns. Do not reposition the cursor or commit the
mini-transaction before writing the off-page columns.

rb:939 approved by Jimmy Yang
2012-02-17 11:42:04 +02:00
Igor Babaev
c563ea0717 Fixed LP bug #933117.
The bug was fixed with the code back-ported from the patch for LP bug 800184
pushed into mariadb-5.3.
2012-02-16 16:06:49 -08:00
Vladislav Vaintroub
02724621ca merge from 5.5 2012-02-16 17:33:37 +01:00
Sergey Petrunya
541334c5ac Backport of:
timestamp: Thu 2011-12-01 15:12:10 +0100
Fix for Bug#13430436 PERFORMANCE DEGRADATION IN SYSBENCH ON INNODB DUE TO ICP

When running sysbench on InnoDB there is a performance degradation due
to index condition pushdown (ICP). Several of the queries in sysbench
have a WHERE condition that the optimizer uses for executing these
queries as range scans. The upper and lower limit of the range scan
will ensure that the WHERE condition is fulfilled. Still, the WHERE
condition is part of the queries' condition and if ICP is enabled the
condition will be pushed down to InnoDB as an index condition. 

Due to the range scan's upper and lower limits ensure that the WHERE
condition is fulfilled, the pushed index condition will not filter out
any records. As a result the use of ICP for these queries results in a
performance overhead for sysbench. This overhead comes from using
resources for determining the part of the condition that can be pushed
down to InnoDB and overhead in InnoDB for executing the pushed index
condition.

With the default configuration for sysbench the range scans will use
the primary key. This is a clustered index in InnoDB. Using ICP on a
clustered index provides the lowest performance benefit since the
entire record is part of the clustered index and in InnoDB it has the
highest relative overhead for executing the pushed index condition.

The fix for removing the overhead ICP introduces when running sysbench
is to disable use of ICP when the index used by the query is a
clustered index.

When WL#6061 is implemented this change should be re-evaluated.
2012-02-16 20:15:57 +04:00
Kent Boortz
62bf500cf2 Merge 2012-02-16 11:19:19 +01:00
Kent Boortz
5b37656369 Merge 2012-02-16 11:17:04 +01:00
MySQL Build Team
7a35cb9150 Updated/added copyright headers 2012-02-16 10:48:16 +01:00
unknown
607aab9c1d Counters for Index Condition Pushdown added (MDEV-130). 2012-02-16 08:49:10 +02:00
Sergei Golubchik
0d0e68da6a merge 2012-02-15 19:11:16 +01:00
Sergei Golubchik
25609313ff 5.3.4 merge 2012-02-15 18:08:08 +01:00
Kent Boortz
6a003dd8ef Updated/added copyright headers 2012-02-15 17:21:38 +01:00
unknown
4917073305 Merge XtraDB from Percona-Server-5.5.20-24.1 into MariaDB 5.5. 2012-02-15 15:37:38 +01:00
Marko Mäkelä
ee9359840f Merge mysql-5.1 to mysql-5.5. 2012-02-15 16:37:06 +02:00
Marko Mäkelä
2a6a6abb70 Remove a race condition in innodb_bug53756.test.
Before killing the server, tell mysql-test-run that it is to be expected.

Discussed with Bjorn Munch on IM.
2012-02-15 16:28:00 +02:00
unknown
47a54a2e08 Merge MySQL 5.5.20 into MariaDB 5.5. 2012-02-14 16:06:41 +01:00
unknown
764eeeee74 Fix for LP BUG#910123 MariaDB 5.3.3 causes 1093 error on Drupal
Problem was that now we can merge derived table (subquery in the FROM clause).
Fix: in case of detected conflict and presence of derived table "over" the table which cased the conflict - try materialization strategy.
2012-02-14 16:52:56 +02:00
unknown
bbd20403c5 Fix wrong error code in the test case.
The replication slave sets first error 1913 and immediately after error
1595. Thus it is possible, but unlikely, to get 1913. The original test
seems to realise this, but uses an invalid error code - my guess is
that this was a temporary code used in a feature tree, which was then
forgotten to be fixed when merged to main. The removed "1923" is
something committed by mistake during tests.
2012-02-14 13:24:03 +01:00
Alexey Botchkov
7ab53e8062 MDEV-59 Review and push crashsafe GIS keys.
tests for RTree keys recovery.
2012-02-14 16:04:06 +04:00
Sergey Petrunya
e41e56ae99 Merge fix for BUG#928048 2012-02-14 14:13:10 +04:00
Sergey Petrunya
c9355dc279 BUG#928048: Query containing IN subquery with OR in the where clause returns a wrong result
- Make equality propagation work correctly when done inside the OR branches
2012-02-14 13:58:57 +04:00
Igor Babaev
c8bbe06ac7 Fixed LP bug #925985.
If the flag 'optimize_join_buffer_size' is set to 'off' and the value
of the system variable 'join_buffer_size' is greater than the value of
the system variable 'join_buffer_space_limit' than no join cache can
be employed to join tables of the executed query.
A bug in the function JOIN_CACHE::alloc_buffer allowed to use join
buffer even in this case while another bug in the function 
revise_cache_usage could cause a crash of the server in this case if the
chosen execution plan for the query contained outer join or semi-join
operation.
2012-02-13 23:46:57 -08:00
Tor Didriksen
6b7f9c0aec Bug#13633383 63183: SMALL SORT_BUFFER_SIZE CRASH IN MERGE_BUFFERS
This patch is a backport of some of the cleanups/refactorings that were done
as part of WL#1393 Optimizing filesort with small limit.


mysql-test/t/bug13633383.test:
  New test case.
mysql-test/valgrind.supp:
  Changed signature for find_all_keys
2012-02-14 08:11:28 +01:00
Rohit Kalhans
ba7d726c48 fixing test failure on pb2. 2012-02-13 15:37:50 +05:30
Rohit Kalhans
701cef2d35 BUG#11758263: Modification of indentation in the added code.
Fixed a typo in the comment.
              Fixing test cases which were previouslyno throwing  due
              disable warnings macro. 

sql/sql_base.cc:
  Change in indentation and fixing a typo in the comment.
2012-02-13 14:12:13 +05:30
unknown
bc2c40e71b Fix _another_ race in test case rpl_cant_read_event_incident (seen in 5.5 Buildbot). 2012-02-11 13:32:36 +01:00
unknown
f6cdddf51f Test case for bug lp:905353
The bug itself is fixed by the patch for bug lp:908269.
2012-02-09 23:35:26 +02:00
Rohit Kalhans
93fa4604ee Followup patch for bug#11758263. 2012-02-10 01:43:47 +05:30
Rohit Kalhans
31c990ca57 BUG#11758263 50440: MARK UNORDERED UPDATE WITH AUTOINC UNSAFE
Problem: Statements that write to tables with auto_increment columns
         based on the selection from another table, may lead to master
         and slave going out of sync, as the order in which the rows
         are retrieved from the table may differ on master and slave.
            
Solution: We mark writing to a table with auto_increment table
          based on the rows selected from another table as unsafe. This
          will cause the execution of such statements to throw a warning
          and forces the statement to be logged in ROW if the logging
          format is mixed. 
            
Changes:
       1. All the statements that writes to a table with auto_increment 
          column(s) based on the rows fetched from another table, will now
          be unsafe.
       2. CREATE TABLE with SELECT will now be unsafe.

sql/share/errmsg-utf8.txt:
  Added new warning messages.
sql/sql_base.cc:
  -Created function to check statements that write to 
   tables with auto_increment column and has select.
  -Marked all the statements that write to a table
   with auto_increment column based on rows fetched
   from other table(s) as unsafe.
sql/sql_table.cc:
  mark CREATE TABLE[with auto_increment column] as unsafe.
2012-02-09 23:28:33 +05:30
unknown
6e9b06d90b Fix a number of problems in the test suite (no code bugs):
- mysql-test-run.pl --valgrind complains when all tests succeed.

 - perfschema.all_instances fail on non-linux, where ENABLE_TEMP_POOL
   is not set and therefore BITMAP mutex is not used.

 - MDEV-132: main.mysqldump fails because it depends on exact size of stdio
   buffers.

 - MDEV-99: rpl.rpl_cant_read_event_incident fails due to a race where the
   slave manages to connect while the test case is in the middle of setting up
   the master, causing the slave to replicate extra/wrong events.

 - MDEV-133: rpl.rpl_rotate_purge_deadlock fails because it issues a
   DEBUG_SYNC SIGNAL immediately followed by RESET; this means that sometimes
   the intended receipient has no time to see the signal before it is cleared
   by the RESET, causing wait to timeout.
2012-02-09 13:10:47 +01:00
Vladislav Vaintroub
8877fe7359 merge 2012-02-08 11:18:55 +01:00
Rohit Kalhans
b7430d73e4 Backout the patch for bug#11758263. 2012-02-08 12:10:55 +05:30
Rohit Kalhans
de85a60049 BUG#11758263 50440: MARK UNORDERED UPDATE WITH AUTOINC UNSAFE
Problem: Statements that write to tables with auto_increment columns
      based on the selection from another table, may lead to master
      and slave going out of sync, as the order in which the rows
      are retrived from the table may differ on master and slave.
      
      Solution: We mark writing to a table with auto_increment table
      as unsafe. This will cause the execution of such statements to
      throw a warning and forces the statement to be logged in ROW if
      the logging format is mixed. 
      
      Changes: 
      1. All the statements that writes to a table with auto_increment 
      column(s) based on the rows fetched from another table, will now
      be unsafe.
      2. CREATE TABLE with SELECT will now be unsafe.


sql/share/errmsg-utf8.txt:
  Added new Warning messages
sql/sql_base.cc:
  created a new function that checks for select + write on a autoinc table
  made all such statements to be unsafe.
sql/sql_parse.cc:
  made create autoincremnet tabble + select unsafe
2012-02-08 00:33:08 +05:30
Sergei Golubchik
ae0a7cfd5f making more use of My::Suite object 2012-02-07 18:53:33 +01:00
Martin Hansson
bb9a3be632 Merge of fix for Bug#11765810. 2012-02-07 17:32:04 +01:00
Sergei Golubchik
98ae512014 small cleanup 2012-02-07 17:18:41 +01:00
Sergei Golubchik
2682a280c8 allow suite.pm to skip combinations that originate from test/include files.
storage/innobase/handler/handler0alter.cc:
  for NEWDATE key_type says unsigned, thus col->prtype says unsigned,
  but field->flags says signed. Use the same flag for value retrieval
  that was used for value storage.
2012-02-07 16:22:36 +01:00
Martin Hansson
9ada2f8ec5 Bug #11765810 58813: SERVER THREAD HANGS WHEN JOIN + WHERE + GROUP BY
IS EXECUTED TWICE FROM P

This bug is a duplicate of bug 12567331, which was pushed to the
optimizer backporting tree on 2011-06-11. This is just a back-port of
the fix. Both test cases are included as they differ somewhat.
2012-02-07 14:16:09 +01:00
Sergei Golubchik
e83dd9b517 mtr: support for rdiff files 2012-02-06 23:16:21 +01:00
Sergei Golubchik
3320a4bffb per-combination result files 2012-02-06 22:55:17 +01:00
Sergei Golubchik
9c8c572fd3 per-file combinations 2012-02-06 21:36:56 +01:00
Sergei Golubchik
e06c1c70e5 cleanup 2012-02-06 20:29:21 +01:00
Sergei Golubchik
39b1dbc4d2 make %suites hash local to mtr_cases.pm 2012-02-06 20:29:13 +01:00
Sergei Golubchik
6d48dfae99 move --secure-file-priv from hardcoded to a template. remove redundant suite.opt 2012-02-06 20:28:56 +01:00
Sergei Golubchik
2bcb4184ef remove few obscure, unused, or misused mtr features 2012-02-06 18:42:18 +01:00
Georgi Kodinov
91c1c93019 merge mysql-5.5->mysql-5.5-security 2012-02-06 18:26:36 +02:00
Georgi Kodinov
d29ae4918e merged mysql-5.1->mysql-5.1-security 2012-02-06 18:24:51 +02:00
Sergei Golubchik
6f0101186d remove few hard-coded checks from mtr 2012-02-06 16:29:53 +01:00
Sergei Golubchik
e7e2058550 MDEV-117 Assertion: prebuilt->sql_stat_start || trx->conc_state == 1 failed at row0sel.c:3933
DELETE IGNORE should not ignore deadlocks

sql/mdl.cc:
  more DBUG_ENTER/DBUG_RETURN
sql/sql_base.cc:
  more DBUG_ENTER/DBUG_RETURN
2012-03-01 16:24:59 +01:00
Vasil Dimov
56e3f68c72 Merge mysql-5.1 -> mysql-5.5
The actual Bug#11754376 does not exist in MySQL 5.5 because at startup
we drop entries for temporary tables from InnoDB dictionary cache (only
if ROW_FORMAT is not REDUNDANT). But nevertheless the bug in
normalize_table_name_low() is present so we fix it.
2012-02-06 13:00:41 +02:00
Vasil Dimov
1c4fd3bb54 Fix Bug#11754376 45976: INNODB LOST FILES FOR TEMPORARY TABLES ON
GRACEFUL SHUTDOWN

During startup mysql picks up .frm files from the tmpdir directory and
tries to drop those tables in the storage engine.

The problem is that when tmpdir ends in / then ha_innobase::delete_table()
is passed a string like "/var/tmp//#sql123", then it wrongly normalizes it
to "/#sql123" and calls row_drop_table_for_mysql() which of course fails
to delete the table entry from the InnoDB dictionary cache.
ha_innobase::delete_table() returns an error but nevertheless mysql wipes
away the .frm file and the entry in the InnoDB dictionary cache remains
orphaned with no easy way to remove it.

The "no easy" way to remove it is to create a similar temporary table again,
copy its .frm file to tmpdir under "#sql123.frm" and restart mysqld with
tmpdir=/var/tmp (no trailing slash) - this way mysql will pick the .frm file
after restart and will try to issue drop table for "/var/tmp/#sql123"
(notice do double slash), ha_innobase::delete_table() will normalize it to
"tmp/#sql123" and row_drop_table_for_mysql() will successfully remove the
table entry from the dictionary cache.

The solution is to fix normalize_table_name_low() to normalize things like
"/var/tmp//table" correctly to "tmp/table".

This patch also adds a test function which invokes
normalize_table_name_low() with various inputs to make sure it works
correctly and a mtr test that calls this test function.

Reviewed by:	Marko (http://bur03.no.oracle.com/rb/r/929/)
2012-02-06 12:44:59 +02:00