Merge Facebook commit 154c579b828a60722a7d9477fc61868c07453d08
and e8f0052f9b112dc786bf9b957ed5b16a5749f7fd authored
by Steaphan Greene from https://github.com/facebook/mysql-5.6
Optimize prefix index queries to skip cluster index lookup when possible.
Currently InnoDB will always fetch the clustered index (primary key
index) for all prefix columns in an index, even when the value of a
particular record is smaller than the prefix length. This change
optimizes that case to use the record from the secondary index and avoid
the extra lookup.
Also adds two status vars that track how effective this is:
innodb_secondary_index_triggered_cluster_reads:
Times secondary index lookup triggered cluster lookup.
innodb_secondary_index_triggered_cluster_reads_avoided:
Times prefix optimization avoided triggering cluster lookup.
The bug is not very important per se, but it was helpful to move
Item_func_strcmp out of Item_bool_func2 (to Item_int_func),
for the purposes of "MDEV-4912 Add a plugin to field types (column types)".
The test runs a query in one thread, then in another queries the processlist
and expects to find the first thread in the COM_SLEEP state. The problem is
that the thread signals completion to the client before changing to COM_SLEEP
state, so there is a window where the other thread can see the wrong state.
A previous attempt to fix this was ineffective. It set a DEBUG_SYNC to handle
proper waiting, but unfortunately that DEBUG_SYNC point ended up triggering
already at the end of SET DEBUG_SYNC=xxx, so the wait was ineffective.
Fix it properly now (hopefully) by ensuring that we wait for the DEBUG_SYNC
point to trigger at the end of the SELECT SLEEP(), not just at the end of
SET DEBUG_SYNC=xxx.
Merge Facebook commit 4f3e0343fd2ac3fc7311d0ec9739a8f668274f0d
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
Adds innodb_idle_flush_pct to enable tuning of the page flushing rate
when the system is relatively idle. We care about this, since doing
extra unnecessary flash writes shortens the lifespan of the flash.
Merge Facebook commit ca40b4417fd224a68de6636b58c92f133703fc68
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
Change the default value for innodb_log_compressed_pages to false
Logging these pages is a waste. We don't want this to be enabled.
One caution here: If the zlib version used by innodb is changed, but
the running version is still the previous version, and the running
version crashes, it is possible crash recovery could fail.
When crash recovery uses a zlib version at all different than the
version used by the crashed instance, it is possible that a redone
compression could fail, where the original did not, because the new
zlib version compresses the same data to a slightly larger size.
Because of the nature of compression, this is even possible when
upgrading to a version of zlib which actually peforms overall better
compression than the previous version.
If this happens, mysql will fail to recover, since a page split can
not be safely triggered during crash recovery.
So, either the exact zlib version must be controlled between builds,
or these rare recovery failures must be accepted. The cost of
logging these pages is quite high, so we consider this limitation to
be worthwhile.
This failure scenario can not happen if there was a clean shutdown.
This is only relevant to restarting crashed instances, or starting an
instance built via a hot backup too (XtraBackup).
New generation hard drives, SSDs and NVM devices support 4K
sector size. Supported sector size can be found using fstatvfs()
or GetDiskFreeSpace() functions.
(Backport to 5.3)
(Attempt #2)
- Don't attempt to use BKA for materialized derived tables. The
table is neither filled nor fully opened yet, so attempt to
call handler->multi_range_read_info() causes crash.
(Backport to 5.3)
(variant #2, with fixed coding style)
- Make Mrr_ordered_index_reader::resume_read() restore index position
only if it was saved before with Mrr_ordered_index_reader::interrupt_read().
- TABLE::create_key_part_by_field() should not set PART_KEY_FLAG in field->flags
= The reason is that it is used by hash join code which calls it to create a hash
table lookup structure. It doesn't create a real index.
= Another caller of the function is TABLE::add_tmp_key(). Made it to set the flag itself.
- The differences in join_cache.result could also be observed before this patch: one
could put "FLUSH TABLES" before the queries and get exactly the same difference.
Merged Facebook commit dd2d11be7aaf3be270e740fb95cbc4eacb52f4d7
authored by Rongrong Zhong from https://github.com/facebook/mysql-5.6
This fixes MySQL Bug #68220 innodb_rows_updated is misleading on slave
http://bugs.mysql.com/bug.php?id=68220
Added innodb_system_rows_read/inserted/updated/deleted counters
that are the equivalent of innodb_rows_* but that only account for
changes made to system databases (mysql, information_schame and
preformance_schema). These counters will be used on slaves to
differentiated the updates made on system databases from those made on
user databases.
innodb_rows_* status counters are not updated when innodb_system_rows_*
are updated.
dd2d11be7a
Merged Facebook commit ecff018632c6db49bad73d9233c3cdc9f41430e9
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
This change is to fix: http://bugs.mysql.com/62534
This makes innodb_max_dirty_pages_pct a double with min,default,max values
0.001, 75, 99.999.
This also makes innodb_max_dirty_pages_pct_lwm and adaptive_flushing_lwm
doubles, as these sysvars are inter-dependent.
Added more to the BUFFER POOL AND MEMORY section of SHOW INNODB STATUS:
Percent pages dirty: X.X
This is all n_dirty_pages / used_pages
Percent all pages dirty: X.X
This is all n_dirty_pages / all-pages
Max dirty pages percent: X.X
This is innodb_max_dirty_pages_pct
Also changed all of buf from 2 to 3 digits of precision (%.2f -> %.3f).
(Attempt #2)
- Don't attempt to use BKA for materialized derived tables. The
table is neither filled nor fully opened yet, so attempt to
call handler->multi_range_read_info() causes crash.
The method subselect_union_engine::no_rows() must take
into account the fact that now unit->fake_select_lex is
NULL for for select_union_direct objects.
The test case waits for other threads to complete, but the wait is only 2
seconds. This is likely to sometimes be too little on our heavily loaded
buildbot VMs, that can easily stall for more than 2 seconds from time to time.
So let's try to increase the timeout (to about 40 seconds) and see if it
helps.
The test case runs SHOW SLAVE HOSTS. The output of this is only stable after
all slaves have had time to register with the master; this happens
asynchroneously.
The test was waiting for the slave with server_id=3 to appear in the output,
but it was missing a similar wait for server_id=2. Thus, if server_id=2 was
much slower to connect for some reason, it could be missing from the output,
causing the test to fail.
I think I finally found the problem, managed to reproduce locally using a
sleep in the test case to simulate the particular race condition that causes
the test to fail often in Buildbot.
The test starts an ALTER TABLE that does repair by sort in one thread, then
another thread waits for the sort to be visible in SHOW PROCESSLIST and runs a
SHOW statement in parallel.
The problem happens when the sort manages to run to completion before the
other thread has the time to look at SHOW PROCESSLIST. In this case, the wait
times out because the state looked for has already passed.
Earlier I added some DEBUG_SYNC to prevent this race, but it turns out that
DEBUG_SYNC itself changes the state in the processlist. So when the debug sync
point was hit, the processlist was showing the wrong state, so the wait would
still time out.
Fixed now by looking for the processlist to contain either the "Repair by
sorting" state or the debug sync wait stage.
Also clean up previous attempts to fix it. Set the wait timeout back to
reasonable 60 seconds, and simplify the DEBUG_SYNC operations to work closer
to how the original test case was intended.
1. Do not use NULL `info' field in processlist to select the thread of
interest. This can fail if the read of processlist ends up happening after
REAP succeeds, but before the `info' field is reset. Instead, select on the
CONNECTION_ID(), making sure we still scan the whole list to trigger the same
code as in the original test case.
2. Wait for the query to really complete before reading it in the
processlist. When REAP returns, it only means that ack has been sent to
client, the reset of query stage happens a bit later in the code.
Add missing REAP to the test.
A later test failed with strange incorrect values for COM_SELECT
in information_schema.global_status. Since global_status is updated
at the end of session activity, it seems appropriate to ensure that
all background connections have completed before accessing it.
(I checked that the original bug still triggers the test case after
the modification with REAP).
- Don't attempt to use BKA for materialized derived tables. The
table is neither filled nor fully opened yet, so attempt to
call handler->multi_range_read_info() causes crash.
innodb_buffer_pool_pages_total depends on page size. On Power8 it is 65k
compared to 4k on Intel. As we round allocations on page size we may get
slightly more memory for buffer pool.
- test_if_skip_sort_order()/create_ref_for_key() may change table
access from EQ_REF(index1) to REF(index2).
- Doing so doesn't make much sense from optimization POV, but since
they are doing it, they should update tab->read_record.unlock_row
accordingly.