Facebook got a case where the page compresses really well so that
btr_cur_optimistic_update() returns DB_UNDERFLOW, but when a record
gets updated, the compression rate radically changes so that
btr_cur_insert_if_possible() can not insert in place despite
reorganizing/recompressing the page, leading to the assertion failing.
rb:1220 approved by Sunny Bains
COMPRESSED PAGE SIZE
This was submitted as MySQL Bug 61456 and a patch provided by
Facebook. This patch follows the same idea, but instead of adding a
parameter to btr_cur_pessimistic_insert(), we simply remove the
btr_cur_optimistic_insert() call there and add it to the only caller
that needs it.
btr_cur_pessimistic_insert(): Do not try btr_cur_optimistic_insert().
btr_insert_on_non_leaf_level_func(): Invoke btr_cur_optimistic_insert()
before invoking btr_cur_pessimistic_insert().
btr_cur_pessimistic_update(): Clarify in a comment why it is not
necessary to invoke btr_cur_optimistic_insert().
btr_root_raise_and_insert(): Assert that the root page is not empty.
This could happen if a pessimistic insert (involving a split or merge)
is performed without first attempting an optimistic (intra-page) insert.
rb:1219 approved by Sunny Bains
btr_cur_optimistic_insert(): Remove a bogus assertion. The insert may
fail after reorganizing the page.
btr_cur_optimistic_update(): Do not attempt to reorganize compressed pages,
because compression may fail after reorganization.
page_copy_rec_list_start(): Use page_rec_get_nth() to restore to the
ret_pos, which may also be the page infimum.
rb:1221
The buffer for the current read row from each partition
(m_ordered_rec_buffer) used for sorted reads was
allocated on open and freed when the ha_partition handler
was closed or destroyed.
For tables with many partitions and big records this could
take up too much valuable memory.
Solution is to only allocate the memory when it is needed
and free it when nolonger needed. I.e. allocate it in
index_init and free it in index_end (and to handle failures
also free it on reset, close etc.)
Also only allocating needed memory, according to
partitioning pruning.
Manually tested that it does not use as much memory and
releases it after queries.
mysql-test/suite/maria/lock.result:
Added test case
mysql-test/suite/maria/lock.test:
Added test case
sql/sql_table.cc:
One can't call HA_EXTRA_FORCE_REOPEN on something that may be opened twice.
It's safe to remove the call in this case as we will call HA_EXTRA_PREPARE_FOR_DROP for the table anyway.
(One nice side effect is that drop is a bit faster as we are not flushing the cache to disk before the drop anymore)
mysql-test/r/partition.result:
Added test case
mysql-test/t/partition.test:
Added test case
sql/sql_partition.cc:
Do mysql_trans_prepare_alter_copy_data() after all original tables are locked.
(We don't want to disable transactions for the original tables, that still may be in the cache)
sql/sql_table.cc:
Fixed two wrong DBUG_ENTER
sql/item_subselect.cc:
Added purecov info
sql/sql_select.cc:
Added cast
storage/innobase/handler/ha_innodb.cc:
Added cast
storage/xtradb/btr/btr0btr.c:
Added buf_block_get_frame_fast() to avoid compiler warning
storage/xtradb/handler/ha_innodb.cc:
Added cast
storage/xtradb/include/buf0buf.h:
Innodb has buf_block_get_frame(block) defined as (block)->frame.
Didn't want to do a big change to break xtradb as it may use block_get_frame() differently, so I mad this quick hack to patch one compiler warning.
Starting the SQL thread might deadlock with reading the values of the
replication filtering options.
The deadlock is due to a lock order violation when the variables are
read or set. For example, reading replicate_ignore_table first
acquires LOCK_global_system_variables in sys_var::value_ptr and later
acquires LOCK_active_mi in Sys_var_rpl_filter::global_value_ptr. This
violates the order established when starting a SQL thread, where
LOCK_active_mi is acquired before start_slave, and ends up creating a
thread (handle_slave_sql) that allocates a THD handle whose
constructor acquires LOCK_global_system_variables in THD::init.
The solution is to unlock LOCK_global_system_variables before the
replication filtering options are set or read. This way the lock
order is preserved and the data being read/set is still protected
given that it acquires LOCK_active_mi.
Problem description:
mysqlhotcopy fails if a view presents in the database.
Analysis:
Before 5.5 'FLUSH TABLES <tbl_name> ... WITH READ LOCK' will able
to get lock for all tables (i.e. base tables and view tables).
In 5.5 onwards 'FLUSH TABLES <tbl_name> ... WITH READ LOCK' for
'view tables' will not work, because taking flush locks on view
tables is not valid.
Fix:
Take flush lock for 'base tables' and read lock for 'view table'
separately.
Note: most of the patch has been backported from bug#13006947's patch
MASTER-MASTER AND USING SET USE
Problem:
=======
In a master-master set-up, a master can show a wrong
'SHOW SLAVE STATUS' output.
Requirements:
- master-master
- log_slave_updates
This is caused when using SET user-variables and then using
it to perform writes. From then on the master that performed
the insert will have a SHOW SLAVE STATUS that is wrong and
it will never get updated until a write happens on the other
master. On"Master A" the "exec_master_log_pos" is not
getting updated.
Analysis:
========
Slave receives a "User_var" event from the master and after
applying the event, when "log_slave_updates" option is
enabled the slave tries to write this applied event into
its own binary log. At the time of writing this event the
slave should use the "originating server-id". But in the
above case the sever always logs the "user var events"
by using its global server-id. Due to this in a
"master-master" replication when the event comes back to the
originating server the "User_var_event" doesn't get skipped.
"User_var_events" are context based events and they always
follow with a query event which marks their end of group.
Due to the above mentioned problem with "User_var_event"
logging the "User_var_event" never gets skipped where as
its corresponding "query_event" gets skipped. Hence the
"User_var" event always waits for the next "query event"
and the "Exec_master_log_position" does not get updated
properly.
Fix:
===
`MYSQL_BIN_LOG::write' function is used to write events
into binary log. Within this function a new object for
"User_var_log_event" is created and this new object is used
to write the "User_var" event in the binlog. "User var"
event is inherited from "Log_event". This "Log_event" has
different overloaded constructors. When a "THD" object
is present "Log_event(thd,...)" constructor should be used
to initialise the objects and in the absence of a valid
"THD" object "Log_event()" minimal constructor should be
used. In the above mentioned problem always default minimal
constructor was used which is incorrect. This minimal
constructor is replaced with "Log_event(thd,...)".
sql/log_event.h:
Replaced the default constructor with another constructor
which takes "THD" object as an argument.
The bug could caused a crash when the server executed a query with
ORDER by and sort_buffer_size was set to a small enough number.
It happened because the small sort buffer did not allow to allocate
all merge buffers in it.
Made sure that the allocated sort buffer would be big enough
to contain all possible merge buffers.
client/mysqldump.c:
Slave needs to be initialized with 0
dbug/dbug.c:
Removed not existing function
plugin/semisync/semisync_master.cc:
Fixed compiler warning
sql/opt_range.cc:
thd needs to be set early as it's used in some error conditions.
sql/sql_table.cc:
Changed to use uchar* to make array indexing portable
storage/innobase/handler/ha_innodb.cc:
Removed not used variable
storage/maria/ma_delete.c:
Fixed compiler warning
storage/maria/ma_write.c:
Fixed compiler warning
fix add_identifier() to distinguish between temporary tables (#sql- prefix) and temporary partitions (#TMP# suffix).
change add_identifier() to use the same name variant constants as sql_partition.cc does.
CONNECTIONS IF SPE
Problem description: -ssl-key value is not validated, you can assign any bogus
text to --ssl-key and it is not verified that it exists, and more importantly,
it allows the client to connect to mysqld.
Fix: Added proper validations checks for --ssl-key.
Note:
1) Documentation changes require for 5.1, 5.5, 5.6 and trunk in the sections
listed below and the details are :
http://dev.mysql.com/doc/refman/5.6/en/ssl-options.html#option_general_ssl
and
REQUIRE SSL section of
http://dev.mysql.com/doc/refman/5.6/en/grant.html
2) Client having with option '--ssl', should able to get ssl connection. This
will be implemented as part of separate fix in 5.6 and trunk.
When resolving outer fields, Item_field::fix_outer_fields()
creates new Item_refs for each execution of a prepared statement, so
these must be allocated in the runtime memroot. The memroot switching
before resolving JOIN::having causes these to be allocated in the
statement root, leaking memory for each PS execution.
sql/item_subselect.cc:
addon, fix for 11829691, item could be created in
runtime memroot, so we need to use real_item instead.
ROWS THAT ARE EXPECTED
For non range/list partitioned tables (i.e. HASH/KEY):
When prune_partitions finds a multi-range list
(or in this test '<>') for a field of the partition index,
even if it cannot make any use of the multi-range,
it will continue with the next field of the partition index
and use that for pruning (even if it the previous
field could not be used). This results in partitions is
pruned away, leaving partitions that only matches
the last field in the partition index, and will exclude
partitions which might match any previous fields.
Fixed by skipping rest of partitioning key fields/parts
if current key field/part could not be used.
Also notice it is the order of the fields in the CREATE TABLE
statement that triggers this bug, not the order of fields in
primary/unique key or PARTITION BY KEY ().
It must not be the last field in the partitioning expression that
is not equal (or have a non single point range).
I.e. the partitioning index is created with the same field order
as in the CREATE TABLE. And for the bug to appear
the last field must be a single point and some previous field
must be a multi-point range.
IN QUERIES
This bug was caused by an incorrect fix of
Bug#13807811 BTR_PCUR_RESTORE_POSITION() CAN SKIP A RECORD
There was nothing wrong with btr_pcur_restore_position(), but with the
use of it in the table scan during index creation.
rb:1206 approved by Jimmy Yang
WHEN STDIN IS A PIPE
Problem: Mysqlbinlog does not accept the input from STDIN when
STDIN is a pipe. This prevents the users from passing the input file
through a shell pipe.
Background: The my_seek() function does not check if the file descriptor
passed to it is regular (seekable) file. The check_header() function in
mysqlbinlog calls the my_b_seek() unconditionally and it fails when
the underlying file is a PIPE.
Resolution: We resolve this problem by checking if the underlying file
is a regular file by using my_fstat() before calling my_b_seek().
If the underlying file is not seekable we skip the call to my_b_seek()
in check_header().
client/mysqlbinlog.cc:
Added a check to avoid the my_b_seek() call if the
underlying file is a PIPE.
SHOW 2012 INSTEAD OF 2011
* Added a new macro to hold the current year :
COPYRIGHT_NOTICE_CURRENT_YEAR
* Modified ORACLE_WELCOME_COPYRIGHT_NOTICE macro
to take the initial year as parameter and pick
current year from the above mentioned macro.
AND LIBCRYPTO
Problem: libmysqlclient_r exports symbols from yaSSL library which
conflict with openSSL symbols. This issue is related to symbols
used by CURL library and are defined in taocrypt. Taocrypt has
dummy implementation of these functions. Due to this when a
program which uses libcurl library functions is compiled using
libmysqlclient_r and libcurl, it hits segmentation fault in
execution phase.
Solution: MySQL should not be exporting such symbols. However, these
functions are not used by MySQL code at all. So avoid compiling
them in the first place.
FOREVER MDL LOCK
Analysis:
----------
While granting MDL lock for the lock requests in wait queue,
first the lock is granted to the high priority lock types
and then to the low priority lock types.
MDL Priority Matrix,
+-------------+----+---+---+---+----+-----+
| Locks | | | | | | |
| has Priority| | | | | | |
| over ---> | S | SR| SW| SU| SNW| SNRW|
+-------------+----+---+---+---+----+-----+
| X | + | + | + | + | + | + |
+-------------|----|---|---|---|----|-----|
| SNRW | - | + | + | - | - | - |
+-------------|----|---|---|---|----|-----|
| SNW | - | - | + | - | - | - |
+-------------+----+---+---+---+----+-----+
Here '+' means, Lock priority is higher.
'-' means, Has same priority
In the scenario where,
*. Lock wait queue has requests of type S/SR/SW/SU.
*. And locks of high priority X/SNRW/SNW are requested
continuously.
In this case, while granting lock, always first high priority
lock requests(X/SNRW/SNW) are considered. Low priority
locks(S/SR/SW/SU) will not get chance and they will
wait forever.
In the scenario for which this bug is reported, application
executed many LOCK TABLES ... WRITE statements concurrently.
These statements request SNRW lock. Also there were some
connections trying to execute DML statements requesting SR
lock. Since SNRW lock request has higher priority (and as
they were too many waiting SNRW requests) lock is always
granted to it. So, lock request SR will wait forever, resulting
in DML starvation.
How is this handled in 5.1?
---------------------------
Even in 5.1 we have low priority lock starvation issue.
But, in 5.1 thread locking, system variable
"max_write_lock_count" can be configured to grant
some pending read lock requests. After
"max_write_lock_count" of write lock grants all the low
priority locks are granted.
Why this issue is seen in 5.5/trunk?
---------------------------------
In 5.5/trunk MDL locking, "max_write_lock_count" system
variable exists but not used in MDL, only thread lock uses
it. So no effect of "max_write_lock_count" in MDL locking.
This means that starvation of metadata locks is possible
even if max_write_lock_count is used.
Looks like, customer was using "max_write_lock_count" in
5.1 and when upgraded to 5.5, starvation is seen because
of not having effect of "max_write_lock_count" in MDL.
Fix:
----------
As a fix, support for max_write_lock_count is added to MDL.
To maintain write lock counter per MDL_lock object, new
member "m_hog_lock_count" is added in MDL_lock.
And following logic is added to increment the counter in
function reschedule_waiters,
(reschedule_waiters function is called while thread is
releasing the lock)
- After granting lock request from the wait queue.
- Check if there are any S/SR/SU/SW exists in the wait queue
- If yes then increment the "m_hog_lock_count"
And following logic is added in the same function to
handle pending S/SU/SR/SW locks
- Before granting locks
- Check if max_write_lock_count <= m_hog_lock_count
- If Yes, then try to grant S/SR/SW/SU locks.
(Since all of these has same priority, all locks are
granted together. But some lock grant may fail because
of grant incompatibility)
- Reset m_hog_lock_count if there no low priority lock
requests in wait queue.
- return
Note:
--------------------------
In the lock priority matrix explained above,
though X has priority over the SNW and SNRW. X locks is
taken mostly for RENAME, TRUNCATE, CREATE ... operations.
So lock type X may not be requested in loop continuously
in real world applications, as compared to other lock
request types. So, lock request of type SNW and SNRW are
not starved. So, we can grant all S/SR/SU/SW in one shot,
without considering SNW & SNRW lock request starvation.
ALTER table operations take SU lock first and then
upgrade to SNW if required. All S, SR, SW, SU have same
lock priority. So while granting SU, request of types
SR, SW, S are also granted in one shot. So, lock request
of type SU->SNW in loop will not make other low priority
lock request to starve.
But, when there is request for lock of type SNRW, lock
requests of lower priority types are not granted. And if
SNRW is requested in loop continuously then all
S, SR, SW, SU are starved.
This patch addresses the latter scenario.
When we have S/SR/SW/SU in wait queue and if
there are
- Continuous SNRW lock requests
- OR one or more X and Continuous SNRW lock requests.
- OR one SNW and Continuous SNRW lock requests.
- OR one SNW, one or more X and continuous SNRW lock
requests.
in wait queue then, S/SR/SW/SU lock request are starved.