The problem from a user point of view was that on Solaris the
time related functions (e.g. NOW(), SYSDATE(), etc) would always
return a fixed time.
This bug was happening due to a logic in the time retrieving
wrapper function which would only call the time() function every
half second. This interval between calls would be calculated
using the gethrtime() and the logic relied on the fact that time
returned by it is monotonic.
Unfortunately, due to bugs in the gethrtime() implementation,
there are some cases where the time returned by it can drift
(See Solaris bug id 6600939), potentially causing the interval
calculation logic to fail.
Since newer versions of Solaris (10+) have alleviated the
performance degradation associated with time(2), the solution is
to simply directly rely on time() at each invocation.
This simplification has an upside that it allows us to eliminate
a lock which was used to control access to the variables used
to track the half second interval, thus improving the overall
scalability of timekeeping related functions (e.g. NOW()).
Benchmarks runs have shown no significant degradation associated
with this change. With this, there are actually improvements in
performance for cases involving many connections.
In summary, the changes introduced by this patch are:
a) my_time() and my_micro_time_and_time() no longer use gethrtime().
Instead, time() and gettimeofdate() are used correspondingly.
b) my_micro_time() is changed to not use gethrtime() so as to
have the same time source as my_micro_time_and_time().
There shouldn't be any performance impact from this change
since this function is used only a few times during statement
execution and, on Solaris, gettimeofday() shows acceptable
performance.
Fixed the test case to be independent of build options used.
Removed the lowercase-table-names constraint, since performance schema tables are now in lowercase.
This is a code cleanup.
The implementation of a storage engine (subclasses of handler) is not supposed
to call my_error() directly inside the engine implementation,
but only return error codes, and report errors later at the demand
of the sql layer only (if needed), using handler::print_error().
This fix removes misplaced calls to my_error(),
and provide an implementation of print_error() instead.
Given that the sql layer implementation of create table, ha_create_table(),
does not use print_error() but returns ER_CANT_CREATE_TABLE directly,
the return code for create table statements using the performance schema
has changed to ER_CANT_CREATE_TABLE.
Adjusted the test suite accordingly.
Before this fix, the test thread_cache failed with spurious failures.
The test used:
-- disconnect X
-- connect Y
while assuming that connection Y would reuse connection X slot in the thread cache.
For this to happen, the disconnect X operation must be given enough time to complete,
otherwise connect Y can be executed in the server before X actually finishes.
This fix uses wait conditions to make the test execution more controlled,
and more reproductible.
Before this fix, the test myisam_file_io executed:
- (a) an update on setup_instrument to disable non myisam file io instruments
- (b) a truncate on events_waits_history_long
and later
- (c) a select on events_waits_history_long
Surprisingly, events that were supposed to be disabled in (a) and removed in (b)
still were found in (c).
This happened for events such as
wait/io/file/innodb/innodb_data_file fil0fil.c: sync
because the sync was started before (a) and completed after (b),
and as a consequence was added in the performance schema history, as expected.
Presence of these records in the history made the test fail.
This fix makes the test script more robust to account for extra spill waits records in (c).
This fix affects the test suite only.
Before this fix, performance schema tests dml_*.test could
fail with spurious failure, depending on the table content.
This fix simplifies the SELECT tests in the dml_*.test scripts,
to only verify that the SELECT operation passed the security checks
and succeeded, which was the original intent of the test.
Usage of
--replace_column 1 # 2 # 3 # 4 # ...
to discard the test output was replaced by a simpler and more maintainable
--disable_result_log
which also work for empty tables.
Text conflict in mysql-test/suite/perfschema/r/dml_setup_instruments.result
Text conflict in mysql-test/suite/perfschema/r/global_read_lock.result
Text conflict in mysql-test/suite/perfschema/r/server_init.result
Text conflict in mysql-test/suite/perfschema/t/global_read_lock.test
Text conflict in mysql-test/suite/perfschema/t/server_init.test
bug #57006 "Deadlock between HANDLER and FLUSH TABLES WITH READ
LOCK" and bug #54673 "It takes too long to get readlock for
'FLUSH TABLES WITH READ LOCK'".
The first bug manifested itself as a deadlock which occurred
when a connection, which had some table open through HANDLER
statement, tried to update some data through DML statement
while another connection tried to execute FLUSH TABLES WITH
READ LOCK concurrently.
What happened was that FTWRL in the second connection managed
to perform first step of GRL acquisition and thus blocked all
upcoming DML. After that it started to wait for table open
through HANDLER statement to be flushed. When the first connection
tried to execute DML it has started to wait for GRL/the second
connection creating deadlock.
The second bug manifested itself as starvation of FLUSH TABLES
WITH READ LOCK statements in cases when there was a constant
stream of concurrent DML statements (in two or more
connections).
This has happened because requests for protection against GRL
which were acquired by DML statements were ignoring presence of
pending GRL and thus the latter was starved.
This patch solves both these problems by re-implementing GRL
using metadata locks.
Similar to the old implementation acquisition of GRL in new
implementation is two-step. During the first step we block
all concurrent DML and DDL statements by acquiring global S
metadata lock (each DML and DDL statement acquires global IX
lock for its duration). During the second step we block commits
by acquiring global S lock in COMMIT namespace (commit code
acquires global IX lock in this namespace).
Note that unlike in old implementation acquisition of
protection against GRL in DML and DDL is semi-automatic.
We assume that any statement which should be blocked by GRL
will either open and acquires write-lock on tables or acquires
metadata locks on objects it is going to modify. For any such
statement global IX metadata lock is automatically acquired
for its duration.
The first problem is solved because waits for GRL become
visible to deadlock detector in metadata locking subsystem
and thus deadlocks like one in the first bug become impossible.
The second problem is solved because global S locks which
are used for GRL implementation are given preference over
IX locks which are acquired by concurrent DML (and we can
switch to fair scheduling in future if needed).
Important change:
FTWRL/GRL no longer blocks DML and DDL on temporary tables.
Before this patch behavior was not consistent in this respect:
in some cases DML/DDL statements on temporary tables were
blocked while in others they were not. Since the main use cases
for FTWRL are various forms of backups and temporary tables are
not preserved during backups we have opted for consistently
allowing DML/DDL on temporary tables during FTWRL/GRL.
Important change:
This patch changes thread state names which are used when
DML/DDL of FTWRL is waiting for global read lock. It is now
either "Waiting for global read lock" or "Waiting for commit
lock" depending on the stage on which FTWRL is.
Incompatible change:
To solve deadlock in events code which was exposed by this
patch we have to replace LOCK_event_metadata mutex with
metadata locks on events. As result we have to prohibit
DDL on events under LOCK TABLES.
This patch also adds extensive test coverage for interaction
of DML/DDL and FTWRL.
Performance of new and old global read lock implementations
in sysbench tests were compared. There were no significant
difference between new and old implementations.
Before this fix, the performance schema tables were defined in UPPERCASE.
This was incompatible with the lowercase_table_names option, and caused
issues with the install / upgrade process, when changing the lower case
table names setting *after* the install or upgrade.
With this fix, all performance schema tables are exposed with lowercase names.
As a result, the name of the performance schema table is always lowercase,
no matter how / if / when the lowercase_table_names setting if changed.
This change is to align the 5.5 performance_schema.THREADS
table definition with the 5.6 performance_schema.THREADS table,
to facilitate the 5.5 -> 5.6 migration later.
In the table performance_schema.THREADS:
- renamed ID to PROCESSLIST_ID, removed not null
- changed NAME from varchar(64) to varchar(128)
to match the columns definitions from 5.6
Adjusted the test cases accordingly.
Note: this fix is for 5.5 only, to null merge into 5.6
Before this fix, the test output for perfschema.server_init would
vary between executions, because some of the objects tested were
not guaranteed to exist in all configurations / code paths.
This fix removes these weak tests.
Also, comments referring to abandonned code have been cleaned up.
Before this fix, the server could crash inside a memcpy when reading data
from the EVENTS_WAITS_CURRENT / HISTORY / HISTORY_LONG tables.
The root cause is that the length used in a memcpy could be corrupted,
when another thread writes data in the wait record being read.
Reading unsafe data is ok, per design choice, and the code does sanitize
the data in general, but did not sanitize the length given to memcpy.
The fix is to also sanitize the schema name / object name / file name
length when extracting the data to produce a row.
With recent changes in the performance schema default sizing parameters,
the memory used by a mysqld binary increased accordingly.
This negatively affects the MTR test suite,
because running several tests in parallel now consumes more ressources.
The fix is to leave the default production values unchanged,
and to configure the MTR environment to limit memory
used when running tests in the test suite, which is ok
because only a few objects are typically used within a test script.
This fix:
- changed the default configuration in MTR to use less memory
- adjusted the performance schema tests accordingly
Note that 1,000 mutex instances was too short and caused test failures
in the past in team trees, so the default used is now 10,000 in MTR.
The amount of memory used by the performance schema itself
can be observed with the statement SHOW ENGINE PERFORMANCE_SCHEMA STATUS
Before this fix, the server did not recognize 'short' (as in -a)
options but only 'long' (as in --ansi) options
in the startup command line, due to earlier changes in 5.5
introduced for the performance schema.
The root cause is that handle_options() did not honor the
my_getopt_skip_unknown flag when parsing 'short' options.
The fix changes handle_options(), so that my_getopt_skip_unknown is
honored in all cases.
Note that there are limitations to this,
see the added doxygen documentation in handle_options().
The current usage of handle_options() by the server to
parse early performance schema options fits within the limitations.
This has been enforced by an assert for PARSE_EARLY options, for safety.
Before this fix, some tests failed due to lack of instrumentation slots
in the performance schema, because the default sizing was too low.
Now that more code has been instrumented, the default sizing has to be adjusted
to match the current instrumentation consumption.
This change:
- increases the number of rwlock classes from 20 to 30,
- increases the number of rwlock and mutex instances to 1 million.
Both are to account for the volume of data instrumented
when the innodb storage engine is used (because of the innodb buffer pool).
Adjusted the test output accordingly.
temp table
This patch introduces two key changes in the replication's behavior.
Firstly, it reverts part of BUG#51894 which puts any update to temporary tables
into the trx-cache. Now, updates to temporary tables are handled according to
the type of their engines as a regular table.
Secondly, an unsafe mixed statement, (i.e. a statement that access transactional
table as well non-transactional or temporary table, and writes to any of them),
are written into the trx-cache in order to minimize errors in the execution when
the statement logging format is in use.
Such changes has a direct impact on which statements are classified as unsafe
statements and thus part of BUG#53259 is reverted.
TABLES <list> WITH READ LOCK are incompatible".
The problem was that FLUSH TABLES <list> WITH READ LOCK
which was issued when other connection has acquired global
read lock using FLUSH TABLES WITH READ LOCK was blocked
and has to wait until global read lock is released.
This issue stemmed from the fact that FLUSH TABLES <list>
WITH READ LOCK implementation has acquired X metadata locks
on tables to be flushed. Since these locks required acquiring
of global IX lock this statement was incompatible with global
read lock.
This patch addresses problem by using SNW metadata type of
lock for tables to be flushed by FLUSH TABLES <list> WITH
READ LOCK. It is OK to acquire them without global IX lock
as long as we won't try to upgrade those locks. Since SNW
locks allow concurrent statements using same table FLUSH
TABLE <list> WITH READ LOCK now has to wait until old
versions of tables to be flushed go away after acquiring
metadata locks. Since such waiting can lead to deadlock
MDL deadlock detector was extended to take into account
waits for flush and resolve such deadlocks.
As a bonus code in open_tables() which was responsible for
waiting old versions of tables to go away was refactored.
Now when we encounter old version of table in open_table()
we don't back-off and wait for all old version to go away,
but instead wait for this particular table to be flushed.
Such approach supported by deadlock detection should reduce
number of scenarios in which FLUSH TABLES aborts concurrent
multi-statement transactions.
Note that active FLUSH TABLES <list> WITH READ LOCK still
blocks concurrent FLUSH TABLES WITH READ LOCK statement
as the former keeps tables open and thus prevents the
latter statement from doing flush.
The reason for the bug above is unclear but
- Modify pfs_upgrade so that it's result is easier to analyze in case something fails
- Fix several minor weaknesses which could cause that a successing test (either an
already existing or a to be developed one) fails because of imperfect cleanup,
too slow disconnected sessions etc.
should either fix the bug or reduce it's probability or at least
make the analysis of failures easier.
DATABASE with open HANDLER"
Remove LOCK_create_db, database name locks, and use metadata locks instead.
This exposes CREATE/DROP/ALTER DATABASE statements to the graph-based
deadlock detector in MDL, and paves the way for a safe, deadlock-free
implementation of RENAME DATABASE.
Database DDL statements will now take exclusive metadata locks on
the database name, while table/view/routine DDL statements take
intention exclusive locks on the database name. This prevents race
conditions between database DDL and table/view/routine DDL.
(e.g. DROP DATABASE with concurrent CREATE/ALTER/DROP TABLE)
By adding database name locks, this patch implements
WL#4450 "DDL locking: CREATE/DROP DATABASE must use database locks" and
WL#4985 "DDL locking: namespace/hierarchical locks".
The patch also changes code to use init_one_table() where appropriate.
The new lock_table_names() function requires TABLE_LIST::db_length to
be set correctly, and this is taken care of by init_one_table().
This patch also adds a simple template to help work with
the mysys HASH data structure.
Most of the patch was written by Konstantin Osipov.
locks for DML statements and changes the way MDL locks
are acquired/granted in contended case.
Instead of backing-off when a lock conflict is encountered
and waiting for it to go away before restarting open_tables()
process we now wait for lock to be released without releasing
any previously acquired locks. If conflicting lock goes away
we resume opening tables. If waiting leads to a deadlock we
try to resolve it by backing-off and restarting open_tables()
immediately.
As result both waiting for possibility to acquire and
acquiring of a metadata lock now always happen within the
same MDL API call. This has allowed to make release of a lock
and granting it to the most appropriate pending request an
atomic operation.
Thanks to this it became possible to wake up during release
of lock only those waiters which requests can be satisfied
at the moment as well as wake up only one waiter in case
when granting its request would prevent all other requests
from being satisfied. This solves thundering herd problem
which occured in cases when we were releasing some lock and
woke up many waiters for SNRW or X locks (this was the issue
in bug#52289 "performance regression for MyISAM in sysbench
OLTP_RW test".
This also allowed to implement more fair (FIFO) scheduling
among waiters with the same priority.
It also opens the door for introducing new types of requests
for metadata locks such as low-prio SNRW lock which is
necessary in order to support LOCK TABLES LOW_PRIORITY WRITE.
Notice that after this sometimes can report ER_LOCK_DEADLOCK
error in cases in which it has not happened before.
Particularly we will always report this error if waiting for
conflicting lock has happened in the middle of transaction
and resulted in a deadlock. Before this patch the error was
not reported if deadlock could have been resolved by backing
off all metadata locks acquired by the current statement.
Clarified error messages related to unsafe statements:
- avoid the internal technical term "row injection"
- use 'binary log' instead of 'binlog'
- avoid the word 'unsafeness'