1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00
Commit Graph

53 Commits

Author SHA1 Message Date
Oleksandr Byelkin
043d69bbcc Merge branch '10.5' into 10.6 2023-05-03 09:51:25 +02:00
Oleksandr Byelkin
d821fd7fab Merge branch 'merge-perfschema-5.7' into 10.5 2023-04-28 08:22:17 +02:00
Oleksandr Byelkin
512dbc4527 5.7.42 (only copyright year in all files changed) 2023-04-28 08:09:26 +02:00
Marko Mäkelä
e441c32a0b Merge 10.5 into 10.6 2023-01-03 18:13:11 +02:00
Marko Mäkelä
8b9b4ab3f5 Merge 10.4 into 10.5 2023-01-03 17:08:42 +02:00
musvaage
e9e6c7a3c5 header typos 2022-12-20 08:55:48 +11:00
Oleksandr Byelkin
d2f1c3ed6c Merge branch '10.5' into bb-10.6-release 2022-08-03 12:19:59 +02:00
Oleksandr Byelkin
b043e1098e Merge branch 'merge-perfschema-5.7' into 10.5 2022-08-02 09:34:15 +02:00
Oleksandr Byelkin
61d08f7427 mysql-5.7.39 2022-07-29 14:48:01 +02:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
880d543554 Merge branch 'merge-perfschema-5.7' into 10.5 2022-01-28 11:57:52 +01:00
Sergei Golubchik
0b116d160a 5.7.34 2021-05-03 11:22:07 +02:00
Marko Mäkelä
80ac9ec1cc MDEV-24973 Performance schema duplicates rarely executed code for mutex operations
The PERFORMANCE_SCHEMA wrapper for mutex and rw-lock operations is
causing a lot of unlikely code to be inlined in each invocation.
The impact of this may have been emphasized in MariaDB 10.6, because
InnoDB now uses the common implementation of mutexes and condition
variables (MDEV-21452).

By default, we build with cmake -DPLUGIN_PERFSCHEMA enabled,
but at runtime no instrumentation will be enabled. Similar to
commit eba2d10ac5
we had better avoid inlining the rarely executed code in order to reduce
the code size and to improve the efficiency of the instruction cache.

This change was extensively tested by Axel Schwenke with and without
--enable-performance-schema (with no individual instruments enabled).
Removing the inline functions did not cause any performance regression
in either case. There seemed to be a tiny improvement, possibly due
to reduced code size and better instruction cache hit rate.
2021-03-02 14:32:37 +02:00
Marko Mäkelä
7cffb5f6e8 MDEV-23399: Performance regression with write workloads
The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted
the performance bottleneck to the page flushing.

The configuration parameters will be changed as follows:

innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction)
innodb_lru_scan_depth=1536 (old: 1024)
innodb_max_dirty_pages_pct=90 (old: 75)
innodb_max_dirty_pages_pct_lwm=75 (old: 0)

Note: The parameter innodb_lru_scan_depth will only affect LRU
eviction of buffer pool pages when a new page is being allocated. The
page cleaner thread will no longer evict any pages. It used to
guarantee that some pages will remain free in the buffer pool. Now, we
perform that eviction 'on demand' in buf_LRU_get_free_block().
The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows:
 * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks()
 * As a buf_pool.free limit in buf_LRU_list_batch() for terminating
   the flushing that is initiated e.g., by buf_LRU_get_free_block()
The parameter also used to serve as an initial limit for unzip_LRU
eviction (evicting uncompressed page frames while retaining
ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit
of 100 or unlimited for invoking buf_LRU_scan_and_free_block().

The status variables will be changed as follows:

innodb_buffer_pool_pages_flushed: This includes also the count of
innodb_buffer_pool_pages_LRU_flushed and should work reliably,
updated one by one in buf_flush_page() to give more real-time
statistics. The function buf_flush_stats(), which we are removing,
was not called in every code path. For both counters, we will use
regular variables that are incremented in a critical section of
buf_pool.mutex. Note that show_innodb_vars() directly links to the
variables, and reads of the counters will *not* be protected by
buf_pool.mutex, so you cannot get a consistent snapshot of both variables.

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be
removed, because the page cleaner no longer deals with writing or
evicting least recently used pages, and because the single-page writes
have been removed:
* buffer_LRU_batch_flush_avg_time_slot
* buffer_LRU_batch_flush_avg_time_thread
* buffer_LRU_batch_flush_avg_time_est
* buffer_LRU_batch_flush_avg_pass
* buffer_LRU_single_flush_scanned
* buffer_LRU_single_flush_num_scan
* buffer_LRU_single_flush_scanned_per_call

When moving to a single buffer pool instance in MDEV-15058, we missed
some opportunity to simplify the buf_flush_page_cleaner thread. It was
unnecessarily using a mutex and some complex data structures, even
though we always have a single page cleaner thread.

Furthermore, the buf_flush_page_cleaner thread had separate 'recovery'
and 'shutdown' modes where it was waiting to be triggered by some
other thread, adding unnecessary latency and potential for hangs in
relatively rarely executed startup or shutdown code.

The page cleaner was also running two kinds of batches in an
interleaved fashion: "LRU flush" (writing out some least recently used
pages and evicting them on write completion) and the normal batches
that aim to increase the MIN(oldest_modification) in the buffer pool,
to help the log checkpoint advance.

The buf_pool.flush_list flushing was being blocked by
buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN
of a page is ahead of log_sys.get_flushed_lsn(), that is, what has
been persistently written to the redo log, we would trigger a log
flush and then resume the page flushing. This would unnecessarily
limit the performance of the page cleaner thread and trigger the
infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms.
The settings might not be optimal" that were suppressed in
commit d1ab89037a unless log_warnings>2.

Our revised algorithm will make log_sys.get_flushed_lsn() advance at
the start of buf_flush_lists(), and then execute a 'best effort' to
write out all pages. The flush batches will skip pages that were modified
since the log was written, or are are currently exclusively locked.
The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message
will be removed, because by design, the buf_flush_page_cleaner() should
not be blocked during a batch for extended periods of time.

We will remove the single-page flushing altogether. Related to this,
the debug parameter innodb_doublewrite_batch_size will be removed,
because all of the doublewrite buffer will be used for flushing
batches. If a page needs to be evicted from the buffer pool and all
100 least recently used pages in the buffer pool have unflushed
changes, buf_LRU_get_free_block() will execute buf_flush_lists() to
write out and evict innodb_lru_flush_size pages. At most one thread
will execute buf_flush_lists() in buf_LRU_get_free_block(); other
threads will wait for that LRU flushing batch to finish.

To improve concurrency, we will replace the InnoDB ib_mutex_t and
os_event_t native mutexes and condition variables in this area of code.
Most notably, this means that the buffer pool mutex (buf_pool.mutex)
is no longer instrumented via any InnoDB interfaces. It will continue
to be instrumented via PERFORMANCE_SCHEMA.

For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be
declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical
sections of buf_pool.flush_list_mutex should be shorter than those for
buf_pool.mutex, because in the worst case, they cover a linear scan of
buf_pool.flush_list, while the worst case of a critical section of
buf_pool.mutex covers a linear scan of the potentially much longer
buf_pool.LRU list.

mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable
with SAFE_MUTEX. Some InnoDB debug assertions need this predicate
instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner().

buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list:
Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[].
The number of active flush operations.

buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t
instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA
and SAFE_MUTEX instrumentation.

buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU.

buf_pool_t::done_flush_list: Condition variable for !n_flush_list.

buf_pool_t::do_flush_list: Condition variable to wake up the
buf_flush_page_cleaner when a log checkpoint needs to be written
or the server is being shut down. Replaces buf_flush_event.
We will keep using timed waits (the page cleaner thread will wake
_at least_ once per second), because the calculations for
innodb_adaptive_flushing depend on fixed time intervals.

buf_dblwr: Allocate statically, and move all code to member functions.
Use a native mutex and condition variable. Remove code to deal with
single-page flushing.

buf_dblwr_check_block(): Make the check debug-only. We were spending
a significant amount of execution time in page_simple_validate_new().

flush_counters_t::unzip_LRU_evicted: Remove.

IORequest: Make more members const. FIXME: m_fil_node should be removed.

buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex
(which we are removing).

page_cleaner_slot_t, page_cleaner_t: Remove many redundant members.

pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot().

recv_writer_thread: Remove. Recovery works just fine without it, if we
simply invoke buf_flush_sync() at the end of each batch in
recv_sys_t::apply().

recv_recovery_from_checkpoint_finish(): Remove. We can simply call
recv_sys.debug_free() directly.

srv_started_redo: Replaces srv_start_state.

SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown()
can communicate with the normal page cleaner loop via the new function
flush_buffer_pool().

buf_flush_remove(): Assert that the calling thread is holding
buf_pool.flush_list_mutex. This removes unnecessary mutex operations
from buf_flush_remove_pages() and buf_flush_dirty_pages(),
which replace buf_LRU_flush_or_remove_pages().

buf_flush_lists(): Renamed from buf_flush_batch(), with simplified
interface. Return the number of flushed pages. Clarified comments and
renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions
buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this
function, which was their only caller, and remove 2 unnecessary
buf_pool.mutex release/re-acquisition that we used to perform around
the buf_flush_batch() call. At the start, if not all log has been
durably written, wait for a background task to do it, or start a new
task to do it. This allows the log write to run concurrently with our
page flushing batch. Any pages that were skipped due to too recent
FIL_PAGE_LSN or due to them being latched by a writer should be flushed
during the next batch, unless there are further modifications to those
pages. It is possible that a page that we must flush due to small
oldest_modification also carries a recent FIL_PAGE_LSN or is being
constantly modified. In the worst case, all writers would then end up
waiting in log_free_check() to allow the flushing and the checkpoint
to complete.

buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_flush_space(): Auxiliary function to look up a tablespace for
page flushing.

buf_flush_page(): Defer the computation of space->full_crc32(). Never
call log_write_up_to(), but instead skip persistent pages whose latest
modification (FIL_PAGE_LSN) is newer than the redo log. Also skip
pages on which we cannot acquire a shared latch without waiting.

buf_flush_try_neighbors(): Do not bother checking buf_fix_count
because buf_flush_page() will no longer wait for the page latch.
Take the tablespace as a parameter, and only execute this function
when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold().

buf_flush_relocate_on_flush_list(): Declare as cold, and push down
a condition from the callers.

buf_flush_check_neighbor(): Take id.fold() as a parameter.

buf_flush_sync(): Ensure that the buf_pool.flush_list is empty,
because the flushing batch will skip pages whose modifications have
not yet been written to the log or were latched for modification.

buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables.

buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize
the counters, and report n->evicted.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_do_LRU_batch(): Return the number of pages flushed.

buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if
adaptive hash index entries are pointing to the block.

buf_LRU_get_free_block(): Do not wake up the page cleaner, because it
will no longer perform any useful work for us, and we do not want it
to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0)
writes out and evicts at most innodb_lru_flush_size pages. (The
function buf_do_LRU_batch() may complete after writing fewer pages if
more than innodb_lru_scan_depth pages end up in buf_pool.free list.)
Eliminate some mutex release-acquire cycles, and wait for the LRU
flush batch to complete before rescanning.

buf_LRU_check_size_of_non_data_objects(): Simplify the code.

buf_page_write_complete(): Remove the parameter evict, and always
evict pages that were part of an LRU flush.

buf_page_create(): Take a pre-allocated page as a parameter.

buf_pool_t::free_block(): Free a pre-allocated block.

recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block
while not holding recv_sys.mutex. During page allocation, we may
initiate a page flush, which in turn may initiate a log flush, which
would require acquiring log_sys.mutex, which should always be acquired
before recv_sys.mutex in order to avoid deadlocks. Therefore, we must
not be holding recv_sys.mutex while allocating a buffer pool block.

BtrBulk::logFreeCheck(): Skip a redundant condition.

row_undo_step(): Do not invoke srv_inc_activity_count() for every row
that is being rolled back. It should suffice to invoke the function in
trx_flush_log_if_needed() during trx_t::commit_in_memory() when the
rollback completes.

sync_check_enable(): Remove. We will enable innodb_sync_debug from the
very beginning.

Reviewed by: Vladislav Vaintroub
2020-10-15 17:04:56 +03:00
Sergei Golubchik
7af733a5a2 perfschema compilation, test and misc fixes 2020-03-10 19:24:23 +01:00
Sergei Golubchik
0ea717f51a P_S 5.7.28 2020-03-10 19:24:22 +01:00
Oleksandr Byelkin
ade89fc898 Merge branch '10.2' into 10.3 2020-01-21 09:11:14 +01:00
Oleksandr Byelkin
3a1716a7e7 Merge branch '10.1' into 10.2 2020-01-20 16:15:05 +01:00
Oleksandr Byelkin
10eacd5ff7 Merge branch 'merge-perfschema-5.6' into 10.1 2020-01-19 13:11:45 +01:00
Oleksandr Byelkin
3aff3f3679 5.6.47 2020-01-19 12:52:07 +01:00
Sergei Golubchik
36ebd704de 5.7.28 2019-12-11 21:39:26 +01:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
f177f125d4 Merge branch '5.5' into 10.1 2019-05-11 19:15:57 +03:00
Vicențiu Ciorbaru
15f1e03d46 Follow-up to changing FSF address
Some places didn't match the previous rules, making the Floor
address wrong.

Additional sed rules:

sed -i -e 's/Place.*Suite .*, Boston/Street, Fifth Floor, Boston/g'
sed -i -e 's/Suite .*, Boston/Fifth Floor, Boston/g'
2019-05-11 18:30:45 +03:00
Michael Widenius
70c1110a29 Optimize performance schema likely/unlikely
Performance schema likely/unlikely assume that performance schema
is enabled by default, which causes a performance degradation for
default installations that doesn't have performance schema enabled.

Fixed by changing the likely/unlikely in PS to assume it's
not enabled. This can be changed by compiling with
-DPSI_ON_BY_DEFAULT

Other changes:
- Added psi_likely/psi_unlikely that is depending on
  PSI_ON_BY_DEFAULT. psi_likely() is assumed to be true
  if PS is enabled.
- Added likely/unlikely to some PS interface code.
- Moved pfs_enabled to mysys (was initialized but not used before)
- Added "if (pfs_likely(pfs_enabled))" around calls to PS to avoid
  an extra call if PS is not enabled.
- Moved checking flag_global_instrumention before other flags
  to speed up the case when PS is not enabled.
2018-05-07 00:07:33 +03:00
luz.paz
3dd01669b4 Misc. typos
Found via `codespell -i 3 -w --skip="./debian/po" -I ../mariadb-server-word-whitelist.txt  ./cmake/ ./debian/ ./Docs/ ./include/ ./man/ ./plugin/ ./strings/`
2018-04-05 15:26:57 +04:00
Sergei Golubchik
e52a237fe9 remove ifdefs around PSI_THREAD_CALL
same change as for PSI_TABLE_CALL
2018-01-09 14:21:20 +03:00
Sergei Golubchik
15f60c1a73 5.7.13 2016-07-28 15:52:12 +02:00
Sergei Golubchik
51ed64a520 5.6.31 2016-06-21 14:22:52 +02:00
Vladislav Vaintroub
b436db98fe Fix compilation 2016-02-10 00:20:23 +01:00
Sergey Vojtovich
e562b43222 MDEV-8111 - remove "fast mutexes"
They aren't faster than normal mutexes. They're disabled by default for years,
so de facto it's dead code, never used.
2015-11-26 11:34:17 +04:00
Sergei Golubchik
374f0751a2 null-merge from perfschema-5.6 merge tree
(only new files and small style changes are accepted)
2014-05-07 10:21:41 +02:00
Sergei Golubchik
04bce7b569 5.6.17 2014-05-07 10:04:30 +02:00
Sergei Golubchik
7226287c06 perfschema 5.6.10 initial commit.
include/mysql/psi/*
2014-05-07 10:02:35 +02:00
Michael Widenius
e63c03db8d Merge with 10.0-base
Automatic merge, except for server_audit.cc that had to be modified slightly
Changes to xtradb and innobase where ignored was these made no sence for 10.0
2014-03-13 16:43:11 +02:00
Michael Widenius
b07f9f72dc Fixed MDEV-5780 "create-big fails in 10.0"
The issue was that create...trigger part of the test suite used a debug_sync point that before was never triggered (in other words, wrong meaningless test).
With the new create ... replace code the debug sync point is triggered and the test case could not handled that.

I fixed this by adding a wait and go for the debug syncpoint in the test.

Removed some compiler warnings from mysql_cond_timedwait


include/mysql/psi/mysql_thread.h:
  Removed compiler warnings
mysql-test/r/create-big.result:
  New test result
mysql-test/t/create-big.test:
  Fixed test case as create_table_select_before_check_if_exists was not before triggered by the code.
2014-03-10 14:08:12 +02:00
Sergei Golubchik
0dc23679c8 10.0-base merge 2014-02-26 15:28:07 +01:00
Sergei Golubchik
84651126c0 MySQL-5.5.36 merge
(without few incorrect bugfixes and with 1250 files where only a copyright year was changed)
2014-02-17 11:00:51 +01:00
Marc Alff
63819ccb34 Bug#17702677 WRONG INSTRUMENTATION INTERFACE FOR MYSQL_COND_TIMEDWAIT
The pthread_cond_timedwait(3P) api
uses a const struct timespec for parameter 3.

The instrumentation api for the same, mysql_cond_timedwait,
which expands to inline_mysql_cond_timedwait,
should also take a const parameter for the timespec.

This fix add the missing const to inline_mysql_cond_timedwait.
2013-11-06 10:22:00 +01:00
Michael Widenius
068c61978e Temporary commit of 10.0-merge 2013-03-26 00:03:13 +02:00
Michael Widenius
edc89f7511 Buildbot fixes and cleanups:
- Added --verbose to BUILD scripts to get make to write out compile commands.
- Detect if AM_EXTRA_MAKEFLAGS=VERBOSE=1 was used with build scripts.
- Don't write warnings about replication variables when doing bootstrap.
- Fixed that mysql_cond_wait() and mysql_cond_timedwait() will report original source file in case of errors.
- Ignore some compiler warnings

BUILD/FINISH.sh:
  Detect if AM_EXTRA_MAKEFLAGS=VERBOSE=1 or --verbose was used
BUILD/SETUP.sh:
  Added --verbose to print out the full compile lines
  Updated help message
client/mysqltest.cc:
  Fixed that one can use 'replace' with cat_file
cmake/configure.pl:
  If --verbose is used, get make to write out compile commands
debian/dist/Debian/rules:
  Added $AM_EXTRA_MAKEFLAGS to get VERBOSE=1 on buildbot builds
debian/dist/Ubuntu/rules:
  Added $AM_EXTRA_MAKEFLAGS to get VERBOSE=1 on buildbot builds
include/my_pthread.h:
  Made set_timespec_time_nsec() more portable.
include/mysql/psi/mysql_thread.h:
  Fixed that mysql_cond_wait() and mysql_cond_timedwait() will report original source file in case of errors.
mysql-test/suite/innodb/r/auto_increment_dup.result:
  Fixed wrong DBUG_SYNC
mysql-test/suite/innodb/t/auto_increment_dup.test:
  Fixed wrong DBUG_SYNC
mysql-test/suite/perfschema/include/upgrade_check.inc:
  Make test more portable for changes in *.sql files
mysql-test/suite/perfschema/r/pfs_upgrade.result:
  Updated test results
mysql-test/valgrind.supp:
  Ignore running Aria checkpoint thread
scripts/mysqlaccess.sh:
  Changed reference of bugs database
  Ensure that also client-server group is read.
sql/handler.cc:
  Added missing syncpoint
sql/mysqld.cc:
  Don't write warnings about replication variables when doing bootstrap
sql/mysqld.h:
  Don't write warnings about replication variables when doing bootstrap
sql/rpl_rli.cc:
  Don't write warnings about replication variables when doing bootstrap
sql/sql_insert.cc:
  Don't mask SERVER_SHUTDOWN in insert_delayed
  This is done to be able to distingush between shutdown and interrupt errors
support-files/compiler_warnings.supp:
  Ignore some compiler warnings in xtradb,innobase, oqgraph, yassl, string3.h
2013-01-11 02:03:43 +02:00
Michael Widenius
1d0f70c2f8 Temporary commit of merge of MariaDB 10.0-base and MySQL 5.6 2012-08-01 17:27:34 +03:00
Sergei Golubchik
86a2d70779 safe_mutex deadlock detector post-merge fixes 2011-10-19 21:53:14 +02:00
Sergei Golubchik
9809f05199 5.5-merge 2011-07-02 22:08:51 +02:00
Sergei Golubchik
0accbd0364 lots of post-merge changes 2011-04-25 17:22:25 +02:00
Dmitry Lenev
0afd0a18fe A better fix for bug #56405 "Deadlock in the MDL deadlock
detector" that doesn't introduce bug #56715 "Concurrent
transactions + FLUSH result in sporadical unwarranted
deadlock errors".

Deadlock could have occurred when workload containing a mix
of DML, DDL and FLUSH TABLES statements affecting the same
set of tables was executed in a heavily concurrent environment.

This deadlock occurred when several connections tried to
perform deadlock detection in the metadata locking subsystem.
The first connection started traversing wait-for graph,
encountered a sub-graph representing a wait for flush, acquired
LOCK_open and dived into sub-graph inspection. Then it
encountered sub-graph corresponding to wait for metadata lock
and blocked while trying to acquire a rd-lock on
MDL_lock::m_rwlock, since some,other thread had a wr-lock on it.
When this wr-lock was released it could have happened (if there
was another pending wr-lock against this rwlock) that the rd-lock
from the first connection was left unsatisfied but at the same
time the new rd-lock request from the second connection sneaked
in and was satisfied (for this to be possible the second
rd-request should come exactly after the wr-lock is released but
before pending the wr-lock manages to grab rwlock, which is
possible both on Linux and in our own rwlock implementation).
If this second connection continued traversing the wait-for graph
and encountered a sub-graph representing a wait for flush it tried
to acquire LOCK_open and thus the deadlock was created.

The previous patch tried to workaround this problem by not
allowing the deadlock detector to lock LOCK_open mutex if
some other thread doing deadlock detection already owns it
and current search depth is greater than 0. Instead deadlock
was reported. As a result it has introduced bug #56715.

This patch solves this problem in a different way.
It introduces a new rw_pr_lock_t implementation to be used
by MDL subsystem instead of one based on Linux rwlocks or
our own rwlock implementation. This new implementation
never allows situation in which an rwlock is rd-locked and
there is a blocked pending rd-lock. Thus the situation which
has caused this bug becomes impossible with this implementation.

Due to fact that this implementation is optimized for
wr-lock/unlock scenario which is most common in the MDL
subsystem it doesn't introduce noticeable performance
regressions in sysbench tests. Moreover it significantly
improves situation for POINT_SELECT test when many
connections are used.

No test case is provided as this bug is very hard to repeat
in MTR environment but is repeatable with the help of RQG
tests.
This patch also doesn't include a test for bug #56715
"Concurrent transactions + FLUSH result in sporadical
unwarranted deadlock errors" as it takes too much time to
be run as part of normal test-suite runs.

config.h.cmake:
  We no longer need to check for presence of
  pthread_rwlockattr_setkind_np as we no longer
  use Linux-specific implementation of rw_pr_lock_t
  which uses this function.
configure.cmake:
  We no longer need to check for presence of
  pthread_rwlockattr_setkind_np as we no longer
  use Linux-specific implementation of rw_pr_lock_t
  which uses this function.
configure.in:
  We no longer need to check for presence of
  pthread_rwlockattr_setkind_np as we no longer
  use Linux-specific implementation of rw_pr_lock_t
  which uses this function.
include/my_pthread.h:
  Introduced new implementation of rw_pr_lock_t.
  Since it never allows situation in which rwlock is rd-locked
  and there is a blocked pending rd-lock it is not affected by
  bug #56405 "Deadlock in the MDL deadlock detector".
  This implementation is also optimized for wr-lock/unlock
  scenario which is most common in MDL subsystem. So it doesn't
  introduce noticiable performance regressions in sysbench tests
  (compared to old Linux-specific implementation). Moreover it
  significantly improves situation for POINT_SELECT test when
  many connections are used.
  As part of this change removed try-lock part of API for
  this type of lock. It is not used in our code and it would
  be hard to implement correctly within constraints of new
  implementation.
  Finally, removed support of preferring readers from
  my_rw_lock_t implementation as the only user of this
  feature was old rw_pr_lock_t implementation.
include/mysql/psi/mysql_thread.h:
  Removed try-lock part of prlock API.
  It is not used in our code and it would be hard
  to implement correctly within constraints of new
  prlock implementation.
mysys/thr_rwlock.c:
  Introduced new implementation of rw_pr_lock_t.
  Since it never allows situation in which rwlock is rd-locked
  and there is a blocked pending rd-lock it is not affected by
  bug #56405 "Deadlock in the MDL deadlock detector".
  This implementation is also optimized for wr-lock/unlock
  scenario which is most common in MDL subsystem. So it doesn't
  introduce noticiable performance regressions in sysbench tests
  (compared to old Linux-specific implementation). Moreover it
  significantly improves situation for POINT_SELECT test when
  many connections are used.
  Also removed support of preferring readers from
  my_rw_lock_t implementation as the only user of this
  feature was old rw_pr_lock_t implementation.
2010-09-29 16:09:07 +04:00
Konstantin Osipov
265a6edd23 A pre-requisite patch for the fix for Bug#52044.
Implement a few simple asserts in my_rwlock_t locks.

include/my_pthread.h:
  Declare two simple assert functions.
include/mysql/psi/mysql_thread.h:
  Add wrappers for new assert functions.
mysys/thr_rwlock.c:
  Add asserts.
sql/sql_base.cc:
  Silence a compiler warning for the case when
  SAFE_MUTEX is not ON.
2010-08-11 01:12:01 +04:00
Marc Alff
ec41287630 Bug#55087 Performance schema: optimization of the instrumentation interface
This change is for performance optimization.

Fixed the performance schema instrumentation interface as follows:
- simplified mysql_unlock_mutex()
- simplified mysql_unlock_rwlock()
- simplified mysql_cond_signal()
- simplified mysql_cond_broadcast()

Changed the get_thread_XXX_locker apis to have one extra parameter,
to provide memory to the instrumentation implementation.
This API change allows to use memory provided by the caller,
to avoid having to use thread local storage.
Using this extra parameter will be done in a separate fix,
this change is for the interface only.

Adjusted all the code and unit tests accordingly.
2010-07-09 17:00:24 -06:00
Marc Alff
6e3fb7bb07 Fixed headers in include/mysql/psi 2010-07-08 11:04:07 -06:00