Update conn->queued_connect_share in spider_check_trx_and_get_conn to
avoid use-after-free.
spider_check_trx_and_get_conn, often called at the beginning of a
spider DML, has two branches, chosen by whether an update of various
spider fields is needed. If it is, the update may NULL the connections
associated with the spider handler, which subsequently causes a call
to spider_get_conn(), which sets conn->queued_connect_share to the
SPIDER_SHARE associated with the spider handler.
We make it so that conn->queued_connect_share is updated regardless of
which branch is taken, so that it never holds a stale and potentially
already freed share.
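
The control flow described above can be sketched as follows. This is a
minimal model, not the Spider code: Share, Conn, Handler and
check_trx_and_get_conn() are hypothetical stand-ins, and only the
queued_connect_share field name is taken from the text.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for the Spider structures.
struct Share { int id; };
struct Conn { Share *queued_connect_share = nullptr; };
struct Handler
{
  Share *share;
  Conn *conn;
};

// Sketch of the fixed control flow: whichever branch is taken, the
// connection ends up pointing at the handler's current share.
void check_trx_and_get_conn(Handler &h, Conn &pool_conn, bool needs_update)
{
  assert(h.share != nullptr);
  if (needs_update)
  {
    // The update path drops the connection and obtains one again; getting
    // a connection records the handler's share on it.
    h.conn = nullptr;
    h.conn = &pool_conn;
    h.conn->queued_connect_share = h.share;
  }
  else if (h.conn)
  {
    // Without this assignment the pointer stored by an earlier call would
    // be kept, even if that share has since been freed.
    h.conn->queued_connect_share = h.share;
  }
}
```

In this model, a connection that still carries a pointer to an old
share is refreshed even when needs_update is false.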
Each spider connection is identified with a connection key, which is
an encoding of the backend parameters.
The first byte of the key is by default 0, and in rare circumstances
it is changed to a different value: when semi_table_lock is set to 1;
and when using casual read. When this happens, often a new connection
is created with the new key. Neither case is useful: the description
of semi_table_lock has nothing to do with creation of new connections
and the parameter itself was deprecated for 10.7+ (MDEV-28829) and
marked for deletion (MDEV-28830); while the new threads created by a
non-zero spider_casual_read only sit idle, thus not achieving any
gain (see MDEV-26151), and the parameter has also been deprecated in
11.5+ (MDEV-31789). The relevant code adds unnecessary complexity to
the spider code. This change does not reduce parallelism, because when
bgs mode is on a background thread is already created per partition,
and there is no evidence that spider creates multiple threads for one
partition. If the need for such cases arises, it will be a separate
issue.
The conn_kind, which stands for "connection kind", is no longer useful
because the HandlerSocket support is deleted and Spider now has only
one connection kind, SPIDER_CONN_KIND_MYSQL. Remove conn_kind and
related code.
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
Reviewed-by: Nayuta Yanagisawa <nayuta.yanagisawa@mariadb.com>
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
Change the type of my_hash_get_key to:
1) Return const
2) Change the context parameter to be const void*
Also fix casting in hash adjacent areas.
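
A minimal sketch of what a key callback looks like under a signature
of this shape. The typedef below is modeled on the description above
(const return value, const void* record), and conn_entry and
conn_entry_get_key() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <stddef.h>

typedef unsigned char uchar;
typedef char my_bool;

// Modeled on the new callback type: const return, const void* record.
typedef const uchar *(*hash_get_key_fn)(const void *, size_t *, my_bool);

// Hypothetical record type stored in a hash.
struct conn_entry
{
  const char *key;
  size_t key_length;
};

// The function has exactly the typedef's signature, so it can be stored
// in a hash_get_key_fn without casting the function pointer. Only the
// record pointer is cast, inside the function body.
static const uchar *conn_entry_get_key(const void *record, size_t *length,
                                       my_bool)
{
  assert(record != nullptr);
  const conn_entry *entry = static_cast<const conn_entry *>(record);
  *length = entry->key_length;
  return reinterpret_cast<const uchar *>(entry->key);
}

// No function-pointer cast is needed here.
static hash_get_key_fn conn_entry_key_callback = conn_entry_get_key;
```

Because the callback's type matches the typedef exactly, there is no
function-pointer cast for -Wcast-function-type-strict to flag.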
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
- document tmp_share, which are temporary spider shares with only one
link (no ha)
- simplify spider_get_sys_tables_connect_info() where link_idx is
always 0
A SPIDER_TRX_HA associated with a SPIDER_TRX can have a longer
lifetime than its associated SPIDER_SHARE, and it is identified by
the associated table name. When the SPIDER_SHARE is no longer valid,
e.g. when the associated spider table has been dropped and recreated,
the SPIDER_TRX_HA should be reset too.
Since spider could create a new SPIDER_SHARE with the exact same
address of a freed SPIDER_SHARE, we try to mark all SPIDER_TRX_HAs
associated with a SPIDER_SHARE invalid when the SPIDER_SHARE is about
to be freed.
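
The reason a pointer comparison alone is not enough can be sketched as
follows. This is a simplified model under stated assumptions: Share,
TrxHa, invalidate_trx_ha() and can_reuse() are hypothetical names, not
the Spider structures or functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Share { std::string table_name; };

// Hypothetical stand-in for SPIDER_TRX_HA: looked up by table name and
// remembering the share it was built for.
struct TrxHa
{
  std::string table_name;
  const Share *share;
  bool valid;
};

// Called just before a share is freed: mark every entry built for it as
// invalid, so that a later share allocated at the same address cannot be
// mistaken for the old one.
void invalidate_trx_ha(std::vector<TrxHa> &entries, const Share *dying)
{
  assert(dying != nullptr);
  for (TrxHa &ha : entries)
    if (ha.share == dying)
      ha.valid = false;
}

// An entry may be reused only if it is still valid and was built for the
// share currently in use.
bool can_reuse(const TrxHa &ha, const Share *current)
{
  return ha.valid && ha.share == current;
}
```

After invalidate_trx_ha() has run, can_reuse() returns false even when
the address passed in equals the stored one, which is the case the
text describes.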
Remove the dead code in Spider related to its HandlerSocket support.
The code has been disabled for a long time and is unlikely to be
enabled again.
- rm all files under storage/spider/hs_client/ except hs_compat.h
- rm storage/spider/spd_db_handlersocket.*
- unifdef -UHS_HAS_SQLCOM -UHAVE_HANDLERSOCKET \
-m storage/spider/spd_* storage/spider/ha_spider.* storage/spider/hs_client/*
- remove relevant files from storage/spider/CMakeLists.txt
Same as MDEV-29579. For some reason, libodbc does not clean up
properly if unloaded too early with the dlclose() of spider. So we add
UNIQUE symbols to spider so that spider is not unloaded by dlclose().
This change, however, uncovers some hidden problems in the spider
codebase, for which we move the initialisation of some spider global
variables into the initialisation of spider itself.
Spider has some global variables. Their initialisation should be done
in the initialisation of spider itself. Otherwise, if spider were
re-initialised without these symbols being unloaded, the values could
be inconsistent and cause issues.
One such issue is caused by the variables
spider_mon_table_cache_version and spider_mon_table_cache_version_req.
They are used for resetting the spider monitoring table cache and have
initial values of 0 and 1 respectively. The invariant
spider_mon_table_cache_version_req >= spider_mon_table_cache_version
always holds. When the inequality is strict, the cache is reset,
spider_mon_table_cache_version is set equal to
spider_mon_table_cache_version_req, and the cache is searched for a
matching table_name, db_name and link_idx. When the two are equal, no
reset happens and the cache is searched directly.
When spider is re-inited without resetting the values of
spider_mon_table_cache_version and spider_mon_table_cache_version_req,
they are still equal from the previous cache reset, so no reset
happens. But the cache was emptied in the previous spider deinit, so
the search unexpectedly results in HA_ERR_KEY_NOT_FOUND.
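
The version check and the failure mode can be modeled in a few lines.
This is a sketch under assumptions: MonCache, init(), deinit() and
lookup() are hypothetical, the cache reset is modeled as a reload from
a source map, and a failed lookup stands in for HA_ERR_KEY_NOT_FOUND.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified model of the monitoring table cache.
struct MonCache
{
  long version;      // models spider_mon_table_cache_version
  long version_req;  // models spider_mon_table_cache_version_req
  std::map<std::string, int> entries;
};

// Spider init with the fix applied: the versions are reset on every init.
void init(MonCache &c)
{
  c.version = 0;
  c.version_req = 1;
  c.entries.clear();
}

// Spider deinit: the cache is emptied, the versions are left untouched.
void deinit(MonCache &c) { c.entries.clear(); }

// Returns true if the key is found. The cache is reloaded from `source`
// only when version is strictly below version_req.
bool lookup(MonCache &c, const std::map<std::string, int> &source,
            const std::string &key)
{
  assert(c.version <= c.version_req);
  if (c.version < c.version_req)
  {
    c.entries = source;
    c.version = c.version_req;
  }
  return c.entries.count(key) != 0;
}
```

In this model, calling deinit() and then lookup() without an
intervening init() finds nothing, because the versions are still equal
and the emptied cache is searched directly. Calling init() first
restores the strict inequality and the lookup succeeds.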
An alternative way to fix this issue would be to call the spider udf
spider_flush_mon_cache_table(), which increments
spider_mon_table_cache_version_req thus making sure the inequality is
strict. However, there's no reason for spider to initialise these
global variables on dlopen() rather than on spider init; the latter is
cleaner and "purer".
To reproduce this issue, simply revert the changes involving the two
variables and then run:
mtr --no-reorder spider.ha{,_part}
This will avoid issues like MDEV-32486.
IDs used in:
- spider_alloc_calc_mem_init()
- spider_string::init_calc_mem()
- spider_malloc()
- spider_bulk_alloc_mem()
- spider_bulk_malloc()
Spider populates its lock lists (a hash) in store_lock(), and normally
clears them in the actual lock_tables(). However, if lock_tables()
fails, the lists are not cleared, because there's no reset_lock()
method for storage engine handlers, and the stale entries can cause
problems. For example, if one of the tables involved is dropped and
recreated, or simply TRUNCATEd, then when LOCK TABLES is executed
again, the lock lists are queried again in store_lock(), which could
cause access to freed memory associated with the dropped table.