1
0
mirror of https://github.com/MariaDB/server.git synced 2025-07-29 05:21:33 +03:00

MDEV-21117: refine the server binlog-based recovery for semisync

Problem:
=======
When the semisync master is crashed and restarted as slave it could
recover transactions that former slaves may never have seen.
A known method existed to clear out all prepared transactions
with --tc-heuristic-recover=rollback does not care to adjust
binlog accordingly.

Fix:
===
The binlog-based recovery is made to concern of the slave semisync role of
post-crash restarted server.
No changes in behavior is done to the "normal" binloggging server
and the semisync master.

When the restarted server is configured with
  --rpl-semi-sync-slave-enabled=1
the refined recovery attempts to roll back prepared transactions
and truncate binlog accordingly.
In case of a partially committed (that is committed at least
in one of the engine participants) such transaction gets committed.
It's guaranteed no (partially as well) committed transactions
exist beyond the truncate position.
In case there exists a non-transactional replication event
(being in a way a committed transaction) past the
computed truncate position the recovery ends with an error.

As after master crash and failover to slave, the demoted-to-slave
ex-master must be ready to face and accept its own (generated by)
events, without generally necessary --replicate-same-server-id.
So the acceptance conditions are relaxed for the semisync slave
to accept own events without that option.
While gtid_strict_mode ON ensures no duplicate transaction can be
(re-)executed the master_use_gtid=none slave has to be
configured with --replicate-same-server-id.

*NOTE* for reviewers.

This patch does not handle the user XA which is done
in next git commit.
This commit is contained in:
Sujatha
2020-04-09 20:45:45 +05:30
committed by Andrei Elkin
parent 82c07b178a
commit 6c39eaeb12
25 changed files with 2619 additions and 127 deletions

View File

@ -44,6 +44,7 @@
#include <mysql/psi/mysql_table.h>
#include "sql_sequence.h"
#include "mem_root_array.h"
#include <utility> // pair
class Alter_info;
class Virtual_column_info;
@ -931,6 +932,32 @@ struct xid_t {
};
typedef struct xid_t XID;
/*
Enumerates a sequence in the order of
their creation that is in the top-down order of the index file.
Ranges from zero through MAX_binlog_id.
Not confuse the value with the binlog file numerical suffix,
neither with the binlog file line in the binlog index file.
*/
typedef uint Binlog_file_id;
const Binlog_file_id MAX_binlog_id= UINT_MAX;
/*
Compound binlog-id and byte offset of transaction's first event
in a sequence (e.g the recovery sequence) of binlog files.
Binlog_offset(0,0) is the minimum value to mean
the first byte of the first binlog file.
*/
typedef std::pair<Binlog_file_id, my_off_t> Binlog_offset;
/* binlog-based recovery transaction descriptor */
struct xid_recovery_member
{
my_xid xid;
uint in_engine_prepare; // number of engines that have xid prepared
bool decided_to_commit;
Binlog_offset binlog_coord; // semisync recovery binlog offset
};
/* for recover() handlerton call */
#define MIN_XID_LIST_SIZE 128
#define MAX_XID_LIST_SIZE (1024*128)
@ -5320,7 +5347,8 @@ int ha_commit_one_phase(THD *thd, bool all);
int ha_commit_trans(THD *thd, bool all);
int ha_rollback_trans(THD *thd, bool all);
int ha_prepare(THD *thd);
int ha_recover(HASH *commit_list);
int ha_recover(HASH *commit_list, MEM_ROOT *mem_root= NULL);
uint ha_recover_complete(HASH *commit_list, Binlog_offset *coord= NULL);
/* transactions: these functions never call handlerton functions directly */
int ha_enable_transaction(THD *thd, bool on);
@ -5448,4 +5476,8 @@ int del_global_index_stat(THD *thd, TABLE* table, KEY* key_info);
int del_global_table_stat(THD *thd, const LEX_CSTRING *db, const LEX_CSTRING *table);
uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info);
bool non_existing_table_error(int error);
uint ha_count_rw_2pc(THD *thd, bool all);
uint ha_check_and_coalesce_trx_read_only(THD *thd, Ha_trx_info *ha_list,
bool all);
#endif /* HANDLER_INCLUDED */