The replication slave sets first error 1913 and immediately after error
1595. Thus it is possible, but unlikely, to get 1913. The original test
seems to realise this, but uses an invalid error code - my guess is
that this was a temporary code used in a feature tree, which was then
forgotten to be fixed when merged to main. The removed "1923" is
something committed by mistake during tests.
The previous patch partially fixed things by waiting for the old dump thread
on the master to exit before injecting the DBUG error. This prevents the error
injection going to the wrong thread.
However, there is still the problem that the old dump thread may never exit,
causing the wait to time out. This happens if the dump thread manages to write
all events down the socket before the socket is closed by the slave. The
master dump thread only checks for slave gone when writing a new event, so if
no new events are generated, old dump threads can hang around forever on the
master after the slave disconnects.
Fix by explicitly killing the old dump thread if it is still around.
There is a potential race when we stop the slave. It may take some time for
the master to detect that the slave connection is closed (eg. if scheduling
delays the TCP RSET packet or whatever). Since we inject only a single corrupt
binlog event, we may be unfortunate enough to inject it on the wrong
connection, to a slave io thread that's already stopped.
Fix by waiting for the old dump thread on the master to go away before
injecting the corrupt event.