When clusters are running with `replica-serve-stale-data no`, replicas
will return a MASTERDOWN error under two conditions:
1. The primary has failed and the replica is not serving requests.
2. A replica has just started and has not yet synced from the primary.
The first case is similar to a CLUSTERDOWN error and should be
retriable in the same way.
In the second case, where a replica has just started and has not yet
synced from the primary, the request should be retried on the other
available nodes in the shard.
Otherwise a percentage of the read requests to the shard will fail.
Examples when `replica-serve-stale-data no` is set:
1. In a cluster using `ReadOnly` with a single read replica, every
read request will return errors to the client because MASTERDOWN is
not a retriable error.
2. In a cluster using `RouteRandomly`, a percentage of the requests
will return errors to the client, namely those for which the
affected replica was selected.
Co-authored-by: Nedyalko Dyakov <nedyalko.dyakov@gmail.com>
* fix: recycle connections in some Redis Cluster scenarios
This issue surfaced in a cloud provider solution that rolls out new
nodes using the same address (hostname) as the nodes they replace in
a Redis Cluster, while the old nodes, once demoted to replicas, remain
in service for some minutes to redirect traffic.
The fix identifies when a connection may be stale: a MOVED response
names the same address (hostname) that the connection itself is using.
At that point the connection is considered no longer usable and is
recycled.