From 694a7101d4d041508ff1c30af936aa5a4400f6e4 Mon Sep 17 00:00:00 2001
From: Justin <8886628+justinmir@users.noreply.github.com>
Date: Mon, 24 Mar 2025 06:28:20 -0700
Subject: [PATCH] Make MASTERDOWN a retriable error in RedisCluster client
 (#3164)

When a cluster runs with `replica-serve-stale-data no`, replicas
return a MASTERDOWN error under two conditions:
  1. The primary has failed and the replica is not serving requests.
  2. The replica has just started and has not yet synced from the
     primary.

The first case, where the primary has failed, is similar to a
CLUSTERDOWN error and should be retriable in the same way.

In the second case, where a replica has just started and has not yet
synced, the request should be retried on another available node in the
shard. Otherwise, a percentage of the read requests to the shard fails.

Examples when `replica-serve-stale-data no` is set:
  1. In a cluster using `ReadOnly` with a single read replica, every
     read request returns an error to the client because MASTERDOWN is
     not a retriable error.
  2. In a cluster using `RouteRandomly`, a percentage of the requests
     returns errors to the client, namely those for which the affected
     server was selected.
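
The check this patch adds is a prefix match on the error text. As a
rough, standalone sketch of that idea (the helper `isRetriableReply`
and the `retriablePrefixes` slice are hypothetical names used only for
illustration and are not part of the client; the prefix set covers only
the three checks visible in this diff):

```go
package main

import "strings"

// retriablePrefixes is an illustrative list, assumed for this sketch.
// It mirrors only the three prefix checks visible in this patch.
var retriablePrefixes = []string{"READONLY ", "MASTERDOWN ", "CLUSTERDOWN "}

// isRetriableReply is a hypothetical helper: it reports whether an
// error reply starts with one of the prefixes above. The trailing
// space in each prefix means the error code must be followed by a
// message, so a bare "MASTERDOWN" does not match.
func isRetriableReply(msg string) bool {
	for _, p := range retriablePrefixes {
		if strings.HasPrefix(msg, p) {
			return true
		}
	}
	return false
}
```

With this sketch, a reply beginning with "MASTERDOWN " is classified
the same way as one beginning with "CLUSTERDOWN ", so the caller can
retry the command instead of returning the error.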

Co-authored-by: Nedyalko Dyakov <nedyalko.dyakov@gmail.com>
---
 error.go | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/error.go b/error.go
index ec2224c0..6f47f7cf 100644
--- a/error.go
+++ b/error.go
@@ -75,6 +75,9 @@ func shouldRetry(err error, retryTimeout bool) bool {
 	if strings.HasPrefix(s, "READONLY ") {
 		return true
 	}
+	if strings.HasPrefix(s, "MASTERDOWN ") {
+		return true
+	}
 	if strings.HasPrefix(s, "CLUSTERDOWN ") {
 		return true
 	}