1
0
mirror of https://github.com/redis/go-redis.git synced 2025-11-26 06:23:09 +03:00
Commit Graph

280 Commits

Author SHA1 Message Date
Nedyalko Dyakov
042610b79d fix(conn): conn to have state machine (#3559)
* wip

* wip, used and unusable states

* polish state machine

* correct handling OnPut

* better errors for tests, hook should work now

* fix linter

* improve reauth state management. fix tests

* Update internal/pool/conn.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/pool/conn.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* better timeouts

* empty endpoint handoff case

* fix handoff state when queued for handoff

* try to detect the deadlock

* try to detect the deadlock x2

* delete should be called

* improve tests

* fix mark on uninitialized connection

* Update internal/pool/conn_state_test.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/pool/conn_state_test.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/pool/pool.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/pool/conn_state.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/pool/conn.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix error from copilot

* address copilot comment

* fix(pool): pool performance  (#3565)

* perf(pool): replace hookManager RWMutex with atomic.Pointer and add predefined state slices

- Replace hookManager RWMutex with atomic.Pointer for lock-free reads in hot paths
- Add predefined state slices to avoid allocations (validFromInUse, validFromCreatedOrIdle, etc.)
- Add Clone() method to PoolHookManager for atomic updates
- Update AddPoolHook/RemovePoolHook to use copy-on-write pattern
- Update all hookManager access points to use atomic Load()

Performance improvements:
- Eliminates RWMutex contention in Get/Put/Remove hot paths
- Reduces allocations by reusing predefined state slices
- Lock-free reads allow better CPU cache utilization

* perf(pool): eliminate mutex overhead in state machine hot path

The state machine was calling notifyWaiters() on EVERY Get/Put operation,
which acquired a mutex even when no waiters were present (the common case).

Fix: Use atomic waiterCount to check for waiters BEFORE acquiring mutex.
This eliminates mutex contention in the hot path (Get/Put operations).

Implementation:
- Added atomic.Int32 waiterCount field to ConnStateMachine
- Increment when adding waiter, decrement when removing
- Check waiterCount atomically before acquiring mutex in notifyWaiters()

Performance impact:
- Before: mutex lock/unlock on every Get/Put (even with no waiters)
- After: lock-free atomic check, only acquire mutex if waiters exist
- Expected improvement: ~30-50% for Get/Put operations

* perf(pool): use predefined state slices to eliminate allocations in hot path

The pool was creating new slice literals on EVERY Get/Put operation:
- popIdle(): []ConnState{StateCreated, StateIdle}
- putConn(): []ConnState{StateInUse}
- CompareAndSwapUsed(): []ConnState{StateIdle} and []ConnState{StateInUse}
- MarkUnusableForHandoff(): []ConnState{StateInUse, StateIdle, StateCreated}

These allocations were happening millions of times per second in the hot path.

Fix: Use predefined global slices defined in conn_state.go:
- validFromInUse
- validFromCreatedOrIdle
- validFromCreatedInUseOrIdle

Performance impact:
- Before: 4 slice allocations per Get/Put cycle
- After: 0 allocations (use predefined slices)
- Expected improvement: ~30-40% reduction in allocations and GC pressure

* perf(pool): optimize TryTransition to reduce atomic operations

Further optimize the hot path by:
1. Remove redundant GetState() call in the loop
2. Only check waiterCount after successful CAS (not before loop)
3. Inline the waiterCount check to avoid notifyWaiters() call overhead

This reduces atomic operations from 4-5 per Get/Put to 2-3:
- Before: GetState() + CAS + waiterCount.Load() + notifyWaiters mutex check
- After: CAS + waiterCount.Load() (only if CAS succeeds)

Performance impact:
- Eliminates 1-2 atomic operations per Get/Put
- Expected improvement: ~10-15% for Get/Put operations

* perf(pool): add fast path for Get/Put to match master performance

Introduced TryTransitionFast() for the hot path (Get/Put operations):
- Single CAS operation (same as master's atomic bool)
- No waiter notification overhead
- No loop through valid states
- No error allocation

Hot path flow:
1. popIdle(): Try IDLE → IN_USE (fast), fallback to CREATED → IN_USE
2. putConn(): Try IN_USE → IDLE (fast)

This matches master's performance while preserving state machine for:
- Background operations (handoff/reauth use UNUSABLE state)
- State validation (TryTransition still available)
- Waiter notification (AwaitAndTransition for blocking)

Performance comparison per Get/Put cycle:
- Master: 2 atomic CAS operations
- State machine (before): 5 atomic operations (2.5x slower)
- State machine (after): 2 atomic CAS operations (same as master!)

Expected improvement: Restore to baseline ~11,373 ops/sec

* combine cas

* fix linter

* try faster approach

* fast semaphore

* better inlining for hot path

* fix linter issues

* use new semaphore in auth as well

* linter should be happy now

* add comments

* Update internal/pool/conn_state.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address comment

* slight reordering

* try to cache time if for non-critical calculation

* fix wrong benchmark

* add concurrent test

* fix benchmark report

* add additional expect to check output

* comment and variable rename

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* initConn sets IDLE state

- Handle unexpected conn state changes

* fix precision of time cache and usedAt

* allow e2e tests to run longer

* Fix broken initialization of idle connections

* optimize push notif

* 100ms -> 50ms

* use correct timer for last health check

* verify pass auth on conn creation

* fix assertion

* fix unsafe test

* fix benchmark test

* improve remove conn

* re doesn't support requirepass

* wait more in e2e test

* flaky test

* add missed method in interface

* fix test assertions

* silence logs and faster hooks manager

* address linter comment

* fix flaky test

* use read instad of control

* use pool size for semsize

* CAS instead of reading the state

* preallocate errors and states

* preallocate state slices

* fix flaky test

* fix fast semaphore that could have been starved

* try to fix the semaphore

* should properly notify the waiters

- this way a waiter that timesout at the same time
a releaser is releasing, won't throw token. the releaser
will fail to notify and will pick another waiter.

this hybrid approach should be faster than channels and maintains FIFO

* waiter may double-release (if closed/times out)

* priority of operations

* use simple approach of fifo waiters

* use simple channel based semaphores

* address linter and tests

* remove unused benchs

* change log message

* address pr comments

* address pr comments

* fix data race

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-11 17:38:29 +02:00
Nedyalko Dyakov
a15e76394c fix(pool): Pool ReAuth should not interfere with handoff (#3547)
* fix(pool): wip, pool reauth should not interfere with handoff

* fix credListeners map

* fix race in tests

* better conn usable timeout

* add design decision comment

* few small improvements

* update marked as queued

* add Used to clarify the state of the conn

* rename test

* fix(test): fix flaky test

* lock inside the listeners collection

* address pr comments

* Update internal/auth/cred_listeners.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/pool/buffer_size_test.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* wip refactor entraid

* fix maintnotif pool hook

* fix mocks

* fix nil listener

* sync and async reauth based on conn lifecycle

* be able to reject connection OnGet

* pass hooks so the tests can observe reauth

* give some time for the background to execute commands

* fix tests

* only async reauth

* Update internal/pool/pool.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/auth/streaming/pool_hook.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/pool/conn.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore(redisotel): use metric.WithAttributeSet to avoid copy (#3552)

In order to improve performance replace `WithAttributes` with `WithAttributeSet`.
This avoids the slice allocation and copy that is done in `WithAttributes`.

For more information see https://github.com/open-telemetry/opentelemetry-go/blob/v1.38.0/metric/instrument.go#L357-L376

* chore(docs): explain why MaxRetries is disabled for ClusterClient (#3551)

Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>

* exponential backoff

* address pr comments

* address pr comments

* remove rlock

* add some comments

* add comments

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Warnar Boekkooi <wboekkooi@impossiblecloud.com>
Co-authored-by: Justin <justindsouza80@gmail.com>
2025-10-22 12:45:30 +03:00
Nedyalko Dyakov
75ddeb3d5a feat(e2e-testing): maintnotifications e2e and refactor (#3526)
* e2e wip

* cleanup

* remove unused fault injector mock

* errChan in test

* remove log messages tests

* cleanup log messages

* s/hitless/maintnotifications/

* fix moving when none

* better logs

* test with second client after action has started

* Fixes

Signed-off-by: Elena Kolevska <elena@kolevska.com>

* Test fix

Signed-off-by: Elena Kolevska <elena@kolevska.com>

* feat(e2e-test): Extended e2e tests

* imroved e2e test resiliency

---------

Signed-off-by: Elena Kolevska <elena@kolevska.com>
Co-authored-by: Elena Kolevska <elena@kolevska.com>
Co-authored-by: Elena Kolevska <elena-kolevska@users.noreply.github.com>
Co-authored-by: Hristo Temelski <hristo.temelski@redis.com>
2025-09-26 19:17:09 +03:00
cxljs
113a18ae75 fix: pipeline repeatedly sets the error (#3525)
* fix: pipeline repeatedly sets the error

Signed-off-by: Xiaolong Chen <fukua95@gmail.com>

* add test

Signed-off-by: Xiaolong Chen <fukua95@gmail.com>

* CI

Signed-off-by: Xiaolong Chen <fukua95@gmail.com>

---------

Signed-off-by: Xiaolong Chen <fukua95@gmail.com>
Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
2025-09-17 17:32:24 +03:00
Nedyalko Dyakov
0ef6d0727d feat: RESP3 notifications support & Hitless notifications handling [CAE-1088] & [CAE-1072] (#3418)
- Adds support for handling push notifications with RESP3. 
- Using this support adds handlers for hitless upgrades.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Hristo Temelski <hristo.temelski@redis.com>
2025-09-10 22:18:01 +03:00
Nedyalko Dyakov
a264ffb8a4 fix: SetErr on Cmd if the command cannot be queued correctly in multi/exec (#3509)
* set error if queued fails

* try fix for cluster

* add errors to cmds in pipeline if about to be returned
2025-09-09 18:45:37 +03:00
cxljs
375fa5d083 chore(doc): improve code readability (#3446)
- replace two similar functions `appendUniqueNode` and `appendIfNotExists` with a generic function.

- simplify the implementation of the `get` method in `clusterNodes`

- keep the member name `_generation` of `clusterNodes` consistent with other types.

- rename a data member `_masterAddr` to `masterAddr`.

Signed-off-by: Xiaolong Chen <fukua95@gmail.com>
2025-08-04 17:22:16 +03:00
cxljs
82b00cc520 chore: remove a redundant method (#3401)
Signed-off-by: fukua95 <fukua95@gmail.com>
Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
2025-06-16 16:55:23 +03:00
Nedyalko Dyakov
86d418f940 feat: Introducing StreamingCredentialsProvider for token based authentication (#3320)
* wip

* update documentation

* add streamingcredentialsprovider in options

* fix: put back option in pool creation

* add package level comment

* Initial re authentication implementation

Introduces the StreamingCredentialsProvider as the CredentialsProvider
with the highest priority.

TODO: needs to be tested

* Change function type name

Change CancelProviderFunc to UnsubscribeFunc

* add tests

* fix race in tests

* fix example tests

* wip, hooks refactor

* fix build

* update README.md

* update wordlist

* update README.md

* refactor(auth): early returns in cred listener

* fix(doctest): simulate some delay

* feat(conn): add close hook on conn

* fix(tests): simulate start/stop in mock credentials provider

* fix(auth): don't double close the conn

* docs(README): mark streaming credentials provider as experimental

* fix(auth): streamline auth err proccess

* fix(auth): check err on close conn

* chore(entraid): use the repo under redis org
2025-05-27 16:25:20 +03:00
fukua95
28a3c97409 chore: set the default value for the options.protocol in the init() of options (#3387)
* chore: set the default value for the `options.protocol` in the `init()` of `options`

Signed-off-by: fukua95 <fukua95@gmail.com>

* add a test

Signed-off-by: fukua95 <fukua95@gmail.com>

---------

Signed-off-by: fukua95 <fukua95@gmail.com>
2025-05-27 14:53:41 +03:00
Nedyalko Dyakov
d54e848055 feat(options): panic when options are nil (#3363)
Client creation should panic when options are nil.
2025-04-30 09:33:40 +03:00
Nedyalko Dyakov
d236865b0c fix: handle network error on SETINFO (#3295) (CVE-2025-29923)
* fix: handle network error on SETINFO

This fix addresses potential out of order responses as described in `CVE-2025-29923`

* fix: deprecate DisableIndentity and introduce DisableIdentity

Both options will work before V10. In v10 DisableIndentity will be dropped. The preferred flag to use is `DisableIdentity`.
2025-03-19 19:02:36 +02:00
LINKIWI
1c9309fdc2 Set client name in HELLO RESP handshake (#3294) 2025-03-13 14:55:28 +02:00
LINKIWI
9db1286414 Reinstate read-only lock on hooks access in dialHook (#3225) 2025-02-11 17:50:31 +02:00
LINKIWI
080e051124 Eliminate redundant dial mutex causing unbounded connection queue contention (#3088)
* Eliminate redundant dial mutex causing unbounded connection queue contention

* Dialer connection timeouts unit test

---------

Co-authored-by: ofekshenawa <104765379+ofekshenawa@users.noreply.github.com>
2024-11-20 13:38:06 +02:00
ofekshenawa
04005cbdc7 Support Resp 3 Redis Search Unstable Mode (#3098)
* Updated module version that points to retracted package version (#3074)

* Updated module version that points to retracted package version

* Updated testing image to latest

* support raw parsing for problematic Redis Search types

* Add UnstableResp3SearchModule to client options

* Add tests for Resp3 Search unstable mode

* Add tests for Resp3 Search unstable mode

* Add readme note

* Add words to spellcheck

* Add UnstableResp3SearchModule check to assertStableCommand

* Fix assertStableCommand logic

* remove go.mod changes

* Check panic occur on tests

* rename method

* update errors

* Rename flag to UnstableResp3

---------

Co-authored-by: Vladyslav Vildanov <117659936+vladvildanov@users.noreply.github.com>
Co-authored-by: vladvildanov <divinez122@outlook.com>
2024-09-12 11:26:10 +03:00
Monkey
2d8fa02ac2 fix: fix #2681 (#2998)
Signed-off-by: monkey92t <golang@88.com>
2024-05-29 10:55:28 +08:00
ofekshenawa
5da49b1aba bug: Fix SETINFO ensuring it is set-and-forget (#2915)
* Exexcute set-info without validation

* Fix tests

* Remove spaces from runtime.Version

* fix typo

* Send setinfo after auth

* Add pipline

* fix golangci

* revert fixing typo

* support sentinel
2024-02-20 17:34:35 +02:00
ofekshenawa
35de49a8da Speed up connections by sending SetInfo via a pipeline (#2880)
* Send Client SetInfo via pipe

* Fix ACL test

* Add client set info to acl command rules
2024-02-15 12:48:56 +02:00
Tiago Peczenyj
21ed15bbed Add helpers to set libinfo without panic (#2724)
* add helpers to set library name and library info without risk of panic if we try to set both

* refactor code to use helpers

* add example

* refactor tests

* fix testable example

* simplify example

* rename exampl

* fix ring.go

* update example

---------

Co-authored-by: ofekshenawa <104765379+ofekshenawa@users.noreply.github.com>
2024-02-14 22:40:20 +02:00
ofekshenawa
a32be3d93d Add Suffix support to default client set info (#2852)
* Add Suffix support to defualt client set info

* Change ClientNameSuffix to IdentitySuffix

* add tests
2024-01-04 14:40:14 +02:00
RyoMiyashita
a109302230 fix: #2730 data race at hooksMixin (#2814) 2023-12-10 12:04:13 +02:00
ofekshenawa
0b6be62b71 Identify client on connect (#2708)
* Update CLIENT-SETINFO to support suffixes

* Update CLIENT-SETINFO

* fix acl log test

* add setinfo option to cluster

* change to DisableIndentity

* change to DisableIndentity
2023-09-20 10:33:09 +03:00
ljun20160606
391798880c feat: add protocol option (#2598) 2023-05-16 22:02:22 +08:00
Monkey
7b4f2179cb feat: no longer verify HELLO error messages (#2515)
Signed-off-by: monkey92t <golang@88.com>
2023-04-12 20:35:32 +08:00
Euclid
30a6f7107e fixed #2462 v9 continue support dragonfly, it's Hello command return "NOAUTH Authentication required" error (#2479)
* fixed #2462 error NOAUTH and support dragonfly

* check add comment

* alignment

---------

Co-authored-by: Monkey <golang@88.com>
2023-03-10 21:21:24 +08:00
monkey92t
21e1954745 fix(conn): releaseConn should be executed correctly
Signed-off-by: monkey92t <golang@88.com>
2023-01-28 15:44:06 +08:00
Vladimir Mihailenco
97b491aace chore: update import path 2023-01-23 08:48:54 +02:00
Vladimir Mihailenco
767109c632 chore: cleanup names 2023-01-21 10:30:02 +02:00
monkey
a5aeb1659b docs: update hook doc
Signed-off-by: monkey <golang@88.com>
2023-01-21 00:20:50 +08:00
monkey
0ed4a4420f fix: fix the withHook func
Signed-off-by: monkey <golang@88.com>
2023-01-21 00:02:44 +08:00
monkey
97697f488f feat: hook mode is changed to FIFO
Signed-off-by: monkey <golang@88.com>
2023-01-20 23:19:49 +08:00
monkey92t
d42dd1007c docs: add a description of the hook
Signed-off-by: monkey92t <golang@88.com>
2023-01-07 16:30:56 +08:00
Monkey
c7bc54b4d0 Merge pull request #2333 from monkey92t/fix_2312
feat: add ClientName option
2022-12-28 22:31:25 +08:00
monkey92t
a872c35b1a feat: add ClientName option
Signed-off-by: monkey92t <golang@88.com>
2022-12-28 22:14:52 +08:00
Monkey
a4336cbd43 feat(scan): add Scanner interface (#2317)
Signed-off-by: monkey92t <golang@88.com>
2022-12-24 22:29:45 +08:00
Vladimir Mihailenco
5053db2f9c fix: wrap cmds in Conn.TxPipeline 2022-11-22 14:30:27 +02:00
Dom Parfitt
d1bfaba549 fix: capture error correctly in withConn 2022-11-02 09:11:36 +00:00
Vladimir Mihailenco
2ec03d9b37 fix: late binding for dial hook 2022-10-12 15:00:06 +03:00
Vladimir Mihailenco
0dff3d1461 feat: add OpenTelemetry metrics instrumentation 2022-10-12 11:09:41 +03:00
Vladimir Mihailenco
58f7149e38 feat: add ContextTimeoutEnabled to respect context timeouts and deadlines 2022-10-11 10:22:42 +03:00
Vladimir Mihailenco
4bb485d044 fix: retry dial errors from pipelines 2022-10-11 09:38:10 +03:00
Vladimir Mihailenco
dd9a200427 fix: improve pipelines retry logic (#2232)
* fix: improve pipelines retry logic
2022-10-06 14:05:55 +03:00
Vladimir Mihailenco
b0231c659e chore: always retry write timeouts 2022-10-06 10:06:02 +03:00
Vladimir Mihailenco
6725851465 Merge pull request #2071 from ray2011/master
fix retryTimeout in baseClient _process
2022-10-06 10:05:25 +03:00
jianghang
baa48a4415 feat(pubsub): support sharded pub/sub 2022-08-04 00:05:22 +08:00
Knut Zuidema
e061db8c13 refactor: remove unused context attributes (#2154)
* refactor: remove unused context field
2022-07-14 13:43:42 +03:00
Knut Zuidema
092a692384 refactor: remove unused context attribute from conn (#2150) 2022-07-13 08:49:28 +03:00
Back Yu
b6d2a92529 fix: #2114 for redis-server not support Hello
Using `strings.HasPrefix` instead of `equal`
2022-06-10 21:19:02 +08:00
Vladimir Mihailenco
4ddd7d1803 chore: remove WithContext 2022-06-05 09:58:05 +03:00