1
0
mirror of https://github.com/redis/go-redis.git synced 2025-12-03 18:31:14 +03:00
Commit Graph

313 Commits

Author SHA1 Message Date
Nedyalko Dyakov
5fa97c826c add missed method in interface 2025-10-30 21:10:22 +02:00
Nedyalko Dyakov
ef3e06fd71 Merge remote-tracking branch 'origin/master' into ndyakov/state-machine-conn 2025-10-30 19:44:45 +02:00
cyningsun
ae5434ce66 feat(pool): Improve success rate of new connections (#3518)
* async create conn

* update default values and testcase

* fix comments

* fix data race

* remove context.WithoutCancel, which is a function introduced in Go 1.21

* fix TestDialerRetryConfiguration/DefaultDialerRetries, because tryDial are likely done in async flow

* change to share failed to delivery connection to other waiting

* remove chinese comment

* fix: optimize WantConnQueue benchmarks to prevent memory exhaustion

- Fix BenchmarkWantConnQueue_Dequeue timeout issue by limiting pre-population
- Use object pooling in BenchmarkWantConnQueue_Enqueue to reduce allocations
- Optimize BenchmarkWantConnQueue_EnqueueDequeue with reusable wantConn pool
- Prevent GitHub Actions benchmark failures due to excessive memory usage

Before: BenchmarkWantConnQueue_Dequeue ran for 11+ minutes and was killed
After: All benchmarks complete in ~8 seconds with consistent performance

* format

* fix turn leaks

---------

Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
Co-authored-by: Hristo Temelski <hristo.temelski@redis.com>
2025-10-30 19:21:12 +02:00
Nedyalko Dyakov
d207749af5 flaky test 2025-10-30 19:19:20 +02:00
Nedyalko Dyakov
59da35ba2d improve remove conn 2025-10-29 16:23:21 +02:00
Nedyalko Dyakov
2965e3d35c fix benchmark test 2025-10-29 16:21:21 +02:00
Nedyalko Dyakov
43eeae70ab fix unsafe test 2025-10-29 16:19:04 +02:00
Nedyalko Dyakov
62eecaa75e fix assertion 2025-10-29 16:11:27 +02:00
Nedyalko Dyakov
dccf01f396 use correct timer for last health check 2025-10-29 16:00:14 +02:00
Nedyalko Dyakov
600dfe2581 100ms -> 50ms 2025-10-29 15:31:31 +02:00
Nedyalko Dyakov
54281d687c optimize push notif 2025-10-28 23:32:27 +02:00
Nedyalko Dyakov
0752aecdfb Fix broken initialization of idle connections 2025-10-28 19:45:50 +02:00
Nedyalko Dyakov
d5db5340cb fix precision of time cache and usedAt 2025-10-28 12:34:09 +02:00
Nedyalko Dyakov
b862bf53de Merge remote-tracking branch 'origin/master' into ndyakov/state-machine-conn 2025-10-28 12:24:30 +02:00
iliya
9c77386b08 chore(tests): internal/proto/peek_push_notification_test : Refactor test helpers to… (#3563)
* internal/proto/peek_push_notification_test : Refactor test helpers to use fmt.Fprintf for buffers

Replaced buf.WriteString(fmt.Sprintf(...)) with fmt.Fprintf or fmt.Fprint in test helper functions for improved clarity and efficiency. This change affects push notification and RESP3 test utilities.

* peek_push_notification_test: revert prev formatting

* all: replace buf.WriteString with fmt.FprintF for consistency

---------

Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
2025-10-28 11:41:45 +02:00
Nedyalko Dyakov
9448059c01 initConn sets IDLE state
- Handle unexpected conn state changes
2025-10-28 00:49:39 +02:00
Nedyalko Dyakov
080a33c3a8 fix(pool): pool performance (#3565)
* perf(pool): replace hookManager RWMutex with atomic.Pointer and add predefined state slices

- Replace hookManager RWMutex with atomic.Pointer for lock-free reads in hot paths
- Add predefined state slices to avoid allocations (validFromInUse, validFromCreatedOrIdle, etc.)
- Add Clone() method to PoolHookManager for atomic updates
- Update AddPoolHook/RemovePoolHook to use copy-on-write pattern
- Update all hookManager access points to use atomic Load()

Performance improvements:
- Eliminates RWMutex contention in Get/Put/Remove hot paths
- Reduces allocations by reusing predefined state slices
- Lock-free reads allow better CPU cache utilization

* perf(pool): eliminate mutex overhead in state machine hot path

The state machine was calling notifyWaiters() on EVERY Get/Put operation,
which acquired a mutex even when no waiters were present (the common case).

Fix: Use atomic waiterCount to check for waiters BEFORE acquiring mutex.
This eliminates mutex contention in the hot path (Get/Put operations).

Implementation:
- Added atomic.Int32 waiterCount field to ConnStateMachine
- Increment when adding waiter, decrement when removing
- Check waiterCount atomically before acquiring mutex in notifyWaiters()

Performance impact:
- Before: mutex lock/unlock on every Get/Put (even with no waiters)
- After: lock-free atomic check, only acquire mutex if waiters exist
- Expected improvement: ~30-50% for Get/Put operations

* perf(pool): use predefined state slices to eliminate allocations in hot path

The pool was creating new slice literals on EVERY Get/Put operation:
- popIdle(): []ConnState{StateCreated, StateIdle}
- putConn(): []ConnState{StateInUse}
- CompareAndSwapUsed(): []ConnState{StateIdle} and []ConnState{StateInUse}
- MarkUnusableForHandoff(): []ConnState{StateInUse, StateIdle, StateCreated}

These allocations were happening millions of times per second in the hot path.

Fix: Use predefined global slices defined in conn_state.go:
- validFromInUse
- validFromCreatedOrIdle
- validFromCreatedInUseOrIdle

Performance impact:
- Before: 4 slice allocations per Get/Put cycle
- After: 0 allocations (use predefined slices)
- Expected improvement: ~30-40% reduction in allocations and GC pressure

* perf(pool): optimize TryTransition to reduce atomic operations

Further optimize the hot path by:
1. Remove redundant GetState() call in the loop
2. Only check waiterCount after successful CAS (not before loop)
3. Inline the waiterCount check to avoid notifyWaiters() call overhead

This reduces atomic operations from 4-5 per Get/Put to 2-3:
- Before: GetState() + CAS + waiterCount.Load() + notifyWaiters mutex check
- After: CAS + waiterCount.Load() (only if CAS succeeds)

Performance impact:
- Eliminates 1-2 atomic operations per Get/Put
- Expected improvement: ~10-15% for Get/Put operations

* perf(pool): add fast path for Get/Put to match master performance

Introduced TryTransitionFast() for the hot path (Get/Put operations):
- Single CAS operation (same as master's atomic bool)
- No waiter notification overhead
- No loop through valid states
- No error allocation

Hot path flow:
1. popIdle(): Try IDLE → IN_USE (fast), fallback to CREATED → IN_USE
2. putConn(): Try IN_USE → IDLE (fast)

This matches master's performance while preserving state machine for:
- Background operations (handoff/reauth use UNUSABLE state)
- State validation (TryTransition still available)
- Waiter notification (AwaitAndTransition for blocking)

Performance comparison per Get/Put cycle:
- Master: 2 atomic CAS operations
- State machine (before): 5 atomic operations (2.5x slower)
- State machine (after): 2 atomic CAS operations (same as master!)

Expected improvement: Restore to baseline ~11,373 ops/sec

* combine cas

* fix linter

* try faster approach

* fast semaphore

* better inlining for hot path

* fix linter issues

* use new semaphore in auth as well

* linter should be happy now

* add comments

* Update internal/pool/conn_state.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address comment

* slight reordering

* try to cache time if for non-critical calculation

* fix wrong benchmark

* add concurrent test

* fix benchmark report

* add additional expect to check output

* comment and variable rename

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-27 15:06:30 +02:00
Nedyalko Dyakov
13a4b3f366 fix error from copilot 2025-10-25 21:53:13 +03:00
Nedyalko Dyakov
33696fb002 Update internal/pool/conn.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-25 21:49:41 +03:00
Nedyalko Dyakov
bc4230766a Update internal/pool/conn_state.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-25 21:49:12 +03:00
Nedyalko Dyakov
0964dccbf1 Update internal/pool/pool.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-25 21:49:01 +03:00
Nedyalko Dyakov
65a6ece947 Update internal/pool/conn_state_test.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-25 21:48:55 +03:00
Nedyalko Dyakov
ffbe1e59f7 Update internal/pool/conn_state_test.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-25 21:44:27 +03:00
Nedyalko Dyakov
a2c7a25196 fix mark on uninitialized connection 2025-10-25 00:48:35 +03:00
Nedyalko Dyakov
84e856e382 improve tests 2025-10-25 00:40:07 +03:00
Nedyalko Dyakov
9ad62883ff try to detect the deadlock x2 2025-10-24 23:45:02 +03:00
Nedyalko Dyakov
c4ed467a59 try to detect the deadlock 2025-10-24 18:00:47 +03:00
Nedyalko Dyakov
3a53e1bdcc fix handoff state when queued for handoff 2025-10-24 17:30:36 +03:00
Nedyalko Dyakov
94fa9204ce better timeouts 2025-10-24 15:28:28 +03:00
Nedyalko Dyakov
de2f8ba0a1 Update internal/pool/conn.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-24 15:20:25 +03:00
Nedyalko Dyakov
3f29463299 Update internal/pool/conn.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-24 15:20:16 +03:00
Nedyalko Dyakov
21bd243bf5 improve reauth state management. fix tests 2025-10-24 15:06:55 +03:00
Nedyalko Dyakov
7526e67f17 better errors for tests, hook should work now 2025-10-24 14:52:12 +03:00
Nedyalko Dyakov
663a60e47f correct handling OnPut 2025-10-24 14:13:13 +03:00
Nedyalko Dyakov
5721512a79 polish state machine 2025-10-24 13:09:30 +03:00
Nedyalko Dyakov
606264ef7f wip, used and unusable states 2025-10-23 17:09:22 +03:00
Nedyalko Dyakov
27591cd045 wip 2025-10-23 14:05:31 +03:00
Nedyalko Dyakov
a15e76394c fix(pool): Pool ReAuth should not interfere with handoff (#3547)
* fix(pool): wip, pool reauth should not interfere with handoff

* fix credListeners map

* fix race in tests

* better conn usable timeout

* add design decision comment

* few small improvements

* update marked as queued

* add Used to clarify the state of the conn

* rename test

* fix(test): fix flaky test

* lock inside the listeners collection

* address pr comments

* Update internal/auth/cred_listeners.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/pool/buffer_size_test.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* wip refactor entraid

* fix maintnotif pool hook

* fix mocks

* fix nil listener

* sync and async reauth based on conn lifecycle

* be able to reject connection OnGet

* pass hooks so the tests can observe reauth

* give some time for the background to execute commands

* fix tests

* only async reauth

* Update internal/pool/pool.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/auth/streaming/pool_hook.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update internal/pool/conn.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore(redisotel): use metric.WithAttributeSet to avoid copy (#3552)

In order to improve performance replace `WithAttributes` with `WithAttributeSet`.
This avoids the slice allocation and copy that is done in `WithAttributes`.

For more information see https://github.com/open-telemetry/opentelemetry-go/blob/v1.38.0/metric/instrument.go#L357-L376

* chore(docs): explain why MaxRetries is disabled for ClusterClient (#3551)

Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>

* exponential backoff

* address pr comments

* address pr comments

* remove rlock

* add some comments

* add comments

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Warnar Boekkooi <wboekkooi@impossiblecloud.com>
Co-authored-by: Justin <justindsouza80@gmail.com>
2025-10-22 12:45:30 +03:00
Feng.YJ
3ad9f9cb23 fix: add missing error variable for non-unix build constraints (#3538)
* fix: add missing error variable for non-unix build constraints

* chore: name "_" for unused parameters

---------

Co-authored-by: Elena Kolevska <elena-kolevska@users.noreply.github.com>
2025-09-29 09:35:04 +03:00
Nedyalko Dyakov
75ddeb3d5a feat(e2e-testing): maintnotifications e2e and refactor (#3526)
* e2e wip

* cleanup

* remove unused fault injector mock

* errChan in test

* remove log messages tests

* cleanup log messages

* s/hitless/maintnotifications/

* fix moving when none

* better logs

* test with second client after action has started

* Fixes

Signed-off-by: Elena Kolevska <elena@kolevska.com>

* Test fix

Signed-off-by: Elena Kolevska <elena@kolevska.com>

* feat(e2e-test): Extended e2e tests

* imroved e2e test resiliency

---------

Signed-off-by: Elena Kolevska <elena@kolevska.com>
Co-authored-by: Elena Kolevska <elena@kolevska.com>
Co-authored-by: Elena Kolevska <elena-kolevska@users.noreply.github.com>
Co-authored-by: Hristo Temelski <hristo.temelski@redis.com>
2025-09-26 19:17:09 +03:00
Nedyalko Dyakov
0ef6d0727d feat: RESP3 notifications support & Hitless notifications handling [CAE-1088] & [CAE-1072] (#3418)
- Adds support for handling push notifications with RESP3. 
- Using this support adds handlers for hitless upgrades.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Hristo Temelski <hristo.temelski@redis.com>
2025-09-10 22:18:01 +03:00
cxljs
e07f55bed1 chore(buffers): Set the default read/write buffer size of Redis connection to 32KiB (#3483)
* update README.md

Signed-off-by: Xiaolong Chen <fukua95@gmail.com>

* typo: 0.5MiB -> 256KiB

Signed-off-by: Xiaolong Chen <fukua95@gmail.com>

* Set the default read/write buffer size of Redis connection to 32KiB

Signed-off-by: Xiaolong Chen <fukua95@gmail.com>

---------

Signed-off-by: Xiaolong Chen <fukua95@gmail.com>
2025-08-18 20:04:55 +03:00
Nedyalko Dyakov
94cfffa417 fix(options): Add buffer sizes to failover. Update README (#3468)
* fix(options): Add buffer sizes to failover. Update README

* fix(spellcheck): add KiB in wordlist

* fix(comment): fix defaul value in comment

* fixes #3465
2025-08-11 16:01:24 +03:00
Monkey
6a48d3fec1 feat: recover addIdleConn may occur panic (#2445)
* feat: recover addIdleConn may occur panic

Signed-off-by: monkey92t <golang@88.com>

* fix test race

Signed-off-by: monkey92t <golang@88.com>

---------

Signed-off-by: monkey92t <golang@88.com>
Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
2025-08-05 15:31:58 +03:00
ofekshenawa
1eed165f9d feat(proto): add configurable buffer sizes for Redis connections (#3453)
* add configurable buffer sizes for Redis connections

* add MiB to wordlist

* Add description for buffer size parameter
2025-08-04 09:16:54 +03:00
Pete Woods
4ac591c7c4 Set correct cluster slot for scan commands, similarly to Java's Jedis client (#2623)
- At present, the `scan` command is dispatched to a random slot.
- As far as I can tell, the scanX family of commands are not cluster aware (e.g. don't redirect the client to the correct slot).
- You can see [here](869dc0bb66/src/main/java/redis/clients/jedis/ShardedCommandObjects.java (L101)), the Jedis client calling `processKey` on the match argument, and this is what this PR also does.

We've had this patch running in production, and it seems to work well for us.

For further thought:
- Continuing looking at other Redis clients (e.g. Jedis), they outright [reject as invalid](869dc0bb66/src/main/java/redis/clients/jedis/ShardedCommandObjects.java (L98)) any scan command that does not include a hash-tag. Presumably this has the advantage of users not being surprised when their scan produces no results when a random server is picked.
- Perhaps it would be sensible for go-redis to do the same also?

Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
2025-06-24 13:43:03 +03:00
WeizhongTu
7d97cc1c59 feat: optimize connection pool waitTurn (#3412) 2025-06-20 12:07:14 +03:00
fukua95
eb40ac8328 perf: reduce unnecessary memory allocation (#3399)
Signed-off-by: fukua95 <fukua95@gmail.com>
Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
2025-06-09 11:59:58 +03:00
Nedyalko Dyakov
86d418f940 feat: Introducing StreamingCredentialsProvider for token based authentication (#3320)
* wip

* update documentation

* add streamingcredentialsprovider in options

* fix: put back option in pool creation

* add package level comment

* Initial re authentication implementation

Introduces the StreamingCredentialsProvider as the CredentialsProvider
with the highest priority.

TODO: needs to be tested

* Change function type name

Change CancelProviderFunc to UnsubscribeFunc

* add tests

* fix race in tests

* fix example tests

* wip, hooks refactor

* fix build

* update README.md

* update wordlist

* update README.md

* refactor(auth): early returns in cred listener

* fix(doctest): simulate some delay

* feat(conn): add close hook on conn

* fix(tests): simulate start/stop in mock credentials provider

* fix(auth): don't double close the conn

* docs(README): mark streaming credentials provider as experimental

* fix(auth): streamline auth err proccess

* fix(auth): check err on close conn

* chore(entraid): use the repo under redis org
2025-05-27 16:25:20 +03:00
LINKIWI
c149644da7 Unit test for pool acquisition timeout (#3381)
Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
2025-05-19 19:46:19 +03:00