mirror of https://github.com/redis/go-redis.git synced 2025-12-02 06:22:31 +03:00
Commit Graph

2850 Commits

Author SHA1 Message Date
Nedyalko Dyakov
ed43bd6dbd wip, remove ring 2025-11-03 00:40:47 +02:00
Nedyalko Dyakov
c637c0824e wip 2025-11-03 00:14:24 +02:00
Nedyalko Dyakov
ffa32a5370 wip 2025-10-31 17:13:11 +02:00
Nedyalko Dyakov
f672885808 wip 2025-10-31 13:04:46 +02:00
Nedyalko Dyakov
9eaeea1347 Merge branch 'ndyakov/state-machine-conn' into playground/autopipeline 2025-10-30 23:53:53 +02:00
Nedyalko Dyakov
d91800d640 fix test assertions 2025-10-30 22:20:14 +02:00
Nedyalko Dyakov
5fa97c826c add missed method in interface 2025-10-30 21:10:22 +02:00
Nedyalko Dyakov
ef3e06fd71 Merge remote-tracking branch 'origin/master' into ndyakov/state-machine-conn 2025-10-30 19:44:45 +02:00
cyningsun
ae5434ce66 feat(pool): Improve success rate of new connections (#3518)
* async create conn

* update default values and testcase

* fix comments

* fix data race

* remove context.WithoutCancel, which is a function introduced in Go 1.21

* fix TestDialerRetryConfiguration/DefaultDialerRetries, because tryDial is likely done in the async flow

* change to share a connection that failed to be delivered with other waiters

* remove chinese comment

* fix: optimize WantConnQueue benchmarks to prevent memory exhaustion

- Fix BenchmarkWantConnQueue_Dequeue timeout issue by limiting pre-population
- Use object pooling in BenchmarkWantConnQueue_Enqueue to reduce allocations
- Optimize BenchmarkWantConnQueue_EnqueueDequeue with reusable wantConn pool
- Prevent GitHub Actions benchmark failures due to excessive memory usage

Before: BenchmarkWantConnQueue_Dequeue ran for 11+ minutes and was killed
After: All benchmarks complete in ~8 seconds with consistent performance

* format

* fix turn leaks

---------

Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
Co-authored-by: Hristo Temelski <hristo.temelski@redis.com>
2025-10-30 19:21:12 +02:00
Nedyalko Dyakov
d207749af5 flaky test 2025-10-30 19:19:20 +02:00
Nedyalko Dyakov
5f0b58ba14 Merge branch 'master' into ndyakov/state-machine-conn 2025-10-30 18:34:45 +02:00
Nedyalko Dyakov
fc2da240f8 wait more in e2e test 2025-10-30 18:27:15 +02:00
Nedyalko Dyakov
09a2f07ac3 re doesn't support requirepass 2025-10-29 16:34:01 +02:00
Nedyalko Dyakov
59da35ba2d improve remove conn 2025-10-29 16:23:21 +02:00
Nedyalko Dyakov
2965e3d35c fix benchmark test 2025-10-29 16:21:21 +02:00
Nedyalko Dyakov
43eeae70ab fix unsafe test 2025-10-29 16:19:04 +02:00
Nedyalko Dyakov
62eecaa75e fix assertion 2025-10-29 16:11:27 +02:00
pvragov
7f48276660 feat(otel): Add a 'error_type' metrics attribute to separate context errors (#3566)
Co-authored-by: vragov_pf <vragov_pf@magnit.ru>
2025-10-29 16:09:12 +02:00
Nedyalko Dyakov
7201275eb5 verify pass auth on conn creation 2025-10-29 16:06:35 +02:00
Nedyalko Dyakov
dccf01f396 use correct timer for last health check 2025-10-29 16:00:14 +02:00
Nedyalko Dyakov
600dfe2581 100ms -> 50ms 2025-10-29 15:31:31 +02:00
Nedyalko Dyakov
93eade2695 Merge remote-tracking branch 'origin/master' into playground/autopipeline 2025-10-29 13:49:55 +02:00
Nedyalko Dyakov
b6d7cdbd84 chore(ci): Add redis 8.4-RC1-pre & examples (#3572)
* add disable maintnotifications example

* add 8.4-RC1-pre

* println -> printf for linter

* address jit comment

Fix broken initialization of idle connections

optimize push notif

wip

wip

wip

wip
2025-10-29 13:49:32 +02:00
Nedyalko Dyakov
1510181ece allow e2e tests to run longer 2025-10-29 13:49:31 +02:00
Nedyalko Dyakov
c2d525f688 fix precision of time cache and usedAt 2025-10-29 13:49:31 +02:00
dependabot[bot]
7fd4e70bf2 chore(deps): bump rojopolis/spellcheck-github-actions (#3569)
Bumps [rojopolis/spellcheck-github-actions](https://github.com/rojopolis/spellcheck-github-actions) from 0.52.0 to 0.53.0.
- [Release notes](https://github.com/rojopolis/spellcheck-github-actions/releases)
- [Changelog](https://github.com/rojopolis/spellcheck-github-actions/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rojopolis/spellcheck-github-actions/compare/0.52.0...0.53.0)

---
updated-dependencies:
- dependency-name: rojopolis/spellcheck-github-actions
  dependency-version: 0.53.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-29 13:49:31 +02:00
iliya
05295860ac chore(tests): internal/proto/peek_push_notification_test : Refactor test helpers to… (#3563)
* internal/proto/peek_push_notification_test : Refactor test helpers to use fmt.Fprintf for buffers

Replaced buf.WriteString(fmt.Sprintf(...)) with fmt.Fprintf or fmt.Fprint in test helper functions for improved clarity and efficiency. This change affects push notification and RESP3 test utilities.

* peek_push_notification_test: revert prev formatting

* all: replace buf.WriteString with fmt.Fprintf for consistency

---------

Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
2025-10-29 13:49:31 +02:00
Sourabh
fcc9443896 fix(panic): Return error instead of panic from commands (#3568)
Instead of panicking in a few commands, return an error to avoid unexpected panics in application code.
2025-10-29 13:49:31 +02:00
Nedyalko Dyakov
4950a2ff45 initConn sets IDLE state
- Handle unexpected conn state changes
2025-10-29 13:49:30 +02:00
Nedyalko Dyakov
d476a3813e fix(pool): pool performance (#3565)
* perf(pool): replace hookManager RWMutex with atomic.Pointer and add predefined state slices

- Replace hookManager RWMutex with atomic.Pointer for lock-free reads in hot paths
- Add predefined state slices to avoid allocations (validFromInUse, validFromCreatedOrIdle, etc.)
- Add Clone() method to PoolHookManager for atomic updates
- Update AddPoolHook/RemovePoolHook to use copy-on-write pattern
- Update all hookManager access points to use atomic Load()

Performance improvements:
- Eliminates RWMutex contention in Get/Put/Remove hot paths
- Reduces allocations by reusing predefined state slices
- Lock-free reads allow better CPU cache utilization
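
The copy-on-write scheme described above can be sketched as follows, assuming a simplified hook interface (`Hook`, `hookManager`, `addHook`, `removeHook` are hypothetical names; the real `PoolHookManager` API differs). Readers do a single `atomic.Pointer.Load`. Writers clone the current manager, modify the copy and `Store` it, so a published manager is never mutated in place.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hook is a hypothetical pool hook.
type Hook interface{ OnGet(connID int) }

// hookManager is treated as immutable once published.
type hookManager struct{ hooks []Hook }

func (m *hookManager) clone() *hookManager {
	cp := make([]Hook, len(m.hooks))
	copy(cp, m.hooks)
	return &hookManager{hooks: cp}
}

type pool struct {
	writeMu sync.Mutex                  // serializes writers only
	hooks   atomic.Pointer[hookManager] // readers Load() without locking
}

func newPool() *pool {
	p := &pool{}
	p.hooks.Store(&hookManager{})
	return p
}

// addHook: clone, modify the copy, publish atomically.
func (p *pool) addHook(h Hook) {
	p.writeMu.Lock()
	defer p.writeMu.Unlock()
	next := p.hooks.Load().clone()
	next.hooks = append(next.hooks, h)
	p.hooks.Store(next)
}

// removeHook builds a filtered copy and publishes it.
func (p *pool) removeHook(h Hook) {
	p.writeMu.Lock()
	defer p.writeMu.Unlock()
	cur := p.hooks.Load()
	next := &hookManager{hooks: make([]Hook, 0, len(cur.hooks))}
	for _, x := range cur.hooks {
		if x != h {
			next.hooks = append(next.hooks, x)
		}
	}
	p.hooks.Store(next)
}

// get is the hot path: one atomic load, no mutex.
func (p *pool) get(connID int) {
	for _, h := range p.hooks.Load().hooks {
		h.OnGet(connID)
	}
}

type countingHook struct{ n atomic.Int64 }

func (c *countingHook) OnGet(int) { c.n.Add(1) }

func main() {
	p := newPool()
	h := &countingHook{}
	p.addHook(h)
	p.get(1)
	p.get(2)
	p.removeHook(h)
	p.get(3)
	fmt.Println(h.n.Load()) // 2
}
```

The trade-off is an allocation per hook registration, which is acceptable because hooks change rarely while Get/Put run constantly.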

* perf(pool): eliminate mutex overhead in state machine hot path

The state machine was calling notifyWaiters() on EVERY Get/Put operation,
which acquired a mutex even when no waiters were present (the common case).

Fix: Use atomic waiterCount to check for waiters BEFORE acquiring mutex.
This eliminates mutex contention in the hot path (Get/Put operations).

Implementation:
- Added atomic.Int32 waiterCount field to ConnStateMachine
- Increment when adding waiter, decrement when removing
- Check waiterCount atomically before acquiring mutex in notifyWaiters()

Performance impact:
- Before: mutex lock/unlock on every Get/Put (even with no waiters)
- After: lock-free atomic check, only acquire mutex if waiters exist
- Expected improvement: ~30-50% for Get/Put operations

* perf(pool): use predefined state slices to eliminate allocations in hot path

The pool was creating new slice literals on EVERY Get/Put operation:
- popIdle(): []ConnState{StateCreated, StateIdle}
- putConn(): []ConnState{StateInUse}
- CompareAndSwapUsed(): []ConnState{StateIdle} and []ConnState{StateInUse}
- MarkUnusableForHandoff(): []ConnState{StateInUse, StateIdle, StateCreated}

These allocations were happening millions of times per second in the hot path.

Fix: Use predefined global slices defined in conn_state.go:
- validFromInUse
- validFromCreatedOrIdle
- validFromCreatedInUseOrIdle

Performance impact:
- Before: 4 slice allocations per Get/Put cycle
- After: 0 allocations (use predefined slices)
- Expected improvement: ~30-40% reduction in allocations and GC pressure

* perf(pool): optimize TryTransition to reduce atomic operations

Further optimize the hot path by:
1. Remove redundant GetState() call in the loop
2. Only check waiterCount after successful CAS (not before loop)
3. Inline the waiterCount check to avoid notifyWaiters() call overhead

This reduces atomic operations from 4-5 per Get/Put to 2-3:
- Before: GetState() + CAS + waiterCount.Load() + notifyWaiters mutex check
- After: CAS + waiterCount.Load() (only if CAS succeeds)

Performance impact:
- Eliminates 1-2 atomic operations per Get/Put
- Expected improvement: ~10-15% for Get/Put operations

* perf(pool): add fast path for Get/Put to match master performance

Introduced TryTransitionFast() for the hot path (Get/Put operations):
- Single CAS operation (same as master's atomic bool)
- No waiter notification overhead
- No loop through valid states
- No error allocation

Hot path flow:
1. popIdle(): Try IDLE → IN_USE (fast), fallback to CREATED → IN_USE
2. putConn(): Try IN_USE → IDLE (fast)

This matches master's performance while preserving state machine for:
- Background operations (handoff/reauth use UNUSABLE state)
- State validation (TryTransition still available)
- Waiter notification (AwaitAndTransition for blocking)

Performance comparison per Get/Put cycle:
- Master: 2 atomic CAS operations
- State machine (before): 5 atomic operations (2.5x slower)
- State machine (after): 2 atomic CAS operations (same as master!)

Expected improvement: Restore to baseline ~11,373 ops/sec
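
The hot-path flow above can be sketched as one CAS per attempt. `TryTransitionFast` is named in the commit; `acquire` and `release` are hypothetical stand-ins for the relevant parts of `popIdle()` and `putConn()`, and the state values are assumed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type ConnState uint32

const (
	StateCreated ConnState = iota
	StateIdle
	StateInUse
	StateUnusable
)

type connStateMachine struct{ state atomic.Uint32 }

// TryTransitionFast is a single CAS: no loop over allowed states, no waiter
// check and no error value.
func (sm *connStateMachine) TryTransitionFast(from, to ConnState) bool {
	return sm.state.CompareAndSwap(uint32(from), uint32(to))
}

// acquire follows the popIdle order: IDLE→IN_USE, then CREATED→IN_USE.
func acquire(sm *connStateMachine) bool {
	return sm.TryTransitionFast(StateIdle, StateInUse) ||
		sm.TryTransitionFast(StateCreated, StateInUse)
}

// release follows putConn: IN_USE→IDLE.
func release(sm *connStateMachine) bool {
	return sm.TryTransitionFast(StateInUse, StateIdle)
}

func main() {
	var sm connStateMachine                 // zero value is StateCreated
	fmt.Println(acquire(&sm), acquire(&sm)) // true false
	fmt.Println(release(&sm), acquire(&sm)) // true true

	// A background operation (e.g. handoff) parks the conn in UNUSABLE;
	// the fast path then refuses it with no extra bookkeeping.
	sm.state.Store(uint32(StateUnusable))
	fmt.Println(acquire(&sm)) // false
}
```

Because the fast path never matches UNUSABLE, background operations keep full state-machine semantics while Get/Put pay only for a CAS.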

* combine cas

* fix linter

* try faster approach

* fast semaphore

* better inlining for hot path

* fix linter issues

* use new semaphore in auth as well

* linter should be happy now

* add comments

* Update internal/pool/conn_state.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address comment

* slight reordering

* try to cache time for non-critical calculations

* fix wrong benchmark

* add concurrent test

* fix benchmark report

* add additional expect to check output

* comment and variable rename

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-29 13:49:26 +02:00
Nedyalko Dyakov
54281d687c optimize push notif 2025-10-28 23:32:27 +02:00
Nedyalko Dyakov
0752aecdfb Fix broken initialization of idle connections 2025-10-28 19:45:50 +02:00
Nedyalko Dyakov
f1c8884250 Merge branch 'master' into ndyakov/state-machine-conn 2025-10-28 15:47:54 +02:00
Nedyalko Dyakov
5771fa474a chore(ci): Add redis 8.4-RC1-pre & examples (#3572)
* add disable maintnotifications example

* add 8.4-RC1-pre

* println -> printf for linter

* address jit comment
2025-10-28 15:47:39 +02:00
Nedyalko Dyakov
dcd8f9cf7f allow e2e tests to run longer 2025-10-28 15:43:58 +02:00
Nedyalko Dyakov
d5db5340cb fix precision of time cache and usedAt 2025-10-28 12:34:09 +02:00
Nedyalko Dyakov
b862bf53de Merge remote-tracking branch 'origin/master' into ndyakov/state-machine-conn 2025-10-28 12:24:30 +02:00
dependabot[bot]
f7a8a1c1d7 chore(deps): bump rojopolis/spellcheck-github-actions (#3569)
Bumps [rojopolis/spellcheck-github-actions](https://github.com/rojopolis/spellcheck-github-actions) from 0.52.0 to 0.53.0.
- [Release notes](https://github.com/rojopolis/spellcheck-github-actions/releases)
- [Changelog](https://github.com/rojopolis/spellcheck-github-actions/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rojopolis/spellcheck-github-actions/compare/0.52.0...0.53.0)

---
updated-dependencies:
- dependency-name: rojopolis/spellcheck-github-actions
  dependency-version: 0.53.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-28 11:44:06 +02:00
iliya
9c77386b08 chore(tests): internal/proto/peek_push_notification_test : Refactor test helpers to… (#3563)
* internal/proto/peek_push_notification_test : Refactor test helpers to use fmt.Fprintf for buffers

Replaced buf.WriteString(fmt.Sprintf(...)) with fmt.Fprintf or fmt.Fprint in test helper functions for improved clarity and efficiency. This change affects push notification and RESP3 test utilities.

* peek_push_notification_test: revert prev formatting

* all: replace buf.WriteString with fmt.Fprintf for consistency

---------

Co-authored-by: Nedyalko Dyakov <1547186+ndyakov@users.noreply.github.com>
2025-10-28 11:41:45 +02:00
Sourabh
7be00c8725 fix(panic): Return error instead of panic from commands (#3568)
Instead of panicking in a few commands, return an error to avoid unexpected panics in application code.
2025-10-28 11:32:34 +02:00
Nedyalko Dyakov
a9640cd811 hybrid approach, test against previous commit 2025-10-28 10:35:32 +02:00
Nedyalko Dyakov
9448059c01 initConn sets IDLE state
- Handle unexpected conn state changes
2025-10-28 00:49:39 +02:00
Nedyalko Dyakov
7198f47baa autopipeline playground 2025-10-27 16:08:40 +02:00
Nedyalko Dyakov
080a33c3a8 fix(pool): pool performance (#3565)
* perf(pool): replace hookManager RWMutex with atomic.Pointer and add predefined state slices

- Replace hookManager RWMutex with atomic.Pointer for lock-free reads in hot paths
- Add predefined state slices to avoid allocations (validFromInUse, validFromCreatedOrIdle, etc.)
- Add Clone() method to PoolHookManager for atomic updates
- Update AddPoolHook/RemovePoolHook to use copy-on-write pattern
- Update all hookManager access points to use atomic Load()

Performance improvements:
- Eliminates RWMutex contention in Get/Put/Remove hot paths
- Reduces allocations by reusing predefined state slices
- Lock-free reads allow better CPU cache utilization

* perf(pool): eliminate mutex overhead in state machine hot path

The state machine was calling notifyWaiters() on EVERY Get/Put operation,
which acquired a mutex even when no waiters were present (the common case).

Fix: Use atomic waiterCount to check for waiters BEFORE acquiring mutex.
This eliminates mutex contention in the hot path (Get/Put operations).

Implementation:
- Added atomic.Int32 waiterCount field to ConnStateMachine
- Increment when adding waiter, decrement when removing
- Check waiterCount atomically before acquiring mutex in notifyWaiters()

Performance impact:
- Before: mutex lock/unlock on every Get/Put (even with no waiters)
- After: lock-free atomic check, only acquire mutex if waiters exist
- Expected improvement: ~30-50% for Get/Put operations

* perf(pool): use predefined state slices to eliminate allocations in hot path

The pool was creating new slice literals on EVERY Get/Put operation:
- popIdle(): []ConnState{StateCreated, StateIdle}
- putConn(): []ConnState{StateInUse}
- CompareAndSwapUsed(): []ConnState{StateIdle} and []ConnState{StateInUse}
- MarkUnusableForHandoff(): []ConnState{StateInUse, StateIdle, StateCreated}

These allocations were happening millions of times per second in the hot path.

Fix: Use predefined global slices defined in conn_state.go:
- validFromInUse
- validFromCreatedOrIdle
- validFromCreatedInUseOrIdle

Performance impact:
- Before: 4 slice allocations per Get/Put cycle
- After: 0 allocations (use predefined slices)
- Expected improvement: ~30-40% reduction in allocations and GC pressure

* perf(pool): optimize TryTransition to reduce atomic operations

Further optimize the hot path by:
1. Remove redundant GetState() call in the loop
2. Only check waiterCount after successful CAS (not before loop)
3. Inline the waiterCount check to avoid notifyWaiters() call overhead

This reduces atomic operations from 4-5 per Get/Put to 2-3:
- Before: GetState() + CAS + waiterCount.Load() + notifyWaiters mutex check
- After: CAS + waiterCount.Load() (only if CAS succeeds)

Performance impact:
- Eliminates 1-2 atomic operations per Get/Put
- Expected improvement: ~10-15% for Get/Put operations

* perf(pool): add fast path for Get/Put to match master performance

Introduced TryTransitionFast() for the hot path (Get/Put operations):
- Single CAS operation (same as master's atomic bool)
- No waiter notification overhead
- No loop through valid states
- No error allocation

Hot path flow:
1. popIdle(): Try IDLE → IN_USE (fast), fallback to CREATED → IN_USE
2. putConn(): Try IN_USE → IDLE (fast)

This matches master's performance while preserving state machine for:
- Background operations (handoff/reauth use UNUSABLE state)
- State validation (TryTransition still available)
- Waiter notification (AwaitAndTransition for blocking)

Performance comparison per Get/Put cycle:
- Master: 2 atomic CAS operations
- State machine (before): 5 atomic operations (2.5x slower)
- State machine (after): 2 atomic CAS operations (same as master!)

Expected improvement: Restore to baseline ~11,373 ops/sec

* combine cas

* fix linter

* try faster approach

* fast semaphore

* better inlining for hot path

* fix linter issues

* use new semaphore in auth as well

* linter should be happy now

* add comments

* Update internal/pool/conn_state.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address comment

* slight reordering

* try to cache time for non-critical calculations

* fix wrong benchmark

* add concurrent test

* fix benchmark report

* add additional expect to check output

* comment and variable rename

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-27 15:06:30 +02:00
Nedyalko Dyakov
9f3f8b7c7b comment and variable rename 2025-10-27 09:19:17 +02:00
Nedyalko Dyakov
471a828ab1 add additional expect to check output 2025-10-27 08:21:27 +02:00
Nedyalko Dyakov
da5fe33cdf fix benchmark report 2025-10-27 08:05:26 +02:00
Nedyalko Dyakov
316aeb7b3c add concurrent test 2025-10-27 07:57:04 +02:00
Nedyalko Dyakov
4a3066384b fix wrong benchmark 2025-10-27 07:46:00 +02:00
Nedyalko Dyakov
55c502dde4 try to cache time for non-critical calculations 2025-10-27 07:30:09 +02:00