-
Notifications
You must be signed in to change notification settings - Fork 1k
[Hitless Upgrades] React to maintenance events #3345 #3354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
tishun
wants to merge
5
commits into
main
Choose a base branch
from
feature/maintenance-events
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* v0.1 * Simple reconnect now working * Bind address from message is now considered * Self-register the handler * Format code * Filter push messages in a more stable way * (very hacky) Relax comand expire timers globbaly * Configure if timeout relaxing should be applied * Proper way to close channel * Configure the timneout relaxing * Sequential handover implemented * Did not address formatting * Prolong the rebind windwow for relaxed tiemouts * PubSub no longer required; CommandExpiryWriter is now channel aware; Polishing * Use the new MOVING push message from the RE server * Unit test was not chaining delgates in the same way that the RedisClient/RedisClusterClient was * Fix REBIND message validation * Fixed the expiry mechanism * Polishing * Fix NPE. Seems like AttributeMap.attr is not accurate and actually return's null causing some unit test failures. * Add support for MIGRATING/MIGRATED message handling in command expiry This commit adds the ability to listen for MIGRATING and MIGRATED messages and trigger extended command expiry timeouts during Redis shard migration. Key changes: - Enhanced RebindAwareConnectionWatchdog to detect MIGRATING/MIGRATED messages - RebindAwareExpiryWriter to trigger timeout relaxation whenever MIGRATING message is received This feature allows commands to have relaxed timeouts during shard migration operations, preventing unnecessary timeouts when Redis is temporarily busy with migration tasks. * formating * Fix Disabling relaxTimeouts after upgrade can interfere with an ongoing one from re-bind * Additional fix for timeout relaxing disabled * Fix push message listener registered multiple times after rebind. * Fix: Report correct command timeout when relaxTimeout is configured * Disable relaxedTimeout after configured grace period - Introduce a delay before disabling relaxedTimeout - Grace period duration is provided via push notification * Code clean up - Remove reading from pub/sub chanel and relay only on push notifications * Add FAILING_OVER/FAILED_OVER * Polishing : Rename components to use the word 'maintenace' --------- Co-authored-by: Igor Malinovskiy <[email protected]> Co-authored-by: ggivo <[email protected]>
(#3356) * Unit tests for the maintanence aware classes * Did not format properly * Proper license
ggivo
reviewed
Jul 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
7f3c59b
to
a133028
Compare
* initial WIP, with lots of debugging, and some non-working tests * debug * more attemtps at debugging * Refactor: Move cluster state management methods from MaintenanceNotificationTest to RedisEnterpriseConfig - Moved refreshClusterConfig, recordOriginalClusterState, and restoreOriginalClusterState methods - Updated call sites in MaintenanceNotificationTest to use RedisEnterpriseConfig methods - Added required imports and static variables to RedisEnterpriseConfig - Maintained existing functionality while improving code organization Improvements to RelaxedTimeoutConfigurationTest: - Simplified traffic generation logic by removing complex multi-phase testing - Streamlined BLPOP command execution with better timeout detection - Added relaxed timeout detection and recording during maintenance events - Improved logging and error handling for timeout analysis - Enhanced test assertions to focus on relaxed timeout detection rather than success counts - Added MOVING operation duration tracking for better test analysis * Improve test reliability and cleanup: Add @AfterEach cleanup, enhance endpoint tracking, and improve logging * add un-relaxed tests. will investigate further why they got broken at some point via diff * CAE-1130: Update timeout configuration test and watchdog implementation * Reset MaintenanceAwareConnectionWatchdog.java and log4j2-test.xml to upstream versions * Clean up debug info and outdated comments from timeout tests - Remove large debug block with reflection-based debugging in RelaxedTimeoutConfigurationTest - Simplify excessive debug logging and verbose markers (*** and ===) - Clean up maintenance notification test logging - Improve push notification monitor message formatting - Maintain all test functionality while improving code readability * Refactor: Move inline comments above code and fix string comparisons - Move all inline comments to be above the code they reference in: * RelaxedTimeoutConfigurationTest.java * RedisEnterpriseConfig.java * MaintenanceNotificationTest.java - Replace string != comparisons with .equals() for proper string comparison - Apply code formatting via Maven formatter This improves code readability and follows Java best practices for string comparison.
…t-in (#3380) * Support for Client-side opt-in A client can tell the server if it wants to receive maintenance push notifications via the following command: CLIENT MAINT_NOTIFICATIONS <ON | OFF> [parameter value parameter value ...] * update maintenance events to latest format - MIGRATING <seq_number> <time> <shard_id-s>: A shard migration is going to start within <time> seconds. - MIGRATED <seq_number> <shard_id-s>: A shard migration ended. - FAILING_OVER <seq_number> <time> <shard_id-s>: A shard failover of a healthy shard started. - FAILED_OVER <seq_number> <shard_id-s>: A shard failover of a healthy shard ended. - MOVING <seq_number> <time> <endpoint>: A specific endpoint is going to move to another node within <time> seconds * clean up * Update FAILED_OVER & MIGRATED to include additional time field * update is private reserver check & add unit tests update is private reserver check * add unit tests for handshake with enabled maintenance events * add missing copyrights/docs * format * address review comments * Revert address after rebind operation expires * Update event's validation spec - MIGRATING <seq_number> <time> <shard_id-s>: A shard migration is going to start within <time> seconds. - MIGRATED <seq_number> <shard_id-s>: A shard migration ended. - FAILING_OVER <seq_number> <time> <shard_id-s>: A shard failover of a healthy shard started. - FAILED_OVER <seq_number> <shard_id-s>: A shard failover of a healthy shard ended. - MOVING <seq_number> <time> <endpoint>: A specific endpoint is going to move to another node within <time> seconds * rebase * format after rebase * Apply suggestions from code review Co-authored-by: Tihomir Krasimirov Mateev <[email protected]> * javadoc updated * Update src/main/java/io/lettuce/core/internal/NetUtils.java Co-authored-by: Tihomir Krasimirov Mateev <[email protected]> --------- Co-authored-by: Tihomir Krasimirov Mateev <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Draft for introducing maintenance events to Lettuce
Make sure that:
mvn formatter:format
target. Don’t submit any formatting related changes.