HA Recovery
Recover a Failed Node
1. Fix the Original Node
- Ensure the original master is operational and properly configured.
- The original master cannot automatically reclaim its role as the master. Make it a replica of the current master (the promoted replica) in one of two ways:
- Edit `/etc/redis/redis.conf` on the original master so that its `replicaof` setting points at the current master (`replicaof <new_master_ip> <new_master_port>`), then start the Redis instance.
- Or start the Redis instance and run the `replicaof` command against it:
```bash
redis-cli -h <original_master_ip> -p <original_master_port> replicaof <new_master_ip> <new_master_port>
```
- Either way, the original master re-synchronizes its data from the current master.
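A `replicaof` issued at runtime is not written to `redis.conf` by itself, so a restart would bring the node back with its old configuration. A minimal sketch that combines the command above with `CONFIG REWRITE`, which saves the running configuration to the file the server was started with. The function name `rejoin_as_replica` is hypothetical, and the sketch assumes the server was started with a config file (otherwise `CONFIG REWRITE` fails).

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-point the original master at the current master,
# then persist the setting to its redis.conf so it survives a restart.
# Arguments: original master host/port, current (new) master host/port.
rejoin_as_replica() {
  local old_host=$1 old_port=$2 new_host=$3 new_port=$4 reply

  reply=$(redis-cli -h "$old_host" -p "$old_port" replicaof "$new_host" "$new_port")
  # REPLICAOF replies with a status that starts with "OK".
  case $reply in
    OK*) ;;
    *) printf 'replicaof failed: %s\n' "$reply" >&2; return 1 ;;
  esac

  # CONFIG REWRITE only works if the server was started with a config file.
  reply=$(redis-cli -h "$old_host" -p "$old_port" config rewrite)
  [ "$reply" = "OK" ] || { printf 'config rewrite failed: %s\n' "$reply" >&2; return 1; }
}
```

For example: `rejoin_as_replica <original_master_ip> <original_master_port> <new_master_ip> <new_master_port>`.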
2. Verify Synchronization
- Check the synchronization status by running:
```bash
redis-cli -h <original_master_ip> -p <original_master_port> info replication
```
- Look for `role:slave` (INFO still uses this name for replicas) and make sure `master_sync_in_progress` is `0`.
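The check above can be scripted instead of read by eye. A minimal sketch, assuming bash: `replica_in_sync` reads `info replication` output on stdin, and `wait_for_sync` polls a node until it passes. Both names are hypothetical. Besides the two fields named above, the sketch also requires `master_link_status:up` from the same INFO section; the one-second interval and default of 30 attempts are arbitrary choices.

```bash
#!/usr/bin/env bash
# Succeeds (exit 0) if the INFO replication text on stdin shows a replica
# whose link to its master is up and whose initial sync has finished.
replica_in_sync() {
  tr -d '\r' | awk -F: '
    $1 == "role"                    { role = $2 }
    $1 == "master_link_status"      { link_status = $2 }
    $1 == "master_sync_in_progress" { syncing = $2 }
    END { exit !(role == "slave" && link_status == "up" && syncing == "0") }'
}

# Poll host:port once per second until it is in sync, up to N tries (default 30).
wait_for_sync() {
  local host=$1 port=$2 tries=${3:-30} i
  for ((i = 0; i < tries; i++)); do
    if redis-cli -h "$host" -p "$port" info replication | replica_in_sync; then
      echo "in sync"
      return 0
    fi
    sleep 1
  done
  echo "not in sync after $tries attempts" >&2
  return 1
}
```

For example, `wait_for_sync <original_master_ip> <original_master_port> 60` waits up to about a minute before giving up.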
3. Let Sentinel Manage Failover (Optional)
- Sentinel will now monitor the reconfigured original master as a replica.
- If the current master fails in the future, Sentinel can promote any healthy replica, including the original master, back to the master role.
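To confirm that Sentinel treats the promoted replica, not the original node, as the master, you could ask a Sentinel with `SENTINEL GET-MASTER-ADDR-BY-NAME`. A minimal sketch: the function name `sentinel_master_addr` is hypothetical, and the master name `mymaster` in the example is an assumption that must match the name in your `sentinel.conf`. Port 26379 is Sentinel's default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the master address a Sentinel reports, as ip:port.
# Arguments: sentinel host, sentinel port, monitored master name.
sentinel_master_addr() {
  local host=$1 port=$2 name=$3 addr
  # The reply is two elements (ip, port); redis-cli prints one per line
  # when its output is not a terminal, so join them with ":".
  addr=$(redis-cli -h "$host" -p "$port" sentinel get-master-addr-by-name "$name" |
    tr -d '\r' | paste -sd: -)
  # An unknown master name typically produces an empty (nil) reply.
  [ -n "$addr" ] || return 1
  printf '%s\n' "$addr"
}
```

For example, `sentinel_master_addr <sentinel_ip> 26379 mymaster` should print the promoted replica's address while the original node runs as a replica.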
Optional: Force Revert to Original Master
If you want the original master to become the master again (not recommended unless necessary), follow these steps:
- Step 1: Stop all writes to the current master to prevent split-brain or data loss, then confirm that the original master has fully caught up (see Verify Synchronization above).
- Step 2: Promote the original master by setting `replicaof no one`:
```bash
redis-cli -h <original_master_ip> -p <original_master_port> replicaof no one
```
- Step 3: Demote the current master to a replica of the original master. Do this only after Step 2; demoting it first would leave each node replicating from the other.
```bash
redis-cli -h <current_master_ip> -p <current_master_port> replicaof <original_master_ip> <original_master_port>
```
- Step 4: Update Sentinels to monitor the new configuration, restarting them if needed.
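The revert could be scripted along these lines. This is a sketch under stated assumptions: writes are already stopped, Sentinel does not reconfigure the nodes during the switch (a running Sentinel may undo manual changes it considers misconfigured; its own `SENTINEL FAILOVER` command is an alternative worth checking in the Sentinel documentation), and `info_field` and `revert_to_original_master` are hypothetical names. It compares the current master's `master_repl_offset` with the original master's `slave_repl_offset` before switching, and it promotes the original master before demoting the current one so the two nodes never replicate from each other.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one field of INFO replication from host:port.
info_field() {
  local host=$1 port=$2 field=$3
  redis-cli -h "$host" -p "$port" info replication |
    tr -d '\r' | awk -F: -v f="$field" '$1 == f { print $2 }'
}

# Hypothetical switchover sketch. Assumes writes to the current master are
# already stopped and that Sentinel will not reconfigure the nodes meanwhile.
# Arguments: original master host/port, current master host/port.
revert_to_original_master() {
  local orig_host=$1 orig_port=$2 cur_host=$3 cur_port=$4 cur_off orig_off

  # Refuse to switch unless the original master has replicated everything.
  cur_off=$(info_field "$cur_host" "$cur_port" master_repl_offset)
  orig_off=$(info_field "$orig_host" "$orig_port" slave_repl_offset)
  if [ -z "$cur_off" ] || [ "$cur_off" != "$orig_off" ]; then
    printf 'not caught up (current=%s, original=%s)\n' "$cur_off" "$orig_off" >&2
    return 1
  fi

  # Promote the original master first...
  redis-cli -h "$orig_host" -p "$orig_port" replicaof no one >/dev/null || return 1
  # ...then make the old current master replicate from it.
  redis-cli -h "$cur_host" -p "$cur_port" replicaof "$orig_host" "$orig_port" >/dev/null || return 1
  echo "original master is the master again"
}
```

Run it as `revert_to_original_master <original_master_ip> <original_master_port> <current_master_ip> <current_master_port>`. If it refuses because the offsets differ, wait briefly and retry: the master's periodic pings to its replicas also advance the offset.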
Automate the Process
- Use tools like Redis Operator for Kubernetes or custom scripts to automate the process of restoring the original master.