HA Recovery

Recover a Failed Node

1. Fix the Original Node

  • Ensure the original master is operational and properly configured.
  • If the failed node was the master, either set replicaof in its /etc/redis/redis.conf to point at the current master before starting it, or reconfigure it at runtime as follows:
    • The original master does not automatically reclaim the master role. It must become a replica of the current master (the promoted replica).
    • With the instance running, issue the replicaof command against the original master:
      redis-cli -h <original_master_ip> -p <original_master_port> replicaof <new_master_ip> <new_master_port>
      
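    • This command makes the original master synchronize from the current master. Its dataset is brought in line with the current master's, so writes that reached only the original master and were never replicated are discarded.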
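  • Start the Redis instance on the original master node if it is not already running. The redis.conf setting takes effect at startup, whereas the replicaof command requires a running instance.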
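
For the redis.conf route, a small check can confirm which master the file points at before the node is started. The sketch below is illustrative only: the function name conf_replicaof_target is hypothetical, and it assumes the directive sits on a single line in the form `replicaof <ip> <port>` (the older `slaveof` spelling is matched as well).

```shell
#!/bin/sh
# Print the target of every active (uncommented) replicaof directive in a
# Redis config file as "ip:port". Prints nothing if there is none.
# Hypothetical helper; $1 is the path to the config file.
conf_replicaof_target() {
  awk 'tolower($1) == "replicaof" || tolower($1) == "slaveof" {
         print $2 ":" $3
       }' "$1"
}
```

For example, `conf_replicaof_target /etc/redis/redis.conf` should print the address of the current master. If the node was instead reconfigured at runtime with the replicaof command, running `config rewrite` through redis-cli is one way to persist that change, provided the server was started with a config file.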

2. Verify Synchronization

  • Check the synchronization status by running:
    redis-cli -h <original_master_ip> -p <original_master_port> info replication
    
  • Look for role:slave (Redis still reports replicas as "slave" in this output), and confirm that master_link_status is up and master_sync_in_progress is 0.
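
The check above can be scripted by parsing the info replication output. This is a minimal sketch, not a definitive health check: the function name replica_in_sync is hypothetical, and it additionally requires master_link_status:up, which is a stricter condition than the two fields named in this guide.

```shell
#!/bin/sh
# Succeeds (exit 0) only if the INFO replication text on stdin describes a
# replica that is linked to its master and not in the middle of a sync.
# Hypothetical helper name.
replica_in_sync() {
  # INFO output uses CRLF line endings, so strip carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "role"                    { role = $2 }
    $1 == "master_link_status"      { linkst = $2 }
    $1 == "master_sync_in_progress" { syncing = $2 }
    END { exit !(role == "slave" && linkst == "up" && syncing == "0") }
  '
}
```

It could be used as `redis-cli -h <original_master_ip> -p <original_master_port> info replication | replica_in_sync && echo "replica is in sync"`.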

3. Let Sentinel Manage Failover (Optional)

  • Sentinel will now monitor the reconfigured original master as a replica.
  • If the current master fails in the future, Sentinel can promote any healthy replica, including the original master, back to the master role.
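
To see which node Sentinel currently treats as the master, the `sentinel get-master-addr-by-name` command can be queried on any Sentinel. The sketch below only compares its reply with an expected address. The helper name master_addr_is is hypothetical, and it assumes the reply arrives as two lines (IP, then port), which is how redis-cli prints it when its output is piped.

```shell
#!/bin/sh
# Read a two-line "ip" / "port" reply on stdin and succeed only if it
# equals the expected "ip:port" given as $1. Hypothetical helper name.
master_addr_is() {
  [ "$(paste -s -d ':' -)" = "$1" ]
}
```

A possible invocation, where the Sentinel address, the default Sentinel port 26379, and the master name mymaster are placeholders for your own setup: `redis-cli -h <sentinel_ip> -p 26379 sentinel get-master-addr-by-name mymaster | master_addr_is <new_master_ip>:<new_master_port>`.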

Optional: Force Revert to Original Master

If you want to make the original master the master again (not recommended unless necessary), follow these steps:

  • Step 1: Stop all writes to the current master to prevent split-brain or data loss.
  • Step 2: Demote the current master to a replica using:
    redis-cli -h <current_master_ip> -p <current_master_port> replicaof <original_master_ip> <original_master_port>
    
  • Step 3: Promote the original master back to the master role by running replicaof no one:
    redis-cli -h <original_master_ip> -p <original_master_port> replicaof no one
    
  • Step 4: Make sure the Sentinels monitor the new configuration, that is, that they report the original master as the current master. Restart the Sentinels if needed.
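
Before switching roles, it may be worth confirming that the original master has caught up with the current master once writes are stopped. The following sketch compares replication offsets taken from info replication on both nodes. The helper names info_field and offsets_match are hypothetical, and the choice of fields (master_repl_offset on the master, slave_repl_offset on the replica) is an assumption about which values to compare, not something this guide prescribes.

```shell
#!/bin/sh
# Print the value of one field (e.g. master_repl_offset) from INFO text
# supplied on stdin. Hypothetical helper name.
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2; exit }'
}

# Succeed only if the replica's offset equals the master's offset.
#   $1: INFO replication text from the current master
#   $2: INFO replication text from the original master (now a replica)
offsets_match() {
  m=$(printf '%s\n' "$1" | info_field master_repl_offset)
  r=$(printf '%s\n' "$2" | info_field slave_repl_offset)
  [ -n "$m" ] && [ "$m" = "$r" ]
}
```

Usage could look like `offsets_match "$(redis-cli -h <current_master_ip> -p <current_master_port> info replication)" "$(redis-cli -h <original_master_ip> -p <original_master_port> info replication)"`. The two snapshots are taken at slightly different moments, so a mismatch may simply mean the comparison should be retried.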

You Should Automate

  • Use tools like Redis Operator for Kubernetes or custom scripts to automate the process of restoring the original master.
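
A custom script usually has to wait for a condition, such as the replica finishing its sync, before moving to the next step. The sketch below shows one generic way to do that in shell. The function name retry is hypothetical, and the attempt count and delay are arbitrary values to tune for your environment.

```shell
#!/bin/sh
# Run a command until it succeeds.
#   $1: maximum number of attempts
#   $2: seconds to sleep between attempts
#   remaining arguments: the command to run
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For instance, `retry 30 2 some_check` would poll a check command of your own (some_check is a placeholder) for up to roughly a minute before giving up.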