HAProxy advanced Redis health check


Redis is an open-source NoSQL database built on a key/value model.
Two interesting features of Redis are that it can persist data to disk and that a master can replicate its data to many slaves.

HAProxy can load-balance Redis servers with no issues at all.
There is even a built-in health check for Redis in HAProxy.
Unfortunately, there was no easy way for HAProxy to detect the role of a Redis server (master or slave node). Hence people usually hack around this part of the architecture.

As written in the title of this post, we’ll learn today how to build a simple Redis infrastructure thanks to HAProxy’s new advanced send/expect health checks.
This feature is available in HAProxy 1.5-dev20 and above.

The purpose is to keep the Redis infrastructure as simple as possible and to ease failover for the web servers. HAProxy has to detect which node is the MASTER and route all connections to it.

Redis high availability diagram with HAProxy

Below is an ASCII art diagram of HAProxy load-balancing Redis servers:

+----+ +----+ +----+ +----+
| W1 | | W2 | | W3 | | W4 |   Web application servers
+----+ +----+ +----+ +----+
     \     |   |     /
      \    |   |    /
       \   |   |   /
        +---------+
        | HAProxy |
        +---------+
           /   \
       +----+ +----+
       | R1 | | R2 |           Redis servers
       +----+ +----+

The scenario is simple:
  * 4 web application servers need to store data in and retrieve data from a Redis database
  * 1 HAProxy server (2 is better) which load-balances Redis connections
  * 2 (at least) Redis servers in active/standby mode with replication


Below is the HAProxy configuration:

defaults REDIS
 mode tcp
 timeout connect  4s
 timeout server  30s
 timeout client  30s

frontend ft_redis
 # placeholder listen address (6379 is the default Redis port); replace with your own
 bind *:6379 name redis
 default_backend bk_redis

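backend bk_redis
 option tcp-check
 # 1. the node must answer PING
 tcp-check send PING\r\n
 tcp-check expect string +PONG
 # 2. the node must report itself as master
 tcp-check send info\ replication\r\n
 tcp-check expect string role:master
 # 3. close the session cleanly
 tcp-check send QUIT\r\n
 tcp-check expect string +OK
 # placeholder addresses: replace with those of your Redis nodes
 server R1 192.0.2.11:6379 check inter 1s
 server R2 192.0.2.12:6379 check inter 1s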

The HAProxy health check sequence above marks only the Redis master server as UP in the farm, so all connections are routed to it.
When the Redis master server fails, the remaining nodes elect a new one. HAProxy detects it thanks to its health check sequence.
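
How quickly HAProxy notices a change of master depends on the check timing. The fragment below is only a sketch that spells out the relevant server parameters: the check is reduced to the role test for brevity, the server addresses are placeholders that are not part of the original configuration, and the fall and rise values shown are simply HAProxy's defaults written explicitly.

```
backend bk_redis
 option tcp-check
 tcp-check send info\ replication\r\n
 tcp-check expect string role:master
 # inter: delay between two checks
 # fall:  consecutive failed checks before a server is marked DOWN
 # rise:  consecutive successful checks before a server is marked UP
 default-server inter 1s fall 3 rise 2
 # placeholder addresses (assumption): replace with your Redis nodes
 server R1 192.0.2.11:6379 check
 server R2 192.0.2.12:6379 check
```

With these values, a failed master should be marked DOWN after roughly three seconds, and a newly promoted master marked UP after roughly two, not counting the time a check may spend waiting on a timeout.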

It does not require third-party tools and makes failover transparent.
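
Some setups reportedly need the connection to be opened explicitly before the first send. The variant below is a sketch of the same backend with a `tcp-check connect` rule placed first; whether it is required seems to depend on the HAProxy build in use, and the server addresses are again placeholders that do not come from the original configuration.

```
backend bk_redis
 option tcp-check
 # open the TCP connection explicitly before the first send
 tcp-check connect
 tcp-check send PING\r\n
 tcp-check expect string +PONG
 tcp-check send info\ replication\r\n
 tcp-check expect string role:master
 tcp-check send QUIT\r\n
 tcp-check expect string +OK
 # placeholder addresses (assumption): replace with your Redis nodes
 server R1 192.0.2.11:6379 check inter 1s
 server R2 192.0.2.12:6379 check inter 1s
```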


7 Responses to HAProxy advanced Redis health check

  1. Koen says:

    You are missing the “tcp-check connect” statement that makes the connection. Took me a few minutes to figure that out.

  2. whaa says:

    funny how a haproxy 1.5.22 on ubuntu 12.04 worked -without- tcp-check connect, and the same haproxy on a centos 6.5 didn’t work without tcp-check connect.

    thanks for the tip.

    • Hi,

      This is not funny, it deserves an explanation!
      Could you send a mail to the ML with more information about your environment, and your (anonymized) configuration as well?


  3. adichiru says:


    I have been testing a very similar scenario for quite a while, and there is a problem that haproxy needs to handle properly. The main idea is that haproxy detects redis masters by querying each back-end server, which is expected for load-balancing but at the same time problematic for HA in the special case of redis. Redis has a master-slave sync option which, combined with Sentinel, works pretty well. However, haproxy does not get the current master details from sentinel, so if a net partition occurs, a slave is promoted to master while the old master is isolated; if the old master comes back, it comes back as master and haproxy sees it as a valid back-end, so it sends queries to it for a good few seconds until sentinel reconfigures it as a slave.

    I believe this can be “easily” solved by making haproxy get the ip and port of the current master from sentinel, since sentinel is the authority in that redis infrastructure. Just playing with inter and rise to allow sentinel enough time to fix this two-masters-at-the-same-time problem is not reliable and inserts a huge delay in the fail-over scenario.

    What do you think about this? Is there something I am missing?

