Client IP persistence OR source IP hash load-balancing?

Client or Source IP?


Well, these are roughly the same thing! The name depends on the people, the environment, the products, etc. I may use both terms in this article, but be aware that both of them point to the IP address used to connect to the service that is being load-balanced.

Load-Balancing and Stickiness


Load-balancing is the ability to spread requests among a pool of servers which deliver the same service. By definition, it means that any request can be sent to any server in the pool.
Some applications require stickiness between a client and a server: all the requests from a client must be sent to the same server. Otherwise, the application session may be broken, which may have a negative impact on the client.
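To make the need for stickiness concrete, here is a minimal Python sketch of plain round-robin without any stickiness. The server names and the client IP are made up for the illustration; this is not how any particular load-balancer is implemented.

```python
from itertools import cycle

servers = ["s1", "s2", "s3"]     # hypothetical server names
next_server = cycle(servers)     # plain round-robin, no stickiness

# Four consecutive requests coming from the same (made-up) client IP
client_ip = "192.0.2.10"
assigned = [next(next_server) for _ in range(4)]

# The same client lands on several different servers, so a session
# stored locally on one server would be lost on the next request.
print(client_ip, "->", assigned)
```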

Source IP stickiness


There are many ways to stick a user to a server, some of which have already been discussed on this blog (read load balancing, affinity, persistence, sticky sessions: what you need to know), and other articles may follow.

That said, sometimes, the only information we can “rely” on to perform stickiness is the client (or source) IP address.
Note this is not optimal because:
  * many clients can be “hidden” behind a single IP address (Firewall, proxy, etc…)
  * a client can change its IP address during the session
  * a client can use multiple IP addresses
  * etc…

Performing source IP affinity


There are two ways of performing source IP affinity:
  1. Using a dedicated load-balancing algorithm: a hash on the source IP
  2. Using a stick table in memory (and a roundrobin load-balancing algorithm)

The main purpose of this article is to introduce both methods, which are quite often misunderstood, and to show the pros and cons of each, so people can make the right decision when configuring their load-balancer.

Source IP hash load-balancing algorithm


This algorithm is deterministic: as long as none of the elements involved in the hash computation changes, the result stays the same. Two devices can apply the same hash and therefore load-balance the same way, which makes load-balancer failover transparent.
A hash function is applied to the source IP address of the incoming request. The hash must take into account the number of servers and each server’s weight.
The following events can change the hash result and so may redirect traffic differently over time:
  * a server in the pool goes down
  * a server in the pool comes back up
  * a server’s weight changes

The main issue with the source IP hash load-balancing algorithm is that each of these changes can redirect EVERYBODY to a different server!
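The effect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `pick_server` is a hypothetical helper, CRC32 stands in for whatever hash a real load-balancer uses, all servers have the same weight, and the client addresses are made up.

```python
import ipaddress
import zlib

def pick_server(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server with a plain hash-modulo scheme.

    CRC32 is only a stand-in hash, and server weights are ignored.
    """
    key = ipaddress.ip_address(client_ip).packed
    return servers[zlib.crc32(key) % len(servers)]

# 1000 made-up client addresses
clients = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]

before = {ip: pick_server(ip, ["s1", "s2", "s3", "s4"]) for ip in clients}
after = {ip: pick_server(ip, ["s1", "s2", "s3"]) for ip in clients}  # s4 is down

# Clients whose server changed, and among them those that were NOT on s4
moved = sum(1 for ip in clients if before[ip] != after[ip])
moved_not_on_s4 = sum(
    1 for ip in clients if before[ip] != "s4" and before[ip] != after[ip]
)
print(f"{moved} clients moved, {moved_not_on_s4} of them were not on s4")
```

The hash itself is deterministic (the same IP and the same server list always give the same server), but changing the number of servers changes the modulo, so clients that were on perfectly healthy servers get moved too.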
That’s why some good load-balancers implement a consistent hashing method, which ensures that if a server fails, for example, only the clients connected to that server are redirected.
The downside of consistent hashing is that it doesn’t distribute clients perfectly evenly, so in a farm of 4 servers, some may receive more clients than others.
Note that when a failed server comes back, its “stuck” users (determined by the hash) are redirected back to it.
There is no overhead in terms of CPU or memory when using such an algorithm.
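For readers who want to see the idea behind consistent hashing, here is a minimal Python sketch of a generic hash ring. It is not HAProxy’s actual implementation: the `Ring` class, the number of points per server, the use of SHA-256 and the client addresses are all assumptions made for the illustration, and weights are ignored.

```python
import bisect
import hashlib
import ipaddress
from collections import Counter

def _h(data: bytes) -> int:
    """64-bit hash used for both ring points and client keys (illustrative)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

class Ring:
    """Generic consistent-hash ring; every server owns several points."""

    def __init__(self, servers, points_per_server=100):
        self._points = sorted(
            (_h(f"{name}#{i}".encode()), name)
            for name in servers
            for i in range(points_per_server)
        )
        self._keys = [point for point, _ in self._points]

    def pick(self, client_ip: str) -> str:
        key = _h(ipaddress.ip_address(client_ip).packed)
        # First ring point after the key, wrapping around at the end
        idx = bisect.bisect_right(self._keys, key) % len(self._keys)
        return self._points[idx][1]

clients = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]  # made-up IPs

before = {ip: Ring(["s1", "s2", "s3", "s4"]).pick(ip) for ip in clients[:0]}
full = Ring(["s1", "s2", "s3", "s4"])
degraded = Ring(["s1", "s2", "s3"])          # s4 is down
before = {ip: full.pick(ip) for ip in clients}
during = {ip: degraded.pick(ip) for ip in clients}
restored = {ip: Ring(["s1", "s2", "s3", "s4"]).pick(ip) for ip in clients}

# Clients that were on a surviving server and still got moved: none
moved_from_survivors = [
    ip for ip in clients if before[ip] != "s4" and during[ip] != before[ip]
]
print("moved although their server was up:", len(moved_from_survivors))
print("distribution over 4 servers:", Counter(before.values()))
```

Only the clients of the failed server are redistributed, and once the server is back the mapping is exactly what it was before the failure, which is why its users return to it. The printed distribution also shows that the split between servers is not perfectly even.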

Configuration example in HAProxy or in the ALOHA Load-Balancer:

balance source
hash-type consistent

Source IP persistence using a stick-table


A table in memory is created by the load-balancer to store each source IP address and the server from the pool assigned to it.
We can rely on any non-deterministic load-balancing algorithm, such as roundrobin or leastconn (it usually depends on the type of application you’re load-balancing).
Once a client is stuck to a server, it stays stuck until the entry in the table expires OR the server fails.
There is a memory overhead to store the stickiness information. In HAProxy, the overhead is pretty low: 40MB for 1,000,000 IPv4 addresses.
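As a rough back-of-the-envelope check, the figure above works out to about 40 bytes per entry. The Python snippet below simply extrapolates linearly from that number; treating MB as 10**6 bytes and assuming linear scaling are my assumptions, and real usage depends on the HAProxy version and on what is stored in the table.

```python
# Figure quoted above: 40 MB for 1,000,000 IPv4 entries
# (MB taken as 10**6 bytes, which is an assumption).
BYTES_PER_ENTRY = 40 * 10**6 / 1_000_000

def estimate_table_bytes(entries: int) -> float:
    """Linear extrapolation from the quoted figure; an estimate only."""
    return entries * BYTES_PER_ENTRY

print(BYTES_PER_ENTRY)                  # bytes per entry
print(estimate_table_bytes(10_000))     # estimate for a 10k-entry table
```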
One main advantage of using a stick table is that when a failed server comes back, no existing sessions are redirected to it. Only new incoming IPs can reach it, so there is no impact on users.
It is also possible to synchronize tables in memory between multiple HAProxy or ALOHA Load-Balancers, making an LB failover transparent.
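The behaviour described in this section can be sketched in Python as follows. This is a toy model, not HAProxy’s code: the `StickTable` class, its attribute names and the client addresses are hypothetical, round-robin is used for new clients, and the sketch assumes at least one server is always up.

```python
import time
from itertools import count

class StickTable:
    """Toy model: source IP -> server entries with an expiry, plus round-robin."""

    def __init__(self, servers, expire=3600.0, clock=time.monotonic):
        self.servers = list(servers)
        self.up = set(servers)      # servers currently considered healthy
        self.expire = expire        # entry lifetime in seconds
        self.clock = clock
        self.entries = {}           # client IP -> (server, expiry time)
        self._rr = count()

    def _round_robin(self) -> str:
        # Assumes at least one server is up, otherwise this loops forever.
        while True:
            candidate = self.servers[next(self._rr) % len(self.servers)]
            if candidate in self.up:
                return candidate

    def pick(self, client_ip: str) -> str:
        now = self.clock()
        entry = self.entries.get(client_ip)
        if entry and entry[1] > now and entry[0] in self.up:
            server = entry[0]               # still stuck to its server
        else:
            server = self._round_robin()    # new, expired, or server failed
        self.entries[client_ip] = (server, now + self.expire)
        return server

table = StickTable(["s1", "s2"])
first = table.pick("192.0.2.1")         # new client, round-robin choice
second = table.pick("192.0.2.2")        # new client, next server
table.up.discard("s2")                  # s2 fails
failover = table.pick("192.0.2.2")      # re-stuck to a healthy server
table.up.add("s2")                      # s2 comes back
after_recovery = table.pick("192.0.2.2")
newcomer = table.pick("192.0.2.3")      # only new IPs can reach s2 again
print(first, second, failover, after_recovery, newcomer)
```

Unlike with the hash, the client that was moved away during the failure stays where it is once the failed server returns; only a new IP is sent to the recovered server.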

Configuration example in HAProxy or in the ALOHA Load-Balancer:

stick-table type ip size 1m expire 1h
stick on src

8 thoughts on “Client IP persistence OR source IP hash load-balancing?”

  1. Hello Baptiste!

    Does “hash-type consistent” work consistently across 2 HAProxy instances? Imagine 2 requests from a particular source IP:
    – one request comes to the first HAProxy instance
    – one request comes to the second HAProxy instance
    These 2 requests must end up on the same backend server to provide session affinity.

    Will it work?

    Thanks!

    Pavel

    1. Hi Pavel,

      As long as the items used to compute the hash are the same on your two HAProxy servers, the result of the hash will be the same.
      Items used are:

      • number of servers available (up and running) in the farm
      • servers’ weight and total weight available in the farm
      • client IP address (in your case)

      Baptiste

      1. Hi Baptiste!

        Thanks for your answer!

        When talking about consistency across 2 devices, there is a caveat that I faced with a switch vendor. They claimed that the result of the hash (at that time it was for Link Aggregation) would be the same for a particular flow on 2 different switches. But there was no guarantee that the same hash value would map to the same outgoing switch port on 2 different switches!

        How does HAProxy map the hash to a server? What guarantees that 2 HAProxy instances will map several flows from a particular client to the same server (assuming, of course, that the number of servers in the farm, the servers’ weights and the total weight are the same on both HAProxy servers)?

        Thanks!

        Pavel

        1. The guarantee is provided by the way the hash is computed.
          As long as 2 HAProxy processes run the same configuration and the server status is the same for both processes, the hash is computed the same way.
          Another point for HAProxy: when you “reload” or “restart” HAProxy, a new process is spawned. This new process is supposed to load-balance traffic the same way the previous process did.
          This means two HAProxy processes compute the hash the same way; whether they run on the same box or not has no impact.

  2. Hi Baptiste!

    Is there any way to let the developer choose the server that requests are sent to? For example, the request comes to HAProxy with an extra parameter (?server=myservername), and HAProxy then knows that it must send the request to the “myservername” server. Can this be achieved with HAProxy?

    Thanks

  3. hash-type consistent
    stick-table type ip size 10k expire 600s
    stick on src

    Hello.

    This config avoids the issue of users being redirected when a server comes back, doesn’t it? I mean, the users stuck to it by the hash are not redirected.

    Thanks.
