Tag Archives: scalability

Load Balancing and Application Delivery for the Enterprise [Webinar]

Do you know what makes HAProxy Enterprise Edition different from HAProxy Community Edition?

HAPEE isn’t just HAProxy Community with paid support, and unlike some other products based on open source projects, HAPEE doesn’t strip away any of the capabilities of HAProxy Community.

You can learn what HAPEE is all about and how it can provide additional benefits to enterprises during our webinar on March 2nd.

In this webinar, we’ll present:

  • How HAProxy Enterprise Edition (HAPEE) is different from HAProxy Community Edition
  • Why HAPEE is the most up-to-date, secure, and stable version of HAProxy
  • How enterprises leverage HAPEE to scale-out environments in the cloud
  • How enterprises can increase their admin productivity using HAPEE
  • How HAPEE enables advanced DDOS protection and helps mitigate other attacks

Sign up here.

HAProxy and container IP changes in Docker

HAProxy and Docker containers

Docker is a nice tool to handle containers: it allows building and running your apps in a simple and efficient way.

When used in production together with HAProxy, devops teams face a big challenge: how to followup a container IP change when restarting a container?

This blog article aims at giving a first answer to this question.

The version of docker used for this article is 1.8.1 (very important, since docker’s default behavior has changed in 1.9.0…)

HAProxy, webapp and docker diagram

The diagram below shows how docker runs on my laptop:

  • A docker0 network interface, with IP 172.16.0.1
  • all containers run in the subnet 172.16.0.0/16
+-------------------------------------host--------------------------------------+
|                                                                               |
|  +-----------------------docker-----------------------+ 172.16.0.0/16         |
|  |                                                    | docker0: 172.16.0.1   |
|  |  +---------+  +---------+       +----------+       |                       |
|  |  | HAProxy |  | appsrv1 |       | rsyslogd |       |                       |
|  |  +---------+  +---------+       +----------+       |                       |
|  |                                                    |                       |
|  +----------------------------------------------------+                       |
|                                                                               |
+-------------------------------------------------------------------------------+

The IP address associated to docker0 will be used to export some services.

In this article, we’ll start up 3 containers:

  1. rsyslogd: where HAProxy will send all its logs
  2. appsrv1: our application server, which may be restarted at any time
  3. haproxy: our load-balancer, which must follow-up appsrv1‘s IP

Building and running the lab

Building


First, we need rsyslogd and haproxy containers. They can be build from the following Dockerfiles:

Then run:

docker build -t blog:haproxy_dns ~/tmp/haproxy/blog/haproxy_docker_dns_link/blog_haproxy_dns/
docker build -t blog:rsyslogd ~/tmp/haproxy/blog/haproxy_docker_dns_link/blog_rsyslogd/

I consider appsrv container as yours: it’s your application.

Starting up our lab


Docker assign IPs to containers in the order they are started up, incrementing last byte for each new container.

To make it simpler, let’s restart docker first, so our container IPs are predictible:

sudo /etc/rc.d/docker restart

Then let’s start up our rsyslogd container:

docker run --detach --name rsyslogd --hostname=rsyslogd \
	--publish=172.16.0.1:8514:8514/udp \
	blog:rsyslogd

And let’s attach a terminal to it:

docker attach rsyslogd

Now, run appsrv container as appsrv1:

docker run --detach --name appsrv1 --hostname=appsrv1 demo:appsrv

And finally, let’s start HAProxy, with a docker link to appsrv1:

docker run --detach --name haproxy --hostname=haproxy \
	--link appsrv1:appsrv1 \
	blog:haproxy_dns

Docker links, /etc/hosts file updated and DNS


When using the ”–link” option, docker creates a new entry in the containers /etc/hosts file with the IP address and name provided by the ”link” directive.
Docker will also update this file when the remote container (here appsrv1) IP address is changed (IE when restarting the container).

If you’re familiar with HAProxy, you already know it doesn’t do file system IOs at run time. Furthermore, HAProxy doesn’t use /etc/hosts file directly. The glibc might use it when HAProxy asks for DNS resolution when parsing the configuration file. (read below for DNS resolution at runtime)

That said, if appsrv1 IP get changed, then /etc/hosts file is updated accordingly, then HAProxy is not aware of the change and the application may fail.
A quick solution would be to reload HAProxy process in its container, to force it taking into account the new IP.

A more reliable solution, is to use HAProxy 1.6 DNS resolution capability to follow-up the IP change. With this purpose in mind, we added 2 tools into our HAProxy container:

  1. dnsmasq: tiny software which can act as a DNS server which takes /etc/hosts file as its database
  2. inotifytools: watch changes on /etc/hosts file and force dnsmasq to reload it when necessary

I guess now you got it:

  • when appsrv1 is restarted, then docker gives it a new IP
  • Docker populates then this IP address into all /etc/hosts file required (those using ”link” directives)
  • Once populated, inotify tool detect the file change and triggers a dnsmasq reload
  • HAProxy periodically (can be configured) probes DNS and will get the new IP address quickly from dnsmasq

Docker container restart and HAProxy followup in action


At this stage, we should have a container attached to rsyslogd and we should be able to see HAProxy logging. Let’s give it a try:

curl http://172.16.0.4/

Nov 17 09:29:09 172.16.0.1 haproxy[10]: 172.16.0.1:55093 [17/Nov/2015:09:29:09.729] f_myapp b_myapp/appsrv1 0/0/0/1/1 200 858 - - ---- 1/1/0/1/0 0/0 "GET / HTTP/1.1"

Now, let’s consider your dev team delivered a new version of your application, so you build it and need to restart its running container:

docker restart appsrv1

and voilà:

==> /var/log/haproxy/events <==
Nov 17 09:29:29 172.16.0.1 haproxy[10]: b_myapp/appsrv1 changed its IP from 172.16.0.3 to 172.16.0.5 by docker/dnsmasq.
Nov 17 09:29:29 172.16.0.1 haproxy[10]: b_myapp/appsrv1 changed its IP from 172.16.0.3 to 172.16.0.5 by docker/dnsmasq.

Let’s test the application again:

curl http://172.16.0.4/

Nov 17 09:29:31 172.16.0.1 haproxy[10]: 172.16.0.1:59450 [17/Nov/2015:09:29:31.013] f_myapp b_myapp/appsrv1 0/0/0/0/0 200 858 - - ---- 1/1/0/1/0 0/0 "GET / HTTP/1.1"

Limitations


There are a few limitations in this mechanism:

  • it’s painful to maintaint ”link” directive when you have a 10s or 100s or more of containers….
  • the host computer, and computers in the host network can’t easily access our containers, because we don’t know their IPs and their hostnames are resolved in HAProxy container only
  • if we want to add more appserver in HAProxy‘s farm we still need to restart HAProxy‘s container (and update configuration accordingly)

To fix some of the issues above, we can dedicate a container to perform DNS resolution within our docker world and deliver responses to any running containers or hosts in the network. We’ll see that in a next blog article

Links

What’s new in HAProxy 1.6

[ANNOUNCE] HAProxy 1.6.0 released


Yesterday, 13th of October, Willy has announced the release of HAProxy 1.6.0, after 16 months of development!
First good news is that release cycle goes a bit faster and we aim to carry on making it as fast.
A total of 1156 commits from 59 people were committed since the release of 1.5.0, 16 months ago.

Please find here the official announce: [ANNOUNCE] haproxy-1.6.0 now released!.

In his mail, Willy detailed all the features that have been added to this release. The purpose of this blog article is to highlight a few of them, providing the benefits and some configuration examples.

NOTE: Most of the features below were already backported and integrated into our HAProxy Enterprise and ALOHA products.

HAProxy Enterprise is our Open Source version of HAProxy based on HAProxy community stable branch where we backport many features from dev branch and we package it to make the most stable, reliable, advanced and secured version of HAProxy. It also comes with third party software to fill the gap between a simple HAProxy process and a load-balancer (VRRP, syslog, SNMP, Route Health Injection, etc…). Cherry on the cake, we provide “enterprise” support on top of it.

NOTE 2: the list of new features introduced here is not exhaustive. Example are proposed in a quick and dirty way to teach you how to start with the feature. Don’t run those examples in production 🙂

It’s 2015, let’s use QUOTE in configuration file


Those who uses HAProxy for a long time will be happy to know that ‘\ ‘ (backslash-space) sequence is an old painful souvenir with 1.6 🙂
We can now write:

reqirep "^Host: www.(.*)" "Host: foobar\1"

or

option httpchk GET / "HTTP/1.1\r\nHost: www.domain.com\r\nConnection: close"

Lua Scripting


Maybe the biggest change that occurred is the integration of Lua.
Quote from Lua’s website: “Lua is a powerful, fast, lightweight, embeddable scripting language.“.

Basically, everyone has now the ability to extend HAProxy by writing and running their own Lua scripts. No need to write C code, maintain patches, etc…
If some Lua snipplets are very popular, we may write the equivalent feature in C and make it available in HAProxy mainline.

One of the biggest challenge Thierry faced when integrating Lua is to give it the ability to propose non-blocking processing of Lua code and non-blocking network socket management.

HAProxy requires Lua 5.3 or above.

With Lua, we can add new functions to the following HAProxy elements:

  • Service
  • Action
  • Sample-fetch
  • Converter

Compiling HAProxy and Lua


Installing LUA 5.3

cd /usr/src
curl -R -O http://www.lua.org/ftp/lua-5.3.0.tar.gz
tar zxf lua-5.3.0.tar.gz
cd lua-5.3.0
make linux
sudo make INSTALL_TOP=/opt/lua53 install

LUA 5.3 library and include files are now installed in /opt/lua53.

Compiling HAProxy with Lua support

make TARGET=linux2628 USE_OPENSSL=1 USE_PCRE=1 USE_LUA=1 LUA_LIB=/opt/lua53/lib/ LUA_INC=/opt/lua53/include/

HAProxy/Lua simple Hello world example


A simple Hello world! in Lua could be written like this:

  • The lua code in a file called hello_world.lua:
    core.register_service("hello_world", "tcp", function(applet)
       applet:send("hello world\n")
    end)
  • The haproxy configuration
    global
       lua-load hello_world.lua
    
    listen proxy
       bind 127.0.0.1:10001
       tcp-request content use-service lua.hello_world
    

More with Lua


Please read the doc, ask your questions on HAProxy‘s ML (no registration needed): haproxy@formilux.org.
Of course, we’ll write more articles later on this blog about Lua integration.

Captures


HAProxy‘s running context is very important when writing configuration. In HAProxy, each context is isolated. IE: you can’t use a request header when processing the response.
With HAProxy 1.6, this is now possible: you can declare capture slots, store data in it and use it at any time during a session.

defaults 
 mode http

frontend f_myapp
 bind :9001
 declare capture request len 32 # id=0 to store Host header
 declare capture request len 64 # id=1 to store User-Agent header
 http-request capture req.hdr(Host) id 0
 http-request capture req.hdr(User-Agent) id 1
 default_backend b_myapp

backend b_myapp
 http-response set-header Your-Host %[capture.req.hdr(0)]
 http-response set-header Your-User-Agent %[capture.req.hdr(1)]
 server s1 10.0.0.3:4444 check

Two new headers are inserted in the response:
Your-Host: 127.0.0.1:9001
Your-User-Agent: curl/7.44.0

Multiprocess, peers and stick-tables


In 1.5, we introduced “peers” to synchronize stick-table content between HAProxy servers. This feature was not compatible with multi-process mode.
We can now, in 1.6, synchronize content of the table it is stick to one process. It allows creating configurations for massive SSL processing pointing to a single backend sticked on a single process where we can use stick tables and synchronize its content.

peers article
 peer itchy 127.0.0.1:1023

global
 pidfile /tmp/haproxy.pid
 nbproc 3

defaults
 mode http

frontend f_scalessl
 bind-process 1,2
 bind :9001 ssl crt /home/bassmann/haproxy/ssl/server.pem
 default_backend bk_lo

backend bk_lo
 bind-process 1,2
 server f_myapp unix@/tmp/f_myapp send-proxy-v2

frontend f_myapp
 bind-process 3
 bind unix@/tmp/f_myapp accept-proxy
 default_backend b_myapp

backend b_myapp
 bind-process 3
 stick-table type ip size 10k peers article
 stick on src
 server s1 10.0.0.3:4444 check

Log


log-tag


It is now possible to position a syslog tag per process, frontend or backend. Purpose is to ease the job of syslog servers when classifying logs.
If no log-tag is provided, the default value is the program name.

Example applied to the configuration snipplet right above:

frontend f_scalessl
 log-tag SSL
[...]

frontend f_myapp
 log-tag CLEAR
[...]

New log format variables


New log format variables have appeared:

  • %HM: HTTP method (ex: POST)
  • %HP: HTTP request URI without query string (path)
  • %HQ: HTTP request URI query string (ex: ?bar=baz)
  • %HU: HTTP request URI (ex: /foo?bar=baz)
  • %HV: HTTP version (ex: HTTP/1.0)

Server IP resolution using DNS at runtime


In 1.5 and before, HAProxy performed DNS resolution when parsing configuration, in a synchronous mode and using the glibc (hence /etc/resolv.conf file).
Now, HAProxy can perform DNS resolution at runtime, in an asynchronous way and update server IP on the fly. This is very convenient in environment like Docker or Amazon Web Service where server IPs can be changed at any time.
Configuration example applied to docker. A dnsmasq is used as an interface between /etc/hosts file (where docker stores server IPs) and HAProxy:

resolvers docker
 nameserver dnsmasq 127.0.0.1:53

defaults
 mode http
 log global
 option httplog

frontend f_myapp
 bind :80
 default_backend b_myapp

backend b_myapp
 server s1 nginx1:80 check resolvers docker resolve-prefer ipv4

Then, let’s restart s1 with the command “docker restart nginx1” and let’s have a look at the magic in the logs:
(...) haproxy[15]: b_myapp/nginx1 changed its IP from 172.16.0.4 to 172.16.0.6 by docker/dnsmasq.

HTTP rules


New HTTP rules have appeared:

  • http-request: capture, set-method, set-uri, set-map, set-var, track-scX, sc-in-gpc0, sc-inc-gpt0, silent-drop
  • http-response: capture, set-map, set-var, sc-inc-gpc0, sc-set-gpt0, silent-drop, redirect

Variables


We often used HTTP header fields to store temporary data in HAProxy. With 1.6, we can now define variables.
A variable is available for a scope: session, transaction (request or response), request, response.
A variable name is prefixed by its scope (sess, txn, req, res), a dot ‘.’ and a tag composed by ‘a-z’, ‘A-Z’, ‘0-9’ and ‘_’.

Let’s rewrite the capture example using variables

global
 # variables memory consumption, in bytes
 tune.vars.global-max-size 1048576
 tune.vars.reqres-max-size     512
 tune.vars.sess-max-size      2048
 tune.vars.txn-max-size        256

defaults
 mode http

frontend f_myapp
 bind :9001
 http-request set-var(txn.host) req.hdr(Host)
 http-request set-var(txn.ua) req.hdr(User-Agent)
 default_backend b_myapp

backend b_myapp
 http-response set-header Your-Host %[var(txn.host)]
 http-response set-header Your-User-Agent %[var(txn.ua)]
 server s1 10.0.0.3:4444 check

mailers


Now, HAProxy can send emails when server states change (mainly goes DOWN), so your sysadmins/devos won’t sleep anymore :). It used to be able to log only before.

mailers mymailers
 mailer smtp1 192.168.0.1:587 
 mailer smtp2 192.168.0.2:587 
 
backend mybackend 
 mode tcp 
 balance roundrobin
 email-alert mailers mymailers
 email-alert from test1@horms.org
 email-alert to test2@horms.org
 server srv1 192.168.0.30:80
 server srv2 192.168.0.31:80

Processing of HTTP request body


Until 1.5 included, HAProxy could process only HTTP request headers. It can now access request body.
Simply enable the statement below in your frontend or backend to give HAProxy this ability:

option http-buffer-request

Protection against slow-POST attacks


slow-POST attacks are like the slowlorys one, except that HTTP header are sent quicly, but request body are sent very slowly.
Once enabled, the timeout http-request parameters also apply to the POSTED data.

Fetch methods


A few new fetch methods now exists to play with the body: req.body, req.body_param, req.body_len, req.body_size, etc…

Short example to detect “SELECT *” string in a request POST body, and of course how to deny it:

defaults
 mode http

frontend f_mywaf
 bind :9001
 option http-buffer-request
 http-request deny if { req.body -m reg "SELECT \*" }
 default_backend b_myapp

backend b_myapp
 server s1 10.0.0.3:4444 check

New converters


1.5 introduced the converters but only a very few of them were available.
1.6 adds many of them. The list is too long, but let’s give the most important ones: json, in_table, field, reg_sub, table_* (to access counters from stick-tables), etc…

Device Identification


Through our company, we have some customer who want us to integrate into HAProxy the ability to detect device type and characteristics and report it to the backend server.
We got a couple of contributions from 2 companies experts in this domain: 51 degrees and deviceatlas.
You can now load those libraries in HAProxy in order to fully qualify a client capabilities and set up some headers your application server can rely on to adapt content delivered to the client or let the varnish cache server use it to cache multiple flavor of the same object based on client capabilities.

More on this blog later on how to integrate each product.

Seamless server states


Prior 1.6, when being reloaded, HAProxy considers all the servers are UP until the first check is performed.
Since 1.6, we can dump server states into a flat file right before performing the reload and let the new process know where the states are stored. That way, the old and new processes owns exactly the same server states (hence seamless).
The following information are reported:

  • server IP address when resolved by DNS
  • operational state (UP/DOWN/…)
  • administrative state (MAINT/DRAIN/…)
  • Weight (including slowstart relative weight)
  • health check status
  • rise / fall current counter
  • check state (ENABLED/PAUSED/…)
  • agent check state (ENABLED/PAUSED/…)

The state could be applied globally (all server found) or per backend.
Example:
Simple HAProxy configuration:

global
 stats socket /tmp/socket
 server-state-file /tmp/server_state

backend bk
 load-server-state-from-file global
 server s1 10.0.0.3:4444 check weight 11
 server s2 10.0.0.4:4444 check weight 12

Before reloading HAProxy, we save the server states using the following command:

socat /tmp/socket - <<< "show servers state" > /tmp/server_state

Here is the content of /tmp/server_state file:

1
# <field names skipped for the blog article>
1 bk 1 s1 10.0.0.3 2 0 11 11 4 6 3 4 6 0 0
1 bk 2 s2 10.0.0.4 2 0 12 12 4 6 3 4 6 0 0

Now, let’s proceed with reload as usual.
Of course, the best option is to export the server states using the init script.

External check


HAProxy can run a script to perform complicated health checks.
Just be aware about the security concerns when enabling this feature!!

Configuration:

global
 external-check

backend b_myapp
 external-check path "/usr/bin:/bin"
 external-check command /bin/true
 server s1 10.0.0.3:4444 check

TLS / SSL


NOTE: some of the features introduced here may need a recent openssl library.

Detection of ECDSA-able clients


This has already been documented by Nenad on this blog: Serving ECC and RSA certificates on same IP with HAProxy

SSL certificate forgery on the fly


Since 1.6, HAProxy can forge SSL certificate on the fly!
Yes, you can use HAProxy with your company’s CA to inspect content.

Support of Certificate Transparency (RFC6962) TLS extension


When loading PEM files, HAProxy also checks for the presence of file at the same path suffixed by “.sctl”. If such file is found, support for Certificate Transparency (RFC6962) TLS extension is enabled.
The file must contain a valid Signed Certificate Timestamp List, as described in RFC. File is parsed to check basic syntax, but no signatures are verified.

TLS Tickets key load through stats socket


A new stats socket command is available to update TLS ticket keys at runtime. The new key is used for encryption/decryption while the old ones are used for decryption only.

Server side SNI


Many application servers now takes benefit from Server Name Indication TLS extension.
Example:

backend b_myapp_ssl
 mode http
 server s1 10.0.0.3:4444 check ssl sni req.hdr(Host)

Peers v2


The peers protocol, used to synchronized data from stick-tables between two HAProxy servers can now synchronise more than just the sample and the server-id.
It can synchronize all the tracked counters. Note that each node push its local counter to a peer. So this must be used for safe reload and server failover only.
Don’t expect to see 10 HAProxy servers to sync and aggregate counters in real time.

That said, this protocol has been extended to support different data type, so we may see more features soon relying on it 😉

HTTP connection sharing


On the road to HTTP/2, HAProxy must be able to support connection pooling.
And on the road to the connection pools, we have the ability to share a server side connection between multiple clients.
The server side connection may be used by multiple clients, until the owner (the client side connection) of this session dies. Then new connections may be established.

The new keyword is http-reuse and have different level of sharing connections:

  • never: no connections are shared
  • safe: first request of each client is sent over its own connection, subsequent request may used an other connection. It works like a regular HTTP keepalive
  • aggressive: send request to connections that has proved reliably support connection reuse (no quick connection close after a response has been sent)
  • always: send request to established connections, whatever happens. If the server was closing the connection in the mean time, the request is lost and the client must resend it.

Get rid of 408 in logs


Simply use the new option

option http-ignore-probes

.

Web application name to backend mapping in HAProxy

Synopsis

Let’s take a web application platform where many HTTP Host header points to.
Of course, this platform hosts many backends and HAProxy is used to perform content switching based on the Host header to route HTTP traffic to each backend.

HAProxy map


HAProxy 1.5 introduced a cool feature: converters. One converter type is map.
Long story made short: a map allows to map a data in input to an other one on output.

A map is stored in a flat file which is loaded by HAProxy on startup. It is composed by 2 columns, on the left the input string, on the right the output one:

in out

Basically, if you call the map above and give it the in strings, it will return out.

Mapping

Now, the interesting part of the article 🙂

As stated in introduction, we want to map hundreds of Host headers to tens of backends.

The old way of mapping: acl and use_backend rules

Before the map, we had to use acls and use_backend rules.

like below:

frontend ft_allapps
 [...]
 use_backend bk_app1 if { hdr(Host) -i app1.domain1.com app1.domain2.com }
 use_backend bk_app2 if { hdr(Host) -i app2.domain1.com app2.domain2.com }
 default_backend bk_default

Add one statement per use_backend rule.

This works nicely for a few backends and a few domain names. But this type of configuration is hardly scallable…

The new way of mapping: one map and one use_backend rule

Now we can use map to achieve the same purpose.

First, let’s create a map file called domain2backend.map, with the following content: on the left, the domain name, on the right, the backend name:

#domainname  backendname
app1.domain1.com bk_app1
app1.domain2.com bk_app1
app2.domain1.com bk_app2
app2.domain2.com bk_app2

And now, HAProxy configuration:

frontend ft_allapps
 [...]
 use_backend %[req.hdr(host),lower,map_dom(/etc/hapee-1.5/domain2backend.map,bk_default)]

Here is what HAProxy will do:

  1. req.hdr(host) ==> fetch the Host header from the HTTP request
  2. lower ==> convert the string into lowercase
  3. map_dom(/etc/hapee-1.5/domain2backend.map) ==> look for the lowercase Host header in the map and return the backend name if found. If not found, the name of a default backend is returned
  4. route traffic to the backend name returned by the map

Now, adding a new content switching rule means just add one new line in the map content (and reload HAProxy). No regexes, map data is stored in a tree, so processing time is very low compared to matching many string in many ACLs for many use_backend rules.

simple is beautiful!!!

HAProxy map content auto update


If you are an HAPEE user (and soon available for the ALOHA), you can use the lb-update content to download the content of the map automatically.
Add the following statement in your configuration:

dynamic-update
 update id domain2backend.map url https://10.0.0.1/domain2backend.map delay 60s timeout 5s retries 3 map

Links

HAProxy, high mysql request rate and TCP source port exhaustion

Synopsys


At HAProxy Technologies, we do provide professional services around HAPRoxy: this includes HAProxy itself, of course, but as well the underlying OS tuning, advice and recommendation about the architecture and sometimes we can also help customers troubleshooting application layer issues.
We don’t fix issues for the customer, but using information provided by HAProxy, we are able to reduce the investigation area, saving customer’s time and money.
The story I’m relating today is issued of one of this PS.

One of our customer is an hosting company which hosts some very busy PHP / MySQL websites. They used successfully HAProxy in front of their application servers.
They used to have a single MySQL server which was some kind of SPOF and which had to handle several thousands requests per seconds.
Sometimes, they had issues with this DB: it was like the clients (hence the Web servers) can’t hangs when using the DB.

So they decided to use MySQL replication and build an active/passive cluster. They also decided to split reads (SELECT queries) and writes (DELETE, INSERT, UPDATE queries) at the application level.
Then they were able to move the MySQL servers behind HAProxy.

Enough for the introduction 🙂 Today’s article will discuss about HAProxy and MySQL at high request rate, and an error some of you may already have encountered: TCP source port exhaustion (the famous high number of sockets in TIME_WAIT).

Diagram


So basically, we have here a standard web platform which involves HAProxy to load-balance MySQL:
haproxy_mysql_replication

The MySQL Master server is used to send WRITE requests and the READ request are “weighted-ly” load-balanced (the slaves have a higher weight than the master) against all the MySQL servers.

MySql scalability

One way of scaling MySQL, is to use the replication method: one MySQL server is designed as master and must manages all the write operations (DELETE, INSERT, UPDATE, etc…). for each operation, it notifies all the MySQL slave servers. We can use slaves for reading only, offloading these types of requests from the master.
IMPORTANT NOTE: The replication method allows scalability of the read part, so if your application require much more writes, then this is not the method for you.

Of course, one MySQL slave server can be designed as master when the master fails! This also ensure MySQL high-availability.

So, where is the problem ???

This type of platform works very well for the majority of websites. The problem occurs when you start having a high rate of requests. By high, I mean several thousands per second.

TCP source port exhaustion

HAProxy works as a reverse-proxy and so uses its own IP address to get connected to the server.
Any system has around 64K TCP source ports available to get connected to a remote IP:port. Once a combination of “source IP:port => dst IP:port” is in use, it can’t be re-used.
First lesson: you can’t have more than 64K opened connections from a HAProxy box to a single remote IP:port couple. I think only people load-balancing MS Exchange RPC services or sharepoint with NTLM may one day reach this limit…
(well, it is possible to workaround this limit using some hacks we’ll explain later in this article)

Why does TCP port exhaustion occur with MySQL clients???


As I said, the MySQL request rate was a few thousands per second, so we never ever reach this limit of 64K simultaneous opened connections to the remote service…
What’s up then???
Well, there is an issue with MySQL client library: when a client sends its “QUIT” sequence, it performs a few internal operations before immediately shutting down the TCP connection, without waiting for the server to do it. A basic tcpdump will show it to you easily.
Note that you won’t be able to reproduce this issue on a loopback interface, because the server answers fast enough… You must use a LAN connection and 2 different servers.

Basically, here is the sequence currently performed by a MySQL client:

Mysql Client ==> "QUIT" sequence ==> Mysql Server
Mysql Client ==>       FIN       ==> MySQL Server
Mysql Client <==     FIN ACK     <== MySQL Server
Mysql Client ==>       ACK       ==> MySQL Server

Which leads the client connection to remain unavailable for twice the MSL (Maximum Segment Life) time, which means 2 minutes.
Note: this type of close has no negative impact when the connection is made over a UNIX socket.

Explication of the issue (much better that I could explain it myself):
“There is no way for the person who sent the first FIN to get an ACK back for that last ACK. You might want to reread that now. The person that initially closed the connection enters the TIME_WAIT state; in case the other person didn’t really get the ACK and thinks the connection is still open. Typically, this lasts one to two minutes.” (Source)

Since the source port is unavailable for the system for 2 minutes, this means that over 534 MySQL requests per seconds you’re in danger of TCP source port exhaustion: 64000 (available ports) / 120 (number of seconds in 2 minutes) = 533.333.
This TCP port exhaustion appears on the MySQL client server itself, but as well on the HAProxy box because it forwards the client traffic to the server… And since we have many web servers, it happens much faster on the HAProxy box !!!!

Remember: at spike traffic, my customer had a few thousands requests/s….

How to avoid TCP source port exhaustion?


Here comes THE question!!!!
First, a “clean” sequence should be:

Mysql Client ==> "QUIT" sequence ==> Mysql Server
Mysql Client <==       FIN       <== MySQL Server
Mysql Client ==>     FIN ACK     ==> MySQL Server
Mysql Client <==       ACK       <== MySQL Server

Actually, this sequence happens when both MySQL client and server are hosted on the same box and uses the loopback interface, that’s why I said sooner that if you want to reproduce the issue you must add “latency” between the client and the server and so use 2 boxes over the LAN.
So, until MySQL rewrite the code to follow the sequence above, there won’t be any improvement here!!!!

Increasing source port range


By default, on a Linux box, you have around 28K source ports available (for a single destination IP:port):

$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768    61000

In order to get 64K source ports, just run:

$ sudo sysctl net.ipv4.ip_local_port_range="1025 65000"

And don’t forget to update your /etc/sysctl.conf file!!!

Note: this should definitively be applied also on the web servers….

Allow usage of source port in TIME_WAIT


A few sysctls can be used to tell the kernel to reuse faster the connection in TIME_WAIT:

net.ipv4.tcp_tw_reuse
net.ipv4.tcp_tw_recycle

tw_reuse can be used safely, be but careful with tw_recycle… It could have side effects. Same people behind a NAT might be able to get connected on the same device. So only use if your HAProxy is fully dedicated to your MySql setup.

anyway, these sysctls were already properly setup (value = 1) on both HAProxy and web servers.

Note: this should definitively be applied also on the web servers….
Note 2: tw_reuse should definitively be applied also on the web servers….

Using multiple IPs to get connected to a single server


In HAProxy configuration, you can precise on the server line the source IP address to use to get connected to a server, so just add more server lines with different IPs.
In the example below, the IPs 10.0.0.100 and 10.0.0.101 are configured on the HAProxy box:

[...]
  server mysql1     10.0.0.1:3306 check source 10.0.0.100
  server mysql1_bis 10.0.0.1:3306 check source 10.0.0.101
[...]

This allows us to open up to 128K source TCP port…
The kernel is responsible to affect a new TCP port when HAProxy requests it. Dispite improving things a bit, we still reach some source port exhaustion… We could not get over 80K connections in TIME_WAIT with 4 source IPs…

Let HAProxy manage TCP source ports


You can let HAProxy decides which source port to use when opening a new TCP connection, on behalf of the kernel. To address this topic, HAProxy has built-in functions which make it more efficient than a regular kernel.

Let’s update the configuration above:

[...]
  server mysql1     10.0.0.1:3306 check source 10.0.0.100:1025-65000
  server mysql1_bis 10.0.0.1:3306 check source 10.0.0.101:1025-65000
[...]

We managed to get 170K+ connections in TIME_WAIT with 4 source IPs… and not source port exhaustion anymore !!!!

Use a memcache


Fortunately, the devs from this customer are skilled and write flexible code 🙂
So they managed to move some requests from the MySQL DB to a memcache, opening much less connections.

Use MySQL persistant connections


This could prevent fine Load-Balancing on the Read-Only farm, but it would be very efficient on the MySQL master server.

Conclusion

  • If you see some SOCKERR information messages in HAProxy logs (mainly on health check), you may be running out of TCP source ports.
  • Have skilled developers who write flexible code, where moving from a DB to an other is made easy
  • This kind of issue can happen only with protocols or applications which make the client closing the connection first
  • This issue can’t happen on HAProxy in HTTP mode, since it let the server closes the connection before sending a TCP RST

Links

Websockets load-balancing with HAProxy

Why Websocket ???


HTTP protocol is connection-less and only the client can request information from a server. In any case, a server can contact a client… HTTP is purely half-duplex. Furthermore, a server can answer only one time to a client request.
Some websites or web applications require the server to update client from time to time. There were a few ways to do so:

  • the client request the server at a regular interval to check if there is a new information available
  • the client send a request to the server and the server answers as soon as he has an information to provide to the client (also known as long time polling)

But those methods have many drawbacks due to HTTP limitation.
So a new protocol has been designed: websockets, which allows a 2 ways communication (full duplex) between a client and a server, over a single TCP connection. Furthermore, websockets re-use the HTTP connection it was initialized on, which means it uses the standard TCP port.

How does websocket work ???

Basically, a websocket start with a HTTP request like the one below:

GET / HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: avkFOZvLE0gZTtEyrZPolA==
Host: localhost:8080
Sec-WebSocket-Protocol: echo-protocol

The most important part is the “Connection: Upgrade” header which let the client know to the server it wants to change to an other protocol, whose name is provided by “Upgrade: websocket” header.

When a server with websocket capability receive the request above, it would answer a response like below:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: tD0l5WXr+s0lqKRayF9ABifcpzY=
Sec-WebSocket-Protocol: echo-protocol

The most important part is the status code 101 which acknoledge the protocol switch (from HTTP to websocket) as well as the “Connection: Upgrade” and “Upgrade: websocket” headers.

From now, the TCP connection used for the HTTP request/response challenge is used for the websocket: whenever a peer wants to interact with the other peer, it can use the it.

The socket finishes as soon as one peer decides it or the TCP connection is closed.

HAProxy and Websockets


As seen above, there are 2 protocols embedded in websockets:

  1. HTTP: for the websocket setup
  2. TCP: websocket data exchange

HAProxy must be able to support websockets on these two protocols without breaking the TCP connection at any time.
There are 2 things to take care of:

  1. being able to switch a connection from HTTP to TCP without breaking it
  2. smartly manage timeouts for both protocols at the same time

Fortunately, HAProxy embeds all you need to load-balance properly websockets and can meet the 2 requirements above.
It can even route regular HTTP traffic from websocket traffic to different backends and perform websocket aware health check (setup phase only).

The diagram below shows how things happens and HAProxy timeouts involved in each phase:
diagram websocket

During the setup phase, HAProxy can work in HTTP mode, processing layer 7 information. It detects automatically the Connection: Upgrade exchange and is ready to switch to tunnel mode if the upgrade negotiation succeeds. During this phase, there are 3 timeouts involved:

  1. timeout client: client inactivity
  2. timeout connect: allowed TCP connection establishment time
  3. timeout server: allowed time to the server to process the request

If everything goes well, the websocket is established, then HAProxy fails over to tunnel mode, no data is analyzed anymore (and anyway, websocket does not speak HTTP). There is a single timeout invloved:

  1. timeout tunnel: take precedence over client and server timeout

timeout connect is not used since the TCP connection is already established 🙂

Testing websocket with node.js

node.js is a platform which can host applications. It owns a websocket module we’ll use in the test below.

Here is the procedure to install node.js and the websocket module on Debian Squeeze.
Example code is issued from https://github.com/Worlize/WebSocket-Node, at the bottom of the page.

So basically, I’ll have 2 servers, each one hosting web pages on Apache and an echo application on websocket application hosted by nodejs. High-availability and routing is managed by HAProxy.

Configuration


Simple configuration


In this configuration, the websocket and the web server are on the same application.
HAProxy switches automatically from HTTP to tunnel mode when the client request a websocket.

defaults
  mode http
  log global
  option httplog
  option  http-server-close
  option  dontlognull
  option  redispatch
  option  contstats
  retries 3
  backlog 10000
  timeout client          25s
  timeout connect          5s
  timeout server          25s
# timeout tunnel available in ALOHA 5.5 or HAProxy 1.5-dev10 and higher
  timeout tunnel        3600s
  timeout http-keep-alive  1s
  timeout http-request    15s
  timeout queue           30s
  timeout tarpit          60s
  default-server inter 3s rise 2 fall 3
  option forwardfor

frontend ft_web
  bind 192.168.10.3:80 name http
  maxconn 10000
  default_backend bk_web

backend bk_web                      
  balance roundrobin
  server websrv1 192.168.10.11:8080 maxconn 10000 weight 10 cookie websrv1 check
  server websrv2 192.168.10.12:8080 maxconn 10000 weight 10 cookie websrv2 check

Advanced Configuration


The configuration below allows to route requests based on either Host header (if you have a dedicated host for your websocket calls) or Connection and Upgrade header (required to switch to websocket).
In the backend dedicated to websocket, HAProxy validates the setup phase and also ensure the user is requesting a right application name.
HAProxy also performs a websocket health check, sending a Connection upgrade request and expecting a 101 response status code. We can’t go further for now on the health check for now.
Optional: the web server is hosted on Apache, but could be hosted by node.js as well.

defaults
  mode http
  log global
  option httplog
  option  http-server-close
  option  dontlognull
  option  redispatch
  option  contstats
  retries 3
  backlog 10000
  timeout client          25s
  timeout connect          5s
  timeout server          25s
# timeout tunnel available in ALOHA 5.5 or HAProxy 1.5-dev10 and higher
  timeout tunnel        3600s
  timeout http-keep-alive  1s
  timeout http-request    15s
  timeout queue           30s
  timeout tarpit          60s
  default-server inter 3s rise 2 fall 3
  option forwardfor



frontend ft_web
  bind 192.168.10.3:80 name http
  maxconn 60000

## routing based on Host header
  acl host_ws hdr_beg(Host) -i ws.
  use_backend bk_ws if host_ws

## routing based on websocket protocol header
  acl hdr_connection_upgrade hdr(Connection)  -i upgrade
  acl hdr_upgrade_websocket  hdr(Upgrade)     -i websocket

  use_backend bk_ws if hdr_connection_upgrade hdr_upgrade_websocket
  default_backend bk_web



backend bk_web                                                   
  balance roundrobin                                             
  option httpchk HEAD /                                          
  server websrv1 192.168.10.11:80 maxconn 100 weight 10 cookie websrv1 check
  server websrv2 192.168.10.12:80 maxconn 100 weight 10 cookie websrv2 check



backend bk_ws                                                    
  balance roundrobin

## websocket protocol validation
  acl hdr_connection_upgrade hdr(Connection)                 -i upgrade
  acl hdr_upgrade_websocket  hdr(Upgrade)                    -i websocket
  acl hdr_websocket_key      hdr_cnt(Sec-WebSocket-Key)      eq 1
  acl hdr_websocket_version  hdr_cnt(Sec-WebSocket-Version)  eq 1
  http-request deny if ! hdr_connection_upgrade ! hdr_upgrade_websocket ! hdr_w
ebsocket_key ! hdr_websocket_version

## ensure our application protocol name is valid 
## (don't forget to update the list each time you publish new applications)
  acl ws_valid_protocol hdr(Sec-WebSocket-Protocol) echo-protocol
  http-request deny if ! ws_valid_protocol

## websocket health checking
  option httpchk GET / HTTP/1.1rnHost:\ ws.domain.comrnConnection:\ Upgrade\r\nUpgrade:\ websocket\r\nSec-WebSocket-Key:\ haproxy\r\nSec-WebSocket-Version:\ 13\r\nSec-WebSocket-Protocol:\ echo-protocol
  http-check expect status 101

  server websrv1 192.168.10.11:8080 maxconn 30000 weight 10 cookie websrv1 check
  server websrv2 192.168.10.12:8080 maxconn 30000 weight 10 cookie websrv2 check

Note that HAProxy could also be used to select different Websocket application based on the Sec-WebSocket-Protocol header of the setup phase. (later I’ll write the article about it).

Related links

Links

high performance WAF platform with Naxsi and HAProxy

Synopsis

I’ve already described WAF in a previous article, where I spoke about WAF scalability with apache and modsecurity.
One of the main issue with Apache and modsecurity is the performance. To address this issue, an alternative exists: naxsi, a Web Application Firewall module for nginx.

So using Naxsi and HAProxy as a load-balancer, we’re able to build a platform which meets the following requirements:

  • Web Application Firewall: achieved by Apache and modsecurity
  • High-availability: application server and WAF monitoring, achieved by HAProxy
  • Scalability: ability to adapt capacity to the upcoming volume of traffic, achieved by HAProxy
  • DDOS protection: blind and brutal attacks protection, slowloris protection, achieved by HAProxy
  • Content-Switching: ability to route only dynamic requests to the WAF, achieved by HAProxy
  • Reliability: ability to detect capacity overusage, this is achieved by HAProxy
  • Performance: deliver response as fast as possible, achieved by the whole platform

The picture below provides a better overview:

The LAB platform is composed by 6 boxes:

  • 2 ALOHA Load-Balancers (could be replaced by HAProxy 1.5-dev)
  • 2 WAF servers: CentOS 6.0, nginx and Naxsi
  • 2 Web servers: Debian + apache + PHP + dokuwiki

Nginx and Naxsi installation on CentOS 6

Purpose of this article is not to provide such procedue. So please read this wiki article which summarizes how to install nginx and naxsi on CentOS 6.0.

Diagram

The diagram below shows the platform with HAProxy frontends (prefixed by ft_) and backends (prefixed by bk_). Each farm is composed by 2 servers.

Configuration

Nginx and Naxsi


Configure nginx as a reverse-proxy which listen in bk_waf and forward traffic to ft_web. In the mean time, naxsi is there to analyze the requests.

server {
 proxy_set_header Proxy-Connection "";
 listen       192.168.10.15:81;
 access_log  /var/log/nginx/naxsi_access.log;
 error_log  /var/log/nginx/naxsi_error.log debug;

 location / {
  include    /etc/nginx/test.rules;
  proxy_pass http://192.168.10.2:81/;
 }

 error_page 403 /403.html;
 location = /403.html {
  root /opt/nginx/html;
  internal;
 }

 location /RequestDenied {
  return 403;
 }
}

HAProxy Load-Balancer configuration


The configuration below allows the following advanced features:

  • DDOS protection on the frontend
  • abuser or attacker detection in bk_waf and blocking on the public interface (ft_waf)
  • Bypassing WAF when overusage or unavailable
######## Default values for all entries till next defaults section
defaults
  option  http-server-close
  option  dontlognull
  option  redispatch
  option  contstats
  retries 3
  timeout connect 5s
  timeout http-keep-alive 1s
  # Slowloris protection
  timeout http-request 15s
  timeout queue 30s
  timeout tarpit 1m          # tarpit hold tim
  backlog 10000

# public frontend where users get connected to
frontend ft_waf
  bind 192.168.10.2:80 name http
  mode http
  log global
  option httplog
  timeout client 25s
  maxconn 10000

  # DDOS protection
  # Use General Purpose Couter (gpc) 0 in SC1 as a global abuse counter
  # Monitors the number of request sent by an IP over a period of 10 seconds
  stick-table type ip size 1m expire 1m store gpc0,http_req_rate(10s),http_err_rate(10s)
  tcp-request connection track-sc1 src
  tcp-request connection reject if { sc1_get_gpc0 gt 0 }
  # Abuser means more than 100reqs/10s
  acl abuse sc1_http_req_rate(ft_web) ge 100
  acl flag_abuser sc1_inc_gpc0(ft_web)
  tcp-request content reject if abuse flag_abuser

  acl static path_beg /static/ /dokuwiki/images/
  acl no_waf nbsrv(bk_waf) eq 0
  acl waf_max_capacity queue(bk_waf) ge 1
  # bypass WAF farm if no WAF available
  use_backend bk_web if no_waf
  # bypass WAF farm if it reaches its capacity
  use_backend bk_web if static waf_max_capacity
  default_backend bk_waf

# WAF farm where users' traffic is routed first
backend bk_waf
  balance roundrobin
  mode http
  log global
  option httplog
  option forwardfor header X-Client-IP
  option httpchk HEAD /waf_health_check HTTP/1.0

  # If the source IP generated 10 or more http request over the defined period,
  # flag the IP as abuser on the frontend
  acl abuse sc1_http_err_rate(ft_waf) ge 10
  acl flag_abuser sc1_inc_gpc0(ft_waf)
  tcp-request content reject if abuse flag_abuser

  # Specific WAF checking: a DENY means everything is OK
  http-check expect status 403
  timeout server 25s
  default-server inter 3s rise 2 fall 3
  server waf1 192.168.10.15:81 maxconn 100 weight 10 check
  server waf2 192.168.10.16:81 maxconn 100 weight 10 check

# Traffic secured by the WAF arrives here
frontend ft_web
  bind 192.168.10.2:81 name http
  mode http
  log global
  option httplog
  timeout client 25s
  maxconn 1000
  # route health check requests to a specific backend to avoid graph pollution in ALOHA GUI
  use_backend bk_waf_health_check if { path /waf_health_check }
  default_backend bk_web

# application server farm
backend bk_web
  balance roundrobin
  mode http
  log global
  option httplog
  option forwardfor
  cookie SERVERID insert indirect nocache
  default-server inter 3s rise 2 fall 3
  option httpchk HEAD /
  # get connected on the application server using the user ip
  # provided in the X-Client-IP header setup by ft_waf frontend
  source 0.0.0.0 usesrc hdr_ip(X-Client-IP)
  timeout server 25s
  server server1 192.168.10.11:80 maxconn 100 weight 10 cookie server1 check
  server server2 192.168.10.12:80 maxconn 100 weight 10 cookie server2 check

# backend dedicated to WAF checking (to avoid graph pollution)
backend bk_waf_health_check
  balance roundrobin
  mode http
  log global
  option httplog
  option forwardfor
  default-server inter 3s rise 2 fall 3
  timeout server 25s
  server server1 192.168.10.11:80 maxconn 100 weight 10 check
  server server2 192.168.10.12:80 maxconn 100 weight 10 check

Detecting attacks


On the load-balancer


The ft_waf frontend stick table tracks two information: http_req_rate and http_err_rate which are respectively the http request rate and the http error rate generated by a single IP address.
HAProxy would automatically block an IP which has generated more than 100 requests over a period of 10s or 10 errors (WAF detection 403 responses included) in 10s. The user is blocked for 1 minute as long as he keeps on abusing.
Of course, you can setup above values to whatever you need: it is fully flexible.

To know the status of IPs in your load-balancer, just run the command below:

echo show table ft_waf | socat /var/run/haproxy.stat - 
# table: ft_waf, type: ip, size:1048576, used:1
0xc33304: key=192.168.10.254 use=0 exp=4555 gpc0=0 http_req_rate(10000)=1 http_err_rate(10000)=1

Note: The ALOHA Load-balancer does not provide watch tool, but you can monitor the content of the table in live with the command below:

while true ; do echo show table ft_waf | socat /var/run/haproxy.stat - ; sleep 2 ; clear ; done

On the Waf


Every Naxsi error log appears in /var/log/nginx/naxsi_error.log. IE:

2012/10/16 13:40:13 [error] 10556#0: *10293 NAXSI_FMT: ip=192.168.10.254&server=192.168.10.15&uri=/testphp.vulnweb.com/artists.php&total_processed=3195&total_blocked=2&zone0=ARGS&id0=1000&var_name0=artist, client: 192.168.10.254, server: , request: "GET /testphp.vulnweb.com/artists.php?artist=0+div+1+union%23foo*%2F*bar%0D%0Aselect%23foo%0D%0A1%2C2%2Ccurrent_user HTTP/1.1", host: "192.168.10.15:81"

Naxsi log line is less obvious than modsecurity one. The rule which matched os provided by the argument idX=abcde.
No false positive during the test, I had to build a request to make Naxsi match it 🙂 .

conclusion


Today, we saw it’s easy to build a scalable and performing WAF platform in front of any web application.
The WAF is able to communicate to HAProxy which IPs to automatically blacklist (throuth error rate monitoring), which is convenient since the attacker won’t bother the WAF for a certain amount of time 😉
The platform allows to detect WAF farm availability and to bypass it in case of total failure, we even saw it is possible to bypass the WAF for static content if the farm is running out of capacity. Purpose is to deliver a good end-user experience without dropping too much the security.
Note that it is possible to route all the static content to the web servers (or a static farm) directly, whatever the status of the WAF farm.
This make me say that the platform is fully scallable and flexible.
Thanks to HAProxy, the architecture is very flexible: I could switch my apache + modexurity to nginx + naxsi with no issues at all 🙂 This could be done as well for any third party waf appliances.
Note that I did not try any naxsi advanced features like learning mode and the UI as well.

Related links

Links