How to write Apache ProxyPass* rules in HAProxy

Apache mod_proxy

The Apache web server is a widely deployed modular web server.
One of its modules is called mod_proxy. It aims to turn the web server into a proxy / reverse proxy server with load-balancing capabilities.

At HAProxy Technologies, we only use HAProxy :). Heh, what else?
During some deployments, customers ask us to migrate their Apache mod_proxy configuration to HAProxy.
This article explains how to translate ProxyPass-related rules.

ProxyPass, ProxyPassReverse, etc…

Apache mod_proxy defines a few directives which let it forward traffic to a remote server.
They are listed below with a short description.

Note: I know the directives introduced in this article can be used in much more complicated ways; only the simple cases are covered here.

ProxyPass

ProxyPass maps local server URLs to URLs on a remote server.
It applies to traffic from the client to the server.

For example:

ProxyPass /mirror/foo/ http://backend.example.com/

With this rule, the external URL http://example.com/mirror/foo/bar is translated and forwarded to the remote server as http://backend.example.com/bar

In other words, Apache rewrites the request URL and headers so that external traffic matches what the internal server expects.

ProxyPassReverse

ProxyPassReverse adjusts the URL in HTTP response headers sent by a reverse-proxied server. It only updates the Location, Content-Location and URI headers.
It applies to traffic from the server to the client.

For example:

ProxyPassReverse /mirror/foo/ http://backend.example.com/

With this directive, Apache rewrites the internal URLs found in those response headers so that they match the external URLs.
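To make both directions concrete, here is a minimal Python sketch of the prefix mapping that ProxyPass (requests) and ProxyPassReverse (Location-like response headers) perform with the example above. It is a simplified model, not Apache code: PUBLIC_ORIGIN and the function names are assumptions for illustration, and real mod_proxy handles many more cases.

```python
from typing import Optional

# Hypothetical values taken from the Apache example above:
#   ProxyPass        /mirror/foo/ http://backend.example.com/
#   ProxyPassReverse /mirror/foo/ http://backend.example.com/
PUBLIC_ORIGIN = "http://example.com"  # assumed public name of the proxy
LOCAL_PREFIX = "/mirror/foo/"
REMOTE_PREFIX = "http://backend.example.com/"


def proxy_pass(path: str) -> Optional[str]:
    """Client -> server: map an external path to the internal URL."""
    if not path.startswith(LOCAL_PREFIX):
        return None  # this ProxyPass rule does not handle the request
    return REMOTE_PREFIX + path[len(LOCAL_PREFIX):]


def proxy_pass_reverse(location: str) -> str:
    """Server -> client: map an internal URL from a Location-like header back to the public URL."""
    if location.startswith(REMOTE_PREFIX):
        return PUBLIC_ORIGIN + LOCAL_PREFIX + location[len(REMOTE_PREFIX):]
    return location  # URLs pointing elsewhere are left untouched


if __name__ == "__main__":
    print(proxy_pass("/mirror/foo/bar"))                          # http://backend.example.com/bar
    print(proxy_pass_reverse("http://backend.example.com/quux"))  # http://example.com/mirror/foo/quux
```

The two functions are inverses of each other on the mapped prefix, which is why ProxyPass and ProxyPassReverse are usually configured with the same arguments.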

ProxyPassReverseCookieDomain

ProxyPassReverseCookieDomain adjusts the Domain attribute of the Set-Cookie header sent by the server to match the external domain name.

Its usage is pretty simple. For example:

ProxyPassReverseCookieDomain internal-domain public-domain

ProxyPassReverseCookiePath

ProxyPassReverseCookiePath adjusts the Path attribute of the Set-Cookie header sent by the server to match the external path.

Its usage is pretty simple. For example:

ProxyPassReverseCookiePath internal-path public-path
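Both cookie directives boil down to rewriting one attribute of the Set-Cookie header. The Python sketch below is a simplified, hypothetical model (exact match for the domain, prefix match for the path), using the bk.dom.com / dom.com names from the HAProxy example later in this article; it is not how Apache implements it.

```python
import re


def reverse_cookie_domain(set_cookie: str, internal: str, public: str) -> str:
    """Replace Domain=<internal> by Domain=<public> (exact, case-insensitive match)."""
    def repl(m):
        value = m.group(2)
        return m.group(1) + (public if value.lower() == internal.lower() else value)
    return re.sub(r"(;\s*domain=)([^;]*)", repl, set_cookie, flags=re.IGNORECASE)


def reverse_cookie_path(set_cookie: str, internal: str, public: str) -> str:
    """Replace a cookie path starting with <internal> by <public> + the rest of the path."""
    def repl(m):
        value = m.group(2)
        if value.startswith(internal):
            value = public + value[len(internal):]
        return m.group(1) + value
    return re.sub(r"(;\s*path=)([^;]*)", repl, set_cookie, flags=re.IGNORECASE)


if __name__ == "__main__":
    cookie = "SESSION=abc; Domain=bk.dom.com; Path=/; HttpOnly"
    cookie = reverse_cookie_domain(cookie, "bk.dom.com", "dom.com")
    cookie = reverse_cookie_path(cookie, "/", "/mirror/foo/")
    print(cookie)  # SESSION=abc; Domain=dom.com; Path=/mirror/foo/; HttpOnly
```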

Configure ProxyPass and ProxyPassReverse in HAProxy

Below is an example HAProxy configuration that makes HAProxy behave like the Apache ProxyPass and ProxyPassReverse directives. The rewrite rules go in the backend section, while the frontend ensures that only traffic matching the external URL is routed to that backend.

frontend ft_global
 acl host_dom.com    req.hdr(Host) dom.com
 acl path_mirror_foo path -m beg   /mirror/foo/
 use_backend bk_myapp if host_dom.com path_mirror_foo

backend bk_myapp
[...]
# external URL                  => internal URL
# http://dom.com/mirror/foo/bar => http://bk.dom.com/bar

 # ProxyPass /mirror/foo/ http://bk.dom.com/
 http-request set-header Host bk.dom.com
 # strip the /mirror/foo prefix from the request line (method is kept in \1)
 reqirep  ^([^\ :]*)\ /mirror/foo/(.*)     \1\ /\2

 # ProxyPassReverse /mirror/foo/ http://bk.dom.com/
 # Note: absolute URLs are turned into absolute paths at the same time
 acl hdr_location res.hdr(Location) -m found
 rspirep ^Location:\ (https?://bk\.dom\.com(:[0-9]+)?)?(/.*) Location:\ /mirror/foo\3 if hdr_location

 # ProxyPassReverseCookieDomain bk.dom.com dom.com
 acl hdr_set_cookie_dom res.hdr(Set-cookie) -m sub Domain=bk.dom.com
 rspirep ^(Set-Cookie:.*)\ Domain=bk\.dom\.com(.*) \1\ Domain=dom.com\2 if hdr_set_cookie_dom

 # ProxyPassReverseCookiePath / /mirror/foo/
 acl hdr_set_cookie_path res.hdr(Set-cookie) -m sub Path=
 rspirep ^(Set-Cookie:.*)\ Path=(.*) \1\ Path=/mirror/foo\2 if hdr_set_cookie_path
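Regexes like these are easy to get wrong, so it can help to try them offline first. The sketch below replays the same patterns with Python's re module. This is only an approximation, since HAProxy uses its own regex library (PCRE or libc), but these patterns use no engine-specific syntax; the ACL guards are not modeled and the names are hypothetical. Note also that reqirep and rspirep belong to older HAProxy versions: they were removed in HAProxy 2.1, where such rewrites are written with http-request and http-response actions (for example http-response replace-header), so check the manual of your version.

```python
import re

# Python equivalents of the reqirep/rspirep rules in bk_myapp above.
# HAProxy's escaped space "\ " is a plain space here, and the "i" in
# reqirep/rspirep means case-insensitive matching, hence re.I.
REQUEST_RULES = [
    (re.compile(r"^([^ :]*) /mirror/foo/(.*)", re.I), r"\1 /\2"),
]
RESPONSE_RULES = [
    (re.compile(r"^Location: (https?://bk\.dom\.com(:[0-9]+)?)?(/.*)", re.I),
     r"Location: /mirror/foo\3"),
    (re.compile(r"^(Set-Cookie:.*) Domain=bk\.dom\.com(.*)", re.I), r"\1 Domain=dom.com\2"),
    (re.compile(r"^(Set-Cookie:.*) Path=(.*)", re.I), r"\1 Path=/mirror/foo\2"),
]


def apply_rules(line: str, rules) -> str:
    """Apply each rule in order; a matching line is replaced as a whole, as reqirep/rspirep do."""
    for regex, template in rules:
        m = regex.search(line)
        if m:
            line = m.expand(template)
    return line


if __name__ == "__main__":
    print(apply_rules("GET /mirror/foo/bar HTTP/1.1", REQUEST_RULES))
    print(apply_rules("Location: http://bk.dom.com:8080/login", RESPONSE_RULES))
    print(apply_rules("Set-Cookie: id=1; Domain=bk.dom.com; Path=/", RESPONSE_RULES))
```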

Notes:
  * HTTP to HTTPS redirect rules should be handled by HAProxy itself, not by the application server, to avoid redirect loops.


Asymmetric routing, multiple default gateways on Linux with HAProxy

Why might we need multiple default gateways?

Nowadays, Application Delivery Controllers (aka load-balancers) have become the entry point for all the applications hosted by a company or an administration.
That said, many different types of users may access these applications:
  * internal users from the LAN
  * partners through MPLS or VPNs
  * external users from internet

On the other side, applications could be hosted on different VLANs in the architecture:
  * internal LAN
  * external DMZ

The diagram below shows the “big picture” of this type of architecture:
[Diagram: multiple default gateways]

Routing in the Linux network stack

I’m not going to explain in depth how it works, sorry… It would deserve a complete blog post :)
That said, any device connected to an IP network needs an IP address to talk to other devices in its LAN. It also needs a default gateway to reach devices located outside its LAN.
The Linux kernel uses a single default gateway at a time, but thanks to metrics you can configure several default gateways.
When needed, the Linux kernel parses the default gateway table and uses the one with the lowest metric. By default, when no metric is configured, the kernel assigns metric 0.
Each default gateway must therefore use a unique metric in your kernel IP stack.
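As an illustration, the Python sketch below picks the preferred default route from ip route style output, such as the routing table shown later for the ALOHA. It is a deliberately simplified model with hypothetical function names: the kernel first selects the most specific route for the destination, and the metric only breaks the tie between routes with the same prefix, such as several default routes.

```python
from typing import List, Tuple

Route = Tuple[int, str, str]  # (metric, gateway, device)


def parse_default_routes(table: str) -> List[Route]:
    """Extract the default routes from `ip route` style output."""
    routes = []
    for line in table.splitlines():
        fields = line.split()
        if fields[:2] != ["default", "via"] or "dev" not in fields:
            continue  # not a default route
        gateway = fields[2]
        device = fields[fields.index("dev") + 1]
        # a route configured without a metric gets metric 0
        metric = int(fields[fields.index("metric") + 1]) if "metric" in fields else 0
        routes.append((metric, gateway, device))
    return routes


def preferred_default_route(table: str) -> Route:
    """The default route the kernel prefers: the one with the lowest metric."""
    return min(parse_default_routes(table))


if __name__ == "__main__":
    table = """default via 10.0.0.1 dev eth0
default via 10.0.1.1 dev eth1  metric 1
default via 10.0.2.1 dev eth2  metric 2"""
    print(preferred_default_route(table))  # (0, '10.0.0.1', 'eth0')
```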

How can HAProxy help in such a situation?


Users access applications through an HAProxy bind. The bind can listen on any IP address, whether or not that address is configured on the server (for the latter, enable the net.ipv4.ip_nonlocal_bind sysctl).
By default, traffic enters HAProxy through this bind and HAProxy lets the kernel choose the most appropriate default gateway to forward the response to the client. As we’ve seen above, the most appropriate default gateway from the kernel’s point of view is the one with the lowest metric, usually 0.

That said, HAProxy is smart enough to tell the kernel which network interface to use to forward the response to the client. Just add the statement interface ethX (where X is the id of the interface you want to use) on the HAProxy bind line.
With this parameter, HAProxy can force the kernel to use the default gateway associated with the network interface ethX if one exists; otherwise, the default gateway with the lowest metric is used.
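The small Python model below restates that decision for a reply leaving a given bind. It follows the behavior described above rather than kernel code; on Linux, the interface keyword binds the listening socket to the device (SO_BINDTODEVICE). The route list mirrors the ALOHA example further down, and all names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class DefaultRoute(NamedTuple):
    gateway: str
    device: str
    metric: int


# Default routes of the ALOHA example further down (eth0 has metric 0).
ROUTES = [DefaultRoute(f"10.0.{i}.1", f"eth{i}", i) for i in range(5)]


def reply_gateway(routes: List[DefaultRoute], bind_interface: Optional[str] = None) -> str:
    """Gateway used for replies sent from a bind, with or without `interface <dev>`."""
    if bind_interface is not None:
        on_interface = [r for r in routes if r.device == bind_interface]
        if on_interface:
            return min(on_interface, key=lambda r: r.metric).gateway
    # no interface keyword (or, as described above, no gateway on that
    # interface): the lowest metric wins
    return min(routes, key=lambda r: r.metric).gateway


if __name__ == "__main__":
    print(reply_gateway(ROUTES))          # 10.0.0.1
    print(reply_gateway(ROUTES, "eth2"))  # 10.0.2.1
```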

Security concern


From a security point of view, some security managers would say that it is absolutely insecure to plug a device into multiple DMZs or VLANs. They are right. But usually, companies with such requirements run business-critical applications and can afford one load-balancer per DMZ or LAN.
That said, the setup introduced here does not open a security breach. HAProxy is a reverse-proxy, so you don’t need to enable ip_forward between the interfaces for this solution to work.
In other words, nobody can use the load-balancer as a gateway to reach another subnet and bypass the firewall…
The only traffic allowed through is the load-balanced traffic!

Configuration

The configuration below applies to the ALOHA Load-Balancer. Just adapt it to your Linux distribution’s configuration syntax.
It matches the diagram above.

Network configuration


In your ALOHA, go to the Services tab, then edit the Network configuration.
To keep it simple, I’m not going to add any VRRP configuration.

service network eth0
    ########## eth0.
    auto on
    ip   address 10.0.0.2/24
    ip   route   default 10.0.0.1

service network eth1
    ########## eth1.
    auto on
    ip   address 10.0.1.2/24
    ip   route   default 10.0.1.1 metric 1

service network eth2
    ########## eth2.
    auto on
    ip   address 10.0.2.2/24
    ip   route   default 10.0.2.1 metric 2

service network eth3
    ########## eth3.
    auto on
    ip   address 10.0.3.2/24
    ip   route   default 10.0.3.1 metric 3

service network eth4
    ########## eth4.
    auto on
    ip   address 10.0.4.2/24
    ip   route   default 10.0.4.1 metric 4

The resulting routing table on the ALOHA looks like this:

default via 10.0.0.1 dev eth0
default via 10.0.1.1 dev eth1  metric 1
default via 10.0.2.1 dev eth2  metric 2
default via 10.0.3.1 dev eth3  metric 3
default via 10.0.4.1 dev eth4  metric 4

HAProxy configuration for Corporate website or ADFS proxies


These services are used by internet users only.

frontend ft_www
 bind 10.0.0.2:80
[...]

No need to specify any interface here: since the traffic comes from the internet, HAProxy can let the kernel use the default gateway pointing in that direction (here through eth0).

HAProxy configuration for Exchange 2010 or 2013


This service is used by both internal and internet users.

frontend ft_exchange
 bind 10.0.0.3:443
 bind 10.0.2.3:443 interface eth2
[...]

Responses to internet users go through eth0, while those to internal LAN users use the default gateway configured on eth2 (10.0.2.1).

HAProxy configuration for Sharepoint 2010 or 2013


This service is used by MPLS/VPN users and internal users.

frontend ft_sharepoint
 bind 10.0.1.4:443 interface eth1
 bind 10.0.2.4:443 interface eth2
[...]

Responses to MPLS/VPN users go through the eth1 default gateway (10.0.1.1), while those to internal LAN users use the default gateway configured on eth2 (10.0.2.1).


How to protect application cookies while offloading SSL

SSL offloading

SSL offloading or acceleration is often seen as a huge benefit for applications. People usually forget that it may have an impact on the application itself. Some time ago, I wrote a blog article which lists these impacts and proposes some solutions using HAProxy.

One thing I forgot to mention at that time was cookies.
You don’t want your clients to send their cookies (read: their identity) in clear text over the Internet.
This is the purpose of today’s article.

Actually, there is a cookie attribute called Secure which can be emitted by a server. When this attribute is set, the client SHOULD NOT send the cookie over a clear-text HTTP connection.

SSL offloading Diagram


Simple SSL offloading diagram:

|--------|              |---------|           |--------|
| client |  ==HTTPS==>  | HAProxy | --HTTP--> | Server |
|--------|              |---------|           |--------|

The client connects to HAProxy over HTTPS, and HAProxy connects to the application server over HTTP.

Even though HAProxy can tell the application server how the client is connected, the application server may not protect its cookies…
Fortunately, we can use HAProxy for this purpose.

How to make HAProxy protect application cookies when SSL offloading is enabled

That’s the question.

The answer is as simple as the configuration below:

acl https          ssl_fc
acl secured_cookie res.hdr(Set-Cookie),lower -m sub secure
rspirep ^(set-cookie:.*) \1;\ Secure if https !secured_cookie

The configuration above adds the Secure attribute when the application server has not set it and the client is browsing the application over an encrypted connection.
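For a single Set-Cookie value, the decision made by the rule above can be sketched in Python as follows (hypothetical function name). One deliberate difference: the secured_cookie ACL matches the substring secure anywhere, and it is true as soon as any Set-Cookie header of the response matches, whereas this sketch looks for an actual Secure attribute in each header.

```python
def secure_set_cookie(set_cookie: str, client_uses_https: bool) -> str:
    """Append '; Secure' to a Set-Cookie value, like the rspirep rule above.

    Unlike the `-m sub secure` ACL, only a real Secure attribute counts,
    so a cookie named e.g. SECURE_PREF is still patched.
    """
    attributes = [a.strip().lower() for a in set_cookie.split(";")[1:]]
    if client_uses_https and "secure" not in attributes:
        return set_cookie + "; Secure"
    return set_cookie


if __name__ == "__main__":
    print(secure_set_cookie("SID=42; Path=/", True))   # SID=42; Path=/; Secure
    print(secure_set_cookie("SID=42; Path=/", False))  # SID=42; Path=/
```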


Emulating active/passive application clustering with HAProxy

Synopsis

HAProxy is a load-balancer, this is a fact. It routes traffic to servers, primarily to ensure application reliability.

Most of the time, sessions are stored locally on a server, which means that if you want to split client traffic across multiple servers, you have to ensure each user is routed to the server which holds their session (if that server is available, of course). HAProxy can do this in many ways: we call it persistence.
Thanks to persistence, we usually say that any application can be load-balanced, which is true in 99% of cases. In very rare cases, the application can’t be load-balanced: there might be a lock somewhere in the code, or some other good reason…

In such a case, to ensure high availability, we build “active/passive” clusters, where only one node is active at a time.
HAProxy can be used in different ways to emulate an active/passive clustering mode, and this is the purpose of today’s article.

Bear in mind that by “active/passive”, I mean that 100% of the users must be forwarded to the same server. And if a failover occurs, they must all follow it at the same time!

Diagram

Let’s use one HAProxy with a couple of servers, s1 and s2.
When starting up, s1 is master and s2 is used as backup:

  -------------
  |  HAProxy  |
  -------------
   |         `
   |active    ` backup
   |           `
 ------       ------
 | s1 |       | s2 |
 ------       ------

Configuration

Automatic failover and failback

The configuration below makes HAProxy use s1 when it is available, and otherwise fail over to s2 if it is available:

defaults
 mode http
 option http-server-close
 timeout client 20s
 timeout server 20s
 timeout connect 4s

frontend ft_app
 bind 10.0.0.100:80 name app
 default_backend bk_app

backend bk_app
 server s1 10.0.0.1:80 check
 server s2 10.0.0.2:80 check backup

The most important keyword above is “backup” on the s2 configuration line.
Unfortunately, as soon as s1 comes back, all the traffic fails back to it, which can be acceptable for web applications, but not for active/passive clustering.

Automatic failover without failback

The configuration below makes HAProxy use s1 when it is available, and otherwise fail over to s2 if it is available.
Once a failover has occurred, no failback happens automatically, thanks to the stick table:

peers LB
 peer LB1 10.0.0.98:1234
 peer LB2 10.0.0.99:1234

defaults
 mode http
 option http-server-close
 timeout client 20s
 timeout server 20s
 timeout connect 4s

frontend ft_app
 bind 10.0.0.100:80 name app
 default_backend bk_app

backend bk_app
 stick-table type ip size 1 nopurge peers LB
 stick on dst
 server s1 10.0.0.1:80 check
 server s2 10.0.0.2:80 check backup

The stick table will maintain persistence based on destination IP address (10.0.0.100 in this case):

show table bk_app
# table: bk_app, type: ip, size:20480, used:1
0x869154: key=10.0.0.100 use=0 exp=0 server_id=1

With such a configuration, you can trigger a failback by disabling s2 for a few seconds.
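To see why the stick table prevents failback, here is a hypothetical Python model of bk_app: a one-entry table keyed on the destination IP, where the backup logic picks a server only when the stored server is not UP. It assumes, as the paragraph above implies, that the entry is overwritten when traffic moves to another server; peers synchronization between LB1 and LB2 is not modeled.

```python
from typing import Dict, List, Optional, Tuple


class StickyActivePassive:
    """Hypothetical model of bk_app with `stick on dst` (one entry, never purged)."""

    def __init__(self, servers: List[Tuple[str, bool]]):
        self.servers = servers                            # ordered (name, is_backup) pairs
        self.up: Dict[str, bool] = {name: True for name, _ in servers}
        self.table: Dict[str, str] = {}                   # destination IP -> server name

    def _load_balance(self) -> Optional[str]:
        # first UP regular server, else first UP backup server
        for want_backup in (False, True):
            for name, is_backup in self.servers:
                if is_backup == want_backup and self.up[name]:
                    return name
        return None

    def route(self, dst: str) -> Optional[str]:
        stuck = self.table.get(dst)
        if stuck is not None and self.up[stuck]:
            return stuck                                  # persistence wins, even over s1
        chosen = self._load_balance()
        if chosen is not None:
            self.table[dst] = chosen                      # store (or overwrite) the entry
        return chosen


if __name__ == "__main__":
    lb = StickyActivePassive([("s1", False), ("s2", True)])
    vip = "10.0.0.100"
    print(lb.route(vip))   # s1
    lb.up["s1"] = False
    print(lb.route(vip))   # s2 (failover)
    lb.up["s1"] = True
    print(lb.route(vip))   # still s2: no automatic failback
```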


HAProxy advanced Redis health check

Introduction

Redis is an open source NoSQL database based on a key/value model.
One interesting feature of Redis is that it can write data to disk, and a master can replicate its data to many slaves.

HAProxy can load-balance Redis servers with no issues at all.
There is even a built-in health check for Redis in HAProxy.
Unfortunately, there was no easy way for HAProxy to detect whether a Redis server is a master or a slave node. Hence people usually hack around this part of the architecture.

As written in the title of this post, today we’ll learn how to build a simple Redis infrastructure thanks to HAProxy’s newest advanced send/expect health checks.
This feature is available in HAProxy 1.5-dev20 and above.

The purpose is to keep the Redis infrastructure as simple as possible and to ease failover for the web servers. HAProxy has to detect which node is the MASTER and route all connections to it.

Redis high availability diagram with HAProxy

Below is an ASCII art diagram of HAProxy load-balancing Redis servers:

+----+ +----+ +----+ +----+
| W1 | | W2 | | W3 | | W4 |   Web application servers
+----+ +----+ +----+ +----+
    \     |   |     /
     \    |   |    /
      \   |   |   /
        +---------+
        | HAProxy |
        +---------+
           /   \
       +----+ +----+
       | R1 | | R2 |           Redis servers
       +----+ +----+

The scenario is simple:
  * 4 web application servers need to store and retrieve data to/from a Redis database
  * one HAProxy server (two is better) which load-balances Redis connections
  * at least 2 Redis servers in active/standby mode with replication

Configuration

Below is the HAProxy configuration for this setup:

defaults REDIS
 mode tcp
 timeout connect  4s
 timeout server  30s
 timeout client  30s

frontend ft_redis
 bind 10.0.0.1:6379 name redis
 default_backend bk_redis

backend bk_redis
 option tcp-check
 tcp-check send PING\r\n
 tcp-check expect string +PONG
 tcp-check send info\ replication\r\n
 tcp-check expect string role:master
 tcp-check send QUIT\r\n
 tcp-check expect string +OK
 server R1 10.0.0.11:6379 check inter 1s
 server R2 10.0.0.12:6379 check inter 1s

With the health check sequence above, only the Redis master server is considered UP in the farm, so HAProxy routes all connections to it.
When the Redis master server fails and a new master is elected among the remaining nodes (for example by Redis Sentinel, since plain replication does not promote a slave by itself), HAProxy detects it thanks to its health check sequence.

It does not require third-party tools and makes failover transparent for the web servers.
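To reproduce the health check by hand (for example from a web server), the same send/expect sequence can be written in Python with a plain socket. This is a sketch, not HAProxy’s implementation: the function names are hypothetical, and, like the tcp-check rules, it simply waits until the expected string appears in the reply, so a slave node fails the check by timing out on role:master.

```python
import socket
from typing import List, Tuple

# (payload, string expected in the reply), in the same order as the
# tcp-check rules of bk_redis
REDIS_MASTER_CHECK: List[Tuple[bytes, bytes]] = [
    (b"PING\r\n", b"+PONG"),
    (b"info replication\r\n", b"role:master"),
    (b"QUIT\r\n", b"+OK"),
]


def run_tcp_check(conn, steps: List[Tuple[bytes, bytes]], bufsize: int = 4096) -> bool:
    """Send each payload and wait until its expected string shows up in the reply."""
    for payload, expected in steps:
        conn.sendall(payload)
        reply = b""
        while expected not in reply:
            chunk = conn.recv(bufsize)
            if not chunk:  # connection closed before the expected string
                return False
            reply += chunk
    return True


def is_redis_master(host: str, port: int = 6379, timeout: float = 1.0) -> bool:
    """True only for a reachable Redis node reporting role:master."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return run_tcp_check(conn, REDIS_MASTER_CHECK)
    except OSError:  # refused, unreachable, or timed out (e.g. a slave)
        return False
```

For example, is_redis_master("10.0.0.11") should return True only while R1 is the current master.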


Failover and worst case management with HAProxy

Synopsis

One of HAProxy’s strengths is its flexibility: it can redirect traffic based on events and internal status.
In this article, I’ll show how HAProxy can help handle traffic when worst cases happen.

By worst case I mean the moment when something goes wrong in your architecture and your application becomes partially or totally unavailable.

Cases

Backup server

When all servers in a farm are down, we want to redirect traffic to a backup server which delivers either sorry pages or a degraded mode of the application.
This can be done easily in HAProxy by adding the keyword backup on the server line. If multiple backup servers are configured, only the first active one is used.

Below is the HAProxy configuration corresponding to this case:

frontend ft_app
 bind 10.0.0.1:80
 default_backend bk_app_main

backend bk_app_main
 server s1 10.0.0.101:80 check
 server s2 10.0.0.102:80 check
 server s3 10.0.0.103:80 check backup
 server s4 10.0.0.104:80 check backup

In this case, s3 will be used first, until it fails, then s4 will be used.

Multiple backup servers

In some cases, when the farm receives a huge amount of traffic, we may want to use several backup servers at a time. This can be achieved by enabling option allbackups in the HAProxy configuration.

Below is the HAProxy configuration corresponding to this case:

frontend ft_app
 bind 10.0.0.1:80
 default_backend bk_app_main

backend bk_app_main
 option allbackups
 server s1 10.0.0.101:80 check
 server s2 10.0.0.102:80 check
 server s3 10.0.0.103:80 check backup
 server s4 10.0.0.104:80 check backup

In this case, both s3 and s4 will be used if they are available.
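The server selection in these two cases can be summarized with a short Python sketch. The names are hypothetical, and actual load-balancing among several UP servers is not modeled, only which servers are eligible:

```python
from typing import List, NamedTuple


class Server(NamedTuple):
    name: str
    up: bool
    backup: bool = False


def eligible_servers(servers: List[Server], allbackups: bool = False) -> List[str]:
    """Servers that may receive traffic, in configuration order."""
    active = [s.name for s in servers if s.up and not s.backup]
    if active:
        return active                 # regular servers take all the traffic
    backups = [s.name for s in servers if s.up and s.backup]
    if allbackups:
        return backups                # option allbackups: all UP backups share the load
    return backups[:1]                # default: only the first UP backup is used


if __name__ == "__main__":
    farm = [Server("s1", False), Server("s2", False),
            Server("s3", True, True), Server("s4", True, True)]
    print(eligible_servers(farm))                   # ['s3']
    print(eligible_servers(farm, allbackups=True))  # ['s3', 's4']
```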

Farm failover

Although the case above improves our failover scenario a bit, it has some weaknesses. For example, we must wait until all the production servers are DOWN before using the backup servers.
HAProxy can fail over traffic to a backup farm when the main one does not have enough capacity or, worst case, no capacity anymore.

Below is the HAProxy configuration corresponding to this case:

frontend ft_app
 bind 10.0.0.1:80

# detect capacity issues in production farm
 acl MAIN_not_enough_capacity nb_srv(bk_app_main) le 2
# failover traffic to backup farm
 use_backend bk_app_backup if MAIN_not_enough_capacity

 default_backend bk_app_main

backend bk_app_main
 server s11 10.0.0.101:80 check
 server s12 10.0.0.102:80 check
 server s13 10.0.0.103:80 check
 server s14 10.0.0.104:80 check

backend bk_app_backup
 server s21 20.0.0.101:80 check
 server s22 20.0.0.102:80 check
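The routing decision of this frontend can be modeled in a few lines of Python, with hypothetical names. nb_srv() returns the number of usable servers of a backend, and le 2 also matches when exactly 2 servers are UP, so traffic moves to the backup farm as soon as half of bk_app_main is DOWN:

```python
from typing import Dict


def nb_srv(states: Dict[str, bool]) -> int:
    """Number of usable (UP) servers, like the nb_srv() fetch."""
    return sum(1 for up in states.values() if up)


def choose_backend(main_states: Dict[str, bool], threshold: int = 2) -> str:
    # acl MAIN_not_enough_capacity nb_srv(bk_app_main) le 2
    # use_backend bk_app_backup if MAIN_not_enough_capacity
    if nb_srv(main_states) <= threshold:
        return "bk_app_backup"
    return "bk_app_main"  # default_backend


if __name__ == "__main__":
    main = {"s11": True, "s12": True, "s13": False, "s14": False}
    print(choose_backend(main))  # bk_app_backup
```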

Farm failover with backup servers

Of course, we can combine all the options above.
First, we want to fail over to a backup farm if the production one does not have enough capacity; second, we want to use 2 backup servers when all the regular servers of the backup farm are DOWN.

Below is the HAProxy configuration corresponding to this case:

frontend ft_app
 bind 10.0.0.1:80

# detect capacity issues in production farm
 acl MAIN_not_enough_capacity nb_srv(bk_app_main) le 2
# failover traffic to backup farm
 use_backend bk_app_backup if MAIN_not_enough_capacity

 default_backend bk_app_main

backend bk_app_main
 server s11 10.0.0.101:80 check
 server s12 10.0.0.102:80 check
 server s13 10.0.0.103:80 check
 server s14 10.0.0.104:80 check

backend bk_app_backup
 option allbackups
 server s21 20.0.0.101:80 check
 server s22 20.0.0.102:80 check
 server s23 20.0.0.103:80 check backup
 server s24 20.0.0.104:80 check backup

Worst case: no servers available anymore

Well, imagine you plugged all your servers into a single switch, and the HAProxy box has 2 interfaces, one on the public switch and one on the server switch. Of course, this is not how you plugged your servers, is it?
If the server switch fails, no servers are available anymore. HAProxy can then deliver sorry pages for you.

Below is the HAProxy configuration corresponding to this case:

frontend ft_app
 bind 10.0.0.1:80

# sorry page to return when worst case happens
 errorfile 503 /etc/haproxy/errorfiles/sorry.http

# detect capacity issues in production farm
 acl MAIN_not_enough_capacity nb_srv(bk_app_main) le 2
# failover traffic to backup farm
 use_backend bk_app_backup if MAIN_not_enough_capacity

 default_backend bk_app_main

backend bk_app_main
 server s11 10.0.0.101:80 check
 server s12 10.0.0.102:80 check
 server s13 10.0.0.103:80 check
 server s14 10.0.0.104:80 check

backend bk_app_backup
 option allbackups
 server s21 20.0.0.101:80 check
 server s22 20.0.0.102:80 check
 server s23 20.0.0.103:80 check backup
 server s24 20.0.0.104:80 check backup

And below, the content of the sorry.http page:

HTTP/1.0 200 OK
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html>
<body>
<h1>Sorry page</h1>
Sorry, we're under maintenance
</body>
</html>
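Since HAProxy sends the errorfile back as a complete HTTP response, its header lines should end with CRLF as HTTP requires. If you prefer to generate the file rather than edit it by hand, a small Python helper could look like this (hypothetical names); the 16 kB limit is an assumption based on HAProxy’s default buffer size (tune.bufsize), which an errorfile should not exceed.

```python
from pathlib import Path
from typing import List

SORRY_HEADERS = [
    "HTTP/1.0 200 OK",
    "Cache-Control: no-cache",
    "Connection: close",
    "Content-Type: text/html",
]
SORRY_BODY = (
    "<html>\n<body>\n<h1>Sorry page</h1>\n"
    "Sorry, we're under maintenance\n</body>\n</html>\n"
)
MAX_ERRORFILE_SIZE = 16384  # assumption: default tune.bufsize of 16 kB


def build_errorfile(headers: List[str], body: str) -> bytes:
    """Status line and headers end with CRLF, followed by an empty line and the body."""
    data = ("\r\n".join(headers) + "\r\n\r\n" + body).encode("utf-8")
    if len(data) > MAX_ERRORFILE_SIZE:
        raise ValueError("errorfile larger than the HAProxy buffer")
    return data


def write_errorfile(path: str, headers: List[str] = SORRY_HEADERS, body: str = SORRY_BODY) -> None:
    """Write the response to e.g. /etc/haproxy/errorfiles/sorry.http."""
    Path(path).write_bytes(build_errorfile(headers, body))


if __name__ == "__main__":
    print(len(build_errorfile(SORRY_HEADERS, SORRY_BODY)), "bytes")
```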

Important notes


Health checking


Health checking must be enabled on the servers. Without health checking, HAProxy can’t know the server status and therefore can’t decide to fail over traffic.

Persistence


If persistence information points to a backup server, HAProxy keeps using it, even when production servers are available again.


Configuring HAProxy and Nginx for SPDY

Introduction to SPDY / HTTP-bis

SPDY is a protocol designed by Google which aims to fix HTTP/1.1 weaknesses and to adapt this 14-year-old protocol to today’s internet devices and requirements.
Back in 1999, when HTTP/1.1 was designed, there were no mobile devices, and web pages were made of HTML with a few images, almost no JavaScript and no CSS. ISPs delivered internet access over very slow connections.
HTTP/2.0 has to address today’s and tomorrow’s needs, when many more devices browse the internet over very different types of connections (very slow ones with high packet loss, or very fast ones) and more and more people want multimedia content and interactivity. Websites have become “web applications”.

SPDY is not HTTP 2.0! But HTTP 2.0 will use SPDY as its basis

Note that until HTTP/2.0 is officially released, ALL articles written on SPDY, NPN, ALPN, etc. may become outdated at some point.
Sadly, this is true for the present article, BUT I’ll try to keep it up to date :)
As an example, NPN is a TLS extension designed to let a client and a server negotiate the protocol used at the layer above (in our case, the application layer).
Well, this NPN TLS extension is soon going to be superseded by an official RFC called “Transport Layer Security (TLS) Application Layer Protocol Negotiation Extension” (shortened to ALPN).

Also, we’re already at the third version of the SPDY protocol (3.1 to be accurate). It changes quite often.

HAProxy and SPDY

As Willy explained on HAProxy’s mailing list, HAProxy won’t implement SPDY.
But as soon as HTTP/2.0 is officially released, development will focus on it.
The main driver for this is the problem described in the introduction: implementing drafts that are updated so often would be a waste of time.

That said, HAProxy can be used to:
* load-balance SPDY web servers
* detect SPDY protocol through NPN (or more recent ALPN) TLS extension

Diagram

The picture below shows how HAProxy can be used to detect and split HTTP and SPDY traffic over an HTTPS connection:
[Diagram: spdy]

Basically, HAProxy uses the NPN (and later the ALPN) TLS extension to figure out whether the client can browse the website using SPDY. If yes, the connection is forwarded to the SPDY farm (here hosted on nginx), otherwise, the connection is forwarded to the HTTP server farm (here hosted on nginx too).
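The routing logic itself is simple; the Python sketch below models it, together with the header check done by the nginx Lua handler shown later in this article. It is only a model of the decision, with hypothetical names: npn stands for the protocol the client selected through NPN, or None when NPN was not used.

```python
from typing import Dict, Optional, Tuple


def route_tls_connection(npn: Optional[str]) -> Tuple[str, Dict[str, str]]:
    """Backend chosen by ft_spdy and headers added on the way (hypothetical model)."""
    # acl npn_spdy ssl_fc_npn -i spdy/2   (-i: case-insensitive match)
    if npn is not None and npn.lower() == "spdy/2":
        return "bk_spdy", {}
    # default_backend bk_http, which adds "Spdy: no" to each request
    return "bk_http", {"Spdy": "no"}


def nginx_reply(headers: Dict[str, str]) -> str:
    """Same decision as the content_by_lua handler: only the header's presence matters."""
    if "Spdy" in headers:
        return "Currently browsing over HTTP"
    return "Currently browsing over spdy"


if __name__ == "__main__":
    for proto in ("spdy/2", "http/1.1", None):
        backend, headers = route_tls_connection(proto)
        print(proto, "->", backend, "->", nginx_reply(headers))
```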

Configuration

HAProxy configuration example for NPN and SPDY


Below is the HAProxy configuration that detects HTTP and SPDY traffic and splits it between two different farms:

defaults
 mode tcp
 log global
 option tcplog
 timeout connect           4s
 timeout server          300s
 timeout client          300s

frontend ft_spdy
 bind 64.210.140.163:443 name https ssl crt STAR_haproxylab_net.pem npn spdy/2

# tcp log format + SSL information (TLS version, cipher in use, SNI, NPN)
 log-format %ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq {%sslv/%sslc/%[ssl_fc_sni]/%[ssl_fc_npn]}

# acls: npn
 acl npn_spdy           ssl_fc_npn -i spdy/2

# spdy redirection
 use_backend bk_spdy      if npn_spdy

 default_backend bk_http

backend bk_spdy
 option httpchk HEAD /healthcheck
 server nginx 127.0.0.1:8082 maxconn 100 check port 8081

backend bk_http
 option httpchk HEAD /healthcheck
 http-request set-header Spdy no
 server nginx 127.0.0.1:8081 maxconn 100 check

NGINX configuration example for SPDY


And the corresponding nginx configuration.
Note that I use a Lua script to check whether the “Spdy: no” HTTP header is present:
  * if present, the connection was made over HTTP to HAProxy
  * if not present, the connection was made over SPDY

   server {
      listen 127.0.0.1:8081;
      listen 127.0.0.1:8082 spdy;
      root /var/www/htdocs/spdy/;

      location = /healthcheck {
         access_log off;
         return 200;
      }

      location / {
         default_type 'text/plain';
         content_by_lua '
            local c = ngx.req.get_headers()["Spdy"];
            if c then
               ngx.say("Currently browsing over HTTP")
            else
               ngx.say("Currently browsing over spdy")
            end
            return
         ';
      }

   }

Testing


This setup is in production.
Simply browse https://spdy.haproxylab.net/ to give it a try.

Related links

  * SPDY whitepaper: http://www.chromium.org/spdy/spdy-whitepaper
  * SPDY (official?) page: http://www.chromium.org/spdy
  * IETF Transport Layer Security (TLS) Application Layer Protocol Negotiation Extension: http://tools.ietf.org/html/draft-friedl-tls-applayerprotoneg-02
  * IETF HTTP/1.1 RFC: http://www.ietf.org/rfc/rfc2616.txt
