Tag Archives: reverse-proxy

HAProxy, Varnish and the single hostname website

As explained in a previous article, HAProxy and Varnish are two great open source products which aim to improve the performance, resilience and scalability of web applications.
We also saw that these two pieces of software are not competitors. On the contrary, they can work well together, each one bringing the other its own features, making any web infrastructure more agile and robust at the same time.

In the current article, I’m going to explain how to use both of them on a web application hosted on a single domain name.

Main advantages of each product


As a reminder, here are the main features of each product.

HAProxy


HAProxy’s main features:

  • Real load-balancer with smart persistence
  • Request queueing
  • Transparent proxy

Varnish


Varnish’s main features:

  • Cache server with stale content delivery
  • Content compression
  • Edge Side Includes

Common features


HAProxy and Varnish both have the features below:

  • Content switching
  • URL rewriting
  • DDOS protection

So if we need any of them, we could use either HAProxy or Varnish.

Why a single domain

In web applications, there are two types of content: static and dynamic.

By dynamic, I mean content which is generated on the fly and which is dedicated to a single user based on their current browsing of the application. Anything which is not in this category can be considered static, even a page which is generated by PHP and whose content changes every minute or every few seconds (as with a CMS like WordPress or Drupal). I call these pages “pseudo-static”.

The biggest strength of Varnish is that it can cache static objects, delivering them on behalf of the server, offloading most of the traffic from the server.



An object is identified by a Host header and its URL. When you have a single domain name, you have a single Host header for all your requests: static, pseudo static or dynamic.
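To make this concrete, here is a minimal sketch (purely illustrative, not Varnish's actual hashing code) of how a cache derives a lookup key from the Host header and the URL. With a single hostname, only the URL part differentiates objects:

```python
# Illustrative sketch: a cache identifies an object by Host + URL.
import hashlib

def cache_key(host: str, url: str) -> str:
    """Derive a cache lookup key from the Host header and the URL."""
    h = hashlib.sha256()
    h.update(host.lower().encode())
    h.update(b"\0")  # separator so "a"+"bc" != "ab"+"c"
    h.update(url.encode())
    return h.hexdigest()

# With a single domain, every request carries the same Host part,
# so objects are differentiated by their URL only:
k_css = cache_key("www.domain.tld", "/style.css")
k_php = cache_key("www.domain.tld", "/index.php?user=42")
assert k_css != k_php
```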

You can’t split your traffic: all requests must arrive on a single type of device: the LB, the cache, etc…

A good practice to split dynamic and static content is to use one domain name per type of object: www.domain.tld for dynamic content and static.domain.tld for static content. That way, you could forward dynamic traffic to the LB and static traffic directly to the caches.



Now, I guess you understand that the web application host naming can have an impact on the platform you’re going to build.

In the current article, I’ll only focus on applications using a single domain name. We’ll see how we can route traffic to the right product despite the limitation of the single domain name.



Don’t worry, I’ll write another article later about the fun we can have when building a platform for an application hosted on multiple domain names.

Available architectures

If we summarize the “web application” as a single brick called “APPSERVER”, we have 2 main architectures available:

  1. CLIENT ==> HAPROXY ==> VARNISH ==> APPSERVER
  2. CLIENT ==> VARNISH ==> HAPROXY ==> APPSERVER

Pros and cons of HAProxy in front of Varnish


Pros:

  • Use HAProxy’s smart load-balancing algorithms such as uri or url_param to make Varnish caching more efficient and improve the hit rate
  • Make the Varnish layer scalable, since it is load-balanced
  • Protect Varnish during its startup ramp-up (related to thread pool creation)
  • HAProxy can protect against DDOS and slowloris
  • Varnish can be used as a WAF

Cons:

  • no easy way to do application-layer persistence
  • HAProxy’s queueing system can hardly protect the application hidden behind Varnish
  • The client IP must be forwarded in the X-Forwarded-For header (or any header you want)

Pros and cons of Varnish in front of HAProxy


Pros:

  • Smart layer 7 persistence with HAProxy
  • The HAProxy layer is scalable (with persistence preserved) since it is load-balanced by Varnish
  • APPSERVER protection through HAProxy request queueing
  • Varnish can be used as a WAF
  • HAProxy can use the client IP address (provided by Varnish in an HTTP header) to do transparent proxying (getting connected to APPSERVER with the client IP)

Cons:

  • HAProxy can’t protect against DDOS; Varnish has to do it
  • The cache size must be big enough to store all objects
  • The Varnish layer is not scalable

Finally, which is the best architecture?


There is no need to choose which of the two architectures above is the least bad for you.

It would be better to build a platform where there are no negative points.

The Architecture


The diagram below shows the architecture we’re going to work on.
[diagram: haproxy_varnish]
Legend:

  • H: HAProxy load-balancers (could be an ALOHA Load-Balancer or any home-made setup)
  • V: Varnish servers
  • S: Web application servers, whatever product is used here (Tomcat, JBoss, etc…)
  • C: Client or end user

Main roles of each layer:

  • HAProxy: Layer 7 traffic routing, first row of protection against DDOS (SYN flood, slowloris, etc…), application request flow optimization
  • Varnish: Caching and compression. Could be used later as a WAF to protect the application
  • Server: hosts the application and the static content
  • Client: browses and uses the web application

Traffic flow


Basically, the client sends all its requests to HAProxy. Then HAProxy, based on the URL or the file extension, takes a routing decision:

  • If the request looks like one for a (pseudo-)static object, forward it to Varnish.
    If Varnish misses the object, it will go through HAProxy to get the content from the server.
  • Send all other requests to the appserver. If we’ve done our job properly, there should only be dynamic traffic here.

I don’t want to use Varnish as the default option in the flow, because dynamic content could get cached, which could lead to one user’s personal information being sent to everybody.

Furthermore, in case of massive misses or requests purposely built to bypass the caches, I don’t want the servers to be hammered by Varnish, so HAProxy protects them with tight traffic regulation between Varnish and the appservers.
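The routing decision described above can be sketched as follows (the extensions and the /dynamic/ prefix are illustrative; adapt them to your application):

```python
# Sketch of the routing decision: static and pseudo-static requests
# go to Varnish, everything else goes straight to the app servers.
STATIC_EXT = (".jpg", ".gif", ".png", ".css", ".js", ".htm", ".html")

def route(path: str) -> str:
    if path.startswith("/dynamic/"):
        return "appserver"                  # explicitly dynamic
    if path.endswith(STATIC_EXT) or path.endswith(".php"):
        return "varnish"                    # static or pseudo-static
    return "appserver"                      # default: never cache by accident

assert route("/img/logo.png") == "varnish"
assert route("/dynamic/cart.php") == "appserver"
assert route("/account") == "appserver"
```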

Dynamic traffic flow


The diagram below shows how the request requiring dynamic content should be ideally routed through the platform:
[diagram: haproxy_varnish_dynamic_flow]
Legend:

  1. The client sends its request to HAProxy
  2. HAProxy chooses a server based on cookie persistence, or on the load-balancing algorithm if there is no cookie.
    The server processes the request and sends the response back to HAProxy, which forwards it to the client

Static traffic flow


The diagram below shows how the request requiring static content should be ideally routed through the platform:
[diagram: haproxy_varnish_static_flow]

  1. The client sends its request to HAProxy, which sees that it asks for static content
  2. HAProxy forwards the request to Varnish. If Varnish has the object in cache (a HIT), it sends it back to HAProxy directly.
  3. If Varnish doesn’t have the object in cache, or if the cached copy has expired, Varnish forwards the request to HAProxy
  4. HAProxy randomly chooses a server. The response goes back to the client through Varnish.

In case of a MISS, the flow looks heavy :) I want to do it that way in order to use the HAProxy traffic regulation features to prevent Varnish from flooding the servers. Furthermore, since Varnish sees only static content, its HIT rate is over 98%, so the overhead is very low and the protection is improved.
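A quick back-of-the-envelope calculation shows why the extra hop on a MISS costs so little (the numbers are illustrative):

```python
# How many static requests actually reach the app servers, given
# Varnish's hit rate (integer math keeps the arithmetic exact).
def origin_requests(total_static_rps: int, hit_rate_pct: int) -> int:
    """Requests per second that miss the cache and hit the origin."""
    return total_static_rps * (100 - hit_rate_pct) // 100

# 1000 static requests/s with a 98% hit rate: only 20 reach the origin.
print(origin_requests(1000, 98))  # -> 20
```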

Pros of such architecture

  • Use smart load-balancing algorithms such as uri or url_param to make Varnish caching more efficient and improve the hit rate
  • Make the Varnish layer scalable, since it is load-balanced
  • Startup protection for Varnish and APPSERVER, allowing server reboot or farm expansion even under heavy load
  • HAProxy can protect against DDOS and slowloris
  • Smart layer 7 persistence with HAProxy
  • APPSERVER protection through HAProxy request queueing
  • HAProxy can use the client IP address to do Transparent proxying (getting connected on APPSERVER with the client ip)
  • Cache farm failure detection and routing to application servers (worst case management)
  • Can load-balance any type of TCP based protocol hosted on APPSERVER

Cons of such architecture


To be totally fair, there are a few “non-blocking” issues:

  • The HAProxy layer is hardly scalable (you must use 2 crossed virtual IPs declared in the DNS)
  • Varnish can’t be used as a WAF since it only sees static traffic passing through. This could easily be changed later

Configuration

HAProxy Configuration

# On Aloha, the global section is already setup for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
  stats socket ./haproxy.stats level admin
  log 10.0.1.10 local3

# default options
defaults
  option http-server-close
  mode http
  log global
  option httplog
  timeout connect 5s
  timeout client 20s
  timeout server 15s
  timeout check 1s
  timeout http-keep-alive 1s
  timeout http-request 10s  # slowloris protection
  default-server inter 3s fall 2 rise 2 slowstart 60s

# HAProxy's stats
listen stats
  bind 10.0.1.3:8880
  stats enable
  stats hide-version
  stats uri     /
  stats realm   HAProxy Statistics
  stats auth    admin:admin

# main frontend dedicated to end users
frontend ft_web
  bind 10.0.0.3:80
  acl static_content path_end .jpg .gif .png .css .js .htm .html
  acl pseudo_static path_end .php ! path_beg /dynamic/
  acl image_php path_beg /images.php
  acl varnish_available nbsrv(bk_varnish_uri) ge 1
  # Caches health detection + routing decision
  use_backend bk_varnish_uri if varnish_available static_content
  use_backend bk_varnish_uri if varnish_available pseudo_static
  use_backend bk_varnish_url_param if varnish_available image_php
  # dynamic content or all caches are unavailable
  default_backend bk_appsrv

# appsrv backend for dynamic content
backend bk_appsrv
  balance roundrobin
  # app servers must say if everything is fine on their side
  # and they can process requests
  option httpchk
  option httpchk GET /appcheck
  http-check expect rstring [oO][kK]
  cookie SERVERID insert indirect nocache
  # Transparent proxying using the client IP from the TCP connection
  source 10.0.1.1 usesrc clientip
  server s1 10.0.1.101:80 cookie s1 check maxconn 250
  server s2 10.0.1.102:80 cookie s2 check maxconn 250

# static backend with balance based on the uri, including the query string
# to avoid caching an object on several caches
backend bk_varnish_uri
  balance uri # in latest HAProxy version, one can add 'whole' keyword
  # Varnish must tell it's ready to accept traffic
  option httpchk HEAD /varnishcheck
  http-check expect status 200
  # client IP information
  option forwardfor
  # avoid request redistribution when the number of caches changes (crash or start up)
  hash-type consistent
  server varnish1 10.0.1.201:80 check maxconn 1000
  server varnish2 10.0.1.202:80 check maxconn 1000

# cache backend with balance based on the value of the URL parameter called "id"
# to avoid caching an object on several caches
backend bk_varnish_url_param
  balance url_param id
  # client IP information
  option forwardfor
  # avoid request redistribution when the number of caches changes (crash or start up)
  hash-type consistent
  server varnish1 10.0.1.201:80 maxconn 1000 track bk_varnish_uri/varnish1
  server varnish2 10.0.1.202:80 maxconn 1000 track bk_varnish_uri/varnish2
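The point of `hash-type consistent` can be illustrated with a toy hash ring (a sketch, not HAProxy's actual implementation): when a cache is added or removed, only a fraction of the URLs map to a different server, whereas plain modulo hashing would remap most of them.

```python
# Toy consistent-hash ring: adding a cache server only moves a
# fraction of the keys, which preserves most of the cached content.
import bisect
import hashlib

def _h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers, vnodes=100):
        # each server gets several virtual points on the ring
        self._points = sorted((_h(f"{s}#{i}"), s)
                              for s in servers for i in range(vnodes))
        self._hashes = [p for p, _ in self._points]

    def server_for(self, url: str) -> str:
        i = bisect.bisect(self._hashes, _h(url)) % len(self._hashes)
        return self._points[i][1]

two = Ring(["varnish1", "varnish2"])
three = Ring(["varnish1", "varnish2", "varnish3"])
urls = [f"/img/{n}.png" for n in range(1000)]
moved = sum(two.server_for(u) != three.server_for(u) for u in urls)
# roughly a third of the keys move (vs ~2/3 with modulo hashing)
print(moved)
```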

# frontend used by Varnish servers when updating their cache
frontend ft_web_static
  bind 10.0.1.3:80
  monitor-uri /haproxycheck
  # Tells Varnish to stop asking for static content when the servers are dead
  # Varnish will then deliver stale content
  monitor fail if nbsrv(bk_appsrv_static) eq 0
  default_backend bk_appsrv_static

# appsrv backend used by Varnish to update their cache
backend bk_appsrv_static
  balance roundrobin
  # anything different than a status code 200 on the URL /staticcheck.txt
  # must be considered as an error
  option httpchk
  option httpchk HEAD /staticcheck.txt
  http-check expect status 200
  # Transparent proxying using the client IP provided by X-Forwarded-For header
  source 10.0.1.1 usesrc hdr_ip(X-Forwarded-For)
  server s1 10.0.1.101:80 check maxconn 50 slowstart 10s
  server s2 10.0.1.102:80 check maxconn 50 slowstart 10s

Varnish Configuration

backend bk_appsrv_static {
        .host = "10.0.1.3";
        .port = "80";
        .connect_timeout = 3s;
        .first_byte_timeout = 10s;
        .between_bytes_timeout = 5s;
        .probe = {
                .url = "/haproxycheck";
                .expected_response = 200;
                .timeout = 1s;
                .interval = 3s;
                .window = 2;
                .threshold = 2;
                .initial = 2;
        }
}

acl purge {
        "localhost";
}

sub vcl_recv {
### Default options

        # Health Checking
        if (req.url == "/varnishcheck") {
                error 751 "health check OK!";
        }

        # Set default backend
        set req.backend = bk_appsrv_static;

        # grace period (stale content delivery while revalidating)
        set req.grace = 30s;

        # Purge request
        if (req.request == "PURGE") {
                if (!client.ip ~ purge) {
                        error 405 "Not allowed.";
                }
                return (lookup);
        }

        # Accept-Encoding header clean-up
        if (req.http.Accept-Encoding) {
                # use gzip when possible, otherwise use deflate
                if (req.http.Accept-Encoding ~ "gzip") {
                        set req.http.Accept-Encoding = "gzip";
                } elsif (req.http.Accept-Encoding ~ "deflate") {
                        set req.http.Accept-Encoding = "deflate";
                } else {
                        # unknown algorithm, remove accept-encoding header
                        unset req.http.Accept-Encoding;
                }

                # Microsoft Internet Explorer 6 is well known to be buggy with compression of css / js
                if (req.url ~ "\.(css|js)" && req.http.User-Agent ~ "MSIE 6") {
                        remove req.http.Accept-Encoding;
                }
        }

### Per host/application configuration
        # bk_appsrv_static
        # Stale content delivery
        if (req.backend.healthy) {
                set req.grace = 30s;
        } else {
                set req.grace = 1d;
        }

        # Cookie ignored in these static pages
        unset req.http.cookie;

### Common options
        # Static objects are first looked up in the cache
        if (req.url ~ "\.(png|gif|jpg|swf|css|js)(\?.*|)$") {
                return (lookup);
        }

        # if we arrive here, we look for the object in the cache
        return (lookup);
}

sub vcl_hash {
        hash_data(req.url);
        if (req.http.host) {
                hash_data(req.http.host);
        } else {
                hash_data(server.ip);
        }
        return (hash);
}

sub vcl_hit {
        # Purge
        if (req.request == "PURGE") {
                set obj.ttl = 0s;
                error 200 "Purged.";
        }

        return (deliver);
}

sub vcl_miss {
        # Purge
        if (req.request == "PURGE") {
                error 404 "Not in cache.";
        }

        return (fetch);
}

sub vcl_fetch {
        # Stale content delivery
        set beresp.grace = 1d;

        # Hide Server information
        unset beresp.http.Server;

        # Store compressed objects in memory
        # They would be uncompressed on the fly by Varnish if the client doesn't support compression
        if (beresp.http.content-type ~ "(text|application)") {
                set beresp.do_gzip = true;
        }

        # remove any cookie on static or pseudo-static objects
        unset beresp.http.set-cookie;

        return (deliver);
}

sub vcl_deliver {
        unset resp.http.via;
        unset resp.http.x-varnish;

        # could be useful to know if the object was in cache or not
        if (obj.hits > 0) {
                set resp.http.X-Cache = "HIT";
        } else {
                set resp.http.X-Cache = "MISS";
        }

        return (deliver);
}

sub vcl_error {
        # Health check
        if (obj.status == 751) {
                set obj.status = 200;
                return (deliver);
        }
}
  


HAProxy and Varnish comparison

In the open source world, there are some very smart products which are very often used to build high performance, reliable and scalable architectures.
HAProxy and Varnish are both in this category.

Since we can’t really compare a reverse-proxy cache and a reverse-proxy load-balancer, I’m just going to focus on what both products have in common, as well as the advantages of each of them.
The list is not exhaustive and only covers the most used / most interesting features. So feel free to add a comment if you want me to complete the list.

Common points between HAProxy and Varnish


Before comparing the differences, we can summarize the points in common:

  • reverse-proxy mode
  • advanced HTTP features
  • no SSL offloading
  • client-side HTTP 1.1 with keepalive
  • tunnel mode available
  • high performance
  • basic load-balancing
  • server health checking
  • IPv6 ready
  • Management socket (CLI)
  • Professional services and training available

Features available in HAProxy and not in Varnish


The features below are available in HAProxy, but aren’t in Varnish:

  • advanced load-balancer
  • multiple persistence methods
  • DOS and DDOS mitigation
  • Advanced and custom logging
  • Web interface
  • Server / application protection through queue management, slow start, etc…
  • SNI content switching
  • Named ACLs
  • Full HTTP 1.1 support on the server side, except keep-alive
  • Can work at TCP level with any L7 protocol
  • Proxy protocol for both client and server
  • powerful log analyzer tool (halog)
  • <private joke> 2002 website design </private joke>

Features available in Varnish and not in HAProxy


The features below are available in Varnish, but aren’t in HAProxy:

  • caching
  • grace mode (stale content delivery)
  • saint mode (manages origin server errors)
  • modular software (with a lot of modules available)
  • intuitive VCL configuration language
  • HTTP 1.1 on server side
  • TCP connection re-use
  • Edge side includes (ESI)
  • a few command line tools for stats (varnishstat, varnishhist, etc…)
  • powerful live traffic analyzer (varnishlog)
  • <private joke> 2012 website design </private joke>

Conclusion


Even if HAProxy can do TCP proxying, it is often used in front of web applications, exactly where we find Varnish :).
They complement each other very well: Varnish makes the website faster by offloading static object delivery, while HAProxy ensures smooth load-balancing with smart persistence and DDOS mitigation.

Basically, HAProxy and Varnish complement each other very well: despite being “competitors” on a few features, each of them has its own domain of expertise where it performs very well: HAProxy is a reverse-proxy load-balancer and Varnish is a reverse-proxy cache.

To be honest, when we at HAProxy Technologies work on infrastructures where an Aloha Load-Balancer or HAProxy is deployed, we often see Varnish deployed as well. And when it is not the case, we often recommend that the customer deploy one if we feel it would improve the website’s performance.
Recently, I had a discussion with Ruben and Kristian when they came to Paris, and they told me that they also often see HAProxy when they work on infrastructures where Varnish is deployed.

So the real question is: since Varnish and HAProxy are a bit similar but complement each other so well, how can we use them together?
The answer could be very long, so stay tuned: I’ll try to answer this question in an article coming soon.


HTTP request flood mitigation

In a recent article, we saw how we can use a load-balancer as a first row of defense against DDOS.

The purpose of the present article is to provide a configuration to protect your applications against HTTP request floods.

The configuration below allows only 10 requests per source IP over a period of 10 seconds for the dynamic part of the website.
If a user goes above this limit, they get blacklisted until they stop browsing the website for 10 seconds.
HAProxy returns a 403 for requests over an established connection and refuses any new connection from this user.
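The stick-table logic can be sketched in Python (a simplified model, not HAProxy's implementation): a per-IP request counter over a 10-second window, plus an "abuser" flag, the equivalent of gpc0, which expires only after the IP stays quiet for a full window.

```python
# Simplified model of the stick-table: allow at most LIMIT requests
# per source IP over a sliding WINDOW; once flagged, an IP stays
# blacklisted until it stops sending requests for a full WINDOW.
from collections import defaultdict, deque

WINDOW, LIMIT = 10.0, 10

requests = defaultdict(deque)   # src ip -> timestamps of recent requests
flagged = {}                    # src ip -> last time the abuser was seen

def allow(ip: str, now: float) -> bool:
    # un-flag the IP once it has been quiet for a full window
    if ip in flagged and now - flagged[ip] >= WINDOW:
        del flagged[ip]
    if ip in flagged:
        flagged[ip] = now       # still browsing: stay blacklisted
        return False
    q = requests[ip]
    q.append(now)
    while q and now - q[0] > WINDOW:
        q.popleft()
    if len(q) >= LIMIT:
        flagged[ip] = now       # equivalent of incrementing gpc0
        return False
    return True

# the 10th request within 10s gets denied, and so do the following ones
results = [allow("198.51.100.7", float(t)) for t in range(12)]
assert results[:9] == [True] * 9 and results[9:] == [False] * 3
```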

# On Aloha, the global section is already setup for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
  stats socket ./haproxy.stats level admin

defaults
  option http-server-close
  mode http
  timeout http-request 5s
  timeout connect 5s
  timeout server 10s
  timeout client 30s

# On Aloha, you don't need to set up the stats page, the GUI already provides
# all the necessary information
listen stats
  bind 0.0.0.0:8880
  stats enable
  stats hide-version
  stats uri     /
  stats realm   HAProxy Statistics
  stats auth    admin:admin

frontend ft_web
  bind 0.0.0.0:8080

  # Use General Purpose Counter (gpc) 0 in SC1 as a global abuse counter
  # Monitors the number of requests sent by an IP over a period of 10 seconds
  stick-table type ip size 1m expire 10s store gpc0,http_req_rate(10s)
  tcp-request connection track-sc1 src
  # refuses a new connection from an abuser
  tcp-request content reject if { src_get_gpc0 gt 0 }
  # returns a 403 for requests in an established connection
  http-request deny if { src_get_gpc0 gt 0 }

  # Split static and dynamic traffic since these requests have different impacts on the servers
  use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

  default_backend bk_web

# Dynamic part of the application
backend bk_web
  balance roundrobin
  cookie MYSRV insert indirect nocache

  # If the source IP sent 10 or more http requests over the defined period,
  # flag the IP as abuser on the frontend
  acl abuse src_http_req_rate(ft_web) ge 10
  acl flag_abuser src_inc_gpc0(ft_web) ge 0
  # Returns a 403 to the abuser
  http-request deny if abuse flag_abuser

  server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
  server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
  balance roundrobin
  server srv1 192.168.1.2:80 check maxconn 1000
  server srv2 192.168.1.3:80 check maxconn 1000


Efficient SMTP relay infrastructure with Postfix and load-balancers

Scalable architecture

In order to make your architecture scalable, you will often want to use a load-balancer or an application delivery controller.
When using one of them (or a reverse-proxy), the client information is almost always hidden, and retrieving it requires significant modifications to the architecture.

Unfortunately, for troubleshooting and security purposes it is very useful to know the client information (mainly the source IP address)…
That’s where the proxy protocol comes in.

The proxy protocol?

As explained in a previous article, “Preserve source IP address despite reverse proxies”, the proxy protocol was developed to maintain client information when chaining proxies and reverse-proxies.
There are two main advantages to using it:

  • you can provide a downstream proxy or server (aka next hop) the client information (for now, mainly IP and port source)
  • you can use servers in multiple datacenter without a complex network architecture (just need to provide routing for a TCP connection)

Why not simply use TPROXY (transparent proxy) mode?


TPROXY allows a load-balancer or reverse-proxy to open the TCP connection to the server using the client IP address.
One of the drawbacks of TPROXY is that the default gateway of the application servers must be the load-balancer, or you must do policy-based routing on your network, which can be painful.

Why Postfix and HAProxy?

HAProxy was the first software to implement the proxy protocol.
Note that you’ll have to use HAProxy 1.5 branch or patched HAProxy 1.4.
< advertisement >
Another solution would be to use the Aloha load-balancer, which does everything for you in a box (from the OS to HAProxy) with all the nice features you could expect. ;)
< /advertisement >

Lately, Postfix implemented it; it is available in Postfix 2.10.
It is the first application server to ship with it: THANKS and CONGRATULATIONS!!!
Hopefully other MTAs will implement it soon. It is simple and brings so many improvements to an architecture.

SMTP, spam and security


In SMTP, it is really important to know the client IP, since it is used most of the time through RBLs to fight spam.
The same goes for security: we may want to allow only certain hosts to use our SMTP relays and block all other clients.
Without the proxy protocol, the load-balancer hides the client IP behind its own IP, so you would have to maintain whitelists in the load-balancer (which is doable). Thanks to the proxy protocol, Postscreen is aware of the client IP, which means you can maintain those lists directly in the MTA.
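For example, an RBL lookup is keyed on the reversed client IP, which is exactly why the MTA needs to see the real source address (the DNSBL zone name below is a placeholder):

```python
# Sketch: building a DNSBL query name from the client IP. If the MTA
# only saw the load-balancer's IP, every lookup would test the LB
# instead of the real client. The zone name is purely illustrative.
def rbl_query_name(client_ip: str, zone: str = "dnsbl.example.org") -> str:
    octets = client_ip.split(".")
    return ".".join(reversed(octets)) + "." + zone

print(rbl_query_name("203.0.113.7"))  # -> 7.113.0.203.dnsbl.example.org
```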

HAProxy and Postfix connection flow

The diagram below shows the protocols and the process in place in this kind of architecture:

           smtp              proxy-protocol
                             + smtp
(INTERNET) ---> 25 (HAPROXY)      --->      srv1:10024 (Postscreen
                                                       / smtpd)
                                  --->      srv2:10024 (Postscreen
                                                       / smtpd)

Note that the default gateway of the MTA servers is no longer the load-balancer.
The servers don’t even need to be in the same LAN or datacenter: any type of architecture is now doable.

Configuration

HAProxy

frontend ft_smtp
  bind 0.0.0.0:25
  mode tcp
  no option http-server-close
  timeout client 1m
  log global
  option tcplog
  default_backend bk_postfix

backend bk_postfix
  mode tcp
  no option http-server-close
  log global
  option tcplog
  timeout server 1m
  timeout connect 5s
  server postfix 127.0.0.1:10024 send-proxy

Postfix

Note: I installed postfix in /opt/postfix directory

main.cf

queue_directory = /opt/postfix/var/spool/postfix
command_directory = /opt/postfix/usr/sbin
daemon_directory = /opt/postfix/usr/libexec/postfix
data_directory = /opt/postfix/var/lib/postfix
mail_owner = postfix
unknown_local_recipient_reject_code = 550
inet_interfaces = localhost
sendmail_path = /opt/postfix/usr/sbin/sendmail
newaliases_path = /opt/postfix/usr/bin/newaliases
mailq_path = /opt/postfix/usr/bin/mailq
setgid_group = postdrop
html_directory = no
manpage_directory = /opt/postfix/usr/local/man
sample_directory = /opt/postfix/etc/postfix
readme_directory = no
inet_protocols = ipv4
postscreen_upstream_proxy_protocol = haproxy

master.cf

10024     inet  n       -       n       -       1       postscreen
smtpd     pass  -       -       n       -       -       smtpd

See the results in Postfix logs

No proxy protocol

Jun 30 01:18:14 sd-33932 postfix/postscreen[2375]: 
       CONNECT from [127.0.0.1]:52841 to [127.0.0.1]:10024
Jun 30 01:18:22 sd-33932 postfix/smtpd[2376]: 
       disconnect from localhost[127.0.0.1]

With proxy protocol

Jun 29 09:13:41 sd-33932 postfix/postscreen[30505]: 
       CONNECT from [<client public IP>]:59338 to [<server IP>]:25
Jun 29 09:13:52 sd-33932 postfix/postscreen[30505]: 
       DISCONNECT [<client public IP>]:59338


Preserve source IP address despite reverse proxies

What is a Reverse-Proxy?

A reverse-proxy is a server which gets connected to upstream servers on behalf of users.
Basically, it usually maintains two TCP connections: one with the client and one with the upstream server.
The upstream server can be an application server, a load-balancer or another proxy/reverse-proxy.

For more details, please consult the page about the proxy mode of the Aloha load balancer.

Why use a reverse-proxy?

It can be used for different purposes:

  • Improve security, performance and scalability
  • Prevent direct access from a user to a server
  • Share a single IP for multiple services

Reverse-proxies are commonly deployed in DMZs to give access to servers located in a more secure area of the infrastructure.
That way, the reverse-proxy hides the real servers and can block malicious requests, or choose a server based on protocol or application information (e.g. URL, HTTP header, SNI, etc…).
Of course, a reverse-proxy can also act as a load-balancer :)

Drawbacks when using a reverse-proxy


The main drawback when using a reverse-proxy is that it hides the user’s IP: when acting on behalf of the user, it uses its own IP address to get connected to the server.
There is a workaround, using a transparent proxy, but this setup can hardly pass through firewalls or other reverse-proxies: the default gateway of the server must be the reverse-proxy.

Unfortunately, it is sometimes very useful to know the user’s IP when the connection reaches the application server.
It can be mandatory for some applications, and it can ease troubleshooting.

Diagram

The diagram below shows a common usage of a reverse-proxy: it is isolated in a DMZ and handles the users’ traffic. It then connects to the LAN, where another reverse-proxy acts as a load-balancer.

Here is the flow of the requests and responses:

  1. The client connects through the firewall to the reverse-proxy in the DMZ and sends it its request.
  2. The reverse-proxy validates the request, analyzes it to choose the right farm, then forwards it to the load-balancer in the LAN, through the firewall.
  3. The load-balancer chooses a server in the farm and forwards the request to it
  4. The server processes the request, then answers the load-balancer
  5. The load-balancer forwards the response to the reverse-proxy
  6. The reverse-proxy forwards the response to the client

Basically, the source IP is modified twice in this kind of architecture: during steps 2 and 3.
And of course, the more load-balancers and reverse-proxies you chain, the more the source IP will be changed.

The Proxy protocol


The proxy protocol was designed by Willy Tarreau, the HAProxy developer.
It is used between proxies (hence its name), or between a proxy and a server which understands it.

The main purpose of the proxy protocol is to forward information about the client connection to the next host.
This information includes:

  • L4 and L3 protocol information
  • L3 source and destination addresses
  • L4 source and destination ports

That way, the proxy (or server) receiving this information can use it exactly as if the client was connected to it.

Basically, it’s a string that the first proxy sends to the next one when connecting to it.
For example, here is the proxy protocol applied to an HTTP request:

PROXY TCP4 192.168.0.1 192.168.0.11 56324 80\r\n
GET / HTTP/1.1\r\n
Host: 192.168.0.11\r\n
\r\n

Note: no need to change anything in the architecture, since the proxy protocol is just a string sent over the TCP connection used by the sender to get connected on the service hosted by the receiver.
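Since the header is just an ASCII line terminated by CRLF, building and parsing it is trivial. Here is a minimal sketch covering proxy protocol version 1 (the human-readable variant) only:

```python
# Minimal sketch: build and parse a PROXY protocol v1 header line.
# The line is plain ASCII, terminated by CRLF, sent before any payload.

def build_proxy_v1(src_ip, dst_ip, src_port, dst_port, family="TCP4"):
    return f"PROXY {family} {src_ip} {dst_ip} {src_port} {dst_port}\r\n"

def parse_proxy_v1(line: str):
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY":
        raise ValueError("not a PROXY v1 header")
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    return family, src_ip, dst_ip, int(src_port), int(dst_port)

hdr = build_proxy_v1("192.168.0.1", "192.168.0.11", 56324, 80)
print(hdr.strip())  # PROXY TCP4 192.168.0.1 192.168.0.11 56324 80
```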

Now, I guess you understand how we can take advantage of this protocol to pass through the firewall, preserving the client IP address between the reverse proxy (which is able to generate the proxy protocol string) and the load-balancer (which is able to analyze the proxy protocol string).

Stud and stunnel are two SSL reverse proxy software which can send proxy protocol information.

Configuration

Between the Reverse-Proxy and the Load-Balancer

Since both devices must understand the proxy protocol, we’re going to consider both the LB and the RP are Aloha Load-Balancers.

Configuring the Reverse-proxy to send proxy protocol information


In the reverse-proxy configuration, just add the keyword “send-proxy” on the server description line.
For example:

server srv1 192.168.10.1 check send-proxy

Configuring the Load-balancer to receive proxy protocol information

In the load-balancer configuration, just add the keyword “accept-proxy” on the bind description line.
For example:

bind 192.168.1.1:80 accept-proxy

What’s happening?


The reverse proxy opens a connection to the address bound by the load-balancer (192.168.1.1:80). Nothing changes here compared to a regular connection flow.
Once the TCP connection is established, the reverse proxy sends a string with the client connection information.
The load-balancer can now use the client information provided through the proxy protocol exactly as if the connection had been opened by the client itself. For example, we can match the client IP address in ACLs for white/black listing, stick-tables, etc.
This also makes the “balance source” algorithm much more efficient 😉
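On the receiving side, the parsing is equally simple. Here is a hedged sketch (not HAProxy's actual parser) of how a receiver could extract the original client address from a proxy protocol v1 line:

```python
def parse_proxy_v1(line):
    """Parse a proxy protocol v1 line, return (client_ip, client_port).

    Expects something like: 'PROXY TCP4 192.168.0.1 192.168.0.11 56324 80\r\n'
    """
    fields = line.rstrip("\r\n").split(" ")
    if fields[0] != "PROXY" or len(fields) != 6:
        raise ValueError("not a valid proxy protocol v1 line")
    _, proto, src_ip, dst_ip, src_port, dst_port = fields
    # The source address/port are the ones of the original client connection:
    return src_ip, int(src_port)

print(parse_proxy_v1("PROXY TCP4 192.168.0.1 192.168.0.11 56324 80\r\n"))
```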

Between the Load-Balancer and the server

Here we are!
Since the LB knows the client IP, we can use it to get connected to the server. Yes, this is some kind of spoofing :).

In your HAProxy configuration, just use the source parameter in your backend:

backend bk_app
[...]
  source 0.0.0.0 usesrc clientip
  server srv1 192.168.11.1 check

What’s happening?


The load-balancer uses the client IP provided by the proxy protocol to get connected to the server, so the server sees the client IP as the source IP.
Since the server receives a connection from the client IP address, it will answer to that address through its default gateway.
In order to process the reverse-NAT, the return traffic must pass through the load-balancer; that’s why the server’s default gateway must be the load-balancer.
This is the only architecture change.

Compatibility


The kind of architecture in the diagram will only work with Aloha Load-balancers or HAProxy.

Links

Aloha Load-balancer as a reverse proxy

Synopsis

You own a small public subnet and want to be able to access multiple web sites or applications behind a single public IP address.
Basically, you want to use your Aloha load-balancer as a reverse proxy.

Diagram

The diagram below shows how the reverse proxy works.
In our case, we have 2 domains pointing to the Aloha IP address.
Depending on the domain name, the Aloha will decide which farm it will use.
[diagram: two domains resolving to the Aloha IP address, routed to separate server farms]

Configuration

On the Aloha, the reverse-proxy configuration is achieved by HAProxy.
HAProxy configuration can be done in the “layer 7” tab of the GUI or through the CLI command “service haproxy edit”.

First, the frontend definition.
This is where HAProxy will make routing decisions based on layer 7 information.

frontend ft_websites
   mode http
   bind 0.0.0.0:80
   log global
   option httplog
# Capturing the Host header is important to know whether rules match or not
   capture request header host len 64
# site1 configuration
   acl site1 hdr_sub(host) site1.com
   acl site1 hdr_sub(host) site1.eu
   use_backend bk_site1 if site1
# site2 configuration
   acl site2 hdr_sub(host) site2.com
   acl site2 hdr_sub(host) site2.ie
   use_backend bk_site2 if site2
# default configuration
   default_backend bk_default
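To make the matching rules above concrete, here is a small Python model of the same decision logic. The backend names are the ones from the configuration; note that hdr_sub performs a substring match on the Host header, which is what this sketch mimics:

```python
def choose_backend(host):
    """Mimic the frontend's hdr_sub-based content switching."""
    if "site1.com" in host or "site1.eu" in host:
        return "bk_site1"
    if "site2.com" in host or "site2.ie" in host:
        return "bk_site2"
    # Traffic matching no rule falls through to the default backend
    return "bk_default"

print(choose_backend("www.site1.eu"))    # → bk_site1
print(choose_backend("intranet.local"))  # → bk_default
```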

And now, we can define our backend sections for each website or application:

# First site backend configuration
backend bk_site1
   mode http
   balance roundrobin
   cookie SERVERID insert indirect nocache    # persistence cookie
   option forwardfor # add X-Forwarded-For
   option httpchk HEAD / HTTP/1.0\r\nHost:\ www.site1.com
   default-server inter 3s rise 2 fall 3 slowstart 0 # servers default parameters
   server srv1 192.168.10.11:80 cookie s1 weight 10 maxconn 1000 check
   server srv2 192.168.10.12:80 cookie s2 weight 10 maxconn 1000 check

# Second site backend configuration
backend bk_site2
   mode http
   balance roundrobin
   cookie SERVERID insert indirect nocache    # persistence cookie
   option forwardfor # add X-Forwarded-For
   option httpchk HEAD / HTTP/1.0\r\nHost:\ www.site2.com
   default-server inter 3s rise 2 fall 3 slowstart 0 # servers default parameters
   server srv1 192.168.10.13:80 cookie s1 weight 10 maxconn 1000 check
   server srv2 192.168.10.14:80 cookie s2 weight 10 maxconn 1000 check

And finally, the “garbage collector”: the default backend, which hosts all the traffic that has not matched any other rule.
It may be important to watch the logs from this backend in order to ensure there is no misconfiguration.

backend bk_default
   mode http
   balance roundrobin
   option forwardfor # add X-Forwarded-For
   option httpchk HEAD /
   default-server inter 3s rise 2 fall 3 slowstart 0 # servers default parameters
   server srv1 192.168.10.8:80 weight 10 maxconn 1000 check
   server srv2 192.168.10.9:80 weight 10 maxconn 1000 check

Links