Tag Archives: stud

SSL Client certificate information in HTTP headers and logs

HAProxy and SSL

HAProxy has many nice features where SSL is concerned, even though SSL support was only introduced recently.

One of those features is the client side certificate management, which has already been discussed on the blog.
One thing was missing from that article, because HAProxy did not have the feature when I first wrote it: the ability to insert client certificate information into HTTP headers and to report it in the log line as well.

Fortunately, the devs at HAProxy Technologies keep improving HAProxy, and this feature is now available (well, it has been for some time now, but I had not found the time to write the article until now).

OpenSSL commands to generate SSL certificates

Well, just take the script from the HAProxy Technologies GitHub repository, follow the instructions and you’ll have an environment set up in a few seconds.
Here is the script: https://github.com/exceliance/haproxy/tree/master/blog/ssl_client_certificate_management_at_application_level

Configuration

The configuration below shows a frontend and a backend with SSL offloading and with insertion of client certificate information into HTTP headers. As you can see, it is pretty straightforward.

frontend ft_www
 bind 127.0.0.1:8080 name http
 bind 127.0.0.1:8081 name https ssl crt ./server.pem ca-file ./ca.crt verify required
 log-format %ci:%cp [%t] %ft %b/%s %Tq/%Tw/%Tc/%Tr/%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs {%[ssl_c_verify],%{+Q}[ssl_c_s_dn],%{+Q}[ssl_c_i_dn]} %{+Q}r
 http-request set-header X-SSL                  %[ssl_fc]
 http-request set-header X-SSL-Client-Verify    %[ssl_c_verify]
 http-request set-header X-SSL-Client-DN        %{+Q}[ssl_c_s_dn]
 http-request set-header X-SSL-Client-CN        %{+Q}[ssl_c_s_dn(cn)]
 http-request set-header X-SSL-Issuer           %{+Q}[ssl_c_i_dn]
 http-request set-header X-SSL-Client-NotBefore %{+Q}[ssl_c_notbefore]
 http-request set-header X-SSL-Client-NotAfter  %{+Q}[ssl_c_notafter]
 default_backend bk_www

backend bk_www
 cookie SRVID insert nocache
 server server1 127.0.0.1:8088 maxconn 1

To observe the result, I just faked a server using netcat and watched the headers sent by HAProxy:

X-SSL: 1
X-SSL-Client-Verify: 0
X-SSL-Client-DN: "/C=FR/ST=Ile de France/L=Jouy en Josas/O=haproxy.com/CN=client1/emailAddress=ba@haproxy.com"
X-SSL-Client-CN: "client1"
X-SSL-Issuer: "/C=FR/ST=Ile de France/L=Jouy en Josas/O=haproxy.com/CN=haproxy.com/emailAddress=ba@haproxy.com"
X-SSL-Client-NotBefore: "130613144555Z"
X-SSL-Client-NotAfter: "140613144555Z"

And the associated log line which has been generated:

Jun 13 18:09:49 localhost haproxy[32385]: 127.0.0.1:38849 [13/Jun/2013:18:09:45.277] ft_www~ bk_www/server1 
1643/0/1/-1/4645 504 194 - - sHNN 0/0/0/0/0 0/0 
{0,"/C=FR/ST=Ile de France/L=Jouy en Josas/O=haproxy.com/CN=client1/emailAddress=ba@haproxy.com",
"/C=FR/ST=Ile de France/L=Jouy en Josas/O=haproxy.com/CN=haproxy.com/emailAddress=ba@haproxy.com"} "GET /" 

NOTE: I have inserted a few CRLF to make it easily readable.

Now, my HAProxy can deliver the following information to my web server:
  * ssl_fc: did the client use a secure connection (1) or not (0)
  * ssl_c_verify: the status code of the TLS/SSL client connection
  * ssl_c_s_dn: the full Distinguished Name of the certificate presented by the client
  * ssl_c_s_dn(cn): same as above, but extracts only the Common Name
  * ssl_c_i_dn: the full Distinguished Name of the issuer of the certificate presented by the client
  * ssl_c_notbefore: the start date of the certificate presented by the client, as a formatted string YYMMDDhhmmss
  * ssl_c_notafter: the end date of the certificate presented by the client, as a formatted string YYMMDDhhmmss
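These fetches can also feed ACLs. As a hypothetical hardening example (not part of the setup above), a frontend could reject any request whose client certificate failed verification:

```haproxy
# Sketch: ssl_c_verify returns 0 when the client certificate
# verified cleanly; deny everything else.
http-request deny unless { ssl_c_verify 0 }
```

Note that with "verify required" on the bind line, as above, an invalid certificate already breaks the handshake; this kind of ACL is mostly useful with "verify optional".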

Related Links

Links

How to get SSL with HAProxy getting rid of stunnel, stud, nginx or pound

Update: HAProxy can now handle SSL client certificate: SSL Client certificate management at application level

History

HAProxy is well known for its performance as a reverse-proxy and load-balancer and is widely deployed on web platforms where performance matters. It is sometimes even used to replace hardware load-balancers such as F5 appliances.
When a platform requires SSL, it is common to use nginx, Pound or stunnel (http://www.stunnel.org/index.html). Recently, stud joined the dance with a major advantage over the other software: support for HAProxy’s proxy protocol.

At HAProxy Technologies, we build our ALOHA load-balancers using HAProxy, and we use stunnel as the SSL offloading software. Our clients wanted some new features in our SSL implementation that we could not provide through stunnel.
By the way, you can find our stunnel patches here: http://www.haproxy.com/download/free/patches/stunnel/
Another important thing to note is that stunnel does not scale very well: when managing a lot of encrypted connections, stud and nginx do far better.
That’s why we decided to implement SSL directly in HAProxy. For now, it is still quite basic: SSL offloading with SNI support and wildcard certificates, plus the ability to encrypt traffic to the servers.
But at least, the performance is there!
We’ll keep improving it with new features, e.g. client certificate management and some fun stuff with ACLs: stay tuned!

Remember that the job was done by HAProxy Technologies engineers.

Note that if you’re using the software listed above for purposes other than SSL, you may of course still use it. For example, nginx performs very well on static content, and on dynamic content using php-fpm.

SSL offloading diagram

This is pretty simple, as shown in the picture below. The client connects to HAProxy using SSL, HAProxy processes the SSL layer and connects to the server in clear text:
ssl offloading diagram

HAProxy installation

cd /usr/src
wget http://haproxy.1wt.eu/download/1.5/src/devel/haproxy-1.5-dev12.tar.gz
tar xzf haproxy-1.5-dev12.tar.gz
cd haproxy-1.5-dev12/
make TARGET=linux2628 USE_STATIC_PCRE=1 USE_OPENSSL=1
sudo make PREFIX=/opt/haproxy-ssl install

HAProxy configuration for SSL offloading

First of all, you have to generate a few keys and certificates using openssl and concatenate each pair in a file, the certificate first, then the key.
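If you do not want to script it, a self-signed pair can be produced along these lines (the file names and CN are examples, not the ones from my test setup):

```shell
# Generate a self-signed certificate and its key (example names).
openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout server.key -out server.crt \
        -days 365 -subj "/CN=www.example.com"
# HAProxy expects the certificate first, then the key:
cat server.crt server.key > haproxy.pem
```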

Here is the HAProxy configuration, very basic and for test purposes only, just to show you which lines are the important ones:

defaults
 log 127.0.0.1 local0
 option tcplog

frontend ft_test
  mode http
  bind 0.0.0.0:8443 ssl crt ./haproxy.pem crt ./certs/ prefer-server-ciphers
  # other (self described) options are: [ciphers <suite>] [nosslv3] [notlsv1]
  use_backend bk_cert1 if { ssl_fc_sni cert1 } # content switching based on SNI
  use_backend bk_cert2 if { ssl_fc_sni cert2 } # content switching based on SNI
  default_backend bk_www.haproxy.com

backend bk_www.haproxy.com
 mode http
 server srvxlc 127.0.0.1:80

backend bk_cert1
  mode http
  server srv1 127.0.0.1:80

backend bk_cert2
  mode http
  server srv2 127.0.0.1:80

As you can see, HAProxy loads one certificate, haproxy.pem, which will be the default one, plus all the certificates from the certs directory. Actually, I have only 2 for my tests: cert1 and cert2.
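For the record, two such test certificates can be generated with a loop of this kind (a sketch, not the exact commands I used):

```shell
# Create one self-signed pem per SNI name in the certs directory.
mkdir -p certs
for name in cert1 cert2; do
  openssl req -x509 -newkey rsa:2048 -nodes \
          -keyout "$name.key" -out "$name.crt" \
          -days 365 -subj "/CN=$name"
  cat "$name.crt" "$name.key" > "certs/$name.pem"
done
```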

Running HAProxy

First, just check that the configuration is valid:

/opt/haproxy-ssl/sbin/haproxy -c -f ./hassl.cfg 
[WARNING] 247/110924 (6497) : config : missing timeouts for frontend 'ft_test'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[WARNING] 247/110924 (6497) : config : missing timeouts for backend 'bk_test'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Configuration file is valid

Don’t worry about warnings, I purposely wrote a very basic configuration.

Now, you can run HAProxy:

/opt/haproxy-ssl/sbin/haproxy -f ./hassl.cfg

Testing SSL provided by HAProxy

Check the default certificate server name:

openssl s_client -connect 127.0.0.1:8443 -servername www.haproxy.com
[...]
Certificate chain
 0 s:/CN=www.haproxy.com
   i:/CN=www.haproxy.com
[...]

HAProxy log line:

[...] ft_test bk_www.haproxy.com/srvxlc [...]

Checking cert1, loaded from ./certs/ dir:

openssl s_client -connect 127.0.0.1:8443 -servername cert1
[...]
Certificate chain
 0 s:/CN=cert1
   i:/CN=cert1
[...]

HAProxy log line:

[...] ft_test bk_cert1/srv1 [...]

Checking cert2, loaded from ./certs/ dir:

openssl s_client -connect 127.0.0.1:8443 -servername cert2
[...]
Certificate chain
 0 s:/CN=cert2
   i:/CN=cert2
[...]

HAProxy log line:

[...] ft_test bk_cert2/srv2 [...]

Checking with an unknown servername:

openssl s_client -connect 127.0.0.1:8443 -servername kemp
[...]
Certificate chain
 0 s:/CN=www.haproxy.com
   i:/CN=www.haproxy.com
[...]

HAProxy log line:

[...] ft_test bk_www.haproxy.com/srvxlc [...]

When the name is unknown, HAProxy correctly falls back to the default certificate.

And voilà !!!
Since it has been released in the 1.5 branch, you can use it in production 🙂

Related articles

Links

HOWTO SSL native in HAProxy

IMPORTANT NOTE: this article has been outdated since HAProxy-1.5-dev12 was released (10th of September). For more information about SSL inside HAProxy, please read:


How to get SSL with HAProxy getting rid of stunnel, stud, nginx or pound

Synopsis

Since yesterday night (FR time), HAProxy supports SSL offloading. It can even encrypt traffic to a downstream server.
We’ll see later all the fun we can have with these nice features and the goodness they can bring in terms of architecture. Today, I’ll just focus on how to install and configure HAProxy to offload SSL processing from your servers.

It’s important to note that in order to manage SSL connections, a huge rework of connection management has been done in HAProxy. Despite the long time spent testing, some bugs may remain.
So we ask anybody who tests the procedure below to report bugs to the HAProxy mailing list.

Note as well that the job was done by HAProxy Technologies engineers, who already improved stunnel and stud.

SSL offloading diagram

This is pretty simple, as shown in the picture below. The client connects to HAProxy using SSL, HAProxy processes the SSL layer and connects to the server in clear text:
ssl offloading diagram

HAProxy installation

cd /usr/src
wget http://haproxy.1wt.eu/download/1.5/src/snapshot/haproxy-ss-20120905.tar.gz
tar xzf haproxy-ss-20120905.tar.gz
cd haproxy-ss-20120905/
make TARGET=linux2628 USE_STATIC_PCRE=1 USE_OPENSSL=1
sudo make PREFIX=/opt/haproxy-ssl install

HAProxy configuration for SSL offloading


First of all, you have to generate a key and a certificate using openssl and concatenate them in a file, the certificate first, then the key.
Here is mine, just copy/paste it in a file for your tests:

-----BEGIN CERTIFICATE-----
MIIBrzCCARgCCQCfMsCGwq31yzANBgkqhkiG9w0BAQUFADAcMRowGAYDVQQDExF3
d3cuZXhjZWxpYW5jZS5mcjAeFw0xMjA5MDQwODU3MzNaFw0xMzA5MDQwODU3MzNa
MBwxGjAYBgNVBAMTEXd3dy5leGNlbGlhbmNlLmZyMIGfMA0GCSqGSIb3DQEBAQUA
A4GNADCBiQKBgQDFxSTUwX5RD4AL2Ya5t5PAaNjcwPa3Km40uaPKSHlU8AMydxC1
wB4L0k3Ms9uh98R+kIJS+TxdfDaYxk/GdDYI1CMm4TM+BLHGAVA2DeNf2hBhBRKb
TAgxCxXwORJQSB/B+1r0/ZiQ2ig5Jzr8xGHz+tBsHYZ+t+RmjZPQFjnlewIDAQAB
MA0GCSqGSIb3DQEBBQUAA4GBABqVuloGWHReSGLY1yAs20uhJ3j/9SvtoueyFBag
z5jX4BNO/4yhpKEpCGmzYtjr7us3v/s0mKoIVvAgah778rCZW3kF1Y6xR6TYqZna
1ryKB50/MJg9PC4LNL+sAu+WSslOf6+6Ru5N3JjhIZST8edJsGDi6/5HTKoqyvkp
wOMn
-----END CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
MIICXgIBAAKBgQDFxSTUwX5RD4AL2Ya5t5PAaNjcwPa3Km40uaPKSHlU8AMydxC1
wB4L0k3Ms9uh98R+kIJS+TxdfDaYxk/GdDYI1CMm4TM+BLHGAVA2DeNf2hBhBRKb
TAgxCxXwORJQSB/B+1r0/ZiQ2ig5Jzr8xGHz+tBsHYZ+t+RmjZPQFjnlewIDAQAB
AoGBALUeVhuuVLOB4X94qGSe1eZpXunUol2esy0AMhtIAi4iXJsz5Y69sgabg/qL
YQJVOZO7Xk8EyB7JaerB+z9BIFWbZwS9HirqR/sKjjbhu/rAQDgjVWw2Y9sjPhEr
CEAvqmQskT4mY+RW4qz2k8pe4HKq8NAFwbe8iNP7AySP3K4BAkEA4ZPBagtlJzrU
7Tw4BvQJhBmvNYEFviMScipHBlpwzfW+79xvZhTxtsSBHAM9KLbqO33VmJ3C/L/t
xukW8SO6ewJBAOBxU0TfS0EzcRQJ4sn78G6hTjjLwJM2q4xuSwLQDVaWwtXDI6HE
jb7HePaGBGnOrlXxEOFQZCVdDaLhX0zcEQECQQDHcvc+phioGRKPOAFp1HhdfsA2
FIBZX3U90DfAXFMFKFXMiyFMJxSZPyHQ/OQkjaaJN3eWW1c+Vw0MJKgOSkLlAkEA
h8xpqoFEgkXCxHIa00VpuzZEIt89PJVWhJhzMFd7yolbh4UTeRx4+xasHNUHtJFG
MF+0a+99OJIt3wBn7hQ1AQJACScT3p6zJ4llm59xTPeOYpSXyllR4GMilsGIRNzT
RGYxcvqR775RkAgE+5DHmAkswX7TBaxcO6+C1+LJEwFRxw==
-----END RSA PRIVATE KEY-----

Now, the HAProxy configuration, very basic and for test purposes only, just to show you which lines are the important ones:

frontend ft_test
  mode http
  bind 0.0.0.0:8443 ssl crt ./haproxy.pem  # basic conf require only 1 keyword
  # other (self described) options are: [ciphers <suite>] [nosslv3] [notlsv1]
  default_backend bk_test

backend bk_test
  mode http
  server srv1 127.0.0.1:80

Running HAProxy


First, just test the configuration is valid:

/opt/haproxy-ssl/sbin/haproxy -c -f ./ha.cfg 
[WARNING] 247/110924 (6497) : config : missing timeouts for frontend 'ft_test'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[WARNING] 247/110924 (6497) : config : missing timeouts for backend 'bk_test'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Configuration file is valid

Don’t worry about warnings, I purposely wrote a very basic configuration.

Now, you can run HAProxy:

/opt/haproxy-ssl/sbin/haproxy  -f ./ha.cfg

Testing SSL provided by HAProxy


Use curl, with the “--insecure” option if your certificate is self-signed, like mine:

curl --noproxy '*' -D - --insecure https://127.0.0.1:8443/index.html
HTTP/1.1 200 OK
Date: Tue, 04 Sep 2012 09:13:55 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Tue, 04 Sep 2012 09:10:01 GMT
ETag: "a35d1-e-4c8dc9f7d6c40"
Accept-Ranges: bytes
Content-Length: 14
Vary: Accept-Encoding
Content-Type: text/html

Welcome page.

Check SSL parameters with openssl in client mode:

openssl s_client -connect 127.0.0.1:8443
CONNECTED(00000003)
depth=0 /CN=www.exceliance.fr
verify error:num=18:self signed certificate
verify return:1
depth=0 /CN=www.exceliance.fr
verify return:1
---
Certificate chain
 0 s:/CN=www.exceliance.fr
   i:/CN=www.exceliance.fr
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIBrzCCARgCCQCfMsCGwq31yzANBgkqhkiG9w0BAQUFADAcMRowGAYDVQQDExF3
d3cuZXhjZWxpYW5jZS5mcjAeFw0xMjA5MDQwODU3MzNaFw0xMzA5MDQwODU3MzNa
MBwxGjAYBgNVBAMTEXd3dy5leGNlbGlhbmNlLmZyMIGfMA0GCSqGSIb3DQEBAQUA
A4GNADCBiQKBgQDFxSTUwX5RD4AL2Ya5t5PAaNjcwPa3Km40uaPKSHlU8AMydxC1
wB4L0k3Ms9uh98R+kIJS+TxdfDaYxk/GdDYI1CMm4TM+BLHGAVA2DeNf2hBhBRKb
TAgxCxXwORJQSB/B+1r0/ZiQ2ig5Jzr8xGHz+tBsHYZ+t+RmjZPQFjnlewIDAQAB
MA0GCSqGSIb3DQEBBQUAA4GBABqVuloGWHReSGLY1yAs20uhJ3j/9SvtoueyFBag
z5jX4BNO/4yhpKEpCGmzYtjr7us3v/s0mKoIVvAgah778rCZW3kF1Y6xR6TYqZna
1ryKB50/MJg9PC4LNL+sAu+WSslOf6+6Ru5N3JjhIZST8edJsGDi6/5HTKoqyvkp
wOMn
-----END CERTIFICATE-----
subject=/CN=www.exceliance.fr
issuer=/CN=www.exceliance.fr
---
No client certificate CA names sent
---
SSL handshake has read 604 bytes and written 319 bytes
---
New, TLSv1/SSLv3, Cipher is AES256-SHA
Server public key is 1024 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : AES256-SHA
    Session-ID: CF9B7BFF64DE0B332CE9A76896EC1C59C941340D6913612286113FA1F7E09E88
    Session-ID-ctx: 
    Master-Key: C6893078E49626DAF329C61774BA5A35E0264818E0D76542F25BB958584B835154402E02F9B722DD94C56B14EBB14D46
    Key-Arg   : None
    Start Time: 1346750742
    Timeout   : 300 (sec)
    Verify return code: 18 (self signed certificate)
---
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 04 Sep 2012 09:26:44 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Tue, 04 Sep 2012 09:10:01 GMT
ETag: "a35d1-e-4c8dc9f7d6c40"
Accept-Ranges: bytes
Content-Length: 14
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

Welcome page.
closed

Related articles

Links

Preserve source IP address despite reverse proxies

What is a Reverse-Proxy?

A reverse-proxy is a server which connects to upstream servers on behalf of users.
Basically, it usually maintains two TCP connections: one with the client and one with the upstream server.
The upstream server can be an application server, a load-balancer or another proxy/reverse-proxy.

For more details, please consult the page about the proxy mode of the Aloha load balancer.

Why use a reverse-proxy?

It can be used for different purposes:

  • Improve security, performance and scalability
  • Prevent direct access from a user to a server
  • Share a single IP for multiple services

Reverse-proxies are commonly deployed in DMZs to give access to servers located in a more secured area of the infrastructure.
That way, the reverse-proxy hides the real servers and can block malicious requests, or choose a server based on protocol or application information (e.g. URL, HTTP header, SNI, etc.).
Of course, a reverse-proxy can also act as a load-balancer 🙂

Drawback when using a reverse-proxy?


The main drawback of using a reverse-proxy is that it hides the user's IP: when acting on behalf of the user, it uses its own IP address to connect to the server.
There is a workaround, using a transparent proxy, but this setup can hardly pass through firewalls or other reverse-proxies: the default gateway of the server must be the reverse-proxy.

Unfortunately, it is sometimes very useful to know the user's IP when the connection reaches the application server.
It can be mandatory for some applications and it can ease troubleshooting.

Diagram

The diagram below shows a common usage of a reverse-proxy: it is isolated in a DMZ and handles the users' traffic. It then connects to the LAN, where another reverse-proxy acts as a load-balancer.

Here is the flow of the requests and responses:

  1. The client connects through the firewall to the reverse-proxy in the DMZ and sends it its request.
  2. The reverse-proxy validates the request, analyzes it to choose the right farm, then forwards it to the load-balancer in the LAN, through the firewall.
  3. The load-balancer chooses a server in the farm and forwards the request to it.
  4. The server processes the request, then answers the load-balancer.
  5. The load-balancer forwards the response to the reverse-proxy.
  6. The reverse-proxy forwards the response to the client.

Basically, the source IP is modified twice in this kind of architecture: during steps 2 and 3.
And of course, the more load-balancers and reverse-proxies you chain, the more the source IP will be changed.

The Proxy protocol


The proxy protocol was designed by Willy Tarreau, the HAProxy developer.
It is used between proxies (hence its name), or between a proxy and a server which understands it.

The main purpose of the proxy protocol is to forward information about the client connection to the next host.
This information comprises:

  • L4 and L3 protocol information
  • L3 source and destination addresses
  • L4 source and destination ports

That way, the proxy (or server) receiving this information can use it exactly as if the client were connected directly to it.

Basically, it’s a string the first proxy sends to the next one when connecting to it.
For example, here is the proxy protocol applied to an HTTP request:

PROXY TCP4 192.168.0.1 192.168.0.11 56324 80\r\n
GET / HTTP/1.1\r\n
Host: 192.168.0.11\r\n
\r\n

Note: no need to change anything in the architecture, since the proxy protocol is just a string sent over the TCP connection the sender uses to connect to the service hosted by the receiver.

Now, I guess you understand how we can take advantage of this protocol to pass through the firewall while preserving the client IP address between the reverse-proxy (which can generate the proxy protocol string) and the load-balancer (which can parse the proxy protocol string).

Stud and stunnel are two SSL reverse-proxy packages which can send proxy protocol information.
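To get a feel for the protocol, the preamble can even be crafted by hand; piping the result into nc towards an accept-proxy listener would emulate a send-proxy client (the addresses are the example ones from above):

```shell
# Build a proxy protocol v1 preamble followed by a plain HTTP request.
# Sending request.bin over the TCP connection to an accept-proxy bind
# would make the receiver see 192.168.0.1 as the client address.
{
  printf 'PROXY TCP4 192.168.0.1 192.168.0.11 56324 80\r\n'
  printf 'GET / HTTP/1.1\r\nHost: 192.168.0.11\r\n\r\n'
} > request.bin
# e.g.: nc 192.168.0.11 80 < request.bin
```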

Configuration

Between the Reverse-Proxy and the Load-Balancer

Since both devices must understand the proxy protocol, we’re going to consider both the LB and the RP are Aloha Load-Balancers.

Configuring the Reverse-proxy to send proxy protocol information


In the reverse-proxy configuration, just add the keyword “send-proxy” to the server description line.
For example:

server srv1 192.168.10.1 check send-proxy

Configuring the Load-balancer to receive proxy protocol information

In the load-balancer configuration, just add the keyword “accept-proxy” to the bind description line.
For example:

bind 192.168.1.1:80 accept-proxy

What’s happening?


The reverse-proxy opens a connection to the address bound by the load-balancer (192.168.1.1:80). This does not differ from a regular connection flow.
Once the TCP connection is established, the reverse-proxy sends a string with the client connection information.
The load-balancer can now use the client information provided through the proxy protocol exactly as if the connection had been opened directly by the client itself. For example, we can match the client IP address in ACLs for white/black listing, stick-tables, etc.
This also makes the “balance source” algorithm much more efficient 😉

Between the Load-Balancer and the server

Here we are!!!!
Since the LB knows the client IP, we can use it to connect to the server. Yes, this is some kind of spoofing :).

In your HAProxy configuration, just use the source parameter in your backend:

backend bk_app
[...]
  source 0.0.0.0 usesrc clientip
  server srv1 192.168.11.1 check

What’s happening?


The load-balancer uses the client IP provided by the proxy protocol to connect to the server (so the server sees the client IP as the source IP).
Since the server receives a connection from the client IP address, it will try to reach that address through its default gateway.
In order to process the reverse-NAT, the return traffic must pass through the load-balancer, which is why the server’s default gateway must be the load-balancer.
This is the only architecture change.
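On a Linux server, that gateway change could look like the sketch below (192.168.11.254 is a made-up address for the load-balancer's LAN interface, to be adapted to your own network):

```
# Route return traffic through the load-balancer so it can
# reverse-NAT the spoofed client IP (requires root; placeholder IP).
ip route replace default via 192.168.11.254
```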

Compatibility


The kind of architecture in the diagram will only work with Aloha Load-balancers or HAProxy.

Links

Scaling out SSL

Synopsis

We’ve seen recently how we can scale up SSL performance.
But what about scaling out SSL performance?
Well, thanks to the Aloha and HAProxy, it’s easy to smartly manage a farm of SSL accelerator servers, using persistence based on the SSL session ID.
This way of load-balancing is smart, but in case of an SSL accelerator failure, the other servers in the farm would suffer a CPU overhead generating SSL session IDs for the sessions re-balanced by the Aloha.

After a talk with (the famous) emericbr, HAProxy Technologies’ dev team leader, he decided to write a patch for stud adding a new feature: sharing SSL sessions between different stud processes.
That way, in case of an SSL accelerator failure, the servers receiving the re-balanced sessions would not have to generate a new SSL session.

Emericbr’s patch is available here: https://github.com/bumptech/stud/pull/50
At the end of this article, you’ll learn how to use it.

Stud SSL Session shared caching

Description

As we’ve seen in our article on SSL performance, a good way to improve SSL performance is to use an SSL session ID cache.

The idea here is to keep using this cache, while also sending updates to a shared cache that any process can consult to retrieve an SSL session ID and the data associated with it.

As a consequence, there are 2 levels of cache:

      * Level 1: the local process cache, with the currently used SSL sessions
      * Level 2: the shared cache, with the SSL sessions from all the local caches

Way of working

The protocol understands 3 types of requests:

      * New: when a process generates a new session, it updates its local cache, then the shared cache
      * Get: when a client tries to resume a session the receiving process is not aware of, the process tries to get it from the shared cache
      * Del: when a session has expired, or a client sent a bad SSL ID, the process deletes the session from the shared cache

Who does what?

Stud has a father/son architecture.
The father starts up, then spawns the sons. The sons bind the external TCP ports, load the certificates and process the SSL requests.
Each son manages its local cache and sends updates to the shared cache. The father manages the shared cache, receiving the changes and keeping it up to date.

How are the updates exchanged?

Updates are sent in unicast or multicast, on a specified UDP port.
Updates are compatible with both IPv4 and IPv6.
Each packet is signed with an encrypted signature derived from the SSL certificate, to prevent cache poisoning.

What does a packet look like?


SSL Session ID                  [32 bytes]
ASN-1 of SSL Session structure  [max 512 bytes]
Timestamp                       [4 bytes]
Signature                       [20 bytes]

Note: the SSL Session ID field is padded with 0 if required

Diagram

Let’s show this in a nice picture, where each potato represents a process's memory area.
stud_shared_cache
Here, the son on host 1 got a new SSL connection to process; since it could not find the session in its local cache or in the shared cache, it generated the session key, then pushed it to its shared cache and to the father on host 2, which updated the shared cache on that host.
That way, if this user is routed to any stud son process, the session key will not have to be computed again.

Let’s try Stud shared cache

Installation:

git clone https://github.com/EmericBr/stud.git
cd stud
wget http://1wt.eu/tools/ebtree/ebtree-6.0.6.tar.gz
tar xvzf ebtree-6.0.6.tar.gz
ln -s ebtree-6.0.6 ebtree
make USE_SHARED_CACHE=1

Generate a key and a certificate, and concatenate them into a single file.

Now you can run stud:

sudo ./stud -n 2 -C 10000 -U 10.0.3.20,8888 -P 10.0.0.17 -f 10.0.3.20,443 -b 10.0.0.3,80 cert.pem

and run a test:

curl --noproxy '*' --insecure -D - https://10.0.3.20:443/

And you can watch the synchronization packets:

$ sudo tcpdump -n -i any port 8888
[sudo] password for bassmann: 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes

17:47:10.557362 IP 10.0.3.20.8888 > 10.0.0.17.8888: UDP, length 176
17:49:04.592522 IP 10.0.3.20.8888 > 10.0.0.17.8888: UDP, length 176
17:49:05.476032 IP 10.0.3.20.8888 > 10.0.0.17.8888: UDP, length 176

Related links

Benchmarking SSL performance

Introduction

The story

Recently, there have been attacks against websites aiming to steal user identities. In order to protect their users, major website owners had to find a solution.
Unfortunately, we know that improving security sometimes means downgrading performance.

SSL/TLS is a fashionable way to improve data safety when data is exchanged over a network.
SSL/TLS encryption is used to encrypt any kind of data, from the login/password of a personal blog to a company extranet, through an e-commerce cart.
Recent attacks have shown that to protect users' identities, all the traffic must be encrypted.

Note that SSL/TLS is not only used for websites; it can encrypt any TCP-based protocol like POP, IMAP, SMTP, etc.

Why this benchmark?

At HAProxy Technologies, we build load-balancer appliances based on a Linux kernel, LVS (for layer 3/4 load-balancing), HAProxy (for layer 7 load-balancing) and stunnel (SSL encryption), for the main components.

  1. Since SSL/TLS is fashionable, we wanted to help people ask the right questions and make the right choice when they have to bench and choose SSL/TLS products.
  2. We wanted to explain to everybody how one can improve SSL/TLS performance by adding some functionality to open source SSL software.
  3. Lately, on the HAProxy mailing list, Sebastien introduced us to stud, a very good, but still young, alternative to stunnel. So we were curious to bench it.

SSL/TLS introduction

The theory

SSL/TLS can seem a bit complicated at first sight.
Our purpose here is not to describe exactly how it works; there are useful readings for that:

    SSL main lines

    Basically, there are two main phases in SSL/TLS:

    1. the handshake
    2. data exchange

    During the handshake, the client and the server generate keys which are unique to this client/server pair, available only during the session's life, and used by the symmetric algorithms to encrypt and decrypt data on both sides.
    Later in the article, we will use the term “symmetric key” for these keys.

    The symmetric key is never exchanged over the network. An ID, called SSL session ID, is associated to it.

    Let’s have a look at the diagram below, which shows a basic HTTPS connection, step by step:

    SSL_handshake

    We also represented on the diagram the factor which might have an impact on performance.

    1. The client sends the server a Client Hello packet with some random numbers, its supported ciphers, and an SSL session ID in case of resuming an SSL session.
    2. The server chooses a cipher from the client's list and sends a Server Hello packet, including a random number.
      It generates a new SSL session ID if resuming is not possible or not available.
    3. The server sends its public certificate to the client; the client validates it against its CA certificates.
      ==> this is where you may get warnings about self-signed certificates.
    4. The server sends a Server Hello Done packet to tell the client it has finished for now.
    5. The client generates and sends the pre-master key to the server.
    6. The client and server generate the symmetric keys that will be used to encrypt the data.
    7. The client and server tell each other that the next packets will be sent encrypted.
    8. From now on, data is encrypted.

    SSL Performance

    As you can see in the diagram above, some factors may influence SSL/TLS performance on the server side:

    1. the server hardware, mainly the CPU
    2. the asymmetric key size
    3. the symmetric algorithm

    In this article, we’re going to study the influence of these factors and observe the impact on performance.

    A few other things might have an impact on performance:

    • the ability to resume a SSL/TLS session
    • symmetric key generation frequency
    • object size to crypt

    Benchmark platform

    We used the platform below to run our benchmark:

    SSL_benchmark_platform

    The SSL server purposely has much less capacity than the client, in order to ensure the client won’t saturate before the server.

    The client is inject + stunnel in client mode.
    The web server behind HAProxy and the SSL offloader is httpterm.

    Note: some results were checked using httperf and curl-loader, and the results were similar.

    On the server, we have 2 cores, and since hyper-threading is enabled, we have 4 CPUs available from the kernel's point of view.
    The server's e1000e driver has been modified so that interrupts can be bound to the first logical CPU, core 0.

    Last but not least, the SSL library used is OpenSSL 0.9.8.

    Benchmark purpose

    The purpose of this benchmark is to:

    • Compare the different modes of operation of stunnel (fork, pthread, ucontext, ucontext + session cache)
    • Compare the different modes of operation of stud (without and with session cache)
    • Compare stud and stunnel (without and with session cache)
    • Measure the impact of session renegotiation frequency
    • Measure the impact of asymmetric key size
    • Measure the impact of object size
    • Measure the impact of the symmetric cipher

    At the end of the document, we’re going to draw some conclusions as well as give some advice.

    As a standard test, we’re going to use the following:
    Protocol: TLSv1
    Asymmetric key size: 1024 bits
    Cipher: AES256-SHA
    Object size: 0 byte

    For each test, we’re going to provide the transaction per second (TPS) and the handshake capacity, which are the two most important numbers you need to know when comparing SSL accelerator products.

    • Transactions per second: the client will always re-use the same SSL session ID
    • Symmetric key generation: the client will never re-use its SSL session ID, forcing the server to generate a new symmetric key for each request

    1. From the influence of symmetric key generation frequency

    For this test, we’re going to use the following parameters:
    Protocol: TLSv1
    Asymmetric key size: 1024 bits
    Cipher: AES256-SHA
    Object size: 0 byte
    CPU: 1 core

    Note that the object is empty because we want to measure pure SSL performance.

    We’re going to bench the following software:
    STNL/FORK: stunnel-4.39 mode fork
    STNL/PTHD: stunnel-4.39 mode pthread
    STNL/UCTX: stunnel-4.39 mode ucontext
    STUD/BUMP: stud github bumptech (github: 1846569)

    Symmetric key generation frequency | STNL/FORK | STNL/PTHD | STNL/UCTX | STUD/BUMP
    For each request                   |    131    |    188    |    190    |    261
    Every 100 requests                 |    131    |    487    |    490    |    261
    Never                              |    131    |    495    |    496    |    261

    Observation:

    – We can clearly see that STNL/FORK and STUD/BUMP can’t resume an SSL/TLS session.
    – STUD/BUMP performs better than STNL/* on symmetric key generation.

    2. From the advantage of caching SSL session

    For this test, we developed patches for both stunnel and stud to improve a few things.
    The stunnel patches are applied on STNL/UCTX and include:
    – a settable listen queue
    – a fix for a performance regression caused by logging
    – multi-process start-up management
    – a session cache in shared memory

    The stud patches are applied on STUD/BUMP and include:
    – a settable listen queue
    – a session cache in shared memory
    – a fix to allow session resumption

    We’re going to use the following parameters:
    Protocol: TLSv1
    Asymmetric key size: 1024 bits
    Cipher: AES256-SHA
    Object size: 0 byte
    CPU: 1 core

    Note that the patched version will be respectively called STNL/PATC and STUD/PATC in the rest of this document.
    The percentage highlights the improvement of STNL/PATC and STUD/PATC respectively over STNL/UCTX and STUD/BUMP.

    Symmetric key generation frequency | STNL/PATC    | STUD/PATC
    For each request                   |  246 (+29%)  |  261 (+0%)
    Every 100 requests                 | 1085 (+121%) | 1366 (+423%)
    Never                              | 1129 (+127%) | 1400 (+436%)

    Observation:

    – obviously, caching SSL sessions improves the number of transactions per second
    – the stunnel patches also improved raw stunnel performance
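
    The amortized effect of session caching can be sketched numerically: only one request in N pays a full handshake, while the other N-1 pay a cheap resumption. A minimal Python model (using the STNL/PATC figures from the tables above) reproduces the measured “every 100 requests” number quite closely:

    ```python
    # Amortized TPS model: with a session cache, only 1 request in N pays the
    # full handshake; the other N-1 pay the (much cheaper) resumed handshake.

    def amortized_tps(full_hs_tps, resumed_tps, renegotiate_every):
        """Predict TPS when a new symmetric key is generated every N requests."""
        t_full = 1.0 / full_hs_tps      # cost of a full handshake (seconds)
        t_resumed = 1.0 / resumed_tps   # cost of a resumed handshake (seconds)
        n = renegotiate_every
        return n / (t_full + (n - 1) * t_resumed)

    # STNL/PATC: 246 TPS with a handshake on every request,
    # 1129 TPS with pure session resumption
    predicted = amortized_tps(246, 1129, 100)
    print(round(predicted))   # ~1090, close to the 1085 measured for "every 100 requests"
    ```

    The two extreme rows of the table are enough to predict the middle one, which is a good sanity check on the benchmark.
    
    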

    3. From the influence of CPU cores

    As seen on the previous test, we could improve TLS capacity by adding a symmetric key cache to both stud and stunnel.
    We still might be able to improve things :).

    For this test, we’re going to configure both stunnel and stud to use 2 CPU cores.
    The kernel will be configured on core 0, userland on core 1 and stunnel or stud on cores 2 and 3, as shown below:
    [diagram: CPU affinity with SSL on 2 cores]

    For the rest of the tests, we’re going to bench only STNL/PTHD, which is the stunnel mode used by most Linux distributions, and the two patched versions, STNL/PATC and STUD/PATC.

    For this test, we’re going to use the following parameters:
    Protocol: TLSv1
    Asymmetric key size: 1024 bits
    Cipher: AES256-SHA
    Object size: 0 byte
    CPU: 2 cores

    The table below summarizes the numbers we get with 2 cores; the percentage shows the improvement over 1 core:

    Symmetric key generation frequency | STNL/PTHD  | STNL/PATC    | STUD/PATC
    For each request                   | 217 (+15%) |  492 (+100%) |  517 (+98%)
    Every 100 requests                 | 588 (+20%) | 2015 (+85%)  | 2590 (+89%)
    Never                              | 602 (+21%) | 2118 (+87%)  | 2670 (+90%)

    Observation:

    – now we know the number of CPU cores has an influence 😉
    – the symmetric key generation capacity has doubled on the patched versions, while STNL/PTHD barely takes advantage of the second CPU core
    – we can clearly see the benefit of SSL session caching on both STNL/PATC and STUD/PATC
    – STUD/PATC performs around 25% better than STNL/PATC

    Note that since STNL/FORK and STUD/BUMP have no SSL session cache, there is no need to test them anymore.
    We’re going to concentrate on STNL/PTHD, STNL/UCTX, STNL/PATC and STUD/PATC.

    4. From the influence of the asymmetric key size

    The default asymmetric key size on current websites is usually 1024 bits. For security purposes, more and more engineers now recommend using 2048 bits or even 4096 bits.
    In the following test, we’re going to observe the impact of the asymmetric key size on SSL performance.

    For this test, we’re going to use the following parameters:
    Protocol: TLSv1
    Asymmetric key size: 2048 bits
    Cipher: AES256-SHA
    Object size: 0 byte
    CPU: 2 cores

    The table below summarizes the numbers we got with a 2048-bit asymmetric key; the percentage highlights the performance impact compared to the 1024-bit key, both tests running on 2 CPU cores:

    Symmetric key generation frequency | STNL/PTHD  | STNL/PATC   | STUD/PATC
    For each request                   |  46 (-78%) |   96 (-80%) |   96 (-81%)
    Every 100 requests                 | 541 (-8%)  | 1762 (-13%) | 2121 (-18%)
    Never                              | 602 (+0%)  | 2118 (+0%)  | 2670 (+0%)

    Observation:

    – the asymmetric key size only influences symmetric key generation: the number of transactions per second does not change at all for the software able to cache and re-use SSL session IDs
    – moving from 1024 to 2048 bits divides the number of symmetric keys generated per second by 4 on our environment
    – on an average traffic profile with renegotiation every 100 requests, stud is more impacted than stunnel, but it still performs better
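
    The steep cost of bigger keys comes from the RSA private-key operation, which is a modular exponentiation over key-size integers. A rough, hypothetical illustration in plain Python big-integer math (this is not OpenSSL’s optimized RSA, so absolute times are meaningless; only the ratio matters):

    ```python
    import random
    import timeit

    random.seed(42)

    def modexp_time(bits, repeat=20):
        """Average time of one modular exponentiation with bits-sized operands."""
        n = random.getrandbits(bits) | (1 << (bits - 1)) | 1   # odd, full-width modulus
        d = random.getrandbits(bits) | (1 << (bits - 1))       # full-width exponent
        m = random.getrandbits(bits - 1)                       # message representative
        return timeit.timeit(lambda: pow(m, d, n), number=repeat) / repeat

    t1024 = modexp_time(1024)
    t2048 = modexp_time(2048)
    # Cost grows roughly cubically with the key size, hence the collapse of the
    # "for each request" numbers when moving from 1024- to 2048-bit keys.
    print(f"2048-bit modexp costs {t2048 / t1024:.1f}x a 1024-bit one")
    ```

    Real RSA implementations use tricks such as the CRT to cut this cost, which is why the measured factor is closer to 4 than to 8.
    
    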

    5. From the influence of the object size

    If you have been reading carefully since the beginning, you might be thinking: “their tests are nice, but their objects are empty… what happens with real objects?”
    So, I guess it’s time to study the impact of the object size!

    For this test, we’re going to use the following parameters:
    Protocol: TLSv1
    Asymmetric key size: 1024 bits
    Cipher: AES256-SHA
    Object size: 1 KByte / 4 KBytes
    CPU: 2 cores

    Results for STNL/PTHD, STNL/PATC and STUD/PATC:
    The percentage number highlights the performance impact.

    Symmetric key generation frequency | STNL/PTHD         | STNL/PATC           | STUD/PATC
                                       |  1 KB |  4 KB     |  1 KB |  4 KB       |  1 KB |  4 KB
    Every 100 requests                 |  582  | 554 (-5%) |  1897 | 1668 (-13%) |  2475 | 2042 (-21%)
    Never                              |  595  | 564 (-5%) |  1997 | 1742 (-14%) |  2520 | 2101 (-19%)

    Observation

    – the bigger the object, the lower the performance…
    To be fair, we’re not surprised by this result 😉
    – STUD/PATC performs around 20% better than STNL/PATC
    – STNL/PATC performs 3 times better than STNL/PTHD
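
    The per-object cost can be illustrated with a small, hypothetical Python timing. Stdlib Python has no AES, so SHA-1 hashing stands in here for the record-layer work the AES256-SHA ciphersuite performs on every byte of the response:

    ```python
    import hashlib
    import timeit

    def digest_time(size, repeat=2000):
        """Average time to hash one response body of `size` bytes."""
        data = b"x" * size
        return timeit.timeit(lambda: hashlib.sha1(data).digest(), number=repeat) / repeat

    t1k = digest_time(1024)
    t4k = digest_time(4096)
    # The per-byte work is constant, so a bigger object costs proportionally
    # more per transaction, which is the TPS drop seen in the table above.
    print(f"a 4 KB body costs {t4k / t1k:.1f}x a 1 KB body")
    ```

    The handshake cost is fixed per connection, which is why the TPS drop is milder than a pure per-byte model would predict.
    
    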

    6. From the influence of the cipher

    Since the beginning, we have run our benchmarks with a single cipher: AES256-SHA.
    It’s now time to bench some other ciphers:
    – first, let’s give AES128-SHA a try and compare it to AES256-SHA
    – second, let’s try RC4_128-SHA and compare it to AES128-SHA

    For this test, we’re going to use the following parameters:
    Protocol: TLSv1
    Asymmetric key size: 1024 bits
    Cipher: AES256-SHA / AES128-SHA / RC4_128-SHA
    Object size: 4Kbyte
    CPU: 2 cores

    Results for STNL/PTHD, STNL/PATC and STUD/PATC:
    The percentage highlights the performance impact of the cipher changes:
    – AES256 ==> AES128
    – AES128 ==> RC4_128

    Symmetric key generation frequency | STNL/PTHD                      | STNL/PATC                        | STUD/PATC
                                       | AES256 | AES128    | RC4_128   | AES256 | AES128     | RC4_128    | AES256 | AES128     | RC4_128
    Every 100 requests                 |  554   | 567 (+2%) | 586 (+3%) |  1668  | 1752 (+5%) | 1891 (+8%) |  2042  | 2132 (+4%) | 2306 (+8%)
    Never                              |  564   | 572 (+1%) | 600 (+5%) |  1742  | 1816 (+4%) | 1971 (+8%) |  2101  | 2272 (+8%) | 2469 (+8%)

    Observation:

    – As expected, AES128 performs better than AES256
    – RC4_128 performs better than AES128
    – stud performs better than stunnel
    – Note that RC4 will perform even better on big objects, since it is a stream cipher while AES works on blocks

    Conclusion on SSL performance

    1.) bear in mind to ask for both numbers when comparing SSL products:
    – the number of handshakes per second
    – the number of transactions per second (aka TPS)

    2.) if the product is not able to resume SSL sessions (by caching the SSL session ID), just forget it!
    It won’t perform well and is not scalable at all.

    Note that having a load-balancer able to maintain affinity based on the SSL session ID is really important; you can now understand why.
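
    For reference, this affinity can be done in pure TCP mode by sticking on the session ID carried in the hello messages. A sketch of such a setup (server names and addresses are made up for the example):

    ```
    frontend ft_ssl
     bind 0.0.0.0:443
     mode tcp
     default_backend bk_ssl

    backend bk_ssl
     mode tcp
     balance roundrobin
     # an SSL session ID is at most 32 bytes long
     stick-table type binary len 32 size 30k expire 30m
     acl clienthello req_ssl_hello_type 1
     acl serverhello rep_ssl_hello_type 2
     # wait for the ClientHello before choosing a server
     tcp-request inspect-delay 5s
     tcp-request content accept if clienthello
     tcp-response content accept if serverhello
     # the session ID sits at offset 43 of the hello messages
     stick on payload_lv(43,1) if clienthello
     stick store-response payload_lv(43,1) if serverhello
     server ssl1 192.168.0.11:443
     server ssl2 192.168.0.12:443
    ```

    The server assigns the session ID, so it is learned from the ServerHello and matched on subsequent ClientHellos.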

    3.) bear in mind that the asymmetric key size may have a huge impact on performance.
    Of course, the bigger the asymmetric key size is, the harder it will be for an attacker to break the generated symmetric key.

    4.) stud is young but seems promising.
    By the way, stud has merged the HAProxy Technologies patches from @emericbr, so if you use a recent stud version, you should get the same results as we did.

    5.) er, let’s read the results again… if your users renegotiate every 100 requests and the average object you encrypt is 4K, you can get 2300 SSL transactions per second on a small Intel Atom @1.66GHz!!!!
    Imagine what you could do with a dual-CPU Core i7!!!

    By the way, we’re glad that the stud developers have integrated our patches into the main stud branch:

    Related links