Universal SSL: How It Scales
On Monday, we announced Universal SSL, enabling HTTPS for all websites using CloudFlare’s Free plan. Universal SSL represents a massive increase in the number of sites we serve over HTTPS—from tens of thousands, to millions. People have asked us, both in comments and in person, how our servers will handle this extra load. The answer, in a nutshell, is this: we found that with the right hardware, software, and configuration, the cost of SSL on web servers can be reduced to almost nothing.
CloudFlare’s entire infrastructure is built on modern commodity hardware. Specifically, our web servers are running on CPUs manufactured by Intel that were designed with cryptography in mind.
All Intel CPUs based on the Westmere CPU microarchitecture (introduced in 2010) and later have specialized cryptographic instructions. Important for CloudFlare’s Universal SSL rollout are the AES-NI instructions, which speed up the Advanced Encryption Standard (AES) algorithm. There’s also a set of instructions called Carry-less Multiplication (CLMUL) that computes mathematical operations in binary finite fields. CLMUL can be used to speed up AES in Galois/Counter Mode (GCM), our preferred mode of encryption due to its resistance against recent attacks like BEAST.
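As a rough way to check whether a given Linux machine exposes these instructions, the sketch below scans cpuinfo-style text for the `aes` and `pclmulqdq` flags. It assumes the `/proc/cpuinfo` layout found on Linux x86 systems; the function name and the sample text are made up for illustration and are not part of CloudFlare's tooling.

```python
def crypto_flags(cpuinfo_text):
    """Report whether the AES-NI and CLMUL flags appear in cpuinfo-style text."""
    # Flag names as they appear on Linux x86: "aes" (AES-NI), "pclmulqdq" (CLMUL).
    wanted = {"aes": "AES-NI", "pclmulqdq": "CLMUL"}
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            found.update(value.split())
    return {label: flag in found for flag, label in wanted.items()}


# Hypothetical sample; on a real machine, pass open("/proc/cpuinfo").read() instead.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse2 pclmulqdq ssse3 aes avx\n"
print(crypto_flags(SAMPLE))
```

A CPU that reports both flags can run AES-GCM with hardware help for both the block cipher and the GCM authentication step.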
As we described in our primer on TLS, the server picks which algorithm is used in a connection based on the cipher suites supported by the client. In our configuration (available on GitHub), we prioritize AES-based ciphers, and prefer AES-GCM to AES-CBC.
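To make the selection rule concrete, here is a minimal sketch of server-preference cipher selection: the server walks its own ordered list and takes the first suite the client also offers. The preference list is illustrative only (it uses OpenSSL-style suite names and puts GCM ahead of CBC); it is not CloudFlare's published configuration, and `pick_cipher` is a hypothetical helper.

```python
# Illustrative server preference order: AES-GCM first, then AES-CBC, then 3DES.
SERVER_PREFERENCE = [
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-SHA",   # AES-CBC
    "DES-CBC3-SHA",           # 3DES
]


def pick_cipher(client_offered, server_preference=SERVER_PREFERENCE):
    """Return the first server-preferred suite that the client supports, or None."""
    offered = set(client_offered)
    for suite in server_preference:
        if suite in offered:
            return suite
    return None


# The client lists CBC first, but the server's order wins.
client = ["ECDHE-RSA-AES128-SHA", "ECDHE-RSA-AES128-GCM-SHA256"]
print(pick_cipher(client))
```

Because the server's order decides, a client that supports both AES-GCM and AES-CBC ends up on AES-GCM regardless of how it ranked them.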
The vast majority of the HTTPS data served by CloudFlare’s servers is encrypted with AES. Here’s the breakdown of ciphers we use on an average day:
AES-CBC: 62.80%
AES-GCM: 36.28%
3DES: 0.91%
RC4: 0.0003%
AES is practically free of performance cost on our modern processors, and 99% of data enciphered on CloudFlare’s servers uses AES so the cost is trivially small. Note that out of these ciphers, RC4 is the second fastest; however, we de-prioritized it for security reasons, though we couldn’t remove it completely due to some odd client configurations.
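The 99% figure follows directly from the breakdown. A quick check, using only the percentages quoted in this post:

```python
# Cipher shares (percent of data) as quoted in the breakdown.
shares = {"AES-CBC": 62.80, "AES-GCM": 36.28, "3DES": 0.91, "RC4": 0.0003}

# Sum the two AES modes to get the overall AES share.
aes_share = sum(pct for name, pct in shares.items() if name.startswith("AES"))
print(f"AES share: {aes_share:.2f}%")  # prints "AES share: 99.08%"
```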
Image © Trevor Perrin 2014
There are two potentially costly portions of a TLS connection: the data encipherment and the handshake. With AES-NI and CLMUL, data encipherment is essentially free; however, there are two expensive steps in the handshake. One is the private key operation, and the other is the key establishment (this is described in our Keyless SSL post).
With Universal SSL, both the private key operation and the key establishment use elliptic curve cryptography. The private key operation uses the Elliptic Curve Digital Signature Algorithm (ECDSA), and the key establishment uses Ephemeral Elliptic Curve Diffie-Hellman (ECDHE).
Elliptic curve cryptography allows you to use smaller keys than traditional RSA. For example, a 256-bit elliptic curve key is equivalent in strength to a 3072-bit RSA key. Smaller keys allow elliptic curve cryptography to be around 5-10x faster than RSA in general cases. For Universal SSL, we chose the elliptic curve P-256 with an optimized assembly code implementation by Shay Gueron and Vlad Krasnov. This implementation was merged into OpenSSL this week, and provides additional speedup of 2-3x for both ECDHE and ECDSA. Choosing this elliptic curve reduced the computational burden of the TLS handshake on our servers by an order of magnitude.
Up until the launch of Universal SSL this week, all but a hundred sites on the Internet used RSA-based certificates. Universal SSL is the first large-scale deployment of ECDSA keys for TLS. This is the first major step towards bringing the advantages of elliptic curves onto the web.
Even with fast elliptic curve cryptography, the asymmetric steps (key establishment and digital signature) are still the most expensive part of a TLS handshake. For returning visitors of a site we have a shortcut that eliminates the need for our servers to perform these expensive operations. The shortcut is called session resumption and it’s built into the TLS specification.
In our post about Keyless SSL, we mentioned new work we did to improve session resumption. Resuming a TLS connection is not only faster in terms of latency—there is one less round-trip to the server—but it’s also more lightweight because the server can skip the expensive asymmetric cryptographic operations.
The TLS protocol has two ways to resume a session: session tickets and session IDs. In session ID resumption, the server stores the session information for reuse later. For session tickets, the session information is encrypted by a key known only by the server and sent to the client in the handshake in a “session ticket”. When the client wants to resume a session, it can send the session ticket to the server which can decrypt it and resume the session. By storing the connection information in a way that it can be reused later, the expensive parts of the handshake are not necessary.
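The difference between the two mechanisms can be sketched in a few lines. This is a toy model, not the TLS wire format: the class and function names are hypothetical, and the ticket is protected with a stand-in cipher built from HMAC-SHA256 (standard library only) rather than the algorithms a real TLS stack would use. It only illustrates where the state lives: on the server for session IDs, on the client for session tickets.

```python
import hashlib
import hmac
import os


class SessionIdCache:
    """Session ID resumption: the server keeps the session state itself."""

    def __init__(self):
        self._store = {}

    def save(self, master_secret):
        session_id = os.urandom(16).hex()
        self._store[session_id] = master_secret
        return session_id

    def resume(self, session_id):
        return self._store.get(session_id)  # None means: do a full handshake


def _keystream(key, nonce, length):
    # Toy keystream from HMAC-SHA256 in counter mode (illustration only).
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def issue_ticket(enc_key, mac_key, master_secret):
    """Session ticket: encrypt and authenticate the state, hand it to the client."""
    nonce = os.urandom(16)
    stream = _keystream(enc_key, nonce, len(master_secret))
    ciphertext = bytes(a ^ b for a, b in zip(master_secret, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def redeem_ticket(enc_key, mac_key, ticket):
    """Verify and decrypt a ticket; return None if it is malformed or forged."""
    if len(ticket) < 48:
        return None
    nonce, ciphertext, tag = ticket[:16], ticket[16:-32], ticket[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))
```

In this model, any server holding `enc_key` and `mac_key` can redeem a ticket, while a session ID is only useful to a server that can reach the cache it was stored in.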
The work done by Piotr Sikora and Zi Lin for Keyless SSL to share sessions and session tickets across machines allows us to resume connections even if they were made to a different CloudFlare server. For SSL session ticket based resumption (used in Chrome and Firefox), sessions can be resumed worldwide; for session ID based resumption (all other browsers), sessions can be resumed from any machine in the same data center.
CloudFlare can serve any customer’s site, from any CloudFlare server, anywhere in the world—including sites over TLS. This flexibility allows us to efficiently handle attacks, and evenly share the load across our data centers.
Web servers like nginx are designed to use static configurations. If something about a site changes (like the certificate), the server configuration needs to be reloaded. Reloading can cause the server to read data from disk and re-initialize internal state, causing a strain on server resources. Frequent reloads are unavoidable when you have millions of customers who are able to change their certificates at any time. At CloudFlare’s scale, this can result in a performance bottleneck.
Lazy loading of certificates helps relieve that bottleneck. Using custom modifications to nginx, we are able to dynamically load certificates into memory only when they’re needed. Now, if one site changes their certificate, the server does not have to reload every certificate. This change allows our servers to scale up to handle millions of HTTPS sites.
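The idea can be sketched independently of nginx. The class below is a hypothetical illustration, not CloudFlare's actual modification: certificates are fetched through a caller-supplied `loader` only on first use, and changing one site's certificate invalidates only that site's entry.

```python
class LazyCertStore:
    """Load certificates on first use; invalidate them one site at a time."""

    def __init__(self, loader):
        self._loader = loader  # callable: hostname -> certificate data
        self._cache = {}

    def get(self, hostname):
        # Only touch the backing store the first time a hostname is requested.
        if hostname not in self._cache:
            self._cache[hostname] = self._loader(hostname)
        return self._cache[hostname]

    def invalidate(self, hostname):
        # A certificate change drops one entry; every other site stays cached.
        self._cache.pop(hostname, None)


# Hypothetical loader that records how often the backing store is hit.
load_calls = []

def fake_loader(hostname):
    load_calls.append(hostname)
    return f"certificate for {hostname}"


store = LazyCertStore(fake_loader)
store.get("example.com")
store.get("example.com")        # served from memory, no second load
store.invalidate("example.com")
store.get("example.com")        # reloaded once after the change
print(load_calls)
```

Nothing is loaded at startup, so the cost of a certificate change is proportional to the one site that changed rather than to the total number of sites.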
Through a combination of modern hardware, modern algorithms, lazy loading, and session resumption techniques, we were able to reduce the CPU usage of Universal SSL to almost nothing.
The following two CPU graphs are from the same machine on the same day of the week. Want to guess which one was after we rolled out Universal SSL?
Hopefully, our experience helps debunk one of the myths about SSL by showing that it can be done on a massive scale with minimal extra burden on web servers.