Technical Details of Why Cloudflare Chose AMD EPYC for Gen X Servers
From the very beginning Cloudflare used Intel CPU-based servers (and, also, Intel components for things like NICs and SSDs). But we’re always interested in optimizing the cost of running our service so that we can provide products at a low cost and high gross margin.
We’re also mindful of events like the Spectre and Meltdown vulnerabilities and have been working with outside parties on research into mitigation and exploitation which we hope to publish later this year.
We looked very seriously at ARM-based CPUs and continue to keep our software up to date for the ARM architecture so that we can use ARM-based CPUs when the requests per watt is interesting to us.
In the meantime, we’ve deployed AMD’s EPYC processors as part of Gen X server platform and for the first time are not using any Intel components at all. This week, we announced details of this tenth generation of servers. Below is a recap of why we’re excited about the design, specifications, and performance of our newest hardware.
Servers for an Accelerated Future
Every server can run every service. This architectural decision has helped us achieve higher efficiency across the Cloudflare network. It has also given us more flexibility to build new software or adopt the newest available hardware.
Notably, Intel is not inside. We are not using their hardware for any major server components such as the CPU, board, memory, storage, network interface card (or any type of accelerator).
This time, AMD is inside.
Compared with our prior server (Gen 9), our Gen X server processes as much as 36% more requests while costing substantially less. Additionally, it enables a ~50% decrease in L3 cache miss rate and up to 50% decrease in NGINX p99 latency, powered by a CPU rated at 25% lower TDP (thermal design power) per core.
Gen X CPU benchmarks
To identify the most efficient CPU for our software stack, we ran several benchmarks for key workloads such as cryptography, compression, regular expressions, and LuaJIT. Then, we simulated typical requests we see, before testing servers in live production to measure requests per watt.
Based on our findings, we selected the single socket 48-core AMD 2nd Gen EPYC 7642.
Impact of Cache Locality
The single AMD EPYC 7642 performed very well during our lab testing, beating our Gen 9 server with dual Intel Xeon Platinum 6162 with the same total number of cores. Key factors we noticed were its large L3 cache, which led to a low L3 cache miss rate, as well as a higher sustained operating frequency.
Gen X Performance Tuning
Partnering with AMD, we tuned the 2nd Gen EPYC 7642 processor to achieve additional 6% performance. We achieved this by using power determinism and configuring the CPU’s Thermal Design Power (TDP).
Securing Memory at EPYC Scale
Finally, we described how we use Secure Memory Encryption (SME), an interesting security feature within the System on a Chip architecture of the AMD EPYC line. We were impressed by how we could achieve RAM encryption without significant decrease in performance. This reduces the worry that any data could be exfiltrated from a stolen server.
We enjoy designing hardware that improves the security, performance and reliability of our global network, trusted by over 26 million Internet properties.
Want to help us evaluate new hardware? Join us!