We constantly measure our own and other networks’ performance, and look for ways to improve our performance; and share our results.
In this post we are going to share the most recent updates, and tell you about our tools and processes that we use to monitor and improve our network performance.
First, the results
In July, 2022, we started taking a more granular look down to every single network and taking actions for the specific networks where we have some room for improvement. Cloudflare was already the fastest provider for most of the networks around the world (we define a network as country and AS number pair). Taking a closer look at the numbers, Cloudflare was ranked #1 in 33% of the networks and was within 2 ms or 5% of the #1 provider for 8% of the networks that we measured in terms of the 95th percentile TCP Connection Time. For reference, our closest competitor on that front was the fastest for 20% of networks.
As of May 31, 2023 those numbers have improved significantly. Today, Cloudflare is the fastest provider for 46% of networks—and was within 2 ms (95th percentile TCP Connection Time) or 5% of the fastest provider for 10% of the networks that we measured—whereas our closest competitor is now the fastest for 18% of networks.
Below is the change in percentage of networks that each provider is the fastest over time for Cloudflare and other services.
Our tooling and process
We use Real User Measurements (RUM) and fetch a small file from Cloudflare, Akamai, CloudFront, Fastly and Google Cloud CDN. Browsers around the world report the performance of those providers from the perspective of the end-user network they are on. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post here.
Using the RUM data, we are able to measure various performance metrics, such as TCP Connection Time, Time to First Byte (TTFB), Time to Last Byte (TTLB), for ourselves and other networks.
One of the most important tools that we use for measuring and improving our performance is what we call Performance Benchmarks Dashboard. That’s the dashboard where we can analyze the data that we collect in different dimensions.
Here are the metrics that we monitor based on some of the dimensions.
The first metric we closely monitor is the percent of networks that we are ranked #1 in terms of TCP Connection Time. That’s a key performance indicator that we evaluate ourselves against.
The second metric we monitor is our overall performance in each country. This gives us visibility into the countries or regions that we need to pay closer attention to and take action towards improving our performance. Those actions will be listed later. Orange indicates the countries that Cloudflare is the fastest provider based on the TCP Connection Time.
The third set of metrics we use are TCP Connection Time and TTLB. The number of networks where we are #1 in terms of 95th percentile TCP Connection Time is one of our key performance indicators. We actively monitor and work on improving that metric. More on that later.
Using all the metrics listed above, the Performance Benchmarks Dashboard helps us to find networks where we can improve our performance. Our engineering teams monitor that dashboard and investigate the underlying reasons for degraded performance if there are any and the action items are displayed on the dashboard until they are resolved.
Once we identify a particular network to improve, we investigate the root cause and document the action items to improve our performance. Those actions generally fall under three categories.
The first category is establishing peering with that network in a specific location so that users can take the optimum path. That’s a critical component of a better Internet! Here is our more detailed blog post about that from earlier this week.
The second category is expanding our compute capacity in a specific data center so that we can serve the users at the closest data center.
And finally, we apply traffic engineering actions to make sure that the network is served in the optimum way. Traffic engineering actions are generally manual configurations that we apply, in case the path that’s chosen by the routing protocols is not the most performant path.
The data we collect gives us a granular view of every network that connects to Cloudflare and we constantly optimize our infrastructure to improve Cloudflare’s performance. We won’t rest until we’re #1 everywhere.