How Cloudflare uses Cloudflare Spectrum: A look into an intern’s project at Cloudflare

By GIXnews

Cloudflare extensively uses its own products internally in a process known as ) never needed, so it also needs to be avoided.

Also, because we provide a reverse proxy to customer origins, we do not need to allow connections to IP ranges that cannot be used on the public Internet, as specified in this RFC.

The Problem

To improve usability and allow internal Spectrum customers to create apps using the Dashboard instead of the static configuration workflow, we needed a way to give particular customers permission to use Cloudflare managed addresses in their Spectrum configuration. Solving this problem was my main project for the internship.

A good starting point ended up being the Addressing API. The Addressing API is Cloudflare’s solution to IP management, an internal database and suite of tools to keep track of IP prefixes, with the goal of providing a unified source of truth for how IP addresses are being used across the organization. This makes it possible to provide a cross-product platform for products and features such as BYOIP, BGP On Demand, and Magic Transit.

The Addressing API keeps track of all Cloudflare managed IP prefixes, along with who owns each prefix. The owner of a prefix can also give someone else permission to use it. We call this a delegation.

A user’s permission to use an IP address managed by the Addressing API is determined as follows:

  1. Is the user the owner of the prefix containing the IP address?
    a) Yes, the user has permission to use the IP.
    b) No, go to step 2.
  2. Has the user been delegated a prefix containing the IP address?
    a) Yes, the user has permission to use the IP.
    b) No, the user does not have permission to use the IP.
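
The two checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ManagedPrefix` record, its field names, and the team names are hypothetical and do not reflect the real Addressing API schema, and the prefix is taken from a documentation range rather than a Cloudflare range.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class ManagedPrefix:
    """Hypothetical record for one managed prefix (not the real schema)."""
    network: ipaddress.IPv4Network
    owner: str
    delegates: set = field(default_factory=set)


def has_permission(user, ip, prefixes):
    """Return True if `user` owns or has been delegated a prefix containing `ip`."""
    addr = ipaddress.ip_address(ip)
    covering = [p for p in prefixes if addr in p.network]
    # Step 1: is the user the owner of a prefix containing the address?
    if any(p.owner == user for p in covering):
        return True
    # Step 2: has the user been delegated a prefix containing the address?
    return any(user in p.delegates for p in covering)


# Example data: one prefix owned by "team-a" and delegated to "team-b".
prefixes = [
    ManagedPrefix(ipaddress.ip_network("192.0.2.0/24"),
                  owner="team-a", delegates={"team-b"}),
]

print(has_permission("team-a", "192.0.2.10", prefixes))  # True (owner)
print(has_permission("team-b", "192.0.2.10", prefixes))  # True (delegation)
print(has_permission("team-c", "192.0.2.10", prefixes))  # False
```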

The Solution

    With the information present in the Addressing API, the solution starts to become clear. For a given customer and IP, we use the following algorithm:

  1. Is the IP managed by Cloudflare (or contained in the previous RFC)?
    a) Yes, go to step 2.
    b) No, allow as origin.
  2. Does the customer have permission to use the IP address?
    a) Yes, allow as origin.
    b) No, deny as origin.

    As long as the internal customer has been given permission to use the Cloudflare IP (through a delegation in the Addressing API), this approach would allow them to use it as an origin.

    However, we run into a corner case here: since BYOIP customers also have permission to use their own ranges, they would be able to set their own IP as an origin, potentially causing a cycle. To mitigate this, we need to check if the IP is a Spectrum edge IP. Fortunately, the Addressing API also contains this information, so all we have to do is check if the given origin IP is already in use as a Spectrum edge IP, and if so, deny it. Since all of the denied-network checks now occur in the Addressing API, we were able to remove Spectrum’s own deny network database, which also reduced the engineering work needed to maintain it.
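
Put together, the decision described above might look like the following sketch. The function and parameter names are hypothetical, and checking the edge-IP condition first is an assumption; the text only says that an origin already in use as a Spectrum edge IP must be denied.

```python
def allow_as_origin(is_managed_or_reserved, has_permission, is_spectrum_edge_ip):
    """Sketch of the origin check; inputs would come from the Addressing API."""
    # Corner case: an address already in use as a Spectrum edge IP is denied,
    # even for a customer who has permission to use it (e.g. a BYOIP range).
    # Doing this check first is an assumption of this sketch.
    if is_spectrum_edge_ip:
        return False
    # Step 1: addresses that are neither Cloudflare managed nor in the
    # reserved ranges are ordinary public origins.
    if not is_managed_or_reserved:
        return True
    # Step 2: managed (or reserved) addresses require explicit permission.
    return has_permission


# An ordinary public origin.
print(allow_as_origin(False, False, False))  # True
# A Cloudflare managed IP delegated to an internal customer.
print(allow_as_origin(True, True, False))    # True
# A Cloudflare managed IP without a delegation.
print(allow_as_origin(True, False, False))   # False
# A BYOIP address that is already a Spectrum edge IP.
print(allow_as_origin(True, True, True))     # False
```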

    Let’s go through a concrete example. Consider an internal customer who wants to use 104.16.8.54/32 as an origin for their Spectrum app. This address is managed by Cloudflare, and suppose the customer has permission to use it, and the address is not already in use as an edge IP. This means the customer is able to specify this IP as an origin, since it meets all of our criteria.

    For example, a request to the Addressing API could look like this:

    curl --silent 'https://addr-api.internal/edge_services/spectrum/validate_origin_ip_acl?cidr=104.16.8.54/32' -H "Authorization: Bearer $JWT" | jq .
    {
      "success": true,
      "errors": [],
      "result": {
        "allowed_origins": {
          "104.16.8.54/32": {
            "allowed": true,
            "is_managed": true,
            "is_delegated": true,
            "is_reserved": false,
            "has_binding": false
          }
        }
      },
      "messages": []
    }
    

    Now we have completely moved the responsibility of validating the use of origin IP addresses from Spectrum’s configuration service to the Addressing API.
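
A caller such as the configuration service could then interpret a response of this shape roughly as follows. This is a sketch only: the JSON layout is copied from the example above, while the `origin_allowed` helper and its choice to treat a failed call or a missing entry as a denial are assumptions.

```python
import json

# Sample body, copied from the example response above.
SAMPLE_RESPONSE = """
{
  "success": true,
  "errors": [],
  "result": {
    "allowed_origins": {
      "104.16.8.54/32": {
        "allowed": true,
        "is_managed": true,
        "is_delegated": true,
        "is_reserved": false,
        "has_binding": false
      }
    }
  },
  "messages": []
}
"""


def origin_allowed(response, cidr):
    """Return True only if the response explicitly allows `cidr`."""
    # Assumption: treat an unsuccessful call as a denial (fail closed).
    if not response.get("success"):
        return False
    origins = (response.get("result") or {}).get("allowed_origins") or {}
    entry = origins.get(cidr)
    # Assumption: a CIDR missing from the result is also a denial.
    return bool(entry and entry.get("allowed"))


response = json.loads(SAMPLE_RESPONSE)
print(origin_allowed(response, "104.16.8.54/32"))  # True
print(origin_allowed(response, "192.0.2.1/32"))    # False (not in the result)
```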

    Performance

    This approach required making another HTTP request on the critical path of every create app request in the Spectrum configuration service. Some basic performance testing showed (as expected) increased response times for the API call (about 100ms). This led to discussion among the Spectrum team about the performance impact of different HTTP requests throughout the critical path. To investigate, we decided to use OpenTracing.

    OpenTracing is a standard for providing distributed tracing of microservices. When an HTTP request is received, special headers are added to it to allow it to be traced across the different services. Within a given trace, we can see how long a SQL query took, the time a function took to complete, the amount of time a request spent at a given service, and more.
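
To make the idea of spans and trace propagation concrete, here is a toy illustration written with the Python standard library only. It is not the OpenTracing API: the `ToyTracer` class, the operation names, and the `X-Toy-Trace-Id` header are all hypothetical and exist only to show how timed operations can share one trace identifier that travels with outgoing requests.

```python
import time
import uuid
from contextlib import contextmanager


class ToyTracer:
    """Stand-in tracer that records (trace_id, operation, seconds) tuples."""

    def __init__(self, trace_id=None):
        # Reuse an incoming trace ID if one was passed along, else start a new trace.
        self.trace_id = trace_id or uuid.uuid4().hex
        self.spans = []

    @contextmanager
    def span(self, operation):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.spans.append((self.trace_id, operation, elapsed))

    def inject(self, headers):
        """Copy the trace ID into outgoing headers (hypothetical header name)."""
        headers["X-Toy-Trace-Id"] = self.trace_id
        return headers


tracer = ToyTracer()
with tracer.span("create_app"):
    with tracer.span("validate_origin"):
        time.sleep(0.01)  # stands in for a call to another service

outgoing_headers = tracer.inject({})
for trace_id, operation, seconds in tracer.spans:
    print(f"{trace_id[:8]} {operation}: {seconds * 1000:.1f} ms")
```

Because the inner span finishes first, it is recorded before the outer one, and the outer duration always includes the inner one. A real deployment would use an OpenTracing-compatible client library rather than anything like this.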

    We have been deploying a tracing system for our services to provide more visibility into a complex system.

    After instrumenting the Spectrum config service with OpenTracing, we determined that the Addressing API accounted for only a very small share of the overall request time. The traces also allowed us to identify potentially problematic request times to other services.

    Lessons Learned

    Reading documentation is important! Having a good understanding of how the Addressing API and the config service worked allowed me to create and integrate an endpoint that made sense for my use case.

    Writing documentation is just as important. For the final part of my project, I had to onboard Crossbow, an internal Cloudflare tool used for diagnostics, to Spectrum, using the new features I had implemented. I had written an onboarding guide, but parts of it turned out to be unclear during the onboarding process, so I made sure to gather feedback from the Crossbow team to improve the guide.

    Finally, I learned not to underestimate the amount of complexity required to implement relatively simple validation logic. In fact, the implementation required understanding the entire system. This includes how multiple microservices work together to validate the configuration and understanding how the data is moved from the Core to the Edge, and then processed there. I found increasing my understanding of this system to be just as important and rewarding as completing the project.

    Footnotes:

    [1] Regional Services actually makes use of proxying a Spectrum connection to another colocation, and then proxying to the origin, but the configuration plane is not involved in this setup.

    Source: Cloudflare