Keeping the Cloudflare API ‘all green’ using Python-based testing

At Cloudflare, we reuse existing core systems to power multiple products, so testing these core systems is essential. In particular, we need wide and thorough visibility into the behavior of our live APIs. We want to be able to detect regressions, prevent incidents and maintain healthy APIs. That is why we built Scout.

Scout is an automated system that periodically runs Python tests verifying the end-to-end behavior of our APIs. It lets us evaluate APIs in production-like environments, so we can green-light a production deployment, while also monitoring the behavior of APIs in production.

Why Scout?

Before Scout, we were using an automated test system built on the Robot Framework. This older system limited our testing capabilities. We could not easily match JSON responses against the keys we were looking for. We would give up on covering different behaviors of our APIs because it was impossible to choose which resources a given test suite would run on. Two different test suites would also create false negatives because they were running on the same account.

Regarding schema validation, only API responses were validated against a JSON schema, and tests would not fail if the response did not match the schema. Moreover, it was impossible to validate API requests.

Test suites were run in a queue, so the delay before a new feature could be assessed depended on the number of test suites waiting to run. The queue could also push newer test suites to the following day, so we often ended up with a mismatch between test and API versions. Test steps could not be run in parallel either.

We could not split test suites between different environments. If a new API feature was being developed, it was impossible to write a test for it until the feature had actually been released to production.

We built Scout to overcome all these difficulties. We wanted the developer experience to be easy and we wanted Scout to be fast and reliable while spotting any live API issue.

A Scout test example

Scout is built in Python and leverages the functionalities of Pytest. Before diving into the exact capabilities of Scout and its architecture, let’s have a quick look at how to use it!

Following is an example of a Scout test on the Rulesets API (the docs are available here):

from scout import requires, validate, Account, Zone

@validate(schema="rulesets", ignorePaths=["accounts/[^/]+/rules/lists"])
@requires(
    account=Account(
        entitlements={"rulesets.max_rules_per_ruleset": 2}),
    zone=Zone(plan="ENT",
        entitlements={"rulesets.firewall_custom_phase_allowed": True},
        account_entitlements={"rulesets.max_rules_per_ruleset": 2 }))
class TestZone:
    def test_create_custom_ruleset(self, cfapi):
        response = cfapi.zone.request(
            "POST",
            "rulesets",
            payload=f"""{{
            "name": "My zone ruleset",
            "description": "My ruleset description",
            "phase": "http_request_firewall_custom",
            "kind": "zone",
            "rules": [
                {{
                    "description": "My rule",
                    "action": "block",
                    "expression": "http.host eq \\"fake.net\\""
                }}
            ]
        }}""")
        response.expect_json_success(
            200,
            result=f"""{{
            "name": "My zone ruleset",
            "version": "1",
            "source": "firewall_custom",
            "phase": "http_request_firewall_custom",
            "kind": "zone",
            "rules": [
                {{
                    "description": "My rule",
                    "action": "block",
                    "expression": "http.host eq \\"fake.net\\"",
                    "enabled": true,
                    ...
                }}
            ],
            ...
        }}""")

A Scout test is a succession of request and response roundtrips against a given API. We use Pytest fixtures and marks to target specific resources while validating requests and responses. Pytest marks in Scout provide an extra set of information to test suites. Pytest fixtures are contexts with information and methods that can be used across tests to enhance their capabilities. The combination of marks and fixtures therefore allows Scout to build the whole harness required to run a test suite against APIs.

Being able to describe exactly which resources a given test will run against gives us confidence that the live API behaves as expected under various conditions.

The cfapi fixture provides the capability to target different resources such as a Cloudflare account or a zone. In the test above, we use the Pytest mark @requires to describe the characteristics of the resources we want. Here, for example, we need an account with an entitlement allowing 2 rules per ruleset. The test will then only be run against accounts with such entitlements.

The @validate mark provides the capability to validate requests and responses against a given OpenAPI schema (here the rulesets OpenAPI schema). Any validation failure will be reported and flagged as a test failure.

Regarding the actual requests and responses, their payloads are described as f-strings. In particular, the response f-string can be written as a “semi-JSON”:

response.expect_json_success(
    200,
    result=f"""{{
    "name": "My zone ruleset",
    "version": "1",
    "source": "firewall_custom",
    "phase": "http_request_firewall_custom",
    "kind": "zone",
    "rules": [
        {{
            "description": "My rule",
            "action": "block",
            "expression": "http.host eq \\"fake.net\\"",
            "enabled": true,
            ...
        }}
    ],
    ...
}}""")

Among the many test assertions possible, Scout can assert the validity of a partial JSON response and log the information. We added handling of the ellipsis as an indication for Scout to ignore any further fields at a given JSON nesting level. We are therefore able to do partial matching on JSON API responses and focus only on what matters most in each test.
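The ellipsis handling can be pictured with a minimal sketch. This is not Scout's implementation: Scout reads the textual semi-JSON, whereas the example below assumes the pattern is already a Python structure and borrows Python's own `...` object as the "ignore the rest" marker. The function name `matches` is hypothetical.

```python
def matches(expected, actual):
    """Return True if `actual` satisfies the partial pattern `expected`.

    A `...` key in a dict, or a trailing `...` item in a list, means
    "ignore any further fields or items at this nesting level".
    """
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return False
        open_ended = ... in expected
        keys = [key for key in expected if key is not ...]
        # Without the marker, the key sets must be identical.
        if not open_ended and set(keys) != set(actual):
            return False
        return all(key in actual and matches(expected[key], actual[key]) for key in keys)
    if isinstance(expected, list):
        if not isinstance(actual, list):
            return False
        open_ended = bool(expected) and expected[-1] is ...
        items = expected[:-1] if open_ended else expected
        if len(actual) < len(items):
            return False
        if not open_ended and len(actual) != len(items):
            return False
        return all(matches(exp, act) for exp, act in zip(items, actual))
    return expected == actual


# Pattern mirroring the shape of the expectation shown above.
pattern = {
    "name": "My zone ruleset",
    "rules": [{"action": "block", ...: ...}],
    ...: ...,
}
response = {
    "name": "My zone ruleset",
    "version": "1",
    "rules": [{"action": "block", "enabled": True}],
}
is_match = matches(pattern, response)
```

Without the marker the comparison is strict, so a pattern listing only "name" would reject the same response because of the extra fields.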

Once a test suite run is complete, the results are pushed by the service and stored using Cloudflare Workers KV. They are displayed via a Cloudflare Worker.

Scout runs in separate environments, such as production-like and production environments. Verifying that Scout is green in our production-like environment is part of our deployment process before deploying to production, where Scout is also used for monitoring.

How we built it

The core of Scout is written in Python and combines three interacting components:

  • The Scout plugin: a Pytest plugin to write tests easily
  • The Scout service: a scheduler service to run the test suites periodically
  • The Scout Worker: a collector and presenter of test reports

The Scout plugin

This is the core component of the Scout system. It allows us to write self-explanatory tests while ensuring a high level of compliance with OpenAPI schemas and verifying the behavior of the APIs.

The Scout plugin architecture can be split into three components: setup, resource allocator, and runners. Setup combines multiple subcomponents in charge of setting up the plugin.

The Registry contains all the information about the pool of accounts and zones we use for testing. For example, entitlements are flags that gate customers' access to product features. The Registry lets us describe entitlements per account and zone so that Scout can run a test against a specific setup.
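To make the idea concrete, here is a minimal sketch of selecting resources from such a pool by plan and entitlements. Everything in it is an assumption made for illustration: the `RegisteredZone` class, the `select_zones` function, the zone names and the "FREE" plan label do not come from Scout.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredZone:
    """Hypothetical registry entry; the field names are assumptions."""
    name: str
    plan: str
    entitlements: dict = field(default_factory=dict)


def select_zones(registry, plan=None, entitlements=None):
    """Return the registry entries satisfying every requested characteristic."""
    wanted = entitlements or {}
    return [
        zone
        for zone in registry
        if (plan is None or zone.plan == plan)
        and all(zone.entitlements.get(key) == value for key, value in wanted.items())
    ]


# Illustrative pool of test zones.
registry = [
    RegisteredZone("zone-a", "ENT", {"rulesets.firewall_custom_phase_allowed": True}),
    RegisteredZone("zone-b", "FREE", {"rulesets.firewall_custom_phase_allowed": True}),
    RegisteredZone("zone-c", "ENT", {}),
]

# Only zones on the requested plan with the requested entitlement remain.
eligible = select_zones(
    registry,
    plan="ENT",
    entitlements={"rulesets.firewall_custom_phase_allowed": True},
)
```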

As explained earlier, Scout can validate requests and responses against OpenAPI schemas. This is the responsibility of validators. A validator is built per OpenAPI schema and can be selected via the @validate mark we saw above.

@validate(schema="rulesets", ignorePaths=["accounts/[^/]+/rules/lists"])

As soon as a validator is selected, all interactions of a given test with an API are validated. Any validation failure is marked as a test failure.
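The sketch below illustrates the shape of such a validator under loud assumptions. It does not perform OpenAPI validation: it only checks for required top-level fields, and it assumes that ignorePaths holds regular expressions for request paths that should be skipped. The class name `SketchValidator` and its methods are hypothetical.

```python
import re


class SketchValidator:
    """Stand-in for a schema validator; it only checks required top-level fields."""

    def __init__(self, required_fields, ignore_paths=()):
        self.required_fields = tuple(required_fields)
        # Assumption: ignored paths are given as regular expressions.
        self.ignore_patterns = [re.compile(pattern) for pattern in ignore_paths]

    def validate(self, path, body):
        """Return a list of validation errors (empty when valid or ignored)."""
        if any(pattern.search(path) for pattern in self.ignore_patterns):
            return []
        return [
            f"{path}: missing '{name}'"
            for name in self.required_fields
            if name not in body
        ]


validator = SketchValidator(
    required_fields=["name", "kind", "phase"],
    ignore_paths=["accounts/[^/]+/rules/lists"],
)

# A body missing "phase" produces one error; an ignored path produces none.
errors = validator.validate("zones/123/rulesets", {"name": "My zone ruleset", "kind": "zone"})
skipped = validator.validate("accounts/42/rules/lists", {})
```

A runner could then turn any non-empty error list into a test failure.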

The last element of the setup is the config reader. It is the subcomponent in charge of providing all the URLs and authentication elements the Scout plugin needs to communicate with APIs.

Next in the chain is the resource allocator. This component consumes the configuration and objects of the setup to build multiple runners. It is a factory that makes the runners available in the cfapi fixture.

response = cfapi.zone.request(method, path, payload)

When such a line of code is processed, it is the request method of the zone runner allocated for the test that is executed. The resource allocator can provide specialized runners (account, zone or default), which make it possible to target specific API endpoints for a given account or zone.
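A rough sketch of this factory idea follows. The `Runner` class, the `allocate` function, the URL layout and the example host are assumptions for illustration only, and the runner here merely describes the request instead of sending it.

```python
from types import SimpleNamespace


class Runner:
    """Hypothetical runner that scopes requests to one resource."""

    def __init__(self, base_url, prefix=""):
        self.base_url = base_url.rstrip("/")
        self.prefix = prefix.strip("/")

    def build_url(self, path):
        """Join the base URL, the resource prefix and the endpoint path."""
        parts = [self.base_url, self.prefix, path.strip("/")]
        return "/".join(part for part in parts if part)

    def request(self, method, path, payload=None):
        # A real runner would send the request; this sketch only describes it.
        return {"method": method, "url": self.build_url(path), "payload": payload}


def allocate(base_url, account_id, zone_id):
    """Build the runners exposed to a test through a cfapi-like object."""
    return SimpleNamespace(
        account=Runner(base_url, f"accounts/{account_id}"),
        zone=Runner(base_url, f"zones/{zone_id}"),
        default=Runner(base_url),
    )


cfapi = allocate("https://api.example.test/v4", account_id="acc1", zone_id="zone1")
described = cfapi.zone.request("POST", "rulesets", payload="{}")
```

Because each runner carries its own prefix, the test body only names the endpoint ("rulesets") and never hard-codes which account or zone it runs against.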

Runners are in charge of handling the execution of requests, managing the test expectations and using the validators for request/response schema validation.

Any failure on expectation or validation and any exceptions are recorded in the stash. The stash is shared across all runners. As such, when a test setup, run or cleanup is processed, the timeline of execution and potential retries are logged in the stash. The stash contents are later used for building the test suite reports.
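As an illustration only, a stash can be thought of as an append-only event log that is later filtered to build a report. The `Stash` class, its method names and the event fields below are hypothetical, and the sketch shares state between threads in a single process, which is simpler than whatever Scout actually does across workers.

```python
import threading
import time


class Stash:
    """Hypothetical shared record of test events, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = []

    def record(self, phase, kind, detail):
        """Append one event; `phase` is setup, run or cleanup."""
        with self._lock:
            self._events.append(
                {"time": time.time(), "phase": phase, "kind": kind, "detail": detail}
            )

    def failures(self):
        """Return every event that is not purely informational."""
        with self._lock:
            return [event for event in self._events if event["kind"] != "info"]

    def timeline(self):
        """Return the ordered (phase, kind) pairs for the report."""
        with self._lock:
            return [(event["phase"], event["kind"]) for event in self._events]


stash = Stash()
stash.record("setup", "info", "zone allocated")
stash.record("run", "expectation", "expected status 200, got 403")
stash.record("run", "info", "retry 1")
stash.record("cleanup", "info", "ruleset deleted")
```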

Scout is able to run multiple test steps in parallel. Each resource couple (Account Runner, Zone Runner) is associated with a pytest-xdist worker, which runs test steps independently. There can be as many workers as there are resource couples. An extra “default” runner is provided for reaching our different APIs and/or URLs with or without authentication.

Testing a test system was not the easiest part. We had to build a fake API and assert that the Scout plugin behaves as it should in different situations. We reached and maintained test coverage close to 90%, which we considered good enough to rely on the Scout plugin permanently.

The Scout service

The Scout service is meant to schedule test suites periodically. It is a configurable scheduler providing a reporting harness for the test suites as well as multiple metrics. It was a design decision to build a scheduler instead of using cron jobs.

We wanted to be aware of any scheduling issues as well as run issues. For this we used Prometheus metrics. The problem is that Prometheus, by default, scrapes the metrics advertised by services. This scraping happens periodically, and we were concerned about missing metrics if a cron job finished before the next scrape. We therefore decided a small scheduler was better suited for overall observability of the test runs. The metrics the Scout service provides include network failures, general test failures, reporting failures, lagging tests and more.

The Scout service runs threads on configured periods. Each thread runs a test suite as a separate Pytest process with the Scout plugin, followed by a reporting step that consumes the results and publishes them to the relevant parties.
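The thread-per-suite pattern can be sketched with the standard library alone. This is an assumption-laden illustration, not the Scout service: the function names `run_suite` and `schedule`, the `report` callback and the command passed in are all hypothetical.

```python
import subprocess
import threading


def run_suite(command):
    """Run one test suite as a separate process and return its exit code."""
    return subprocess.run(command, check=False).returncode


def schedule(name, command, period_seconds, report, stop):
    """Start a thread that runs `command` every `period_seconds` until `stop` is set.

    `report` is called with the suite name and exit code after each run.
    """
    def loop():
        while not stop.is_set():
            report(name, run_suite(command))
            # Event.wait doubles as an interruptible sleep.
            stop.wait(period_seconds)

    thread = threading.Thread(target=loop, name=name, daemon=True)
    thread.start()
    return thread
```

A service built this way would call `schedule` once per configured suite, for instance with a command such as `["pytest", "tests/rulesets"]` (an illustrative path), and keep one `threading.Event` to stop all threads on shutdown. Because the scheduler process stays alive between runs, any counters it maintains remain available to a periodic scrape.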

The reporting component provided to each thread publishes the report to Workers KV and notifies us on chat if there is a failure. Reporting also takes care of publishing the information needed to build API testing coverage. It is mandatory for us to have coverage of all the API endpoints and their possible methods so that we can achieve wide and thorough visibility of our live APIs.
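Endpoint coverage of this kind boils down to a set difference between what a schema declares and what the tests exercised. The sketch below assumes both sides are available as (method, path template) pairs; the `coverage` function and the sample endpoints are illustrative and not taken from Scout.

```python
def coverage(declared, exercised):
    """Return (ratio, missing) for collections of (method, path) pairs."""
    declared_set = set(declared)
    covered = declared_set & set(exercised)
    missing = sorted(declared_set - covered)
    ratio = len(covered) / len(declared_set) if declared_set else 1.0
    return ratio, missing


# Illustrative endpoint list, as it might be extracted from an OpenAPI schema.
declared = [
    ("GET", "/zones/{zone_id}/rulesets"),
    ("POST", "/zones/{zone_id}/rulesets"),
    ("DELETE", "/zones/{zone_id}/rulesets/{ruleset_id}"),
]
# Pairs observed while the test suites ran.
exercised = [
    ("GET", "/zones/{zone_id}/rulesets"),
    ("POST", "/zones/{zone_id}/rulesets"),
]

ratio, missing = coverage(declared, exercised)
```

Here `missing` lists the endpoints that still need a test, which is the information a coverage report would surface.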

As a fallback, if there is any thread failure, test failure or reporting failure, we are alerted based on the Prometheus metrics updated throughout the service execution. The logs of the Scout service, as well as the logs of each Pytest-Scout plugin execution, provide the last-resort information if no metrics are available and reporting is failing.

The service can be deployed with a minimal YAML configuration and be set up for different environments. We can for example decide to run different test suites based on the environment, publish or not to Cloudflare Workers, set different periods and retry mechanisms and so on.

We keep the tests in our code base alongside the configuration of the Scout service, and that is about it: the Scout service is a separate entity.

The Scout Worker

It is a Cloudflare Worker in charge of fetching the most recent reports from Workers KV and displaying them in an eye-pleasing manner. The Scout service publishes a test report as JSON; the Scout Worker parses the report and displays its content based on the status of the test suite run.

For example, an authentication failure during a test resulted in the following display in the Worker:

[Image: Scout Worker report showing an authentication failure during a test]

What does Scout let us do

By leveraging the capabilities of Pytest and Cloudflare Workers, we have been able to build a configurable, robust and reliable system which allows us to easily write self-explanatory tests for our APIs.

We can validate requests and responses against OpenAPI schemas and test behaviors over specific resources while getting alerted through multiple means if something goes wrong.

For specific use cases, we can write a test verifying that the API behaves as it should, that the configuration to be pushed to the edge is valid, and that a given zone will react as it should to security threats. This goes beyond an end-to-end API test.

Scout quickly became our permanent live tester and monitor of APIs. We wrote tests for all endpoints to maintain a wide coverage of all our APIs. Scout has since been used for verifying an API version prior to its deployment to production. In fact, after a deployment in a production-like environment we can know in a couple of minutes if a new feature is good to go to production and assess if it is behaving correctly.

We hope you enjoyed this deep dive into one of our systems!

Source:: CloudFlare