Zaraz use Workers to make third-party tools secure and fast

We decided to create Zaraz around the end of March 2020. We were working on another product when we noticed everyone was asking us about the performance impact of having many third parties on their website. Third-party content is an important part of the majority of websites today, powering analytics, chatbots, conversion pixels, widgets, you name it. A third party is defined as an asset, often JavaScript, hosted outside the primary site-user relationship, that is not under the direct control of the site owner but is present with ‘approval’. Yair wrote in detail about the process of measuring the impact of these third-party tools, and how we pivoted our startup, but I wanted to write about how we built Zaraz and what it actually does behind the scenes.

Third parties are great in that they let you integrate already-made solutions with your website, and you barely need to do any coding. Analytics? Just drop this code snippet. Chat widget? Just add this one. Third-party vendors will usually instruct you on how to add their tool, and from that point on things should just be working. Right? But when you add third-party code, it usually fetches even more code from remote sources, meaning you have less and less control over whatever is happening in your visitors’ browsers. How can you guarantee that none of the many third parties on your website has been hacked and started stealing information, mining cryptocurrencies or logging key presses on your visitors’ computers?

It doesn’t even have to be a deliberate hack. As we investigated more and more third-party tools, we noticed a pattern: sometimes it’s easier for a third-party vendor to collect everything, rather than being selective or careful about it. More often than not, user emails would find their way into a third-party tool, which could very easily get the website owner into trouble under GDPR, CCPA, or similar regulations.

How third-party tools work today

Usually, when you add a third party to your page, you’re asked to add a piece of JavaScript code to the <head> of your HTML. Google Analytics is by far the most popular third party, so let’s see how it’s done there:

<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-XXXXX-Y', 'auto');
ga('send', 'pageview');
</script>
<!-- End Google Analytics -->

In this case, and in most other cases, the snippet that you’re pasting actually calls more JavaScript code to be executed. The snippet above creates a new <script> element, gives it the https://www.google-analytics.com/analytics.js src attribute, and inserts it into the DOM. The browser then loads the analytics.js script, which includes more JavaScript code than the snippet itself, and sometimes asks the browser to download even more scripts, some of them bigger than analytics.js itself. So far, however, no analytics data has been captured at all, even though capturing it is why you added Google Analytics in the first place.
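For readability, the minified loader above can be unrolled roughly as follows. This is a sketch rather than Google's code: the function name loadAnalytics and the explicit win and doc parameters (standing in for window and document) are assumptions added so the logic can be read and run outside a browser.

```javascript
// A readable approximation of what the minified snippet does.
// `win` and `doc` stand in for the browser's `window` and `document`.
function loadAnalytics(win, doc, src, name) {
  win.GoogleAnalyticsObject = name;

  // Until analytics.js arrives, calls to ga(...) are only queued.
  win[name] = win[name] || function () {
    (win[name].q = win[name].q || []).push(arguments);
  };
  win[name].l = 1 * new Date();

  // Create a <script> element and insert it before the first script on the page.
  const script = doc.createElement('script');
  const firstScript = doc.getElementsByTagName('script')[0];
  script.async = 1;
  script.src = src;
  firstScript.parentNode.insertBefore(script, firstScript);

  return script;
}
```

In a browser this would be called as loadAnalytics(window, document, 'https://www.google-analytics.com/analytics.js', 'ga'). Note that nothing is sent to any server at this point; the function only queues calls and asks the browser to fetch another script.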

The last line in the snippet, ga('send', 'pageview');, uses a function defined in the analytics.js file to finally send the pageview. The function is needed because it is what captures the analytics data: it reads the kind of browser, the screen resolution, the language, and so on. Then, it constructs a URL that includes all the data, and sends a request to this URL. It’s only after this step that the analytics information gets captured. Every user behavior event you record using Google Analytics will result in another request.
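As a rough illustration of that last step, the sketch below gathers a few browser properties and encodes them into a request URL. The endpoint https://example.com/collect, the parameter names and the buildPageviewURL helper are made up for this example; they are not Google Analytics' real wire format.

```javascript
// Hypothetical sketch: collect some browser details and encode them in a URL.
// `env` mimics the bits of `navigator`, `screen` and `location` a tool would read.
function buildPageviewURL(env) {
  const params = new URLSearchParams({
    type: 'pageview',
    ua: env.userAgent,
    lang: env.language,
    res: `${env.screenWidth}x${env.screenHeight}`,
    url: env.location,
  });
  return 'https://example.com/collect?' + params.toString();
}

// Example input; in a browser these values would come from the page itself.
const exampleURL = buildPageviewURL({
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) Firefox/100.0',
  language: 'en-US',
  screenWidth: 1920,
  screenHeight: 1080,
  location: 'https://shop.example/checkout',
});
```

A browser-side tool would then request this URL (for instance with fetch or an image beacon), which is the moment the data actually leaves the page.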

The reality is that the vast majority of tools consist of more than one resource file, and that it’s practically impossible to know in advance what a tool is going to load without testing it on your website. You can use Request Map Generator to get a visual representation of all the resources loaded on your website, including how they call each other. Below is a Request Map of a demo e-commerce website we created:

That big blue circle is our website’s resources, and all other circles are third-party tools. You can see how the big green circle is actually a sub-request of the main Facebook pixel (fbevents.js), and how many tools, like LinkedIn in the top right, are creating a redirect chain in order to sync some data, at the expense of forcing the browser to make more and more network requests.

A new place to run a tag manager — the edge

Since we want to make third parties faster, more secure, and private, we had to develop a fundamentally new way of thinking about them and a new system for how they run. We came up with a plan: build a platform where third parties can run code outside the browser, while still getting access to the information they need and being able to talk with the DOM when necessary. We don’t believe third parties are evil: they never intended to slow down the Internet for everyone, they just didn’t have another option. Being able to run code on the edge and run it fast opened up new possibilities and changed all that, but the transition is hard.

By moving third-party code to run outside the browser, we get multiple wins.

  • The website will load faster and be more interactive. The browser rendering your website can now focus on the most important thing: your website. The downloading, parsing and execution of all the third-party scripts will no longer compete with or even block the rendering and interactivity of your website.
  • Control over the data sent to third-parties. Third-party tools often automatically collect information from the page and from the browser to, for example, measure site behaviour/usage. In many cases, this information should stay private. For example, most tools collect the document.location, but we often see a “reset password” page including the user email in the URL, meaning emails are unknowingly being sent and saved by third-party providers, usually without consent. Moving the execution of the third parties to the edge means we have full visibility into what is being sent. This means we can provide alerts and filters in case tools are trying to collect Personally Identifiable Information or mask the private parts of the data before they reach third-party servers. This feature is currently not available on the public beta, but contact us if you want to start using it today.
  • By reducing the amount of code being executed in the browser and by scanning all code that is executed in it, we can continuously verify that the code hasn’t been tampered with and that it only does what it is intended to do. We are working to connect Zaraz with Cloudflare Page Shield to do this automatically.
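The masking idea in the second point can be sketched as below. This is only an illustration under stated assumptions, not Zaraz's actual filter: maskEmailsInURL is a hypothetical helper, the regular expression is deliberately simple, and only query-string values are inspected.

```javascript
// Deliberately simple pattern; real-world email detection is more involved.
const EMAIL_PATTERN = /[^\s@/?&=]+@[^\s@/?&=]+\.[^\s@/?&=]+/g;

// Hypothetical helper: replace anything that looks like an email address
// in the query string before the URL is forwarded to a third party.
function maskEmailsInURL(rawURL) {
  const url = new URL(rawURL);
  const masked = new URLSearchParams();

  // searchParams yields decoded values, so "%40" is already "@" here.
  for (const [key, value] of url.searchParams) {
    masked.append(key, value.replace(EMAIL_PATTERN, 'REDACTED'));
  }

  url.search = masked.toString();
  return url.href;
}

const safeURL = maskEmailsInURL(
  'https://shop.example/reset-password?email=jane%40example.com&step=2'
);
```

Because the tool's request is assembled on the edge rather than in the visitor's browser, a filter of this kind can run on every outgoing value before anything reaches a third-party server.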

When you configure a third-party tool through a normal tag manager, a lot happens in the browsers of your visitors which is out of your control. The tag manager will load and then evaluate all trigger rules to decide which tools to load. It would then usually append the script tags of those tools to the DOM of the page, making the browser fetch the scripts and execute them. These scripts come from untrusted or unknown origins, increasing the risk of malicious code execution in the browser. They can also block the browser from becoming interactive until they are completely executed. They are generally free to do whatever they want in the browser, but most commonly they would then collect some information and send it to some endpoint on the third-party server. With Zaraz, the browser essentially does none of that.

Choosing Cloudflare Workers

When we set about coding Zaraz, we quickly understood that our infrastructure decisions would have a massive impact on our service. In fact, choosing the wrong one could mean we have no service at all. The most common alternative to Zaraz is traditional Tag Management software. Such tools generally have no server-side component: whenever a user “publishes” a configuration, a JavaScript file is rendered and hosted as a static asset on a CDN. With Zaraz the idea is to move most of the evaluation of code out of the browser, and respond with dynamically generated JavaScript code each time. We needed to find a solution that would allow us to have a server-side component, but would be as fast as a CDN. Otherwise, there was a risk we might end up slowing down websites instead of making them faster.

We needed Zaraz to be served from a place close to the visiting user. Since setting up servers all around the world seemed like too big of a task for a very young startup, we looked at a few distributed serverless platforms. We approached this search with a small list of requirements:

  • Run JavaScript: Third-party tools all use JavaScript. If we were to port them to run in a cloud environment, the easiest way to do so would be to be able to use JavaScript as well.
  • Secure: We are processing sensitive data. We can’t afford the risk of someone hacking into our EC2 instance. We wanted to make sure that data doesn’t stay on some server after we send our HTTP response.
  • Fully programmable: Some CDNs allow setting complicated rules for handling a request, but altering HTTP headers, setting redirects or HTTP response codes isn’t enough. We need to generate JavaScript code on the fly, meaning we need full control over the responses. We also need to use some external JavaScript libraries.
  • Extremely fast and globally distributed: In the very early stages of the company, we already had customers in the USA, Europe, India, and Israel. As we were preparing to show them a Proof of Concept, we needed to be sure it would be fast wherever they were. We were competing with tag managers and Customer Data Platforms that have a pretty fast response time, so we needed to be able to respond as fast as if our content was statically hosted on a CDN, or faster.

Initially we thought we would need to create Docker containers that would run around the globe and would use their own HTTP server, but then a friend from our Y Combinator batch said we should check out Cloudflare Workers.

At first, we thought it wouldn’t work — Workers doesn’t work like a Node.js application, and we felt that limitation would prevent us from building what we wanted. We planned to let Workers handle the requests coming from users’ browsers, and then use an AWS Lambda for the heavy lifting of actually processing data and sending it to third-party vendors.

Our first attempt with Workers was very simple: just confirming we could use it to actually return dynamic browser-side JavaScript that is generated on-the-fly:

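addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  // Build the browser-side JavaScript as a string, wrapped in an IIFE
  let code = '(function() {'

  // The User-Agent header may be missing, so fall back to an empty string
  const userAgent = request.headers.get('user-agent') || ''

  if (userAgent.includes('Firefox')) {
    code += `console.log('Hello Firefox!');`
  } else {
    code += `console.log('Hey other browsers...');`
  }

  code += '})();'

  // Serve the generated code as a regular JavaScript file
  return new Response(code, {
    headers: { 'content-type': 'text/javascript' }
  })
}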

It was a tiny example, but I remember calling Yair afterwards and saying “this could actually work!”. It proved the flexibility of Workers. We had just created an endpoint that served a dynamically generated JavaScript file, and the response time was less than 10ms. We could now put a <script> tag pointing at the Worker in our HTML and treat this Worker like a normal JavaScript file.

As we took a deeper look, we found Workers answering demand after demand from our list, and learned we could even do the most complicated things inside Workers. The Lambda function started doing less and less, and was eventually removed. Our little Node.js proof-of-concept was easily converted to Workers.

Using the Cloudflare Workers platform: “standing on the shoulders of giants”

When we raised our seed round we heard many questions like “if this can work, how come it wasn’t built before?” We often said that while the problem has been a long-standing one, accessible edge computing is a new possibility. Later, in our first investor update after creating the prototype, we told them about the unbelievably fast response time we managed to achieve and got much praise for it. Talk about “standing on the shoulders of giants”. Workers simply checked all our boxes. Running JavaScript and using the same V8 engine as the browser meant that we could keep the same environment when porting tools to run on the cloud (it also helped with hiring). It also opened the possibility of later on using WebAssembly for certain tasks. The fact that Workers are serverless and stateless by default was a selling point for our own trustworthiness: we told customers we couldn’t save their personal data even by mistake, which was true. The integration between webpack and Wrangler meant that we could write a full-blown application, with modules and external dependencies, and shift 100% of our logic into our Worker. And the performance helped us ace all our demos.

As we were building Zaraz, the Workers platform got more advanced. We ended up using Workers KV for storing user configuration, and Durable Objects for communicating between Workers. Our main Worker holds server-side implementations of more than 50 popular third-party tools, replacing hundreds of thousands of lines of JavaScript code that traditionally run inside browsers. It’s an ever-growing list, and we recently also published an SDK that allows third-party vendors to build support for their tools by themselves. For the first time, they can do it in a secure, private, and fast environment.

A new way to build third-parties

Most third-party tools do two fundamental things: First, they collect some information from the browser such as screen resolution, current URL, page title or cookie content. Second, they send it to their server. It is often simple, but when a website has tens of these tools, and each of them queries for the information it needs and then sends its requests, it can cause a real slowdown. On Zaraz, this looks very different: Every tool provides a run function, and when Zaraz evaluates the user request and decides to load a tool, it executes this run function. This is how we built integrations for over 50 different tools, all from different categories, and this is how we’re inviting third-party vendors to write their own integrations into Zaraz.

run({system, utils}) { 
  // The `system` object includes information about the current page, browser, and more 
  const { device, page, cookies } = system
  // The `utils` are a set of functions we found useful across multiple tools
  const { getCookieString, waitUntil } = utils

  // Get the existing cookie content, or create a new UUID instead
  const cookieName = 'visitor-identifier'
  const sessionCookie = cookies[cookieName] || crypto.randomUUID()

  // Build the payload
  const payload = {
    session: sessionCookie,
    ip: device.ip,
    resolution: device.resolution,
    ua: device.userAgent,
    url: page.url.href,
    title: page.title,
  }

  // Construct the URL
  const baseURL = 'https://example.com/collect?'
  const params = new URLSearchParams(payload)
  const finalURL = baseURL + params

  // Send a request to the third-party server from the edge
  waitUntil(fetch(finalURL))
  
  // Save or update the cookie in the browser
  return getCookieString(cookieName, sessionCookie)
}

The above code runs in our Cloudflare Worker, instead of the browser. Previously, having 10x more tools meant 10x more requests that browsers rendering your website needed to make, and 10x more JavaScript code they needed to evaluate. This code would often be repetitive; for example, almost every tool implements its own “get cookie” function. It’s also 10x more origins that you have to trust no one is tampering with. When running tools on the edge, this doesn’t affect the browser at all: you can add as many tools as you want, but they won’t be loaded in the browser, so they will have no effect on it.

In this example, we first check for the existence of a cookie that identifies the session, called “visitor-identifier”. If it exists, we read its value; if not, we generate a new UUID for it. Note that the power of Workers is all accessible here: we use crypto.randomUUID() just like we can use any other Workers functionality. We then collect all the information our example tool needs (user agent, current URL, page title, screen resolution, client IP address) together with the content of the “visitor-identifier” cookie. We construct the final URL that the Worker needs to send a request to, and we then use waitUntil to make sure the request gets there. Zaraz’s version of fetch gives our tools automatic logging, data loss prevention and retry capabilities.

Lastly, we return the value of the getCookieString function. Whatever string is returned by the run function is passed to the visitor as browser-side JavaScript. In this case, getCookieString returns something like document.cookie = 'visitor-identifier=5006e6fa-7ce6-45ef-8724-c846f1953369; Path=/; Max-age=31536000';, causing the browser to create a first-party cookie. The next time a user loads a page, the visitor-identifier cookie should exist, causing Zaraz to reuse the UUID instead of creating a new one.
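To make that last step concrete, here is one way a helper like getCookieString could be written. The source does not show Zaraz's implementation, so this is a sketch built on assumptions: the fixed Path and the one-year Max-age simply mirror the example string above, and the generated statement uses double quotes instead of single quotes.

```javascript
// Hypothetical stand-in for the getCookieString utility; not Zaraz's real code.
const ONE_YEAR_IN_SECONDS = 60 * 60 * 24 * 365; // 31536000

function getCookieString(name, value) {
  const cookie =
    `${encodeURIComponent(name)}=${encodeURIComponent(value)}` +
    `; Path=/; Max-age=${ONE_YEAR_IN_SECONDS}`;

  // JSON.stringify produces a safely quoted JavaScript string literal,
  // so the returned text is a statement the browser can execute as-is.
  return `document.cookie = ${JSON.stringify(cookie)};`;
}

const snippet = getCookieString(
  'visitor-identifier',
  '5006e6fa-7ce6-45ef-8724-c846f1953369'
);
```

The point of the design is that the Worker never touches the browser directly: it only returns a small piece of JavaScript text, and the page executes that text to set a first-party cookie.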

This system of run functions allows us to separate and isolate each tool to run independently of the rest of the system, while still providing it with all the required context and data coming from the browser, and the capabilities of Workers. We are inviting third-party vendors to work with us to build the future of secure, private and fast third-party tools.

A new events system

Many third-party tools need to collect behavioral information during a user visit. For example, you might want to place a conversion pixel right after a user clicks “submit” on the credit card form. Since we moved tools to the cloud, you can’t access their libraries from the browser context anymore. For that we created zaraz.track(), a method that allows you to call tools programmatically, and optionally provide them with more information:

document.getElementById("credit-card-form").addEventListener("submit", () => {
  zaraz.track("card-submission", {
    value: document.getElementById("total").innerHTML,
    transaction: "X-98765",
  });
});

In this example, we’re letting Zaraz know about a trigger called “card-submission”, and we associate some data with it — the value of the transaction that we’re taking from an element with the ID total, and a transaction code that is hardcoded and gets printed directly from our backend.

In the Zaraz interface, configured tools can be subscribed to different and multiple triggers. When the code above gets triggered, Zaraz checks, on the edge, what tools are subscribed to the card-submission trigger, and it then calls them with the right additional data supplied, populating their requests with the transaction code and its value.
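The edge-side lookup described here might look roughly like the following. Everything in this sketch is hypothetical: the tools and subscriptions objects and the dispatchTrigger function are illustrative names, not Zaraz internals, and the run functions return plain strings purely to show which tools were called.

```javascript
// Hypothetical tools, each exposing a run function.
const tools = {
  adsPixel: {
    run: ({ trigger, data }) => `adsPixel got ${trigger} (${data.transaction})`,
  },
  analytics: {
    run: ({ trigger }) => `analytics got ${trigger}`,
  },
  chat: {
    run: () => 'chat should not run for card submissions',
  },
};

// Which tools are subscribed to which trigger.
// In practice this would come from the configuration made in the dashboard.
const subscriptions = {
  'card-submission': ['adsPixel', 'analytics'],
  'chat-opened': ['chat'],
};

// Look up the subscribed tools and call each one with the event data.
function dispatchTrigger(trigger, data) {
  const subscribed = subscriptions[trigger] || [];
  return subscribed.map((toolName) => tools[toolName].run({ trigger, data }));
}

const results = dispatchTrigger('card-submission', {
  value: '49.90',
  transaction: 'X-98765',
});
```

The browser only reports that “card-submission” happened; deciding which tools care about it, and running them, happens entirely outside the page.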

This is different from how traditional tag managers work: GTM’s dataLayer.push serves a similar purpose, but is evaluated client-side. The result is that GTM itself, when used intensively, will grow its script so much that it can become the heaviest tool a website loads. Each event sent using dataLayer.push will cause repeated evaluation of code in the browser, and each tool that will match the evaluation will execute code in the browser, and might call more external assets again. As these events are usually coupled with user interactions, this often makes interacting with a website feel slow, because running the tools is occupying the main thread. With Zaraz, these tools exist and are evaluated only at the edge, improving the website’s speed and security.

You don’t have to be a coder to use triggers. The Zaraz dashboard allows you to choose from a predefined set of templates like click listeners, scroll events and more, that you can attach to any element on your website without touching your code. When you combine zaraz.track() with the ability to program your own tools, what you get is essentially a one-liner integration of Workers into your website. You can write any backend code you want and Zaraz will take care of calling it exactly at the right time with the right parameters.

Joining Cloudflare

When new customers started using Zaraz, we noticed a pattern: the best teams we worked with chose Cloudflare, and some were also moving parts of their backend infrastructure to Workers. We figured we could further improve performance and integration for companies using Cloudflare as well. We could inline parts of the code inside the page and further reduce the number of network requests. Integration also allowed us to eliminate the DNS lookup for our script, because we could use Workers to proxy Zaraz into our customers’ domains. Integrating with Cloudflare made our offering even more compelling.

Back when we were doing Y Combinator in Winter 2020 and realized how much third parties could affect a website’s performance, we saw a grand mission ahead of us: creating a faster, private, and secure web by reducing the amount of third-party bloat. This mission remains the same to this day. As our conversations with Cloudflare got deeper, we were excited to realize that we were talking with people who share the same vision. We are thrilled for the opportunity to scale our solutions to millions of websites on the Internet, making them faster and safer and even reducing carbon emissions.

If you would like to explore the free beta version, please click here. If you are an enterprise and have additional/custom requirements, please click here to join the waitlist. To join our Discord channel, click here.

Source: Cloudflare