Use Snowflake with R2 to extend your global data lake

Use Snowflake with R2 to extend your global data lake

Use Snowflake with R2 to extend your global data lake

R2 is the ideal object storage platform to build data lakes. It’s infinitely scalable, highly durable (eleven 9’s of annual durability), and has no egress fees. Zero egress fees mean zero vendor lock-in. You are free to use the tools you want to get the maximum value from your data.

Today we’re excited to announce our partnership with Snowflake so that you can use Snowflake to query data stored in your R2 data lake and load data from R2 into Snowflake. Organizations use Snowflake’s Data Cloud to unite siloed data, discover, and securely share data, and execute diverse analytic workloads across multiple clouds.

One challenge of loading data into Snowflake database tables and querying external data lakes is the cost of data transfer. If your data is coming from a different cloud or even different region within the same cloud, this typically means you are paying an additional tax for each byte going into Snowflake. Pairing R2 and Snowflake lets you focus on getting valuable insights from your data, without having to worry about egress fees piling up.

Getting started

Sign up for R2 and create an API token

If you haven’t already, you’ll need to sign up for R2 and create a bucket. You’ll also need to create R2 security credentials for Snowflake following the steps below.

Generate an R2 token

  • In the Cloudflare dashboard, select R2.
  • Select Manage R2 API Tokens on the right side of the dashboard.
  • Select Create API token.
  • Use Snowflake with R2 to extend your global data lake

  • Optionally select the pencil icon or R2 Token text to edit your API token name.
  • Use Snowflake with R2 to extend your global data lake

  • Under Permissions, select Edit.
  • Select Create API Token.
  • Use Snowflake with R2 to extend your global data lake

    You’ll need the Secret Access Key and Access Key ID to create an external stage in Snowflake.

    Creating external stages in Snowflake

    In Snowflake, stages refer to the location of data files in object storage. To create an external stage, you’ll need your bucket name and R2 credentials. Find your Cloudflare account ID in the dashboard.

    CREATE STAGE my_r2_stage
      URL = 's3compat://my_bucket/files/'
      ENDPOINT = 'cloudflare_account_id.r2.cloudflarestorage.com'
      CREDENTIALS = (AWS_KEY_ID = '1a2b3c...' AWS_SECRET_KEY = '4x5y6z...')
    

    Note: You may need to contact your Snowflake account team to enable S3-compatible endpoints in Snowflake. Get more information here.

    Loading data into Snowflake

    To load data from your R2 data lake into Snowflake, use the COPY INTO command.

    COPY INTO t1
      FROM @my_r2_stage/load/;
    

    You can flip the table and external stage parameters in the example above to unload data from Snowflake into R2.

    Querying data in R2 with Snowflake

    You’ll first need to create an external table in Snowflake. Once you’ve done that you’ll be able to query your data stored in R2.

    SELECT * FROM external_table;
    

    For more information on how to use R2 and Snowflake together, refer to documentation here.

    “Data is becoming increasingly the center of every application, process, and business metrics, and is the cornerstone of digital transformation. Working with partners like Cloudflare, we are unlocking value for joint customers around the world by helping save costs and helping maximize customers data investments,” – James Malone, Director of Product Management at SnowflakeUse Snowflake with R2 to extend your global data lake

    Have any feedback?

    We want to hear from you! If you have any feedback on the integration between Cloudflare R2 and Snowflake, please let us know here.

    Be sure to check our Discord server to discuss everything R2!

    Source:: CloudFlare