This document summarizes the resources and steps required to pass data to Snowflake via Amazon S3, the method NinjaCat prefers for loading data directly into NinjaCat's own Snowflake instance.
This process requires client-side and NinjaCat-side administrators to access and create the required permissions and role details.
Expect a back-and-forth email or chat exchange to accomplish this connection.
- Snowflake - Snowflake is a secure cloud data storage solution that eliminates silos and allows seamless data sharing on a scalable, elastic architecture. (For NinjaCat, this is our backend database.)
- Snowpipe - Snowpipe is the automated process of consuming files from a staged location without manual intervention. As soon as files are available in a stage, they are queued and consumed.
- Stages - A stage specifies where data files are stored (i.e. "staged") so that the data in the files can be loaded into a table. Stages allow the bulk loading of data into tables. Stages can be internal (Snowflake-owned, for loading local files) or external (Amazon S3, Google Cloud Storage, or Microsoft Azure). NinjaCat uses S3 for data staging.
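As a sketch of how a pipe, a stage, and a target table fit together (all object names here are hypothetical, and the stage would point at the client's S3 bucket):

```sql
-- Hypothetical names: a pipe that auto-ingests staged files into a table.
CREATE PIPE client_pipe
  AUTO_INGEST = TRUE  -- consume files as soon as they land in the stage
AS
  COPY INTO client_table
  FROM @client_s3_stage
  FILE_FORMAT = (TYPE = 'CSV');
```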
S3 Snowpipe Setup Overview
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. (This is NinjaCat's preferred Stage)
If you already have an Amazon Web Services (AWS) account and use S3 buckets for storing and managing your data files, you can make use of your existing buckets and folder paths for bulk loading into Snowflake.
Every file placed into the folder is loaded into a table in Snowflake through a copy command that imports the contents of the file into that table. Snowflake automatically tracks load metadata to prevent any file from being loaded more than once.
NinjaNote: The highlighted section titles are links to Snowflake's in-depth documentation for each step of the process.
This topic describes how an AWS administrator in your organization can explicitly grant Snowflake access to your S3 storage (i.e. your buckets and the objects in those buckets). The process involves allowing the Amazon Virtual Private Cloud (Amazon VPC) IDs for your Snowflake account.
NinjaCat will provide the IDs of the Amazon Virtual Private Cloud (VPC) in which NinjaCat's Snowflake account is located.
Client will allow the VPC IDs by creating an Amazon S3 bucket policy that restricts access to those specific VPCs.
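As an illustration only (the bucket name and VPC ID are placeholders), an S3 bucket policy that denies access from outside a specific VPC might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSpecificVPCOnly",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::client-bucket",
        "arn:aws:s3:::client-bucket/*"
      ],
      "Condition": {
        "StringNotEquals": { "aws:SourceVpc": "vpc-0example1234" }
      }
    }
  ]
}
```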
To read data from and write to an S3 bucket, the security and access management policies on the bucket must allow Snowflake to access the bucket.
Client will configure permissions on the S3 bucket and folder that allow access to files in the folder (and sub-folders). At minimum, Snowflake requires s3:GetObject, s3:GetObjectVersion, s3:ListBucket, and s3:GetBucketLocation (plus s3:PutObject and s3:DeleteObject if Snowflake will also write or purge files).
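A sketch of the IAM permissions policy described in Snowflake's documentation for read-only access (the bucket name and prefix are placeholders; write/delete actions would be added only if Snowflake also unloads or purges files):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": "arn:aws:s3:::client-bucket/data/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::client-bucket",
      "Condition": {
        "StringLike": { "s3:prefix": ["data/*"] }
      }
    }
  ]
}
```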
Client will Configure an AWS IAM role with the required policies and permissions to access your external S3 bucket. This approach allows individual users to avoid providing and managing security credentials and access keys.
Client will provide the ARN of this IAM role to NinjaCat.
NinjaCat will Create a storage integration using the CREATE STORAGE INTEGRATION command. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) user for your S3 cloud storage, along with an optional set of allowed or blocked storage locations (i.e. buckets). Cloud provider administrators in your organization grant permissions on the storage locations to the generated user. This option allows users to avoid supplying credentials when creating stages or loading data.
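A minimal sketch of the command (the integration name, role ARN, and bucket path are placeholders):

```sql
CREATE STORAGE INTEGRATION s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/client-snowflake-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://client-bucket/data/');
```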
NinjaCat will Execute the DESCRIBE INTEGRATION command to retrieve the ARN for the AWS IAM user that was created automatically for your Snowflake account.
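For example (assuming the integration name from the previous step):

```sql
DESC INTEGRATION s3_integration;
-- The output includes STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID,
-- which the client needs for the IAM role trust policy.
```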
NinjaCat will record the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values from the DESCRIBE INTEGRATION output.
NinjaCat will provide these values to the client
Client will configure the IAM role's trust relationship settings, modifying the policy document with the DESC STORAGE INTEGRATION output values provided by NinjaCat above.
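A sketch of the trust policy on the client's IAM role, with the two DESC STORAGE INTEGRATION output values substituted in (the angle-bracketed values are placeholders for what NinjaCat provides):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<STORAGE_AWS_IAM_USER_ARN from NinjaCat>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID from NinjaCat>"
        }
      }
    }
  ]
}
```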
NinjaCat will create an external (i.e. S3) stage that references the storage integration created above.
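A minimal sketch (the stage name, URL, and file format are placeholders):

```sql
CREATE STAGE client_s3_stage
  STORAGE_INTEGRATION = s3_integration
  URL = 's3://client-bucket/data/'
  FILE_FORMAT = (TYPE = 'CSV');
```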
Clients will need to provide a relationship map of the files so that NinjaCat can store the data in the proper tables and correctly query/join it. NinjaCat will consume all new files placed in the S3 bucket and either append to or overwrite existing table data. Client, please decide the preferred method of delivering updated data (i.e. upserts).
Snowflake documentation:
- Introduction to loading data with Snowpipe
- Bulk loading from S3
- Amazon VPC IDs