This document will provide a summary of the resources and steps necessary to pass data to SnowFlake via S3. S3 is the preferred method NinjaCat uses to load data directly to NinjaCat’s owned SnowFlake.
This process will require Client-side administrators and NinjaCat-side administrators to access and create the required permissions/role details.
Expect a back and forth email or chat exchange to accomplish this connection.
Section Links
Preliminary - Allowing the Virtual Private Cloud IDs
Step 1: Configure Access Permissions for the S3 Bucket
Step 2: Create the IAM Role in AWS
Step 3: Create a Cloud Storage Integration in Snowflake
Step 4: Retrieve the AWS IAM User for your Snowflake Account
Step 5: Grant the IAM User Permissions to Access Bucket Objects
Step 6: Create an External Stage
Definitions
- SnowFlake - SnowFlake is a secure cloud data storage solution that eliminates silos and allows seamless data sharing on a scalable elastic architecture. (For NinjaCat this is our backend database)
- SnowPipe - SnowPipe is the automated process of consuming files from a staged location without manual intervention. As soon as files are available in a stage files are queued and consumed.
- Stages - A stage specifies where data files are stored (i.e. “staged”) so that the data in the files can be loaded into a table. Stages allow the bulk loading of data into tables. Stages can be internal (snowflake owned for loading local files) or External (Amazon S3, Google Cloud Storage, or Microsoft Azure). NinjaCat uses S3 for data Staging.
S3 SnowPipe Setup Overview
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. (This is NinjaCat's preferred Stage)
If you already have an Amazon Web Services (AWS) account and use S3 buckets for storing and managing your data files, you can make use of your existing buckets and folder paths for bulk loading into Snowflake.
Every file that is placed into the folder is copied into a table in SnowFlake through an import command that copies the contents of the file into the same table. SnowFlake automatically uses metadata to prevent any files from being loaded multiple times.
NinjaNote: The highlighted section titles are links to the in-depth documentation on each process step presented by Snowflake
Preliminary - Allowing the Virtual Private Cloud IDs
This topic describes how an AWS administrator in your organization can explicitly grant Snowflake access to your AWS S3 storage account (i.e. your buckets and the objects in those buckets). The process involves allowing the Amazon Virtual Private Cloud (Amazon VPC) IDs for your Snowflake account.
NinjaCat will Provide the IDs of the AWS Virtual Network (VNet) in which NinjaCat’s Snowflake account is located.
Client will Allow the VPC IDs by creating an Amazon S3 policy for a specific VPC.
Step 1: Configure Access Permissions for the S3 Bucket
To read data from and write to an S3 bucket, the security and access management policies on the bucket must allow Snowflake to access the bucket.
Client will Configure the following permissions on an S3 bucket and folder to be able to access files in the folder (and sub-folders):
s3:GetBucketLocation
s3:GetObject
s3:GetObjectVersion
s3:ListBucket
Step 2: Create the IAM Role in AWS
Client will Configure an AWS IAM role with the required policies and permissions to access your external S3 bucket. This approach allows individual users to avoid providing and managing security credentials and access keys.
Client will provide this IAM user to NinjaCat
Step 3: Create a Cloud Storage Integration in Snowflake
NinjaCat will Create a storage integration using the CREATE STORAGE INTEGRATION command. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) user for your S3 cloud storage, along with an optional set of allowed or blocked storage locations (i.e. buckets). Cloud provider administrators in your organization grant permissions on the storage locations to the generated user. This option allows users to avoid supplying credentials when creating stages or loading data.
Step 4: Retrieve the AWS IAM User for your Snowflake Account
NinjaCat will Execute the DESCRIBE INTEGRATION command to retrieve the ARN for the AWS IAM user that was created automatically for your Snowflake account.
NinjaCat will record the:
STORAGE_AWS_IAM_USER_ARN
STORAGE_AWS_EXTERNAL_ID
NinjaCat will provide these values to the client
Step 5: Grant the IAM User Permissions to Access Bucket Objects
Client will configure the IAM user trust relationship settings and Modify the policy document with the DESC STORAGE INTEGRATION output values provided by NinjaCat in step 4.
Step 6: Create an External Stage
NinjaCat will Create an external (i.e. S3) stage that references the storage integration created in Step 3.
Loading Data
Clients will need to provide a relationship map of the files so that NinjaCat can properly store the data in the proper tables and properly query/join the data. NinjaCat will consume all new files placed in the S3 bucket and either append or overwrite existing table data. Client, please decide the preferred method of delivering updated data (i.e. upcerts).
Supporting Documents
Introduction to loading data into SnowPipe
https://docs.snowflake.com/en/user-guide/data-load-snowpipe-intro.html
Bulk Loading from S3
https://docs.snowflake.com/en/user-guide/data-load-s3.html
Amazon VPC IDs
Comments
0 comments
Article is closed for comments.