Overview
Solution Overview
Snowflake Protector on AWS is a cloud native, serverless product for fine-grained data protection with Snowflake™, a managed Cloud data warehouse. This enables invocation of the Protegrity data protection cryptographic methods from the Snowflake SQL execution context. The benefits of serverless include rapid auto-scaling, performance, low administrative overhead, and reduced infrastructure costs compared to a server-based solution.
This product provides data protection services invoked by External User Defined Functions (UDFs) within Snowflake. The UDFs act as a client transmitting micro-batches of data to the serverless Protegrity Lambda function. User queries may generate hundreds or thousands of parallel requests to perform security operations. Protegrity’s serverless function is designed to scale and yield reliable query performance under such load.
Snowflake Protector on AWS utilizes a data security policy maintained by an Enterprise Security Administrator (ESA), similar to other Protegrity products. Using regular SQL database queries or tools, such as, Tableau™, authorized users can perform de-identification (protect) and re-identification (unprotect) operations on data within the managed Cloud data warehouse. A user’s individual capabilities are subject to privileges and policies defined by the Enterprise Security Administrator.
The following data ingestion patterns are available with your managed Cloud data warehouse:
- Data protection at source applications: In this case, sensitive data is already de-identified (protected) across the enterprise wherever it resides, including the managed data warehouse. Protected data can be ingested directly into your managed Cloud data warehouse. Depending on usage patterns, this ensures that your managed data warehouse is not brought into scope for PCI, PII, GDPR, HIPPA, and other compliance policies.
- Data protection using the Extract-Transform-Load (ETL) pattern: In this case, sensitive data may be transformed with a Protegrity protector either on-premise or in the Cloud before it is ingested into Snowflake.
- Data protection using the Extract-Load-Transform (ELT) pattern: In this case, sensitive data is protected after it lands into the target system typically through a temporary landing table. It uses the native data warehouse’s compute engine with Protegrity to protect incoming data at very high throughput rates. After the data is protected, the intermediate loading tables are dropped as part of the ingestion workflow.
Analytics on Protected Data
Protegrity’s format and length preserving tokenization scheme make it possible to perform analytics directly on protected data. Tokens are join-preserving so protected data can be joined across datasets. Often statistical analytics and machine learning training can be performed without the need to re-identify protected data. However, a user or service account with authorized security policy privileges may re-identify subsets of data using the Snowflake Protector on AWS service.
Features
Snowflake Protector on AWS incorporates Protegrity’s patent-pending vaultless tokenization capabilities into cloud-native serverless technology. Combined with an ESA security policy, the protector provides the following features:
- Role-based access control (RBAC) to protect and unprotect (re-identify) data depending on the user privileges.
- Policy enforcement features of other Protegrity application protectors.
For more information about the available protection options, such as, data types, Tokenization or Encryption types, or length preserving and non-preserving tokens, refer to Protection Methods Reference.
Deployment Architecture
The Protegrity product should be deployed in the customer’s Cloud account within the same AWS region as the Snowflake cluster. The product incorporates Protegrity’s vaultless tokenization engine within an AWS Lambda Function. The encrypted data security policy from an ESA is deployed periodically as a static resource through an AWS Lambda Layer. The policy is decrypted in memory at runtime within the Lambda. This architecture allows Protegrity to be highly available and scale very quickly without direct dependency on any other Protegrity services.
The product exposes a remote data protection service invoked from external User Defined Functions (UDFs), a native feature of specific Cloud databases. The UDFs can be invoked through direct SQL queries or database views. The external UDF makes parallel API calls to the serverless Protegrity Lambda function via the AWS API Gateway to perform protect and unprotect data operations. Each network REST request includes a micro-batch of data to process and a secure context header generated by the database with the username and a Protegrity context header with the data element type, and operation to perform. The product applies the ESA security policy including user authorization and returns a corresponding response. Security operations on sensitive data performed by protector can be audited. The product can be configured to send audit logs to ESA for reporting and alerting purposes via optional component called Log Forwarder explained in details in the next section of this guide.
The security policy is synchronized through another serverless component called the Protegrity Policy Agent. The agent operates on a configurable schedule, fetches the policy from the ESA, performs additional envelope encryption using Amazon KMS, and deploys the policy into the Lambda Layer used by the serverless product. This solution can be configured to automatically provision the static policy or the final step can be performed on-demand by an administrator. The policy takes effect immediately for all new requests. There is no downtime for users during this process.
The following diagram shows the high-level architecture described above.

The following diagram shows a reference architecture for synchronizing the security policy from ESA.

The Protegrity Policy Agent requires network access to an Enterprise Security Administrator (ESA). Most organizations install the ESA on-premise. Therefore, it is recommended that the Policy Agent is installed into a private subnet with a Cloud VPC using a NAT Gateway to enable this communication through a corporate firewall.
The ESA is a soft appliance that must be pre-installed on a separate server. It is used to create and manage security policies.
For more information about installing the ESA, and creating and managing policies, refer the Policy Management Guide.
Log Forwarding Architecture
Audit logs are by default sent to CloudWatch as long as the function’s execution role has the necessary permissions. The Protegrity Product can also be configured to send audit logs to ESA. Such configuration requires deploying Log Forwarder component which is available as part of Protegrity Product deployment bundle. The diagram below shows additional resources deployed with Log Forwarder component.

The log forwarder component includes Amazon Kinesis data stream and the forwarder Lambda function. Amazon Kinesis stream is used to batch audit records before sending them to forwarder function, where similar audit logs are aggregated before sending to ESA. Aggregation rules are described in the Protegrity Log Forwarding guide. When the protector function is configured to send audit logs to log forwarder, audit logs are aggregated on the protector function before sending to Amazon Kinesis. Due to specifics of the Lambda runtime lifecycle, audit logs may take up to 15 minutes before being sent to Amazon Kinesis. Protector function exposes configuration to minimize this time which is described in the protector function installation section.
The security of audit logs is ensured by using HTTPS connection on each link of the communication between protector function and ESA. Integrity and authenticity of audit logs is additionally checked on log forwarder which verifies individual logs signature. The signature verification is done upon arrival from Amazon Kinesis before applying aggregation. If signature cannot be verified, the log is forwarded as is to ESA where additional signature verification can be configured. Log forwarder function uses client certificate to authenticate calls to ESA.
To learn more about individual audit log entry structure and purpose of audit logs, refer to Audit Logging section in this document. Installation instructions can be found in Audit Log Forwarder installation.
The audit log forwarding requires network access from the cloud to the ESA. Most organizations install the ESA on-premise. Therefore, it is recommended that the Log Forwarder Function is installed into a private subnet with a Cloud VPC using a NAT Gateway to enable this communication through a corporate firewall.
Snowflake Connectivity
Request authorization between Snowflake’s AWS VPC to the customer’s AWS API-Gateway occurs using a Snowflake integration enabled by an AWS cross-account IAM role created within the customer’s AWS account. The API Gateway authorizes each request, limiting access to the Snowflake AWS IAM user and external ID established through a Snowflake API Integration object. The following figure illustrates the Protegrity Snowflake Integration architecture.

Snowflake’s API Integration Object
The Snowflake API Integration Object provides a connection between your Snowflake VPC and your AWS account where the Protegrity product is hosted. Creating this connection requires setting up the API Gateway and the IAM shared account role. These steps are provided in the installation.

The following figure shows how trust is established between Snowflake and the product.

Feedback
Was this page helpful?