Overview
Solution Overview
The S3 Protector provides an automated solution for protecting files on Amazon S3 at scale. The product integrates with the Amazon S3 Event Notifications feature to trigger data protection when files are created or modified. The notifications are consumed asynchronously by the S3 Protector Lambda function, which breaks files into batches and transmits them securely to the Cloud API Protector on AWS. The Cloud API Protector applies Protegrity cryptographic methods and returns the processed data. After all batches of an input file are processed, the processed file is saved to an output S3 bucket. Because the S3 Protector is serverless, it scales up automatically to accommodate an increasing volume of files and scales down completely during idle time, which can significantly reduce cloud compute costs.
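The flow described above (event notification, batching, protection, reassembly) can be illustrated with a minimal sketch. It is not the product's implementation: the function names, the batch size, and `fake_protect_batch` (a stand-in for the call to the Cloud API Protector) are all hypothetical. Only the `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the documented S3 event notification layout.

```python
from urllib.parse import unquote_plus


def objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def batches(rows, size):
    """Split a list of rows into consecutive batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def protect_file(rows, protect_batch, batch_size=2):
    """Protect every batch in order, then return the reassembled rows."""
    protected = []
    for batch in batches(rows, batch_size):
        protected.extend(protect_batch(batch))
    return protected


def fake_protect_batch(batch):
    """Hypothetical stand-in for the Cloud API Protector call (reverses text)."""
    return [value[::-1] for value in batch]


# Hypothetical event payload and file contents, for illustration only.
event = {"Records": [{"s3": {"bucket": {"name": "input-bucket"},
                             "object": {"key": "incoming/new+file.csv"}}}]}
targets = list(objects_from_event(event))
output_rows = protect_file(["alice", "bob", "carol"], fake_protect_batch)
```

In this sketch the output is assembled only after every batch has been processed, mirroring the statement that the processed file is saved once all batches of the input file are complete.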
The solution requires the Protegrity Cloud API Protector to be installed. The Cloud API Protector provides an endpoint for performing Protegrity operations within AWS, for integration with cloud-based ETL workflows.
Protected files can be used as a source for a data lake or for downstream database ingestion. For example:
- Snowflake Snowpipe can be used to automatically ingest protected files as they are written by the S3 Protector.
- Amazon Redshift provides a mechanism for bulk loading data from Amazon S3 using the COPY command.
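As a rough illustration of the Redshift option, the sketch below builds a COPY statement for CSV files under an S3 prefix. The helper name, table name, bucket, and IAM role ARN are hypothetical placeholders. The helper does no escaping, so it assumes trusted inputs.

```python
def redshift_copy_statement(table, s3_uri, iam_role_arn):
    """Build a Redshift COPY statement for CSV files under an S3 prefix.

    Inputs are assumed to be trusted; no quoting or escaping is performed.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


# Hypothetical values, for illustration only.
statement = redshift_copy_statement(
    "protected_customers",
    "s3://output-bucket/protected/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
```

Because the files are already protected when they reach the output bucket, the load itself needs no Protegrity-specific options.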
Like other Protegrity products, the S3 Protector uses the data security policy maintained on the Enterprise Security Administrator (ESA). An existing ESA policy user must be supplied as part of the S3 Protector configuration. This user acts as the service account for the S3 Protector deployment. For more information about policy user configuration, refer to the Enterprise Security Administrator Guide.
Analytics on Protected Data
Protegrity’s format- and length-preserving tokenization schemes make it possible to perform analytics directly on protected data. Tokens are join-preserving, so protected data can be joined across datasets. Statistical analytics and machine learning training can often be performed without re-identifying protected data. However, a user or service account with authorized security policy privileges may re-identify subsets of data using the Cloud Storage Protector - Amazon S3 service.
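The join-preserving property can be illustrated with a toy example. The `toy_token` function below is not Protegrity's tokenization. It is a hypothetical, non-reversible stand-in built on HMAC, chosen only because it is deterministic and keeps the length and digit format of its input. The key and the sample data are also made up.

```python
import hashlib
import hmac

DEMO_KEY = b"demo-key-not-a-real-secret"  # hypothetical key, illustration only


def toy_token(value, key=DEMO_KEY):
    """Map a non-empty digit string to a digit string of the same length.

    Deterministic: the same input always yields the same token.
    Unlike real tokenization, this toy mapping cannot be reversed.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return str(int(digest, 16) % 10 ** len(value)).zfill(len(value))


# Two datasets that share an account-number column (made-up values).
customers = {"4111111111": "Alice", "4222222222": "Bob"}
orders = [("4111111111", 30), ("4222222222", 12), ("4111111111", 5)]

# Protect the shared column in each dataset independently.
protected_customers = {toy_token(k): name for k, name in customers.items()}
protected_orders = [(toy_token(k), amount) for k, amount in orders]

# Join on the protected column without ever seeing the clear values.
joined = [(protected_customers[token], amount)
          for token, amount in protected_orders]
```

Because equal clear values map to equal tokens, the join on protected columns pairs the same rows as a join on the original values would.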
Features
Protegrity S3 Protector provides the following features:
- Fine-grained field-level protection for structured data, with the following formats supported:

  | File Format | Suffix   |
  |-------------|----------|
  | CSV         | .csv     |
  | JSON        | .json    |
  | Parquet     | .parquet |
  | Excel       | .xlsx    |

- Role-based access control (RBAC) to protect and unprotect (re-identify) data depending on user privileges.
- Policy enforcement features of other Protegrity application protectors.
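The supported formats and suffixes listed above can be expressed as a simple lookup. This is a minimal sketch, not the product's detection logic: the function name is hypothetical, and the case-insensitive match on the final suffix is an assumption.

```python
from pathlib import PurePosixPath

# Suffix-to-format mapping taken from the supported formats listed above.
SUPPORTED_FORMATS = {
    ".csv": "CSV",
    ".json": "JSON",
    ".parquet": "Parquet",
    ".xlsx": "Excel",
}


def detect_format(key):
    """Return the format name for an S3 object key, or None if unsupported.

    Assumption: only the final suffix counts, compared case-insensitively.
    """
    return SUPPORTED_FORMATS.get(PurePosixPath(key).suffix.lower())
```

For example, `detect_format("incoming/customers.csv")` returns `"CSV"`, while a key such as `notes.txt` returns `None`.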
For more information about the available protection options, such as data types, Tokenization or Encryption types, or length-preserving and non-preserving tokens, refer to Protection Methods Reference.