Installation
- 1: Prerequisites
- 2: Pre-Configuration
- 3: S3 Protector Service Installation
1 - Prerequisites
AWS Services
The following table describes the AWS services that may be a part of your Protegrity installation.
| Service | Description |
|---|---|
| Lambda | Provides serverless compute for S3 Protector. |
| S3 | Stores the input and output data processed by S3 Protector. |
| CloudWatch | Application and audit logs, performance monitoring, and alerts. |
Prerequisites
| Requirement | Detail |
|---|---|
| S3 Protector distribution and installation scripts | These artifacts are provided by Protegrity. |
| Protegrity Cloud Protect API | This product is required. |
| AWS Account | Using the same AWS account as the Protegrity Cloud API deployment is recommended. |
Required Skills and Abilities
| Role / Skillset | Description |
|---|---|
| AWS Account Administrator | To run CloudFormation (or perform steps manually), create/configure S3, VPC and IAM permissions. |
| Protegrity Administrator | The ESA credentials required to read the policy configuration. |
2 - Pre-Configuration
Provide AWS sub-account
Identify or create an AWS account where the Protegrity solution will be installed. The installation instructions assume the same AWS account and region are used for Cloud Protect API deployment.
AWS Account ID: ___________________
AWS Region: ___________________
Create S3 bucket for Installing Artifacts
This S3 bucket will be used for the artifacts required by the CloudFormation installation steps. This S3 bucket must be created in the region that is defined in Provide AWS sub-account.
To create the S3 bucket for installation artifacts:
Sign in to the AWS Management Console and open the Amazon S3 console.
Change the region to the one determined in Provide AWS sub-account.
Click Create Bucket.
Enter a unique bucket name, for example, protegrity-install.us-west-2.example.com.
Upload the installation artifacts to this bucket. Protegrity will provide the following artifacts.
protegrity-s3-protector-<version>.zip
Note
The S3 Protector installation deployment package contains artifacts for installing Cloud Protect Cloud API. If installing the Cloud API version included with S3 Protector, you may unzip the Cloud API bundle as well. The same S3 bucket may be used to upload those artifacts. For more information on Cloud API installation, check the Cloud API on AWS installation guide.
Important
The deployment package you receive from Protegrity must be extracted to reveal the Protegrity artifacts. CloudFormation requires them in the provided .zip format. Do not extract the individual Protegrity artifacts. Upload these artifacts to the S3 bucket created.
Artifact S3 Bucket Name: ___________________
Cloud Protect API function
Protegrity Cloud Protect API on AWS is required for the S3 Protector installation. See the Cloud Protect API on AWS documentation to create a new installation if one is not already available in your account/region. With Cloud Protect API on AWS installed, obtain the ARN of the protector Lambda function.
To obtain the Cloud API Lambda ARN:
Access the AWS Management Console.
Navigate to the Cloud Protect API function in the AWS Lambda service.
Open the Cloud Protect API function.
From the Lambda view, choose Aliases, then click on Production alias.
At the top right, copy the Lambda function ARN and record it. The Cloud API Production alias ARN will be used later in this installation guide when creating the IAM policy and deploying S3 Protector with the CloudFormation template.
Cloud Protect API function ARN: ____________________
S3 Buckets For Input And Output Data
Two S3 buckets are required. One bucket is used for incoming files. The second bucket is used for files processed by the S3 Protector. The buckets must be different. The S3 buckets should be created in the region that is defined in Provide AWS sub-account.
Note
Before continuing, it is critical to understand Amazon S3 security concepts and best practices. You can refer to AWS S3 Best Practices for the list of recommended S3 security configurations; however, it is strongly recommended to check the official AWS documentation for more details.
Identify existing bucket names or follow the steps below to create new buckets.
Sign in to the AWS Management Console and open the Amazon S3 console.
Change the region to the one determined in Provide AWS sub-account.
Select Create Bucket.
Enter a globally unique bucket name. For example: in.us-west-2.example.com or out.us-west-2.example.com.
Scroll down and configure S3 bucket security features. It is strongly recommended to keep Block all public access on. It is also recommended to enable server-side encryption.
Note
Additional S3 security features can be configured after the bucket is created. Refer to AWS documentation for more details.
Record bucket names. They will be required later in this installation guide.
Input S3 Bucket Name: ____________________
Output S3 Bucket Name: ____________________
3 - S3 Protector Service Installation
Preparation
Ensure that all the steps in Pre-Configuration are performed.
Log in to the AWS sub-account console where Protegrity will be installed.
Ensure that the required CloudFormation templates provided by Protegrity are available on your local computer.
Create S3 Protector Lambda IAM Execution Policy
The below steps create an IAM policy for use by the Protegrity Lambda function. The policy grants permissions to:
- Write logs to CloudWatch
- Read from input S3 bucket
- Write to output S3 bucket
- Invoke Cloud Protect API function
Steps
From the AWS IAM console, select Policies → Create Policy.
Select the JSON tab and copy the following sample policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudWatchWriteLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ReadS3In",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:GetObjectAcl",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::PLACEHOLDER_S3_IN_BUCKET_NAME",
        "arn:aws:s3:::PLACEHOLDER_S3_IN_BUCKET_NAME/*"
      ]
    },
    {
      "Sid": "WriteS3Out",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:ListBucket",
        "s3:PutObjectAcl",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::PLACEHOLDER_S3_OUT_BUCKET_NAME",
        "arn:aws:s3:::PLACEHOLDER_S3_OUT_BUCKET_NAME/*"
      ]
    },
    {
      "Sid": "InvokeCloudProtectApi",
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "PLACEHOLDER_CLOUD_PROTECT_API_ARN"
      ]
    }
  ]
}
Replace the PLACEHOLDER values with the values recorded in earlier steps:
- Cloud Protect API prerequisites
- S3 Data Buckets prerequisites
Select Review policy, type in a policy name (e.g., ProtegrityS3ProtectorLambdaPolicy) and Confirm.
Record the policy name.
S3 Protector Function Policy Name: __________________
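The placeholder substitution in the steps above can also be scripted before pasting the policy into the IAM console. The following is a minimal Python sketch, not part of the Protegrity distribution: the function name `fill_policy_placeholders` is hypothetical, and the template here is an abbreviated copy of the sample policy (action lists shortened) used only to keep the example small. It assumes the three placeholder strings appear exactly as shown in the sample policy.

```python
import json

# Abbreviated copy of the sample policy above (action lists shortened).
# In practice, load the full policy text instead.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadS3In",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::PLACEHOLDER_S3_IN_BUCKET_NAME",
        "arn:aws:s3:::PLACEHOLDER_S3_IN_BUCKET_NAME/*"
      ]
    },
    {
      "Sid": "WriteS3Out",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::PLACEHOLDER_S3_OUT_BUCKET_NAME",
        "arn:aws:s3:::PLACEHOLDER_S3_OUT_BUCKET_NAME/*"
      ]
    },
    {
      "Sid": "InvokeCloudProtectApi",
      "Effect": "Allow",
      "Action": ["lambda:InvokeFunction"],
      "Resource": ["PLACEHOLDER_CLOUD_PROTECT_API_ARN"]
    }
  ]
}
"""


def fill_policy_placeholders(template, in_bucket, out_bucket, api_arn):
    """Replace the three PLACEHOLDER values and return the policy as a dict."""
    values = {
        "PLACEHOLDER_S3_IN_BUCKET_NAME": in_bucket,
        "PLACEHOLDER_S3_OUT_BUCKET_NAME": out_bucket,
        "PLACEHOLDER_CLOUD_PROTECT_API_ARN": api_arn,
    }
    text = template
    for placeholder, value in values.items():
        text = text.replace(placeholder, value)
    # Fail early if any placeholder was missed.
    if "PLACEHOLDER_" in text:
        raise ValueError("policy still contains an unreplaced placeholder")
    # json.loads also confirms the result is still valid JSON.
    return json.loads(text)
```

Calling `json.dumps(policy, indent=2)` on the returned dictionary produces text that can be pasted into the JSON tab.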
Create S3 Protector Lambda IAM Role
The following steps create the role to utilize the policy defined in the previous section.
Steps
From the AWS IAM console, select Roles → Create Role.
Select AWS Service → Lambda → Permissions.
In the list, search and select the policy created in the previous step.
Proceed to Tags.
Proceed to final step of the wizard.
Type the role name (e.g., ProtegrityS3ProtectorLambdaRole) and click Confirm.
Record the role ARN.
Protegrity S3 Protector Lambda Role ARN: ___________________
Install through CloudFormation
The following steps describe deployment of the S3 Protector Lambda Function using CloudFormation.
Access CloudFormation and select the target AWS Region in the console.
Click Create Stack and choose With new resources.
Specify the template.
Select Upload a template file.
Upload the Protegrity-provided CloudFormation template called pty_s3_protector_cf.json and click Next.
Specify the stack details. Enter a stack name.
Note
The stack name will be appended to all the services created by the template.
Enter the required parameters. All the values were generated in the pre-configuration steps.
CloudFormation Parameters
| Parameter | Description | Default Value |
|---|---|---|
| ArtifactS3Bucket | The name of the S3 bucket containing deployment package for S3 Protector. Use Artifact S3 Bucket Name recorded in prerequisites. Allowed pattern: [a-zA-Z0-9.\-_]+ | |
| CloudApiProtectorLambdaArn | The ARN of the Cloud Protect API Lambda which will be invoked by S3 Protector Lambda. Use Cloud Protect API function ARN recorded in prerequisites. Allowed pattern: arn:(aws[a-zA-Z-]*)?:lambda:[a-z]{2}(-gov)?-[a-z]+-\d{1}:\d{12}:function:[a-zA-Z0-9-_\.]+(:(\$LATEST|[a-zA-Z0-9-_]+))? | |
| DeleteInputFiles | Delete the input files after they have been successfully processed. Allowed values: [true, false] | true |
| IncludeHeader | Add header to output data. Allowed values: [true, false] | true |
| LambdaExecutionRoleArn | S3 Protector Lambda IAM execution role ARN allowing access to CloudWatch logs and S3 bucket. Use Protegrity S3 Protector Lambda Role ARN recorded previously. Allowed pattern: arn:(aws[a-zA-Z-]*)?:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+ | |
| MaxBatchSize | The maximum number of rows to process in a single Cloud API invocation. Allowed pattern: [0-9]+ | 25000 |
| MinLogLevel | Minimum log level for the S3 Protector function. Allowed values: [off, severe, warning, info, config, all] | severe |
| OutputFileCompressionType | Compression type to apply to processed files in the output S3 bucket. Allowed values: [gzip, none] | gzip |
| OutputFileNamePostfix | Postfix to append to processed file names in the output S3 bucket. Allowed values: [none, timestamp]. Note: The timestamp appended when the value is 'timestamp' is a Unix timestamp. | timestamp |
| OutputFileFormat | Format of the processed file saved in the output S3 bucket. Allowed values: [csv, json, parquet, preserve_input_format, use_mapping_spec, xlsx]. Note: When use_mapping_spec is set, the output format is read from the mapping.json file. | preserve_input_format |
| OutputS3BucketName | The name of the output S3 bucket where protected files will be saved. Use Output S3 Bucket Name recorded in prerequisites. Allowed pattern: [a-zA-Z0-9.\-_]+ | |
| PolicyUser | The name of the authorized user in the Protegrity security policy. This is the user which will be applied to every protect operation. | |
| LambdaFunctionProductionVersion | S3 Protector Lambda version handling service requests. Allowed pattern: ([0-9]+|\$LATEST). Note: Used in upgrade steps. | $LATEST |
Click Next with defaults to complete CloudFormation.
After CloudFormation is completed, select the Outputs tab in the stack.
Record the S3 Protector Lambda Name and Arn.
S3 Protector Lambda Name: __________________________
S3 Protector Lambda Arn: ________________________________
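Before submitting the stack, the recorded values can be checked against the allowed patterns and allowed values from the CloudFormation Parameters table. The sketch below is illustrative only and is not part of the Protegrity template: the helper name `invalid_parameters` and the sample values are hypothetical. The patterns are copied from the table, and the sketch assumes they behave the same under Python's `re` module as they do in CloudFormation.

```python
import re

# Allowed patterns copied from the CloudFormation Parameters table.
ALLOWED_PATTERNS = {
    "ArtifactS3Bucket": r"[a-zA-Z0-9.\-_]+",
    "CloudApiProtectorLambdaArn": (
        r"arn:(aws[a-zA-Z-]*)?:lambda:[a-z]{2}(-gov)?-[a-z]+-\d{1}:\d{12}"
        r":function:[a-zA-Z0-9-_\.]+(:(\$LATEST|[a-zA-Z0-9-_]+))?"
    ),
    "LambdaExecutionRoleArn": (
        r"arn:(aws[a-zA-Z-]*)?:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+"
    ),
    "MaxBatchSize": r"[0-9]+",
    "OutputS3BucketName": r"[a-zA-Z0-9.\-_]+",
    "LambdaFunctionProductionVersion": r"([0-9]+|\$LATEST)",
}

# Allowed values copied from the same table.
ALLOWED_VALUES = {
    "DeleteInputFiles": {"true", "false"},
    "IncludeHeader": {"true", "false"},
    "MinLogLevel": {"off", "severe", "warning", "info", "config", "all"},
    "OutputFileCompressionType": {"gzip", "none"},
    "OutputFileNamePostfix": {"none", "timestamp"},
    "OutputFileFormat": {
        "csv", "json", "parquet",
        "preserve_input_format", "use_mapping_spec", "xlsx",
    },
}


def invalid_parameters(params):
    """Return the names of parameters whose values fail the table's rules."""
    bad = []
    for name, value in params.items():
        pattern = ALLOWED_PATTERNS.get(name)
        if pattern is not None:
            # fullmatch: the whole value must match, not just a prefix.
            if not re.fullmatch(pattern, value):
                bad.append(name)
        elif name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]:
            bad.append(name)
    return bad
```

Parameters without a documented constraint, such as PolicyUser, are not checked by this sketch.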
Test S3 Protector Function Configuration
Perform the following steps to verify that the S3 Protector Function can read files from the input S3 bucket, call the Cloud API protector, and write data to the output S3 bucket.
Note
Steps described in this section require read/write permissions for S3 data buckets. Data bucket names were recorded in the prerequisites section.
Before you begin:
- Update the S3 Protector CloudFormation stack with temporary settings used for testing:
- In the AWS CloudFormation console, go to Stacks
- Select the CloudFormation stack deployed in the installation step
- In the stack details pane, choose Update
- Select Use existing template and then choose Next
- Change the following parameters:
| Parameter | Value | Note |
|---|---|---|
| DeleteInputFiles | false | For testing purposes input file will not be deleted after it’s processed. |
| MinLogLevel | config | Config level prints verbose log messages. |
| OutputFileCompressionType | none | For testing purposes compression is disabled for quicker visual verification of the output file. |
- Select Next and then Submit. Wait until the changes are deployed.
- Upload sample data file to S3 input bucket.
data.csv:
first name,last name,email
tusqB,FrjKe,ebMgF.VoiDd@bqclblD.wOt
JXVVW,acg,BikPa.ufb@UmPxcTD.bLh
mDNJ,IZWCYkbnrAs,NWXD.GdrzMJwmwJG@fMZsuSE.Qlp
jIqColWOss,XKfz,NVabzoUSgx.XRHM@BQleCST.Mnb
muUxYvz,FLZxCHlca,eiNjzCm.UMRNYANwn@isvxpAV.PJk
- Upload mapping.json to the input S3 bucket next to the input data file. Replace the placeholders with data element names configured in your security policy. If your Cloud Protect API Function uses the sample policy, you can replace “protect” with “unprotect” for the operation and use “alpha” as the data element.
{
  "columns": {
    "first name": {
      "operation": "protect",
      "data_element": "<data_element_1_name>"
    },
    "last name": {
      "operation": "protect",
      "data_element": "<data_element_2_name>"
    },
    "email": {
      "operation": "protect",
      "data_element": "<data_element_3_name>"
    }
  }
}
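When a dataset has many columns, the mapping.json structure shown above can be generated rather than typed by hand. The following is a small Python sketch under stated assumptions: the helper name `build_mapping` is hypothetical, and it only knows the two operations mentioned in this guide ("protect" and "unprotect") and the two fields shown in the sample file. Any other mapping options supported by S3 Protector are outside its scope.

```python
import json


def build_mapping(column_to_data_element, operation="protect"):
    """Build a mapping.json document from {column name: data element name}."""
    # Only the two operations named in this guide are accepted here.
    if operation not in ("protect", "unprotect"):
        raise ValueError("operation must be 'protect' or 'unprotect'")
    return {
        "columns": {
            column: {"operation": operation, "data_element": data_element}
            for column, data_element in column_to_data_element.items()
        }
    }


if __name__ == "__main__":
    # Sample-policy case from the text: unprotect with the "alpha" element.
    mapping = build_mapping(
        {"first name": "alpha", "last name": "alpha", "email": "alpha"},
        operation="unprotect",
    )
    print(json.dumps(mapping, indent=2))
```

The printed JSON can be saved as mapping.json and uploaded next to the input data file.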
Execute S3 Protector Function in AWS console:
With the input data file and mapping file uploaded, follow the steps below to trigger the S3 Protector Function.
Sign in to the AWS Management Console and go to Lambda console.
Select Lambda Function recorded in S3 Protector Lambda Name in Install through CloudFormation section.
On the S3 Protector Function page, choose Test tab.
Copy the JSON test event into the Event JSON pane and replace the bucket name placeholder with your input bucket name.
{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "<PLACEHOLDER_S3_IN_BUCKET_NAME>"
        },
        "object": {
          "key": "data.csv"
        }
      }
    }
  ]
}
Select Test to execute the test event.
Verify execution results:
- Execution is successful if the output of test contains the following:
{
  "statusCode": 200,
  "body": {
    "target": "s3://<PLACEHOLDER_S3_OUT_BUCKET_NAME>/data.<timestamp>.csv"
  }
}
If the expected output is not present, please consult the Troubleshooting section for common errors and solutions.
- Download the output file mentioned in the response body in the “target” field. Verify that it was processed according to your mapping.json. If sample policy was used with “unprotect” and “alpha” data element, the output file should contain values below:
first name,last name,email
Lorem,Ipsum,lorem.ipsum@example.com
Dolor,Sit,dolor.sit@example.com
Amet,Consectetur,amet.consectetur@example.com
Adipiscing,Elit,adipiscing.elit@example.com
Vivamus,Elementum,vivamus.elementum@example.com
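The response check described above can be automated when the function is invoked from a script. This Python sketch assumes the response has the shape shown in this section (a `statusCode` and a `body` containing a `target` URI); the helper name `parse_test_result` is hypothetical, and accepting `body` as either an object or a JSON string is a defensive assumption rather than documented behavior.

```python
import json
from urllib.parse import urlparse


def parse_test_result(result):
    """Return (bucket, key) of the output file, or raise if the test failed."""
    if result.get("statusCode") != 200:
        raise RuntimeError(f"S3 Protector returned status {result.get('statusCode')}")
    body = result["body"]
    # Assumption: tolerate a JSON-encoded string as well as a plain object.
    if isinstance(body, str):
        body = json.loads(body)
    parsed = urlparse(body["target"])
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"unexpected target: {body['target']}")
    # netloc is the bucket name; the path (minus its leading slash) is the key.
    return parsed.netloc, parsed.path.lstrip("/")
```

The returned bucket and key identify the output file to download and compare against the expected values.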
Restore production configuration:
After the S3 Protector Function configuration has been verified, restore the following configuration for the production environment:
- CloudFormation configuration - restore the values changed in the Before you begin steps at the beginning of this section.
- IAM permissions - remove any additional S3 read/write IAM permissions used to manually upload test datasets to S3.
Configure S3 Lambda Triggers
Follow the steps below to configure Amazon S3 event notification on the input bucket. This will allow Amazon S3 to send an event to S3 Protector Lambda function when an object is created or updated.
Note
The steps below require AWS Administrator permissions to modify the resource-based Lambda policy. When a new S3 trigger is added from the Lambda console, the console modifies the resource-based policy to allow Amazon S3 to invoke the function if the bucket name and account ID match.
Note
When uploading multiple files or folders to S3, the AWS S3 Lambda trigger generates one event per file. This results in multiple S3 Protector instances running concurrently, one S3 Protector instance per file.
Steps to add an S3 Lambda trigger:
Sign in to the AWS Management Console and open the Amazon Lambda console.
Select Lambda Function recorded in S3 Protector Lambda Name in the installation section.
On the S3 Protector Function page, choose Aliases, then click on Production alias.
In the Function overview pane, choose Add trigger.
Select S3.
Under Bucket, select the bucket recorded in Input S3 Bucket Name in prerequisites section.
Under Event types, select All object create events.
Optionally enter a file prefix.
Enter a file suffix, e.g., .csv. You can find the full list of supported file formats in the Features section.
Under Recursive invocation, select the check box to acknowledge that using the same Amazon S3 bucket for input and output is not recommended.
Choose Add.
Repeat these steps for additional file suffixes supported by S3 Protector.
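For reference, the trigger settings chosen in the console steps above correspond to an S3 bucket notification configuration with one Lambda entry per file suffix. The Python sketch below builds that structure as a plain dictionary. It is an illustration, not a Protegrity-provided script: the helper name `build_notification_config` and the sample ARN are hypothetical.

```python
def build_notification_config(lambda_alias_arn, suffixes, prefix=None):
    """Build an S3 notification configuration with one entry per file suffix."""
    configurations = []
    for suffix in suffixes:
        rules = []
        if prefix:
            rules.append({"Name": "prefix", "Value": prefix})
        rules.append({"Name": "suffix", "Value": suffix})
        configurations.append(
            {
                "LambdaFunctionArn": lambda_alias_arn,
                # Equivalent to "All object create events" in the console.
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": rules}},
            }
        )
    return {"LambdaFunctionConfigurations": configurations}


if __name__ == "__main__":
    config = build_notification_config(
        "arn:aws:lambda:us-west-2:123456789012:function:s3-protector:Production",
        [".csv", ".parquet"],
    )
    print(config)
```

If this structure is applied with the S3 API (for example, boto3's `put_bucket_notification_configuration`) instead of the console, note two differences: the call replaces the bucket's existing notification configuration, and it does not update the Lambda resource-based policy, so permission for Amazon S3 to invoke the function has to be granted separately.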
Example Usage
This section describes typical usage of S3 Protector.
Prepare data for testing:
Sample datasets and mapping.json files are provided in appendix sections:
- CSV with no header
- CSV with pipe delimiter
Create a new folder in the input S3 bucket:
A new folder must be created in the S3 input bucket for each distinct file schema. Each folder can have a mapping.json file corresponding to the dataset type expected. It is recommended that input folders use S3 encryption:
- From the AWS S3 console, search and select the S3 input bucket created earlier for input files
- Click the Create folder button
- Provide a descriptive name for the type of dataset, e.g. sales orders
- In Server-side encryption, select Enable
- Use the default key type, Amazon S3 key (SSE-S3)
- Click Create folder
Upload the mapping.json and dataset to the folder:
The appropriate mapping.json file must be uploaded to the folder prior to uploading the dataset.
- Choose one of the sample dataset and mapping.json pairs from the appendix. Replace the data elements in mapping.json with similar data elements from your security policy
- From the AWS console, navigate to Amazon S3, search and select the S3 input bucket created earlier for incoming files
- Navigate to the desired folder
- Click the Upload button
- Click Add files
- Upload the mapping.json file
- Click the Upload button
- Now repeat the above step for the sample dataset
Verify output:
Verify the output file was created:
- From the AWS console, navigate to Amazon S3, search and select the S3 output or target bucket created earlier for writing processed files
- Navigate to the corresponding folder
- There should be a non-zero byte file with protected values
- Select the file
- From the menu select Actions | Query with S3 Select
- Click Run SQL query
- Click the Formatted tab of the result set
- Verify the data is protected
Troubleshooting / Logs:
Logs are written to CloudWatch. This could provide helpful information if the results are not as expected:
- From the AWS console, navigate to the Lambda service | Functions
- Select and open the Lambda we created for protecting S3 files
- At the top of function’s workspace, click the Monitoring tab
- Click the button View logs in CloudWatch
- Click the latest log stream
- Scroll to the bottom of the stream for the latest log entries
Troubleshooting
By default, S3 Protector is set to log minimal information. While troubleshooting error conditions, it is beneficial to increase the S3 Protector log level to either ‘config’ or ‘all’. Use the CloudFormation installation steps to change the ‘MinLogLevel’ parameter.
S3 Protector Error States
| Error State | Description | Action |
|---|---|---|
| 400 Error | A configuration error has occurred. The standard log should provide a descriptive error message. File processing has not started. Nothing was written to the target bucket. | Review the log for a descriptive error message. Most likely some configuration parameters will need to be updated before S3 Protector can be restarted for the failed file. Use the CloudFormation installation steps to update the function configuration. |
| 500 PermissionError | S3 Protector does not have enough permissions to access AWS resources. | Review the S3 Protector IAM policy. |
| 500 Exception | An error has occurred. The log may provide additional details. File processing may have started and a partial file may have been written to the target S3 bucket. S3 Protector does not write unprotected data to partially processed files, and it automatically removes these files on error. | Review the error log for additional information. |
| Status: timeout | S3 Protector ran out of time while processing large files. | Review the S3 Protector Timeout section. |
| AWS Lambda crash | Any AWS Lambda function may crash due to intermittent failures. If this occurs, a partial file may have been written to the target S3 bucket. Due to the crash, S3 will treat this file as an incomplete multi-part upload. Incomplete uploads do not appear as standard S3 files and are not shown in the AWS S3 console. You are still charged for incomplete uploads. | 1. Discover and abort incomplete multi-part uploads for the target bucket (e.g. using AWS CLI). 2. Restart S3 Protector for the failed file. |
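The first action for the AWS Lambda crash row, discovering and aborting incomplete multi-part uploads, can be sketched in Python as follows. This is an illustrative helper, not a Protegrity tool: the function name `abort_incomplete_uploads` is hypothetical, and `s3_client` is assumed to be a boto3 S3 client (for example, `boto3.client("s3")`), whose `list_multipart_uploads` and `abort_multipart_upload` operations are used. The sketch defaults to a dry run because aborting affects every incomplete upload in the bucket, including uploads from S3 Protector instances that are still running.

```python
def abort_incomplete_uploads(s3_client, bucket, dry_run=True):
    """List incomplete multi-part uploads in a bucket and optionally abort them.

    Returns a list of (key, upload_id) tuples for every upload found.
    """
    found = []
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_multipart_uploads(**kwargs)
        for upload in page.get("Uploads", []):
            found.append((upload["Key"], upload["UploadId"]))
            if not dry_run:
                s3_client.abort_multipart_upload(
                    Bucket=bucket,
                    Key=upload["Key"],
                    UploadId=upload["UploadId"],
                )
        # Follow the pagination markers until the listing is complete.
        if not page.get("IsTruncated"):
            break
        kwargs["KeyMarker"] = page["NextKeyMarker"]
        kwargs["UploadIdMarker"] = page["NextUploadIdMarker"]
    return found
```

Run it once with `dry_run=True` to review what would be removed, then again with `dry_run=False` when no S3 Protector instances are processing files.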
Restarting S3 Protector
If S3 Protector fails, it can be restarted for an existing source file from the AWS Lambda console, without uploading the file again. With the input data file and mapping file uploaded, follow the steps below to trigger the S3 Protector Function.
Steps
Sign in to the AWS Management Console and go to Lambda console.
Select Lambda Function recorded in S3 Protector Lambda Name in the CloudFormation installation section.
On the S3 Protector Function page, choose Test tab.
Copy the JSON test event into the Event JSON pane and replace the bucket name placeholder with your input bucket name:
{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "<PLACEHOLDER_S3_IN_BUCKET_NAME>"
        },
        "object": {
          "key": "data.csv"
        }
      }
    }
  ]
}
Select Test to execute the test event.
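When several files need to be restarted, the test event above can be generated per file instead of edited by hand. The Python sketch below builds the same event structure for a given bucket and object key. The helper name `build_test_event` and the sample bucket and key are hypothetical, and the key is passed through unchanged on the assumption that S3 Protector accepts it exactly as it appears in the bucket.

```python
import json


def build_test_event(bucket, key):
    """Build a test event with the same shape as the one shown above."""
    return {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {"key": key},
                }
            }
        ]
    }


if __name__ == "__main__":
    event = build_test_event("in.us-west-2.example.com", "sales orders/data.csv")
    print(json.dumps(event, indent=2))
```

The printed JSON can be pasted into the Event JSON pane, or used as the payload when invoking the function programmatically.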