1 - Sample Athena External Functions

Method: Tokenization

Type: ALPHA

Athena Data Types | Athena Max Size    | Protegrity Max Size
------------------|--------------------|--------------------
VARCHAR           | 65K (65,535 bytes) | 4K (4,096 bytes)

External Function Sample Definitions:

USING EXTERNAL FUNCTION protect(val varchar, el varchar) RETURNS varchar
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT protect('hello world!', 'deAlpha');

USING EXTERNAL FUNCTION unprotect(val varchar, el varchar) RETURNS varchar
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT unprotect('UtfVk UHgcD!', 'deAlpha');
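Athena accepts VARCHAR values up to 65,535 bytes, but the Protegrity limit is 4,096 bytes. You may want to check values before calling protect. The following Python sketch is a hypothetical pre-check and is not part of Cloud Protect. It assumes that the 4,096-byte limit is measured on the UTF-8 encoding of the value, so multi-byte characters count as more than one byte.

```python
# Hypothetical pre-check, not part of Cloud Protect.
# Assumption: the Protegrity limit applies to the UTF-8 encoded value.
PROTEGRITY_MAX_VARCHAR_BYTES = 4096


def fits_protegrity_varchar(value: str) -> bool:
    """Return True if the UTF-8 encoding of value is within 4,096 bytes."""
    return len(value.encode("utf-8")) <= PROTEGRITY_MAX_VARCHAR_BYTES


print(fits_protegrity_varchar("hello world!"))    # True
print(fits_protegrity_varchar("a" * 4096))        # True: exactly 4,096 bytes
print(fits_protegrity_varchar("\u00e9" * 2049))   # False: 2 bytes each, 4,098 bytes
```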

 

Method: Tokenization

Type: NUMERIC

Athena Data Types                | Athena Max Size  | Protegrity Max Size
---------------------------------|------------------|--------------------
DECIMAL, INTEGER, BIGINT, DOUBLE | 4K (4,096 bytes) | 4K (4,096 bytes)

External Function Sample Definitions:

USING EXTERNAL FUNCTION protect(val INTEGER, el varchar) RETURNS INTEGER
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT protect(1234, 'deInt4') as protected_int;

USING EXTERNAL FUNCTION unprotect(val INTEGER, el varchar) RETURNS INTEGER
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT unprotect(1626684183, 'deInt4') as clear_int;

USING EXTERNAL FUNCTION protect(val BIGINT, el varchar) RETURNS BIGINT
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT protect(2147483648, 'deInt8') as protected_int8;

USING EXTERNAL FUNCTION unprotect(val BIGINT, el varchar) RETURNS BIGINT
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT unprotect(-5629842771706374842, 'deInt8') as clear_int8;

USING EXTERNAL FUNCTION protect(val decimal(3,2), el varchar) RETURNS decimal(3,2)
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT protect(decimal '1.23', 'deDecimal') as protected_decimal;

USING EXTERNAL FUNCTION unprotect(val decimal(3,2), el varchar) RETURNS decimal(3,2)
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT unprotect(decimal '6.16', 'deDecimal') as clear_decimal;
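The INTEGER and BIGINT samples use different function signatures because Athena's INTEGER is a signed 32-bit type and BIGINT is a signed 64-bit type. The value 2147483648 is one more than the INTEGER maximum, so it needs the BIGINT signature (and, in these samples, the deInt8 data element). The following Python sketch of that range check is for illustration only.

```python
# Signed ranges of Athena's INTEGER (32-bit) and BIGINT (64-bit) types.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def athena_integer_type(value: int) -> str:
    """Return the narrowest Athena integer type that can hold value."""
    if INT32_MIN <= value <= INT32_MAX:
        return "INTEGER"
    if INT64_MIN <= value <= INT64_MAX:
        return "BIGINT"
    raise ValueError(f"{value} is outside the BIGINT range")


# Values taken from the sample queries above.
for v in (1234, 1626684183, 2147483648, -5629842771706374842):
    print(v, athena_integer_type(v))
```

This prints INTEGER for the first two values and BIGINT for the last two.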

Method: Tokenization

Type: Date YYYY-MM-DD

Athena Data Types           | Athena Max Size | Protegrity Max Size
----------------------------|-----------------|--------------------
DATE (any supported format) | 10 bytes        | 10 bytes

External Function Sample Definitions:

USING EXTERNAL FUNCTION protect(val DATE, el varchar) RETURNS DATE 
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT protect(DATE('2020-11-09'), 'deDate') as protected_date;

USING EXTERNAL FUNCTION unprotect(val DATE, el varchar) RETURNS DATE 
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT unprotect(DATE('1283-09-22'), 'deDate') as clear_date;

 

 

Method: Tokenization

Type: DATETIME YYYY-MM-DD HH:mm:ss

Athena Data Types | Athena Max Size | Protegrity Max Size
------------------|-----------------|--------------------
TIMESTAMP         | 29 bytes        | 29 bytes

External Function Sample Definitions:

USING EXTERNAL FUNCTION protect(val TIMESTAMP, el varchar) RETURNS TIMESTAMP 
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT protect(timestamp '2020-11-09 02:02:03.123', 'deDOB') as protected_time; 

USING EXTERNAL FUNCTION unprotect(val TIMESTAMP, el varchar) RETURNS TIMESTAMP 
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT unprotect(timestamp '1283-09-22 02:02:03.123', 'deDOB') as clear_time;
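The following Python sketch validates literals like those in the DATE and TIMESTAMP samples. It checks the YYYY-MM-DD and YYYY-MM-DD HH:mm:ss formats and the 10-byte and 29-byte sizes in the tables above. It assumes those sizes apply to the text form of the value. The helper names are hypothetical, and Python's parser accepts at most six fractional-second digits.

```python
from datetime import datetime

# Sizes from the tables above; assumed to apply to the text form of the value.
DATE_MAX_BYTES = 10
TIMESTAMP_MAX_BYTES = 29


def check_date_literal(text: str) -> datetime:
    """Validate a YYYY-MM-DD literal; raise ValueError if it is invalid."""
    if len(text.encode("utf-8")) > DATE_MAX_BYTES:
        raise ValueError(f"date literal exceeds {DATE_MAX_BYTES} bytes: {text!r}")
    return datetime.strptime(text, "%Y-%m-%d")


def check_timestamp_literal(text: str) -> datetime:
    """Validate a YYYY-MM-DD HH:mm:ss[.fff] literal; raise ValueError if invalid."""
    if len(text.encode("utf-8")) > TIMESTAMP_MAX_BYTES:
        raise ValueError(f"timestamp literal exceeds {TIMESTAMP_MAX_BYTES} bytes: {text!r}")
    # %f accepts 1 to 6 fractional-second digits.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt)


print(check_date_literal("2020-11-09"))
print(check_timestamp_literal("2020-11-09 02:02:03.123"))
```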

 

Method: Encryption

Type: AES

Athena Data Types | Athena Max Size | Protegrity Max Size
------------------|-----------------|--------------------
VARBINARY         |                 |

External Function Sample Definitions:

USING EXTERNAL FUNCTION protect(val varbinary, el varchar) RETURNS varbinary 
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT protect(CAST('protegrity' as varbinary), 'aes256'); 

USING EXTERNAL FUNCTION unprotect(val varbinary, el varchar) RETURNS varbinary 
     LAMBDA '<replace_with_athena_protect_function_name>:Production'
SELECT unprotect(from_hex('a79fa1c1a21f84e0d00acd8b20cf020c'), 'aes256');

 

2 - Installing the Policy Agent and Protector in Different AWS Accounts

Example steps to install the Policy Agent in a different AWS account than the Protector

    The Policy Agent Lambda function and Protect Lambda functions can be installed in separate AWS accounts. However, additional configuration is required to authorize the Policy Agent to provision the security policy to a remote Protect Lambda function.

    Create Agent Lambda IAM policy

    1. Log in to the AWS account that hosts the Protect Lambda function.

    2. From the AWS IAM console, select Policies > Create Policy.

    3. Select the JSON tab and paste the following snippet.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "LambdaUpdateFunction",
            "Effect": "Allow",
            "Action": [
              "lambda:UpdateFunctionConfiguration"
            ],
            "Resource": [
              "arn:aws:lambda:*:*:function:*"
            ]
          },
          {
            "Sid": "LambdaReadLayerVersion",
            "Effect": "Allow",
            "Action": [
              "lambda:GetLayerVersion",
              "lambda:ListLayerVersions"
            ],
            "Resource": "*"
          },
          {
            "Sid": "LambdaDeleteLayerVersion",
            "Effect": "Allow",
            "Action": "lambda:DeleteLayerVersion",
            "Resource": "arn:aws:lambda:*:*:layer:*:*"
          },
          {
            "Sid": "LambdaPublishLayerVersion",
            "Effect": "Allow",
            "Action": "lambda:PublishLayerVersion",
            "Resource": "arn:aws:lambda:*:*:layer:*"
          },
          {
            "Sid": "S3GetObject",
            "Effect": "Allow",
            "Action": [
              "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::*/*"
          },
          {
            "Sid": "S3PutObject",
            "Effect": "Allow",
            "Action": [
              "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::*/*"
          },
          {
            "Sid": "LambdaGetConfiguration",
            "Effect": "Allow",
            "Action": [
                "lambda:GetFunctionConfiguration"
            ],
            "Resource": [
                "arn:aws:lambda:*:*:function:*"
            ]
          }
        ]
      }
      
    4. Replace the wildcards (*) with the region, account, and resource name information where required.

    5. Select Review policy, type in the policy name, and confirm. Record the policy name:

      Agent Lambda Cross Account Policy Name: ___________________
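Step 4 asks you to replace the wildcards with concrete values. The following Python sketch shows that step for the two function-level statements. It narrows `arn:aws:lambda:*:*:function:*` to a single Protect Lambda ARN. The region, account ID, and function name are placeholders. The layer and S3 resources would need the same kind of substitution.

```python
import copy
import json

WILDCARD_FUNCTION_ARN = "arn:aws:lambda:*:*:function:*"


def scope_function_resources(policy: dict, region: str, account_id: str, function_name: str) -> dict:
    """Return a copy of policy with the wildcard function ARN replaced by one function ARN."""
    scoped_arn = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"
    scoped = copy.deepcopy(policy)  # leave the input policy unchanged
    for statement in scoped["Statement"]:
        resource = statement["Resource"]
        if isinstance(resource, list):
            statement["Resource"] = [scoped_arn if r == WILDCARD_FUNCTION_ARN else r for r in resource]
        elif resource == WILDCARD_FUNCTION_ARN:
            statement["Resource"] = scoped_arn
    return scoped


# Two statements from the policy above; the other statements are handled similarly.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "LambdaUpdateFunction", "Effect": "Allow",
         "Action": ["lambda:UpdateFunctionConfiguration"], "Resource": [WILDCARD_FUNCTION_ARN]},
        {"Sid": "LambdaGetConfiguration", "Effect": "Allow",
         "Action": ["lambda:GetFunctionConfiguration"], "Resource": [WILDCARD_FUNCTION_ARN]},
    ],
}
# Placeholder values: substitute your own region, account ID, and function name.
print(json.dumps(scope_function_resources(policy, "us-east-1", "111122223333", "my-protect-function"), indent=2))
```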

    Create Policy Agent cross-account IAM Role

    1. Log in to the AWS account that hosts the Protect Lambda function.

    2. From the AWS IAM console, select Roles > Create Role.

    3. Select AWS Service > Lambda. Proceed to Permissions.

    4. Select the policy created in the previous step. Proceed to Tags.

    5. Specify tags and proceed to the final screen. Type in the role name and confirm. Record the name.

      Policy Agent Cross Account IAM Role Name: ___________________

    Allow the Policy Agent Cross-Account Role to be Assumed by the Policy Agent IAM Role

    1. Log in to the AWS account that hosts the Protect Lambda function.

    2. Navigate to the previously created IAM role (Policy Agent Cross Account IAM Role Name).

    3. Navigate to Trust Relationships > Edit Trust Relationships.

    4. Modify the policy document, replacing the placeholder <Agent Lambda IAM Execution Role ARN> in the following snippet with the ARN of the Agent Lambda IAM execution role that was created in Agent Installation.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": "<Agent Lambda IAM Execution Role ARN>"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }
      
    5. Click Update Trust Policy.

    Add Assume Role to the Policy Agent Execution IAM Role

    1. Log in to the AWS account that hosts the Policy Agent.

    2. Navigate to the Agent Lambda IAM Execution Role that was created in Agent Installation.

    3. Add an inline policy.

    4. Modify the policy document, replacing the placeholder <Agent Lambda Cross-Account IAM ARN> in the following snippet with the ARN of the role recorded in Create Policy Agent cross-account IAM Role.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "sts:AssumeRole"
            ],
            "Resource": "<Agent Lambda Cross-Account IAM ARN>"
          }
        ]
      }
      
    5. When you are finished, choose Review Policy.

    6. On the Review policy page, type a name, then choose Create Policy.
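The trust policy and the inline policy above work as a pair:

- The trust policy on the cross-account role (in the Protect Lambda account) names the Agent's execution role as principal.
- The inline policy on the Agent's execution role (in the Policy Agent account) allows sts:AssumeRole on the cross-account role.

The following Python sketch builds both documents from role ARNs. The account IDs and role names are placeholders. Both placeholders must be full role ARNs, not role names.

```python
import json


def role_arn(account_id: str, role_name: str) -> str:
    """Build an IAM role ARN."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def trust_policy(agent_execution_role_arn: str) -> dict:
    """Trust policy for the cross-account role in the Protect Lambda account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": agent_execution_role_arn},
            "Action": "sts:AssumeRole",
        }],
    }


def assume_role_policy(cross_account_role_arn: str) -> dict:
    """Inline policy for the Agent Lambda execution role in the Policy Agent account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["sts:AssumeRole"],
            "Resource": cross_account_role_arn,
        }],
    }


# Placeholder account IDs and role names.
agent_role = role_arn("111122223333", "policy-agent-execution-role")
cross_role = role_arn("444455556666", "policy-agent-cross-account-role")
print(json.dumps(trust_policy(agent_role), indent=2))
print(json.dumps(assume_role_policy(cross_role), indent=2))
```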

    Update the Policy Agent Lambda Configuration

    1. From the AWS console, navigate to Lambda, and select the Policy Agent Lambda function.

    2. Select the Configuration tab > Environment variables.

    3. Select Edit and add the following environment variable, using the Agent Lambda Cross-Account IAM ARN as its value:

      Parameter       | Value
      ----------------|-------------------------------------
      AWS_ASSUME_ROLE | <Agent Lambda Cross-Account IAM ARN>

    4. Ensure that the values of the AWS_POLICY_S3_BUCKET, AWS_PROTECT_FN_NAME, and AWS_POLICY_LAYER_NAME parameters all refer to resources in the Protect Lambda function's AWS account.

    5. If a custom VPC endpoint hostname configuration is used, you also need to set the VPC endpoint URL. Refer to Policy Agent - Custom VPC Endpoint Hostname Configuration.

      Parameter            | Value
      ---------------------|-------------------
      AWS_VPC_ENDPOINT_URL | <AWS_VPC_ENDPOINT>

    6. Click Save and run the Lambda. The Lambda will now assume the role in the Protect Lambda function's AWS account and update the policy across accounts.
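If you script this step instead of using the console, note that the Lambda UpdateFunctionConfiguration API replaces the function's entire set of environment variables. The following Python sketch merges AWS_ASSUME_ROLE (and, optionally, AWS_VPC_ENDPOINT_URL) into the existing variables. The variable values are placeholders, and the boto3 calls appear only as comments.

```python
import json
from typing import Optional


def merged_environment(current: dict, assume_role_arn: str, vpc_endpoint_url: Optional[str] = None) -> dict:
    """Return an Environment payload that keeps existing variables and adds the new ones."""
    variables = dict(current)  # copy, so the existing mapping is not modified
    variables["AWS_ASSUME_ROLE"] = assume_role_arn
    if vpc_endpoint_url:  # only needed with custom VPC endpoint hostnames
        variables["AWS_VPC_ENDPOINT_URL"] = vpc_endpoint_url
    return {"Variables": variables}


# With boto3 (not run here), reading and updating the function would look like:
#   current = client.get_function_configuration(FunctionName=name)["Environment"]["Variables"]
#   client.update_function_configuration(FunctionName=name, Environment=payload)
current = {
    "AWS_POLICY_S3_BUCKET": "example-policy-bucket",   # placeholder values
    "AWS_PROTECT_FN_NAME": "example-protect-function",
    "AWS_POLICY_LAYER_NAME": "example-policy-layer",
}
payload = merged_environment(current, "arn:aws:iam::444455556666:role/policy-agent-cross-account-role")
print(json.dumps(payload, indent=2))
```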

3 - Integrating Cloud Protect with PPC (Protegrity Provisioned Cluster)

    Concepts for integrating with PPC (Protegrity Provisioned Cluster)

    This guide describes how to configure the Protegrity Policy Agent and Log Forwarder to connect to a Protegrity Provisioned Cluster (PPC), highlighting the differences from connecting to ESA.

    Key Differences: PPC vs ESA

    Feature                                              | ESA 10.2             | PPC (this guide)
    -----------------------------------------------------|----------------------|---------------------
    Datastore Key Fingerprint                            | Optional/Recommended | Required
    CA Certificate on Agent                              | Optional/Recommended | Optional/Recommended
    CA Certificate on Log Forwarder                      | Optional/Recommended | Not supported
    Client Certificate Authentication from Log Forwarder | Optional/Recommended | Not supported
    IP Address                                           | ESA IP address       | PPC address

    Prerequisites

    • Access to PPC and required credentials.
    • Tools: curl, kubectl installed.

    Policy Agent Setup with PPC

    Use the following steps as a guide to the inputs that are specific to a Policy Agent integrating with PPC:

    1. Obtain the Datastore Key Fingerprint

      To register the public key of the KMS key pair used by the Policy Agent as the datastore export key, and to retrieve its fingerprint:

      curl -k -X POST "https://${HOST}/pty/v2/pim/datastores/1/export/keys" \
        -H "Authorization: Bearer ${TOKEN}" \
        -H "Content-Type: application/json" \
        --data '{
          "algorithm": "RSA-OAEP-256",
          "description": "example-key-from-kms",
          "pem": "-----BEGIN PUBLIC KEY-----\nABC123... ...890XYZ\n-----END PUBLIC KEY-----"
        }'
      

      Sample Output:

      {"uid":"1","algorithm":"RSA-OAEP-256","fingerprint":"4c:46:d8:05:35:2e:eb:39:4d:39:8e:6f:28:c3:ab:d3:bc:9e:7a:cb:95:cb:b1:8e:b5:90:21:0f:d3:2c:0b:27","description":"example-key-from-kms"}
      

      Record the fingerprint value and configure it as the PTY_DATASTORE_KEY for the Policy Agent.

    2. Retrieve the PPC CA Certificate

      To obtain the CA certificate from PPC:

      kubectl -n api-gateway get secret ingress-certificate-secret -o jsonpath='{.data.ca\.crt}' | base64 -d > CA.pem
      

      Use the resulting CA.pem file as described in Policy Agent Installation.

    3. Configure the PPC Address

      Use the PPC address in place of the ESA IP address wherever required in your configuration.
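Before you set PTY_DATASTORE_KEY, you can extract and sanity-check the fingerprint from the export-key response in step 1. The following Python sketch parses the sample output shown there. The check for 32 colon-separated hex octets is an assumption based on that sample, not a documented format.

```python
import json
import re

# Assumption: fingerprints look like the sample, 32 colon-separated hex octets.
FINGERPRINT_PATTERN = re.compile(r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){31}")


def fingerprint_from_response(body: str) -> str:
    """Return the fingerprint field of an export-key response, validating its shape."""
    fingerprint = json.loads(body)["fingerprint"]
    if not FINGERPRINT_PATTERN.fullmatch(fingerprint):
        raise ValueError(f"unexpected fingerprint format: {fingerprint!r}")
    return fingerprint


sample = ('{"uid":"1","algorithm":"RSA-OAEP-256",'
          '"fingerprint":"4c:46:d8:05:35:2e:eb:39:4d:39:8e:6f:28:c3:ab:d3:'
          'bc:9e:7a:cb:95:cb:b1:8e:b5:90:21:0f:d3:2c:0b:27",'
          '"description":"example-key-from-kms"}')
print("PTY_DATASTORE_KEY =", fingerprint_from_response(sample))
```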

    Log Forwarder Setup with PPC

    • The Log Forwarder will proceed without certificates and will print a warning if PtyEsaCaServerCert and PtyEsaClientCertificatesSecretId are not provided.
    • No additional certificate or CA configuration is needed for PPC.

    Troubleshooting

    Protector Lambda fails with “AWS KMS Decrypt failed”

    Symptom:

    After a successful Policy Agent run and layer update, the Protector Lambda returns:

    {
      "body": "{\"error_msg\":\"Failed to open decoder: rpdecode decrypt failure: dek callback failed: AWS KMS Decrypt failed: \",\"success\":false}",
      "isBase64Encoded": false,
      "statusCode": 400
    }
    

    The Protector Lambda logs show:

    [SEVERE] [utils.cpp:185] AWS KMS Decrypt failed:
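The Protector Lambda's response body is itself a JSON string, so detecting this symptom in a test script takes two decoding steps. The following minimal Python sketch assumes the response shape shown above.

```python
import json


def is_kms_decrypt_failure(response: dict) -> bool:
    """Return True if a Protector Lambda response reports an AWS KMS decrypt failure."""
    body = json.loads(response.get("body") or "{}")  # body is a JSON-encoded string
    return body.get("success") is False and "AWS KMS Decrypt failed" in body.get("error_msg", "")


failed = {
    "body": '{"error_msg":"Failed to open decoder: rpdecode decrypt failure: '
            'dek callback failed: AWS KMS Decrypt failed: ","success":false}',
    "isBase64Encoded": False,
    "statusCode": 400,
}
print(is_kms_decrypt_failure(failed))  # True
```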
    

    Cause:

    The public key configured in the PPC/ESA datastore does not match the KMS key pair used by the Policy Agent. The policy package is encrypted with the public key stored in the datastore. If that key does not correspond to the KMS key pair whose private key is used for decryption, the Protector Lambda will fail to decrypt the policy.

    Resolution:

    1. Identify the KMS key pair used by the Policy Agent (the key ARN configured during pre-configuration).
    2. Export the public key from that KMS key pair.
    3. In PPC/ESA, ensure the datastore’s export key is configured with the public key from that same KMS key pair. See Obtain the Datastore Key Fingerprint above.
    4. Re-run the Policy Agent to generate a new policy package encrypted with the correct key.
    5. Test the Protector Lambda again.
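For step 2, AWS KMS GetPublicKey returns the public key as DER-encoded bytes (the PublicKey field of the boto3 kms.get_public_key response). The export-key request in Obtain the Datastore Key Fingerprint, however, expects PEM text inside a JSON string. The following Python sketch performs that conversion on stand-in bytes rather than a real key. The boto3 call appears only as a comment.

```python
import base64
import json


def der_to_pem_public_key(der: bytes) -> str:
    """Wrap DER public key bytes as PEM text with 64-character base64 lines."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "-----BEGIN PUBLIC KEY-----\n" + "\n".join(lines) + "\n-----END PUBLIC KEY-----"


def export_key_request(der: bytes, description: str) -> str:
    """Build the JSON body for the datastore export-key request."""
    # json.dumps escapes the PEM newlines as \n, as in the curl example.
    return json.dumps({
        "algorithm": "RSA-OAEP-256",
        "description": description,
        "pem": der_to_pem_public_key(der),
    })


# With boto3 (not run here): der = kms.get_public_key(KeyId=key_arn)["PublicKey"]
der = bytes(range(100))  # stand-in bytes, not a real key
print(export_key_request(der, "example-key-from-kms"))
```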

    Additional Notes

4 - Security Recommendations

        The following section provides recommendations on configuring Amazon Athena for querying PII data protected by the Protegrity Athena Protector.

        Security is a shared responsibility between AWS and you. When using PII data in Amazon Athena, it is essential to understand the best practices and keep your data protected at all times. This section summarizes the configuration needed when using Amazon Athena.

        To read more about the shared responsibility model for Amazon Athena, visit Amazon Athena Security - Amazon Athena.

        Logging and Monitoring

        Enable AWS CloudTrail to audit all calls made to the Athena API.

        For more information, visit Logging Amazon Athena API Calls with AWS CloudTrail - Amazon Athena.

        Encryption at rest

        AWS S3 buckets

        Amazon Athena lets you run queries on encrypted data stored in Amazon S3 buckets in the same Region. Make sure you enable the Amazon S3 encryption options supported by Amazon Athena.

        For more information, visit Creating Tables Based on Encrypted Datasets in Amazon S3 - Amazon Athena.

        Query Results and Query History

        Amazon Athena saves query results in an S3 bucket. If you unprotect data using the Protegrity Athena Protector, the results (that is, the unprotected data) are saved in that bucket, where anyone with IAM permissions on the bucket can view them. To remediate this, we suggest the following configurations.

        Setting Amazon Athena Workgroup

        You should configure the Amazon Athena workgroup's S3 query result location (staging directory) and enable Override client-side settings. This ensures that all users comply with the workgroup's S3 staging directory and result encryption settings. Restrict IAM access to the bucket to the minimum required for Amazon Athena to work.
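In the Athena API, these workgroup settings map to the WorkGroupConfiguration structure. Its EnforceWorkGroupConfiguration flag corresponds to Override client-side settings in the console. The following Python sketch builds such a configuration. The S3 location and KMS key ARN are placeholders, and the boto3 call appears only as a comment.

```python
import json
from typing import Optional


def workgroup_configuration(output_location: str, kms_key_arn: Optional[str] = None) -> dict:
    """Athena workgroup configuration that enforces the result location and encryption."""
    if kms_key_arn:
        encryption = {"EncryptionOption": "SSE_KMS", "KmsKey": kms_key_arn}
    else:
        encryption = {"EncryptionOption": "SSE_S3"}
    return {
        "ResultConfiguration": {
            "OutputLocation": output_location,
            "EncryptionConfiguration": encryption,
        },
        # Same effect as "Override client-side settings" in the console.
        "EnforceWorkGroupConfiguration": True,
    }


# With boto3 (not run here): athena.create_work_group(Name=name, Configuration=config)
config = workgroup_configuration("s3://example-athena-results/protected-workgroup/")
print(json.dumps(config, indent=2))
```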

        Amazon S3 lifecycle policy

        Amazon Athena keeps query history for 45 days, and query result files remain in the S3 bucket until they are deleted. We suggest reducing their retention to the minimum (1 day) using an Amazon S3 lifecycle policy.

        For more information, visit Working with Query Results, Output Files, and Query History - Amazon Athena.
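You can express a one-day expiration for Athena result files as an S3 lifecycle rule. The following Python sketch builds a configuration in the shape accepted by PutBucketLifecycleConfiguration. The rule ID and prefix are placeholders; the prefix should match your workgroup's result location.

```python
import json


def results_expiration_rule(prefix: str, days: int = 1) -> dict:
    """S3 lifecycle configuration that expires objects under prefix after the given days."""
    if days < 1:
        raise ValueError("S3 expiration must be at least 1 day")
    return {
        "Rules": [{
            "ID": "expire-athena-query-results",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Expiration": {"Days": days},
        }]
    }


# With boto3 (not run here):
#   s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=rules)
rules = results_expiration_rule("protected-workgroup/")
print(json.dumps(rules, indent=2))
```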

        Encrypt Glue Catalog

        Amazon Athena integrates with the AWS Glue Data Catalog. If you use it, you can enable encryption in the AWS Glue Data Catalog. This does not encrypt the data itself, only the metadata, such as the Athena table definitions. It provides another layer of security around where your data is located and what it contains.

        For more information, visit Encrypting Your Data Catalog and Access from Athena to Encrypted Metadata in the AWS Glue Data Catalog - Amazon Athena.

        Encryption in transit

        To allow only encrypted connections over HTTPS (TLS), you can apply the aws:SecureTransport condition in your S3 bucket policies.
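A common way to apply this condition is a Deny statement that matches requests where aws:SecureTransport is false. The following Python sketch builds such a bucket policy. The bucket name is a placeholder.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> dict:
    """Bucket policy that denies any S3 request to the bucket that is not made over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            # aws:SecureTransport is "false" for plain HTTP requests.
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


print(json.dumps(deny_insecure_transport_policy("example-athena-results"), indent=2))
```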

        Access Control

        Resources

        Grant least-privilege access to the Amazon Athena workgroup, the S3 buckets, the Protegrity Protect Lambda function, and AWS KMS (if used for encryption at rest).

        For more information, visit Identity and Access Management in Athena - Amazon Athena.

        Granting access to use the Cloud Protect UDF

        The ability to use the Cloud Protect UDF from Athena is controlled through IAM permissions. The Athena user or role must have the lambda:InvokeFunction permission on the Cloud Protect Lambda function, as shown in the following example:

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "ProtectLambdaFunction",
                    "Effect": "Allow",
                    "Action": "lambda:InvokeFunction",
                    "Resource": "<PROTECT_FUNCTION_ARN>"
                }
            ]
        }
        

        The policy above would be used in addition to any other IAM policies required to use Amazon Athena. Refer to the AWS Athena example policy for a typical IAM policy.

        Separate Workgroups

        Create separate workgroups based on privacy controls. This gives you more control over who can see the query history and access the unprotected data stored there.

        For more information, visit Using Workgroups to Control Query Access and Costs - Amazon Athena.

        AWS Lake Formation

        Amazon Athena can benefit from AWS Lake Formation table-level and column-level access policies. They add another layer of security in front of the Protegrity Protect function and reduce unauthorized requests.

        For more information, visit Using Athena to Query Data Registered With AWS Lake Formation - Amazon Athena.

5 - Policy Agent - Custom VPC Endpoint Hostname Configuration

        Custom VPC endpoint hostname configuration

        The Policy Agent uses default endpoint hostnames to communicate with other AWS services (for example, secretsmanager.amazonaws.com). This configuration will only work in VPCs where Amazon-provided DNS is available (default VPC configuration with private DNS option enabled for the endpoint). If your VPC uses custom DNS, follow the instructions below to configure the Policy Agent Lambda to use custom endpoint hostnames.

        Identify DNS Hostnames

        To identify DNS hostnames:

        1. From AWS console, select VPC > Endpoints.

        2. Select Secrets Manager endpoint from the list of endpoints.

        3. Under Details > DNS Names, note the private endpoint DNS name, adding https:// to the beginning of the name.

          For example, https://vpce-1234-4pzomrye.kms.us-west-1.vpce.amazonaws.com

        4. Repeat for the KMS and Lambda endpoints, and record all three DNS names:

          AWS_SECRETSMANAGER_ENDPOINT: https://_________________

          AWS_KMS_ENDPOINT: https://_________________

          AWS_LAMBDA_ENDPOINT: https://_________________

        Update the Policy Agent Lambda configuration

        To update the Policy Agent Lambda configuration:

        1. From the AWS console, navigate to Lambda, and select the Policy Agent Lambda function.

        2. Select the Configuration section and choose Environment variables.

        3. Select Edit and add the following environment variables with the corresponding endpoint URLs recorded in steps 3-4:

          Parameter                       | Value
          --------------------------------|------------------------------
          AWS_SECRETSMANAGER_ENDPOINT_URL | <AWS_SECRETSMANAGER_ENDPOINT>
          AWS_KMS_ENDPOINT_URL            | <AWS_KMS_ENDPOINT>
          AWS_LAMBDA_ENDPOINT_URL         | <AWS_LAMBDA_ENDPOINT>

        4. Click Save and run the Lambda. The Lambda will now use the endpoints you have just configured.
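The mapping from recorded DNS names to environment variables can be scripted. The following Python sketch turns recorded DNS names into the environment-variable values. The service keys ("secretsmanager", "kms", "lambda") are hypothetical, and only the KMS DNS name comes from the example above.

```python
# Variable names from the table above; the service keys are hypothetical.
ENDPOINT_VARIABLES = {
    "secretsmanager": "AWS_SECRETSMANAGER_ENDPOINT_URL",
    "kms": "AWS_KMS_ENDPOINT_URL",
    "lambda": "AWS_LAMBDA_ENDPOINT_URL",
}


def endpoint_url(dns_name: str) -> str:
    """Prefix a VPC endpoint DNS name with https:// if it is missing."""
    return dns_name if dns_name.startswith("https://") else f"https://{dns_name}"


def endpoint_environment(dns_names: dict) -> dict:
    """Map each recorded DNS name to its Policy Agent environment variable."""
    return {ENDPOINT_VARIABLES[service]: endpoint_url(name) for service, name in dns_names.items()}


env = endpoint_environment({
    "kms": "vpce-1234-4pzomrye.kms.us-west-1.vpce.amazonaws.com",  # example from step 3
    # Add the Secrets Manager and Lambda endpoint DNS names recorded in steps 3-4.
})
print(env)
```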

6 - Protection Methods

        Cloud API supported protection methods

        For more information about the protection methods supported by Protegrity, refer to the Protection Methods Reference.

        Tokenization Type | Supported Input Data Types | Notes
        ------------------|----------------------------|------
        Numeric, Credit Card, Alpha, Upper-case Alpha, Alpha-Numeric, Upper Alpha-Numeric, Lower ASCII, Printable, Decimal, Unicode, Unicode Base64, Unicode Gen2, Email | STRING, NULL |
        Integer | NUMBER, NULL |
        Date, Datetime | STRING, NULL | For information about supported formats, refer to the Protection Methods Reference.
        Binary | STRING, NULL | Must be hex-encoded unless a different encoding is specified; base64 is also supported.

        Protection Method | Supported Input Data Types | Notes
        ------------------|----------------------------|------
        No Encryption | STRING, NUMBER, NULL |

        Encryption Algorithm | Supported Input Data Types | Notes
        ---------------------|----------------------------|------
        3DES, AES-128, AES-256, CUSP 3DES, CUSP AES-128, CUSP AES-256 | STRING | Must be hex-encoded unless a different encoding is specified; base64 is also supported.
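Binary tokenization and the encryption algorithms expect hex-encoded input by default, with base64 as the alternative. The following Python sketch shows the same bytes in both encodings and decodes them back. In Athena, the to_hex/from_hex and to_base64/from_base64 functions perform the equivalent conversions on VARBINARY values.

```python
import base64

payload = "protegrity".encode("utf-8")  # UTF-8 bytes of the sample string

hex_form = payload.hex()
b64_form = base64.b64encode(payload).decode("ascii")
print(hex_form)  # 70726f74656772697479
print(b64_form)  # cHJvdGVncml0eQ==

# Both encodings decode back to the original bytes.
assert bytes.fromhex(hex_form) == payload
assert base64.b64decode(b64_form) == payload
```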

7 - Configuring Regular Expression to Extract Policy Username

        Extract the policy username from the AWS identity.

        The Cloud Protect Lambda function exposes the USERNAME_REGEX configuration, which allows the policy username to be extracted from the user in the request.

        • USERNAME_REGEX Lambda Environment configuration

          The USERNAME_REGEX configuration can be used to extract policy username from user in the request. The following are allowed values for USERNAME_REGEX:

          • 1 - The default built-in regular expression is used:

            ^arn:aws:(?:iam|sts)::[0-9]{12}:(?:role|user|group|assumed\-role|federated\-user)\/([\w\/+=,.\-]{1,1024}|[\w\/+=,.\-@]{1,1024})(?:@[a-zA-Z0-9\-]{1,320}(?:\.\w+)+)?$
            
          • ^User regex$ - A custom regular expression with one capturing group. The text captured by this group is used as the username. The examples below show different regular expression values and the resulting policy user.

        Examples (user in the request -> effective policy user):

        USERNAME_REGEX not set:

          arn:aws:iam::123456789012:user/juliet.snow -> arn:aws:iam::123456789012:user/juliet.snow
          arn:aws:sts::123456789012:assumed-role/TestSaml -> arn:aws:sts::123456789012:assumed-role/TestSaml

        USERNAME_REGEX = 1:

          arn:aws:iam::123456789012:user/juliet.snow -> juliet.snow
          arn:aws:sts::123456789012:assumed-role/TestSaml -> TestSaml

        USERNAME_REGEX = ^arn:aws:(?:iam|sts)::[0-9]{12}:((?:role|user|group|assumed-role|federated-user).*)$

          arn:aws:iam::123456789012:user/juliet.snow -> user/juliet.snow
          arn:aws:sts::123456789012:assumed-role/TestSaml -> assumed-role/TestSaml
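The following Python sketch reproduces the examples above. It applies the built-in expression when USERNAME_REGEX is 1, applies a custom expression otherwise, and returns the text captured by the single group. The function name is hypothetical. Falling back to the unmodified user when nothing matches is an assumption, because the Lambda's actual behavior in that case is not described here.

```python
import re
from typing import Optional

# Built-in expression used when USERNAME_REGEX is 1 (copied from above).
DEFAULT_USERNAME_REGEX = (
    r"^arn:aws:(?:iam|sts)::[0-9]{12}:(?:role|user|group|assumed\-role|federated\-user)"
    r"\/([\w\/+=,.\-]{1,1024}|[\w\/+=,.\-@]{1,1024})(?:@[a-zA-Z0-9\-]{1,320}(?:\.\w+)+)?$"
)


def effective_policy_user(username_regex: Optional[str], user: str) -> str:
    """Approximate the effective policy user for a USERNAME_REGEX setting."""
    if username_regex is None:  # not set: the user is used as-is
        return user
    pattern = DEFAULT_USERNAME_REGEX if username_regex == "1" else username_regex
    match = re.match(pattern, user)
    if match is None:
        # Assumption: fall back to the unmodified user; the Lambda's actual
        # behavior when nothing matches is not described here.
        return user
    return match.group(1)  # the single capturing group holds the username


custom = r"^arn:aws:(?:iam|sts)::[0-9]{12}:((?:role|user|group|assumed-role|federated-user).*)$"
for setting in (None, "1", custom):
    for user in ("arn:aws:iam::123456789012:user/juliet.snow",
                 "arn:aws:sts::123456789012:assumed-role/TestSaml"):
        print(setting, user, "->", effective_policy_user(setting, user))
```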

8 - Associating ESA Data Store With Cloud Protect Agent

        Configure ESA data store for Policy Agent.

        ESA controls which policy is deployed to a protector through the concept of a data store. A data store may contain a list of IP addresses identifying the servers that are allowed to pull the policy associated with that data store. A data store may also be defined as the default data store, which allows any server to pull its policy, provided the server does not belong to any other data store. Node registration occurs when the policy server (in this case, the Policy Agent) makes a policy request to ESA, at which point ESA identifies the agent's IP address.

        The IP address that ESA uses to register the Policy Agent Lambda as a node depends on the ESA hubcontroller configuration ASSIGN_DATASTORE_USING_NODE_IP and on the PTY_ADDIPADDRESSHEADER configuration exposed by the agent Lambda.

        The Lambda service uses multiple network interfaces: an internal network interface with an ephemeral IP address in the 169.254.x.x range, and an external network interface with an IP address from the VPC subnet that the Lambda is associated with. By default, when the agent Lambda contacts ESA to register as a node for policy download, ESA uses the agent Lambda's VPC IP address. This default behavior results from the default ESA hubcontroller configuration ASSIGN_DATASTORE_USING_NODE_IP=false and the agent's default configuration PTY_ADDIPADDRESSHEADER=yes.

        In some cases, when there is a proxy server between ESA and the agent Lambda, the desirable configuration is ASSIGN_DATASTORE_USING_NODE_IP=true and PTY_ADDIPADDRESSHEADER=no, which causes ESA to use the proxy server's IP address.

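        The table below shows how the hubcontroller and agent settings affect node IP registration on ESA.

        Agent source IP | Agent VPC subnet IP | Proxy IP      | ASSIGN_DATASTORE_USING_NODE_IP (ESA) | PTY_ADDIPADDRESSHEADER (agent) | Agent node registration IP
        ----------------|---------------------|---------------|--------------------------------------|--------------------------------|---------------------------
        169.254.144.81  | 10.1.2.173          | No proxy      | true                                 | yes                            | 169.254.144.81
        169.254.144.81  | 10.1.2.173          | No proxy      | true                                 | no                             | 10.1.2.173
        169.254.144.81  | 10.1.2.173          | No proxy      | false                                | yes                            | 10.1.2.173
        169.254.144.81  | 10.1.2.173          | No proxy      | false                                | no                             | 10.1.2.173
        169.254.144.81  | 10.1.2.173          | 34.230.42.110 | true                                 | yes                            | 169.254.144.81
        169.254.144.81  | 10.1.2.173          | 34.230.42.110 | true                                 | no                             | 34.230.42.110
        169.254.144.81  | 10.1.2.173          | 34.230.42.110 | false                                | yes                            | 34.230.42.110
        169.254.144.81  | 10.1.2.173          | 34.230.42.110 | false                                | no                             | 34.230.42.110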
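The node registration table reduces to a simple rule. ESA registers the agent-reported source IP only when ASSIGN_DATASTORE_USING_NODE_IP is true and PTY_ADDIPADDRESSHEADER is yes. Otherwise, it registers the address it sees on the connection: the proxy IP if a proxy is present, or the agent's VPC subnet IP if not. The following Python sketch encodes that reading of the table for illustration only. It is not ESA's implementation, and the function name is hypothetical.

```python
from typing import Optional


def node_registration_ip(source_ip: str, vpc_ip: str, proxy_ip: Optional[str],
                         assign_using_node_ip: bool, add_ip_header: bool) -> str:
    """Return the IP that ESA registers for the agent, following the table above."""
    if assign_using_node_ip and add_ip_header:
        return source_ip  # the agent-reported internal 169.254.x.x address
    return proxy_ip or vpc_ip  # the address ESA sees on the connection


print(node_registration_ip("169.254.144.81", "10.1.2.173", None, False, True))             # 10.1.2.173
print(node_registration_ip("169.254.144.81", "10.1.2.173", "34.230.42.110", True, False))  # 34.230.42.110
```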