1 - Protegrity Anonymization

Using Protegrity Anonymization with PPC

Protegrity Anonymization is a software solution that processes data by removing personal information and transforming the remaining details to protect privacy. In simple terms, it takes raw data as input, applies techniques like generalization and summarization, and outputs anonymized data. This output can be used for analysis without revealing individual identities.

For more information about Protegrity Anonymization, refer to Protegrity Anonymization.

1.1 - Prerequisites

List of Prerequisites for Protegrity Anonymization.

Ensure the following prerequisites are met:

  1. Tools:

    • helm and kubectl are installed and configured with access to the Protegrity Provisioned Cluster (PPC).
    • pip is installed in the Python Virtual Environment.
  2. AWS Setup:

    • A Protegrity Provisioned Cluster (PPC) is available.
      For more information about PPC, refer to Protegrity Provisioned Cluster.
    • An AWS account with CLI credentials for configuring AWS is available.
    • An existing VPC with at least two private subnets is available.
    • An S3 bucket for storing anonymization artifacts must exist before installation. The bucket must not be KMS encrypted; it must use default SSE-S3 encryption or no encryption.
    • An IAM role (for example, arn:aws:iam::<Account_ID>:role/<Role_Name>) with the required S3 permissions (s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject) must exist before installation.
    • Sufficient permissions to create namespaces, deployments, secrets, and services.
    • Ensure that the jumpbox can connect to the required repositories. If not already authenticated, log in to the required repository.
      • To connect and deploy from the Protegrity Container Registry (PCR), use the following command with the credentials obtained from the My.Protegrity portal during account creation:
        helm registry login registry.protegrity.com:9443
      • To connect and deploy from the local repository, use your local credentials and local repository endpoint as required.
  3. IRSA and OIDC Configurations:

    • AWS bootstrap access is required to create the IRSA and OIDC configurations; the setup script is included in the Helm package.

      Note: It is recommended to execute the OIDC setup once with assistance from IT, as it requires elevated AWS permissions.
      The following AWS permissions are required to perform the OIDC setup:
      iam:CreateOpenIDConnectProvider, iam:ListOpenIDConnectProviders, iam:DeleteOpenIDConnectProvider, eks:DescribeCluster, iam:GetRole, iam:UpdateAssumeRolePolicy, sts:GetCallerIdentity, iam:GetPolicy, iam:CreatePolicy, iam:ListAttachedRolePolicies, iam:AttachRolePolicy

    • Sample Roles and Permissions JSON

        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "EKSDescribeCluster",
              "Effect": "Allow",
              "Action": "eks:DescribeCluster",
              "Resource": "arn:aws:eks:<REGION>:<ACCOUNT_ID>:cluster/<CLUSTER_NAME>"
            },
            {
              "Sid": "OIDCProviderList",
              "Effect": "Allow",
              "Action": "iam:ListOpenIDConnectProviders",
              "Resource": "*"
            },
            {
              "Sid": "OIDCProviderCreate",
              "Effect": "Allow",
              "Action": "iam:CreateOpenIDConnectProvider",
              "Resource": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<REGION>.amazonaws.com/id/*"
            },
            {
              "Sid": "IAMRoleManagement",
              "Effect": "Allow",
              "Action": [
                "iam:GetRole",
                "iam:UpdateAssumeRolePolicy",
                "iam:ListAttachedRolePolicies",
                "iam:AttachRolePolicy"
              ],
              "Resource": "arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE_NAME>"
            },
            {
              "Sid": "IAMPolicyManagement",
              "Effect": "Allow",
              "Action": [
                "iam:GetPolicy",
                "iam:CreatePolicy"
              ],
              "Resource": "arn:aws:iam::<ACCOUNT_ID>:policy/<IAM_ROLE_NAME>_<S3_BUCKET_NAME>_<NAMESPACE>_S3Policy"
            },
            {
              "Sid": "STSIdentity",
              "Effect": "Allow",
              "Action": "sts:GetCallerIdentity",
              "Resource": "*"
            }
          ]
        }
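The S3 and IAM prerequisites above can be spot-checked from the jumpbox before installation. The following is a sketch, assuming the AWS CLI is configured with account credentials; the bucket and role names are placeholders.

```shell
# Verify the artifact bucket exists and is reachable (placeholder name).
aws s3api head-bucket --bucket <S3_BUCKET_NAME>

# Confirm the bucket is not KMS encrypted: the reported algorithm should be
# AES256 (SSE-S3), or the call should report no encryption configuration.
aws s3api get-bucket-encryption --bucket <S3_BUCKET_NAME> \
  --query 'ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm'

# Verify the IAM role that will be granted S3 access exists.
aws iam get-role --role-name <Role_Name> --query 'Role.Arn'
```

Catching a missing bucket or a KMS-encrypted bucket here avoids a failed deployment later.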
      

1.2 - Installing Protegrity Anonymization

Steps to install Protegrity Anonymization.

Overview

This project deploys the Protegrity Anonymization SDK stack on Amazon EKS as part of the Protegrity AI Team Edition.
It uses Helm to deploy Kubernetes workloads.

Deployment Steps

1. Prepare Configuration

  1. Create an override_values.yaml file with environment‑specific configuration.

    s3:
      bucketName: "<>"  # S3 bucket name for storage (must exist before installation)
      region: "us-east-1"  # Update AWS region
      iamRoleArn: "<>"  # IAM role ARN with S3 permissions (s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject) (must exist before installation)
    image:
      anonapi_tag: /anonymization/1.4/containers/anonymization-service:release-1.4.1_13   # Tag name for Anonymization Image.
      postgres_tag: /shared/containers/postgres/17:37      
    

    Note: Ensure the S3 bucket is not KMS encrypted. The bucket must use default SSE-S3 encryption or no encryption.

  2. Create namespace for deployment.

    kubectl create namespace anon-ns  
    

    Note: Ensure all necessary parameters are set.

2. IRSA and OIDC Setup

Note: This setup requires elevated privileges and should be performed with assistance from your IT team.
  1. Pull and extract the Helm chart using the following command.

      helm pull oci://<Container_Registry_Path>/anonymization/1.4/helm/anonymization-service --version=1.4.1
      tar -xvf anonymization-service-1.4.1.tgz
    
  2. Run the OIDC and IRSA setup script.

    Use the oidc_iam_setup-aws.sh script included with the chart to configure:

    • The OIDC identity provider in AWS IAM.

    • The IAM role trust relationship for the Kubernetes service account.

      sh anonymization-service/oidc_iam_setup-aws.sh <CLUSTER_NAME> <REGION> <IAM_ROLE> <S3_BUCKET_NAME> anon-ns anon-service-account
      #Usage: oidc_iam_setup-aws.sh <CLUSTER_NAME> <REGION> <IAM_ROLE> <S3_BUCKET_NAME> <NAMESPACE> <SERVICE_ACCOUNT_NAME>
      #Ex: oidc_iam_setup-aws.sh CLUSTER_NAME us-east-1 access_ROLE_name anon_bucket anon-ns anon-service-account
      

      Note:

      • The Anonymization service account (anon-service-account) and namespace (anon-ns) are predefined in the values.yaml file.
      • Retrieve the cluster name using the following command:
        kubectl get configmap/nfa-config -n default -o jsonpath='{.data.CLUSTER_NAME}'
      
    • Verify successful setup.
      A successful run ends with output similar to the following:

      ✓ Policy already attached to role
      =========================================
      ✓ Setup Complete!
      =========================================
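In addition to the script output, you can confirm that IRSA was wired correctly by checking the service account for the standard EKS role annotation. This is a sketch using the default names above; if the chart creates the service account at deploy time, run this check after the Helm deployment in the next step.

```shell
# The service account should carry the IAM role ARN configured by the setup script.
kubectl get serviceaccount anon-service-account -n anon-ns \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
```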
      

3. Deploy

  1. Deploy using the override_values.yaml file.

    helm install pty-anonymization oci://<Container_Registry_Path>/anonymization/1.4/helm/anonymization-service --version=1.4.1 -n anon-ns -f override_values.yaml 
    

4. Monitor

  1. Monitor the deployment process using the following command.

     kubectl get pods -n anon-ns
    

    Verify all pods are in the Running state. The following is the sample output.

    NAME                                  READY   STATUS    RESTARTS   AGE
    anon-app-depl-f5c4d4cd6-42wgn         1/1     Running   0          3m20s
    anon-db-depl-0                        1/1     Running   0          3m20s
    anon-scheduler-depl-7b87fcb74-l5q6v   1/1     Running   0          3m20s
    anon-worker-depl-7c4d95496f-djw7f     1/1     Running   0          3m20s
    anon-worker-depl-7c4d95496f-gnnvp     1/1     Running   0          3m20s
    
  2. Verify all the Anonymization services are deployed.

     kubectl get svc -n anon-ns
    

    The following is the sample output.

    NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
    anon-app-svc    ClusterIP   172.20.151.139   <none>        8090/TCP   61s
    anon-dask-svc   ClusterIP   172.20.224.133   <none>        8786/TCP   61s
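Rather than polling kubectl get pods manually, you can block until the rollout settles. A minimal sketch:

```shell
# Wait up to 5 minutes for every pod in the namespace to become Ready.
kubectl wait pod --all --for=condition=Ready -n anon-ns --timeout=300s
```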
    

For more information about building the Request using the REST API, refer to Building the Request using the REST API.

1.3 - Configuring Protegrity Anonymization

Steps to configure Protegrity Anonymization.

Update Role Permission and Create User

After deployment, update the default anonymization_administrator role to include the can_create_token permission, and then create a user with this role.

Step 1: Update anonymization_administrator role permission

export GATEWAY_URL="https://$(kubectl get configmap/nfa-config -n default -o jsonpath='{.data.FQDN}')"
# 1. Obtain an Authentication Token
TOKEN=$(curl -sk -X POST "${GATEWAY_URL}/api/v1/auth/login/token" \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'loginname=admin&password=Admin123!' \
  -D - -o /dev/null | grep -i 'pty_access_jwt_token' | awk '{print $2}' | tr -d '\r\n')

curl -sk -X PUT \
  "${GATEWAY_URL}/pty/v1/auth/roles" \
  -H 'accept: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "anonymization_administrator",
    "description": "Administrator role",
    "permissions": [
      "can_create_token",
      "anonymization_operations_admin"
    ]
  }'

Step 2: Create user with anonymization_administrator role attached

Use the following request payload when creating the user:

{
  "username": "anonymization_admin",
  "email": "anonadmin@example.com",
  "firstName": "Anon",
  "lastName": "User",
  "password": "StrongPassword123!",
  "roles": [
    "anonymization_administrator"
  ]
}

Example API call:

curl -sk -X POST \
  "${GATEWAY_URL}/pty/v1/auth/users" \
  -H 'accept: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "username": "anonymization_admin",
    "email": "anonadmin@example.com",
    "firstName": "Anon",
    "lastName": "User",
    "password": "StrongPassword123!",
    "roles": [
      "anonymization_administrator"
    ]
  }'
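You can confirm the new user works end to end by logging in as anonymization_admin against the same token endpoint used above:

```shell
# A non-empty token confirms the new user exists and can authenticate.
ANON_TOKEN=$(curl -sk -X POST "${GATEWAY_URL}/api/v1/auth/login/token" \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'loginname=anonymization_admin&password=StrongPassword123!' \
  -D - -o /dev/null | grep -i 'pty_access_jwt_token' | awk '{print $2}' | tr -d '\r\n')

[ -n "$ANON_TOKEN" ] && echo "Login OK" || echo "Login failed"
```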

1.4 - Protegrity Anonymization Python SDK Installation

Steps to install Python SDK.

Python SDK

The Anonymization service can be accessed programmatically using the Python SDK.

1. Obtain an Authentication Token

export GATEWAY_URL=https://<YOUR_GATEWAY_HOSTNAME>
# The Gateway URL can be obtained using the following command:
# export GATEWAY_URL="https://$(kubectl get configmap/nfa-config -n default -o jsonpath='{.data.FQDN}')"
# Login with the Anon user and get token
TOKEN=$(curl -sk -X POST "${GATEWAY_URL}/api/v1/auth/login/token" \
-H 'Content-Type: application/x-www-form-urlencoded' \
-d 'loginname=anonymization_admin&password=StrongPassword123!' \
-D - -o /dev/null | grep -i 'pty_access_jwt_token' | awk '{print $2}' | tr -d '\r\n')

echo "Access Token: $TOKEN"

Note: Replace default credentials and URLs for production environments.

2. Obtain the Anonymization Python SDK wheel file

curl -sk -X GET -H "Authorization: Bearer $TOKEN" "${GATEWAY_URL}/pty/anonymization/v2/whl" -o anonsdk_dir-1.4.1-py3-none-any.whl

3. Install the SDK in a Python Virtual Environment

pip install anonsdk_dir-1.4.1-py3-none-any.whl

4. Configure SDK Storage

The Python SDK uses intermediate storage to securely exchange data with the Anonymization REST API. Ensure the S3 bucket configured for the Anonymization REST API is accessible to the Python SDK.

Configure the bucket name and access options in the config.yaml file located at $HOME/.pty_anon/config.yaml.

If the directory or file does not exist, create them using the following commands.

mkdir -p $HOME/.pty_anon
touch $HOME/.pty_anon/config.yaml

Update config.yaml with the following values:

STORAGE:
  ACCESS_TYPE: 'KEYS'
  CLUSTER_ENDPOINT: s3.amazonaws.com
  BUCKET_NAME: '<YOUR_BUCKET_NAME>'
  ACCESS_KEY: '<AWS_ACCESS_KEY>' 
  SECRET_KEY: '<AWS_SECRET>'

Note: Use static access keys. Temporary session credentials are not supported.
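The two steps above (create the directory, then populate the file) can be combined into a single heredoc. The bucket name and keys below are placeholders to be replaced with your values:

```shell
# Write the SDK storage configuration in one step (placeholder values shown).
mkdir -p "$HOME/.pty_anon"
cat > "$HOME/.pty_anon/config.yaml" <<'EOF'
STORAGE:
  ACCESS_TYPE: 'KEYS'
  CLUSTER_ENDPOINT: s3.amazonaws.com
  BUCKET_NAME: '<YOUR_BUCKET_NAME>'
  ACCESS_KEY: '<AWS_ACCESS_KEY>'
  SECRET_KEY: '<AWS_SECRET>'
EOF
```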

5. Test the Anonymization Python SDK

import anonsdk as asdk

conn = asdk.Connection("<GATEWAY_URL>/", security=asdk.PPCBasedSecurity("anonymization_admin", "StrongPassword123!"))

For example,

conn = asdk.Connection("https://eclipse.aws.protegrity.com/",  security=asdk.PPCBasedSecurity("anonymization_admin", "StrongPassword123!"))

If the connection cannot be established, an error appears; otherwise, the connection is established successfully. For more information about SDK usage, refer to Building the request using the Python SDK.

1.5 - Uninstalling and Cleaning Up Protegrity Anonymization

Steps to uninstall and clean up Protegrity Anonymization.

To remove the Anonymization SDK and all associated Kubernetes resources:

  1. Clear the deployed release.
helm uninstall pty-anonymization -n anon-ns --wait --timeout 300s
  2. Delete the bootstrap credentials secret.
kubectl delete secret/aws-iam-bootstrap-creds -n anon-ns
  3. Delete the persistent volume claim.
kubectl delete pvc/anon-db-persistent-storage-anon-db-depl-0 -n anon-ns
  4. Clear the namespace.
kubectl delete namespace anon-ns

Optionally clean up IAM roles and OIDC provider created for this deployment, and any S3 artifacts that are no longer needed.
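After running the steps above, a quick check confirms that no Kubernetes resources are left behind:

```shell
# Once cleanup is complete, the namespace lookup is expected to fail with "NotFound".
kubectl get namespace anon-ns

# No Helm release should remain in any namespace.
helm list -A | grep pty-anonymization || echo "release removed"
```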

2 - Protegrity Synthetic Data

Using Protegrity Synthetic Data with PPC

Protegrity Synthetic Data is a privacy-enhancing technology that uses real datasets to create artificial data. The synthetic records do not correspond to real individuals, yet they retain strong analytical utility and preserve the relationships between variables.

For more information about Protegrity Synthetic Data, refer to Protegrity Synthetic Data.

2.1 - Prerequisites

List of Prerequisites for Protegrity Synthetic Data.

Ensure the following prerequisites are met:

  1. AWS Setup:

    • A Protegrity Provisioned Cluster (PPC) is available.
      For more information about PPC, refer to Protegrity Provisioned Cluster.
    • An AWS account with CLI credentials for configuring AWS is available.
    • An existing VPC with at least two private subnets is available.
    • An S3 bucket for storing Synthetic Data artifacts must exist before installation. The bucket must not be KMS encrypted; it must use default SSE-S3 encryption or no encryption.
    • An IAM role (for example, arn:aws:iam::<Account_ID>:role/<Role_Name>) with the required S3 permissions (s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject) must exist before installation.
    • Ensure that the jumpbox can connect to the required repositories. If not already authenticated, log in to the required repository.
      • To connect and deploy from the Protegrity Container Registry (PCR), use the following command with the credentials obtained from the My.Protegrity portal during account creation:
        helm registry login registry.protegrity.com:9443
      • To connect and deploy from the local repository, use your local credentials and local repository endpoint as required.
  • Obtain the AMI ID for the EKS GPU-optimized image (al2023-x86_64-nvidia-1.34-*) that corresponds to your deployment region.

Note: Each AWS region has a unique AMI ID.

Option A: The following table lists the AMI IDs for the image amazon-eks-node-al2023-x86_64-nvidia-1.34-v20260318.

Region            AMI ID
us-east-1         ami-0f7f4d7faa23356aa
us-east-2         ami-0a141ce97ca2c1af3
us-west-1         ami-04a45eb5f6059b9d9
us-west-2         ami-00e8faebba1a101ef
ca-central-1      ami-02c2ad3c354a88163
eu-central-1      ami-0aa92277e9e206598
eu-north-1        ami-0874c52f23e149b20
eu-west-1         ami-02f2605e47dbbcb50
eu-west-2         ami-01e015a107c483424
eu-west-3         ami-0cff81abc55208298
ap-south-1        ami-01e2773386d0b5694
ap-northeast-1    ami-0c8df61d509a15cc0
ap-northeast-2    ami-03b2e2c4cf0061b02
ap-northeast-3    ami-00e67c624db51074d
ap-southeast-1    ami-08b7a3ccd049b8575
ap-southeast-2    ami-0037bc089c3a280e9
sa-east-1         ami-040480fd2f61a5da1
Option B: If your region is not listed in the table above, run the following AWS CLI command to find the AMI ID dynamically.

```bash
aws ec2 describe-images \
    --region <YOUR_REGION> \
    --owners 602401143452 \
    --filters "Name=name,Values=amazon-eks-node-al2023-x86_64-nvidia-1.34-*" \
    --query "sort_by(Images, &CreationDate)[-1].{Id:ImageId,Name:Name,Created:CreationDate}" \
    --output table
```

Note:

  • Synthetic Data requires static IAM access keys for AWS authentication. IRSA (IAM Roles for Service Accounts) is not supported for this release.

  • Create a static access key for an IAM user. These static keys are required to create the Kubernetes secret for S3 access during deployment.

    For more information about creating new access keys for an IAM user, refer to Create new access keys for an IAM user - Amazon Keyspaces.

  • Check with your IT department for permission to launch AWS nodes with instanceFamily: "g4dn" and instanceSize: "2xlarge".

  2. Tools:

    • helm and kubectl are installed and configured with access to your Kubernetes cluster.
    • Sufficient permissions to create namespaces, deployments, secrets, and services.
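The static keys and tooling can be sanity-checked from the jumpbox before deployment. A sketch, assuming the static access keys are exported as the standard AWS environment variables:

```shell
# Confirm the static credentials resolve to an IAM user ARN
# (temporary session credentials show an assumed-role ARN instead).
aws sts get-caller-identity --query 'Arn'

# Confirm the required tools are on the PATH.
command -v helm
command -v kubectl
```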

2.2 - Installing Protegrity Synthetic Data

Steps to install Protegrity Synthetic Data.

Helm Deployment

This project deploys the Protegrity Synthetic Data stack on Amazon EKS as a Protegrity AI Team Edition Feature. It uses Helm to deploy Kubernetes workloads.

Deployment Steps

1. Prepare Configuration

  1. Create a namespace for the deployment.

    kubectl create namespace syntheticdata-ns
    
  2. Create a Kubernetes secret using the static IAM access keys for S3 bucket access.

    kubectl -n syntheticdata-ns create secret generic synthobjectstore-creds \
    --from-literal=access_key=YOUR_STATIC_ACCESS_KEY_ID \
    --from-literal=secret_key=YOUR_STATIC_SECRET_ACCESS_KEY
    

    Note: Use static access keys, not temporary session credentials, when creating this secret. These keys allow the Synthetic Data service to access the configured S3 bucket.

  3. Create an override_values.yaml file with your specific configuration details. For example:

     objectstorage:
       endpoint: "s3.us-east-1.amazonaws.com"  # Update the region 
       bucketName: "<>"  # S3 bucket name for storage (must exist before installation)
     image:
       syndataapi_tag: /synthetic-data/1.0/containers/syntheticdata-service:1.0.1.27
       postgres_tag: /shared/containers/postgres/17:37
     karpenter:
       gpu:
         nodeclass:
           amiId: ami-0f7f4d7faa23356aa   # ID for us-east-1. Update based on your region.
    

    Note:

    • Ensure the S3 bucket is not KMS encrypted. The bucket must use default SSE-S3 encryption or no encryption.
    • Ensure all necessary parameters are set.
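Before deploying, you can confirm that the namespace and the credentials secret created in the preceding steps are in place:

```shell
# Both the namespace and the S3 credentials secret must exist before helm install.
kubectl get namespace syntheticdata-ns
kubectl get secret synthobjectstore-creds -n syntheticdata-ns
```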

2. Deploy

Run the following command to deploy the stack:

helm install pty-synthetic-data oci://<Container_Registry_Path>/synthetic-data/1.0/helm/syntheticdata-service --version=1.0.1 -n syntheticdata-ns --values override_values.yaml

3. Monitor

  1. Monitor the deployment process using:

    kubectl get pods -n syntheticdata-ns
    

    Verify all pods are in the Running state. The following is the sample output.

    NAME                                            READY   STATUS    RESTARTS   AGE
    pty-synthetic-data-nvidia-device-plugin-5648s   1/1     Running   0          3d17h
    syn-db-depl-0                                   1/1     Running   0          3d17h
    syn-scheduler-depl-6696687695-fcsvj             1/1     Running   0          3d17h
    syn-worker-depl-6bf8dcf965-5w2j2                1/1     Running   0          3d17h
    syn-worker-depl-6bf8dcf965-zr829                1/1     Running   0          3d17h
    syndata-app-depl-6c8cb85f89-rpf5j               1/1     Running   0          3d17h
    
  2. Verify all the Synthetic Data services are deployed.

    kubectl get svc -n syntheticdata-ns
    

    The following is the sample output.

    NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    syn-dask-svc      ClusterIP   172.20.177.37   <none>        8786/TCP   3d17h
    syn-db-svc        ClusterIP   172.20.208.6    <none>        5432/TCP   3d17h
    syndata-app-svc   ClusterIP   172.20.231.58   <none>        8095/TCP   3d17h
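GPU worker nodes may take a few minutes to be provisioned by Karpenter, so it can help to block until all pods are Ready instead of polling. A minimal sketch:

```shell
# Wait up to 10 minutes; GPU node provisioning can delay the worker pods.
kubectl wait pod --all --for=condition=Ready -n syntheticdata-ns --timeout=600s
```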
    

For more information about building the REST API request, refer to Building the Request Using the REST API.

2.3 - Configuring Protegrity Synthetic Data

Steps to configure Protegrity Synthetic Data.

Update Role Permission and Create User

After deployment, update the default syntheticdata_administrator role to include the can_create_token permission, and then create a user with this role.

Step 1: Update syntheticdata_administrator role permission

export GATEWAY_URL="https://$(kubectl get configmap/nfa-config -n default -o jsonpath='{.data.FQDN}')"
# 1. Obtain an Authentication Token
TOKEN=$(curl -sk -X POST "${GATEWAY_URL}/api/v1/auth/login/token" \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'loginname=admin&password=Admin123!' \
  -D - -o /dev/null | grep -i 'pty_access_jwt_token' | awk '{print $2}' | tr -d '\r\n')

curl -sk -X PUT \
  "${GATEWAY_URL}/pty/v1/auth/roles" \
  -H 'accept: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "syntheticdata_administrator",
    "description": "Administrator role",
    "permissions": [
      "can_create_token",
      "syntheticdata_operations_admin"
    ]
  }'

Step 2: Create user with syntheticdata_administrator role attached

Use the following request payload when creating the user:

{
  "username": "syntheticdata_admin",
  "email": "syntheticdata_admin@example.com",
  "firstName": "SyntheticData",
  "lastName": "User",
  "password": "StrongPassword123!",
  "roles": [
    "syntheticdata_administrator"
  ]
}

Example API call:

curl -sk -X POST \
  "${GATEWAY_URL}/pty/v1/auth/users" \
  -H 'accept: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "username": "syntheticdata_admin",
    "email": "syntheticdata_admin@example.com",
    "firstName": "SyntheticData",
    "lastName": "User",
    "password": "StrongPassword123!",
    "roles": [
      "syntheticdata_administrator"
    ]
  }'
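As a final check, log in as the new user against the same token endpoint used above and confirm a token is returned:

```shell
# A non-empty token confirms the syntheticdata_admin user can authenticate.
SYN_TOKEN=$(curl -sk -X POST "${GATEWAY_URL}/api/v1/auth/login/token" \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'loginname=syntheticdata_admin&password=StrongPassword123!' \
  -D - -o /dev/null | grep -i 'pty_access_jwt_token' | awk '{print $2}' | tr -d '\r\n')

[ -n "$SYN_TOKEN" ] && echo "Login OK" || echo "Login failed"
```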

2.4 - Uninstalling and Cleaning Up Protegrity Synthetic Data

Steps to uninstall and clean up Protegrity Synthetic Data.

To remove Protegrity Synthetic Data and all associated Kubernetes resources:

  1. Clear the deployed release.
helm uninstall pty-synthetic-data -n syntheticdata-ns --wait --timeout 420s
  2. Delete the S3 credentials secret.
kubectl delete secret/synthobjectstore-creds -n syntheticdata-ns
  3. Delete the persistent volume claim.
kubectl delete pvc/syn-db-persistent-storage-syn-db-depl-0 -n syntheticdata-ns
  4. Clear the namespace.
kubectl delete namespace syntheticdata-ns

Optionally clean up any S3 artifacts that are no longer needed.
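If the bucket was dedicated to this deployment, leftover artifacts can be removed with the AWS CLI. A sketch; the prefix shown is a placeholder, so list the contents and verify before deleting:

```shell
# Review what would be removed, then delete the artifacts under the prefix.
aws s3 ls "s3://<S3_BUCKET_NAME>/" --recursive
aws s3 rm "s3://<S3_BUCKET_NAME>/<ARTIFACT_PREFIX>/" --recursive
```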