Protegrity Synthetic Data
Using Protegrity Synthetic Data with PPC
Protegrity Synthetic Data is a privacy-enhancing technology that uses real datasets to create artificial data. The artificial data does not represent real individuals, yet it preserves the relationships between variables and retains strong analytical utility.
For more information about Protegrity Synthetic Data, refer to Protegrity Synthetic Data.
1 - Prerequisites
List of Prerequisites for Protegrity Synthetic Data.
Ensure the following prerequisites are met:
AWS Setup:
- A Protegrity Provisioned Cluster (PPC) is available.
  For more information about PPC, refer to Protegrity Provisioned Cluster.
- An AWS account with CLI credentials for configuring AWS is available.
- An existing VPC with at least two private subnets is available.
- An S3 bucket for storing Synthetic Data artifacts is available. The bucket must not be KMS encrypted; it must use default SSE-S3 encryption or no encryption.
- An IAM role (for example, arn:aws:iam::<Account_ID>:role/<Role_Name>) with the required S3 permissions (s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject) must exist before installation.
- Ensure that the jumpbox can connect to the required repositories. If not already authenticated, then log in to the required repository.
- For connecting and deploying from the Protegrity Container Registry (PCR), use the following command and the credentials obtained from the My.Protegrity portal during account creation:
helm registry login registry.protegrity.com:9443
- For connecting and deploying to the local repository, use your local credentials and local repository endpoint as required.
- Obtain the AMI ID for the EKS GPU-optimized image (al2023-x86_64-nvidia-1.34-*) that corresponds to your deployment region.
Note: Each AWS region has a unique AMI ID.
**Option A**: The following table lists the AMI IDs for the image amazon-eks-node-al2023-x86_64-nvidia-1.34-v20260318.
| Region | AMI ID |
|---|---|
| us-east-1 | ami-0f7f4d7faa23356aa |
| us-east-2 | ami-0a141ce97ca2c1af3 |
| us-west-1 | ami-04a45eb5f6059b9d9 |
| us-west-2 | ami-00e8faebba1a101ef |
| ca-central-1 | ami-02c2ad3c354a88163 |
| eu-central-1 | ami-0aa92277e9e206598 |
| eu-north-1 | ami-0874c52f23e149b20 |
| eu-west-1 | ami-02f2605e47dbbcb50 |
| eu-west-2 | ami-01e015a107c483424 |
| eu-west-3 | ami-0cff81abc55208298 |
| ap-south-1 | ami-01e2773386d0b5694 |
| ap-northeast-1 | ami-0c8df61d509a15cc0 |
| ap-northeast-2 | ami-03b2e2c4cf0061b02 |
| ap-northeast-3 | ami-00e67c624db51074d |
| ap-southeast-1 | ami-08b7a3ccd049b8575 |
| ap-southeast-2 | ami-0037bc089c3a280e9 |
| sa-east-1 | ami-040480fd2f61a5da1 |
**Option B**: If your region is not listed in the AMI IDs table, run the following AWS CLI command to find the AMI ID dynamically.
```bash
aws ec2 describe-images \
--region <YOUR_REGION> \
--owners 602401143452 \
--filters "Name=name,Values=amazon-eks-node-al2023-x86_64-nvidia-1.34-*" \
--query "sort_by(Images, &CreationDate)[-1].{Id:ImageId,Name:Name,Created:CreationDate}" \
--output table
```
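
The two options can be combined in a small lookup helper. The following is a minimal sketch, not part of the product: the function name `ami_for_region` is hypothetical, and the values are copied from the Option A table. A non-zero return signals that the region is not listed and that the Option B query should be used instead.

```bash
# Hypothetical helper: print the AMI ID from the Option A table for a region.
# Returns non-zero when the region is not in the table (use Option B then).
ami_for_region() {
  case "$1" in
    us-east-1)      echo "ami-0f7f4d7faa23356aa" ;;
    us-east-2)      echo "ami-0a141ce97ca2c1af3" ;;
    us-west-1)      echo "ami-04a45eb5f6059b9d9" ;;
    us-west-2)      echo "ami-00e8faebba1a101ef" ;;
    ca-central-1)   echo "ami-02c2ad3c354a88163" ;;
    eu-central-1)   echo "ami-0aa92277e9e206598" ;;
    eu-north-1)     echo "ami-0874c52f23e149b20" ;;
    eu-west-1)      echo "ami-02f2605e47dbbcb50" ;;
    eu-west-2)      echo "ami-01e015a107c483424" ;;
    eu-west-3)      echo "ami-0cff81abc55208298" ;;
    ap-south-1)     echo "ami-01e2773386d0b5694" ;;
    ap-northeast-1) echo "ami-0c8df61d509a15cc0" ;;
    ap-northeast-2) echo "ami-03b2e2c4cf0061b02" ;;
    ap-northeast-3) echo "ami-00e67c624db51074d" ;;
    ap-southeast-1) echo "ami-08b7a3ccd049b8575" ;;
    ap-southeast-2) echo "ami-0037bc089c3a280e9" ;;
    sa-east-1)      echo "ami-040480fd2f61a5da1" ;;
    *)              return 1 ;;
  esac
}

# Example usage (AWS_REGION is an assumed variable name):
#   ami_id=$(ami_for_region "$AWS_REGION") \
#     || echo "Region not listed; run the describe-images query" >&2
```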
Note:
Synthetic Data requires static IAM access keys for AWS authentication. IRSA (IAM Roles for Service Accounts) is not supported for this release.
Create a static access key for an IAM user. These static keys are required to create the Kubernetes secret for S3 access during deployment.
For more information about creating new access keys for an IAM user, refer to Create new access keys for an IAM user - Amazon Keyspaces.
Check with your IT department for permission to launch AWS nodes with instanceFamily: "g4dn" and instanceSize: "2xlarge".
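
The S3 permissions listed in the AWS setup (s3:ListBucket, s3:GetObject, s3:PutObject, s3:DeleteObject) could be expressed as a policy document along the following lines. This is a sketch under assumptions: the function name `s3_policy_for_bucket` is hypothetical, and your organization may require a narrower or differently structured policy. In IAM, s3:ListBucket applies to the bucket ARN, while the object actions apply to the objects under it.

```bash
# Hypothetical helper: print an IAM policy document that grants the four
# S3 actions from the prerequisites on a single bucket.
s3_policy_for_bucket() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Example usage (the bucket name is a placeholder):
#   s3_policy_for_bucket my-synthetic-data-bucket > s3-policy.json
```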
Tools:
- helm and kubectl are installed and configured with access to your Kubernetes cluster.
- Sufficient permissions to create namespaces, deployments, secrets, and services are available.
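
A quick preflight check for the tools above might look like the following sketch. The function name `require_tools` is hypothetical; it only verifies that each command is on the PATH, not that it is configured for your cluster.

```bash
# Hypothetical helper: report every required command that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   require_tools helm kubectl aws || exit 1
# Permissions can then be spot-checked with kubectl, for example:
#   kubectl auth can-i create namespaces
```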
2 - Installing Protegrity Synthetic Data
Steps to install Protegrity Synthetic Data
Helm Deployment
This project deploys the Protegrity Synthetic Data stack on Amazon EKS as a Protegrity AI Team Edition Feature.
It uses Helm to deploy Kubernetes workloads.
Deployment Steps
1. Prepare Configuration
Create a namespace for the deployment.
kubectl create namespace syntheticdata-ns
Create a Kubernetes secret using the static IAM access keys for S3 bucket access.
kubectl -n syntheticdata-ns create secret generic synthobjectstore-creds \
--from-literal=access_key=YOUR_STATIC_ACCESS_KEY_ID \
--from-literal=secret_key=YOUR_STATIC_SECRET_ACCESS_KEY
Note: Use static access keys, not temporary session credentials, when creating this secret. These keys allow the Synthetic Data service to access the configured S3 bucket.
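
One way to catch temporary credentials before creating the secret is to look at the access key ID prefix. In AWS, long-term IAM user access key IDs begin with `AKIA`, while temporary STS credentials begin with `ASIA`. The following sketch relies on that convention; the function name `is_static_access_key` is hypothetical, and the check does not validate the key against AWS.

```bash
# Hypothetical helper: succeed only when the access key ID has the prefix
# AWS uses for long-term (static) IAM user keys.
is_static_access_key() {
  case "$1" in
    AKIA*) return 0 ;;
    *)     return 1 ;;   # ASIA* (temporary STS keys) and anything else
  esac
}

# Example usage (ACCESS_KEY_ID is an assumed variable name):
#   is_static_access_key "$ACCESS_KEY_ID" \
#     || { echo "Expected a static IAM user access key" >&2; exit 1; }
```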
Create an override_values.yaml file with your specific configuration details, for example:
objectstorage:
  endpoint: "s3.us-east-1.amazonaws.com" # Update the region
  bucketName: "<>" # S3 bucket name for storage (must exist before installation)
image:
  syndataapi_tag: /synthetic-data/1.0/containers/syntheticdata-service:1.0.1.27
  postgres_tag: /shared/containers/postgres/17:37
karpenter:
  gpu:
    nodeclass:
      amiId: ami-0f7f4d7faa23356aa # ID for us-east-1. Update based on your region.
Note:
- Ensure the S3 bucket is not KMS encrypted. The bucket must use default SSE-S3 encryption or no encryption.
- Ensure all necessary parameters are set.
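
Because the endpoint, bucket name, and AMI ID all depend on your environment, the file can also be generated from parameters. The following is a minimal sketch: the function name `render_override_values` is hypothetical, the YAML nesting is assumed from the key names in the example above, and the endpoint pattern `s3.<region>.amazonaws.com` follows the example and applies to standard AWS commercial regions.

```bash
# Hypothetical helper: print an override_values.yaml for the given
# region, bucket name, and AMI ID. Image tags are copied from the example.
render_override_values() {
  region="$1"
  bucket="$2"
  ami_id="$3"
  cat <<EOF
objectstorage:
  endpoint: "s3.${region}.amazonaws.com"
  bucketName: "${bucket}"
image:
  syndataapi_tag: /synthetic-data/1.0/containers/syntheticdata-service:1.0.1.27
  postgres_tag: /shared/containers/postgres/17:37
karpenter:
  gpu:
    nodeclass:
      amiId: ${ami_id}
EOF
}

# Example usage (the bucket name is a placeholder):
#   render_override_values us-west-2 my-synthetic-data-bucket \
#     ami-00e8faebba1a101ef > override_values.yaml
```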
2. Deploy
Run the following command to deploy the stack:
helm install pty-synthetic-data oci://<Container_Registry_Path>/synthetic-data/1.0/helm/syntheticdata-service --version=1.0.1 -n syntheticdata-ns --values override_values.yaml
3. Monitor
Monitor the deployment process using:
kubectl get pods -n syntheticdata-ns
Verify all pods are in the Running state. The following is the sample output.
NAME READY STATUS RESTARTS AGE
pty-synthetic-data-nvidia-device-plugin-5648s 1/1 Running 0 3d17h
syn-db-depl-0 1/1 Running 0 3d17h
syn-scheduler-depl-6696687695-fcsvj 1/1 Running 0 3d17h
syn-worker-depl-6bf8dcf965-5w2j2 1/1 Running 0 3d17h
syn-worker-depl-6bf8dcf965-zr829 1/1 Running 0 3d17h
syndata-app-depl-6c8cb85f89-rpf5j 1/1 Running 0 3d17h
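
To spot pods that are not yet running without reading the table by eye, the output can be filtered. The following sketch assumes the default column layout shown in the sample output (STATUS in the third column); the function name `not_running_pods` is hypothetical.

```bash
# Hypothetical helper: read `kubectl get pods` output on stdin, skip the
# header row, and print the name of every pod whose STATUS is not "Running".
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Example usage:
#   kubectl get pods -n syntheticdata-ns | not_running_pods
# Alternatively, kubectl can block until all pods report Ready:
#   kubectl wait --for=condition=Ready pods --all -n syntheticdata-ns --timeout=600s
```

An empty result means every listed pod is in the Running state.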
Verify all the Synthetic Data services are deployed.
kubectl get svc -n syntheticdata-ns
The following is the sample output.
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
syn-dask-svc ClusterIP 172.20.177.37 <none> 8786/TCP 3d17h
syn-db-svc ClusterIP 172.20.208.6 <none> 5432/TCP 3d17h
syndata-app-svc ClusterIP 172.20.231.58 <none> 8095/TCP 3d17h
For more information about building the REST API request, refer to Building the Request Using the REST API.
3 - Configuring Protegrity Synthetic Data
Steps to configure Protegrity Synthetic Data.
Update Role Permission and Create User
After deployment, update the default syntheticdata_administrator role to include the can_create_token permission, then create a user with this role.
Step 1: Update syntheticdata_administrator role permission
export GATEWAY_URL="https://$(kubectl get configmap/nfa-config -n default -o jsonpath='{.data.FQDN}')"
# 1. Obtain an Authentication Token
TOKEN=$(curl -sk -X POST "${GATEWAY_URL}/api/v1/auth/login/token" \
-H 'Content-Type: application/x-www-form-urlencoded' \
-d 'loginname=admin&password=Admin123!' \
-D - -o /dev/null | grep -i 'pty_access_jwt_token' | awk '{print $2}' | tr -d '\r\n')
# 2. Update the syntheticdata_administrator role with the required permissions
curl -sk -X PUT \
"${GATEWAY_URL}/pty/v1/auth/roles" \
-H 'accept: application/json' \
-H "Authorization: Bearer ${TOKEN}" \
-H 'Content-Type: application/json' \
-d '{
"name": "syntheticdata_administrator",
"description": "Administrator role",
"permissions": [
"can_create_token",
"syntheticdata_operations_admin"
]
}'
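
If the login fails, the TOKEN variable in the step above ends up empty and the role update is rejected with an authorization error. The following sketch factors the header parsing into a function and adds an explicit check; the function names `extract_token` and `require_token` are hypothetical, and the parsing is the same grep, awk, and tr pipeline used above.

```bash
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the pty_access_jwt_token header with CR/LF removed.
extract_token() {
  grep -i 'pty_access_jwt_token' | awk '{print $2}' | tr -d '\r\n'
}

# Hypothetical helper: fail with a message when the token is empty.
require_token() {
  if [ -z "$1" ]; then
    echo "login failed: no pty_access_jwt_token header in the response" >&2
    return 1
  fi
}

# Example usage (ADMIN_USER and ADMIN_PASSWORD are assumed variable names):
#   TOKEN=$(curl -sk -X POST "${GATEWAY_URL}/api/v1/auth/login/token" \
#     -H 'Content-Type: application/x-www-form-urlencoded' \
#     -d "loginname=${ADMIN_USER}&password=${ADMIN_PASSWORD}" \
#     -D - -o /dev/null | extract_token)
#   require_token "$TOKEN" || exit 1
```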
Step 2: Create user with syntheticdata_administrator role attached
Use the following request payload when creating the user:
{
"username": "syntheticdata_admin",
"email": "syntheticdata_admin@example.com",
"firstName": "SyntheticData",
"lastName": "User",
"password": "StrongPassword123!",
"roles": [
"syntheticdata_administrator"
]
}
Example API call:
curl -sk -X POST \
"${GATEWAY_URL}/pty/v1/auth/users" \
-H 'accept: application/json' \
-H "Authorization: Bearer ${TOKEN}" \
-H 'Content-Type: application/json' \
-d '{
"username": "syntheticdata_admin",
"email": "syntheticdata_admin@example.com",
"firstName": "SyntheticData",
"lastName": "User",
"password": "StrongPassword123!",
"roles": [
"syntheticdata_administrator"
]
}'
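
To avoid typing the password into the command itself, the payload can be built from variables. This is a minimal sketch: the function name `build_user_payload` is hypothetical, the first and last names are fixed to the values in the example, and the function performs no JSON escaping, so it refuses values containing double quotes or backslashes.

```bash
# Hypothetical helper: print the user-creation payload for the given
# username, email, and password.
build_user_payload() {
  username="$1"
  email="$2"
  password="$3"
  # No JSON escaping is done here, so reject values that would need it.
  case "${username}${email}${password}" in
    *\"*|*\\*)
      echo "values must not contain double quotes or backslashes" >&2
      return 1
      ;;
  esac
  printf '{"username":"%s","email":"%s","firstName":"SyntheticData","lastName":"User","password":"%s","roles":["syntheticdata_administrator"]}\n' \
    "$username" "$email" "$password"
}

# Example usage (SD_ADMIN_PASSWORD is an assumed variable name):
#   curl -sk -X POST "${GATEWAY_URL}/pty/v1/auth/users" \
#     -H 'accept: application/json' \
#     -H "Authorization: Bearer ${TOKEN}" \
#     -H 'Content-Type: application/json' \
#     -d "$(build_user_payload syntheticdata_admin syntheticdata_admin@example.com "$SD_ADMIN_PASSWORD")"
```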
4 - Uninstalling and Cleaning Up Protegrity Synthetic Data
Steps to uninstall and clean up Protegrity Synthetic Data.
To remove Synthetic Data and all associated Kubernetes resources:
- Clear the deployed release.
helm uninstall pty-synthetic-data -n syntheticdata-ns --wait --timeout 420s
- Delete the S3 credentials secret.
kubectl delete secret/synthobjectstore-creds -n syntheticdata-ns
- Delete the persistent volume claim.
kubectl delete pvc/syn-db-persistent-storage-syn-db-depl-0 -n syntheticdata-ns
- Delete the namespace.
kubectl delete namespace syntheticdata-ns
Optionally clean up any S3 artifacts that are no longer needed.
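
The four cleanup commands can be wrapped in a function that supports a dry run, so the sequence can be reviewed before anything is deleted. This is a sketch under assumptions: the names `run`, `cleanup_synthetic_data`, and the `DRY_RUN` variable are hypothetical, and the resource names are the ones used in this guide.

```bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: remove the release, secret, PVC, and namespace in
# the order listed above, stopping at the first failure.
cleanup_synthetic_data() {
  ns="${1:-syntheticdata-ns}"
  run helm uninstall pty-synthetic-data -n "$ns" --wait --timeout 420s || return 1
  run kubectl delete secret/synthobjectstore-creds -n "$ns" || return 1
  run kubectl delete pvc/syn-db-persistent-storage-syn-db-depl-0 -n "$ns" || return 1
  run kubectl delete namespace "$ns"
}

# Example usage:
#   DRY_RUN=1 cleanup_synthetic_data    # print the commands only
#   cleanup_synthetic_data              # perform the cleanup
```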