Protegrity Synthetic Data Architecture
Communication between Protegrity Synthetic Data, the Dask Scheduler, and Dask Workers is detailed in this section.
An overview of the communication is shown in the following figure.

The Synthetic Data system includes the following core components:
Key Pods and Services
Synthetic Data App Pod
- Orchestrates Synthetic Data generation.
MLFlow Pod
- Tracks model training and evaluation.
- Hosted in containers for scalability.
MinIO Pod
- Stores models, model artifacts, and generated reports.
- Used by both the MLFlow and Synthetic Data App pods.
SQL Database Server Pod
- Stores the metadata for MLFlow experiments.
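The storage relationships above can be captured as a small dependency map. The sketch below is illustrative only: the component keys are hypothetical labels rather than actual pod or service names, and it encodes only the dependencies stated on this page (MinIO is used by both the MLFlow and Synthetic Data App pods, and the SQL Database Server holds MLFlow experiment metadata). Any direct link between the App pod and MLFlow is deliberately omitted because it is not described here. A topological sort then yields an order in which every component follows the components it relies on.

```python
from graphlib import TopologicalSorter

# Hypothetical component labels; edges encode only the dependencies stated above.
# Keys are components; values are the components they rely on.
DEPENDENCIES: dict[str, set[str]] = {
    "synthetic-data-app": {"minio"},          # stores models, artifacts, reports in MinIO
    "mlflow": {"minio", "sql-database"},      # artifacts in MinIO, experiment metadata in SQL
    "minio": set(),
    "sql-database": set(),
}


def dependency_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component appears after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(dependency_order(DEPENDENCIES))
```

Modeling the architecture this way makes it easy to see that MinIO is a shared dependency: if it is unavailable, both the MLFlow pod and the Synthetic Data App pod are affected.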
Data Generation Interfaces
Synthetic Data can be generated using:
- REST APIs
- Swagger UI
These interfaces allow developers and data scientists to interact with the system programmatically or visually.
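For programmatic access, a client sends HTTP requests to the service. The following is a minimal sketch using only the Python standard library. The endpoint path `/hypothetical/generate` and any JSON payload are placeholders, not the documented API; the real routes and request bodies should be taken from the Swagger UI. Only the default port 8095 and the use of HTTP come from this page.

```python
import json
import urllib.request

# Default port documented for the Protegrity Synthetic Data service.
DEFAULT_PORT = 8095


def build_url(host: str, path: str, port: int = DEFAULT_PORT, scheme: str = "http") -> str:
    """Compose a service URL from a host and a caller-supplied path."""
    return f"{scheme}://{host}:{port}/{path.lstrip('/')}"


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON body and decode the JSON response (request shape is an assumption)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder path; substitute a real route from the Swagger UI.
    url = build_url("localhost", "/hypothetical/generate")
    print(url)
    # result = post_json(url, {"example": "payload"})  # requires a running service
```

Keeping URL construction separate from the request makes it simple to point the same client at a different host or port when the service is exposed through a Kubernetes ingress or load balancer.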
Access and Networking
Users access the Protegrity Synthetic Data service over HTTP on the default port 8095. The system components communicate on the following ports:
| Port | Service |
|---|---|
| 5000 | MLFlow pod |
| 5432 | SQL Database Server |
| 8095 | Protegrity Synthetic Data Service |
| 9000 | MinIO |
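When troubleshooting a deployment, you can check whether each port in the table accepts TCP connections. The sketch below assumes all services are reachable from a single host (for example, through port forwarding); in a real cluster each service typically has its own hostname, so adapt the host argument accordingly. A successful connection only shows that something is listening on that port, not that the service is healthy.

```python
import socket

# Ports from the table above, keyed by the service that listens on them.
SERVICE_PORTS: dict[str, int] = {
    "MLFlow pod": 5000,
    "SQL Database Server": 5432,
    "Protegrity Synthetic Data Service": 8095,
    "MinIO": 9000,
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_services(host: str, ports: dict[str, int] = SERVICE_PORTS) -> dict[str, bool]:
    """Probe each documented port on `host` and report which ones accept connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_services("localhost").items():
        print(f"{name}: {'reachable' if reachable else 'unreachable'}")
```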
Cloud Hosting Options
Like the Protegrity Anonymization API, the entire Synthetic Data API can be hosted on any Kubernetes service, including:
- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Microsoft Azure Kubernetes Service (AKS)
- Red Hat OpenShift
- Other Kubernetes platforms
This flexibility allows organizations to scale Synthetic Data generation securely across environments.