Protegrity Synthetic Data Architecture

This section describes how Protegrity Synthetic Data, the Dask Scheduler, and the Dask Workers communicate with one another.

An overview of the communication is shown in the following figure.

Figure: Synthetic Data Components

The Synthetic Data system includes the following core components:

Key Pods and Services

  • Synthetic Data App Pod
    • Orchestrates Synthetic Data generation.
  • MLFlow Pod
    • Captures model training and evaluation.
    • Hosted in containers for scalability.
  • MinIO Pod
    • Stores models, model artifacts, and generated reports.
    • Used by both the MLFlow and Synthetic Data App pods.
  • SQL Database Server Pod
    • Provides storage for MLFlow experiment metadata.
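The storage relationships stated in the list above can be written down as a small dependency map. The following is a minimal Python sketch, not part of the product: the component labels and the helper `dependents_of` are hypothetical names chosen for illustration, and the map contains only the dependencies named in this section.

```python
# Storage dependencies stated in this section.
# The keys are illustrative labels, not actual pod or service names.
DEPENDS_ON = {
    "synthetic-data-app": ["minio"],      # models and reports are stored in MinIO
    "mlflow": ["minio", "sql-database"],  # artifacts in MinIO, experiment metadata in SQL
    "minio": [],
    "sql-database": [],
}


def dependents_of(component: str) -> list[str]:
    """Return the components that rely on the given component, sorted by name."""
    return sorted(name for name, deps in DEPENDS_ON.items() if component in deps)


if __name__ == "__main__":
    # Shows which components are affected if MinIO is unavailable.
    print(dependents_of("minio"))
```

A map like this makes it easy to see which components are affected when a storage service is unavailable.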

Data Generation Interfaces

Synthetic Data can be generated using:

  • REST APIs
  • Swagger UI

These interfaces allow developers and data scientists to interact with the system programmatically or visually.
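As a rough illustration of programmatic access, the sketch below builds a JSON request with the Python standard library without sending it. The endpoint path, the payload, the host, and the use of POST with a JSON body are all placeholders assumed for illustration; this section does not document the actual routes or request schema, so take those from the Swagger UI of your deployment.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given endpoint path."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, path, and payload; replace them with values from your deployment.
    request = build_request("http://localhost:8095", "/example/path", {"example": "value"})
    print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the request. The sketch stops short of that so it can be run without a live service.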

Access and Networking

Users access Protegrity Synthetic Data over HTTP on the default port 8095. The services communicate on the following ports:

Port    Communication Path
5000    MLFlow pod
5432    SQL Database Server
8095    Protegrity Synthetic Data Service
9000    MinIO
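A quick way to confirm that these ports are reachable is a plain TCP connection test. The following Python sketch assumes that all four services are reachable on a single host name, for example through port forwarding; the host and the helper names `is_port_open` and `check_services` are assumptions for illustration, and only the port numbers come from the table above.

```python
import socket

# Default ports from the table above.
DEFAULT_PORTS = {
    "MLFlow pod": 5000,
    "SQL Database Server": 5432,
    "Protegrity Synthetic Data Service": 8095,
    "MinIO": 9000,
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str, ports: dict[str, int] = DEFAULT_PORTS) -> dict[str, bool]:
    """Map each service name to whether its port accepts TCP connections on host."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # "localhost" is an assumed host; use the address where your services are exposed.
    for name, reachable in check_services("localhost").items():
        print(f"{name}: {'reachable' if reachable else 'unreachable'}")
```

A successful connection shows only that something is listening on the port, not that the expected service is healthy.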

Cloud Hosting Options

Like the Protegrity Anonymization API, the entire Synthetic Data API can be hosted using any cloud-provided Kubernetes service, including:

  • Amazon Elastic Kubernetes Service (EKS)
  • Google Kubernetes Engine (GKE)
  • Microsoft Azure Kubernetes Service (AKS)
  • Red Hat OpenShift
  • Other Kubernetes platforms

This flexibility allows organizations to scale Synthetic Data generation securely across environments.


Last modified : November 07, 2025