About Protegrity Synthetic Data

Summary of the Protegrity Synthetic Data architecture, including its components, communication flow, access methods, and hosting options.

Protegrity Synthetic Data is a generator that produces artificial data that is realistic, statistically accurate, and privacy-safe. The generated data mirrors the patterns of your original datasets but contains no sensitive information, so you can train, test, and scale AI and analytics models without exposing sensitive data or risking compliance violations.

1 - Protegrity Synthetic Data Architecture

Communication between Protegrity Synthetic Data, the Dask Scheduler, and Dask Workers is detailed in this section.

An overview of the communication is shown in the following figure.

Figure: Synthetic Data Components

The Synthetic Data system includes the following core components:

Key Pods and Services

  • Synthetic Data App Pod

    • Orchestrates Synthetic Data generation.
  • MLFlow Pod

    • Captures model training and evaluation.
    • Hosted in containers for scalability.
  • MinIO Pod

    • Stores models, model artifacts, and generated reports.
    • Used by both MLFlow and Synthetic Data App pods.
  • SQL Database Server Pod

    • Provides storage for MLFlow experiments metadata.

Data Generation Interfaces

Synthetic Data can be generated using:

  • REST APIs
  • Swagger UI

These interfaces allow developers and data scientists to interact with the system programmatically or visually.
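As a rough illustration of programmatic access, the sketch below builds a request URL against the default port 8095 described under Access and Networking and sends a plain GET request using only the Python standard library. The function names `build_url` and `get` are hypothetical helpers, and no endpoint path is assumed: the caller supplies the host and a path taken from the product's REST API reference or the Swagger UI.

```python
from urllib.request import Request, urlopen

DEFAULT_PORT = 8095  # default HTTP port stated in this document


def build_url(host: str, path: str, port: int = DEFAULT_PORT) -> str:
    """Build an HTTP URL for the Synthetic Data service (hypothetical helper)."""
    # Strip a leading slash so the result never contains a double slash.
    return f"http://{host}:{port}/{path.lstrip('/')}"


def get(host: str, path: str, timeout: float = 10.0) -> bytes:
    """Send a GET request to the service and return the raw response body."""
    request = Request(build_url(host, path), method="GET")
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

The host name and path are deployment-specific, so both are left as parameters rather than hard-coded.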

Access and Networking

Users access Protegrity Synthetic Data over HTTP on the default port 8095. The services use the following ports:

Port    Communication Path
5000    MLFlow pod
5432    SQL Database Server
8095    Protegrity Synthetic Data Service
9000    MinIO
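The ports listed above can be checked for basic TCP reachability with a short script. This is a minimal sketch, not a Protegrity tool: the `SERVICE_PORTS` mapping mirrors the table, while `is_port_open` and `check_services` are hypothetical helper names, and the host you pass in depends on how the services are exposed in your cluster.

```python
import socket

# Ports copied from the table above, keyed by the communication path.
SERVICE_PORTS = {
    "MLFlow pod": 5000,
    "SQL Database Server": 5432,
    "Protegrity Synthetic Data Service": 8095,
    "MinIO": 9000,
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(host: str, ports: dict = SERVICE_PORTS) -> dict:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

For example, `check_services("localhost")` returns a dictionary of service names mapped to `True` or `False`. A successful TCP connection only shows that something is listening on the port; it does not confirm that the expected service is healthy.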

Cloud Hosting Options

Like the Protegrity Anonymization API, the entire Synthetic Data API can be hosted using any cloud-provided Kubernetes service, including:

  • Amazon Elastic Kubernetes Service (EKS)
  • Google Kubernetes Engine (GKE)
  • Microsoft Azure Kubernetes Service (AKS)
  • Red Hat OpenShift
  • Other Kubernetes platforms

This flexibility allows organizations to scale Synthetic Data generation securely across environments.