Supported Models

Describes multiple generative modeling techniques.

Models supported by Protegrity Synthetic Data 1.0.1

Protegrity Synthetic Data 1.0.1 supports tabular synthetic data generation using Generative Adversarial Network (GAN) models, Tabular Variational Autoencoders (TVAE), and diffusion-based techniques. These models generate privacy-safe synthetic tabular data while preserving:

Column types and schema compatibility
Statistical distributions
Relationships and correlations between variables
Utility for analytics and ML workloads
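One way to check whether relationships between variables survive generation is to compare pairwise correlations in the real and synthetic tables. The following is a minimal sketch in plain Python, assuming numeric, non-constant columns stored as equal-length lists keyed by column name. The function names are hypothetical, and this is not Protegrity's built-in evaluation method.

```python
"""Sketch: compare pairwise Pearson correlations in real vs. synthetic data.

Illustrative only; not Protegrity's evaluation method. Assumes every column
is numeric and non-constant (a constant column would divide by zero).
"""
from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def max_correlation_gap(real: dict[str, list[float]],
                        synthetic: dict[str, list[float]]) -> float:
    """Largest absolute difference between real and synthetic pairwise correlations.

    0.0 means every pairwise correlation was reproduced exactly; the maximum
    possible value is 2.0 (a perfect correlation flipped to its opposite).
    """
    gaps = (
        abs(pearson(real[a], real[b]) - pearson(synthetic[a], synthetic[b]))
        for a, b in combinations(sorted(real), 2)
    )
    return max(gaps, default=0.0)  # a single column has no pairs to compare
```

Because generation is statistical rather than exact, a small non-zero gap is expected; what counts as acceptable depends on the downstream analytics or ML use case.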

The following are the modeling techniques:

Generative Adversarial Networks (GANs) – The primary approach. A GAN learns the structure and statistical properties of a real tabular dataset and generates Synthetic Data that reproduces them.
Tabular Variational Autoencoders (TVAE) – A supported technique for Synthetic Data generation.
Diffusion-based models – A supported technique for Synthetic Data generation.

All three models learn from the structure and statistical properties of real datasets, but they differ in how they learn and generate data, and in the trade-offs they offer.

All three also share some inherent limitations:

They require sufficient input data to train reliably.
They are slower than anonymization or pseudonymization techniques.
They cannot be used in scenarios that require re-identification or record-level traceability.
Model training and maintenance introduce moderate cost and operational overhead.
Data fidelity is statistical rather than exact, particularly for rare or highly constrained patterns.

Switching between the Protegrity Synthetic Data Models

Step 1: Decide the target model type

Protegrity Synthetic Data supports multiple generative model types, including:

GAN‑based models
Diffusion-based models

Model selection is controlled through the request configuration, not by modifying an existing trained model.

Step 2: Update the request payload to specify the model type

When building a Protegrity Synthetic Data generation request, use the typeHint field to explicitly select the model.

Use the following to switch to a diffusion model:

"typeHint": {
  "model_type": "tabdiff"
}

Note: If typeHint is not specified, the system may automatically determine the most appropriate model during training.
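A client might assemble this request programmatically. The sketch below assumes a JSON request body. Only the typeHint object and the "tabdiff" value come from this page; the "dataset" field and the build_request helper are hypothetical placeholders, not the documented request schema.

```python
"""Sketch: build a generation request that selects the diffusion model.

Only {"typeHint": {"model_type": "tabdiff"}} is taken from the documentation.
The "dataset" key and build_request() are hypothetical placeholders.
"""
import json
from typing import Optional


def build_request(dataset: str, model_type: Optional[str] = None) -> dict:
    request = {"dataset": dataset}  # hypothetical field, not the real schema
    if model_type is not None:
        # Explicit model selection. Leaving typeHint out lets the system
        # determine the model during training.
        request["typeHint"] = {"model_type": model_type}
    return request


payload = build_request("customers.csv", model_type="tabdiff")
print(json.dumps(payload, indent=2))
```

Omitting typeHint entirely, rather than sending an empty value, keeps the automatic model selection described in the note above in effect.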

Step 3: Submit the updated request and trigger model training

Switching models requires a new training run. Protegrity Synthetic Data follows a structured pipeline that includes:

Configuration validation
Automatic preprocessing
Training of the Protegrity Synthetic Data generator model
Evaluation against real data
Protegrity Synthetic Data generation

Note: Training is not instantaneous and can take from minutes to hours depending on configuration and data size.
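The pipeline stages above can be sketched as an ordered sequence that every training run passes through. The TrainingRun class, the stage identifiers, and the "gan" model_type value are hypothetical illustrations (only "tabdiff" is documented on this page). The sketch only shows that switching models produces a new run rather than altering an existing one.

```python
"""Sketch: the training pipeline as an ordered list of stages.

Stage names mirror the documented pipeline; TrainingRun and the "gan"
model_type value are hypothetical, for illustration only.
"""
from dataclasses import dataclass, field

PIPELINE_STAGES = (
    "configuration_validation",
    "preprocessing",
    "generator_training",
    "evaluation",
    "generation",
)


@dataclass
class TrainingRun:
    model_type: str
    completed: list[str] = field(default_factory=list)

    def execute(self) -> None:
        # Every run passes through all stages in order; nothing is carried
        # over from a run that was trained with a different model type.
        for stage in PIPELINE_STAGES:
            self.completed.append(stage)


gan_run = TrainingRun("gan")  # hypothetical model_type value
gan_run.execute()

# Switching models means starting a fresh run, not editing gan_run.
diffusion_run = TrainingRun("tabdiff")
diffusion_run.execute()
```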

Step 4: Review evaluation results for the new model

After switching models and generating data:

Review evaluation and similarity metrics.
Validate privacy protection and analytical utility.

Protegrity Synthetic Data explicitly evaluates generated data against the real dataset as part of the workflow.
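To make "similarity metrics" concrete, the sketch below computes two generic statistics: the two-sample Kolmogorov-Smirnov statistic for a numeric column and the total variation distance for a categorical column. These are standard measures chosen for illustration and are not necessarily the metrics Protegrity Synthetic Data reports; the function names are hypothetical.

```python
"""Sketch: two generic real-vs-synthetic similarity checks.

Standard statistics chosen for illustration, not Protegrity's built-in
evaluation metrics. Both functions assume non-empty input lists.
"""
from bisect import bisect_right
from collections import Counter


def ks_statistic(real: list[float], synthetic: list[float]) -> float:
    """Maximum gap between the two empirical CDFs (0.0 = identical)."""
    a, b = sorted(real), sorted(synthetic)
    # Both empirical CDFs only change at observed values, so checking
    # every observed value finds the maximum gap.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )


def total_variation(real: list[str], synthetic: list[str]) -> float:
    """Half the L1 distance between category frequencies (0.0 = identical)."""
    p, q = Counter(real), Counter(synthetic)
    return 0.5 * sum(
        abs(p[c] / len(real) - q[c] / len(synthetic)) for c in set(p) | set(q)
    )
```

Both values range from 0.0 (identical distributions) to 1.0 (no overlap). Lower is closer, but the threshold for "good enough" depends on the workload.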

Step 5: Version or archive models as needed

This is an optional step. Protegrity Synthetic Data provides model management capabilities to track and manage trained models. Each training run produces a separate model artifact, which can be reused or archived independently.
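Because each training run yields its own artifact, a client can keep a simple record of runs and archive them independently. The sketch below is a hypothetical client-side bookkeeping structure, not Protegrity's model management API; ModelRegistry, ModelArtifact, the run IDs, and the "gan" model_type value are all illustrative assumptions.

```python
"""Sketch: a local record of model artifacts, one per training run.

Hypothetical client-side bookkeeping, not Protegrity's model management API.
"""
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelArtifact:
    run_id: str
    model_type: str
    created_at: datetime
    archived: bool = False


@dataclass
class ModelRegistry:
    artifacts: dict[str, ModelArtifact] = field(default_factory=dict)

    def register(self, run_id: str, model_type: str) -> ModelArtifact:
        # One artifact per training run; a run ID cannot be reused.
        if run_id in self.artifacts:
            raise ValueError(f"run {run_id!r} is already registered")
        artifact = ModelArtifact(run_id, model_type, datetime.now(timezone.utc))
        self.artifacts[run_id] = artifact
        return artifact

    def archive(self, run_id: str) -> None:
        # Archiving one artifact leaves every other run's artifact untouched.
        self.artifacts[run_id].archived = True

    def active(self, model_type: str) -> list[ModelArtifact]:
        """Non-archived artifacts trained with the given model type."""
        return [
            a for a in self.artifacts.values()
            if a.model_type == model_type and not a.archived
        ]


registry = ModelRegistry()
registry.register("run-001", "gan")      # hypothetical model_type value
registry.register("run-002", "tabdiff")
registry.archive("run-001")
```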

Last modified: March 24, 2026