Supported Models
Models supported by Protegrity Synthetic Data 1.0.1
Protegrity Synthetic Data 1.0.1 supports tabular synthetic data generation using three model families: GAN‑based models, Tabular Variational Autoencoders (TVAE), and diffusion‑based models. These models generate privacy-safe synthetic tabular data while preserving:
- Column types and schema compatibility
- Statistical distributions
- Relationships and correlations between variables
- Utility for analytics and ML workloads
The following modeling techniques are supported:
- Generative Adversarial Networks (GANs) – The primary approach, used to learn the structure and statistical properties of real tabular datasets and generate synthetic data.
- Tabular Variational Autoencoders (TVAE) – A supported technique for synthetic data generation.
- Diffusion-based models – Also a supported technique for synthetic data generation.
All three model types learn from the structure and statistical properties of real datasets, but they differ in how they learn and generate data, and in the trade‑offs they offer.
These models have some inherent limitations:
- They require sufficient input data to train reliably.
- They are slower than anonymization or pseudonymization techniques.
- They cannot be used in scenarios that require re‑identification or record‑level traceability.
- Model training and maintenance introduce moderate cost and operational overhead.
- Data fidelity is statistical rather than exact, particularly for rare or highly constrained patterns.
Switching between the Protegrity Synthetic Data Models
Step 1: Decide the target model type
Protegrity Synthetic Data supports multiple generative model types, including:
- GAN‑based models
- Tabular Variational Autoencoder (TVAE) models
- Diffusion‑based models
Model selection is controlled using the request configuration, not by modifying an existing trained model.
Step 2: Update the request payload to specify the model type
When building a Protegrity Synthetic Data generation request, use the typeHint field to explicitly select the model.
Use the following to switch to a diffusion model:
"typeHint": {
    "model_type": "tabdiff"
}
Note: If typeHint is not specified, the system may automatically determine the most appropriate model during training.
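The payload change in Step 2 can be sketched in Python as follows. This is a minimal illustration, not the product's client API: only the typeHint field and the "tabdiff" value come from this page. The helper name with_model_type and the "data" placeholder key are hypothetical, and the rest of the request should be whatever your existing request already contains.

```python
import json


def with_model_type(request: dict, model_type: str) -> dict:
    """Return a copy of the request with typeHint.model_type set.

    Hypothetical helper: it leaves the original request untouched and
    keeps any other keys that typeHint may already contain.
    """
    updated = dict(request)
    existing_hint = request.get("typeHint", {})
    updated["typeHint"] = {**existing_hint, "model_type": model_type}
    return updated


# Placeholder request: "data" is NOT a documented field, it only stands in
# for the fields your existing generation request already has.
base_request = {"data": "<your existing request fields>"}

# "tabdiff" is the only model_type value shown on this page.
diffusion_request = with_model_type(base_request, "tabdiff")
print(json.dumps(diffusion_request, indent=2))
```

Building a new request object rather than editing one in place mirrors the point above: the model is chosen per request, and an existing trained model is not modified.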
Step 3: Submit the updated request and trigger model training
Switching models requires a new training run. Protegrity Synthetic Data follows a structured pipeline that includes:
- Configuration validation
- Automatic preprocessing
- Training of the Protegrity Synthetic Data generator model
- Evaluation against real data
- Protegrity Synthetic Data generation
Note: Training is not instantaneous and can take from minutes to hours depending on configuration and data size.
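Because training can run for minutes to hours, a client typically waits for it with a polling loop and a timeout. The sketch below shows one generic way to do that. It assumes nothing about the Protegrity API: get_status is a hypothetical callable you would supply, and the status strings "running", "completed", and "failed" are illustrative assumptions, not documented values.

```python
import time
from typing import Callable


def wait_for_training(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 4 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status is reported or the timeout expires.

    get_status is a hypothetical callable that returns the current status
    of the training run; the terminal values below are assumptions.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("training did not finish before the timeout")
        sleep(poll_seconds)


# Usage with a stand-in status source instead of a real service call.
_fake_statuses = iter(["running", "running", "completed"])
result = wait_for_training(lambda: next(_fake_statuses), poll_seconds=0.0)
print(result)
```

The poll interval and timeout are arbitrary defaults; choose values that match your configuration and data size.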
Step 4: Review evaluation results for the new model
After switching models and generating data:
- Review evaluation and similarity metrics.
- Validate privacy protection and analytical utility.
Protegrity Synthetic Data explicitly evaluates generated data against the real dataset as part of the workflow.
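In addition to the built-in evaluation, you can run a quick independent sanity check on the generated data. The sketch below is not the product's evaluation method and does not reproduce its metrics. It only compares per-column means and standard deviations and the correlation between two numeric columns, using toy values invented for illustration. The function names and column names are hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Assumes both lists have non-zero variance.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def compare_columns(real, synthetic):
    """Absolute difference in mean and standard deviation per column.

    real and synthetic map column names to lists of numbers.
    """
    report = {}
    for name in real:
        report[name] = {
            "mean_diff": abs(mean(real[name]) - mean(synthetic[name])),
            "std_diff": abs(pstdev(real[name]) - pstdev(synthetic[name])),
        }
    return report


def correlation_gap(real, synthetic, a, b):
    """Absolute difference in the a-b correlation between the two datasets."""
    return abs(pearson(real[a], real[b]) - pearson(synthetic[a], synthetic[b]))


# Toy values for illustration only; not output from any real model.
real = {"age": [20, 30, 40, 50], "income": [200, 300, 400, 500]}
synthetic = {"age": [22, 28, 41, 49], "income": [210, 290, 400, 500]}

report = compare_columns(real, synthetic)
gap = correlation_gap(real, synthetic, "age", "income")
print(report)
print(gap)
```

Small differences suggest that distributions and relationships were preserved for those columns. This check says nothing about privacy protection, which still needs to be validated separately.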
Step 5: Version or archive models as needed
This is an optional step. Protegrity Synthetic Data provides model management capabilities to track and manage trained models. Each training run produces a separate model artifact, which can be reused or archived independently.