High-Level Workflow


Protegrity Synthetic Data follows a structured pipeline to generate synthetic data:

Configuration Validation
Optimal Real Data Usage
Automatic Data Preprocessing
Training of Protegrity Synthetic Data Generator Model
Evaluation Against Real Data
Protegrity Synthetic Data Generation
Machine Learning Operations

Configuration Validation

Training a Protegrity Synthetic Data generator can take from a couple of minutes to several hours, depending on the configuration used. To avoid wasting compute time on a run that cannot succeed, the configuration is validated before any training takes place. If any violations are found, descriptive exceptions are returned to the user.
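The validate-before-training idea can be illustrated with a minimal Python sketch. This is not the product's implementation: the `ConfigurationError` class, the `validate_config` and `train` functions, and the `source`, `target`, and `epochs` keys are all hypothetical names chosen for illustration. The sketch only shows the pattern of collecting every violation cheaply and raising one descriptive exception before the expensive step begins.

```python
class ConfigurationError(ValueError):
    """Raised when a configuration fails validation (hypothetical name)."""


def validate_config(config: dict) -> None:
    """Collect every violation, then raise a single descriptive error."""
    violations = []

    # Hypothetical required keys; the real request schema may differ.
    for key in ("source", "target"):
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            violations.append(f"'{key}' must be a non-empty string")

    # Hypothetical numeric setting, used only to illustrate a range check.
    epochs = config.get("epochs", 1)
    if isinstance(epochs, bool) or not isinstance(epochs, int) or epochs < 1:
        violations.append("'epochs' must be a positive integer")

    if violations:
        raise ConfigurationError("; ".join(violations))


def train(config: dict) -> str:
    """Run the cheap validation first, then the (placeholder) slow step."""
    validate_config(config)
    return "training started"  # stands in for the expensive training run
```

Reporting all violations in one exception, rather than stopping at the first, lets the user correct the whole configuration in a single pass.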

Building the Request Using the REST API


Identifying the Source and Target

In this step, you specify the source, which is the real dataset from which you want to produce synthetic data, and the target, which is the location where the generated synthetic data will be saved.

The following file formats are supported:

Comma separated values (CSV)

The following data storage types have been tested with Protegrity Synthetic Data:

Local File System
Amazon S3
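As a rough illustration of checking a source or target location against the format and storage lists above, the following Python sketch classifies a location string. The `describe_location` function and its return shape are hypothetical and are not part of the Protegrity REST API. The sketch assumes that local paths are written without a URL scheme (or with `file://`) and that Amazon S3 locations use the common `s3://bucket/key` form.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".csv"}  # only CSV is listed as supported
TESTED_STORAGES = {"": "local", "file": "local", "s3": "s3"}


def describe_location(location: str) -> dict:
    """Classify a source or target location (hypothetical helper).

    Assumes POSIX-style local paths; a Windows drive letter such as
    'C:' would be misread as a URL scheme by this simple sketch.
    """
    parsed = urlparse(location)
    scheme = parsed.scheme.lower()

    # Reject anything that is not a CSV file.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file format: {suffix or '(none)'}")

    # Untested storage types are flagged rather than rejected.
    return {
        "storage": TESTED_STORAGES.get(scheme, scheme),
        "tested": scheme in TESTED_STORAGES,
        "location": location,
    }
```

For example, `describe_location("s3://my-bucket/real/customers.csv")` reports the storage as `s3` and marks it as tested, while a path ending in `.parquet` raises a `ValueError`.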

The following data storage types can also be used with Protegrity Synthetic Data:
