Introduction on

Privacy-Preserving Characteristics

No Direct Link to Real Individuals

Protegrity Synthetic Data is generated from learned patterns in real datasets but does not contain any actual personal records. This ensures:

No 1:1 mapping between synthetic and real data.
No re-identification risk, even when used in sensitive domains, such as healthcare or finance.
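
One simple, partial check related to the first point is to confirm that no synthetic row is an exact copy of a real row. The sketch below is illustrative only: the function name `exact_copies` and the sample rows are hypothetical, and it is not part of the Protegrity product. Passing this check does not by itself establish the absence of re-identification risk.

```python
def exact_copies(real_rows, synthetic_rows):
    """Return the synthetic rows that are identical to some real row."""
    # Turn each row into a hashable, order-independent key.
    real_keys = {tuple(sorted(row.items())) for row in real_rows}
    return [row for row in synthetic_rows
            if tuple(sorted(row.items())) in real_keys]


# Hypothetical sample data for illustration.
real_rows = [
    {"age": 34, "zip": "90210", "diagnosis": "asthma"},
    {"age": 58, "zip": "10001", "diagnosis": "diabetes"},
]
synthetic_rows = [
    {"age": 36, "zip": "90405", "diagnosis": "asthma"},
    {"age": 58, "zip": "10001", "diagnosis": "diabetes"},  # accidental copy
]

leaks = exact_copies(real_rows, synthetic_rows)
print(leaks)
```

Any row returned by such a check would indicate a 1:1 match that should be investigated before the dataset is shared.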

Compliance with Privacy Regulations

General Data Protection Regulation (GDPR): Synthetic Data is considered anonymous under GDPR because it lacks identifiable links to real individuals.
Health Insurance Portability and Accountability Act (HIPAA): It qualifies under the Safe Harbor and Expert Determination methods. This makes it suitable for healthcare use cases because it is not classified as Protected Health Information (PHI).

Built-In Privacy Safeguards

Protegrity’s Synthetic Data solution includes multiple privacy-enhancing features:

Comparison with Other Privacy-Enhancing Technologies

This section compares Protegrity Synthetic Data with other data protection methods.

Pseudonymization replaces real data with tokens for certain attributes, such as Personally Identifiable Information (PII). However, this method still uses real data, and the analytical value is fully retained only for the attributes that are not tokenized.
Anonymization reduces the risk of re-identification by transforming quasi-identifiers. However, utility and privacy must be balanced carefully to minimize the impact on downstream usage.
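
The difference between the two methods can be sketched on a single record. This is a minimal illustration, not Protegrity's tokenization or anonymization: the keyed hash stands in for a real tokenizer, and the key, field names, and generalization rules (ten-year age bands, three-digit ZIP prefix) are assumptions chosen for the example.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; never hard-code a real key.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(record, pii_fields=("name",)):
    """Replace PII fields with keyed tokens; all other values stay real."""
    out = dict(record)
    for field in pii_fields:
        digest = hmac.new(SECRET_KEY, str(record[field]).encode("utf-8"),
                          hashlib.sha256)
        out[field] = "tok_" + digest.hexdigest()[:12]
    return out


def anonymize(record):
    """Drop the direct identifier and generalize the quasi-identifiers."""
    decade = (record["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",   # exact age becomes a range
        "zip_prefix": record["zip"][:3] + "**",  # ZIP code is truncated
        "diagnosis": record["diagnosis"],
    }


record = {"name": "Alice Example", "age": 34, "zip": "90210",
          "diagnosis": "asthma"}
pseudonymized = pseudonymize(record)
anonymized = anonymize(record)
print(pseudonymized)
print(anonymized)
```

In the pseudonymized record, the age, ZIP code, and diagnosis are still the real values, so their analytical value is unchanged. In the anonymized record, the generalized fields are safer to share but less precise, which is the utility and privacy trade-off described above.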

Protegrity Synthetic Data Overview

Protegrity Synthetic Data is a privacy-enhancing technology that learns from real datasets to create artificial data. The resulting data does not represent real individuals, yet it still provides strong analytical utility and preserves relationships between variables.

Key Characteristics of Protegrity Synthetic Data

Feature	Synthetic Data
Represents real people	False. It has no direct link to real individuals.
Closeness to real individuals	Low. It preserves the relationships between variables found in the real data without reproducing individual records.
Analytics and advanced analytics	High utility. It is suitable for ML, forecasting, and testing.
Maintain data types	Guaranteed. It preserves schema compatibility.
Internal and external sharing	Possible. It is compliant with privacy regulations like GDPR and HIPAA.
Simulating rare scenarios	Possible. It simulates rare scenarios, fraud patterns, or edge cases not present in production.
Risk of re-identification	Low. It carries a lower risk of re-identification than Anonymization or Pseudonymization.
Data progression	Possible. It can be used to create data trends that might change over time.
Cost	Moderate. It incurs varying costs depending on the complexity of the data and the synthesis methods used.
Scalability	High. It can be generated in large volumes as needed.
Maintenance	Moderate. It requires periodic updates to reflect changes in real data.

Protegrity Synthetic Data is a powerful tool for privacy compliance. It:

How Protegrity Synthetic Data is Generated

Protegrity Synthetic Data is a privacy-enhancing technology that creates artificial datasets. It works by learning from the structure and statistical properties of real data. It is designed to preserve analytical utility while protecting individual privacy. The process involves three key stages:

Stage 1: Extract Characteristics from Original Data

The system analyzes the original dataset to understand its structure and relationships:

Characteristics	Examples
Column types	string, integer, categorical
Value distributions	age ranges, frequency of pet types
Relationships between variables	age and pet ownership patterns
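
The three kinds of characteristics in the table above can be sketched as a small profiling step. This is a simplified illustration using only the Python standard library, not the product's actual analysis: the sample rows, the column names (`owner_id`, `age`, `pet`), the 20-year age bands, and the rule for telling categorical columns from free-form strings are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical original dataset for illustration.
rows = [
    {"owner_id": "A17", "age": 23, "pet": "cat"},
    {"owner_id": "B42", "age": 31, "pet": "dog"},
    {"owner_id": "C08", "age": 37, "pet": "dog"},
    {"owner_id": "D91", "age": 45, "pet": "cat"},
    {"owner_id": "E55", "age": 52, "pet": "dog"},
    {"owner_id": "F03", "age": 68, "pet": "bird"},
]


def infer_column_type(values, max_categories=5):
    """Classify a column as integer, categorical, or string."""
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    distinct = set(values)
    # Few distinct values that repeat: treat the column as categorical.
    if len(distinct) <= max_categories and len(distinct) < len(values):
        return "categorical"
    return "string"


def age_band(age, width=20):
    """Map an exact age to a range label such as '20-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def extract_profile(rows):
    """Collect column types, value distributions, and one relationship."""
    column_types = {col: infer_column_type([r[col] for r in rows])
                    for col in rows[0]}
    age_distribution = Counter(age_band(r["age"]) for r in rows)
    pet_frequency = Counter(r["pet"] for r in rows)
    # Relationship between variables: pet counts within each age band.
    pets_by_age_band = defaultdict(Counter)
    for r in rows:
        pets_by_age_band[age_band(r["age"])][r["pet"]] += 1
    return {
        "column_types": column_types,
        "age_distribution": dict(age_distribution),
        "pet_frequency": dict(pet_frequency),
        "pets_by_age_band": {band: dict(counts)
                             for band, counts in pets_by_age_band.items()},
    }


profile = extract_profile(rows)
print(profile)
```

The resulting profile holds only aggregate information (types, counts, and co-occurrence counts), which is the kind of input a later generation stage can work from instead of the individual records.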

Stage 2: Generate Fictional Records

Based on the extracted characteristics, synthetic records are created using advanced modeling techniques: