Metrics Knowledge Base
Propensity Score Mean Squared Error (PMSE)
Measures how well synthetic data mimics real data by testing whether a machine learning classifier can distinguish between them.
How it works: Combines real and synthetic data with labels, trains a Random Forest classifier using cross-validation, and averages the squared difference between each record's out-of-fold predicted probability of being synthetic and the expected proportion of synthetic records.
Interpretation:
- 0.0 → Perfect similarity, classifier cannot distinguish synthetic from real data
- 0.25 → Poor similarity, classifier easily detects synthetic data
Why it matters: Provides a robust, general-purpose utility metric for tabular synthetic data that works across different feature types and preserves complex relationships.
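The pipeline above can be sketched as follows. This is a minimal illustration, not the exact implementation: the function name `pmse`, the toy data, and the Random Forest settings are assumptions, and it assumes equally sized real and synthetic samples so the worst-case score is 0.25.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def pmse(real, synthetic, n_splits=5, seed=0):
    """Propensity score MSE: 0.0 = indistinguishable, 0.25 = trivially separable."""
    X = np.vstack([real, synthetic])
    y = np.concatenate([np.zeros(len(real)), np.ones(len(synthetic))])  # 1 = synthetic
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Out-of-fold predicted probability that each record is synthetic
    p = cross_val_predict(clf, X, y, cv=n_splits, method="predict_proba")[:, 1]
    c = len(synthetic) / len(X)  # expected proportion of synthetic rows (0.5 here)
    return float(np.mean((p - c) ** 2))

rng = np.random.default_rng(0)
real = rng.normal(0, 1, size=(300, 4))
good_syn = rng.normal(0, 1, size=(300, 4))  # drawn from the same distribution
bad_syn = rng.normal(3, 1, size=(300, 4))   # clearly shifted distribution

pmse_good = pmse(real, good_syn)  # near 0: classifier is at chance
pmse_bad = pmse(real, bad_syn)    # near 0.25: classifier separates easily
```

Using out-of-fold predictions (rather than training-set predictions) matters here: it prevents the classifier's own overfitting from inflating the score.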
Jensen-Shannon Divergence (Information Preserved)
Measures how much statistical information from the original data is preserved in synthetic data by comparing probability distributions column-by-column.
How it works: Splits data into folds, calculates JS divergence for categorical and continuous columns separately, then converts divergence into an information preservation score and aggregates results.
Interpretation:
- 1.0 → Perfect information preservation
- 0.0 → Complete information loss
Why it matters: Provides column-level granularity to identify which features are poorly synthesized and ensures downstream analyses remain statistically valid.
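For a single continuous column, the core computation might look like the sketch below. The histogram binning, the `1 - divergence` mapping, and the function name are illustrative assumptions; categorical columns would use value counts instead of histograms, and the fold-wise aggregation is omitted for brevity.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_information_preserved(real_col, syn_col, bins=20):
    """Per-column information-preservation score in [0, 1].

    Continuous values are discretized onto a shared histogram grid; the
    base-2 JS divergence (bounded in [0, 1]) is mapped to 1 - divergence.
    """
    lo = min(real_col.min(), syn_col.min())
    hi = max(real_col.max(), syn_col.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(real_col, bins=edges)
    q, _ = np.histogram(syn_col, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    # scipy returns the JS *distance*; squaring it gives the divergence
    js_div = jensenshannon(p, q, base=2) ** 2
    return float(1.0 - js_div)

rng = np.random.default_rng(0)
real = rng.normal(0, 1, 5000)
good = rng.normal(0, 1, 5000)  # same distribution -> score near 1
bad = rng.normal(4, 1, 5000)   # barely overlapping -> score near 0

score_good = js_information_preserved(real, good)
score_bad = js_information_preserved(real, bad)
```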
Sensitive Attribute Reconstruction Potential (SARP)
Measures privacy risk by quantifying whether synthetic data makes sensitive attributes easier to reconstruct from quasi-identifiers, compared to the original dataset.
How it works: Identifies quasi-identifiers, trains a classifier to predict each sensitive attribute from quasi-identifiers, then compares prediction accuracy between synthetic and real data.
Interpretation:
- 0.0 → No privacy degradation
- 1.0 → Maximum privacy risk increase
Why it matters: Essential for privacy compliance and protects against adversaries trying to reconstruct sensitive information from synthetic datasets.
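One plausible formulation of this comparison is sketched below, assuming binary sensitive attributes and pre-identified quasi-identifier columns; the function name `sarp`, the clipped accuracy gap, and the classifier choice are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def sarp(real_qi, real_sens, syn_qi, syn_sens, seed=0):
    """Sensitive Attribute Reconstruction Potential (illustrative).

    Compares how accurately the sensitive attribute can be predicted
    from quasi-identifiers in the synthetic data versus the real data;
    a positive gap means synthesis made reconstruction easier.
    """
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    acc_real = cross_val_score(clf, real_qi, real_sens, cv=3).mean()
    acc_syn = cross_val_score(clf, syn_qi, syn_sens, cv=3).mean()
    # Clip so the score stays in [0, 1]: 0 = no privacy degradation
    return float(np.clip(acc_syn - acc_real, 0.0, 1.0))

rng = np.random.default_rng(0)
qi = rng.integers(0, 2, size=(600, 3))
real_sens = rng.integers(0, 2, 600)  # independent of the quasi-identifiers
syn_qi = qi.copy()
syn_sens = qi[:, 0]                  # synthetic data leaks sensitive attribute via QI

risk = sarp(qi, real_sens, syn_qi, syn_sens)  # clearly above 0
```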
Fairness
Evaluates whether synthetic data maintains fairness across different confounder groups by measuring consistency in data quality and similarity preservation between subgroups.
How it works: Splits data by confounder columns (e.g., race, gender, age groups), applies propensity score analysis within each group, then compares similarity scores across all groups.
Interpretation:
- 0.0 → Perfect fairness parity across all groups
- 1.0 → Maximum disparity between groups
Why it matters: Ensures synthetic data doesn’t amplify existing biases and prevents downstream models from discriminating against protected groups.
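The group-wise comparison can be sketched as below, using a per-group propensity score (here a logistic-regression pMSE, an assumption) and reporting the max-minus-min spread normalized by the worst-case pMSE of 0.25; the function names and the disparity formula are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def group_pmse(real, syn):
    """Propensity score MSE within a single confounder group."""
    X = np.vstack([real, syn])
    y = np.concatenate([np.zeros(len(real)), np.ones(len(syn))])
    p = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=3, method="predict_proba")[:, 1]
    return float(np.mean((p - len(syn) / len(X)) ** 2))

def fairness_disparity(real, syn, real_groups, syn_groups):
    """Max - min per-group pMSE, scaled to [0, 1]; 0 = perfect parity."""
    scores = [group_pmse(real[real_groups == g], syn[syn_groups == g])
              for g in np.unique(real_groups)]
    return (max(scores) - min(scores)) / 0.25

rng = np.random.default_rng(0)
real = rng.normal(0, 1, size=(400, 2))
groups = np.repeat([0, 1], 200)
syn = real + rng.normal(0, 0.1, size=(400, 2))
syn[groups == 1] += 3.0  # group 1 is badly synthesized -> large disparity

disparity = fairness_disparity(real, syn, groups, groups)
```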
TabSynDexScore (Composite Score)
Provides a comprehensive assessment of synthetic tabular data quality by combining statistical fidelity, correlation preservation, and distributional similarity into a unified score.
How it works: Combines three equally-weighted components: statistical similarity (comparing means, standard deviations, medians), correlation preservation (comparing relationship matrices), and distinguishability testing (training a classifier to tell real from synthetic).
Interpretation:
- 1.0 → Excellent quality, synthetic data nearly indistinguishable from real
- 0.0 → Poor quality, synthetic data generation requires major improvements
Why it matters: Captures multiple aspects of data quality in one metric, enabling easy comparison across datasets with granular component breakdown for targeted improvements.
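A simplified sketch of the three equally weighted components is shown below. The exact normalizations (relative-error mapping for statistics, halved mean absolute correlation gap, `2*(1 - accuracy)` for distinguishability) are assumptions chosen so each component lands in [0, 1]; they are not the published TabSynDex formulas.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def stat_similarity(real, syn):
    """Average agreement of per-column means, stds, and medians."""
    parts = []
    for fn in (np.mean, np.std, np.median):
        r, s = fn(real, axis=0), fn(syn, axis=0)
        rel = np.abs(r - s) / (np.abs(r) + np.abs(s) + 1e-9)
        parts.append(np.mean(1.0 - rel))
    return float(np.mean(parts))

def corr_similarity(real, syn):
    """How closely the synthetic correlation matrix tracks the real one."""
    cr, cs = np.corrcoef(real.T), np.corrcoef(syn.T)
    return float(1.0 - np.mean(np.abs(cr - cs)) / 2.0)

def indistinguishability(real, syn):
    """1 when a classifier performs at chance, 0 when it separates perfectly."""
    X = np.vstack([real, syn])
    y = np.concatenate([np.zeros(len(real)), np.ones(len(syn))])
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3).mean()
    return float(np.clip(2.0 * (1.0 - acc), 0.0, 1.0))

def tabsyndex(real, syn):
    """Composite score plus the per-component breakdown."""
    components = (stat_similarity(real, syn),
                  corr_similarity(real, syn),
                  indistinguishability(real, syn))
    return float(np.mean(components)), components

rng = np.random.default_rng(0)
real = rng.normal(5, 2, size=(500, 3))
good_syn = rng.normal(5, 2, size=(500, 3))
bad_syn = rng.normal(9, 2, size=(500, 3))  # shifted means drag the score down

score_good, _ = tabsyndex(real, good_syn)
score_bad, _ = tabsyndex(real, bad_syn)
```

Returning the component tuple alongside the composite supports the "granular component breakdown" mentioned above: a low composite can be traced to the specific component that failed.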
Membership Inference Attack (MIA)
Measures privacy risk by simulating attacks that attempt to determine whether specific individuals’ records were used in the training dataset that generated the synthetic data.
How it works: Simulates a skilled attacker who tries to match synthetic records to real ones using quasi-identifiers, then trains a model to predict whether a given real record was part of the dataset used to generate synthetic data.
Interpretation:
- 0.0 → Minimal privacy risk, synthetic data reveals little about real individuals
- 1.0 → Maximum privacy risk, synthetic data strongly indicates whether someone’s record was used
Why it matters: Essential for privacy compliance (GDPR, HIPAA) and prevents inference about specific people’s participation in datasets.
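A common distance-based instantiation of this attack is sketched below: members of the training set tend to sit closer to synthetic records than non-members do, so nearest-synthetic-record distance becomes the attacker's membership signal. The function name, the AUC rescaling, and the use of a KD-tree are illustrative assumptions, not the exact attack model described above.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.metrics import roc_auc_score

def mia_risk(members, non_members, synthetic):
    """Membership-inference risk from nearest-synthetic-record distances.

    AUC of the distance signal is rescaled so 0 = no leakage
    (attacker at chance) and 1 = perfect membership detection.
    """
    tree = cKDTree(synthetic)
    d_mem, _ = tree.query(members)
    d_non, _ = tree.query(non_members)
    y = np.concatenate([np.ones(len(members)), np.zeros(len(non_members))])
    # Smaller distance -> more likely a member, so negate distances as scores
    scores = -np.concatenate([d_mem, d_non])
    auc = roc_auc_score(y, scores)
    return float(max(0.0, 2.0 * auc - 1.0))

rng = np.random.default_rng(0)
members = rng.normal(0, 1, size=(200, 3))
non_members = rng.normal(0, 1, size=(200, 3))
leaky_syn = members + rng.normal(0, 0.01, size=members.shape)  # overfit generator
safe_syn = rng.normal(0, 1, size=(200, 3))                     # fresh samples

risk_leaky = mia_risk(members, non_members, leaky_syn)  # near 1
risk_safe = mia_risk(members, non_members, safe_syn)    # near 0
```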
Association Risk
Quantifies privacy risk by measuring how easily an attacker could link individual records between datasets, simulating re-identification attacks on anonymized or synthetic data.
How it works: Preprocesses datasets into comparable form, computes association probabilities using distance-based or model-based approaches, then reports the highest observed probability as the risk score.
Interpretation:
- 0.0 → No linkage risk
- 1.0 → Maximum association probability, critical re-identification risk
Why it matters: Essential for privacy compliance and protects against linkage attacks using auxiliary datasets, providing measurable privacy guarantees for data releases.
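A minimal distance-based variant might look like the following sketch, which converts pairwise distances into softmax "association probabilities" and reports the maximum. The function name, the Euclidean distance, and the temperature parameter are assumptions; a model-based approach would replace the softmax with learned match probabilities.

```python
import numpy as np
from scipy.spatial.distance import cdist

def association_risk(released, reference, temperature=0.1):
    """Distance-based linkage risk (illustrative), in (0, 1].

    For each released record, distances to every reference record are
    turned into a softmax association probability; the highest
    probability observed across all pairs is the reported risk.
    """
    d = cdist(released, reference)               # pairwise Euclidean distances
    logits = -d / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return float(p.max())

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, size=(100, 3))
exact_copies = reference.copy()                              # worst case: verbatim release
noisy_release = reference + rng.normal(0, 1.0, size=reference.shape)

risk_exact = association_risk(exact_copies, reference)  # near 1: trivial linkage
risk_noisy = association_risk(noisy_release, reference)
```

The temperature controls how sharply probability concentrates on the nearest reference record; it would need calibration against a realistic attacker model in practice.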