Using Sample Protegrity Anonymization Jobs
1 - Sample Data Sets
Adult Dataset: The following is an extract of the dataset. The complete dataset is available in the adult.csv file in the samples directory.
sex;age;race;marital-status;education;native-country;citizenSince;weight;workclass;occupation;salary-class
Male;39;White;Never-married;Bachelors;United-States;08-01-1971;185.38;State-gov;Adm-clerical;<=50K
Male;50;White;Married-civ-spouse;Bachelors;United-States;19-04-1960;176.32;Self-emp-not-inc;Exec-managerial;<=50K
Male;38;White;Divorced;HS-grad;United-States;07-12-1971;159.13;Private;Handlers-cleaners;<=50K
Male;53;Black;Married-civ-spouse;11th;United-States;22-05-1957;170.45;Private;Handlers-cleaners;<=50K
Female;28;Black;Married-civ-spouse;Bachelors;Cuba;03-02-1982;178.79;Private;Prof-specialty;<=50K
Female;37;White;Married-civ-spouse;Masters;United-States;06-12-1972;161.65;Private;Exec-managerial;<=50K
Female;49;Black;Married-spouse-absent;9th;Jamaica;18-04-1961;162.73;Private;Other-service;<=50K
Male;52;White;Married-civ-spouse;HS-grad;United-States;21-05-1958;171.75;Self-emp-not-inc;Exec-managerial;>50K
Female;31;White;Never-married;Masters;United-States;31-12-1978;164.03;Private;Prof-specialty;>50K
Male;42;White;Married-civ-spouse;Bachelors;United-States;11-02-1968;186.33;Private;Exec-managerial;>50K
Male;37;Black;Married-civ-spouse;Some-college;United-States;06-12-1972;189.49;Private;Exec-managerial;>50K
Male;30;Asian-Pac-Islander;Married-civ-spouse;Bachelors;India;01-02-1980;178.70;State-gov;Prof-specialty;>50K
Female;23;White;Never-married;Bachelors;United-States;08-04-1987;183.22;Private;Adm-clerical;<=50K
Male;32;Black;Never-married;Assoc-acdm;United-States;01-01-1978;156.63;Private;Sales;<=50K
Male;34;Amer-Indian-Eskimo;Married-civ-spouse;7th-8th;Mexico;03-12-1975;173.41;Private;Transport-moving;<=50K
Male;25;White;Never-married;HS-grad;United-States;06-03-1985;170.72;Self-emp-not-inc;Farming-fishing;<=50K
Male;32;White;Never-married;HS-grad;United-States;01-01-1978;174.91;Private;Machine-op-inspct;<=50K
Male;38;White;Married-civ-spouse;11th;United-States;07-12-1971;176.47;Private;Sales;<=50K
Female;43;White;Divorced;Masters;United-States;12-02-1967;179.88;Self-emp-not-inc;Exec-managerial;>50K
Male;40;White;Married-civ-spouse;Doctorate;United-States;09-01-1970;170.80;Private;Prof-specialty;>50K
Female;54;Black;Separated;HS-grad;United-States;23-06-1956;171.61;Private;Other-service;<=50K
Male;35;Black;Married-civ-spouse;9th;United-States;04-12-1974;183.71;Federal-gov;Farming-fishing;<=50K
Male;43;White;Married-civ-spouse;11th;United-States;12-02-1967;158.63;Private;Transport-moving;<=50K
Female;59;White;Divorced;HS-grad;United-States;28-07-1951;181.64;Private;Tech-support;<=50K
Male;56;White;Married-civ-spouse;Bachelors;United-States;25-06-1954;171.80;Local-gov;Tech-support;>50K
Male;19;White;Never-married;HS-grad;United-States;12-05-1991;172.74;Private;Craft-repair;<=50K
Male;39;White;Divorced;HS-grad;United-States;08-01-1971;159.41;Private;Exec-managerial;<=50K
Male;49;White;Married-civ-spouse;HS-grad;United-States;18-04-1961;176.76;Private;Craft-repair;<=50K
Male;23;White;Never-married;Assoc-acdm;United-States;08-04-1987;164.43;Local-gov;Protective-serv;<=50K
Male;20;Black;Never-married;Some-college;United-States;11-05-1990;157.60;Private;Sales;<=50K
Male;45;White;Divorced;Bachelors;United-States;14-03-1965;176.38;Private;Exec-managerial;<=50K
Male;30;White;Married-civ-spouse;Some-college;United-States;01-02-1980;160.60;Federal-gov;Adm-clerical;<=50K
Male;22;Black;Married-civ-spouse;Some-college;United-States;09-04-1988;173.41;State-gov;Other-service;<=50K
Male;48;White;Never-married;11th;Puerto-Rico;17-04-1962;189.50;Private;Machine-op-inspct;<=50K
Male;21;White;Never-married;Some-college;United-States;10-05-1989;162.76;Private;Machine-op-inspct;<=50K
Female;19;White;Married-AF-spouse;HS-grad;United-States;12-05-1991;158.42;Private;Adm-clerical;<=50K
Male;48;White;Married-civ-spouse;Assoc-acdm;United-States;17-04-1962;160.75;Self-emp-not-inc;Prof-specialty;<=50K
Male;31;White;Married-civ-spouse;9th;United-States;31-12-1978;172.10;Private;Machine-op-inspct;<=50K
Male;53;White;Married-civ-spouse;Bachelors;United-States;22-05-1957;189.74;Self-emp-not-inc;Prof-specialty;<=50K
Male;24;White;Married-civ-spouse;Bachelors;United-States;07-04-1986;170.08;Private;Tech-support;<=50K
Female;49;White;Separated;HS-grad;United-States;18-04-1961;173.71;Private;Adm-clerical;<=50K
Male;25;White;Never-married;HS-grad;United-States;06-03-1985;160.52;Private;Handlers-cleaners;<=50K
Male;57;Black;Married-civ-spouse;Bachelors;United-States;26-07-1953;178.12;Federal-gov;Prof-specialty;>50K
Male;53;White;Married-civ-spouse;HS-grad;United-States;22-05-1957;186.11;Private;Machine-op-inspct;<=50K
Female;44;White;Divorced;Masters;United-States;13-02-1966;162.80;Private;Exec-managerial;<=50K
Male;41;White;Married-civ-spouse;Assoc-voc;United-States;10-01-1969;172.39;State-gov;Craft-repair;<=50K
Male;29;White;Never-married;Assoc-voc;United-States;02-02-1981;168.83;Private;Prof-specialty;<=50K
Female;25;Other;Married-civ-spouse;Some-college;United-States;06-03-1985;179.12;Private;Exec-managerial;<=50K
Female;47;White;Married-civ-spouse;Prof-school;Honduras;16-03-1963;163.02;Private;Prof-specialty;>50K
Male;50;White;Divorced;Bachelors;United-States;19-04-1960;172.18;Federal-gov;Exec-managerial;>50K
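The extract above is semicolon-delimited and stores citizenSince as day-month-year. The following is a minimal sketch of reading that layout with the Python standard library only. The helper name load_rows and the embedded three-row sample are illustrative and are not part of the product.

```python
import csv
import io
from datetime import datetime

# Three rows copied from the extract above, embedded so the sketch runs without adult.csv.
SAMPLE = """sex;age;race;marital-status;education;native-country;citizenSince;weight;workclass;occupation;salary-class
Male;39;White;Never-married;Bachelors;United-States;08-01-1971;185.38;State-gov;Adm-clerical;<=50K
Male;50;White;Married-civ-spouse;Bachelors;United-States;19-04-1960;176.32;Self-emp-not-inc;Exec-managerial;<=50K
Female;28;Black;Married-civ-spouse;Bachelors;Cuba;03-02-1982;178.79;Private;Prof-specialty;<=50K
"""


def load_rows(text):
    """Parse semicolon-delimited text into dicts with typed age, weight, and citizenSince."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text), delimiter=";"):
        row = dict(raw)
        row["age"] = int(row["age"])
        row["weight"] = float(row["weight"])
        # Dates in the extract are written as dd-mm-yyyy.
        row["citizenSince"] = datetime.strptime(row["citizenSince"], "%d-%m-%Y").date()
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
print(len(rows), rows[0]["age"], rows[0]["citizenSince"].isoformat())
```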
2 - Sample Requests for Protegrity Anonymization
Tree-based Aggregation for Attributes with k-Anonymity
This sample uses the following attributes:
- Source: Local file system
- Target: Amazon S3 bucket
- Data set: 1 Quasi Identifier
- Suppression: 0.01
- Privacy Model: K-Anonymity with a k value of 50
In this example, the data has custom delimiters.
{
"source": {
"type": "File",
"file": {
"name": "samples/adult.csv",
"props": {
"sep": ";"
}
}
},
"attributes": [
{
"name": "age",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Masking Based",
"hierarchyType": "Rule",
"rule": {
"masking": {
"maskOrder": "Right To Left",
"maskChar": "*",
"maxDomainSize": 2
}
}
}
}
],
"privacyModel": {
"k": {
"kValue": 50
}
},
"config": {
"maxSuppression": 0.01
},
"target": {
"type": "File",
"file": {
"name": "s3://<Your-S3-BucketName>/anon-adult-e1.csv",
"props": {
"lineterminator": "\n"
},
"accessOptions": {
"key": "<Your-S3-API Key>",
"secret": "<Your-S3-API Secret>"
}
}
}
}
#import the anonsdk library
import anonsdk as asdk
import pandas as pd
# s3 bucket credentials
s3_key = "<AWS_Key>"
s3_secret = "<AWS_Secret>"
#set the source path for anonymization
# dataset path
source_csv_path = "adult.csv"
# create Store Object source_datastore
source_datastore = asdk.FileDataStore(source_csv_path)
#Set the target path for anonymized result
# anonymized file path
target_csv_path = "s3://target/anon-adult-e1.csv"
# create Store Object target_datastore
target_datastore = asdk.FileDataStore(target_csv_path, access_options={"key": s3_key,"secret": s3_secret})
# Create connection Object with Rest API server
conn = asdk.Connection("https://anon.protegrity.com/")
df = pd.read_csv(source_csv_path,sep=";")
df.head()
# create AnonObject with connection, dataframe metadata and source path
anon_object = asdk.AnonElement(conn, df, source_datastore)
# configure masking of string datatype
anon_object["age"] = asdk.Gen_Mask(maskchar="*",maskOrder="R",maxLength=2)
#Configure K-anonymity , suppression in the dataset allowed
anon_object.config.k = asdk.K(50)
anon_object.config['maxSuppression'] = 0.01
# Send Anonymization request with Transformation Configuration with the target store
job = asdk.anonymize(anon_object,target_datastore ,force=True)
# check the status of the job <check the status iteratively until 'status': 'Completed' >
job.status()
# check the comparative risk statistics from the source and result dataset
job.riskStat()
# check the comparative utility statistics from the source and result dataset
job.utilityStat()
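To illustrate what this sample asks for, the following sketch masks age values from right to left and looks for the first masking level at which every remaining group has at least k members, given a suppression budget. It is a simplified illustration under stated assumptions, not the search that Protegrity Anonymization performs: the function names are hypothetical, and groups smaller than k are assumed to be suppressed as a whole.

```python
from collections import Counter


def mask_right_to_left(value, level, mask_char="*"):
    """Replace the last `level` characters of `value` with `mask_char`."""
    level = min(level, len(value))
    return value[: len(value) - level] + mask_char * level


def lowest_level(values, k, max_suppression, max_level):
    """Return (level, suppressed_rows) for the first masking level that reaches
    k-anonymity within the suppression budget, or None if no level does."""
    for level in range(max_level + 1):
        counts = Counter(mask_right_to_left(v, level) for v in values)
        # Rows in groups smaller than k would have to be suppressed.
        suppressed = sum(n for n in counts.values() if n < k)
        if suppressed <= max_suppression * len(values):
            return level, suppressed
    return None


# First ten ages from the data set extract, read as strings.
ages = ["39", "50", "38", "53", "28", "37", "49", "52", "31", "42"]
print(lowest_level(ages, k=2, max_suppression=0.1, max_level=2))
print(lowest_level(ages, k=3, max_suppression=0.1, max_level=2))
```

With only ten rows the k values here are deliberately small. The sample request uses k = 50 against the complete dataset.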
Tree-based Aggregation for Attributes with k-Anonymity, l-Diversity, and t-Closeness
This sample uses the following attributes:
- Source: Local file system
- Target: Amazon S3 bucket
- Data set: 4 Quasi Identifiers, 2 Sensitive Attributes
- Suppression: 0.10
- Privacy Model: K with value 3, T-closeness with value 0.2, and L-diversity with value 2
In this example, the generalization hierarchy for one attribute (race) is part of the request.
{
"source": {
"type": "File",
"file": {
"name": "samples/adult.csv",
"props": {
"sep": ";",
"decimal": ",",
"quotechar": "\"",
"escapechar": "\\",
"encoding": "utf-8"
}
}
},
"attributes": [
{
"name": "marital-status",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Tree Based",
"hierarchyType": "Data Store",
"dataStore": {
"type": "File",
"format": "CSV",
"file": {
"name": "samples/hierarchy/adult_hierarchy_marital-status.csv",
"props": {
"delimiter": ";",
"quotechar": "\"",
"header": null
}
}
}
}
},
{
"name": "native-country",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Tree Based",
"hierarchyType": "Data Store",
"dataStore": {
"type": "File",
"format": "CSV",
"file": {
"name": "samples/hierarchy/adult_hierarchy_native-country.csv",
"props": {
"delimiter": ";",
"quotechar": "\"",
"header": null
}
}
}
}
},
{
"name": "occupation",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Tree Based",
"hierarchyType": "Data Store",
"dataStore": {
"type": "File",
"format": "CSV",
"file": {
"name": "samples/hierarchy/adult_hierarchy_occupation.csv",
"props": {
"delimiter": ";",
"quotechar": "\"",
"header": null
}
}
}
}
},
{
"name": "race",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Tree Based",
"hierarchyType": "Data",
"data": {
"hierarchy": [
[
"White",
"*"
],
[
"Asian-Pac-Islander",
"*"
],
[
"Amer-Indian-Eskimo",
"*"
],
[
"Black",
"*"
]
],
"defaultHierarchy": [
"Other",
"*"
]
}
}
},
{
"name": "sex",
"dataType": "String",
"classificationType": "Sensitive Attribute"
},
{
"name": "salary-class",
"dataType": "String",
"classificationType": "Sensitive Attribute"
}
],
"config": {
"maxSuppression": 0.10
},
"privacyModel": {
"k": {
"kValue": 3
},
"tcloseness": [
{
"name": "salary-class",
"emdType": "EMD with equal ground distance",
"tFactor": 0.2
}
],
"ldiversity": [
{
"name": "sex",
"lFactor": 2,
"lType": "Distinct-l-diversity"
}
]
},
"target": {
"type": "File",
"file": {
"name": "s3://<Your-S3-BucketName>/anon-adult_klt.csv",
"props": {
"lineterminator": "\n"
},
"accessOptions": {
"key": "<Your-S3-API Key>",
"secret": "<Your-S3-API Secret>"
}
}
}
}
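The race attribute in the request above carries its hierarchy inline: each row lists a value followed by its generalizations, level by level. The following sketch shows how such a table can be read as a lookup. It assumes that values not listed in hierarchy fall back to the defaultHierarchy row; that reading of defaultHierarchy, and the function name generalize, are assumptions made for illustration.

```python
# Rows copied from the "race" attribute of the request above.
HIERARCHY = [
    ["White", "*"],
    ["Asian-Pac-Islander", "*"],
    ["Amer-Indian-Eskimo", "*"],
    ["Black", "*"],
]
DEFAULT_HIERARCHY = ["Other", "*"]

# Index the rows by their level-0 value for direct lookup.
LOOKUP = {row[0]: row for row in HIERARCHY}


def generalize(value, level):
    """Return `value` generalized to `level` (0 = original value)."""
    # Assumption: unlisted values use the default row.
    row = LOOKUP.get(value, DEFAULT_HIERARCHY)
    # Clamp to the top of the hierarchy if the level is too high.
    return row[min(level, len(row) - 1)]


for value in ["White", "Black", "Other"]:
    print(value, "->", generalize(value, 1))
```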
#import the anonsdk library
import anonsdk as asdk
import pandas as pd
# s3 bucket credentials
s3_key = "<AWS_Key>"
s3_secret = "<AWS_Secret>"
#set the source path for anonymization
# dataset path
source_csv_path = "adult.csv"
# create Store Object source_datastore
source_datastore = asdk.FileDataStore(source_csv_path)
#Set the target path for anonymized result
# anonymized file path
target_csv_path = "s3://target/anon-adult_klt.csv"
# create Store Object target_datastore
target_datastore = asdk.FileDataStore(target_csv_path, access_options={"key": s3_key,"secret": s3_secret})
# Create connection Object with Rest API server
conn = asdk.Connection("https://anon.protegrity.com/")
# create AnonObject with connection, dataframe metadata and source path
df = pd.read_csv(source_csv_path,sep=";")
df.head()
anon_object = asdk.AnonElement(conn, df, source_datastore)
# configuration
hierarchy_marital_status_path = "samples/hierarchy/adult_hierarchy_marital-status.csv"
# pandas returns a DataFrame directly, so no .compute() call is needed
df_ms = pd.read_csv(hierarchy_marital_status_path,sep=";")
print(df_ms)
anon_object['marital-status']=asdk.Gen_Tree(df_ms)
hierarchy_native_country_path = "samples/hierarchy/adult_hierarchy_native-country.csv"
df_nc = pd.read_csv(hierarchy_native_country_path,sep=";")
print(df_nc)
# the column name must match the data set header: native-country
anon_object['native-country']=asdk.Gen_Tree(df_nc)
hierarchy_occupation_path = "samples/hierarchy/adult_hierarchy_occupation.csv"
df_occ = pd.read_csv(hierarchy_occupation_path,sep=";")
print(df_occ)
anon_object['occupation']=asdk.Gen_Tree(df_occ)
# inline hierarchy for race; the values match the race column of the data set
df_race = pd.DataFrame(data={"lvl0":["White","Asian-Pac-Islander","Amer-Indian-Eskimo","Black","Other"], "lvl1":["*","*","*","*","*"]})
anon_object['race']=asdk.Gen_Tree(df_race)
#Configure K-anonymity , suppression allowed in the dataset
anon_object.config.k = asdk.K(3)
anon_object.config['maxSuppression'] = 0.10
#Configure L-diversity and T-closeness
anon_object["sex"]=asdk.LDiv(lfactor=2)
anon_object["salary-class"]=asdk.TClose(tfactor=0.2)
# Send Anonymization request with Transformation Configuration with the target store
job = asdk.anonymize(anon_object,target_datastore ,force=True)
# check the status of the job
job.status()
# check the comparative risk statistics from the source and result dataset
job.riskStat()
# check the comparative utility statistics from the source and result dataset
job.utilityStat()
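The privacy model in this sample combines distinct l-diversity on sex with t-closeness on salary-class. The following sketch checks both conditions on a small, made-up table. For t-closeness it uses the standard closed form of the Earth Mover's Distance when all ground distances are equal, which is half the sum of absolute differences between the two distributions. The table, the generalized values "Married" and "Single", and the function names are illustrative; this is not the product's implementation.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes


def distribution(rows, attribute):
    """Relative frequency of each value of `attribute`."""
    counts = Counter(row[attribute] for row in rows)
    return {value: n / len(rows) for value, n in counts.items()}


def emd_equal_distance(p, q):
    """EMD with equal ground distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


def check(rows, quasi_identifiers, l_attr, l_factor, t_attr, t_factor):
    """Return {class_key: (l_diversity_ok, t_closeness_ok)}."""
    overall = distribution(rows, t_attr)
    report = {}
    for key, members in equivalence_classes(rows, quasi_identifiers).items():
        distinct = len({m[l_attr] for m in members})
        distance = emd_equal_distance(distribution(members, t_attr), overall)
        report[key] = (distinct >= l_factor, distance <= t_factor)
    return report


# Made-up rows: (generalized marital-status, sex, salary-class).
raw = [
    ("Married", "Male", "<=50K"), ("Married", "Female", ">50K"),
    ("Married", "Male", ">50K"), ("Married", "Male", "<=50K"),
    ("Single", "Male", "<=50K"), ("Single", "Female", "<=50K"),
    ("Single", "Male", "<=50K"), ("Single", "Female", ">50K"),
]
rows = [dict(zip(("marital-status", "sex", "salary-class"), r)) for r in raw]
report = check(rows, ["marital-status"], "sex", 2, "salary-class", 0.2)
print(report)
```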
Micro-Aggregation and Generalization with Aggregates
This sample uses the following attributes:
- Source: Local file system
- Target: Amazon S3 bucket
- Data set: 2 Quasi Identifiers, 1 Aggregation-based Quasi Identifier, 2 Micro Aggregations, and 2 Sensitive Attributes
- Suppression: 0.50
- Privacy Model: K with value 5, T-closeness with value 0.2, and L-diversity with value 2
{
"source": {
"type": "File",
"file": {
"name": "samples/adult.csv",
"props": {
"sep": ";"
}
}
},
"attributes": [
{
"name": "age",
"dataType": "Integer",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Micro Aggregation",
"aggregateFn": "GMean"
},
{
"name": "marital-status",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Micro Aggregation",
"aggregateFn": "Mode"
},
{
"name": "native-country",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Tree Based",
"hierarchyType": "Data Store",
"dataStore": {
"type": "File",
"format": "CSV",
"file": {
"name": "samples/hierarchy/adult_hierarchy_native-country.csv",
"props": {
"delimiter": ";",
"quotechar": "\"",
"header": null
}
}
}
}
},
{
"name": "occupation",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Tree Based",
"hierarchyType": "Data Store",
"dataStore": {
"type": "File",
"format": "CSV",
"file": {
"name": "samples/hierarchy/adult_hierarchy_occupation.csv",
"props": {
"delimiter": ";",
"quotechar": "\"",
"header": null
}
}
}
}
},
{
"name": "race",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Aggregation Based",
"hierarchyType": "Aggregate",
"aggregateFn": "Mode"
}
},
{
"name": "sex",
"classificationType": "Sensitive Attribute",
"dataType": "String"
},
{
"name": "salary-class",
"classificationType": "Sensitive Attribute",
"dataType": "String"
}
],
"config": {
"maxSuppression": 0.50
},
"privacyModel": {
"k": {
"kValue": 5
},
"tcloseness": [
{
"name": "salary-class",
"emdType": "EMD with equal ground distance",
"tFactor": 0.2
}
],
"ldiversity": [
{
"name": "sex",
"lType": "Distinct-l-diversity",
"lFactor": 2
}
]
},
"target": {
"type": "File",
"file": {
"name": "s3://<Your-S3-BucketName>/anon-adult_micro.csv",
"props": {
"lineterminator": "\n"
},
"accessOptions": {
"key": "<Your-S3-API Key>",
"secret": "<Your-S3-API Secret>"
}
}
}
}
#import the anonsdk library
import anonsdk as asdk
import pandas as pd
# s3 bucket credentials
s3_key = "<AWS_Key>"
s3_secret = "<AWS_Secret>"
#set the source path for anonymization
# dataset path
source_csv_path = "adult.csv"
# create Store Object source_datastore
source_datastore = asdk.FileDataStore(source_csv_path)
#Set the target path for anonymized result
# anonymized file path
target_csv_path = "s3://target/anon-adult_micro.csv"
# create Store Object target_datastore
target_datastore = asdk.FileDataStore(target_csv_path, access_options={"key": s3_key,"secret": s3_secret})
# Create connection Object with Rest API server
conn = asdk.Connection("https://anon.protegrity.com/")
df = pd.read_csv(source_csv_path,sep=";")
df.head()
# create AnonObject with connection, dataframe metadata and source path
anon_object = asdk.AnonElement(conn, df, source_datastore)
# configuration
hierarchy_native_country_path = "samples/hierarchy/adult_hierarchy_native-country.csv"
df_nc = pd.read_csv(hierarchy_native_country_path,sep=";")
print(df_nc)
# the column name must match the data set header: native-country
anon_object['native-country']=asdk.Gen_Tree(df_nc)
hierarchy_occupation_path = "samples/hierarchy/adult_hierarchy_occupation.csv"
df_occ = pd.read_csv(hierarchy_occupation_path,sep=";")
print(df_occ)
# the occupation hierarchy applies to the occupation column
anon_object['occupation']=asdk.Gen_Tree(df_occ)
# applying aggregation rules
anon_object['age']=asdk.MicroAgg(asdk.AggregateFunction.GMean)
anon_object['race']=asdk.Gen_Agg(asdk.AggregateFunction.Mode)
# applying micro-aggregation rule
anon_object['marital-status']=asdk.MicroAgg(asdk.AggregateFunction.Mode)
#Configure K-anonymity , suppression in the dataset allowed
anon_object.config.k = asdk.K(5)
anon_object.config['maxSuppression'] = 0.50
#Configure L-diversity and T-closeness
anon_object["sex"]=asdk.LDiv(lfactor=2)
anon_object["salary-class"]=asdk.TClose(tfactor=0.2)
# Send Anonymization request with Transformation Configuration with the target store
job = asdk.anonymize(anon_object,target_datastore ,force=True)
# check the status of the job
job.status()
# check the comparative risk statistics from the source and result dataset
job.riskStat()
# check the comparative utility statistics from the source and result dataset
job.utilityStat()
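Micro-aggregation replaces each value in a group of similar records with a single aggregate for that group. The following sketch applies the two aggregate functions used in this sample, geometric mean for age and mode for marital-status, to groups that are already formed. How Protegrity Anonymization forms the groups is not covered here; the groups, their values, and the function name micro_aggregate are illustrative.

```python
import statistics


def micro_aggregate(groups, numeric_attr, categorical_attr):
    """Replace each group's values with the group's geometric mean and mode."""
    result = []
    for group in groups:
        gmean = round(statistics.geometric_mean([row[numeric_attr] for row in group]), 2)
        mode = statistics.mode([row[categorical_attr] for row in group])
        for row in group:
            # Copy the row so the input data stays unchanged.
            result.append({**row, numeric_attr: gmean, categorical_attr: mode})
    return result


# Two illustrative groups of records with similar ages.
groups = [
    [
        {"age": 39, "marital-status": "Never-married"},
        {"age": 38, "marital-status": "Divorced"},
        {"age": 37, "marital-status": "Never-married"},
    ],
    [
        {"age": 50, "marital-status": "Married-civ-spouse"},
        {"age": 53, "marital-status": "Married-civ-spouse"},
        {"age": 52, "marital-status": "Married-civ-spouse"},
    ],
]
aggregated = micro_aggregate(groups, "age", "marital-status")
for row in aggregated:
    print(row)
```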
Parquet File Format
This sample uses the following attributes:
- Source: Local file system
- Target: Amazon S3 bucket in the Parquet format
- Data set: 4 Quasi Identifiers, 1 Aggregation-based Quasi Identifier, 1 Micro Aggregation, and 1 Sensitive Attribute
- Suppression: 0.4
- Privacy Model: K with value 350 and L-diversity with value 2
In this example, for an attribute, the generalization hierarchy is part of the request.
{
"source": {
"type": "File",
"file": {
"name": "samples/adult.csv",
"props": {
"sep": ";",
"decimal": ",",
"quotechar": "\"",
"escapechar": "\\",
"encoding": "utf-8"
}
}
},
"attributes": [
{
"name": "age",
"dataType": "Integer",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"hierarchyType": "Rule",
"type": "Rounding",
"rule": {
"interval": {
"levels": [
"5",
"10",
"50",
"100"
],
"lowerBound":"5",
"upperBound":"100"
}
}
}
},
{
"name": "marital-status",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Micro Aggregation",
"aggregateFn": "Mode"
},
{
"name": "citizenSince",
"dataType": "Date",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Rounding",
"hierarchyType": "Rule",
"rule": {
"daterange": {
"levels": [
"WD.M.Y",
"FD.M.Y",
"QTR.Y",
"Y"
]
}
}
},
"props": {
"dateformat": "dd-mm-yyyy"
}
},
{
"name": "occupation",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Tree Based",
"hierarchyType": "Data Store",
"dataStore": {
"type": "File",
"format": "CSV",
"file": {
"name": "samples/hierarchy/adult_hierarchy_occupation.csv",
"props": {
"delimiter": ";",
"quotechar": "\"",
"header": null
}
}
}
}
},
{
"name": "race",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"dataType": "String",
"generalization": {
"type": "Aggregation Based",
"hierarchyType": "Aggregate",
"aggregateFn": "Mode"
}
},
{
"name": "salary-class",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Masking Based",
"hierarchyType": "Rule",
"rule": {
"masking": {
"maskOrder": "Left To Right",
"maskChar": "*",
"maxDomainSize": 3
}
}
}
},
{
"name": "sex",
"dataType": "String",
"classificationType": "Sensitive Attribute"
}
],
"config": {
"maxSuppression": 0.4,
"redactOutliers": true,
"suppressionData": "Any"
},
"privacyModel": {
"k": {
"kValue": 350
},
"ldiversity": [
{
"name": "sex",
"lType": "Distinct-l-diversity",
"lFactor": 2
}
]
},
"target": {
"type": "File",
"file": {
"name": "s3://<Your-S3-BucketName>/anon-adult-rules",
"format": "Parquet",
"accessOptions": {
"key": "<Your-S3-API Key>",
"secret": "<Your-S3-API Secret>"
}
}
}
}
This sample is not applicable to the SDK functions.
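The Rounding rule for age in the request above lists the interval levels 5, 10, 50, and 100. The following sketch shows one plausible reading of those levels, in which each level is a bucket width and a value is replaced by the bucket that contains it. This interpretation, the output notation, and the function name round_to_interval are assumptions; the lowerBound and upperBound settings are not modeled.

```python
def round_to_interval(value, width):
    """Return the half-open interval [low, high) of `width` that contains `value`."""
    low = (value // width) * width
    return f"[{low}, {low + width})"


# Interval levels from the request above, read as bucket widths (assumption).
LEVELS = [5, 10, 50, 100]

for width in LEVELS:
    print(width, round_to_interval(39, width))
```

Under this reading, each higher level produces a wider bucket, so fewer distinct values remain and larger groups form.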
Retaining and Redacting
This sample uses the following attributes:
- Source: Local file system
- Target: Amazon S3 bucket in the Parquet format
- Data set: 2 Quasi Identifiers, 1 Aggregation-based Quasi Identifier, 1 Micro Aggregation, 1 Non-Sensitive Attribute, 1 Identifying Attribute, and 2 Sensitive Attributes
- Suppression: 0.10
- Privacy Model: K with value 200 and L-diversity with value 2
In this example, for an attribute, the generalization hierarchy is part of the request.
{
"source": {
"type": "File",
"file": {
"name": "samples/adult.csv",
"props": {
"sep": ";",
"decimal": ",",
"quotechar": "\"",
"escapechar": "\\",
"encoding": "utf-8"
}
}
},
"attributes": [
{
"name": "age",
"dataType": "Integer",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Rounding",
"hierarchyType": "Rule",
"rule": {
"interval": {
"levels": [
"5",
"10",
"50",
"100"
]
}
}
}
},
{
"name": "marital-status",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Micro Aggregation",
"aggregateFn": "Mode"
},
{
"name": "occupation",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Tree Based",
"hierarchyType": "Data Store",
"dataStore": {
"type": "File",
"format": "CSV",
"file": {
"name": "samples/hierarchy/adult_hierarchy_occupation.csv",
"props": {
"delimiter": ";",
"quotechar": "\"",
"header": null
}
}
}
}
},
{
"name": "race",
"dataType": "String",
"classificationType": "Quasi Identifier",
"dataTransformationType": "Generalization",
"generalization": {
"type": "Aggregation Based",
"hierarchyType": "Aggregate",
"aggregateFn": "Mode"
}
},
{
"name": "citizenSince",
"dataType": "Date",
"classificationType": "Identifying Attribute"
},
{
"name": "education",
"dataType": "String",
"classificationType": "Non-Sensitive Attribute"
},
{
"name": "salary-class",
"dataType": "String",
"classificationType": "Sensitive Attribute"
},
{
"name": "sex",
"dataType": "String",
"classificationType": "Sensitive Attribute"
}
],
"config": {
"maxSuppression": 0.10,
"suppressionData": "Any"
},
"privacyModel": {
"k": {
"kValue": 200
},
"ldiversity": [
{
"name": "sex",
"lType": "Distinct-l-diversity",
"lFactor": 2
},
{
"name": "salary-class",
"lType": "Distinct-l-diversity",
"lFactor": 2
}
]
},
"target": {
"type": "File",
"file": {
"name": "s3://<Your-S3-BucketName>/anon-adult_retd",
"format": "Parquet",
"accessOptions": {
"key": "<Your-S3-API Key>",
"secret": "<Your-S3-API Secret>"
}
}
}
}
# import the anonsdk library
import anonsdk as asdk
import pandas as pd
# s3 bucket credentials
s3_key = "<AWS_Key>"
s3_secret = "<AWS_Secret>"
# set the source path for anonymization
# dataset path
source_csv_path = "adult.csv"
# create Store Object source_datastore
source_datastore = asdk.FileDataStore(source_csv_path)
# Set the target path for anonymized result
# anonymized file path
target_csv_path = "s3://target/anon-adult_retd"
# create Store Object target_datastore
target_datastore = asdk.FileDataStore(target_csv_path, access_options={"key": s3_key, "secret": s3_secret})
# Create connection Object with Rest API server
conn = asdk.Connection("https://anon.protegrity.com/")
df = pd.read_csv(source_csv_path, sep=";")
df.head()
# create AnonObject with connection, dataframe metadata and source path
anon_object = asdk.AnonElement(conn, df, source_datastore)
# configuration
hierarchy_occupation_path = "samples/hierarchy/adult_hierarchy_occupation.csv"
df_occ = pd.read_csv(hierarchy_occupation_path, sep=";")
print(df_occ)
# the occupation hierarchy applies to the occupation column
anon_object['occupation'] = asdk.Gen_Tree(df_occ)
anon_object['marital-status'] = asdk.MicroAgg(asdk.AggregateFunction.Mode)
anon_object['race'] = asdk.Gen_Agg(asdk.AggregateFunction.Mode)
anon_object['age'] = asdk.Gen_Interval([5, 10, 50, 100])
anon_object['citizenSince'] = asdk.Preserve()
anon_object['education'] = asdk.Preserve()
anon_object['salary-class'] = asdk.Redact()
anon_object['sex'] = asdk.Redact()
# Configure K-anonymity , suppression in the dataset allowed
anon_object.config.k = asdk.K(200)
anon_object.config['maxSuppression'] = 0.10
# Configure L-diversity
anon_object["sex"] = asdk.LDiv(lfactor=2)
anon_object["salary-class"] = asdk.LDiv(lfactor=2)
# Send Anonymization request with Transformation Configuration with the target store
job = asdk.anonymize(anon_object, target_datastore, force=True)
# check the status of the job
job.status()
# check the comparative risk statistics from the source and result dataset
job.riskStat()
# check the comparative utility statistics from the source and result dataset
job.utilityStat()
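The SDK samples check the job status once, while the first sample notes that the status should be checked iteratively until it reports 'status': 'Completed'. The following sketch is one way to write that loop. It assumes that the status call returns a dictionary with a "status" key, as that note suggests. The helper name wait_for_completion, the "Running" value used in the demonstration, and the timeout handling are hypothetical.

```python
import time


def wait_for_completion(get_status, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Call get_status() until it reports 'Completed' or the timeout elapses."""
    waited = 0.0
    while True:
        info = get_status()
        if info.get("status") == "Completed":
            return info
        if waited >= timeout_seconds:
            raise TimeoutError(f"job not completed after {waited} seconds: {info}")
        sleep(poll_seconds)
        waited += poll_seconds


# Demonstration with a stand-in for job.status; no server is contacted.
fake_statuses = iter([{"status": "Running"}, {"status": "Running"}, {"status": "Completed"}])
final = wait_for_completion(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)
```

With a real job object the call would be wait_for_completion(job.status), followed by job.riskStat() and job.utilityStat().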
3 - Samples for Cloud-related Source and Destination Files
"source": {
"type": "File",
"file": {
"name": "s3://<path_to_dataset>",
"accessOptions": {
"key": "API Key",
"secret": "Secret Key"
}
}
}
"source": {
"type": "File",
"file": {
"name": "adl://<path-to-dataset>",
"accessOptions":{
"tenant_id": "<Tenant_ID>",
"client_id": "<Client_ID>",
"client_secret": "<Client_Secret_Key>"
}
}
}
"source": {
"type": "File",
"file": {
"name": "abfs://<path_to_source_file>",
"accessOptions":{
"account_name": "<account_name>",
"account_key": "<Account_key>"
}
}
},
"format": "CSV"
}
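The fragments above place credentials directly in the request. The following sketch builds the Amazon S3 source fragment in Python and reads the key and secret from environment variables, so that they do not need to be stored in a request file. The field names come from the fragment above. The function name s3_source and the variable names ANON_S3_KEY and ANON_S3_SECRET are hypothetical.

```python
import json
import os


def s3_source(path, environ=os.environ):
    """Build the "source" fragment for a file stored in Amazon S3."""
    return {
        "source": {
            "type": "File",
            "file": {
                "name": path,
                "accessOptions": {
                    # Hypothetical variable names; use whatever your environment defines.
                    "key": environ["ANON_S3_KEY"],
                    "secret": environ["ANON_S3_SECRET"],
                },
            },
        }
    }


# Demonstration with placeholder values passed explicitly instead of real credentials.
demo_env = {"ANON_S3_KEY": "<key>", "ANON_S3_SECRET": "<secret>"}
fragment = s3_source("s3://<path_to_dataset>", demo_env)
print(json.dumps(fragment, indent=2))
```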