Using Sample Protegrity Anonymization Jobs

Sample anonymization jobs that you can use for working with and testing Protegrity Anonymization.

1 - Sample Data Sets

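Use the following dataset to test Protegrity Anonymization. It combines categorical, numeric, and date attributes (for example, race, age, weight, and citizenSince), so you can use it to try out the different transformation types and privacy models shown in the sample requests.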

Adult Dataset: Here is an extract of the dataset. The complete dataset is available in the adult.csv file in the samples directory.

sex;age;race;marital-status;education;native-country;citizenSince;weight;workclass;occupation;salary-class
Male;39;White;Never-married;Bachelors;United-States;08-01-1971;185.38;State-gov;Adm-clerical;<=50K
Male;50;White;Married-civ-spouse;Bachelors;United-States;19-04-1960;176.32;Self-emp-not-inc;Exec-managerial;<=50K
Male;38;White;Divorced;HS-grad;United-States;07-12-1971;159.13;Private;Handlers-cleaners;<=50K
Male;53;Black;Married-civ-spouse;11th;United-States;22-05-1957;170.45;Private;Handlers-cleaners;<=50K
Female;28;Black;Married-civ-spouse;Bachelors;Cuba;03-02-1982;178.79;Private;Prof-specialty;<=50K
Female;37;White;Married-civ-spouse;Masters;United-States;06-12-1972;161.65;Private;Exec-managerial;<=50K
Female;49;Black;Married-spouse-absent;9th;Jamaica;18-04-1961;162.73;Private;Other-service;<=50K
Male;52;White;Married-civ-spouse;HS-grad;United-States;21-05-1958;171.75;Self-emp-not-inc;Exec-managerial;>50K
Female;31;White;Never-married;Masters;United-States;31-12-1978;164.03;Private;Prof-specialty;>50K
Male;42;White;Married-civ-spouse;Bachelors;United-States;11-02-1968;186.33;Private;Exec-managerial;>50K
Male;37;Black;Married-civ-spouse;Some-college;United-States;06-12-1972;189.49;Private;Exec-managerial;>50K
Male;30;Asian-Pac-Islander;Married-civ-spouse;Bachelors;India;01-02-1980;178.70;State-gov;Prof-specialty;>50K
Female;23;White;Never-married;Bachelors;United-States;08-04-1987;183.22;Private;Adm-clerical;<=50K
Male;32;Black;Never-married;Assoc-acdm;United-States;01-01-1978;156.63;Private;Sales;<=50K
Male;34;Amer-Indian-Eskimo;Married-civ-spouse;7th-8th;Mexico;03-12-1975;173.41;Private;Transport-moving;<=50K
Male;25;White;Never-married;HS-grad;United-States;06-03-1985;170.72;Self-emp-not-inc;Farming-fishing;<=50K
Male;32;White;Never-married;HS-grad;United-States;01-01-1978;174.91;Private;Machine-op-inspct;<=50K
Male;38;White;Married-civ-spouse;11th;United-States;07-12-1971;176.47;Private;Sales;<=50K
Female;43;White;Divorced;Masters;United-States;12-02-1967;179.88;Self-emp-not-inc;Exec-managerial;>50K
Male;40;White;Married-civ-spouse;Doctorate;United-States;09-01-1970;170.80;Private;Prof-specialty;>50K
Female;54;Black;Separated;HS-grad;United-States;23-06-1956;171.61;Private;Other-service;<=50K
Male;35;Black;Married-civ-spouse;9th;United-States;04-12-1974;183.71;Federal-gov;Farming-fishing;<=50K
Male;43;White;Married-civ-spouse;11th;United-States;12-02-1967;158.63;Private;Transport-moving;<=50K
Female;59;White;Divorced;HS-grad;United-States;28-07-1951;181.64;Private;Tech-support;<=50K
Male;56;White;Married-civ-spouse;Bachelors;United-States;25-06-1954;171.80;Local-gov;Tech-support;>50K
Male;19;White;Never-married;HS-grad;United-States;12-05-1991;172.74;Private;Craft-repair;<=50K
Male;39;White;Divorced;HS-grad;United-States;08-01-1971;159.41;Private;Exec-managerial;<=50K
Male;49;White;Married-civ-spouse;HS-grad;United-States;18-04-1961;176.76;Private;Craft-repair;<=50K
Male;23;White;Never-married;Assoc-acdm;United-States;08-04-1987;164.43;Local-gov;Protective-serv;<=50K
Male;20;Black;Never-married;Some-college;United-States;11-05-1990;157.60;Private;Sales;<=50K
Male;45;White;Divorced;Bachelors;United-States;14-03-1965;176.38;Private;Exec-managerial;<=50K
Male;30;White;Married-civ-spouse;Some-college;United-States;01-02-1980;160.60;Federal-gov;Adm-clerical;<=50K
Male;22;Black;Married-civ-spouse;Some-college;United-States;09-04-1988;173.41;State-gov;Other-service;<=50K
Male;48;White;Never-married;11th;Puerto-Rico;17-04-1962;189.50;Private;Machine-op-inspct;<=50K
Male;21;White;Never-married;Some-college;United-States;10-05-1989;162.76;Private;Machine-op-inspct;<=50K
Female;19;White;Married-AF-spouse;HS-grad;United-States;12-05-1991;158.42;Private;Adm-clerical;<=50K
Male;48;White;Married-civ-spouse;Assoc-acdm;United-States;17-04-1962;160.75;Self-emp-not-inc;Prof-specialty;<=50K
Male;31;White;Married-civ-spouse;9th;United-States;31-12-1978;172.10;Private;Machine-op-inspct;<=50K
Male;53;White;Married-civ-spouse;Bachelors;United-States;22-05-1957;189.74;Self-emp-not-inc;Prof-specialty;<=50K
Male;24;White;Married-civ-spouse;Bachelors;United-States;07-04-1986;170.08;Private;Tech-support;<=50K
Female;49;White;Separated;HS-grad;United-States;18-04-1961;173.71;Private;Adm-clerical;<=50K
Male;25;White;Never-married;HS-grad;United-States;06-03-1985;160.52;Private;Handlers-cleaners;<=50K
Male;57;Black;Married-civ-spouse;Bachelors;United-States;26-07-1953;178.12;Federal-gov;Prof-specialty;>50K
Male;53;White;Married-civ-spouse;HS-grad;United-States;22-05-1957;186.11;Private;Machine-op-inspct;<=50K
Female;44;White;Divorced;Masters;United-States;13-02-1966;162.80;Private;Exec-managerial;<=50K
Male;41;White;Married-civ-spouse;Assoc-voc;United-States;10-01-1969;172.39;State-gov;Craft-repair;<=50K
Male;29;White;Never-married;Assoc-voc;United-States;02-02-1981;168.83;Private;Prof-specialty;<=50K
Female;25;Other;Married-civ-spouse;Some-college;United-States;06-03-1985;179.12;Private;Exec-managerial;<=50K
Female;47;White;Married-civ-spouse;Prof-school;Honduras;16-03-1963;163.02;Private;Prof-specialty;>50K
Male;50;White;Divorced;Bachelors;United-States;19-04-1960;172.18;Federal-gov;Exec-managerial;>50K
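To inspect the extract locally before running a job, you can load it with pandas. The following sketch embeds three rows from the extract so that it runs on its own; to load the full file, pass the path to samples/adult.csv instead of the StringIO object. It assumes that citizenSince uses the dd-mm-yyyy format, which matches the dateformat property used in the Parquet sample.

```python
import io

import pandas as pd

# Three rows copied from the extract above; use "samples/adult.csv" for the full dataset.
SAMPLE = """sex;age;race;marital-status;education;native-country;citizenSince;weight;workclass;occupation;salary-class
Male;39;White;Never-married;Bachelors;United-States;08-01-1971;185.38;State-gov;Adm-clerical;<=50K
Female;28;Black;Married-civ-spouse;Bachelors;Cuba;03-02-1982;178.79;Private;Prof-specialty;<=50K
Male;30;Asian-Pac-Islander;Married-civ-spouse;Bachelors;India;01-02-1980;178.70;State-gov;Prof-specialty;>50K
"""

# The file is semicolon-delimited, so sep=";" is required.
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# citizenSince is stored day first (dd-mm-yyyy), for example 31-12-1978.
df["citizenSince"] = pd.to_datetime(df["citizenSince"], format="%d-%m-%Y")

print(df.dtypes)
print(df[["age", "citizenSince", "salary-class"]])
```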

2 - Sample Requests for Protegrity Anonymization

Use the sample requests provided here as templates or guidelines, and modify them to anonymize your own dataset.
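Because the requests are JSON documents, you can also adjust a template programmatically. The following minimal sketch uses only Python's standard json module on a trimmed, hypothetical excerpt of the first sample request (it is not a complete request); the bucket name my-bucket is a placeholder.

```python
import json

# Hypothetical, trimmed excerpt of the first sample request (not a complete request).
template = """
{
    "privacyModel": {"k": {"kValue": 50}},
    "config": {"maxSuppression": 0.01},
    "target": {"type": "File", "file": {"name": "s3://<Your-S3-BucketName>/anon-adult-e1.csv"}}
}
"""

request = json.loads(template)

# Adjust the privacy model, the suppression limit, and the output location.
request["privacyModel"]["k"]["kValue"] = 10
request["config"]["maxSuppression"] = 0.05
request["target"]["file"]["name"] = "s3://my-bucket/anon-adult-k10.csv"  # placeholder bucket

print(json.dumps(request, indent=4))
```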

Masking-based Generalization for Attributes with k-Anonymity

This sample uses the following attributes:

  • Source: Local file system
  • Target: Amazon S3 bucket
  • Data set: 1 Quasi Identifier
  • Suppression: 0.01
  • Privacy Model: K-anonymity with k value 50
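Under k-anonymity, every combination of quasi-identifier values must be shared by at least k records; records in smaller groups (equivalence classes) must be generalized further or suppressed. maxSuppression is assumed here to be the fraction of all records that may be suppressed (1% in this sample). The following pandas sketch is a simplified local check, not how Protegrity Anonymization searches for a solution, and it uses k=3 because the eight-record example is far smaller than the k value of 50 used in this sample.

```python
import pandas as pd

def k_anonymity_report(df, quasi_identifiers, k):
    """Return (smallest equivalence-class size, fraction of rows in classes smaller than k)."""
    # Rows that share all quasi-identifier values form one equivalence class.
    sizes = df.groupby(quasi_identifiers).size()
    smallest = int(sizes.min())
    # Rows in classes smaller than k must be generalized further or suppressed.
    at_risk = int(sizes[sizes < k].sum()) / len(df)
    return smallest, at_risk

# Ages of the first eight records in the extract.
df = pd.DataFrame({"age": [39, 50, 38, 53, 28, 37, 49, 52]})
# Mask the last digit, similar in spirit to the "Right To Left" masking used in this sample.
df["age_masked"] = df["age"].astype(str).str[0] + "*"

max_suppression = 0.01
for column in ["age", "age_masked"]:
    smallest, at_risk = k_anonymity_report(df, [column], k=3)
    print(column, smallest, at_risk, at_risk <= max_suppression)
```

Masking reduces the share of at-risk records from all of them to a quarter, but in this tiny example that is still far above a 1% suppression limit, which is why a much larger dataset is needed for a k value of 50.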

In this example, the source data uses a custom delimiter (a semicolon), which is specified with the sep property.

{
    "source": {
        "type": "File",
        "file": {
            "name": "samples/adult.csv",
            "props": {
                "sep": ";"
            }
        }
    },
    "attributes": [
        {
            "name": "age",
            "dataType": "String",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Generalization",
            "generalization": {
                "type": "Masking Based",
                "hierarchyType": "Rule",
                "rule": {
                    "masking": {
                        "maskOrder": "Right To Left",
                        "maskChar": "*",
                        "maxDomainSize": 2
                    }
                }
            }
        }
    ],
    "privacyModel": {
        "k": {
            "kValue": 50
        }
    },
    "config": {
        "maxSuppression": 0.01
    },
    "target": {
        "type": "File",
        "file": {
            "name": "s3://<Your-S3-BucketName>/anon-adult-e1.csv",
            "props": {
                "lineterminator": "\n"
            },
            "accessOptions": {
                "key": "<Your-S3-API Key>",
                "secret": "<Your-S3-API Secret>"
            }
        }
    }
}
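Masking-based generalization builds the hierarchy for a value by hiding more characters at each level. The following sketch assumes that "Right To Left" masks one character per level starting from the end of the value, similar to masking-based hierarchies in other anonymization tools; how maxDomainSize limits the hierarchy is not modeled here.

```python
def masking_hierarchy(value, mask_char="*", right_to_left=True):
    """Return the value followed by one more masked character per generalization level."""
    levels = [value]
    chars = list(value)
    # Choose the order in which character positions are masked.
    positions = range(len(chars) - 1, -1, -1) if right_to_left else range(len(chars))
    for pos in positions:
        chars[pos] = mask_char
        levels.append("".join(chars))
    return levels

print(masking_hierarchy("39"))  # ['39', '3*', '**']
```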
# import the anonsdk library
import anonsdk as asdk
import pandas as pd

# S3 bucket credentials (replace the placeholders with your own values)
s3_key = "<AWS_Key>"
s3_secret = "<AWS_Secret>"

#set the source path for anonymization
# dataset path
source_csv_path = "adult.csv"
# create Store Object source_datastore
source_datastore = asdk.FileDataStore(source_csv_path)

#Set the target path for anonymized result
# anonymized file path
target_csv_path = "s3://target/anon-adult-e1.csv"
# create Store Object target_datastore
target_datastore = asdk.FileDataStore(target_csv_path, access_options={"key": s3_key,"secret": s3_secret})

# Create connection Object with Rest API server
conn = asdk.Connection("https://anon.protegrity.com/")
df = pd.read_csv(source_csv_path,sep=";")
df.head()

# create AnonObject with connection, dataframe metadata and source path
anon_object = asdk.AnonElement(conn, df, source_datastore)
# configure masking of string datatype
anon_object["age"] = asdk.Gen_Mask(maskchar="*",maskOrder="R",maxLength=2)

# configure k-anonymity and the maximum suppression allowed in the dataset
anon_object.config.k = asdk.K(50)
anon_object.config['maxSuppression'] = 0.01

# send the anonymization request with the transformation configuration and the target store
job = asdk.anonymize(anon_object, target_datastore, force=True)

# check the status of the job <check the status iteratively until  'status': 'Completed' >
job.status()

# check the comparative risk statistics from the source and result dataset
job.riskStat()

# check the comparative utility statistics from the source and result dataset
job.utilityStat()
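Instead of calling job.status() by hand until the job completes, you can poll it in a loop. The following sketch is a generic helper that takes any zero-argument status function; it assumes, based on the comment in the sample, that the status is a dictionary with a 'status' key whose value becomes 'Completed'. Failure states are not handled because they are not described here.

```python
import time

def wait_until_completed(get_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll get_status() until it reports 'Completed' or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        # Assumption: the status is a dict with a 'status' key, as the sample's comment suggests.
        if status.get("status") == "Completed":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not completed after {timeout_seconds} s; last status: {status}")
        time.sleep(poll_seconds)
```

With the SDK sample, you would call wait_until_completed(job.status) before job.riskStat() and job.utilityStat().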

Tree-based Aggregation for Attributes with k-Anonymity, l-Diversity, and t-Closeness

This sample uses the following attributes:

  • Source: Local file system
  • Target: Amazon S3 bucket
  • Data set: 4 Quasi Identifiers, 2 Sensitive Attributes
  • Suppression: 0.10
  • Privacy Model: K with value 3, T-closeness with value 0.2, and L-diversity with value 2
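Distinct l-diversity requires every equivalence class to contain at least l different values of the sensitive attribute (here, sex). t-closeness requires the distribution of the sensitive attribute (here, salary-class) in every class to be within distance t of its distribution in the whole table; with equal ground distance, the Earth Mover's Distance reduces to half the L1 distance between the two distributions. The following pandas sketch is a simplified local check of both conditions, not the Protegrity implementation, and the two-band age quasi-identifier is hypothetical.

```python
import pandas as pd

def distinct_l(df, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any equivalence class."""
    return int(df.groupby(quasi_identifiers)[sensitive].nunique().min())

def max_emd_equal_distance(df, quasi_identifiers, sensitive):
    """Largest EMD (equal ground distance) between a class distribution and the overall one."""
    overall = df[sensitive].value_counts(normalize=True)
    worst = 0.0
    for _, group in df.groupby(quasi_identifiers):
        local = group[sensitive].value_counts(normalize=True)
        # Values absent from a class get probability 0 after outer alignment.
        p, q = local.align(overall, fill_value=0.0)
        # With equal ground distance, EMD is half the L1 distance between distributions.
        worst = max(worst, 0.5 * float((p - q).abs().sum()))
    return worst

# First eight records of the extract, with a hypothetical two-band age quasi-identifier.
df = pd.DataFrame({
    "age": [39, 50, 38, 53, 28, 37, 49, 52],
    "sex": ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Male"],
    "salary-class": ["<=50K"] * 7 + [">50K"],
})
df["age-band"] = df["age"].map(lambda a: "<45" if a < 45 else ">=45")

print(distinct_l(df, ["age-band"], "sex") >= 2)                          # l-diversity, l = 2
print(max_emd_equal_distance(df, ["age-band"], "salary-class") <= 0.2)  # t-closeness, t = 0.2
```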

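In this example, the generalization hierarchy for the race attribute is defined inline in the request, while the hierarchies for marital-status, native-country, and occupation are read from CSV files.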
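In a tree-based hierarchy such as the inline one for race, each row lists a value at level 0 followed by its generalization at each higher level. The following sketch turns those rows into a lookup table. How Protegrity Anonymization applies defaultHierarchy is not described here, so the handling of values that are not listed is an assumption.

```python
# Inline hierarchy for race, copied from the request: each row is [level 0, level 1].
RACE_HIERARCHY = [
    ["White", "*"],
    ["Asian-Pac-Islander", "*"],
    ["Amer-Indian-Eskimo", "*"],
    ["Black", "*"],
]
DEFAULT_HIERARCHY = ["Other", "*"]

# Index each row by its level-0 (original) value.
LOOKUP = {row[0]: row for row in RACE_HIERARCHY}

def generalize(value, level):
    """Return the value generalized to the given hierarchy level (0 = unchanged)."""
    path = LOOKUP.get(value)
    if path is None:
        # Assumption: unlisted values keep their own value at level 0 and use the
        # default hierarchy for the levels above it.
        path = [value] + DEFAULT_HIERARCHY[1:]
    # Levels beyond the top of the tree stay at the root.
    return path[min(level, len(path) - 1)]

print([generalize(v, 1) for v in ["White", "Black", "Other"]])
```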

{
    "source": {
        "type": "File",
        "file": {
            "name": "samples/adult.csv",
            "props": {
                "sep": ";",
                "decimal": ",",
                "quotechar": "\"",
                "escapechar": "\\",
                "encoding": "utf-8"
            }
        }
    },
    "attributes": [
        {
            "name": "marital-status",
            "dataType": "String",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Generalization",
            "generalization": {
                "type": "Tree Based",
                "hierarchyType": "Data Store",
                "dataStore": {
                    "type": "File",
                    "format": "CSV",
                    "file": {
                        "name": "samples/hierarchy/adult_hierarchy_marital-status.csv",
                        "props": {
                            "delimiter": ";",
                            "quotechar": "\"",
                            "header": null
                        }
                    }
                }
            }
        },
        {
            "name": "native-country",
            "dataType": "String",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Generalization",
            "generalization": {
                "type": "Tree Based",
                "hierarchyType": "Data Store",
                "dataStore": {
                    "type": "File",
                    "format": "CSV",
                    "file": {
                        "name": "samples/hierarchy/adult_hierarchy_native-country.csv",
                        "props": {
                            "delimiter": ";",
                            "quotechar": "\"",
                            "header": null
                        }
                    }
                }
            }
        },
        {
            "name": "occupation",
            "dataType": "String",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Generalization",
            "generalization": {
                "type": "Tree Based",
                "hierarchyType": "Data Store",
                "dataStore": {
                    "type": "File",
                    "format": "CSV",
                    "file": {
                        "name": "samples/hierarchy/adult_hierarchy_occupation.csv",
                        "props": {
                            "delimiter": ";",
                            "quotechar": "\"",
                            "header": null
                        }
                    }
                }
            }
        },
        {
            "name": "race",
            "dataType": "String",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Generalization",
            "generalization": {
                "type": "Tree Based",
                "hierarchyType": "Data",
                "data": {
                    "hierarchy": [
                        [
                            "White",
                            "*"
                        ],
                        [
                            "Asian-Pac-Islander",
                            "*"
                        ],
                        [
                            "Amer-Indian-Eskimo",
                            "*"
                        ],
                        [
                            "Black",
                            "*"
                        ]
                    ],
                    "defaultHierarchy": [
                        "Other",
                        "*"
                    ]
                }
            }
        },
        {
            "name": "sex",
            "dataType": "String",
            "classificationType": "Sensitive Attribute"
        },
        {
            "name": "salary-class",
            "dataType": "String",
            "classificationType": "Sensitive Attribute"
        }
    ],
    "config": {
        "maxSuppression": 0.10
    },
    "privacyModel": {
        "k": {
            "kValue": 3
        },
        "tcloseness": [
            {
                "name": "salary-class",
                "emdType": "EMD with equal ground distance",
                "tFactor": 0.2
            }
        ],
        "ldiversity": [
            {
                "name": "sex",
                "lFactor": 2,
                "lType": "Distinct-l-diversity"
            }
        ]
    },
    "target": {
        "type": "File",
        "file": {
            "name": "s3://<Your-S3-BucketName>/anon-adult_klt.csv",
            "props": {
                "lineterminator": "\n"
            },
            "accessOptions": {
                "key": "<Your-S3-API Key>",
                "secret": "<Your-S3-API Secret>"
            }
        }
    }
}
# import the anonsdk library
import anonsdk as asdk
import pandas as pd

# S3 bucket credentials (replace the placeholders with your own values)
s3_key = "<AWS_Key>"
s3_secret = "<AWS_Secret>"

#set the source path for anonymization
# dataset path
source_csv_path = "adult.csv"
# create Store Object source_datastore
source_datastore = asdk.FileDataStore(source_csv_path)

#Set the target path for anonymized result
# anonymized file path
target_csv_path = "s3://target/anon-adult_klt.csv"

# create Store Object target_datastore
target_datastore = asdk.FileDataStore(target_csv_path, access_options={"key": s3_key,"secret": s3_secret})

# Create connection Object with Rest API server
conn = asdk.Connection("https://anon.protegrity.com/")

# create AnonObject with connection, dataframe metadata and source path
df = pd.read_csv(source_csv_path,sep=";")
df.head()
anon_object = asdk.AnonElement(conn, df, source_datastore)

# configure tree-based generalization; the hierarchy files are semicolon-delimited and have no header row
hierarchy_marital_status_path = "samples/hierarchy/adult_hierarchy_marital-status.csv"
df_ms = pd.read_csv(hierarchy_marital_status_path, sep=";", header=None)
print(df_ms)
anon_object['marital-status'] = asdk.Gen_Tree(df_ms)

hierarchy_native_country_path = "samples/hierarchy/adult_hierarchy_native-country.csv"
df_nc = pd.read_csv(hierarchy_native_country_path, sep=";", header=None)
print(df_nc)
anon_object['native-country'] = asdk.Gen_Tree(df_nc)

hierarchy_occupation_path = "samples/hierarchy/adult_hierarchy_occupation.csv"
df_occ = pd.read_csv(hierarchy_occupation_path, sep=";", header=None)
print(df_occ)
anon_object['occupation'] = asdk.Gen_Tree(df_occ)

# inline hierarchy for race: level 0 is the original value, level 1 generalizes it to "*"
df_race = pd.DataFrame(data={"lvl0": ["White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Black", "Other"], "lvl1": ["*", "*", "*", "*", "*"]})
anon_object['race'] = asdk.Gen_Tree(df_race)

# configure k-anonymity and the maximum suppression allowed in the dataset
anon_object.config.k = asdk.K(3)
anon_object.config['maxSuppression'] = 0.10

# configure l-diversity and t-closeness
anon_object["sex"] = asdk.LDiv(lfactor=2)
anon_object["salary-class"] = asdk.TClose(tfactor=0.2)

# send the anonymization request with the transformation configuration and the target store
job = asdk.anonymize(anon_object, target_datastore, force=True)

# check the status of the job
job.status()

# check the comparative risk statistics from the source and result dataset
job.riskStat()

# check the comparative utility statistics from the source and result dataset
job.utilityStat()

Micro-Aggregation and Generalization with Aggregates

This sample uses the following attributes:

  • Source: Local file system
  • Target: Amazon S3 bucket
  • Data set: 2 Quasi Identifiers, 1 Aggregation-based Quasi Identifier, 2 Micro Aggregations, and 2 Sensitive Attributes
  • Suppression: 0.50
  • Privacy Model: K with value 5, T-closeness with value 0.2, and L-diversity with value 2
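Micro-aggregation groups similar records and replaces each value with an aggregate of its group, such as the geometric mean (GMean) for age or the most frequent value (Mode) for marital-status. The following sketch uses a simple sort-and-chunk grouping on a single attribute, which is an assumption for illustration; it does not reproduce how Protegrity Anonymization forms its groups.

```python
from collections import Counter
from statistics import geometric_mean

def partition(values, k):
    """Split record indices, sorted by value, into consecutive groups of at least k records."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)  # fold an undersized tail into the previous group
    return groups

def aggregate(column, groups, fn):
    """Replace every value in a group with fn(values of that group)."""
    out = list(column)
    for group in groups:
        agg = fn([column[i] for i in group])
        for i in group:
            out[i] = agg
    return out

def mode(values):
    """Most frequent value in the list."""
    return Counter(values).most_common(1)[0][0]

# First six records of the extract.
ages = [39, 50, 38, 53, 28, 37]
marital = ["Never-married", "Married-civ-spouse", "Divorced",
           "Married-civ-spouse", "Married-civ-spouse", "Married-civ-spouse"]

groups = partition(ages, k=3)
print(aggregate(ages, groups, geometric_mean))
print(aggregate(marital, groups, mode))
```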
{
    "source": {
        "type": "File",
        "file": {
            "name": "samples/adult.csv",
            "props": {
                "sep": ";"
            }
        }
    },
    "attributes": [
        {
            "name": "age",
            "dataType": "Integer",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Micro Aggregation",
            "aggregateFn": "GMean"
        },
        {
            "name": "marital-status",
            "dataType": "String",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Micro Aggregation",
            "aggregateFn": "Mode"
        },
        {
            "name": "native-country",
            "dataType": "String",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Generalization",
            "generalization": {
                "type": "Tree Based",
                "hierarchyType": "Data Store",
                "dataStore": {
                    "type": "File",
                    "format": "CSV",
                    "file": {
                        "name": "samples/hierarchy/adult_hierarchy_native-country.csv",
                        "props": {
                            "delimiter": ";",
                            "quotechar": "\"",
                            "header": null
                        }
                    }
                }
            }
        },
        {
            "name": "occupation",
            "dataType": "String",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Generalization",
            "generalization": {
                "type": "Tree Based",
                "hierarchyType": "Data Store",
                "dataStore": {
                    "type": "File",
                    "format": "CSV",
                    "file": {
                        "name": "samples/hierarchy/adult_hierarchy_occupation.csv",
                        "props": {
                            "delimiter": ";",
                            "quotechar": "\"",
                            "header": null
                        }
                    }
                }
            }
        },
        {
            "name": "race",
            "dataType": "String",
            "classificationType": "Quasi Identifier",
            "dataTransformationType": "Generalization",
            "generalization": {
                "type": "Aggregation Based",
                "hierarchyType": "Aggregate",
                "aggregateFn": "Mode"
            }
        },
        {
            "name": "sex",
            "classificationType": "Sensitive Attribute",
            "dataType": "String"
        },
        {
            "name": "salary-class",
            "classificationType": "Sensitive Attribute",
            "dataType": "String"
        }
    ],
    "config": {
        "maxSuppression": 0.50
    },
    "privacyModel": {
        "k": {
            "kValue": 5
        },
        "tcloseness": [
            {
                "name": "salary-class",
                "emdType": "EMD with equal ground distance",
                "tFactor": 0.2
            }
        ],
        "ldiversity": [
            {
                "name": "sex",
                "lType": "Distinct-l-diversity",
                "lFactor": 2
            }
        ]
    },
    "target": {
        "type": "File",
        "file": {
            "name": "s3://<Your-S3-BucketName>/anon-adult_micro.csv",
            "props": {
                "lineterminator": "\n"
            },
            "accessOptions": {
                "key": "<Your-S3-API Key>",
                "secret": "<Your-S3-API Secret>"
            }
        }
    }
}
# import the anonsdk library
import anonsdk as asdk
import pandas as pd

# S3 bucket credentials (replace the placeholders with your own values)
s3_key = "<AWS_Key>"
s3_secret = "<AWS_Secret>"

#set the source path for anonymization
# dataset path
source_csv_path = "adult.csv"
# create Store Object source_datastore
source_datastore = asdk.FileDataStore(source_csv_path)

#Set the target path for anonymized result
# anonymized file path
target_csv_path = "s3://target/anon-adult_micro.csv"
# create Store Object target_datastore
target_datastore = asdk.FileDataStore(target_csv_path, access_options={"key": s3_key,"secret": s3_secret})

# Create connection Object with Rest API server
conn = asdk.Connection("https://anon.protegrity.com/")
df = pd.read_csv(source_csv_path,sep=";")
df.head()

# create AnonObject with connection, dataframe metadata and source path
anon_object = asdk.AnonElement(conn, df, source_datastore)

# configure tree-based generalization; the hierarchy files are semicolon-delimited and have no header row
hierarchy_native_country_path = "samples/hierarchy/adult_hierarchy_native-country.csv"
df_nc = pd.read_csv(hierarchy_native_country_path, sep=";", header=None)
print(df_nc)
anon_object['native-country'] = asdk.Gen_Tree(df_nc)

hierarchy_occupation_path = "samples/hierarchy/adult_hierarchy_occupation.csv"
df_occ = pd.read_csv(hierarchy_occupation_path, sep=";", header=None)
print(df_occ)
anon_object['occupation'] = asdk.Gen_Tree(df_occ)

# apply micro-aggregation (geometric mean) to age and aggregation-based generalization (mode) to race
anon_object['age'] = asdk.MicroAgg(asdk.AggregateFunction.GMean)
anon_object['race'] = asdk.Gen_Agg(asdk.AggregateFunction.Mode)

# apply micro-aggregation (mode) to marital-status
anon_object['marital-status'] = asdk.MicroAgg(asdk.AggregateFunction.Mode)

# configure k-anonymity and the maximum suppression allowed in the dataset
anon_object.config.k = asdk.K(5)
anon_object.config['maxSuppression'] = 0.50

# configure l-diversity and t-closeness
anon_object["sex"] = asdk.LDiv(lfactor=2)
anon_object["salary-class"] = asdk.TClose(tfactor=0.2)

# send the anonymization request with the transformation configuration and the target store
job = asdk.anonymize(anon_object, target_datastore, force=True)

# check the status of the job
job.status()

# check the comparative risk statistics from the source and result dataset
job.riskStat()

# check the comparative utility statistics from the source and result dataset
job.utilityStat()

Parquet File Format

This sample uses the following attributes:

  • Source: Local file system
  • Target: Amazon S3 bucket in the Parquet format
  • Data set: 4 Quasi Identifiers, 1 Aggregation-based Quasi Identifier, 1 Micro Aggregation, and 1 Sensitive Attribute
  • Suppression: 0.4
  • Privacy Model: K with value 350 and L-diversity with value 2

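In this example, the generalization rules for age (intervals), citizenSince (date ranges), and salary-class (masking) are defined in the request itself, and the anonymized output is written to Amazon S3 in the Parquet format.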

    {
        "source": {
            "type": "File",
            "file": {
                "name": "samples/adult.csv",
                "props": {
                    "sep": ";",
                    "decimal": ",",
                    "quotechar": "\"",
                    "escapechar": "\\",
                    "encoding": "utf-8"
                }
            }
        },
        "attributes": [
            {
                "name": "age",
                "dataType": "Integer",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Generalization",
                "generalization": {
                    "hierarchyType": "Rule",
                    "type": "Rounding",
                    "rule": {
                        "interval": {
                            "levels": [
                                "5",
                                "10",
                                "50",
                                "100"
                            ],
                            "lowerBound":"5",
                            "upperBound":"100"
                        }
                    }
                }
            },
            {
                "name": "marital-status",
                "dataType": "String",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Micro Aggregation",
                "aggregateFn": "Mode"
            },
            {
                "name": "citizenSince",
                "dataType": "Date",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Generalization",
                "generalization": {
                    "type": "Rounding",
                    "hierarchyType": "Rule",
                    "rule": {
                        "daterange": {
                            "levels": [
                                "WD.M.Y",
                                "FD.M.Y",
                                "QTR.Y",
                                "Y"
                            ]
                        }
                    }
                },
                "props": {
                    "dateformat": "dd-mm-yyyy"
                }
            },
            {
                "name": "occupation",
                "dataType": "String",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Generalization",
                "generalization": {
                    "type": "Tree Based",
                    "hierarchyType": "Data Store",
                    "dataStore": {
                        "type": "File",
                        "format": "CSV",
                        "file": {
                            "name": "samples/hierarchy/adult_hierarchy_occupation.csv",
                            "props": {
                                "delimiter": ";",
                                "quotechar": "\"",
                                "header": null
                            }
                        }
                    }
                }
            },
            {
                "name": "race",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Generalization",
                "dataType": "String",
                "generalization": {
                    "type": "Aggregation Based",
                    "hierarchyType": "Aggregate",
                    "aggregateFn": "Mode"
                }
            },
            {
                "name": "salary-class",
                "dataType": "String",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Generalization",
                "generalization": {
                    "type": "Masking Based",
                    "hierarchyType": "Rule",
                    "rule": {
                        "masking": {
                            "maskOrder": "Left To Right",
                            "maskChar": "*",
                            "maxDomainSize": 3
                        }
                    }
                }
            },
            {
                "name": "sex",
                "dataType": "String",
                "classificationType": "Sensitive Attribute"
            }
        ],
        "config": {
            "maxSuppression": 0.4,
            "redactOutliers": true,
            "suppressionData": "Any"
        },
        "privacyModel": {
            "k": {
                "kValue": 350
            },
            "ldiversity": [
                {
                    "name": "sex",
                    "lType": "Distinct-l-diversity",
                    "lFactor": 2
                }
            ]
        },
        "target": {
            "type": "File",
            "file": {
                "name": "s3://<Your-S3-BucketName>/anon-adult-rules",
                "format": "Parquet",
                "accessOptions": {
                    "key": "<Your-S3-API Key>",
                    "secret": "<Your-S3-API Secret>"
                }
            }
        }
    }
This sample is not applicable to the SDK functions; use the JSON request instead.
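The interval rule for age lists increasing widths (5, 10, 50, and 100). The following sketch assumes that each level buckets a value into a half-open interval aligned to a multiple of that level's width; how lowerBound and upperBound treat values outside the range is not modeled, and the label format is illustrative only.

```python
def interval_levels(value, widths=(5, 10, 50, 100)):
    """Return the value followed by one half-open interval per level width."""
    levels = [str(value)]
    for width in widths:
        low = (value // width) * width  # start of the width-aligned interval containing value
        levels.append(f"[{low}, {low + width})")
    return levels

print(interval_levels(39))  # ['39', '[35, 40)', '[30, 40)', '[0, 50)', '[0, 100)']
```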

Retaining and Redacting

This sample uses the following attributes:

  • Source: Local file system
  • Target: Amazon S3 bucket in the Parquet format
  • Data set: 2 Quasi Identifiers, 1 Aggregation-based Quasi Identifier, 1 Micro Aggregation, 1 Non-Sensitive Attribute, 1 Identifying Attribute, and 2 Sensitive Attributes
  • Suppression: 0.10
  • Privacy Model: K with value 200 and L-diversity with value 2

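In this example, the generalization rule for age (intervals) is defined in the request itself. The sample also shows how attributes are retained or redacted based on their classification: education is classified as a Non-Sensitive Attribute, which is retained, and citizenSince as an Identifying Attribute, which is redacted.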
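The retain/redact idea can be sketched locally with pandas. This sketch assumes that redaction removes Identifying Attribute columns from the output and that Non-Sensitive Attribute columns pass through unchanged; the actual service might blank or mask redacted values instead. The classification map and rows are hypothetical, and the transformation of quasi-identifiers such as age is not modeled.

```python
import pandas as pd

# Hypothetical subset of the attribute classifications in this sample's request.
CLASSIFICATION = {
    "citizenSince": "Identifying Attribute",
    "education": "Non-Sensitive Attribute",
}

def retain_and_redact(df, classification):
    """Drop columns classified as Identifying Attribute; leave every other column as-is."""
    redacted = [col for col, kind in classification.items() if kind == "Identifying Attribute"]
    return df.drop(columns=redacted)

df = pd.DataFrame({
    "citizenSince": ["08-01-1971", "19-04-1960"],
    "education": ["Bachelors", "Bachelors"],
    "age": [39, 50],
})
print(retain_and_redact(df, CLASSIFICATION))
```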

    {
        "source": {
            "type": "File",
            "file": {
                "name": "samples/adult.csv",
                "props": {
                    "sep": ";",
                    "decimal": ",",
                    "quotechar": "\"",
                    "escapechar": "\\",
                    "encoding": "utf-8"
                }
            }
        },
        "attributes": [
            {
                "name": "age",
                "dataType": "Integer",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Generalization",
                "generalization": {
                    "type": "Rounding",
                    "hierarchyType": "Rule",
                    "rule": {
                        "interval": {
                            "levels": [
                                "5",
                                "10",
                                "50",
                                "100"
                            ]
                        }
                    }
                }
            },
            {
                "name": "marital-status",
                "dataType": "String",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Micro Aggregation",
                "aggregateFn": "Mode"
            },
            {
                "name": "occupation",
                "dataType": "String",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Generalization",
                "generalization": {
                    "type": "Tree Based",
                    "hierarchyType": "Data Store",
                    "dataStore": {
                        "type": "File",
                        "format": "CSV",
                        "file": {
                            "name": "samples/hierarchy/adult_hierarchy_occupation.csv",
                            "props": {
                                "delimiter": ";",
                                "quotechar": "\"",
                                "header": null
                            }
                        }
                    }
                }
            },
            {
                "name": "race",
                "dataType": "String",
                "classificationType": "Quasi Identifier",
                "dataTransformationType": "Generalization",
                "generalization": {
                    "type": "Aggregation Based",
                    "hierarchyType": "Aggregate",
                    "aggregateFn": "Mode"
                }
            },
            {
                "name": "citizenSince",
                "dataType": "Date",
                "classificationType": "Identifying Attribute"
            },
            {
                "name": "education",
                "dataType": "String",
                "classificationType": "Non-Sensitive Attribute"
            },
            {
                "name": "salary-class",
                "dataType": "String",
                "classificationType": "Sensitive Attribute"
            },
            {
                "name": "sex",
                "dataType": "String",
                "classificationType": "Sensitive Attribute"
            }
        ],
        "config": {
            "maxSuppression": 0.10,
            "suppressionData": "Any"
        },
        "privacyModel": {
            "k": {
                "kValue": 200
            },
            "ldiversity": [
                {
                    "name": "sex",
                    "lType": "Distinct-l-diversity",
                    "lFactor": 2
                },
                {
                    "name": "salary-class",
                    "lType": "Distinct-l-diversity",
                    "lFactor": 2
                }
            ]
        },
        "target": {
            "type": "File",
            "file": {
                "name": "s3://<Your-S3-BucketName>/anon-adult_retd",
                "format": "Parquet",
                "accessOptions": {
                    "key": "<Your-S3-API Key>",
                    "secret": "<Your-S3-API Secret>"
                }
            }
        }
    }
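The interval rule above has four generalization levels. The Python SDK example applies the same levels to age. The following minimal sketch maps a value to an interval at each level. It assumes each level is an interval width and that intervals are half-open and aligned to multiples of that width. `generalize_interval` is a hypothetical helper, and Protegrity's actual interval boundaries and labels may differ.

```python
def generalize_interval(value, width):
    """Map a numeric value to the half-open interval [lower, lower + width).

    Assumption: intervals are aligned to multiples of `width`; this is an
    illustration, not Protegrity's implementation.
    """
    lower = int(value // width) * width
    return f"[{lower}, {lower + width})"


# Generalization levels from the age rule, finest to coarsest.
LEVELS = [5, 10, 50, 100]
# Ages from the first five rows of the adult.csv extract.
ages = [39, 50, 38, 53, 28]

for width in LEVELS:
    intervals = [generalize_interval(age, width) for age in ages]
    # Fewer distinct intervals means larger groups of indistinguishable records.
    print(width, len(set(intervals)), intervals)
```

At level 5, the five sample ages fall into three distinct intervals. At level 100, they all collapse into `[0, 100)`. Coarser levels make more records indistinguishable but lose more precision.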

# import the anonsdk library
import anonsdk as asdk
import pandas as pd

# S3 credentials for the target bucket (replace the placeholders with your own values)
s3_key = "<AWS_Key>"
s3_secret = "<AWS_Secret>"

# path to the source dataset to anonymize
source_csv_path = "adult.csv"
# create the source data store object
source_datastore = asdk.FileDataStore(source_csv_path)

# S3 path where the anonymized result is written
target_csv_path = "s3://<Your-S3-BucketName>/anon-adult_retd"

# create the target data store object with the S3 credentials
target_datastore = asdk.FileDataStore(target_csv_path, access_options={"key": s3_key, "secret": s3_secret})

# create a connection object for the REST API server
conn = asdk.Connection("https://anon.protegrity.com/")
# load the dataset with pandas; its columns supply the metadata for the request
df = pd.read_csv(source_csv_path, sep=";")
df.head()

# create the AnonElement object from the connection, the DataFrame metadata, and the source data store
anon_object = asdk.AnonElement(conn, df, source_datastore)

# configure the transformation for each column
hierarchy_occupation_path = "samples/hierarchy/adult_hierarchy_occupation.csv"
df_occ = pd.read_csv(hierarchy_occupation_path, sep=";", header=None)
print(df_occ)
anon_object['occupation'] = asdk.Gen_Tree(df_occ)
anon_object['marital-status'] = asdk.MicroAgg(asdk.AggregateFunction.Mode)
anon_object['race'] = asdk.Gen_Agg(asdk.AggregateFunction.Mode)
anon_object['age'] = asdk.Gen_Interval([5, 10, 50, 100])
anon_object['citizenSince'] = asdk.Preserve()
anon_object['education'] = asdk.Preserve()
anon_object['salary-class'] = asdk.Redact()
anon_object['sex'] = asdk.Redact()

# Configure k-anonymity and the maximum fraction of records that can be suppressed
anon_object.config.k = asdk.K(200)
anon_object.config['maxSuppression'] = 0.10

# Configure L-diversity
anon_object["sex"] = asdk.LDiv(lfactor=2)
anon_object["salary-class"] = asdk.LDiv(lfactor=2)

# Send the anonymization request with the transformation configuration and the target data store
job = asdk.anonymize(anon_object, target_datastore, force=True)

# check the status of the job
job.status()

# compare the risk statistics of the source and anonymized datasets
job.riskStat()

# compare the utility statistics of the source and anonymized datasets
job.utilityStat()
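`job.riskStat()` reports Protegrity's own risk figures. The privacy model above uses kValue 200 and distinct l-diversity with lFactor 2 for sex and salary-class. The following minimal sketch checks those two properties on a small in-memory table to show what they require. It is not part of anonsdk. The helper names and toy records are hypothetical, and the sketch assumes the records are already generalized with suppressed records removed. It uses k = 2 because the toy table is tiny.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[qi] for qi in quasi_identifiers)
        classes[key].append(record)
    return classes


def satisfies_k_anonymity(records, quasi_identifiers, k):
    """True if every equivalence class holds at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in classes.values())


def satisfies_distinct_l_diversity(records, quasi_identifiers, sensitive, l_factor):
    """True if every equivalence class holds at least l_factor distinct sensitive values."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(
        len({record[sensitive] for record in group}) >= l_factor
        for group in classes.values()
    )


# Hypothetical, already-generalized records (not output from a real job).
records = [
    {"age": "[30, 40)", "race": "White", "sex": "Male", "salary-class": "<=50K"},
    {"age": "[30, 40)", "race": "White", "sex": "Female", "salary-class": ">50K"},
    {"age": "[50, 60)", "race": "White", "sex": "Male", "salary-class": "<=50K"},
    {"age": "[50, 60)", "race": "White", "sex": "Male", "salary-class": ">50K"},
]
quasi_identifiers = ["age", "race"]

print(satisfies_k_anonymity(records, quasi_identifiers, k=2))  # True
print(satisfies_distinct_l_diversity(records, quasi_identifiers, "salary-class", 2))  # True
print(satisfies_distinct_l_diversity(records, quasi_identifiers, "sex", 2))  # False
```

The toy table is 2-anonymous on age and race, and each class contains both salary classes. However, the `[50, 60)` class contains only one value of sex, so distinct 2-diversity fails for that attribute.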

3 - Samples for cloud-related source and destination files

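The following snippets show how to specify a cloud file as the source for Amazon S3 (s3://), Azure Data Lake Storage (adl://), and Azure Blob File System (abfs://). A target file uses the same "type", "file", and "accessOptions" structure.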
"source": {
    "type": "File",
    "file": {
        "name": "s3://<path_to_dataset>",
        "accessOptions": {
            "key": "API Key",
            "secret": "Secret Key"
        }
    }
}
  
"source": {
    "type": "File",
    "file": {
        "name": "adl://<path-to-dataset>",
        "accessOptions": {
            "tenant_id": "<Tenant_ID>",
            "client_id": "<Client_ID>",
            "client_secret": "<Client_Secret_Key>"
        }
    }
}
  
"source": {
    "type": "File",
    "file": {
        "name": "abfs://<path_to_source_file>",
        "accessOptions": {
            "account_name": "<account_name>",
            "account_key": "<Account_key>"
        }
    },
    "format": "CSV"
}
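The three snippets differ mainly in which `accessOptions` keys each URL scheme needs. The following minimal sketch assembles a file block and checks for those keys before a request is sent. `build_file_store` and the `REQUIRED_ACCESS_KEYS` table are hypothetical and derived only from the snippets above. The service may accept other options or schemes, and the optional "format" field is left out.

```python
import json
from urllib.parse import urlparse

# Hypothetical table: accessOptions keys used by the three sample snippets, keyed by URL scheme.
REQUIRED_ACCESS_KEYS = {
    "s3": {"key", "secret"},
    "adl": {"tenant_id", "client_id", "client_secret"},
    "abfs": {"account_name", "account_key"},
}


def build_file_store(path, access_options):
    """Return a {"type": "File", "file": {...}} block after checking the credential keys."""
    scheme = urlparse(path).scheme
    if scheme not in REQUIRED_ACCESS_KEYS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    missing = REQUIRED_ACCESS_KEYS[scheme] - set(access_options)
    if missing:
        raise ValueError(f"missing accessOptions for {scheme}: {sorted(missing)}")
    return {"type": "File", "file": {"name": path, "accessOptions": dict(access_options)}}


# Build the abfs source block from the last snippet and print it as JSON.
source = build_file_store(
    "abfs://<path_to_source_file>",
    {"account_name": "<account_name>", "account_key": "<Account_key>"},
)
print(json.dumps({"source": source}, indent=4))
```

Checking the keys up front catches a missing credential locally. Otherwise, the missing credential would only surface as a failed job.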