Using the Auto Anonymizer

The Auto Anonymizer feature of Protegrity Anonymization analyzes the data and generates a template for completing Protegrity Anonymization requests.

1: Using mode to Auto Anonymize
2: Using Infer to Anonymize

The Auto Anonymizer feature is simple and easy to configure. It is built to analyze the data and produce an output that has a balance of both generalization and value. The output of the auto anonymizer should always be verified by a human with dataset knowledge. The output is merely a suggestion and should not be used without further inspection.
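Because the output is only a suggestion, a reviewer may want an independent check of the result set. The following is a minimal sketch in plain Python that does not use the Protegrity SDK. It computes the size of the smallest group of rows that share the same quasi-identifier values, which is the K value that the data actually satisfies. The column names and rows are hypothetical.

```python
from collections import Counter

def achieved_k(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values (the K value the dataset actually satisfies)."""
    if not rows:
        return 0
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return min(groups.values())

# Hypothetical anonymized output: ages generalized to ranges, ZIP codes truncated.
anonymized = [
    {"age": "30-39", "zip": "940**", "income": "<=50K"},
    {"age": "30-39", "zip": "940**", "income": ">50K"},
    {"age": "40-49", "zip": "941**", "income": "<=50K"},
    {"age": "40-49", "zip": "941**", "income": "<=50K"},
    {"age": "40-49", "zip": "941**", "income": ">50K"},
]

k = achieved_k(anonymized, ["age", "zip"])
print(f"Smallest equivalence class size: {k}")
```

A value lower than the K value that was requested indicates that the output needs a closer look before it is used.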

Protegrity Anonymization takes a sample of the data from the dataset and analyzes it to build a template for performing the anonymization. Building the template takes time, depending on the size of the dataset and the nature of the data.

You can specify parameters, such as the fields to redact, for anonymizing the data. You can use the Auto Anonymizer feature to automatically analyze the data and perform the required anonymization. This feature can also scan the data and select the optimization that provides high-quality anonymized data. The parameters used for auto anonymization are configurable and can be tuned to suit your business needs or requirements. Additionally, frequently used fields can be created and stored so that you can build the anonymization request faster and with minimal information before runtime.

A brief flow of the steps for auto anonymization is shown in the following figure.

The user provides the data, the column identification, and, if required, the anonymization parameters. Protegrity Anonymization analyzes the parameters provided and the dataset. Various Protegrity Anonymization models are generated and analyzed. The parameters, such as the K, l, and t values, along with the data available in the dataset, are used for processing the request. The results are compared and, finally, the dataset is processed using the model and parameters that produce the best anonymization output.

Consider the following sample graph.

Protegrity Anonymization first auto assigns the privacy levels for the various columns in the dataset. Direct identifiers are redacted from the dataset. Next, models are created using different values for K-anonymity, l-diversity, and t-closeness. The results are analyzed and the best values are selected, such as the values at point b in the graph. The dataset is then anonymized using the selected values to complete the anonymization request.
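The idea of building several models and keeping the one with the best trade-off can be illustrated with a small, self-contained sketch. This is not the algorithm that Protegrity Anonymization uses. It assumes a single numeric column, searches only over K (l and t would add further dimensions), and uses the relative width of the age range as a stand-in for information loss. All names, values, and the scoring rule are hypothetical.

```python
from collections import Counter

def generalize_age(age, width):
    """Replace an exact age with a range of the given width, for example 37 -> '35-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def achieved_k(values):
    """Size of the smallest group of identical generalized values."""
    return min(Counter(values).values())

def search(ages, candidate_ks, widths, max_width):
    """For each candidate K, find the narrowest range width that satisfies it.
    Candidates that no width can satisfy are left out of the results."""
    results = []
    for k in candidate_ks:
        for width in sorted(widths):
            generalized = [generalize_age(a, width) for a in ages]
            if achieved_k(generalized) >= k:
                # Assumption: information loss is the range width relative to the widest range.
                results.append({"k": k, "width": width, "loss": width / max_width})
                break
    return results

def pick_best(results, loss_budget):
    """Choose the strongest K whose information loss stays within the budget."""
    within = [r for r in results if r["loss"] <= loss_budget]
    return max(within, key=lambda r: r["k"]) if within else None

ages = [21, 23, 25, 28, 31, 34, 36, 39, 41, 43, 46, 48]
results = search(ages, candidate_ks=[2, 3, 4, 6], widths=[5, 10, 20, 40], max_width=40)
best = pick_best(results, loss_budget=0.25)
print(results)
print(best)
```

Each candidate K is paired with the least generalization that satisfies it, and the selection step then balances privacy against the loss of detail, which is the same kind of comparison that the graph describes.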

The user can specify the values that must be used, if required. Protegrity Anonymization will consider the values specified by the user and continue to auto generate the remaining values accordingly.
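One way to picture this behavior is a search space in which the user-supplied values are pinned and only the remaining parameters vary. The following sketch is an illustration under that assumption and does not reflect the internal implementation. The parameter names and candidate values are hypothetical.

```python
from itertools import product

def candidate_settings(search_space, user_fixed):
    """Build every parameter combination to try, keeping user-supplied values fixed."""
    space = {
        name: [user_fixed[name]] if name in user_fixed else values
        for name, values in search_space.items()
    }
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]

# Hypothetical candidate values for each privacy parameter.
search_space = {"k": [2, 5, 10], "l": [2, 3], "t": [0.2, 0.4]}

# The user pins K to 5; l and t are still explored.
candidates = candidate_settings(search_space, user_fixed={"k": 5})
for settings in candidates:
    print(settings)
```

Pinning a value also shrinks the number of combinations to evaluate, here from twelve to four.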

Note: Because auto anonymization runs the same request using different values, the anonymization request takes more time to complete than a regular anonymization request.

You can use measure, mode, and Infer for Auto Anonymization.

For more information about the measure API, refer to Measure API.

The difference between using mode and Infer is provided in the following table.

Mode	Infer
Analyzes the dataset and performs the anonymization job.	Only analyzes the dataset.
The result set is the output.	Updates the models used for performing the anonymization job.
You cannot retrieve the attributes for the job.	You can view the auto generated job attribute values, such as K-anonymity, that will be used for performing the job using the describe method.
You can specify target variables for focusing the anonymization job with the anonymization function.	You can specify target variables for focus before performing the anonymization job or even modify the model after performing the anonymization job.
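The two paths in the table can be sketched as thin wrappers around the calls shown on this page, asdk.anonymize(e, targetVariable=..., mode="Auto") and e.infer(targetVariable=...). The wrapper names are hypothetical, and creating the element e is not covered here, so both the SDK module and the element are passed in by the caller.

```python
def anonymize_in_one_step(asdk, e, target):
    """Mode path: analyze the dataset and run the anonymization job in one call.
    The return value is the job whose result set is the output."""
    return asdk.anonymize(e, targetVariable=target, mode="Auto")

def analyze_only(e, target):
    """Infer path: analyze the dataset only and return the element that carries
    the detected classifications and hierarchies for review."""
    return e.infer(targetVariable=target)
```

With the mode path there is nothing to inspect before the job runs. With the Infer path, the returned element can be reviewed and adjusted before the anonymization job is submitted.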

1 - Using mode to Auto Anonymize

Details how to analyze the dataset to determine optimal anonymization settings and retain user‑defined configurations during auto anonymization.

Set the mode to Auto to auto anonymize. Auto anonymization auto-detects the data domain, classification type, hierarchies, and anonymization configuration in Protegrity Anonymization. Any user-defined configuration, such as QI attribute assignments, hierarchy, and K value, is retained and considered while performing the auto anonymization. You can also specify the targetVariable that must be considered for obtaining the best possible result set, in terms of data quality, while performing the anonymization job.

Ensure that you complete the following checks before starting the Protegrity Anonymization job:

Verify that the destination file is not in use and that the required permissions are set for creating and modifying the destination file.
Ensure that the disk is not full and enough free space is available to save the destination file.
Verify that you have imported the Pythonic SDK, for example, import anonsdk as asdk.
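The first two checks can be automated before a job is submitted. The following is a minimal sketch that uses only the Python standard library. The function name, the file name, and the space estimate are hypothetical. It does not detect whether another process has the file open, because that check is platform specific, and the permission check is advisory on some platforms.

```python
import os
import shutil

def preflight(destination, required_bytes):
    """Return a list of problems that would prevent writing the destination file.
    An empty list means that the checks passed."""
    problems = []
    folder = os.path.dirname(os.path.abspath(destination))

    if not os.path.isdir(folder):
        return [f"destination folder does not exist: {folder}"]

    if os.path.exists(destination):
        # An existing file must be writable so that it can be modified.
        if not os.access(destination, os.W_OK):
            problems.append(f"no write permission on file: {destination}")
    elif not os.access(folder, os.W_OK):
        # A new file requires write permission on the folder.
        problems.append(f"no write permission on folder: {folder}")

    free = shutil.disk_usage(folder).free
    if free < required_bytes:
        problems.append(f"only {free} bytes free, {required_bytes} required")

    return problems

# Hypothetical destination and size estimate (10 MB).
issues = preflight("anonymized_output.csv", required_bytes=10 * 1024 * 1024)
print(issues or "ready to run")
```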

The following table shows the auto anonymization information.

Using mode to Auto Anonymize Information	Description
Function	job = asdk.anonymize(e, targetVariable="targetVariable", mode="Auto")
Parameters	targetVariable: The field specified here is used as a focus point for performing the anonymization.
Return Type	It returns the result set after performing the anonymization job.
Sample Request	job = asdk.anonymize(e, targetVariable="date", mode="Auto")

For more sample requests that you can use, refer to Sample Requests for Protegrity Anonymization.

Note: You can use e.measure() to modify the request and view different outcomes of the result set.

For more information about the measure API, refer to Measure API.

2 - Using Infer to Anonymize

Use the Infer API to start auto-detecting the data domain, classification type, hierarchies, and anonymization configuration in Protegrity Anonymization.

Any user-defined configuration, such as QI attribute assignments, hierarchy, and K value, is retained and considered while performing the auto anonymization.

Ensure that you complete the following checks before starting the Protegrity Anonymization job:

Verify that the destination file is not in use and that the required permissions are set for creating and modifying the destination file.
Ensure that the disk is not full and enough free space is available to save the destination file.
Verify that you have imported the Pythonic SDK, for example, import anonsdk as asdk.

The following table shows the auto anonymization information.

Using Infer to Anonymize Information	Description
Function	infer(targetVariable)
Parameters	targetVariable: The field specified here is used as a focus point for performing the anonymization.
Return Type	Returns an anon element with all the detected classifications and hierarchies generated.
Sample Request	e.infer(targetVariable='income')

For more sample requests that you can use, refer to Sample Requests for Protegrity Anonymization.

Note: You can use e.measure() to modify the request and view different outcomes of the result set.

For more information about the measure API, refer to Measure API.