Understanding Protegrity Anonymization Components

Protegrity Anonymization uses a set of cooperating components to anonymize datasets.

Protegrity Anonymization is composed of the following main components:

  • Protegrity Anonymization REST Server: This core component exposes a REST interface through which clients interact with the Protegrity Anonymization service. It uses an in-memory task queue and stores anonymized datasets and their metadata on persistent storage. Anonymization tasks are submitted to the queue and handled in first-in, first-out (FIFO) order. The server invokes the Dask Scheduler to perform each anonymization task.

Note: Only one anonymization task is executed at a time in Protegrity Anonymization.

  • REST Client: The client connects to the Protegrity Anonymization REST Server using an API tool, such as Postman, to create and send Protegrity Anonymization requests and receive the responses. A Swagger interface detailing the available APIs is also provided, and it can be used as a REST client for raising API requests.
  • Python SDK: The Python programmatic interface used to communicate with the REST Server.
  • Anon-Storage: This component reads data from and writes data to the storage. It uses the S3 bucket framework to perform file operations.
  • Anon-DB: A PostgreSQL database that stores metadata related to Protegrity Anonymization jobs.
  • Dask Scheduler: This component analyzes the workload and distributes processing of the dataset to one or more Dask Workers. The scheduler can start additional workers or reduce the number of workers as the task requires. It analyzes the dataset as a whole and allocates a small chunk of the dataset to each worker.
  • Dask Worker: This component registers with the Dask Scheduler and processes a subset of the dataset. The Dask library handles the interaction and interface with the datasets and the storage. Protegrity Anonymization supports cloud storage, S3 buckets, and other storage compatible with Kubernetes. The repository can also be kept outside the container.
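
The queueing behavior described for the REST Server (an in-memory queue, first-in first-out handling, one task at a time) can be illustrated with a minimal sketch. This is not the product's implementation: it uses only the Python standard library, and the names run_fifo, task_queue, and the job labels are hypothetical.

```python
import queue
import threading


def run_fifo(tasks):
    """Run (name, callable) tasks one at a time, in submission order.

    Returns the task names in the order they completed.
    Hypothetical helper for illustration only.
    """
    task_queue = queue.Queue()  # in-memory FIFO queue
    completed = []

    def worker():
        # A single worker thread means only one task runs at any time.
        while True:
            item = task_queue.get()
            if item is None:  # sentinel: no more tasks
                task_queue.task_done()
                break
            name, func = item
            func()
            completed.append(name)
            task_queue.task_done()

    thread = threading.Thread(target=worker)
    thread.start()
    for task in tasks:
        task_queue.put(task)  # submit in arrival order
    task_queue.put(None)
    task_queue.join()
    thread.join()
    return completed


if __name__ == "__main__":
    jobs = [(f"job-{i}", lambda: None) for i in range(3)]
    print(run_fifo(jobs))
```

Because a single worker drains the queue, a task submitted later always waits until every earlier task has finished.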

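The division of work between the Dask Scheduler and the Dask Workers (split the dataset into small chunks, process each chunk on a worker, combine the results) can be sketched in the same spirit. The example below deliberately uses Python's standard concurrent.futures module rather than Dask itself, and mask_chunk is a hypothetical placeholder transformation, not an actual Protegrity Anonymization method.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, chunk_size):
    """Split the whole dataset into consecutive chunks of at most chunk_size rows."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def mask_chunk(chunk):
    """Placeholder per-worker step: keep the first character, mask the rest."""
    return [value[:1] + "*" * (len(value) - 1) for value in chunk]


def process_dataset(rows, chunk_size=2, max_workers=3):
    """Distribute chunks to a pool of workers and reassemble the results in order."""
    chunks = split_into_chunks(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input chunks.
        processed = list(pool.map(mask_chunk, chunks))
    return [value for chunk in processed for value in chunk]


if __name__ == "__main__":
    print(process_dataset(["alice", "bob", "carol", "dave", "eve"]))
```

Each worker sees only its own chunk, which mirrors the description of a Dask Worker operating on a subset of the entire data.
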
Last modified: March 24, 2026