Data Discovery is currently in Private Preview and has not yet reached General Availability (GA). Do not use it in production environments, because features and functionality may change before the GA release.

General Architecture

A high-level view of the main components and their interactions.

The main components of the Protegrity Data Discovery product are as follows:

  • Classification service: The Classification service is the primary access point for all classification-related interactions. It orchestrates the back-end components, known as Providers, that execute the actual classification tasks.

  • Pattern and Context classification providers: The Providers are specialized modules that identify and classify Personally Identifiable Information (PII). They analyze input data to detect, classify, and locate sensitive information.

The Pattern classification provider is a rule-based system that identifies PII using predefined patterns and heuristics. It is fast, customizable, and suitable for structured data with known formats.
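As a rough illustration of rule-based detection, the following minimal sketch scans text with a small set of regular expressions and reports the label and position of each match. It is not the Pattern classification provider's implementation: the `Finding` type, the `PATTERNS` table, the two example rules, and the `classify_with_patterns` function are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    label: str   # kind of sensitive data detected
    start: int   # offset of the first matched character
    end: int     # offset one past the last matched character
    text: str    # the matched value


# Hypothetical rules for illustration only; not the product's rule set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_with_patterns(text):
    """Return every rule match in the text, ordered by position."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(
                Finding(label, match.start(), match.end(), match.group())
            )
    return sorted(findings, key=lambda f: f.start)


if __name__ == "__main__":
    sample = "Contact jane@example.com, SSN 123-45-6789."
    for finding in classify_with_patterns(sample):
        print(finding)
```

Adding a rule in this sketch means adding one entry to the table, which is the sense in which a rule-based approach is customizable and well suited to data with known formats.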

The Context classification provider is an LLM-based machine learning model, designed within Protegrity, that detects PII using context and semantics. It is flexible, effective with unstructured data, and adapts to varied patterns.

The general architecture is illustrated in the following figure.

Callout   Description
1         The user enters the text to be checked for sensitive data as the request body and sends the request to the Classification service.
2         The Classification service distributes the request to the Pattern and Context classification providers for processing.
3         The Pattern and Context classification providers process the data according to their own logic and return their classifications to the Classification service.
4         The Classification service aggregates the responses from the providers and sends the combined result to the user.
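
The request flow in the callouts can be sketched as a fan-out and aggregate step. This is only a conceptual sketch, not the product's code or API: `pattern_provider`, `context_provider`, `classify`, and the dictionary shape of a finding are hypothetical, and the context stub uses a simple phrase match purely as a placeholder for a model.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def pattern_provider(text):
    # Stand-in for the Pattern provider: a single illustrative regex rule.
    return [
        {"label": "US_SSN", "start": m.start(), "end": m.end(), "provider": "pattern"}
        for m in re.finditer(r"\b\d{3}-\d{2}-\d{4}\b", text)
    ]


def context_provider(text):
    # Stand-in for the Context provider. The real provider is a model;
    # this stub only flags the word that follows "my name is".
    m = re.search(r"my name is (\w+)", text, re.IGNORECASE)
    if m is None:
        return []
    return [{"label": "PERSON", "start": m.start(1), "end": m.end(1), "provider": "context"}]


def classify(text, providers=(pattern_provider, context_provider)):
    # Callout 2: distribute the same request to every provider.
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = [pool.submit(provider, text) for provider in providers]
        # Callout 3: each provider returns its own classifications.
        results = [future.result() for future in futures]
    # Callout 4: aggregate the provider responses into one result.
    merged = [finding for result in results for finding in result]
    return sorted(merged, key=lambda f: f["start"])


if __name__ == "__main__":
    for finding in classify("My name is Alice and my SSN is 123-45-6789."):
        print(finding)
```

Running the providers concurrently is a design choice of this sketch; the source does not state whether the providers are called in parallel or in sequence.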
Last modified : September 03, 2025