Data Discovery is currently in Private Preview and is not yet Generally Available (GA). Do not use it in production environments, because features and functionality may change before the GA release.

Harmonizing Provider Outputs

Aggregate provider responses under a common category.

Because their detection logic differs, the Pattern and Context classification providers might assign different labels to the same data. The classification service standardizes the provider outputs into a unified response.

Consider the example sentence: "You can visit our office located in New York City."

  • Context provider might categorize New York City as CITY.
  • Pattern provider might categorize New York City as LOCATION.

This can make the outputs inconsistent across providers.

Data Discovery ensures standardization of responses by aggregating similar outputs of the providers under a common classification name. In the example shown, the classification service will categorize New York City under the category LOCATION.

For a complete reference, see the supported classification entities and their harmonization categories.
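The idea of a common classification name can be sketched as a simple lookup. The following is a minimal illustration, not the service's implementation: the `HARMONIZED_NAME` table and the `harmonize` function are hypothetical names, the table contains only labels that appear in this document, and returning unmapped labels unchanged is an assumption.

```python
# Hypothetical lookup table built only from labels mentioned in this document.
# The service's actual mapping is internal and may differ.
HARMONIZED_NAME = {
    "CITY": "LOCATION",
    "STATE": "LOCATION",
    "COUNTRY": "LOCATION",
    "COUNTY": "LOCATION",
    "ZIP_CODE": "LOCATION",
    "STREET": "LOCATION",
    "BUILDING": "LOCATION",
    "GEO_COORDINATE": "LOCATION",
    "LOCATION": "LOCATION",
    "IT_IDENTITY_CARD": "NATIONAL_ID",
    "ID_CARD": "NATIONAL_ID",
}


def harmonize(original_entity: str) -> str:
    """Return the common classification name for a provider label.

    Labels without a known mapping are returned unchanged (an assumption).
    """
    return HARMONIZED_NAME.get(original_entity, original_entity)


context_label = "CITY"      # Context provider's label for "New York City"
pattern_label = "LOCATION"  # Pattern provider's label for the same text
print(harmonize(context_label), harmonize(pattern_label))
```

Both labels resolve to the same name, which is what allows the outputs of different providers to be aggregated.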

Harmonization Process

The following sections describe the harmonization process in detail.

Providers Mapping Entities

Each provider is responsible for mapping its identified entities to harmonized classification entities that are consistent with those used by other providers. This ensures that the classification service can accurately aggregate and interpret responses across multiple providers. When a provider’s classification is harmonized, the response must include the originally identified entity alongside the harmonized classification.

The following snippet shows how the Context classification provider initially classified the entity as CITY, which was then harmonized into the category LOCATION.

{
  "providers": "...",
  "classifications": {
    "LOCATION": [
      {
        "score": 0.9222000122070313,
        "location": {
          "start_index": 36,
          "end_index": 49
        },
        "classifiers": [
          {
            "provider_index": 0,
            "name": "SpacyRecognizer",
            "score": 0.85,
            "original_entity": "LOCATION",
            "details": {}
          },
          {
            "provider_index": 1,
            "name": "context",
            "score": 0.9944000244140625,
            "original_entity": "CITY",
            "details": {}
          }
        ]
      }
    ]
  }
}
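A client that needs the provider's original label can read it from each classifier entry. The sketch below assumes the response has already been received as a JSON string shaped like the sample above; `list_original_entities` is a hypothetical helper name, not part of the product.

```python
import json

# Trimmed copy of the sample response above; "providers" is left elided.
response_text = """
{
  "providers": "...",
  "classifications": {
    "LOCATION": [
      {
        "score": 0.9222000122070313,
        "location": {"start_index": 36, "end_index": 49},
        "classifiers": [
          {"provider_index": 0, "name": "SpacyRecognizer", "score": 0.85,
           "original_entity": "LOCATION", "details": {}},
          {"provider_index": 1, "name": "context", "score": 0.9944000244140625,
           "original_entity": "CITY", "details": {}}
        ]
      }
    ]
  }
}
"""


def list_original_entities(response: dict) -> list[tuple[str, str, str]]:
    """Return (harmonized name, classifier name, original entity) triples."""
    rows = []
    for harmonized, detections in response["classifications"].items():
        for detection in detections:
            for classifier in detection["classifiers"]:
                rows.append(
                    (harmonized, classifier["name"], classifier["original_entity"])
                )
    return rows


rows = list_original_entities(json.loads(response_text))
for harmonized, name, original in rows:
    print(f"{name}: {original} -> {harmonized}")
```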

Grouping by Matching Indexes

Entities are grouped together only if the provider responses contain the same start_index and end_index and map to the same harmonized classification entity. If the start_index or end_index differs, the entities are not grouped together.

As shown in the following snippet, the Pattern and Context providers classify the data as IT_IDENTITY_CARD and ID_CARD respectively. The classification service then groups them under the NATIONAL_ID category.

{
  "providers": "...",
  "classifications": {
    "NATIONAL_ID": [
      {
        "score": 0.9236000061035157,
        "location": {
          "start_index": 14,
          "end_index": 25
        },
        "classifiers": [
          {
            "provider_index": 0,
            "name": "pattern_classification",
            "score": 0.85,
            "original_entity": "IT_IDENTITY_CARD"
          },
          {
            "provider_index": 1,
            "name": "context_classification",
            "score": 0.9972000122070312,
            "original_entity": "ID_CARD"
          }
        ]
      }
    ]
  }
}
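The grouping rule can be sketched as a dictionary keyed on the harmonized name and the two indexes. This is an illustration under stated assumptions: the flat `detections` list and the `group_detections` function are hypothetical, and the sketch does not attempt to reproduce how the service computes the combined score.

```python
from collections import defaultdict

# Hypothetical per-provider detections, already mapped to a harmonized name.
detections = [
    {"provider": "pattern_classification", "original_entity": "IT_IDENTITY_CARD",
     "harmonized": "NATIONAL_ID", "start_index": 14, "end_index": 25,
     "score": 0.85},
    {"provider": "context_classification", "original_entity": "ID_CARD",
     "harmonized": "NATIONAL_ID", "start_index": 14, "end_index": 25,
     "score": 0.9972000122070312},
]


def group_detections(items):
    """Group detections sharing harmonized name, start_index, and end_index."""
    groups = defaultdict(list)
    for item in items:
        key = (item["harmonized"], item["start_index"], item["end_index"])
        groups[key].append(item)
    return dict(groups)


groups = group_detections(detections)
for (name, start, end), members in groups.items():
    originals = [m["original_entity"] for m in members]
    print(name, start, end, originals)
```

Because both detections share the key `("NATIONAL_ID", 14, 25)`, they land in one group. A detection with a different start_index or end_index would produce a separate key and therefore a separate entry.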

Non-Matching Indexes

If the start_index or end_index values differ between provider responses, the entities are not grouped together. However, they still appear under a common classification name.

The following table shows how the labels from multiple providers map to a common classification name.

Provider | Original Entity Labels | Common Classification Name
Pattern Provider | LOCATION | LOCATION
Context Provider | CITY, STATE, COUNTRY, COUNTY, ZIP_CODE, STREET, BUILDING, GEO_COORDINATE | LOCATION

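The following snippet shows a sample response in which the providers report different indexes, so each entity appears as a separate entry under LOCATION.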

{
  "providers": "...",
  "classifications": {
    "LOCATION": [
      {
        "score": 0.9236000061035157,
        "location": {
          "start_index": 0,
          "end_index": 35
        },
        "classifiers": [
          {
            "provider_index": 0,
            "name": "pattern_provider",
            "score": 0.85,
            "original_entity": "LOCATION"
          }
        ]
      },
      {
        "score": 0.9236000061035157,
        "location": {
          "start_index": 0,
          "end_index": 17
        },
        "classifiers": [
          {
            "provider_index": 1,
            "name": "context_provider",
            "score": 0.9972000122070312,
            "original_entity": "STREET"
          }
        ]
      },
      {
        "score": 0.9236000061035157,
        "location": {
          "start_index": 20,
          "end_index": 22
        },
        "classifiers": [
          {
            "provider_index": 1,
            "name": "context_provider",
            "score": 0.9972000122070312,
            "original_entity": "BUILDING"
          }
        ]
      },
      {
        "score": 0.9236000061035157,
        "location": {
          "start_index": 25,
          "end_index": 31
        },
        "classifiers": [
          {
            "provider_index": 1,
            "name": "context_provider",
            "score": 0.9972000122070312,
            "original_entity": "ZIP_CODE"
          }
        ]
      }
    ]
  }
}
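When spans are not grouped, a consumer of the response may still want to relate them, for example to see which finer-grained entries fall inside a broader one. The sketch below is downstream client logic, not something the classification service is documented to do; `contains` and `nested_entries` are hypothetical helpers, and the spans are copied from the sample above.

```python
# Spans copied from the LOCATION entries in the sample response above.
entries = [
    {"original_entity": "LOCATION", "start_index": 0, "end_index": 35},
    {"original_entity": "STREET", "start_index": 0, "end_index": 17},
    {"original_entity": "BUILDING", "start_index": 20, "end_index": 22},
    {"original_entity": "ZIP_CODE", "start_index": 25, "end_index": 31},
]


def contains(outer: dict, inner: dict) -> bool:
    """True if inner's span lies within outer's span and the spans differ."""
    same = (outer["start_index"], outer["end_index"]) == (
        inner["start_index"], inner["end_index"])
    return (not same
            and outer["start_index"] <= inner["start_index"]
            and inner["end_index"] <= outer["end_index"])


def nested_entries(items):
    """Map each entry's original_entity to the entries nested in its span."""
    return {
        outer["original_entity"]: [
            inner["original_entity"] for inner in items if contains(outer, inner)
        ]
        for outer in items
    }


nested = nested_entries(entries)
print(nested["LOCATION"])
```

Here the Pattern provider's LOCATION span (0 to 35) encloses the Context provider's STREET, BUILDING, and ZIP_CODE spans, which is why the four entries stay separate while sharing the LOCATION name.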

Last modified: November 10, 2025