Transform

Identify, Classify & Transform sensitive data.

1: Label Text API

1.1: Handling Overlapping Conflicts
1.2: Sample Response Default

1 - Label Text API

Identify and classify sensitive data in plain text, then replace each detected value with a label for its classified data type, such as <CREDIT_CARD>.

Method

POST

URL

http://{Host Address}/pty/data-discovery/v2/transform/label

Query Parameters

score_threshold

Type: float
Description: Optional. Only results with a score greater than this threshold are labeled.
Values: Minimum 0, Maximum 1.0
Default: 0.7

include_providers

Type: binary
Description: Optional. Include details of the service providers in the response.
Values: Yes / No
Default: No

include_classification_details

Type: binary
Description: Optional. Include classification details in the response.
Values: Yes / No
Default: No
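
The three query parameters above can be assembled and validated on the client before a request is sent. The following is a minimal sketch, not part of the product: the helper name `build_label_params` is hypothetical, and sending the two flags as the literal strings `Yes` and `No` is an assumption based on the documented values, so confirm the accepted wire format against your deployment.

```python
from urllib.parse import urlencode


def build_label_params(score_threshold=0.7, include_providers=False,
                       include_classification_details=False):
    """Build the query parameters for the label endpoint (hypothetical helper)."""
    # The documented range for score_threshold is 0 to 1.0.
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1.0")
    return {
        "score_threshold": score_threshold,
        # Assumption: the flags are sent as the documented values "Yes" / "No".
        "include_providers": "Yes" if include_providers else "No",
        "include_classification_details": "Yes" if include_classification_details else "No",
    }


params = build_label_params(score_threshold=0.85, include_providers=True)
query_string = urlencode(params)
print(query_string)
```

The resulting dictionary can be passed as the `params` argument of `requests.post`, or the encoded string can be appended to the URL after a `?`.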

Body

Content type must be text/plain, and the body must be encoded in UTF-8.
Body size is limited to 10K bytes.
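
Because the limit applies to bytes rather than characters, text containing multi-byte UTF-8 characters reaches it sooner than its character count suggests. The sketch below checks the encoded size before sending. The helper name `encode_body` is hypothetical, and the limit is read conservatively as 10,000 bytes; whether the server enforces 10,000 or 10,240 bytes is not stated here.

```python
# Assumption: "10K bytes" is read conservatively as 10,000 bytes.
MAX_BODY_BYTES = 10_000


def encode_body(text: str) -> bytes:
    """Encode text as UTF-8 and reject bodies that exceed the size limit."""
    body = text.encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"Body is {len(body)} bytes; the limit is {MAX_BODY_BYTES} bytes"
        )
    return body


body = encode_body("Jake lives at 15 Main st, Hamden 06517, Connecticut.")
print(len(body), "bytes")
```

Rejecting oversized input locally avoids a round trip that would end in a 413 response.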

Sample Request

curl -X POST "http://<Host_address>/pty/data-discovery/v2/transform/label?score_threshold=0.85" \
          -H "Content-Type: text/plain" \
          --data "Jake lives at 15 Main st, Hamden 06517, Connecticut."

import requests

url = "http://<Host_address>/pty/data-discovery/v2/transform/label"
params = {"score_threshold": 0.85}
headers = {"Content-Type": "text/plain"}
data = "Jake lives at 15 Main st, Hamden 06517, Connecticut."

# Encode explicitly so the body is sent as UTF-8, as the endpoint requires.
# verify=False disables TLS certificate checks; it has no effect on plain HTTP.
response = requests.post(url, params=params, headers=headers, data=data.encode("utf-8"), verify=False)

print("Status code:", response.status_code)
print("Response JSON:", response.json())

URL: POST `http://<Host_address>/pty/data-discovery/v2/transform/label`
   Query Parameters:
   - score_threshold (optional): float between 0.0 and 1.0, default: 0.7
   Headers:
   - Content-Type: text/plain
   Body:
   - Jake lives at 15 Main st, Hamden 06517, Connecticut.

Sample Responses

{
    "transform": {
        "text": "[PERSON] lives at [LOCATION] [LOCATION], [LOCATION] [LOCATION], [LOCATION]."
    },
    "providers": [
        {
            "name": "Pattern Classification Provider",
            "version": "...",
            "status": 200,
            "elapsed_time": 0.011328935623168945,
            "config_provider": {
                "name": "Pattern",
                "address": "http://pattern_provider_service:8051",
                "supported_content_types": []
            }
        },
        {
            "name": "Context Classification Provider",
            "version": "...",
            "status": 200,
            "elapsed_time": 0.03895401954650879,
            "config_provider": {
                "name": "Context",
                "address": "http://context_provider_service:8052",
                "supported_content_types": []
            }
        }
    ],
    "classifications": {
        "LOCATION": [
            {
                "score": 0.85,
                "location": {
                    "start_index": 17,
                    "end_index": 24
                },
                "classifiers": [
                    {
                        "provider_index": 0,
                        "name": "SpacyRecognizer",
                        "score": 0.85,
                        "original_entity": "LOCATION",
                        "details": {}
                    }
                ]
            },
            {
                "score": 0.9240000128746033,
                "location": {
                    "start_index": 26,
                    "end_index": 32
                },
                "classifiers": [
                    {
                        "provider_index": 0,
                        "name": "SpacyRecognizer",
                        "score": 0.85,
                        "original_entity": "LOCATION",
                        "details": {}
                    },
                    {
                        "provider_index": 1,
                        "name": "context",
                        "score": 0.9980000257492065,
                        "original_entity": "CITY",
                        "details": {}
                    }
                ]
            },
            {
                "score": 0.9244499981403351,
                "location": {
                    "start_index": 40,
                    "end_index": 51
                },
                "classifiers": [
                    {
                        "provider_index": 0,
                        "name": "SpacyRecognizer",
                        "score": 0.85,
                        "original_entity": "LOCATION",
                        "details": {}
                    },
                    {
                        "provider_index": 1,
                        "name": "context",
                        "score": 0.9988999962806702,
                        "original_entity": "STATE",
                        "details": {}
                    }
                ]
            },
            {
                "score": 0.9958999752998352,
                "location": {
                    "start_index": 14,
                    "end_index": 16
                },
                "classifiers": [
                    {
                        "provider_index": 1,
                        "name": "context",
                        "score": 0.9958999752998352,
                        "original_entity": "BUILDING",
                        "details": {}
                    }
                ]
            },
            {
                "score": 0.9983999729156494,
                "location": {
                    "start_index": 33,
                    "end_index": 38
                },
                "classifiers": [
                    {
                        "provider_index": 1,
                        "name": "context",
                        "score": 0.9983999729156494,
                        "original_entity": "ZIPCODE",
                        "details": {}
                    }
                ]
            }
        ],
        "PERSON": [
            {
                "score": 0.8819000124931335,
                "location": {
                    "start_index": 0,
                    "end_index": 4
                },
                "classifiers": [
                    {
                        "provider_index": 1,
                        "name": "context",
                        "score": 0.8819000124931335,
                        "original_entity": "NAME",
                        "details": {}
                    }
                ]
            }
        ]
    }
}

The fields for the transform section are described as follows:

Name	Example Response	Description
transform.text	[PERSON] lives at [LOCATION]..	The labeled input text, with the name of each classified entity in place of the original sensitive data.

The fields for the providers section are described as follows:

Name	Example Response	Description
providers	Array	Array of provider objects that participated in the request, including their respective success or failure codes.
providers[n].name	Pattern Classification Provider	Product name of the provider.
providers[n].version	2.0.0	Version of the provider.
providers[n].status	200	HTTP response code returned by the provider.
providers[n].elapsed_time	0.028	Time, in seconds, taken by the provider to process the request.
providers[n].config_provider	Object	Object containing configuration details for each provider.
providers[n].config_provider.name	Pattern	Internal name of the provider.
providers[n].config_provider.address	http://pattern_provider_service:8051	Network address or endpoint of the provider.
providers[n].config_provider.supported_content_types	[]	Array of supported content types. An empty array indicates support for all content types.

The fields for the classifications section are described as follows:

Name	Example Response	Description
classifications	Dictionary	A dictionary mapping entity types (e.g., “PERSON”, “PHONE_NUMBER”) to arrays of occurrence objects. Each key is an entity type, and its value is a list of detected occurrences, each containing location and classifier details.
classifications['entity'][n].score	0.9238	The confidence score for the detected entity, aggregated from all contributing classifiers.
classifications['entity'][n].location	Object	An object specifying the location of the entity within the input text.
classifications['entity'][n].location.start_index	14	The starting index of the entity in the input text.
classifications['entity'][n].location.end_index	25	The ending index of the entity in the input text.
classifications['entity'][n].classifiers	Array	An array of classifier objects that contributed to the entity detection.
classifications['entity'][n].classifiers[m].provider_index	0	The index of the provider in the top-level providers array.
classifications['entity'][n].classifiers[m].name	SpacyRecognizer	The name of the classifier. A provider may have multiple classifiers.
classifications['entity'][n].classifiers[m].score	0.85	The score assigned by the classifier for the entity detection.
classifications['entity'][n].classifiers[m].original_entity	PERSON	The original entity type detected by the classifier. See Harmonization for details.
classifications['entity'][n].classifiers[m].details	Object	Optional. Additional key-value details provided by the classifier.
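
The location fields can be used to map each classification back to the input text. In the sample response above, the indices behave like a Python slice: `start_index` is inclusive and `end_index` is exclusive (for example, 0 to 4 covers "Jake"). The sketch below relies on that observation from the sample; the function names are hypothetical, the data is an abbreviated copy of the sample `classifications` object, and `apply_labels` assumes the spans do not overlap.

```python
text = "Jake lives at 15 Main st, Hamden 06517, Connecticut."

# Abbreviated copy of the sample response: only the location of each occurrence.
classifications = {
    "LOCATION": [
        {"location": {"start_index": 17, "end_index": 24}},
        {"location": {"start_index": 26, "end_index": 32}},
        {"location": {"start_index": 40, "end_index": 51}},
        {"location": {"start_index": 14, "end_index": 16}},
        {"location": {"start_index": 33, "end_index": 38}},
    ],
    "PERSON": [
        {"location": {"start_index": 0, "end_index": 4}},
    ],
}


def extract_entities(text, classifications):
    """Return (start, end, entity, matched_text) tuples sorted by position."""
    found = []
    for entity, occurrences in classifications.items():
        for occurrence in occurrences:
            start = occurrence["location"]["start_index"]
            end = occurrence["location"]["end_index"]
            found.append((start, end, entity, text[start:end]))
    return sorted(found)


def apply_labels(text, classifications):
    """Rebuild the labeled text locally; assumes spans do not overlap."""
    labeled = text
    # Replace from the end of the text so earlier indices stay valid.
    for start, end, entity, _ in reversed(extract_entities(text, classifications)):
        labeled = labeled[:start] + f"[{entity}]" + labeled[end:]
    return labeled


for start, end, entity, matched in extract_entities(text, classifications):
    print(f"{entity:<8} {start:>2}-{end:<2} {matched!r}")
print(apply_labels(text, classifications))
```

For this sample, the locally rebuilt string matches the `transform.text` value returned by the service.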

Response Codes

Response Code	Description
200	Successful Response.
206	Partial Content. Only some providers classified data successfully.
400	Bad Request. Invalid input parameters or content.
413	Payload too large.
415	Unsupported media type.
422	Untrusted input. For more information, refer to Input Validation.
502	Bad Gateway. All upstream providers failed; no successful data aggregation possible.
598	Unexpected internal server error. Check server logs.
599	Internal server error. Check server logs.
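
A client typically needs to separate responses that carry usable labeled text from those that do not. The following sketch maps the documented codes to their descriptions. The function name `interpret_status` is hypothetical, and treating 206 as usable (labeled text is present, but from only some providers) is an assumption drawn from its description.

```python
# Descriptions copied from the response-code table.
RESPONSE_CODES = {
    200: "Successful Response.",
    206: "Partial Content. Only some providers classified data successfully.",
    400: "Bad Request. Invalid input parameters or content.",
    413: "Payload too large.",
    415: "Unsupported media type.",
    422: "Untrusted input.",
    502: "Bad Gateway. All upstream providers failed; no successful data aggregation possible.",
    598: "Unexpected internal server error. Check server logs.",
    599: "Internal server error. Check server logs.",
}


def interpret_status(status_code):
    """Return (has_usable_body, description) for a response code."""
    description = RESPONSE_CODES.get(status_code, "Undocumented response code.")
    # Assumption: 200 and 206 both return labeled text; 206 may be incomplete.
    has_usable_body = status_code in (200, 206)
    return has_usable_body, description


for code in (200, 206, 502):
    print(code, interpret_status(code))
```

With the `requests` example shown earlier, the check would be `interpret_status(response.status_code)` before calling `response.json()`.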

1.1 - Handling Overlapping Conflicts

Resolving conflicts between entities that label sensitive data.

While classifying data, the providers may label the same text under two different entities. This happens because the classifiers use different detection strategies. Data Discovery resolves these conflicts by applying a set of rules to the conflicting entities.

The rules for handling the conflicting entities are as follows:

No overlap: If the two entities do not conflict, retain the results in their original form.
For example, Jake Filbert lives in Connecticut. If only Jake Filbert is identified, the result will be labeled as [NAME] lives in Connecticut.
Full overlap: If both entities cover exactly the same text, the following logic is applied:
- Select the entity with the higher confidence score.
- If both entities have the same confidence score, select the first entity.
For example, Jake Filbert lives in Connecticut. Here, the name is recognized as [USER] with a score of 0.7 and as [NAME] with a score of 0.9. As [NAME] has the higher score, the result will be labeled as [NAME] lives in Connecticut.
One entity contained in the other: If one entity is completely contained in the other, select the entity with the longer text.
For example, jake@email.com. Here, the classifiers may recognize the text as [NAME] and [EMAIL]. As [EMAIL] is the longer text, the result will be labeled as [EMAIL].
Partial intersection: If the two entities overlap partially, the result will be a combination of both.
For example, 092-33445. Here, the classifiers may recognize the text as [PHONE_NUMBER] and [SSN]. The result will be labeled as [PHONE_NUMBER&SSN].
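
The four rules above can be sketched for a pair of entities as follows. This illustrates the documented rules and is not the product's implementation: the `Entity` type and `resolve_pair` function are hypothetical, end indices are treated as exclusive, and for a partial intersection the combined entity is assumed to span the union of both ranges and to carry the higher of the two scores.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    label: str
    start: int   # inclusive
    end: int     # exclusive
    score: float


def resolve_pair(first: Entity, second: Entity) -> List[Entity]:
    """Resolve a conflict between two entities using the documented rules."""
    # No overlap: keep both results unchanged.
    if first.end <= second.start or second.end <= first.start:
        return [first, second]
    # Full overlap: higher score wins; on a tie, the first entity wins.
    if (first.start, first.end) == (second.start, second.end):
        return [second if second.score > first.score else first]
    # One entity contained in the other: the longer text wins.
    if first.start <= second.start and second.end <= first.end:
        return [first]
    if second.start <= first.start and first.end <= second.end:
        return [second]
    # Partial intersection: combine both labels.
    # Assumption: the combined span is the union, with the higher score.
    return [Entity(
        f"{first.label}&{second.label}",
        min(first.start, second.start),
        max(first.end, second.end),
        max(first.score, second.score),
    )]


# "Jake Filbert" recognized as USER (0.7) and NAME (0.9): NAME is kept.
print(resolve_pair(Entity("USER", 0, 12, 0.7), Entity("NAME", 0, 12, 0.9)))
# "jake" inside "jake@email.com": the longer EMAIL entity is kept.
print(resolve_pair(Entity("NAME", 0, 4, 0.9), Entity("EMAIL", 0, 14, 0.8)))
# Partially overlapping spans in "092-33445": the labels are combined.
print(resolve_pair(Entity("PHONE_NUMBER", 0, 7, 0.8), Entity("SSN", 2, 9, 0.9)))
```

The spans and scores in the usage lines are illustrative values chosen to trigger each rule; only the labels come from the examples above.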

1.2 - Sample Response Default

Sample response when include_providers and include_classification_details are left at their default value, No.

{
    "transform": {
        "text": "[PERSON] lives at [LOCATION] [LOCATION], [LOCATION] [LOCATION], [LOCATION]."
    }
}

The fields are described as follows:

Name	Example Response	Description
transform.text	[PERSON] lives at [LOCATION]..	The labeled input text, with the name of each classified entity in place of the original sensitive data.
