Classify Text API

Classifies unstructured plain-text data.

    Method

    POST

    URL

    http://<Host_address>/pty/data-discovery/v2/classify/text

    Query Parameters

    score_threshold

    • Type: float
    • Description: Optional. Exclude results with a score lower than this threshold.
    • Values: Minimum 0, Maximum 1.0
    • Default: 0.7

    Body

    • The content type must be plain text (text/plain), encoded in UTF-8.

    • The body is limited to 10K bytes.
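
The parameter and body constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name `prepare_body` is hypothetical, and the limit is assumed to be 10,000 bytes, the more conservative reading of "10K", since the text does not say whether it means 10,000 or 10,240.

```python
MAX_BODY_BYTES = 10_000  # assumption: conservative reading of "10K bytes"


def prepare_body(text: str, score_threshold: float = 0.7) -> bytes:
    """Encode text as UTF-8 and check it against the documented limits.

    Hypothetical client-side helper; the server performs its own validation.
    """
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1.0")
    body = text.encode("utf-8")  # the limit is stated in bytes, not characters
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"body is {len(body)} bytes; limit is {MAX_BODY_BYTES}")
    return body
```

Measuring the encoded bytes rather than `len(text)` matters for non-ASCII input, where one character can occupy several bytes.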

    Sample Request

    curl -X POST "http://<Host_address>/pty/data-discovery/v2/classify/text?score_threshold=0.85" \
         -H "Content-Type: text/plain" \
         --data "You can reach Dave Elliot by phone 203-555-1286"

    import requests

    url = "http://<Host_address>/pty/data-discovery/v2/classify/text"
    params = {"score_threshold": 0.85}
    headers = {"Content-Type": "text/plain"}
    # Encode explicitly so that non-ASCII text is sent as UTF-8.
    data = "You can reach Dave Elliot by phone 203-555-1286".encode("utf-8")

    # verify=False disables TLS certificate checks; it only matters for https:// URLs.
    response = requests.post(url, params=params, headers=headers, data=data, verify=False)

    print("Status code:", response.status_code)
    print("Response JSON:", response.json())

    URL: POST `http://<Host_address>/pty/data-discovery/v2/classify/text`
    Query Parameters:
    - score_threshold (optional), float between 0.0 and 1.0, default: 0.7
    Headers:
    - Content-Type: text/plain
    Body:
    - You can reach Dave Elliot by phone 203-555-1286

    Sample Response

    {
        "providers": [
            {
                "name": "Pattern Classification Provider",
                "version": "...",
                "status": 200,
                "elapsed_time": 0.028261899948120117,
                "config_provider": {
                    "name": "Pattern",
                    "address": "http://pattern_provider_service:8051",
                    "supported_content_types": []
                }
            },
            {
                "name": "Context Classification Provider",
                "version": "...",
                "status": 200,
                "elapsed_time": 0.040960073471069336,
                "config_provider": {
                    "name": "Context",
                    "address": "http://context_provider_service:8052",
                    "supported_content_types": []
                }
            }
        ],
        "classifications": {
            "PERSON": [
                {
                    "score": 0.9238499879837037,
                    "location": {
                        "start_index": 14,
                        "end_index": 25
                    },
                    "classifiers": [
                        {
                            "provider_index": 0,
                            "name": "SpacyRecognizer",
                            "score": 0.85,
                            "original_entity": "PERSON",
                            "details": {}
                        },
                        {
                            "provider_index": 1,
                            "name": "context",
                            "score": 0.9976999759674072,
                            "original_entity": "NAME",
                            "details": {}
                        }
                    ]
                }
            ],
            "PHONE_NUMBER": [
                {
                    "score": 0.9995999932289124,
                    "location": {
                        "start_index": 35,
                        "end_index": 47
                    },
                    "classifiers": [
                        {
                            "provider_index": 1,
                            "name": "context",
                            "score": 0.9995999932289124,
                            "original_entity": "PHONE",
                            "details": {}
                        }
                    ]
                }
            ]
        }
    }
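
To show how the `location` indices relate to the request body, the sketch below slices the sample text with the offsets from the sample response. It is illustrative only: the function name `extract_spans` is hypothetical, the `classifications` dictionary is a trimmed copy of the response above, and the indices are assumed to be character offsets with an exclusive `end_index`, which is what the sample values imply for this ASCII input.

```python
text = "You can reach Dave Elliot by phone 203-555-1286"

# Trimmed copy of the "classifications" object from the sample response.
classifications = {
    "PERSON": [
        {"score": 0.9238499879837037,
         "location": {"start_index": 14, "end_index": 25}},
    ],
    "PHONE_NUMBER": [
        {"score": 0.9995999932289124,
         "location": {"start_index": 35, "end_index": 47}},
    ],
}


def extract_spans(text, classifications):
    """Return (entity_type, matched_text, score) for every occurrence."""
    spans = []
    for entity_type, occurrences in classifications.items():
        for occurrence in occurrences:
            loc = occurrence["location"]
            # Assumption: end_index is exclusive, like a Python slice bound.
            matched = text[loc["start_index"]:loc["end_index"]]
            spans.append((entity_type, matched, occurrence["score"]))
    return spans


for entity_type, matched, score in extract_spans(text, classifications):
    print(f"{entity_type}: {matched!r} (score {score:.4f})")
```

With the sample data, the two slices are "Dave Elliot" and "203-555-1286".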

    Response Fields Description

    Providers Section

    | Name | Example Response | Description |
    | --- | --- | --- |
    | providers | Array | Array of provider objects that participated in the request, including their respective success or failure codes. |
    | providers[n].name | Pattern Classification Provider | Product name of the provider. |
    | providers[n].version | 2.0.0 | Version of the provider. |
    | providers[n].status | 200 | HTTP response code returned by the provider. |
    | providers[n].elapsed_time | 0.028 | Time, in seconds, taken by the provider to process the request. |
    | providers[n].config_provider | Object | Object containing configuration details for each provider. |
    | providers[n].config_provider.name | Pattern | Internal name of the provider. |
    | providers[n].config_provider.address | http://pattern_provider_service:8051 | Network address or endpoint of the provider. |
    | providers[n].config_provider.supported_content_types | [] | Array of supported content types. An empty array indicates support for all content types. |

    Classifications Section

    | Name | Example Response | Description |
    | --- | --- | --- |
    | classifications | Dictionary | A dictionary mapping entity types (e.g., "PERSON", "PHONE_NUMBER") to arrays of occurrence objects. Each key is an entity type, and its value is a list of detected occurrences, each containing location and classifier details. |
    | classifications['entity'][n].score | 0.9238 | The confidence score for the detected entity, aggregated from all contributing classifiers. |
    | classifications['entity'][n].location | Object | An object specifying the location of the entity within the input text. |
    | classifications['entity'][n].location.start_index | 14 | The starting index of the entity in the input text. |
    | classifications['entity'][n].location.end_index | 25 | The ending index of the entity in the input text. |
    | classifications['entity'][n].classifiers | Array | An array of classifier objects that contributed to the entity detection. |
    | classifications['entity'][n].classifiers[m].provider_index | 0 | The index of the provider in the top-level providers array. |
    | classifications['entity'][n].classifiers[m].name | SpacyRecognizer | The name of the classifier. A provider may have multiple classifiers. |
    | classifications['entity'][n].classifiers[m].score | 0.85 | The score assigned by the classifier for the entity detection. |
    | classifications['entity'][n].classifiers[m].original_entity | PERSON | The original entity type detected by the classifier. See Harmonization for details. |
    | classifications['entity'][n].classifiers[m].details | Object | Optional. Additional key-value details provided by the classifier. |
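
Because `provider_index` points into the top-level `providers` array, each classifier entry can be traced back to the provider that produced it. A minimal sketch, assuming the response has already been parsed into Python dictionaries; `describe_classifiers` is a hypothetical helper, and the data is a trimmed copy of the PERSON entry in the sample response.

```python
# Trimmed copies of the sample response sections.
providers = [
    {"name": "Pattern Classification Provider", "status": 200},
    {"name": "Context Classification Provider", "status": 200},
]
person_classifiers = [
    {"provider_index": 0, "name": "SpacyRecognizer",
     "score": 0.85, "original_entity": "PERSON"},
    {"provider_index": 1, "name": "context",
     "score": 0.9976999759674072, "original_entity": "NAME"},
]


def describe_classifiers(classifiers, providers):
    """Pair each classifier with the name of the provider it belongs to."""
    rows = []
    for classifier in classifiers:
        provider = providers[classifier["provider_index"]]
        rows.append((
            provider["name"],
            classifier["name"],
            classifier["original_entity"],
            classifier["score"],
        ))
    return rows


for provider_name, classifier_name, original, score in describe_classifiers(
        person_classifiers, providers):
    print(f"{provider_name} / {classifier_name}: {original} ({score})")
```

In the sample, the two classifiers report different `original_entity` values (PERSON and NAME) for the same occurrence, which is listed under the single key PERSON in `classifications`.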

    Response Codes

    | Response Code | Description |
    | --- | --- |
    | 200 | Successful response. |
    | 206 | Partial content. Only some providers classified data successfully. |
    | 400 | Bad request. Invalid input parameters or content. |
    | 413 | Payload too large. |
    | 415 | Unsupported media type. |
    | 422 | Untrusted input. For more information, refer to Input Validation. |
    | 502 | Bad gateway. All upstream providers failed; no successful data aggregation possible. |
    | 598 | Unexpected internal server error. Check server logs. |
    | 599 | Internal server error. Check server logs. |
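
A client will usually want to branch on these codes, in particular to treat 206 as a usable but incomplete result. The sketch below is one possible grouping, not behavior defined by the API: the function name `summarize_status` and the category labels are hypothetical, and only the codes listed in the table are handled.

```python
def summarize_status(status_code: int) -> str:
    """Map a documented response code to a coarse, hypothetical category."""
    if status_code == 200:
        return "ok"
    if status_code == 206:
        # Some providers failed; results are present but may be incomplete.
        return "partial"
    if status_code in (400, 413, 415, 422):
        # The request itself must change before it is retried.
        return "client_error"
    if status_code in (502, 598, 599):
        # Provider or server failure; the table points to the server logs.
        return "server_error"
    return "unknown"
```

For a 206 response, the `status` field of each entry in `providers` indicates which providers failed.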