Data Discovery is currently in Private Preview and is not yet Generally Available (GA). Do not use it in production environments, because features and functionality may change before the GA release.
Classify Text API
Classifies unstructured plain-text data.
POST http://{Host Address}/pty/data-discovery/v1.1/classify
Query Parameters
score_threshold
- Type: float
- Description: Optional. Exclude results with a score lower than this threshold.
- Values: Minimum 0, Maximum 1.0
- Default: 0.00
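As a hedged illustration of how this query parameter can be attached, the sketch below checks a threshold against the documented 0 to 1.0 range and appends it to the endpoint URL. The helper name `build_classify_url` and the `example-host` value are assumptions made for illustration; they are not part of the API.

```python
from urllib.parse import urlencode

# Endpoint path taken from this page; the host is supplied by the caller.
CLASSIFY_PATH = "/pty/data-discovery/v1.1/classify"


def build_classify_url(host, score_threshold=None):
    """Return the classify URL, adding score_threshold only when given.

    Hypothetical helper: the name and signature are illustrative.
    """
    url = f"http://{host}{CLASSIFY_PATH}"
    if score_threshold is None:
        # Omitting the parameter leaves the documented default (0.00) in effect.
        return url
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1.0")
    return f"{url}?{urlencode({'score_threshold': score_threshold})}"


print(build_classify_url("example-host", 0.85))
```

Rejecting an out-of-range value on the client avoids a round trip that would presumably end in a 400 response.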
Body
The content type must be plain text (`text/plain`), encoded as UTF-8.
The body is limited to 10K bytes.
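The body constraints can be checked before a request is sent. The following is a minimal sketch, not part of the product: `encode_body` and `MAX_BODY_BYTES` are hypothetical names, and the limit is read conservatively as 10,000 bytes because this page does not say whether "10K" means 10,000 or 10,240.

```python
# Assumption: "10K Bytes" is read conservatively as 10,000 bytes so the
# check passes under either interpretation (10,000 or 10,240).
MAX_BODY_BYTES = 10_000


def encode_body(text):
    """Encode text as UTF-8 and reject bodies over the assumed limit."""
    payload = text.encode("utf-8")
    if len(payload) > MAX_BODY_BYTES:
        raise ValueError(
            f"body is {len(payload)} bytes; assumed limit is {MAX_BODY_BYTES}"
        )
    return payload


body = encode_body("You can reach Dave Elliot by phone 203-555-1286")
print(len(body))
```

The check counts encoded bytes rather than characters, since non-ASCII characters occupy more than one byte in UTF-8.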
Sample Request
curl -X POST "http://<SERVER_IP>/pty/data-discovery/v1.1/classify?score_threshold=0.85" \
  -H "Content-Type: text/plain" \
  --data "You can reach Dave Elliot by phone 203-555-1286"

import requests

# Endpoint, query parameter, and header as documented above.
url = "http://<SERVER_IP>/pty/data-discovery/v1.1/classify"
params = {"score_threshold": 0.85}
headers = {"Content-Type": "text/plain"}
data = "You can reach Dave Elliot by phone 203-555-1286"

# verify=False only affects HTTPS; it has no effect on this plain-HTTP URL.
response = requests.post(url, params=params, headers=headers, data=data, verify=False)
print("Status code:", response.status_code)
print("Response JSON:", response.json())

URL: POST `http://<SERVER_IP>/pty/data-discovery/v1.1/classify`
Query Parameters:
- `score_threshold` (optional): float between 0.0 and 1.0; default: 0.
Headers:
- Content-Type: text/plain
Body:
- You can reach Dave Elliot by phone 203-555-1286
Sample Response
{
"providers": [
{
"name": "Pattern Classification Provider",
"version": "1.1.1",
"status": 200,
"elapsed_time": 0.028261899948120117,
"config_provider": {
"name": "Pattern",
"address": "http://pattern_provider_service:8051",
"supported_content_types": []
}
},
{
"name": "Context Classification Provider",
"version": "1.1.1",
"status": 200,
"elapsed_time": 0.040960073471069336,
"config_provider": {
"name": "Context",
"address": "http://context_provider_service:8052",
"supported_content_types": []
}
}
],
"classifications": {
"PERSON": [
{
"score": 0.9238499879837037,
"location": {
"start_index": 14,
"end_index": 25
},
"classifiers": [
{
"provider_index": 0,
"name": "SpacyRecognizer",
"score": 0.85,
"original_entity": "PERSON",
"details": {}
},
{
"provider_index": 1,
"name": "context",
"score": 0.9976999759674072,
"original_entity": "NAME",
"details": {}
}
]
}
],
"PHONE_NUMBER": [
{
"score": 0.9995999932289124,
"location": {
"start_index": 35,
"end_index": 47
},
"classifiers": [
{
"provider_index": 1,
"name": "context",
"score": 0.9995999932289124,
"original_entity": "PHONE",
"details": {}
}
]
}
]
}
}
Response Fields Description
Providers Section
| Name | Example Response | Description |
|---|---|---|
| providers | Array | Array of provider objects that participated in the request, including their respective success or failure codes. |
| providers[n].name | Pattern Classification Provider | Product name of the provider. |
| providers[n].version | 1.1.1 | Version of the provider. |
| providers[n].status | 200 | HTTP response code returned by the provider. |
| providers[n].elapsed_time | 0.028 | Time, in seconds, taken by the provider to process the request. |
| providers[n].config_provider | Object | Object containing configuration details for each provider. |
| providers[n].config_provider.name | Pattern | Internal name of the provider. |
| providers[n].config_provider.address | http://pattern_provider_service:8051 | Network address or endpoint of the provider. |
| providers[n].config_provider.supported_content_types | [] | Array of supported content types. An empty array indicates support for all content types. |
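One use of the `providers` array is to find out which providers did not succeed. The sketch below is an illustration only: the helper name `failed_providers` is hypothetical, and it assumes that any status outside the 2xx range counts as a failure, which this page does not state explicitly.

```python
def failed_providers(response):
    """Return (name, status) for each provider with a non-2xx status.

    Assumption: a status outside 200-299 means the provider failed.
    """
    return [
        (p["name"], p["status"])
        for p in response.get("providers", [])
        if not 200 <= p["status"] < 300
    ]


# Hypothetical response: the second provider's 500 status is invented
# for illustration and does not come from the sample response.
example = {
    "providers": [
        {"name": "Pattern Classification Provider", "status": 200},
        {"name": "Context Classification Provider", "status": 500},
    ]
}
print(failed_providers(example))
```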
Classifications Section
| Name | Example Response | Description |
|---|---|---|
| classifications | Dictionary | A dictionary mapping entity types (e.g., “PERSON”, “PHONE_NUMBER”) to arrays of occurrence objects. Each key is an entity type, and its value is a list of detected occurrences, each containing location and classifier details. |
| classifications['entity'][n].score | 0.9238 | The confidence score for the detected entity, aggregated from all contributing classifiers. |
| classifications['entity'][n].location | Object | An object specifying the location of the entity within the input text. |
| classifications['entity'][n].location.start_index | 14 | The starting index of the entity in the input text. |
| classifications['entity'][n].location.end_index | 25 | The ending index of the entity in the input text. In the sample response, this index is exclusive (one past the last character of the entity). |
| classifications['entity'][n].classifiers | Array | An array of classifier objects that contributed to the entity detection. |
| classifications['entity'][n].classifiers[m].provider_index | 0 | The index of the provider in the top-level providers array. |
| classifications['entity'][n].classifiers[m].name | SpacyRecognizer | The name of the classifier. A provider may have multiple classifiers. |
| classifications['entity'][n].classifiers[m].score | 0.85 | The score assigned by the classifier for the entity detection. |
| classifications['entity'][n].classifiers[m].original_entity | PERSON | The original entity type detected by the classifier. See Harmonization for details. |
| classifications['entity'][n].classifiers[m].details | Object | Optional. Additional key-value details provided by the classifier. |
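The location and `provider_index` fields can be combined to recover the matched text and the providers behind each detection. The sketch below uses a trimmed copy of the sample response. The helper name `extract_entities` is hypothetical, and two assumptions are inferred from the sample rather than stated by the API: `end_index` is exclusive, and the indices count characters rather than bytes.

```python
def extract_entities(text, response):
    """Yield (entity_type, matched_text, score, provider_names) per occurrence."""
    providers = response.get("providers", [])
    for entity_type, occurrences in response.get("classifications", {}).items():
        for occ in occurrences:
            loc = occ["location"]
            # Assumption drawn from the sample: end_index is exclusive,
            # so a plain Python slice recovers the matched text.
            matched = text[loc["start_index"]:loc["end_index"]]
            # provider_index points into the top-level providers array.
            names = [
                providers[c["provider_index"]]["name"]
                for c in occ["classifiers"]
            ]
            yield entity_type, matched, occ["score"], names


text = "You can reach Dave Elliot by phone 203-555-1286"
# Trimmed copy of the sample response; unused fields are omitted.
sample = {
    "providers": [
        {"name": "Pattern Classification Provider"},
        {"name": "Context Classification Provider"},
    ],
    "classifications": {
        "PERSON": [{
            "score": 0.9238499879837037,
            "location": {"start_index": 14, "end_index": 25},
            "classifiers": [{"provider_index": 0}, {"provider_index": 1}],
        }],
        "PHONE_NUMBER": [{
            "score": 0.9995999932289124,
            "location": {"start_index": 35, "end_index": 47},
            "classifiers": [{"provider_index": 1}],
        }],
    },
}

for entity_type, matched, score, names in extract_entities(text, sample):
    print(entity_type, repr(matched), round(score, 4), names)
```

With the sample input, the slices yield `Dave Elliot` for `PERSON` and `203-555-1286` for `PHONE_NUMBER`.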
Response Codes
| Response Code | Description |
|---|---|
| 200 | Successful Response. |
| 206 | Partial Content. Only some providers classified data successfully. |
| 400 | Bad Request. Invalid input parameters or content. |
| 413 | Payload too large. |
| 415 | Unsupported media type. |
| 422 | Untrusted input. For more information, refer to Input Validation. |
| 502 | Bad Gateway. All upstream providers failed; no successful data aggregation possible. |
| 598 | Unexpected internal server error. Check server logs. |
| 599 | Internal server error. Check server logs. |
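A client might branch on these codes roughly as follows. This is a sketch under stated assumptions, not a prescribed handling strategy: the names `RESPONSE_CODES`, `describe`, and `has_usable_results` are hypothetical, and treating a 206 body as still carrying classifications from the providers that succeeded is an assumption based on the description in the table.

```python
# Descriptions taken from the response-code table on this page.
RESPONSE_CODES = {
    200: "Successful Response.",
    206: "Partial Content. Only some providers classified data successfully.",
    400: "Bad Request. Invalid input parameters or content.",
    413: "Payload too large.",
    415: "Unsupported media type.",
    422: "Untrusted input.",
    502: "Bad Gateway. All upstream providers failed.",
    598: "Unexpected internal server error. Check server logs.",
    599: "Internal server error. Check server logs.",
}


def describe(status_code):
    """Return the documented description, or a fallback for unknown codes."""
    return RESPONSE_CODES.get(status_code, "Undocumented response code.")


def has_usable_results(status_code):
    """True for 200 and 206.

    Assumption: a 206 response still contains classifications from the
    providers that succeeded.
    """
    return status_code in (200, 206)


for code in (200, 206, 502):
    print(code, has_usable_results(code), describe(code))
```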