Data Discovery is currently in Private Preview and is not yet generally available (GA). Do not use it in production environments, because features and functionality may change before the GA release.
Classify CSV API
Classify structured CSV data.
POST https://{Host Address}/pty/data-discovery/v1.1/classify
Query Parameters
score_threshold
- Type: float
- Description: Optional. Exclude results with a score lower than this threshold.
- Values: Minimum 0, Maximum 1.0
- Default: 0.00
has_headers
- Type: boolean
- Description: Optional. Indicates whether the first row represents the column header.
- Values: true/false
- Default: true
column_delimiter
- Type: char
- Description: Optional. Delimiter to separate the columns.
- Values: , |
- Default: ,
quote_char
- Type: char
- Description: Optional. Character used to quote fields that contain special characters, such as the column_delimiter or newline characters.
- Values: ""
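The query parameters above can be combined into a request URL as in the following minimal sketch. The helper name `build_classify_url` is hypothetical, `<SERVER_IP>` is a placeholder, and the sketch assumes that the boolean is sent as the literal `true`/`false` and that the two accepted delimiters are `,` and `|`. Because no default is documented for `quote_char`, it is only sent when set explicitly.

```python
from urllib.parse import urlencode

# Placeholder host taken from the sample request; replace with your server.
BASE_URL = "https://<SERVER_IP>/pty/data-discovery/v1.1/classify"


def build_classify_url(score_threshold=0.0, has_headers=True,
                       column_delimiter=",", quote_char=None):
    """Hypothetical helper: return the classify URL with an encoded query string."""
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1.0")
    # Assumption: ',' and '|' are the only accepted delimiters.
    if column_delimiter not in (",", "|"):
        raise ValueError("column_delimiter must be ',' or '|'")
    query = {
        "score_threshold": score_threshold,
        # Assumption: the API expects lowercase true/false.
        "has_headers": "true" if has_headers else "false",
        "column_delimiter": column_delimiter,
    }
    if quote_char is not None:
        query["quote_char"] = quote_char
    # urlencode percent-encodes reserved characters such as '|' and '"'.
    return f"{BASE_URL}?{urlencode(query)}"


url = build_classify_url(score_threshold=0.85, column_delimiter="|")
print(url)
```

Encoding the values matters mostly for `column_delimiter` and `quote_char`, whose characters are not safe to place in a URL unescaped.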
Body
The content type should be text/csv, and the body must be UTF-8 encoded. The body size is limited to 10K bytes.
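A client can check the body size before sending the request, as in this minimal sketch. The helper name `encode_csv_body` is hypothetical, and the limit of 10,000 bytes is an assumption: the text says only "10K", which could also mean 10,240. The check is done on the encoded bytes, since non-ASCII characters take more than one byte in UTF-8.

```python
# Assumption: "10K bytes" is read as 10,000; the server may use 10,240.
MAX_BODY_BYTES = 10_000


def encode_csv_body(csv_text, limit=MAX_BODY_BYTES):
    """Hypothetical helper: encode CSV text as UTF-8 and enforce the size limit."""
    body = csv_text.encode("utf-8")
    if len(body) > limit:
        raise ValueError(f"CSV body is {len(body)} bytes; the limit is {limit}")
    return body


# 'ë' is one character but two bytes in UTF-8.
sample = "Name,IBAN\nZoë,AT34 4082 9269 0841 5702\n"
body = encode_csv_body(sample)
print(len(sample), "characters,", len(body), "bytes")
```

The returned bytes can be passed directly as the `data` argument of `requests.post`.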
Sample Request
curl -X POST "https://<SERVER_IP>/pty/data-discovery/v1.1/classify?score_threshold=0.85" \
--header 'Content-Type: text/csv' \
--data-raw 'Social Security Number,Credit Card Number,IBAN,Phone Number
589-25-1068,349384370543801,FR43 9255 4858 47BG 3EBG U4OK O18,(483) 9440301
636-36-3077,4041594844904,AL50 8947 4215 KAEY GAPM NLYC FNZG,(113) 5143119
748-82-2375,3558175715821800,AT34 4082 9269 0841 5702,(763) 5136237
516-62-9861,560221027976015000,FR22 0068 7181 11FB UG8H ECEM 306,(726) 6031636
121-49-9409,374283320982549,DK37 5687 8459 8060 79,(624) 9205200
838-73-3299,5558216060144900,CR54 8952 8144 6403 4765 0,(356) 9479541
439-11-5310,5048376143641900,RS76 6213 4824 0184 8983 74,(544) 5623326
564-06-8466,3543299511845640,EE51 6882 3443 7863 4703,(702) 6093849
518-54-5443,3543019452249540,IT65 D000 3874 2801 Z15I LNLL OOX,(584) 8618371'
import requests
url = "https://<SERVER_IP>/pty/data-discovery/v1.1/classify"
params = {"score_threshold": 0.85}
headers = {"Content-Type": "text/csv"}
data = """Social Security Number,Credit Card Number,IBAN,Phone Number
589-25-1068,349384370543801,FR43 9255 4858 47BG 3EBG U4OK O18,(483) 9440301
636-36-3077,4041594844904,AL50 8947 4215 KAEY GAPM NLYC FNZG,(113) 5143119
748-82-2375,3558175715821800,AT34 4082 9269 0841 5702,(763) 5136237
516-62-9861,560221027976015000,FR22 0068 7181 11FB UG8H ECEM 306,(726) 6031636
121-49-9409,374283320982549,DK37 5687 8459 8060 79,(624) 9205200
838-73-3299,5558216060144900,CR54 8952 8144 6403 4765 0,(356) 9479541
439-11-5310,5048376143641900,RS76 6213 4824 0184 8983 74,(544) 5623326
564-06-8466,3543299511845640,EE51 6882 3443 7863 4703,(702) 6093849
518-54-5443,3543019452249540,IT65 D000 3874 2801 Z15I LNLL OOX,(584) 8618371
"""
response = requests.post(url, params=params, headers=headers, data=data, verify=False)
print("Status code:", response.status_code)
try:
    print("Response JSON:", response.json())
except ValueError:
    print("Response Text:", response.text)
URL: POST `https://<SERVER_IP>/pty/data-discovery/v1.1/classify`
Query Parameters:
- score_threshold (optional), float between 0.0 and 1.0, default: 0.
- has_headers (optional), indicates whether the first row represents the column header.
- column_delimiter (optional), delimiter to separate the columns.
- quote_char (optional), character used to quote fields that contain special characters, such as the column_delimiter or newline characters.
Headers:
- Content-Type: text/csv
Body:
Social Security Number,Credit Card Number,IBAN,Phone Number
589-25-1068,349384370543801,FR43 9255 4858 47BG 3EBG U4OK O18,(483) 9440301
636-36-3077,4041594844904,AL50 8947 4215 KAEY GAPM NLYC FNZG,(113) 5143119
748-82-2375,3558175715821800,AT34 4082 9269 0841 5702,(763) 5136237
516-62-9861,560221027976015000,FR22 0068 7181 11FB UG8H ECEM 306,(726) 6031636
121-49-9409,374283320982549,DK37 5687 8459 8060 79,(624) 9205200
838-73-3299,5558216060144900,CR54 8952 8144 6403 4765 0,(356) 9479541
439-11-5310,5048376143641900,RS76 6213 4824 0184 8983 74,(544) 5623326
564-06-8466,3543299511845640,EE51 6882 3443 7863 4703,(702) 6093849
518-54-5443,3543019452249540,IT65 D000 3874 2801 Z15I LNLL OOX,(584) 8618371
Sample Response
{
"providers": [
{
"name": "Pattern Classification Provider",
"version": "1.1.0",
"status": 200,
"elapsed_time": 0.31273603439331055,
"config_provider": {
"name": "Pattern",
"address": "http://pattern_provider_service:8051",
"supported_content_types": []
}
},
{
"name": "Context Classification Provider",
"version": "1.1.0",
"status": 200,
"elapsed_time": 1.1383004188537598,
"config_provider": {
"name": "Context",
"address": "http://context_provider_service:8052",
"supported_content_types": []
}
}
],
"classifications": {
"SOCIAL_SECURITY_ID": [
{
"score": 0.9994888835483127,
"rows_processed": 9,
"location": {
"column_name": "Social Security Number",
"column_index": 0
},
"classifiers": [
{
"provider_index": 1,
"name": "context",
"rows_with_classification": 9,
"total_classifications": 9,
"score": 0.9994888835483127,
"details": {}
}
]
}
],
"CREDIT_CARD": [
{
"score": 0.9986333317226834,
"rows_processed": 9,
"location": {
"column_name": "Credit Card Number",
"column_index": 1
},
"classifiers": [
{
"provider_index": 1,
"name": "context",
"rows_with_classification": 9,
"total_classifications": 9,
"score": 0.9986333317226834,
"details": {}
}
]
}
],
"BANK_ACCOUNT": [
{
"score": 0.7901234567901234,
"rows_processed": 9,
"location": {
"column_name": "IBAN",
"column_index": 2
},
"classifiers": [
{
"provider_index": 0,
"name": "IbanRecognizer",
"rows_with_classification": 8,
"total_classifications": 8,
"score": 0.8888888888888888,
"details": {}
}
]
}
],
"PHONE_NUMBER": [
{
"score": 0.9961333341068692,
"rows_processed": 9,
"location": {
"column_name": "Phone Number",
"column_index": 3
},
"classifiers": [
{
"provider_index": 1,
"name": "context",
"rows_with_classification": 9,
"total_classifications": 9,
"score": 0.9961333341068692,
"details": {}
}
]
}
]
}
}
Response Fields Description
Providers Section
| Name | Example Response | Description |
|---|---|---|
| providers | Array | Array of provider objects that participated in the request, including their respective success or failure codes. |
| providers[n].name | Pattern Classification Provider | Product name of the provider. |
| providers[n].version | 1.0.0 | Version of the provider. |
| providers[n].status | 200 | HTTP response code returned by the provider. |
| providers[n].elapsed_time | 0.028 | Time, in seconds, taken by the provider to process the request. |
| providers[n].config_provider | Object | Object containing configuration details for each provider. |
| providers[n].config_provider.name | Pattern | Internal name of the provider. |
| providers[n].config_provider.address | http://pattern_provider_service:8051 | Network address or endpoint of the provider. |
| providers[n].config_provider.supported_content_types | [] | Array of supported content types. An empty array indicates support for all content types. |
Classifications Section
| Name | Example Response | Description |
|---|---|---|
| classifications | Dictionary | A dictionary mapping entity types (e.g., “SOCIAL_SECURITY_ID”, “CREDIT_CARD”) to arrays of occurrence objects. Each key is an entity type, and its value is a list of detected occurrences, each containing location, classifier, and row details. |
| classifications['entity'][n].score | 0.9995 | The confidence score for the detected entity, aggregated from the scores reported by all contributing classifiers. |
| classifications['entity'][n].rows_processed | 9 | The number of rows passed to and processed by the classification request. |
| classifications['entity'][n].location | Object | An object specifying the location of the entity within the CSV data. |
| classifications['entity'][n].location.column_name | Social Security Number | The name of the column in which the entity was detected. |
| classifications['entity'][n].location.column_index | 0 | The index of the column in which the entity was detected. |
| classifications['entity'][n].classifiers | Array | An array of classifier objects that contributed to the entity detection. |
| classifications['entity'][n].classifiers[m].provider_index | 1 | The index of the provider in the top-level providers array. |
| classifications['entity'][n].classifiers[m].name | context | The name of the classifier. A provider may have multiple classifiers. |
| classifications['entity'][n].classifiers[m].score | 0.9995 | The score assigned by the classifier for the entity detection. |
| classifications['entity'][n].classifiers[m].rows_with_classification | 9 | The number of rows in which the entity was classified by this classifier. |
| classifications['entity'][n].classifiers[m].total_classifications | 9 | The total number of classifications made by this classifier in this location. It is possible to find multiple entities within a single column, e.g., date and time or a complex address. |
| classifications['entity'][n].classifiers[m].details | Object | Optional. Additional key-value details provided by the classifier. |
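The fields described in the two tables above can be read together as in this minimal sketch, which lists each detected entity with its column and resolves `provider_index` against the top-level `providers` array. The function name `summarize` is hypothetical, and the input is a trimmed copy of the sample response rather than live output.

```python
def summarize(result):
    """Hypothetical helper: return (column, entity, score, provider names) tuples."""
    providers = result.get("providers", [])
    rows = []
    for entity, occurrences in result.get("classifications", {}).items():
        for occ in occurrences:
            # provider_index points into the top-level providers array.
            names = [providers[c["provider_index"]]["name"]
                     for c in occ["classifiers"]]
            rows.append((occ["location"]["column_name"], entity,
                         round(occ["score"], 4), names))
    return rows


# Trimmed from the sample response shown earlier in this section.
sample_response = {
    "providers": [
        {"name": "Pattern Classification Provider", "status": 200},
        {"name": "Context Classification Provider", "status": 200},
    ],
    "classifications": {
        "SOCIAL_SECURITY_ID": [{
            "score": 0.9994888835483127,
            "rows_processed": 9,
            "location": {"column_name": "Social Security Number", "column_index": 0},
            "classifiers": [{"provider_index": 1, "name": "context",
                             "score": 0.9994888835483127}],
        }],
        "BANK_ACCOUNT": [{
            "score": 0.7901234567901234,
            "rows_processed": 9,
            "location": {"column_name": "IBAN", "column_index": 2},
            "classifiers": [{"provider_index": 0, "name": "IbanRecognizer",
                             "score": 0.8888888888888888}],
        }],
    },
}

rows = summarize(sample_response)
for column, entity, score, names in rows:
    print(f"{column}: {entity} (score {score}) via {', '.join(names)}")
```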
Response Codes
| Response Code | Description |
|---|---|
| 200 | Successful Response. |
| 206 | Partial Content. Only some providers classified data successfully. |
| 400 | Bad Request. Invalid input parameters or content. |
| 413 | Payload too large. |
| 415 | Unsupported media type. |
| 422 | Untrusted input. For more information, refer to Input Validation. |
| 502 | Bad Gateway. All upstream providers failed; no successful data aggregation possible. |
| 598 | Unexpected internal server error. Check server logs. |
| 599 | Internal server error. Check server logs. |
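A client might map these codes to an action as in the following minimal sketch. The name `interpret_status` is hypothetical, and treating both 200 and 206 as carrying usable classifications is an assumption based on the descriptions in the table; with 206, the per-provider `status` values in the response show which providers failed.

```python
# Descriptions copied from the Response Codes table.
RESPONSE_CODES = {
    200: "Successful Response.",
    206: "Partial Content. Only some providers classified data successfully.",
    400: "Bad Request. Invalid input parameters or content.",
    413: "Payload too large.",
    415: "Unsupported media type.",
    422: "Untrusted input.",
    502: "Bad Gateway. All upstream providers failed.",
    598: "Unexpected internal server error. Check server logs.",
    599: "Internal server error. Check server logs.",
}


def interpret_status(code):
    """Hypothetical helper: return (usable, description) for a status code."""
    description = RESPONSE_CODES.get(code, "Undocumented response code.")
    # Assumption: 200 and 206 both return classifications worth parsing.
    usable = code in (200, 206)
    return usable, description


for code in (200, 206, 413):
    usable, description = interpret_status(code)
    print(code, "usable" if usable else "not usable", "-", description)
```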