Data Discovery is currently in Private Preview and is not yet generally available (GA). Do not use it in production environments, because features and functionality may change before the GA release.

Input Sanitization

Normalizing input data.

The Classification service in Data Discovery includes a security feature that sanitizes input text. It ensures that invalid, stylized, or malicious words are sanitized and normalized before the data is classified. To do this, it normalizes the text, replaces look-alike characters (homoglyphs) with their plain-text equivalents, and removes white spaces from the input text.

The following are a few examples of characters that are converted to plain text.

“Ⅷ” will be converted to “VIII”.
“𝓉𝑒𝓍𝓉” will be converted to “text”.
“Ｐｅｐ” will be converted to “Pep”.
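
The documentation does not state how the Classification service performs this conversion. The three examples above do, however, match what Unicode compatibility normalization (NFKC) produces, so the behavior can be approximated with the following minimal sketch. The function name `normalize_text` is hypothetical, and the use of NFKC is an assumption, not a description of the service's implementation.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Approximate the sanitizer with NFKC normalization (an assumption).

    NFKC replaces compatibility characters, such as Roman numeral signs,
    mathematical script letters, and fullwidth letters, with their plain
    equivalents.
    """
    return unicodedata.normalize("NFKC", text)


# The sample inputs are the ones listed in this section.
for sample in ("Ⅷ", "𝓉𝑒𝓍𝓉", "Ｐｅｐ"):
    print(f"{sample!r} -> {normalize_text(sample)!r}")
```

A check like this can help predict how a stylized string is likely to look after sanitization, but the service itself remains the authority on the final result.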

For security purposes, this feature is enabled in the application, and disabling it is not recommended.

If you must disable this feature, perform the following steps:

For a Docker Compose deployment, navigate to the docker_compose directory.
Edit the docker-compose.yaml file.
Under the environment section of classification_service, append the security parameter as follows.

- SECURITY_SETTINGS={"ENABLE_ALL_SECURITY_CONTROLS":false}

Save the changes.
Run the compose_down.sh file to undeploy the application.
Run the compose_up.sh file to redeploy the application.
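
The value assigned to SECURITY_SETTINGS above has the shape of a JSON object, so a typo such as a missing quote or brace can be caught before redeploying. The following is a minimal sketch of such a check. It assumes the value is meant to be valid JSON, which the documentation does not state explicitly, and the helper `parse_security_settings` is hypothetical.

```python
import json


def parse_security_settings(env_entry: str) -> dict:
    """Split a NAME=value entry and parse the value as JSON (assumed format)."""
    name, _, value = env_entry.partition("=")
    if name != "SECURITY_SETTINGS":
        raise ValueError(f"unexpected variable name: {name!r}")
    settings = json.loads(value)  # raises json.JSONDecodeError if malformed
    if not isinstance(settings, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(settings.get("ENABLE_ALL_SECURITY_CONTROLS"), bool):
        raise ValueError("ENABLE_ALL_SECURITY_CONTROLS must be true or false")
    return settings


# The entry exactly as it appears in the environment section, without "- ".
entry = 'SECURITY_SETTINGS={"ENABLE_ALL_SECURITY_CONTROLS":false}'
settings = parse_security_settings(entry)
print(settings)  # {'ENABLE_ALL_SECURITY_CONTROLS': False}
```

The check only validates the text of the entry. It does not confirm that the running service has picked up the new setting.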

For an EKS (Helm) deployment, navigate to the /eks/helm/classification_app directory.
Edit the values.yaml file.
Under the securitySettings section, configure the security settings parameter as follows.

ENABLE_ALL_SECURITY_CONTROLS: false

Save the changes.
Navigate to the eks directory and run the aws_undeploy.sh file to undeploy the application.
Run the aws_deploy.sh file to redeploy the application.

Last modified: June 26, 2025