Data Discovery is currently in Private Preview and is not yet generally available (GA). Do not use it in production environments, as features and functionality may change before the final GA release.

Input Validation

Rejecting unsanitized data.

The Classification service in Data Discovery provides an input validation security feature that rejects invalid input. Data that is malformed or non-normalized, or that contains homoglyphs, hieroglyphs, mixed Unicode variants, or control characters, is considered unsanitized (invalid). Such data is rejected and is not classified.

The following are a few examples of data that is rejected:

  • 𝓉𝑒𝓍𝓉
  • Pep
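As a rough illustration of why inputs like these are flagged, the following Python sketch checks a string for three of the categories listed above: text that changes under Unicode NFKC normalization (such as the mathematical letters in 𝓉𝑒𝓍𝓉), control characters, and letters drawn from more than one script (for example a Cyrillic "Р", U+0420, next to Latin letters). It is an assumption-based approximation, not the Classification service's actual validation logic. The function names are hypothetical, and the script check is a heuristic based on Unicode character names, because the Python standard library does not expose the Unicode Script property.

```python
import unicodedata

# Whitespace control characters this sketch tolerates (an assumption; the
# service's exact rules are not documented on this page).
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def script_of(ch: str) -> str:
    """Guess a character's script from the first word of its Unicode name.

    Heuristic only: "LATIN SMALL LETTER P" -> "LATIN",
    "CYRILLIC CAPITAL LETTER ER" -> "CYRILLIC".
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def find_issues(text: str) -> list[str]:
    """Return reasons why text may be treated as unsanitized (hypothetical checks)."""
    issues = []
    # Compatibility variants such as mathematical letters change under NFKC.
    if unicodedata.normalize("NFKC", text) != text:
        issues.append("not NFKC-normalized")
    # Control characters (general category Cc), except common whitespace.
    for ch in text:
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            issues.append(f"control character U+{ord(ch):04X}")
    # Letters from more than one script may indicate homoglyph substitution.
    scripts = {script_of(ch) for ch in text if ch.isalpha()}
    if len(scripts) > 1:
        issues.append("mixed scripts: " + ", ".join(sorted(scripts)))
    return issues
```

For example, find_issues("𝓉𝑒𝓍𝓉") returns ["not NFKC-normalized"], while "Pep" typed in plain ASCII returns an empty list. The sketch does not attempt to detect hieroglyphs or malformed data.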

Before invoking the Classification endpoint, ensure that the input text is normalized: replace invalid characters with their corresponding normalized plaintext characters. If the input text contains any invalid character, the service returns HTTP status code 422 with the message "Untrusted input".

For security purposes, the application rejects unsanitized data by default, and it is recommended that this feature remain enabled. If you must override it, perform the following steps for your deployment type.

For a Docker Compose deployment:

  1. Navigate to the docker_compose directory.

  2. Edit the docker-compose.yaml file.

  3. Under the environment section of classification_service, append the security parameter as follows.

- SECURITY_SETTINGS={"ENABLE_ALL_SECURITY_CONTROLS":false}
  
  4. Save the changes.

  5. If the application is already running, stop the containers first:

docker compose down
  
  6. Start the application with the updated configuration, as described in the Docker Compose deployment guide:

docker compose up -d
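The SECURITY_SETTINGS value in step 3 appears to be a JSON object, so JSON syntax applies: the boolean must be lowercase false and keys must use double quotes. The following Python sketch checks a value before you place it in docker-compose.yaml. It assumes, without confirmation from this page, that the service parses the variable as JSON; parse_security_settings is a hypothetical helper, and its boolean-type check is this sketch's own rule.

```python
import json

# The value from step 3, exactly as it appears after "SECURITY_SETTINGS=".
raw_value = '{"ENABLE_ALL_SECURITY_CONTROLS":false}'


def parse_security_settings(raw: str) -> dict:
    """Parse a SECURITY_SETTINGS value, failing loudly on invalid input.

    Assumption: the service reads this variable as a JSON object.
    """
    settings = json.loads(raw)  # raises json.JSONDecodeError on non-JSON text
    if not isinstance(settings, dict):
        raise ValueError("SECURITY_SETTINGS must be a JSON object")
    flag = settings.get("ENABLE_ALL_SECURITY_CONTROLS")
    if not isinstance(flag, bool):
        raise ValueError("ENABLE_ALL_SECURITY_CONTROLS must be true or false")
    return settings


if __name__ == "__main__":
    print(parse_security_settings(raw_value))
    # prints {'ENABLE_ALL_SECURITY_CONTROLS': False}
```

A Python-style value such as {"ENABLE_ALL_SECURITY_CONTROLS": False} fails to parse, and a quoted "false" fails the type check.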
  
For a Helm deployment on EKS:

  1. Navigate to the /eks/helm/classification_app directory.

  2. Create a values-override.yaml file with the following custom configuration:

securitySettings:
    ENABLE_ALL_SECURITY_CONTROLS: false
  
  3. Save the changes.

  4. If the application is already deployed, uninstall it using the following command:

helm uninstall data-discovery-classification --namespace default --wait
  
  5. Run the following installation command:

helm install data-discovery-classification . \
    --namespace default \
    --create-namespace \
    --wait \
    --wait-for-jobs \
    --timeout 900s \
    -f values-override.yaml
  
Last modified: August 15, 2025