Data Discovery is currently in Private Preview and is not yet generally available (GA). Do not use it in production environments, because features and functionality may change before the final GA release.
Input Sanitization
The Classification service in Data Discovery provides a security feature that sanitizes input text. Before data is classified, invalid, stylized, or malicious characters and words are sanitized and normalized: the service normalizes the text, replaces homoglyphs (look-alike characters), and removes whitespace from the input.
The following are a few examples of characters that are converted to plain text.
- “Ⅷ” will be converted to “VIII”.
- “𝓉𝑒𝓍𝓉” will be converted to “text”.
- “Pep” will be converted to “Pep”.
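The first two conversions above match Unicode NFKC compatibility normalization, and the third is consistent with a look-alike character table. The following Python sketch approximates the three sanitization steps using only the standard library. It is an illustration, not the Classification service's implementation. The HOMOGLYPHS table and the sanitize function are hypothetical, the whitespace step assumes that "removes whitespace" means collapsing runs of whitespace and trimming the ends, and the third test input assumes the "Pep" example is written with Cyrillic look-alike letters.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny homoglyph table: Cyrillic letters that
# render like Latin ones. NFKC does not fold these, so they need a mapping.
HOMOGLYPHS = str.maketrans({
    "\u0420": "P",  # CYRILLIC CAPITAL LETTER ER
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
})


def sanitize(text: str) -> str:
    """Approximate the sanitization steps described above (illustrative only)."""
    # 1. NFKC folds compatibility characters such as Roman numeral
    #    ligatures and mathematical script/italic letters to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # 2. Replace known look-alike characters with their Latin counterparts.
    text = text.translate(HOMOGLYPHS)
    # 3. Assumption: collapse whitespace runs to one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(sanitize("\u2167"))                                     # Ⅷ   -> VIII
print(sanitize("\U0001D4C9\U0001D452\U0001D4CD\U0001D4C9"))   # 𝓉𝑒𝓍𝓉 -> text
print(sanitize("\u0420\u0435\u0440"))                         # Cyrillic "Рер" -> Pep
```

NFKC alone handles the first two examples. Homoglyphs such as Cyrillic Р need an explicit table because they are distinct letters rather than compatibility variants; production systems typically draw on a much larger confusables list.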
For security purposes, this feature is enabled by default, and it is recommended that you do not disable it.
To disable this feature, perform the following steps for your deployment type.
For a Docker Compose deployment:
1. Navigate to the docker_compose directory.
2. Edit the docker-compose.yaml file.
3. Under the environment section of classification_service, append the security parameter as follows.
- SECURITY_SETTINGS={"ENABLE_ALL_SECURITY_CONTROLS":false}
4. Save the changes.
5. Run the compose_down.sh script to undeploy the application.
6. Run the compose_up.sh script to redeploy the application.
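The SECURITY_SETTINGS value set in docker-compose.yaml is a JSON object, so the flag must be written as a lowercase JSON false with double-quoted keys. As a rough illustration of how such a variable could be read, here is a hypothetical Python sketch. The function name security_controls_enabled, the treatment of anything other than a JSON false as "enabled", and the default-to-enabled behavior when the variable is absent are all assumptions, not the Classification service's actual code.

```python
import json
import os
from typing import Mapping


def security_controls_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical reader for SECURITY_SETTINGS (not the service's real code)."""
    raw = env.get("SECURITY_SETTINGS")
    if not raw:
        # Assumption: controls stay on when the variable is not set.
        return True
    # The value must be valid JSON: a Python-style False would raise
    # json.JSONDecodeError here.
    settings = json.loads(raw)
    # Only an explicit JSON boolean false disables the controls in this sketch;
    # a string such as "false" leaves them enabled.
    return settings.get("ENABLE_ALL_SECURITY_CONTROLS", True) is not False


print(security_controls_enabled(
    {"SECURITY_SETTINGS": '{"ENABLE_ALL_SECURITY_CONTROLS":false}'}
))  # False
print(security_controls_enabled({}))  # True
```

Treating only a real boolean as "off" keeps a typo in the setting from silently weakening security, which fits the recommendation to leave the feature enabled.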
For an Amazon EKS deployment:
1. Navigate to the /eks/helm/classification_app directory.
2. Edit the values.yaml file.
3. Under the securitySettings section, configure the security settings parameter as follows.
ENABLE_ALL_SECURITY_CONTROLS: false
4. Save the changes.
5. Navigate to the eks directory and run the aws_undeploy.sh script to undeploy the application.
6. Run the aws_deploy.sh script to redeploy the application.
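Both procedures end by running an undeploy script followed by a deploy script from the deployment directory. The following Python sketch automates that sequence. The redeploy helper and its shell parameter are hypothetical; it assumes both scripts are in the given directory and need no interactive input, which may not hold for the real scripts (for example, if aws_deploy.sh prompts for AWS credentials).

```python
import subprocess
from pathlib import Path


def redeploy(directory: str, down_script: str, up_script: str,
             shell: str = "bash") -> None:
    """Hypothetical helper: run the undeploy script, then the deploy script.

    Both scripts are checked up front so the application is not undeployed
    when the deploy script is missing. check=True raises CalledProcessError
    and stops the sequence if a script exits with a non-zero status.
    """
    workdir = Path(directory)
    for script in (down_script, up_script):
        if not (workdir / script).is_file():
            raise FileNotFoundError(f"script not found: {workdir / script}")
    for script in (down_script, up_script):
        subprocess.run([shell, script], cwd=workdir, check=True)


# Usage, from the directory that contains the deployment folders:
# redeploy("docker_compose", "compose_down.sh", "compose_up.sh")
# redeploy("eks", "aws_undeploy.sh", "aws_deploy.sh")
```

Running the deploy script only after the undeploy script succeeds mirrors the order of the manual steps and avoids starting a second copy of the application on top of a half-removed one.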