Deploying the Application
Deploying the application and components.
This section explains the step-by-step deployment of Data Discovery on Amazon EKS. Each component builds on the previous one, resulting in a reliable, production-ready environment.
The deployment is separated into two main phases:
- Phase 1: Infrastructure (Terraform) - Provisions the EKS cluster and underlying AWS resources
- Phase 2: Applications (Helm) - Deploys Kubernetes components and the Data Discovery application
After completing Step 1 (Terraform), if an existing EKS cluster is used, configure the kubectl context to connect to the cluster:
aws eks update-kubeconfig --region <region> --name <cluster-name>
# Replace `<region>` with your AWS region and `<cluster-name>` with your EKS cluster name.
1 - EKS Control Plane Provisioning (Terraform)
Deploy the required infrastructure - Terraform setup for EKS cluster, IAM roles, and VPC
Before you Begin
Ensure that the following points are considered.
Configuring the Parameters
Configure the following parameters in the terraform.tfvars file available in the terraform directory.
| Name | Description | Type | Required |
|---|---|---|---|
| vpc_id | Existing VPC ID. | string | Yes |
| vpc_subnet_ids | List of private subnet IDs. | list(string) | Yes |
| cluster_name | Name of the EKS cluster. Default: "eks-terraform". | string | No |
| aws_region | AWS region for the deployment. Default: "us-east-1". | string | No |
| eks_cluster_role_arn | Existing IAM role for the EKS control plane. Default: null. | string | No |
| eks_node_role_arn | Existing IAM role for the node group. Default: null. | string | No |
Run the following commands to provision the infrastructure.
cd terraform
terraform init
terraform apply -auto-approve
Verifying the Installation
Run the following command to display the Terraform outputs and verify the deployment.
terraform output
Sample output:
eks_cluster_name = "eks-terraform"
eks_cluster_endpoint = "<Endpoint URL>"
eks_cluster_region = "us-east-1"
eks_update_kubeconfig_command = "aws eks update-kubeconfig --region us-east-1 --name eks-terraform"
Run the following command to verify the node pools created with the cluster.
kubectl get nodepools
Sample output:
NAME NODECLASS NODES READY AGE
general-purpose default 0 True ...
system default 0 True ...
Updating kubeconfig after Deployment
After deploying the cluster, update the local kubeconfig to interact with the cluster. The following command runs the kubeconfig update command generated by Terraform, linking kubectl to the new EKS cluster.
$(terraform output -raw eks_update_kubeconfig_command)
2 - Metrics Server
Deploy a Metrics Server for autoscaling capabilities.
Requirements
Run the following command to connect a local environment to the EKS cluster.
aws eks update-kubeconfig --region <region> --name <cluster-name>
Installing the Component
cd helm/metrics-server
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server || true
helm repo update
helm dependency build
helm install metrics-server . \
--namespace kube-system \
--create-namespace
For any custom configuration changes, create a values-override.yaml file and add -f values-override.yaml to the helm install command. It is not recommended to modify the configurations in the values.yaml file.
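As an illustration, a hypothetical values-override.yaml for this wrapper chart might look like the following. The top-level key is an assumption: it must match the dependency alias used by the wrapper chart (assumed here to be metrics-server), and the --kubelet-insecure-tls flag is only appropriate for test clusters whose kubelet certificates are not signed for the node IP.

```yaml
# Hypothetical overrides; the top-level key must match the
# dependency alias used by this wrapper chart.
metrics-server:
  replicas: 2
  args:
    # Test clusters only: skip kubelet TLS verification.
    - --kubelet-insecure-tls
```

Install with the override applied: helm install metrics-server . --namespace kube-system --create-namespace -f values-override.yaml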
Verifying the Installation
Check that the Metrics Server deployment is ready:
kubectl get deployment metrics-server -n kube-system
Sample output:
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 ...
Run the following command to verify that node metrics are available.
kubectl top nodes
Uninstalling the Component
Run the following command to uninstall the Metrics Server:
helm uninstall metrics-server \
--namespace kube-system
3 - Karpenter NodePool
Deploy a Karpenter NodePool for EKS to enable automatic node provisioning and scaling for Data Discovery workloads.
Requirements
An EKS cluster is provisioned.
The cluster is connected and the kubeconfig is properly configured.
karpenter.sh/v1 CRDs are available. Auto Mode includes these by default.
Run the following command to connect a local environment to the EKS cluster.
aws eks update-kubeconfig --region <region> --name <cluster-name>
Installing the Component
cd helm/karpenter-node-pool
helm install karpenter-nodepool . \
--namespace default \
--create-namespace
Verifying the Installation
Run the following command to check the NodePool resource.
kubectl get nodepools
Sample output after the process is completed.
NAME NODECLASS NODES READY AGE
m5-large-node-pool default 0 True ...
No nodes will appear until a matching workload is scheduled. Node creation is confirmed after a pod requests this NodePool’s label.
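To confirm that the NodePool provisions nodes, a throwaway pod can be scheduled against it. The manifest below is illustrative only: it assumes the NodePool is named m5-large-node-pool (as in the sample output above) and uses Karpenter's well-known karpenter.sh/nodepool label to target it.

```yaml
# Illustrative smoke-test pod; assumes the NodePool name
# "m5-large-node-pool" and Karpenter's well-known label.
apiVersion: v1
kind: Pod
metadata:
  name: nodepool-smoke-test
spec:
  nodeSelector:
    karpenter.sh/nodepool: m5-large-node-pool
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: "500m"
          memory: 512Mi
```

Apply it with kubectl apply -f, watch kubectl get nodes for a new node, and delete the pod afterwards so Karpenter can scale the node back down.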
Uninstalling the Component
Run the following command to uninstall the Karpenter NodePool.
helm uninstall karpenter-nodepool \
--namespace default
Ensure that no workloads are actively using this NodePool before removal. Any running pods scheduled on nodes from this pool may be terminated during the uninstall process.
4 - Ingress Controller
Deploy an internal-only NGINX ingress controller with a private AWS NLB for secure, TLS-only access to Data Discovery services within your VPC.
Requirements
Run the following command to connect a local environment to the EKS cluster.
aws eks update-kubeconfig --region <region> --name <cluster-name>
Configuration
This chart wraps the official ingress-nginx chart under the alias private-ingress and allows you to customize the default certificate used for all TLS communication handled by this controller.
To configure TLS certificates, place the certificate files in the following folder.
ingress-controller/certs/tls.crt
ingress-controller/certs/tls.key
For more information about creating TLS certificates, refer to Create and configure certificates (AWS docs)
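For testing in non-production environments, a self-signed certificate can be generated with openssl. This is a sketch only: the CN below is a placeholder hostname, and self-signed certificates should never be used in production.

```shell
# Generate a self-signed certificate for testing only.
# "data-discovery.internal" is a placeholder hostname;
# replace it with your internal DNS name.
mkdir -p certs
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout certs/tls.key \
  -out certs/tls.crt \
  -days 365 \
  -subj "/CN=data-discovery.internal"
```

The resulting certs/tls.crt and certs/tls.key match the paths expected by the install command below.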
It is recommended not to edit the values.yaml file unless required. To customize configurations, create a values-override.yaml file with the desired changes and use the -f values-override.yaml flag during installation.
Installing the Component
cd helm/ingress-controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx || true
helm repo update
helm dependency build
helm install ingress-controller . \
--namespace ingress-nginx \
--create-namespace \
--set-file tls.crt=./certs/tls.crt \
--set-file tls.key=./certs/tls.key
If TLS is not configured, omit the --set-file tls lines in the command above.
For any custom configuration changes, create a values-override.yaml file and add -f values-override.yaml to the helm install command. It is not recommended to modify the configurations in the values.yaml file.
This deploys the controller (and a TLS secret if configured) under the ingress-nginx namespace and exposes it through an internal AWS NLB.
Verifying the Installation
Checking the controller pods
kubectl get pods -n ingress-nginx
Example output:
NAME READY STATUS RESTARTS AGE
private-ingress-controller-xxx 1/1 Running 0 ...
Confirming the service is created
kubectl get svc -n ingress-nginx
Example output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
private-ingress-controller LoadBalancer 10.x.x.x internal-<hash>.<region>.elb.amazonaws.com 443:xxxx/TCP
Checking the IngressClass
kubectl get ingressclass
Example output:
NAME CONTROLLER PARAMETERS AGE
private-nginx k8s.io/ingress-nginx <none> ...
This IngressClass is automatically used by any Ingress with no ingressClassName or one explicitly set to private-nginx.
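For reference, a minimal Ingress that routes through this controller could look like the following. The resource name, host, and path are illustrative; the backend service name and port are taken from the Data Discovery chart, which ships its own Ingress rules, so this example is only needed for additional services you expose yourself.

```yaml
# Illustrative Ingress; name and path are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  namespace: default
spec:
  ingressClassName: private-nginx
  rules:
    - http:
        paths:
          - path: /example
            pathType: Prefix
            backend:
              service:
                name: classification-service
                port:
                  number: 8050
```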
Uninstalling the Component
Run the following command to uninstall the Ingress Controller.
helm uninstall ingress-controller \
--namespace ingress-nginx
This will remove the AWS Load Balancer and make any applications using this ingress controller inaccessible from outside the cluster. Ensure all dependent services are stopped or reconfigured before removal.
5 - Data Discovery Classification
Deploy the Data Discovery Classification service with Pattern and Context providers for data classification and transformation.
Requirements
The following requirements are mandatory before deploying the product.
The following components are optional.
Run the following command to connect a local environment to the EKS cluster.
aws eks update-kubeconfig --region <region> --name <cluster-name>
Installing the Service
- Export the Docker registry credentials that were provided as environment variables:
export DOCKER_USERNAME=myuser
export DOCKER_PASSWORD=mypassword
- Install the chart using the following command.
cd helm/data-discovery-classification
helm install data-discovery-classification . \
--namespace default \
--create-namespace \
--wait \
--wait-for-jobs \
--timeout 900s \
--set docker.creds.username=$DOCKER_USERNAME \
--set docker.creds.password=$DOCKER_PASSWORD
Note: For any custom configuration changes, create a values-override.yaml file and add -f values-override.yaml to the helm install command instead of modifying the default values.yaml file.
The --wait flag with a 15-minute timeout is recommended as the installation typically completes in 5-7 minutes due to large Docker image downloads. Monitor the installation progress in another terminal using the verification commands.
If a registry is used that does not require basic authentication (e.g., ECR or a private registry), omit the --set docker lines in the command above.
Verifying the Installation
Get Deployments, Services, and HPAs
kubectl get deploy,svc,hpa -n default
Expected output:
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/classification-deployment 1/1 1 1 ...
deployment.apps/context-provider-deployment 1/1 1 1 ...
deployment.apps/pattern-provider-deployment 1/1 1 1 ...
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/classification-service ClusterIP 172.20.x.x <none> 8050/TCP ...
service/context-provider-service ClusterIP 172.20.x.x <none> 8052/TCP ...
service/pattern-provider-service ClusterIP 172.20.x.x <none> 8051/TCP ...
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/classification-service-hpa Deployment/classification-deployment cpu: 50%/50% 1 5 1 ...
horizontalpodautoscaler.autoscaling/context-provider-hpa Deployment/context-provider-deployment cpu: 65%/65% 1 20 1 ...
horizontalpodautoscaler.autoscaling/pattern-provider-hpa Deployment/pattern-provider-deployment cpu: 90%/90% 1 3 1 ...
All deployments must show 1/1 in the READY column after deployment is completed. During startup, it is expected to see 0/1 and cpu: <unknown>.
Ingress
kubectl get ingress -n default
Expected output:
NAME CLASS HOSTS ADDRESS PORTS AGE
classification-ingress-rule private-nginx * <load-balancer-dns>.elb.amazonaws.com. 443 ...
Ingress Endpoint Testing
INGRESS_HOST=$(kubectl get svc ingress-controller-private-ingress-controller \
-n ingress-nginx \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
# Fallback to IP
if [ -z "$INGRESS_HOST" ]; then
INGRESS_HOST=$(kubectl get svc ingress-controller-private-ingress-controller \
-n ingress-nginx \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
fi
echo "Ingress available at: $INGRESS_HOST"
Running Requests
curl -k https://$INGRESS_HOST/readiness
curl -k https://$INGRESS_HOST/healthz
curl -k https://$INGRESS_HOST/startup
curl -k -X POST https://$INGRESS_HOST/pty/data-discovery/v1.1/classify \
-H 'Content-Type: text/plain' \
--data 'You can reach Dave Elliot by phone 203-555-1286'
Custom Configuration
The chart is production-ready and the required configurations and default container images are set in the values.yaml file. However, customized container images can also be configured.
To use your own container images, perform the following steps:
- Create a values-override.yaml file with the following configuration.
docker:
registry: "<Address of the image-repository>"
# e.g.:
# docker:
# registry: "registry.protegrity.com"
serviceImages:
classification: "<Name of the classification-image>"
pattern: "<Name of the pattern-provider-image>"
context: "<Name of the context-provider-image>"
# e.g.:
# serviceImages:
# classification: "products/data_discovery/1.1/classification_service:latest"
# pattern: "products/data_discovery/1.1/pattern_classification_provider:latest"
# context: "products/data_discovery/1.1/context_classification_provider:latest"
- Run the following installation command.
helm install data-discovery-classification . \
--namespace default \
--create-namespace \
--wait \
--wait-for-jobs \
--timeout 900s \
--set docker.creds.username=$DOCKER_USERNAME \
--set docker.creds.password=$DOCKER_PASSWORD \
-f values-override.yaml
Uninstalling the Service
Run the following command to uninstall the Data Discovery Classification application.
helm uninstall data-discovery-classification \
--namespace default \
--wait \
--timeout 300s
This will remove the classification, pattern provider, and context provider services, along with the associated ConfigMaps, Services, and HPA resources. Any persistent data or logs will be lost during this process.
Resources may take a couple of minutes to be fully terminated. Re-installing immediately after uninstall can lead to an inconsistent state. Wait for all pods to be completely removed before reinstalling.
Troubleshooting
Run the following commands to inspect the state of the deployment.
Viewing all Pods in the Namespace
kubectl get pods -n default
Viewing all Services in the Namespace
kubectl get svc -n default
Viewing Logs for a Specific Pod
kubectl logs <pod-name> -n default
Describing a Specific Pod
kubectl describe pod <pod-name> -n default