AWS Databricks Protector

Using AWS Databricks Protector

The Protegrity Big Data Protector for AWS Databricks delivers end‑to‑end data protection. Organizations deploying the Big Data Protector rely on modern, supported storage options such as Workspace storage, Unity Catalog Volumes, and cloud object storage like Amazon S3.

Designed to secure sensitive data across analytics pipelines, the Big Data Protector applies advanced tokenization and encryption during Spark execution and enforces centralized, policy‑driven controls. Whether installed via Workspace-backed paths or deployed using S3 buckets for configuration and script delivery, the Protector ensures resilient execution across AWS Databricks clusters.

By embracing cloud‑native storage paths, this approach ensures long‑term compatibility with Databricks platform changes while maintaining Protegrity’s standard of seamless and transparent protection. Organizations can continue to process high‑value datasets on AWS Databricks with confidence—knowing that sensitive information is secured across its lifecycle, even as the underlying platform evolves.

The Protegrity Big Data Protector for AWS Databricks empowers organizations to secure sensitive data across their analytics pipelines by combining high‑performance protection mechanisms with flexible deployment models tailored for modern cloud architectures. Central to this capability are two approaches: the Application Protector REST (AP REST) approach and the Cloud Protector approach. Each approach is designed to address different customer requirements around scalability, infrastructure usage, and cost optimization.

1 - Installing the AWS Databricks Protector

Steps to install the AWS Databricks Protector

Prerequisites

For more information about the prerequisites, refer to the sections listed below.

Register the jumpbox

To register and prepare the jumpbox, refer to Registering and preparing the jumpbox.

For the Application Protector REST Approach

For the prerequisites specific to this approach, refer to For the Application Protector REST Approach.

For the Cloud Protector Approach

For the prerequisites specific to this approach, refer to For the Cloud Protector Approach.

Preparing the Environment

For more information about preparing the environment, refer to Preparing the Environment.

Installing the Protector

For more information about installing the protector, refer to Creating the User Defined Functions.
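Once the user-defined functions are created, protection is invoked directly from Spark SQL. The following is an illustrative sketch only: the function name `protegrity_protect`, the column name `ssn`, and the data-element name `SSN_ELEMENT` are placeholder assumptions, not the product's documented API.

```
-- Hypothetical sketch of calling a Protegrity protect UDF from a
-- Databricks notebook; names are illustrative placeholders.
SELECT protegrity_protect(ssn, 'SSN_ELEMENT') AS ssn_protected
FROM customer_data;
```

The actual UDF names and signatures are defined during the Creating the User Defined Functions step.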

Integrating the AWS Databricks Protector with Protegrity Provisioned Cluster (PPC)

To integrate the AWS Databricks Protector with PPC, perform the following steps:

1. When prompted for the ESA IP address, enter the PPC FQDN as configured in Step 4 of Deploying PPC. Ensure that the FQDN does not exceed 50 characters.
2. For the ESA listening port, enter 25400.

These specific values are required to integrate the protector with the PPC.
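As a sketch of the installer prompts, the ESA connection values might be supplied as follows. The hostname `ppc.example.internal` is an illustrative placeholder; substitute your actual PPC FQDN.

```
ESA IP address / hostname : ppc.example.internal   # PPC FQDN, 50 characters or fewer
ESA listening port        : 25400                  # required value for PPC integration
```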

Configuring the Protector

For more information about protector configuration, refer to Editing the Cluster Configuration.

2 - Uninstalling the AWS Databricks Protector

Steps to uninstall the AWS Databricks Protector

For more information about uninstalling the AWS Databricks Protector, refer to Dropping the User Defined Functions.