CDP-AWS-DataHub Protector

Using CDP-AWS-DataHub Protector

The CDP-AWS-DataHub UDFs and APIs secure sensitive data within Cloudera Data Platform (CDP) environments on AWS. These components are part of the Protegrity Big Data Protector architecture and let developers and data engineers integrate data protection directly into big data workflows. The User Defined Functions (UDFs) encrypt, tokenize, and de-tokenize sensitive fields during Hive, Spark, and Impala operations. By embedding Protegrity UDFs into SQL queries, organizations can enforce column-level security without altering application logic. This supports compliance while maintaining analytical performance.

To perform protect and unprotect operations using the User Defined Functions, refer to User Defined Functions and APIs.

1 - Installing the CDP-AWS-DataHub Protector

Steps to install the CDP-AWS-DataHub Protector

Setting up the CDP-AWS-DataHub Protector

The CDP-AWS-DataHub Protector v10.0.0 secures sensitive data across Cloudera Data Platform (CDP) environments hosted on AWS. The protector uses Protegrity’s tokenization and encryption features to secure data at rest, in transit, and during processing within AWS DataHub clusters.

Prerequisites

For detailed information on the prerequisites, refer to System Requirements.

Register the jumpbox

To register and prepare the jumpbox, refer to Registering and preparing the jumpbox.

Integrating the CDP-AWS-DataHub Protector with Protegrity Provisioned Cluster (PPC)

To integrate the CDP-AWS-DataHub Protector with PPC, perform the following steps:

  1. Prepare the environment using the steps mentioned in the section Preparing the Environment.

  2. Install the Big Data Protector using the steps mentioned in the section Installing the Big Data Protector.

Note: When prompted for the ESA IP address, enter the PPC FQDN as configured in Step 4 of Deploying PPC. Ensure the FQDN does not exceed 50 characters. For the ESA listening port, enter 25400. These specific values are required to integrate the protector with the PPC.
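The two constraints in this note (an FQDN of at most 50 characters and listening port 25400) can be checked before running the installer. The following is a minimal sketch of such a check, not part of the Protegrity installer: the function name `check_ppc_inputs` and the host `ppc.example.com` are hypothetical and used only for illustration.

```python
# Hypothetical pre-flight check; NOT part of the Protegrity installer.
# It only encodes the two constraints stated in the note above.
MAX_FQDN_LENGTH = 50        # FQDN length limit from the note
PPC_LISTENING_PORT = 25400  # ESA listening port from the note


def check_ppc_inputs(fqdn: str, port: int) -> list[str]:
    """Return a list of problems; an empty list means both values pass."""
    problems = []
    if not fqdn:
        problems.append("FQDN is empty")
    elif len(fqdn) > MAX_FQDN_LENGTH:
        problems.append(
            f"FQDN is {len(fqdn)} characters; the limit is {MAX_FQDN_LENGTH}"
        )
    if port != PPC_LISTENING_PORT:
        problems.append(f"port is {port}; expected {PPC_LISTENING_PORT}")
    return problems


if __name__ == "__main__":
    # "ppc.example.com" is a placeholder, not a real PPC host.
    for issue in check_ppc_inputs("ppc.example.com", 25400):
        print(issue)
```

The check does not contact the PPC or confirm that the FQDN resolves. It only catches values that the note above rules out before they are entered at the installer prompts.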

Post Configuration Steps

For detailed information on the post-configuration steps, refer to Configuring the Big Data Protector.

2 - Uninstalling the CDP-AWS-DataHub Protector

Steps to uninstall the CDP-AWS-DataHub Protector

For more information about uninstalling the CDP-AWS-DataHub Protector, refer to Uninstalling the Big Data Protector.