Migrating Tokenized Unicode Data

Migrating Tokenized Unicode Data between the Big Data Protector and the Teradata Database

The procedures to migrate tokenized Unicode data to and from a Teradata database are described below.

This section applies only to Legacy Unicode and Base64 Unicode data elements.
This section uses the Teradata database as the reference example.
In addition to the Teradata database, the Big Data Protector works with other databases, such as Netezza and Greenplum.

Migrating Tokenized Unicode Data from a Teradata Database

This section describes how to unprotect tokenized Unicode data in Hive, Impala, MapReduce, or Spark. The data is tokenized in the Teradata database using the Protegrity Database Protector and then migrated to Hive, Impala, MapReduce, or Spark.

Ensure that the data elements used in the data security policy are the same on the Teradata Database Protector and Big Data Protector machines.

From Teradata Database to Hive or Impala

To migrate tokenized Unicode data from a Teradata database to Hive or Impala and unprotect it using the Hive or Impala protector:

1. Tokenize the Unicode data in the Teradata database using the Protegrity Database Protector.
2. Migrate the tokenized Unicode data from the Teradata database to Hive or Impala.
3. To unprotect the tokenized Unicode data on Hive or Impala, use the following UDFs, as required:
   - Hive: ptyUnprotectUnicode()
   - Impala: pty_UnicodeStringSel()

From Teradata Database to Hadoop

To migrate tokenized Unicode data from a Teradata database to Hadoop and unprotect it using the MapReduce or Spark protector:

1. Migrate the tokenized Unicode data to the Hadoop ecosystem using any data migration utility.
2. To unprotect the tokenized Unicode data using MapReduce or Spark, use the following APIs, as required:
   - MapReduce: public byte[] unprotect(String dataElement, byte[] data)
   - Spark: void unprotect(String dataElement, List<Integer> errorIndex, byte[][] input, byte[][] output)
   a. Convert the protected tokens to bytes using UTF-8 encoding.
   b. Send the bytes as input to the Unprotect API in the MapReduce or Spark protector, as required.
   c. Convert the unprotected output bytes to a String using UTF-16LE encoding. The resulting string contains the data in cleartext.

The following sample code snippet shows how to unprotect tokenized Unicode data that was migrated from a Teradata database to Hadoop, using the MapReduce or Spark protector.

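private Protector protector = null; // initialized elsewhere
String[] unprotectInput = new String[SIZE];
byte[][] inputValueByte = new byte[unprotectInput.length][];
byte[][] outputValueByte = new byte[unprotectInput.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
StringBuilder unprotectedString = new StringBuilder();
// Point a: convert each protected token to bytes using UTF-8
for (int x = 0; x < unprotectInput.length; x++)
    inputValueByte[x] = unprotectInput[x].getBytes(StandardCharsets.UTF_8);
// Point b: send the bytes to the Unprotect API
protector.unprotect(DATAELEMENT_NAME, errorIndexList, inputValueByte, outputValueByte);
// Point c: convert each unprotected value to a String using UTF-16LE
for (int j = 0; j < outputValueByte.length; j++)
    unprotectedString.append(new String(outputValueByte[j], StandardCharsets.UTF_16LE));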
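
The two encoding conversions in this procedure can be checked on their own, without a protector. The following is a minimal sketch and not part of the Big Data Protector API: the class UnprotectEncodingSketch and its methods tokensToBytes and bytesToCleartext are hypothetical names, the call to the Unprotect API is omitted, and the bytes passed to bytesToCleartext are assumed to stand in for what the protector would return.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; it is not part of the Big Data Protector API.
class UnprotectEncodingSketch {

    // Convert each protected token to bytes using UTF-8,
    // as required before calling the Unprotect API.
    static byte[][] tokensToBytes(String[] tokens) {
        byte[][] out = new byte[tokens.length][];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = tokens[i].getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    // Convert each unprotected value to a String using UTF-16LE,
    // as required after the Unprotect API returns.
    static String[] bytesToCleartext(byte[][] unprotected) {
        String[] out = new String[unprotected.length];
        for (int i = 0; i < unprotected.length; i++) {
            out[i] = new String(unprotected[i], StandardCharsets.UTF_16LE);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed stand-in for protector output: U+00E9 encoded in UTF-16LE is E9 00.
        byte[][] simulated = { { (byte) 0xE9, 0x00 } };
        String clear = bytesToCleartext(simulated)[0];
        // Print the code point in hex to avoid console encoding issues.
        System.out.println(Integer.toHexString(clear.codePointAt(0))); // e9
    }
}
```

Decoding the same two bytes as UTF-8 would not yield U+00E9, which is why the output of the Unprotect API must be decoded with the UTF-16LE charset named explicitly.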

Migrating Tokenized Unicode Data to a Teradata Database

The steps to protect Unicode data in Hive, Impala, MapReduce, or Spark, migrate it to a Teradata database, and then unprotect the tokenized Unicode data using the Protegrity Database Protector are listed below.

Ensure that the data elements used in the data security policy are the same on the Teradata Database Protector and Big Data Protector machines.

Migrating Tokenized Unicode data using Hive or Impala

To protect Unicode data using the Hive or Impala protector and migrate it to a Teradata database:

1. To protect the Unicode data on Hive or Impala, use the following UDFs, as required:
   - Hive: ptyProtectUnicode()
   - Impala: pty_UnicodeStringIns()
2. Migrate the tokenized Unicode data from Hive or Impala to the Teradata database.
3. To unprotect the tokenized Unicode data in the Teradata database, use the Protegrity Database Protector.

Migrating Tokenized Unicode data using MapReduce or Spark

To protect Unicode data using the MapReduce or Spark protector and migrate it to a Teradata database:

1. To protect the Unicode data using MapReduce or Spark, use the following APIs, as required:
   - MapReduce: public byte[] protect(String dataElement, byte[] data)
   - Spark: void protect(String dataElement, List<Integer> errorIndex, byte[][] input, byte[][] output)
   a. Convert the cleartext Unicode data to bytes using UTF-16LE encoding.
   b. Send the bytes as input to the Protect API in the MapReduce or Spark protector, as required.
   c. Convert the protected output bytes to a String using UTF-8 encoding. The resulting string is the tokenized data.
2. Migrate the protected data to the Teradata database using any data migration utility.

The following sample code snippet shows how to protect Unicode data using the MapReduce or Spark protector before migrating it to a Teradata database.

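private Protector protector = null; // initialized elsewhere
String[] clear_data = new String[SIZE];
byte[][] inputValueByte = new byte[clear_data.length][];
byte[][] outputValueByte = new byte[clear_data.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
StringBuilder protectedString = new StringBuilder();
// Point a: convert each cleartext value to bytes using UTF-16LE
for (int x = 0; x < clear_data.length; x++)
    inputValueByte[x] = clear_data[x].getBytes(StandardCharsets.UTF_16LE);
// Point b: send the bytes to the Protect API
protector.protect(DATAELEMENT_NAME, errorIndexList, inputValueByte, outputValueByte);
// Point c: convert each protected value to a String using UTF-8
for (int x = 0; x < outputValueByte.length; x++)
    protectedString.append(new String(outputValueByte[x], StandardCharsets.UTF_8));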
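
The encoding conversions on the protect side can also be exercised without a protector. The sketch below is illustrative only: ProtectEncodingSketch, cleartextToBytes, and bytesToTokens are hypothetical names that are not part of the Big Data Protector API, and the call to the Protect API is omitted. It also shows one reason to name UTF-16LE explicitly rather than using Java's generic UTF-16 charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; it is not part of the Big Data Protector API.
class ProtectEncodingSketch {

    // Convert each cleartext value to bytes using UTF-16LE,
    // as required before calling the Protect API.
    static byte[][] cleartextToBytes(String[] clearData) {
        byte[][] out = new byte[clearData.length][];
        for (int i = 0; i < clearData.length; i++) {
            out[i] = clearData[i].getBytes(StandardCharsets.UTF_16LE);
        }
        return out;
    }

    // Convert each protected value to a String using UTF-8,
    // as required after the Protect API returns.
    static String[] bytesToTokens(byte[][] protectedBytes) {
        String[] out = new String[protectedBytes.length];
        for (int i = 0; i < protectedBytes.length; i++) {
            out[i] = new String(protectedBytes[i], StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        String value = "H\u00E9"; // "H" followed by e-acute (U+00E9)
        byte[] littleEndian = value.getBytes(StandardCharsets.UTF_16LE);
        byte[] generic = value.getBytes(StandardCharsets.UTF_16);
        // UTF-16LE: two bytes per character here, no byte order mark.
        System.out.println(littleEndian.length); // 4
        // Java's UTF-16 encoder is big-endian and prepends a 2-byte byte order mark.
        System.out.println(generic.length); // 6
    }
}
```

Because the generic UTF-16 charset produces a different byte sequence for the same string, input prepared with it would not match the UTF-16LE bytes that this procedure calls for.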

Last modified : January 20, 2026