User Defined Functions and APIs
1 - MapReduce APIs
This section describes the MapReduce APIs available for protection and unprotection in the Big Data Protector to build secure Big Data applications.
Warning: The Protegrity MapReduce protector only supports bytes converted from the string data type.
If any other data type is directly converted to bytes and passed as input to the API that supports byte as input and provides byte as output, then data corruption might occur.
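The warning above can be illustrated with a minimal sketch, assuming the value to be protected starts out as an int. The class and method names (StringDerivedBytes, fromInt, rawInt) are hypothetical and are not part of the Protegrity API. The sketch only contrasts bytes derived from the string form of a value with the raw binary bytes of the same value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the Protegrity API.
final class StringDerivedBytes {

    // Render the value as a string first, then encode the string to bytes.
    // This is the kind of input the warning describes as supported.
    static byte[] fromInt(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Raw 4-byte binary form of the int, shown only for contrast.
    // Bytes like these are not converted from a string.
    static byte[] rawInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    public static void main(String[] args) {
        System.out.println("String-derived: " + Arrays.toString(fromInt(1234)));
        System.out.println("Raw binary:     " + Arrays.toString(rawInt(1234)));
    }
}
```

The two arrays differ even though they represent the same number, which is why only the string-derived form should be passed to the byte-based APIs.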
Caution: If you are using a Protect, Unprotect, or Reprotect API that accepts bytes as input and provides bytes as output, then ensure that you pass the charset argument that matches the charset used to encode the input string.
For example, if the input string was encoded using the UTF-16LE charset, then pass the “UTF-16LE” charset argument in the ByteIn or ByteOut APIs.
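As a rough illustration of why the charset argument matters, the following sketch encodes a string as UTF-16LE and then decodes the bytes with both the matching and a mismatched charset. It uses only the standard Java library. The class name CharsetRoundTrip and its methods are hypothetical and do not call the protector.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the Protegrity API.
final class CharsetRoundTrip {

    // Encode text with the named charset, for example "UTF-16LE".
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Decode bytes with the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] utf16 = encode("protegrity", "UTF-16LE");
        // Two bytes per character in UTF-16LE for this ASCII-only input.
        System.out.println("Encoded length: " + utf16.length);
        // Decoding with the same charset restores the original text.
        System.out.println("Matching charset restores text: "
                + decode(utf16, "UTF-16LE").equals("protegrity"));
        // Decoding with a different charset does not.
        System.out.println("Mismatched charset restores text: "
                + decode(utf16, "UTF-8").equals("protegrity"));
    }
}
```

The same charset name used to encode the input is the one to pass as the charset argument.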
Note: If you perform a security operation on a single data item, then an exception is thrown in case of any error. Similarly, if you perform a security operation on bulk data, then an exception is thrown in case of any error except for the error codes 22, 23, and 44. Instead of an error message, the UDFs return an error list for the individual items in the bulk data. For more information about the API error return codes, refer to Return Codes for the Big Data Protector.
If you are using the Bulk APIs for the MapReduce protector, then the following two modes for error handling and return codes are available:
- Default mode: Starting with the Big Data Protector, version 6.6.4, the Bulk APIs in the MapReduce protector return the detailed error and return codes instead of 0 for failure and 1 for success. In addition, the MapReduce jobs involving Bulk APIs provide error codes instead of throwing exceptions. For more information about the return codes for the Big Data Protector, refer to Return Codes for the Big Data Protector.
- Backward compatibility mode: If you need to continue using the error handling capabilities provided with Big Data Protector, version 6.6.3 or lower, that is, 0 for failure and 1 for success, then you can set this mode.
Sample Code Usage
The MapReduce sample program, described in this section, is an example of how to use the Protegrity MapReduce protector APIs. The sample program utilizes the following two Java classes:
- ProtectData.java: The main class that calls the Mapper job.
- ProtectDataMapper.java: The Mapper class that contains the logic to fetch the input data and store the protected content as output.
Main Job Class – ProtectData.java
ProtectData.java
package com.protegrity.samples.mapreduce;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class ProtectData extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
//Create the Job
Job job = Job.getInstance(getConf(), "ProtectData");
//Set the output key and value class
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
//Set the map output key and value class
job.setMapOutputKeyClass(NullWritable.class);
job.setMapOutputValueClass(Text.class);
//Set the Mapper class which will perform the protect job
job.setMapperClass(ProtectDataMapper.class);
//Set number of reducer task
job.setNumReduceTasks(0);
//Set the input and output Format class
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
//Set the jar class
job.setJarByClass(ProtectData.class);
//Store the input path and print the input path
Path input = new Path(args[0]);
System.out.println(input.getName());
//Store the output path and print the output path
Path output = new Path(args[1]);
System.out.println(output.getName());
//Add the input path and set the output path
FileInputFormat.addInputPath(job, input);
FileOutputFormat.setOutputPath(job, output);
//Call the job
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String args[]) throws Exception {
System.exit(ToolRunner.run(new Configuration(), new ProtectData(), args));
}
}
Mapper Class – ProtectDataMapper.java
ProtectDataMapper.java
package com.protegrity.samples.mapreduce;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
//Need to import the ptyMapReduceProtector class to use the Protegrity MapReduce protector
import com.protegrity.hadoop.mapreduce.ptyMapReduceProtector;
//Create the Mapper class, that is, ProtectDataMapper, which extends the Mapper class
public class ProtectDataMapper extends Mapper<Object, Text, NullWritable, Text> {
//Declare the member variable for the ptyMapReduceProtector class
private ptyMapReduceProtector mapReduceProtector;
//Declare the Array of Data Elements which will be required to do the protection/unprotection
private final String[] data_element_names = { "TOK_NAME", "TOK_PHONE", "TOK_CREDIT_CARD", "TOK_AMOUNT" };
//Initialize the mapreduce protector i.e ptyMapReduceProtector in the default constructor
public ProtectDataMapper() throws Exception {
// Create the new object for the class ptyMapReduceProtector
mapReduceProtector = new ptyMapReduceProtector();
// Open the session using the method " openSession("0") "
int openSessionStatus = mapReduceProtector.openSession("0");
}
//Override the map method to parse the text and process it line by line
//Split the inputs separated by delimiter "," in the line
//Apply the protect/unprotect operation
//Create the output text which will have protected/unprotected outputs separated by delimiter ","
//Write the output text to the context
@Override
public void map(Object key, Text value, Context context) throws IOException,
InterruptedException
{
// Store the line in a variable strOneLine
String strOneLine = value.toString();
// Split the inputs separated by delimiter "," in the line
StringTokenizer st = new StringTokenizer(strOneLine, ",");
// Create the instance of StringBuilder to store the output
StringBuilder sb = new StringBuilder();
// Store the no of inputs in a line
int noOfTokens = st.countTokens();
if (mapReduceProtector != null) {
//Iterate through the string token and apply the protect/unprotect operation
for (int i = 0; st.hasMoreElements(); i++) {
String data = (String)st.nextElement();
if (i == 0) {
//The first field is appended to the output without protection
sb.append(data);
} else {
//To protect data, call the protect method with the data element name and the input data in bytes
//mapReduceProtector.protect( <Data Element> , <Data in bytes> )
//Output will be returned in bytes
//To unprotect data, call the unprotect method with the data element name and the input data in bytes
//mapReduceProtector.unprotect( <Data Element> , <Data in bytes> )
//Output will be returned in bytes
//The string is encoded as UTF-8, which is the default charset of the protect API
byte[] bResult =
mapReduceProtector.protect(data_element_names[i-1], data.trim().getBytes("UTF-8"));
if (bResult != null) {
// Store the result in a string and append it to the output sb
sb.append(new String(bResult, "UTF-8"));
}
else {
// If the output is null, then store the result as "cryptoError" and append it to the output sb
sb.append("cryptoError");
}
}
if (i < noOfTokens - 1) {
// Append delimiter "," after the processed result
sb.append(",");
}
}
}
// write the output text to context
context.write(NullWritable.get(), new Text(sb.toString()));
}
//Clean up the session and objects when the map task finishes
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
//Close the session
if (mapReduceProtector != null) {
int closeSessionStatus = mapReduceProtector.closeSession();
mapReduceProtector = null;
}
super.cleanup(context);
}
}
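One detail of the sample Mapper worth checking against your own input: StringTokenizer treats consecutive delimiters as one, so an empty field in a line shifts every later field to a different data element. The sketch below compares it with String.split using a negative limit, which keeps empty fields. It relies only on the standard Java library. The class name CsvFieldSplit and the sample line are hypothetical.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the sample program or the Protegrity API.
final class CsvFieldSplit {

    // Number of fields StringTokenizer reports; empty fields are skipped.
    static int tokenizerCount(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    // Split on "," while keeping empty fields, including trailing ones.
    static String[] splitKeepingEmpty(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // Made-up input line with an empty third field.
        String line = "1,John,,4111,100";
        System.out.println("StringTokenizer fields: " + tokenizerCount(line));
        System.out.println("split fields: " + Arrays.toString(splitKeepingEmpty(line)));
    }
}
```

If input lines can contain empty fields, splitting with a negative limit keeps each field aligned with its position in the data element array.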
openSession()
This method opens a new user session for protect and unprotect operations. It is a good practice to create one session per user thread.
Warning: This API is redundant and will be removed in the future releases.
Signature:
public synchronized int openSession(String parameter)
Parameters:
parameter: An internal API requirement that should be set to 0.
Result:
- 1: The function returns 1 if the session is successfully created.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
int openSessionStatus = mapReduceProtector.openSession("0");
Exception and Error Codes:
The function throws the ptyMapRedProtectorException exception if the session creation fails.
closeSession()
This function closes the current open user session. Every instance of ptyMapReduceProtector opens only one session, and a session ID is not required to close it.
Warning: This API is redundant and will be removed in the future releases.
Signature:
public synchronized int closeSession()
Parameters:
- None
Result:
The function returns:
- 1: If the session is successfully closed.
- 0: If the session closure is a failure.
Example
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
int openSessionStatus = mapReduceProtector.openSession("0");
int closeSessionStatus = mapReduceProtector.closeSession();
Exception and Error Codes:
- None
getVersion()
The function returns the current version of the protector.
Signature:
public String getVersion()
Parameters:
- None
Result:
- The function returns the current version of the protector.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
String version = mapReduceProtector.getVersion();
getVersionExtended()
The function returns the extended version information of the protector.
Signature:
public String getVersionExtended()
Parameters:
- None
Result:
The function returns a String in the following format:
"BDP: <1>; JcoreLite: <2>; CORE: <3>;"
where:
- 1 - Current version of Protector
- 2 - Jcorelite library version
- 3 - Core library version
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
String extendedVersion = mapReduceProtector.getVersionExtended();
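If the individual version components are needed, the documented format "BDP: <1>; JcoreLite: <2>; CORE: <3>;" can be split into name and value pairs. The following is a minimal sketch, assuming the returned string follows that format exactly. The class name VersionInfoParser is hypothetical, and the version numbers in main are made-up placeholders, not real release numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Protegrity API.
final class VersionInfoParser {

    // Split "NAME: value; NAME: value;" into an ordered map of name to value.
    static Map<String, String> parse(String extendedVersion) {
        Map<String, String> parts = new LinkedHashMap<>();
        for (String entry : extendedVersion.split(";")) {
            int colon = entry.indexOf(':');
            if (colon < 0) {
                continue; // skip fragments without a "name: value" shape
            }
            String name = entry.substring(0, colon).trim();
            String value = entry.substring(colon + 1).trim();
            parts.put(name, value);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        String sample = "BDP: 1.0; JcoreLite: 2.0; CORE: 3.0;";
        System.out.println(parse(sample));
    }
}
```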
checkAccess()
The function checks the access of the user for the specified data element(s).
Signature:
public boolean checkAccess(String dataElement, byte bAccessType, String... newDataElement)
Parameters:
- dataElement: Specifies the name of the data element (the old data element when checking for reprotect access).
- bAccessType: Specifies the type of access of the user for the data element(s).
- newDataElement: Specifies the name of the new data element when checking for reprotect access.
The following are the different values for the bAccessType variable:
| Access | Value |
|---|---|
| PROTECT | 0x06 |
| UNPROTECT | 0x07 |
| REPROTECT | 0x08 |
Result:
- The function returns true if the user has access to the data element(s) for the specified operation. Otherwise, the function returns false.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
byte bAccessType = 0x06;
boolean isAccess = mapReduceProtector.checkAccess("DE_PROTECT" , bAccessType );
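To avoid scattering the numeric access values through application code, the three documented values can be given names. The enum below is a hypothetical convenience type written for this sketch. It is not the product's Permission enum and is not part of the Protegrity API. Only the byte values 0x06, 0x07, and 0x08 come from the table above.

```java
// Hypothetical convenience enum, not part of the Protegrity API.
enum AccessTypeCode {
    PROTECT((byte) 0x06),
    UNPROTECT((byte) 0x07),
    REPROTECT((byte) 0x08);

    private final byte code;

    AccessTypeCode(byte code) {
        this.code = code;
    }

    // Value to pass as the bAccessType argument.
    byte code() {
        return code;
    }
}

final class AccessTypeCodeDemo {
    public static void main(String[] args) {
        for (AccessTypeCode type : AccessTypeCode.values()) {
            System.out.printf("%s = 0x%02X%n", type, type.code());
        }
    }
}
```

A call would then pass, for example, AccessTypeCode.UNPROTECT.code() as the bAccessType argument.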
checkAccess() with Permission enum argument
The function checks the access of the user for the specified data element(s).
Signature:
public boolean checkAccess(String dataElement, Permission permission, String... newDataElement)
Parameters:
- dataElement: Specifies the name of the data element (the old data element when checking for reprotect access).
- permission: Specifies the type of access of the user for the data element(s), using the BDPProtector.Permission enum.
- newDataElement: Specifies the name of the new data element when checking for reprotect access.
The following are the different values for the permission variable:
| Access | Value |
|---|---|
| PROTECT | Permission.PROTECT |
| UNPROTECT | Permission.UNPROTECT |
| REPROTECT | Permission.REPROTECT |
Result:
- The function returns true if the user has access to the data element(s) for the specified operation. Otherwise, the function returns false.
Example:
import com.protegrity.bdp.protector.BDPProtector.Permission;
String dataElement = "dataelement";
ptyMapReduceProtector protector = new ptyMapReduceProtector();
boolean accessProtectType = protector.checkAccess(dataElement, Permission.PROTECT);
boolean accessReprotectType = protector.checkAccess(dataElement, Permission.REPROTECT,dataElement);
boolean accessUnprotectType = protector.checkAccess(dataElement, Permission.UNPROTECT);
protect() - Byte array data
The function protects the data provided as a byte array. The type of protection applied is defined by the dataElement.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls within the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to the section Date and Datetime tokenization in Protection Method Reference.
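A job can screen for dates in the range named in the note above before calling the protector. The following is a minimal sketch using java.time.LocalDate, which follows the proleptic ISO calendar and can therefore represent these dates. The class name GregorianCutoverCheck is hypothetical, and the check is an optional pre-validation idea, not a description of how the protector itself validates input.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of the Protegrity API.
final class GregorianCutoverCheck {

    // First and last day of the range named in the note (inclusive).
    private static final LocalDate FIRST = LocalDate.of(1582, 10, 5);
    private static final LocalDate LAST = LocalDate.of(1582, 10, 14);

    // True when the date lies within 05-OCT-1582 to 14-OCT-1582.
    static boolean isInCutoverGap(LocalDate date) {
        return !date.isBefore(FIRST) && !date.isAfter(LAST);
    }

    public static void main(String[] args) {
        System.out.println(isInCutoverGap(LocalDate.of(1582, 10, 10)));
        System.out.println(isInCutoverGap(LocalDate.of(1582, 10, 15)));
    }
}
```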
Signature:
public byte[] protect(String dataElement, byte[] data, String... charset)
Parameters:
- dataElement: Specifies the name of the data element to protect the data.
- data: Is the byte array of data to be protected.
- charset: Specifies the charset of the input data. The applicable charsets are UTF-8 (default), UTF-16LE, and UTF-16BE.
Warning: The Protegrity MapReduce protector only supports bytes converted from the string data type.
If any other data type is directly converted to bytes and passed as input to the API that supports byte as input and provides byte as output, then data corruption might occur.
Note: If you are using the Protect API which accepts byte as input and provides byte as output, then ensure that when unprotecting the data, the Unprotect API, with byte as input and byte as output is utilized. In addition, ensure that the byte data being provided as input to the Protect API has been converted from a string data type only.
Note: When the charset of input byte[] data is UTF-16LE or UTF-16BE, ensure to pass the charset argument.
Result:
- The function returns the byte array of protected data.
Exception:
- The function throws the ptyMapRedProtectorException exception in case of a failure to protect the data.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
byte[] protectedResult = mapReduceProtector.protect("DE_PROTECT", "protegrity".getBytes(), "UTF-8");
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring | HMAC |
|---|---|---|---|---|---|---|---|
| protect() - Byte array data | | | FPE (All) | Yes | Yes | Yes | Yes |
protect() - Int data
The function protects the data provided as an int. The type of protection applied is defined by the dataElement.
Signature:
public int protect(String dataElement, int data)
Parameters:
- dataElement: Specifies the name of the data element used to protect the data.
- data: Specifies the data in the int format to be protected.
Result:
- The function returns the protected int data.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
int bResult = mapReduceProtector.protect("DE_PROTECT",1234);
Exception:
- The function throws the ptyMapRedProtectorException exception in case of a failure to protect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Int data | Integer (4 Bytes) | No | No | Yes | No | Yes |
protect() - Long data
This function protects the data provided as long. The type of protection applied is defined by dataElement.
Signature:
public long protect(String dataElement, long data)
Parameters:
- dataElement: Specifies the name of the data element used to protect the data.
- data: Specifies the data in the long format to be protected.
Result:
- The function returns the protected data in the long format.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
long bResult = mapReduceProtector.protect("DE_PROTECT", 123412341234L);
Exception:
- The function throws the ptyMapRedProtectorException exception in case of a failure to protect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Long data | Integer (8 Bytes) | No | No | Yes | No | Yes |
unprotect() - Byte array data
This function returns the data in its original form.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls within the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to the section Date and Datetime tokenization in Protection Method Reference.
Signature:
public byte[] unprotect(String dataElement, byte[] data, String... charset)
Parameters:
- dataElement: Is the name of the data element to be unprotected.
- data: Is the byte array of data to be unprotected.
- charset: Specifies the charset of the input data. The applicable charsets are UTF-8 (default), UTF-16LE, and UTF-16BE.
Note: When the charset of input byte[] data is UTF-16LE or UTF-16BE, ensure to pass the charset argument.
Note: The Protegrity MapReduce protector only supports bytes converted from the string data type.
If any other data type is directly converted to bytes and passed as input to the API that supports byte as input and provides byte as output, then data corruption might occur.
Result:
The function returns a byte array of unprotected data.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
byte[] protectedResult = mapReduceProtector.protect( "DE_PROTECT_UNPROTECT", "protegrity".getBytes(), "UTF-8" );
byte[] unprotectedResult = mapReduceProtector.unprotect( "DE_PROTECT_UNPROTECT", protectedResult, "UTF-8" );
Exception:
- The function throws the ptyMapRedProtectorException exception in case of a failure to unprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Byte array data | | | FPE (All) | Yes | Yes | Yes |
unprotect() - Int data
This function returns the data in its original form.
Signature:
public int unprotect(String dataElement, int data)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- data: Is the data in the int format to unprotect.
Result:
- The function returns the unprotected int data.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
int protectedResult = mapReduceProtector.protect( "DE_PROTECT_UNPROTECT",1234);
int unprotectedResult = mapReduceProtector.unprotect("DE_PROTECT_UNPROTECT", protectedResult);
Exception:
The function throws the ptyMapRedProtectorException exception in case of a failure to unprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Int data | Integer (4 Bytes) | No | No | Yes | No | Yes |
unprotect() - Long data
This function returns the data in its original form.
Signature:
public long unprotect(String dataElement, long data)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- data: Is the data in the long format to unprotect.
Result:
- The function returns the unprotected long data.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
long protectedResult = mapReduceProtector.protect( "DE_PROTECT_UNPROTECT", 123412341234L );
long unprotectedResult = mapReduceProtector.unprotect("DE_PROTECT_UNPROTECT", protectedResult );
Exception:
The function throws the ptyMapRedProtectorException exception in case of a failure to unprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Long data | Integer (8 Bytes) | No | No | Yes | No | Yes |
bulkProtect() - Byte array data
This is used when a set of data needs to be protected in a bulk operation. It helps to improve performance.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls within the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to the section Date and Datetime tokenization in the Protection Method Reference.
Signature:
public byte[][] bulkProtect(String dataElement, List<Integer> errorIndex, byte[][] inputDataItems, String... charset)
Parameters:
- dataElement: Specifies the name of the data element used to protect the data.
- errorIndex: Is a list used to store all the error indices encountered while protecting each data entry in inputDataItems.
- inputDataItems: Is a two-dimensional array to store the bulk data for protection.
- charset: Specifies the charset of the input data. The applicable charsets are UTF-8 (default), UTF-16LE, and UTF-16BE.
Result:
- The function returns a two-dimensional byte array of protected data.
- If the Backward Compatibility mode is not set, then the appropriate error code appears. For more information about the return codes, refer to PEP Log Return Codes and PEP Result Codes.
- If the Backward Compatibility mode is set, then the Error Index includes one of the following values, per entry in the bulk protect operation:
  - 1: The protect operation for the entry is successful.
  - 0: The protect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in the ESA forensics.
  - Any other value or garbage return value: The protect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in the ESA forensics.
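The Backward Compatibility rules above (1 means success, 0 or any other value means failure) can be applied to the error list with a small helper. The following is a minimal sketch, assuming the list holds one entry per input item in input order. The class name ErrorIndexCheck is hypothetical and the sample list is made up. It does not apply to the default mode, whose return codes are described in the return code references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Protegrity API.
final class ErrorIndexCheck {

    // Returns the positions whose entry is anything other than 1,
    // following the Backward Compatibility mode rules.
    static List<Integer> failedPositions(List<Integer> errorIndex) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < errorIndex.size(); i++) {
            Integer code = errorIndex.get(i);
            if (code == null || code != 1) {
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Made-up error list: entries at positions 1 and 3 failed.
        List<Integer> errorIndex = Arrays.asList(1, 0, 1, 7);
        System.out.println("Failed positions: " + failedPositions(errorIndex));
    }
}
```

The returned positions can be used to skip or log the matching entries of the protected data array.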
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
List<Integer> errorIndex = new ArrayList<Integer>();
byte[][] protectData = {"protegrity".getBytes(), "protegrity".getBytes(), "protegrity".getBytes(), "protegrity".getBytes()};
byte[][] protectedData = mapReduceProtector.bulkProtect( "DE_PROTECT", errorIndex, protectData, "UTF-8" );
System.out.print("Protected Data: ");
for(int i = 0; i < protectedData.length; i++)
{
//THIS WILL PRINT THE PROTECTED DATA
System.out.print(protectedData[i] == null ? null : new String(protectedData[i]));
if(i < protectedData.length - 1)
{
System.out.print(",");
}
}
System.out.println("");
System.out.print("Error Index: ");
for(int i = 0; i < errorIndex.size(); i++)
{
System.out.print(errorIndex.get( i ));
if(i < errorIndex.size() - 1)
{
System.out.print(",");
}
}
//ABOVE CODE WILL PRINT THE ERROR INDEXES
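The example above builds the input with getBytes() and prints the result with new String(...), both of which use the platform default charset. The following sketch shows one way to build the two-dimensional input array and read back a two-dimensional result with an explicit charset that matches the charset argument. The class name BulkBytes and its methods are hypothetical and are not part of the Protegrity API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the Protegrity API.
final class BulkBytes {

    // Encode each string with the given charset; null items stay null.
    static byte[][] toByteMatrix(String[] items, Charset charset) {
        byte[][] rows = new byte[items.length][];
        for (int i = 0; i < items.length; i++) {
            rows[i] = items[i] == null ? null : items[i].getBytes(charset);
        }
        return rows;
    }

    // Decode each row with the given charset; null rows stay null.
    static String[] toStrings(byte[][] rows, Charset charset) {
        String[] items = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            items[i] = rows[i] == null ? null : new String(rows[i], charset);
        }
        return items;
    }

    public static void main(String[] args) {
        String[] input = {"protegrity", "protegrity"};
        byte[][] rows = toByteMatrix(input, StandardCharsets.UTF_16LE);
        // The charset name to pass alongside these rows would be "UTF-16LE".
        System.out.println(Arrays.toString(toStrings(rows, StandardCharsets.UTF_16LE)));
    }
}
```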
Exception:
The function throws the ptyMapRedProtectorException if an error is encountered during bulk protection of the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring | HMAC |
|---|---|---|---|---|---|---|---|
| bulkProtect() - Byte array data | | | FPE (All) | Yes | Yes | Yes | Yes |
bulkProtect() - Int data
The function is used when a set of data needs to be protected in a bulk operation. It helps to improve performance.
Signature:
public int[] bulkProtect(String dataElement, List<Integer> errorIndex, int[] inputDataItems)
Parameters:
- dataElement: Specifies the name of the data element to protect the data.
- errorIndex: Is a list used to store all the error indices encountered while protecting each data entry in inputDataItems.
- inputDataItems: Is an array to store the bulk int data for protection.
Result:
- The function returns the int array of protected data.
- If the Backward Compatibility mode is not set, then the appropriate error code appears. For more information about the return codes, refer to PEP Log Return Codes and PEP Result Codes.
- If the Backward Compatibility mode is set, then the Error Index includes one of the following values, per entry in the bulk protect operation:
  - 1: The protect operation for the entry is successful.
  - 0: The protect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in the ESA forensics.
  - Any other value or garbage return value: The protect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in the ESA forensics.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
List<Integer> errorIndex = new ArrayList<Integer>();
int[] protectData = {1234, 5678, 9012, 3456};
int[] protectedData = mapReduceProtector.bulkProtect( "DE_PROTECT", errorIndex, protectData );
//CHECK THE ERROR INDEXES FOR ERRORS
System.out.print("Error Index: ");
for(int i = 0; i < errorIndex.size(); i++)
{
System.out.print(errorIndex.get( i ));
if(i < errorIndex.size() - 1)
{
System.out.print(",");
}
}
//ABOVE CODE WILL ONLY PRINT THE ERROR INDEXES
Exception:
The function throws the ptyMapRedProtectorException exception if an error is encountered during bulk protection of the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| bulkProtect() - Int data | Integer (4 Bytes) | No | No | Yes | No | Yes |
bulkProtect() - Long data
The function is used when a set of data needs to be protected in a bulk operation. It helps to improve performance.
Signature:
public long[] bulkProtect(String dataElement, List<Integer> errorIndex, long[] inputDataItems)
Parameters:
- dataElement: Specifies the name of the data element to protect the data.
- errorIndex: Is a list used to store all the error indices encountered while protecting each data entry in inputDataItems.
- inputDataItems: Is the long array to store the data for protection.
Result:
- The function returns the long array of protected data.
- If the Backward Compatibility mode is not set, then the appropriate error code appears. For more information about the return codes, refer to PEP Log Return Codes and PEP Result Codes.
- If the Backward Compatibility mode is set, then the Error Index includes one of the following values, per entry in the bulk protect operation:
  - 1: The protect operation for the entry is successful.
  - 0: The protect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in the ESA forensics.
  - Any other value or garbage return value: The protect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in the ESA forensics.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
List<Integer> errorIndex = new ArrayList<Integer>();
long[] protectData = {123412341234L, 567856785678L, 901290129012L, 345634563456L};
long[] protectedData = mapReduceProtector.bulkProtect( "DE_PROTECT", errorIndex, protectData );
//CHECK THE ERROR INDEXES FOR ERRORS
System.out.print("Error Index: ");
for(int i = 0; i < errorIndex.size(); i++)
{
System.out.print(errorIndex.get( i ));
if(i < errorIndex.size() - 1)
{
System.out.print(",");
}
}
//ABOVE CODE WILL ONLY PRINT THE ERROR INDEXES
Exception:
The function throws the ptyMapRedProtectorException exception if an error is encountered during bulk protection of the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| bulkProtect() - Long data | Integer (8 Bytes) | No | No | Yes | No | Yes |
bulkUnprotect() - Byte array data
This method unprotects in bulk the inputDataItems with the required data element.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls within the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar. For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
Signature:
public byte[][] bulkUnprotect(String dataElement, List<Integer> errorIndex, byte[][] inputDataItems, String... charset)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- errorIndex: Is a list of the error indices encountered while unprotecting each data entry in inputDataItems.
- inputDataItems: Is a two-dimensional array to store the bulk data to unprotect.
- charset: Specifies the charset of the input data. The applicable charsets are UTF-8 (default), UTF-16LE, and UTF-16BE.
Result:
- The function returns the two-dimensional byte array of unprotected data.
- If the Backward Compatibility mode is not set, then the appropriate error code appears. For more information about the return codes, refer to PEP Log Return Codes and PEP Result Codes.
- If the Backward Compatibility mode is set, then the Error Index includes one of the following values, per entry in the bulk unprotect operation:
  - 1: The unprotect operation for the entry is successful.
  - 0: The unprotect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in ESA forensics.
  - Any other value or garbage return value: The unprotect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in ESA forensics.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
List<Integer> errorIndex = new ArrayList<Integer>();
byte[][] protectData = {"protegrity".getBytes(), "protegrity".getBytes(), "protegrity".getBytes(), "protegrity".getBytes()};
byte[][] protectedData = mapReduceProtector.bulkProtect( "DE_PROTECT", errorIndex, protectData, "UTF-8" );
//THIS WILL PRINT THE PROTECTED DATA
System.out.print("Protected Data: ");
for(int i = 0; i < protectedData.length; i++)
{
System.out.print(protectedData[i] == null ? null : new String(protectedData[i]));
if(i < protectedData.length - 1)
{
System.out.print(",");
}
}
//THIS WILL PRINT THE ERROR INDEX FOR PROTECT OPERATION
System.out.println("");
System.out.print("Error Index: ");
for(int i = 0; i < errorIndex.size(); i++)
{
System.out.print(errorIndex.get( i ));
if(i < errorIndex.size() - 1)
{
System.out.print(",");
}
}
byte[][] unprotectedData = mapReduceProtector.bulkUnprotect( "DE_PROTECT", errorIndex, protectedData, "UTF-8" );
//THIS WILL PRINT THE UNPROTECTED DATA
System.out.print("UnProtected Data: ");
for(int i = 0; i < unprotectedData.length; i++)
{
System.out.print(unprotectedData[i] == null ? null : new String(unprotectedData[i]));
if(i < unprotectedData.length - 1)
{
System.out.print(",");
}
}
//THIS WILL PRINT THE ERROR INDEX FOR UNPROTECT OPERATION
System.out.println("");
System.out.print("Error Index: ");
for(int i = 0; i < errorIndex.size(); i++)
{
System.out.print(errorIndex.get( i ));
if(i < errorIndex.size() - 1)
{
System.out.print(",");
}
}
Exception:
The function throws the ptyMapRedProtectorException exception for errors when unprotecting the data.
Supported Protection Methods:
| MapReduce APIs | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| bulkUnprotect() - Byte array data | | | FPE (All) | Yes | Yes | Yes |
bulkUnprotect() - Int data
This method unprotects in bulk the inputDataItems with the required data element.
Signature:
public int[] bulkUnprotect(String dataElement, List<Integer> errorIndex, int[] inputDataItems)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- errorIndex: Is a list of the error indices encountered while unprotecting each data entry in inputDataItems.
- inputDataItems: Is the int array that contains the data to be unprotected.
Result:
- The function returns the unprotected int array data.
- If the Backward Compatibility mode is not set, then the appropriate error code appears. For more information about the return codes, refer PEP Log Return Codes and PEP Result Codes.
- If the Backward Compatibility mode is set, then the Error Index includes one of the following values, per entry in the bulk unprotect operation:
  - 1: The unprotect operation for the entry is successful.
  - 0: The unprotect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in ESA forensics.
  - Any other value or garbage return value: The unprotect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in ESA forensics.
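The Backward Compatibility values listed above can be checked after a bulk call with a small helper. The following is a minimal sketch, not part of the Protegrity API: the class name ErrorIndexSketch and the method failedPositions are hypothetical, and the sketch assumes that position i of the error index corresponds to entry i of inputDataItems. The sample values stand in for a list filled by bulkUnprotect().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Protegrity API.
class ErrorIndexSketch {

    // Returns the positions whose error index value is not 1.
    // In Backward Compatibility mode, 1 means success; 0 or any other value means failure.
    static List<Integer> failedPositions(List<Integer> errorIndex) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < errorIndex.size(); i++) {
            Integer code = errorIndex.get(i);
            if (code == null || code != 1) {
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Sample values standing in for an errorIndex filled by a bulk operation.
        List<Integer> errorIndex = Arrays.asList(1, 0, 1, 7);
        System.out.println("Failed positions: " + failedPositions(errorIndex));
    }
}
```

Treating every value other than 1 as a failure follows the list above, where both 0 and any garbage value indicate an unsuccessful entry.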
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
List<Integer> errorIndex = new ArrayList<Integer>();
int[] protectData = { 1234, 5678, 9012, 3456 };
int[] protectedData = mapReduceProtector.bulkProtect( "DE_PROTECT", errorIndex, protectData );
//THIS WILL PRINT THE ERROR INDEX FOR PROTECT OPERATION
System.out.println("");
System.out.print("Error Index: ");
for(int i = 0; i < errorIndex.size(); i++)
{
System.out.print(errorIndex.get( i ));
if(i < errorIndex.size() - 1)
{
System.out.print(",");
}
}
int[] unprotectedData = mapReduceProtector.bulkUnprotect( "DE_PROTECT", errorIndex, protectedData );
//THIS WILL PRINT THE ERROR INDEX FOR UNPROTECT OPERATION
System.out.println("");
System.out.print("Error Index: ");
for(int i = 0; i < errorIndex.size(); i++)
{
System.out.print(errorIndex.get( i ));
if(i < errorIndex.size() - 1)
{
System.out.print(",");
}
}
Exception:
The function throws the ptyMapRedProtectorException exception for errors while unprotecting the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| bulkUnprotect() - Int data | Integer (4 Bytes) | No | No | Yes | No | Yes |
bulkUnprotect() - Long data
This method unprotects in bulk the inputDataItems array with the required data element.
Signature:
public long[] bulkUnprotect(String dataElement, List<Integer> errorIndex, long[] inputDataItems)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- errorIndex: Is a list of the error indices encountered while unprotecting each data entry in inputDataItems.
- inputDataItems: Is the long array that contains the data to unprotect.
Result:
- The function returns the unprotected long array data.
- If the Backward Compatibility mode is not set, then the appropriate error code appears. For more information about the return codes, refer PEP Log Return Codes and PEP Result Codes.
- If the Backward Compatibility mode is set, then the Error Index includes one of the following values, per entry in the bulk unprotect operation:
  - 1: The unprotect operation for the entry is successful.
  - 0: The unprotect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in ESA forensics.
  - Any other value or garbage return value: The unprotect operation for the entry is unsuccessful. For more information about the failed entry, view the logs available in ESA forensics.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
List<Integer> errorIndex = new ArrayList<Integer>();
long[] protectData = { 123412341234L, 567856785678L, 901290129012L, 345634563456L };
long[] protectedData = mapReduceProtector.bulkProtect( "DE_PROTECT", errorIndex, protectData );
//THIS WILL PRINT THE ERROR INDEX FOR PROTECT OPERATION
System.out.println("");
System.out.print("Error Index: ");
for(int i = 0; i < errorIndex.size(); i++)
{
System.out.print(errorIndex.get( i ));
if(i < errorIndex.size() - 1)
{
System.out.print(",");
}
}
long[] unprotectedData = mapReduceProtector.bulkUnprotect( "DE_PROTECT", errorIndex, protectedData );
//THIS WILL PRINT THE ERROR INDEX FOR UNPROTECT OPERATION
System.out.println("");
System.out.print("Error Index: ");
for(int i = 0; i < errorIndex.size(); i++)
{
System.out.print(errorIndex.get( i ));
if(i < errorIndex.size() - 1)
{
System.out.print(",");
}
}
Exception:
- The function throws the ptyMapRedProtectorException for errors when unprotecting data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| bulkUnprotect() - Long data | Integer (8 Bytes) | No | No | Yes | No | Yes |
reprotect() - Byte array data
This function reprotects data that was protected earlier, using a different data element.
Signature:
public byte[] reprotect(String oldDataElement, String newDataElement, byte[] data, String... charset)
Parameters:
- oldDataElement: Specifies the name of the data element that was used to protect the data earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- data: Is the byte array that contains the data to be reprotected.
- charset: Specifies the charset of the input data. The applicable charsets are UTF-8 (default), UTF-16LE, and UTF-16BE.
Note: If you are using Format Preserving Encryption (FPE) and Byte APIs, then ensure that the encoding, which is used to convert the string input data to bytes, matches the encoding that is selected in the Plaintext Encoding drop-down for the required FPE data element.
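The requirement that the charset argument matches the encoding of the input bytes can be illustrated with plain JDK calls. The following is a minimal sketch that does not call the protector; the class name CharsetSketch and its methods are hypothetical. It shows that bytes produced with UTF-16LE are recovered correctly only when they are decoded with the same charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; it does not call the protector.
class CharsetSketch {

    // Encodes the text with an explicit charset, as would be done before calling a Byte API.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Decodes the bytes with an explicit charset, as would be done with a Byte API result.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String input = "protegrity";
        byte[] utf16 = encode(input, StandardCharsets.UTF_16LE);

        // Same charset on both sides: the text survives the round trip.
        System.out.println(decode(utf16, StandardCharsets.UTF_16LE).equals(input));

        // Mismatched charset: the bytes are misread and the text no longer matches.
        System.out.println(decode(utf16, StandardCharsets.UTF_8).equals(input));
    }
}
```

Passing the charset explicitly to getBytes() and to the String constructor avoids relying on the platform default, which can differ between cluster nodes.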
Result:
- The function returns the byte array of reprotected data.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
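// Encode the input with the same charset that is passed to the API.
byte[] protectedResult = mapReduceProtector.protect( "DE_PROTECT_1", "protegrity".getBytes(java.nio.charset.StandardCharsets.UTF_8), "UTF-8" );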
byte[] reprotectedResult = mapReduceProtector.reprotect( "DE_PROTECT_1", "DE_PROTECT_2", protectedResult, "UTF-8" );
Exception:
- The function throws the ptyMapRedProtectorException for errors while reprotecting the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - Byte array data | | | FPE (All) | Yes | Yes | Yes |
reprotect() - Int data
This function reprotects data that was protected earlier, using a new data element.
Signature:
public int reprotect(String oldDataElement, String newDataElement, int data)
Parameters:
- oldDataElement: Specifies the name of the data element that was used to protect the data earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- data: Is the int value to be reprotected.
Result:
- The function returns the reprotected int data.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
int protectedResult = mapReduceProtector.protect( "DE_PROTECT_1", 1234 );
int reprotectedResult = mapReduceProtector.reprotect( "DE_PROTECT_1", "DE_PROTECT_2", protectedResult );
Exception:
- The function throws the ptyMapRedProtectorException for errors while reprotecting the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - Int data | Integer (4 Bytes) | No | No | Yes | No | Yes |
reprotect() - Long data
This function reprotects data that was protected earlier, using a different data element.
Signature:
public long reprotect(String oldDataElement, String newDataElement, long data)
Parameters:
- oldDataElement: Specifies the name of the data element that was used to protect the data earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- data: Is the long value to be reprotected.
Result:
- The function returns the reprotected long data.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
long protectedResult = mapReduceProtector.protect( "DE_PROTECT_1", 123412341234L );
long reprotectedResult = mapReduceProtector.reprotect( "DE_PROTECT_1", "DE_PROTECT_2", protectedResult );
Exception:
- The function throws the ptyMapRedProtectorException for errors while reprotecting the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - Long data | Integer (8 Bytes) | No | No | Yes | No | Yes |
hmac()
Warning: It is recommended to use the HMAC data element with the protect() and bulkProtect() Byte APIs for hashing byte array data, instead of using the hmac() API.
This method hashes a single data item using the HMAC operation with a data element that is associated with HMAC. It returns the HMAC value of the given data for the given data element.
Warning: This function is marked for deprecation and will be removed from the future releases.
Signature:
public byte[] hmac(String dataElement, byte[] data)
Parameters:
- String dataElement: Specifies the name of the data element to hash the data.
- byte[] data: Is an array that contains the data to be hashed.
Result:
- The function returns the byte array of HMAC data.
Example:
ptyMapReduceProtector mapReduceProtector = new ptyMapReduceProtector();
byte[] protectedResult = mapReduceProtector.hmac( "HMAC_DE", "protegrity".getBytes() );
Exception:
- The function throws the ptyMapRedProtectorException if an error occurs while hashing the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| hmac() | HMAC | No | No | Yes | No | Yes |
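For orientation only, the general shape of an HMAC operation (a key and a byte array in, a fixed-length byte array out) can be sketched with the JDK's javax.crypto.Mac class. This is not the Protegrity implementation: the HmacSHA256 algorithm, the demo key, and the class name HmacSketch are assumptions made for the sketch, and in the protector the key material is tied to the HMAC data element rather than passed by the caller.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration using the JDK only; not the Protegrity hmac() API.
class HmacSketch {

    // Computes an HMAC-SHA256 value for the data with the given key.
    static byte[] hmacSha256(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed", e);
        }
    }

    public static void main(String[] args) {
        // Demo key and input; both are placeholders.
        byte[] key = "demo-key".getBytes(StandardCharsets.UTF_8);
        byte[] data = "protegrity".getBytes(StandardCharsets.UTF_8);
        System.out.println("HMAC length in bytes: " + hmacSha256(key, data).length);
    }
}
```

The same key and input always produce the same output, which is what makes a keyed hash usable for matching values without exposing the original data.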
2 - Hive UDFs
Warning: If you are using Ranger or Sentry, then ensure that your policy provides create access permissions to the required UDFs.
This section lists the Hive UDFs available for protection and unprotection in the Big Data Protector.
ptyGetVersion()
This UDF returns the current version of the protector.
Signature:
ptyGetVersion()
Parameters:
- None
Result:
- The UDF returns the current version of the protector.
Example:
create temporary function ptyGetVersion AS 'com.protegrity.hive.udf.ptyGetVersion';
select ptyGetVersion();
ptyGetVersionExtended()
This UDF returns the extended version information of the protector.
Signature:
ptyGetVersionExtended()
Parameters:
- None
Result:
The UDF returns a String in the following format:
BDP: <1>; JcoreLite: <2>; CORE: <3>;
where:
- <1> is the current version of the Protector
- <2> is the JcoreLite library version
- <3> is the Core library version
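A client that reads this string can split it into its three components. The following is a minimal sketch, assuming the result has exactly the format shown above; the class name VersionStringSketch is hypothetical and the version numbers in the sample are placeholders, not real release numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Big Data Protector.
class VersionStringSketch {

    // Splits "BDP: <1>; JcoreLite: <2>; CORE: <3>;" into name/version pairs.
    static Map<String, String> parse(String versionInfo) {
        Map<String, String> parts = new LinkedHashMap<>();
        for (String field : versionInfo.split(";")) {
            int colon = field.indexOf(':');
            if (colon < 0) {
                continue; // skip segments that carry no "name: value" pair
            }
            String name = field.substring(0, colon).trim();
            String version = field.substring(colon + 1).trim();
            parts.put(name, version);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Placeholder version numbers, used only to show the format.
        String sample = "BDP: 1.0.0; JcoreLite: 2.0.0; CORE: 3.0.0;";
        System.out.println(parse(sample));
    }
}
```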
Example:
create temporary function ptyGetVersionExtended AS 'com.protegrity.hive.udf.ptyGetVersionExtended';
select ptyGetVersionExtended();
ptyWhoAmI()
This UDF returns the currently logged-in user.
Signature:
ptyWhoAmI()
Parameters:
- None
Result:
- The UDF returns the currently logged-in user.
Example:
create temporary function ptyWhoAmI AS 'com.protegrity.hive.udf.ptyWhoAmI';
select ptyWhoAmI();
ptyProtectStr()
This UDF protects the string values.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar. For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer Date and Datetime tokenization.
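Input dates can be screened for this range before they are loaded. The following is a minimal sketch, assuming that both end dates of the range are included; the class name CutoverDateSketch and the method inCutoverGap are hypothetical. It relies on java.time.LocalDate, which uses the proleptic ISO calendar and therefore can represent these dates for comparison.

```java
import java.time.LocalDate;

// Hypothetical pre-check for illustration only; not part of the Big Data Protector.
class CutoverDateSketch {

    private static final LocalDate FIRST = LocalDate.of(1582, 10, 5);
    private static final LocalDate LAST = LocalDate.of(1582, 10, 14);

    // Returns true if the date lies in the range 05-OCT-1582 to 14-OCT-1582, both ends included.
    static boolean inCutoverGap(LocalDate date) {
        return !date.isBefore(FIRST) && !date.isAfter(LAST);
    }

    public static void main(String[] args) {
        System.out.println(inCutoverGap(LocalDate.of(1582, 10, 10)));
        System.out.println(inCutoverGap(LocalDate.of(1582, 10, 15)));
    }
}
```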
Signature:
ptyProtectStr(String input, String dataElement)
Parameters:
- String input: Specifies the String value to protect.
- String dataElement: Is the name of the data element to protect the string value.
Result:
- The UDF returns the protected string value.
Example:
create temporary function ptyProtectStr AS 'com.protegrity.hive.udf.ptyProtectStr';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select (val) from temp_table;
select ptyProtectStr(val, 'Token_alpha') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectStr() | | No | Yes | Yes | Yes | Yes |
ptyUnprotectStr()
The UDF unprotects the protected string value.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar. For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer Date and Datetime tokenization.
Signature:
ptyUnprotectStr(String input, String dataElement)
Parameters:
- String input: Specifies the protected String value to unprotect.
- String dataElement: Is the name of the data element to unprotect the string value.
Result:
- The UDF returns the unprotected string value.
Example:
create temporary function ptyProtectStr AS 'com.protegrity.hive.udf.ptyProtectStr';
create temporary function ptyUnprotectStr AS 'com.protegrity.hive.udf.ptyUnprotectStr';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select (val) from temp_table;
insert overwrite table protected_data_table select ptyProtectStr(val, 'Token_alpha') from test_data_table;
select ptyUnprotectStr(protectedValue, 'Token_alpha') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectStr() | | No | Yes | Yes | Yes | Yes |
ptyReprotect() - String Data
The UDF reprotects string format protected data, which was earlier protected using the ptyProtectStr UDF, with a different data element.
Signature:
ptyReprotect(String input, String oldDataElement, String newDataElement)
Parameters:
- String input: Specifies the String value to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the data earlier.
- String newDataElement: Specifies the name of the new data element to reprotect the data.
Result:
- The UDF returns the reprotected string value.
Example:
create temporary function ptyProtectStr AS 'com.protegrity.hive.udf.ptyProtectStr';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select (val) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectStr(val,'Token_alpha') from test_data_table;
create table test_reprotected_data_table(val string) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotect(val, 'Token_alpha', 'new_Token_alpha') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() | | No | Yes | Yes | Yes | Yes |
ptyProtectUnicode()
The UDF protects string (Unicode) values.
Warning: This UDF should be used only if you want to tokenize the Unicode data in Hive, and migrate the tokenized data from Hive to a Teradata database and detokenize the data using the Protegrity Database Protector. Ensure that you use this UDF with a Unicode tokenization data element only.
Signature:
ptyProtectUnicode(String input, String dataElement)
Parameters:
- String input: Specifies the string (Unicode) value to protect.
- String dataElement: Specifies the name of the data element to protect the string (Unicode) value.
Result:
- The UDF returns the protected string value.
Example:
create temporary function ptyProtectUnicode AS 'com.protegrity.hive.udf.ptyProtectUnicode';
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
select ptyProtectUnicode(val, 'Token_unicode') from temp_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectUnicode() | - Unicode (Legacy) - Unicode Base64 | No | No | Yes | No | Yes |
ptyUnprotectUnicode()
The UDF unprotects the protected string (Unicode) value.
Signature:
ptyUnprotectUnicode(String input, String dataElement)
Parameters:
- String input: Specifies the string (Unicode) value to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the string (Unicode) value.
Warning: This UDF should be used only if you want to tokenize the Unicode data in Teradata using the Protegrity Database Protector, and migrate the tokenized data from a Teradata database to Hive and detokenize the data using the Protegrity Big Data Protector for Hive. Ensure that you use this UDF with a Unicode tokenization data element only.
Result:
- The UDF returns the unprotected string (Unicode) value.
Example:
create temporary function ptyProtectUnicode AS 'com.protegrity.hive.udf.ptyProtectUnicode';
create temporary function ptyUnprotectUnicode AS 'com.protegrity.hive.udf.ptyUnprotectUnicode';
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table protected_data_table select ptyProtectUnicode(val, 'Token_unicode') from temp_table;
select ptyUnprotectUnicode(protectedValue, 'Token_unicode') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectUnicode() | - Unicode (Legacy) - Unicode Base64 | No | No | Yes | No | Yes |
ptyReprotectUnicode()
The UDF reprotects the string format protected data, which was protected earlier using the ptyProtectUnicode UDF, with a different data element.
Warning: This UDF should be used only if you want to tokenize the Unicode data in Hive, and migrate the tokenized data from Hive to a Teradata database and detokenize the data using the Protegrity Database Protector. Ensure that you use this UDF with a Unicode tokenization data element only.
Signature:
ptyReprotectUnicode(String input, String oldDataElement, String newDataElement)
Parameters:
- String input: Specifies the String (Unicode) value to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the data earlier.
- String newDataElement: Specifies the name of the new data element to reprotect the data.
Result:
- The UDF returns the reprotected string value.
Example:
create temporary function ptyProtectUnicode AS 'com.protegrity.hive.udf.ptyProtectUnicode';
create temporary function ptyReprotectUnicode AS 'com.protegrity.hive.udf.ptyReprotectUnicode';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as string) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectUnicode(val, 'Unicode_Token') from test_data_table;
create table test_reprotected_data_table(val string) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotectUnicode(val, 'Unicode_Token', 'new_Unicode_Token') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectUnicode() | - Unicode (Legacy) - Unicode Base64 | No | No | Yes | No | Yes |
ptyProtectShort()
The UDF protects the SmallInt (Short) values.
Signature:
ptyProtectShort(SmallInt input, String dataElement)
Parameters:
- SmallInt input: Specifies the SmallInt value to protect.
- String dataElement: Specifies the name of the data element to protect the SmallInt value.
Result:
- The UDF returns the protected SmallInt value.
Example:
create temporary function ptyProtectShort AS 'com.protegrity.hive.udf.ptyProtectShort';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val smallint) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as smallint) from temp_table;
select ptyProtectShort(val, 'Token_Integer_2') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectShort() | Integer 2 Bytes | No | No | Yes | No | Yes |
ptyUnprotectShort()
The UDF unprotects the protected SmallInt (Short) values.
Signature:
ptyUnprotectShort(SmallInt input, String dataElement)
Parameters:
- SmallInt input: Specifies the protected SmallInt value to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the SmallInt value.
Result:
- The UDF returns the unprotected SmallInt value.
Example:
create temporary function ptyProtectShort AS 'com.protegrity.hive.udf.ptyProtectShort';
create temporary function ptyUnprotectShort AS 'com.protegrity.hive.udf.ptyUnprotectShort';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val smallint) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue smallint) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as smallint) from temp_table;
insert overwrite table protected_data_table select ptyProtectShort(val, 'Token_Integer_2') from test_data_table;
select ptyUnprotectShort(protectedValue, 'Token_Integer_2') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectShort() | Integer 2 Bytes | No | No | Yes | No | Yes |
ptyReprotect() - Short Data
The UDF reprotects the protected SmallInt (Short) data with a different data element.
Signature:
ptyReprotect(SmallInt input, String oldDataElement, String newDataElement)
Parameters:
- SmallInt input: Specifies the SmallInt value to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the data earlier.
- String newDataElement: Specifies the name of the new data element used to reprotect the data.
Result:
- The UDF returns the reprotected SmallInt value.
Example:
create temporary function ptyProtectShort AS 'com.protegrity.hive.udf.ptyProtectShort';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val smallint) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val smallint) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as smallint) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectShort(val, 'Token_Integer_2') from test_data_table;
create table test_reprotected_data_table(val smallint) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotect(val, 'Token_Integer_2', 'new_Token_Integer_2') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() | Integer 2 Bytes | No | No | Yes | No | Yes |
ptyProtectInt()
The UDF protects integer values.
Signature:
ptyProtectInt(int input, String dataElement)
Parameters:
- int input: Specifies the integer value to protect.
- String dataElement: Specifies the name of the data element to protect the integer value.
Result:
- The UDF returns the protected integer value.
Example:
create temporary function ptyProtectInt AS 'com.protegrity.hive.udf.ptyProtectInt';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val int) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as int) from temp_table;
select ptyProtectInt(val, 'Token_numeric') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectInt() | Integer 4 Bytes | No | No | Yes | No | Yes |
ptyUnprotectInt()
The UDF unprotects the protected integer value.
Signature:
ptyUnprotectInt(int input, String dataElement)
Parameters:
- int input: Specifies the protected integer value to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the integer value.
Result:
- The UDF returns the unprotected integer value.
Example:
create temporary function ptyProtectInt AS 'com.protegrity.hive.udf.ptyProtectInt';
create temporary function ptyUnprotectInt AS 'com.protegrity.hive.udf.ptyUnprotectInt';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val int) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue int) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as int) from temp_table;
insert overwrite table protected_data_table select ptyProtectInt(val, 'Token_numeric') from test_data_table;
select ptyUnprotectInt(protectedValue, 'Token_numeric') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectInt() | Integer 4 Bytes | No | No | Yes | No | Yes |
ptyReprotect() - Int Data
The UDF reprotects the protected integer data with a different data element.
Signature:
ptyReprotect(int input, String oldDataElement, String newDataElement)
Parameters:
- int input: Specifies the integer value to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the integer value earlier.
- String newDataElement: Specifies the name of the new data element to reprotect the integer value.
Result:
- The UDF returns the reprotected integer value.
Example:
create temporary function ptyProtectInt AS 'com.protegrity.hive.udf.ptyProtectInt';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val int) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val int) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val int) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as int) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectInt(val, 'Token_Integer') from test_data_table;
create table test_reprotected_data_table(val int) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotect(val, 'Token_Integer', 'new_Token_Integer') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() | Integer 4 Bytes | No | No | Yes | No | Yes |
ptyProtectBigInt()
The UDF protects the BigInt value.
Signature:
ptyProtectBigInt(BigInt input, String dataElement)
Parameters:
- BigInt input: Specifies the BigInt value to protect.
- String dataElement: Specifies the name of the data element to protect the BigInt value.
Result:
- The UDF returns the protected BigInt value.
Example:
create temporary function ptyProtectBigInt as 'com.protegrity.hive.udf.ptyProtectBigInt';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val bigint) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val bigint) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as bigint) from temp_table;
select ptyProtectBigInt(val, 'BIGINT_DE') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectBigInt() | Integer 8 Bytes | No | No | Yes | No | Yes |
ptyUnprotectBigInt()
The UDF unprotects the protected BigInt value.
Signature:
ptyUnprotectBigInt(BigInt input, String dataElement)
Parameters:
- BigInt input: Specifies the protected BigInt value to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the BigInt value.
Result:
- The UDF returns the unprotected BigInt value.
Example:
create temporary function ptyProtectBigInt as 'com.protegrity.hive.udf.ptyProtectBigInt';
create temporary function ptyUnprotectBigInt as 'com.protegrity.hive.udf.ptyUnprotectBigInt';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val bigint) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val bigint) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue bigint) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as bigint) from temp_table;
insert overwrite table protected_data_table select ptyProtectBigInt(val, 'BIGINT_DE') from test_data_table;
select ptyUnprotectBigInt(protectedValue, 'BIGINT_DE') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectBigInt() | Integer 8 Bytes | No | No | Yes | No | Yes |
ptyReprotect() - BigInt Data
The UDF reprotects the protected BigInt format data with a different data element.
Signature:
ptyReprotect(BigInt input, String oldDataElement, String newDataElement)
Parameters:
- BigInt input: Specifies the protected BigInt value to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the BigInt value earlier.
- String newDataElement: Specifies the name of the new data element to reprotect the BigInt value.
Result:
- The UDF returns the protected BigInt value.
Example:
create temporary function ptyProtectBigInt AS 'com.protegrity.hive.udf.ptyProtectBigInt';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val bigint) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val bigint) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val bigint) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as bigint) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectBigInt(val, 'BIGINT_DE') from test_data_table;
create table test_reprotected_data_table(val bigint) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotect(val, 'BIGINT_DE', 'new_BIGINT_DE') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() | Integer 8 Bytes | No | No | Yes | No | Yes |
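As a hedged follow-up to the example above, the reprotected values could be checked by unprotecting them with the new data element. This is a minimal sketch, not a documented procedure: it assumes that the ptyUnprotectBigInt() UDF described earlier is available, that new_BIGINT_DE is the data element passed as newDataElement, and that the user running the query is permitted to unprotect with it. The table and data element names are placeholders carried over from the example.

```sql
-- Register the unprotect UDF (class name as used in the ptyUnprotectBigInt() example).
create temporary function ptyUnprotectBigInt as 'com.protegrity.hive.udf.ptyUnprotectBigInt';

-- Assumption: test_reprotected_data_table was populated by the reprotect example,
-- so its values are now protected with the new data element.
select ptyUnprotectBigInt(val, 'new_BIGINT_DE') from test_reprotected_data_table;
```

If the reprotect operation worked as intended, the returned values would be expected to match the clear values in test_data_table.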
ptyProtectFloat()
The UDF protects the float value.
Signature:
ptyProtectFloat(Float input, String dataElement)
Parameters:
- Float input: Specifies the Float value to protect.
- String dataElement: Specifies the name of the data element to protect the float value.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The UDF returns the protected float value.
Example:
create temporary function ptyProtectFloat as 'com.protegrity.hive.udf.ptyProtectFloat';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val float) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as float) from temp_table;
select ptyProtectFloat(val, 'FLOAT_DE') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectFloat() | No | No | No | Yes | No | Yes |
ptyUnprotectFloat()
The UDF unprotects the protected float value.
Signature:
ptyUnprotectFloat(Float input, String dataElement)
Parameters:
- Float input: Specifies the Float value to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the float value.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The UDF returns the unprotected float value.
Example:
create temporary function ptyProtectFloat as 'com.protegrity.hive.udf.ptyProtectFloat';
create temporary function ptyUnprotectFloat as 'com.protegrity.hive.udf.ptyUnprotectFloat';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val float) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue float) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as float) from temp_table;
insert overwrite table protected_data_table select ptyProtectFloat(val, 'FLOAT_DE') from test_data_table;
select ptyUnprotectFloat(protectedValue, 'FLOAT_DE') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectFloat() | No | No | No | Yes | No | Yes |
ptyReprotect() - Float Data
The UDF reprotects the float format protected data with a different data element.
Signature:
ptyReprotect(Float input, String oldDataElement, String newDataElement)
Parameters:
- Float input: Specifies the protected Float value to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the Float value earlier.
- String newDataElement: Specifies the name of the new data element to reprotect the Float value.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The UDF returns the protected float value.
Example:
create temporary function ptyProtectFloat AS 'com.protegrity.hive.udf.ptyProtectFloat';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val float) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val float) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val float) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as float) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectFloat(val, 'NoEncryption') from test_data_table;
create table test_reprotected_data_table(val float) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotect(val, 'NoEncryption','NoEncryption') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() | No | No | No | Yes | No | Yes |
ptyProtectDouble()
The UDF protects the double value.
Signature:
ptyProtectDouble(Double input, String dataElement)
Parameters:
- Double input: Specifies the Double value to protect.
- String dataElement: Specifies the name of the data element to protect the double value.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The UDF returns the protected double value.
Example:
create temporary function ptyProtectDouble as 'com.protegrity.hive.udf.ptyProtectDouble';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val double) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as double) from temp_table;
select ptyProtectDouble(val, 'DOUBLE_DE') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectDouble() | No | No | No | Yes | No | Yes |
ptyUnprotectDouble()
The UDF unprotects the protected double value.
Signature:
ptyUnprotectDouble(Double input, String dataElement)
Parameters:
- Double input: Specifies the Double value to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the double value.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The UDF returns the unprotected double value.
Example:
create temporary function ptyProtectDouble as 'com.protegrity.hive.udf.ptyProtectDouble';
create temporary function ptyUnprotectDouble as 'com.protegrity.hive.udf.ptyUnprotectDouble';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val double) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val double) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue double) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as double) from temp_table;
insert overwrite table protected_data_table select ptyProtectDouble(val, 'DOUBLE_DE') from test_data_table;
select ptyUnprotectDouble(protectedValue, 'DOUBLE_DE') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectDouble() | No | No | No | Yes | No | Yes |
ptyReprotect() - Double Data
The UDF reprotects the double format protected data with a different data element.
Signature:
ptyReprotect(Double input, String oldDataElement, String newDataElement)
Parameters:
- Double input: Specifies the double value to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the data earlier.
- String newDataElement: Specifies the name of the new data element to reprotect the data.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The UDF returns the protected double value.
Example:
create temporary function ptyProtectDouble AS 'com.protegrity.hive.udf.ptyProtectDouble';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val double) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val double) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val double) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as double) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectDouble(val,'NoEncryption') from test_data_table;
create table test_reprotected_data_table(val double) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotect(val, 'NoEncryption','NoEncryption') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() | No | No | No | Yes | No | Yes |
ptyProtectDec()
The UDF protects the decimal value.
Note: This API works only with the CDH 4.3 distribution.
Signature:
ptyProtectDec(Decimal input, String dataElement)
Parameters:
- Decimal input: Specifies the decimal value to protect.
- String dataElement: Specifies the name of the data element to protect the decimal value.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The UDF returns the protected decimal value.
Example:
create temporary function ptyProtectDec as 'com.protegrity.hive.udf.ptyProtectDec';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val decimal) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val decimal) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as decimal) from temp_table;
select ptyProtectDec(val, 'BIGDECIMAL_DE') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectDec() | No | No | No | Yes | No | Yes |
ptyUnprotectDec()
The UDF unprotects the protected decimal value.
Note: This API works only with the CDH 4.3 distribution.
Signature:
ptyUnprotectDec(Decimal input, String dataElement)
Parameters:
- Decimal input: Specifies the decimal value to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the decimal value.
Result:
- The UDF returns the unprotected decimal value.
Example:
create temporary function ptyProtectDec as 'com.protegrity.hive.udf.ptyProtectDec';
create temporary function ptyUnprotectDec as 'com.protegrity.hive.udf.ptyUnprotectDec';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val decimal) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue decimal) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as decimal) from temp_table;
insert overwrite table protected_data_table select ptyProtectDec(val, 'BIGDECIMAL_DE') from test_data_table;
select ptyUnprotectDec(protectedValue, 'BIGDECIMAL_DE') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectDec() | No | No | No | Yes | No | Yes |
ptyProtectHiveDecimal()
The UDF protects the decimal value.
Note: This API works only for distributions which include Hive, Version 0.11 and later.
Signature:
ptyProtectHiveDecimal(Decimal input, String dataElement)
Parameters:
- Decimal input: Specifies the decimal value to protect.
- String dataElement: Specifies the name of the data element to protect the decimal value.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Caution: Before the ptyProtectHiveDecimal() UDF is called, Hive rounds off the decimal value in the table to 18 digits in scale, irrespective of the length of the data.
Result:
- The UDF returns the protected decimal value.
Example:
create temporary function ptyProtectHiveDecimal as 'com.protegrity.hive.udf.ptyProtectHiveDecimal';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val decimal) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as decimal) from temp_table;
select ptyProtectHiveDecimal(val, 'BIGDECIMAL_DE') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectHiveDecimal() | No | No | No | Yes | No | Yes |
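The caution in this section implies that values with more than 18 digits after the decimal point may be altered by Hive before the UDF receives them. The following is a minimal sketch, not part of the product, that lists such values while they are still strings in the staging table. It assumes the temp_table(val string) table from the example and plain decimal notation without an exponent; the pattern is an illustration only.

```sql
-- List staged values that have more than 18 digits after the decimal point.
-- [.] matches a literal dot; [0-9]{19,} matches 19 or more consecutive digits.
select val
from temp_table
where val rlike '[.][0-9]{19,}';
```

Any rows returned by this query are candidates for review before the values are cast to decimal and protected.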
ptyUnprotectHiveDecimal()
The UDF unprotects the protected decimal value.
Note: This API works only for distributions which include Hive, Version 0.11 and later.
Signature:
ptyUnprotectHiveDecimal(Decimal input, String dataElement)
Parameters:
- Decimal input: Specifies the decimal value to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the decimal value.
Result:
- The UDF returns the unprotected decimal value.
Example:
create temporary function ptyProtectHiveDecimal as 'com.protegrity.hive.udf.ptyProtectHiveDecimal';
create temporary function ptyUnprotectHiveDecimal as 'com.protegrity.hive.udf.ptyUnprotectHiveDecimal';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val string) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val decimal) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue decimal) row format delimited fields terminated by ',' stored as textfile;
load data local inpath 'test_data.csv' overwrite into table temp_table;
insert overwrite table test_data_table select cast(val as decimal) from temp_table;
insert overwrite table protected_data_table select ptyProtectHiveDecimal(val,'BIGDECIMAL_DE') from test_data_table;
select ptyUnprotectHiveDecimal(protectedValue, 'BIGDECIMAL_DE') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectHiveDecimal() | No | No | No | Yes | No | Yes |
ptyReprotect() - Decimal Data
The UDF reprotects the decimal format protected data with a different data element.
Note: This API works only for distributions which include Hive, Version 0.11 and later.
Signature:
ptyReprotect(Decimal input, String oldDataElement, String newDataElement)
Parameters:
- Decimal input: Specifies the decimal value to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the data earlier.
- String newDataElement: Specifies the name of the new data element to reprotect the data.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The UDF returns the protected decimal value.
Example:
create temporary function ptyProtectHiveDecimal AS 'com.protegrity.hive.udf.ptyProtectHiveDecimal';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val decimal) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val decimal) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val decimal) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as decimal) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectHiveDecimal(val, 'NoEncryption') from test_data_table;
create table test_reprotected_data_table(val decimal) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotect(val, 'NoEncryption', 'NoEncryption') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() | No | No | No | Yes | No | Yes |
ptyProtectDate()
The UDF protects the date format data, which is provided as an input.
Signature:
ptyProtectDate(Date input, String dataElement)
Parameters:
- Date input: Specifies the date format data to protect.
- String dataElement: Specifies the name of the data element to protect the date format data.
Result:
- The UDF returns the protected date format data.
Example:
create temporary function ptyProtectDate AS 'com.protegrity.hive.udf.ptyProtectDate';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val date) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val date) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as date) from temp_table;
select ptyProtectDate(val, 'Token_Date') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectDate() | Date | No | No | Yes | No | Yes |
ptyUnprotectDate()
The UDF unprotects the protected date format data, provided as an input.
Signature:
ptyUnprotectDate(Date input, String dataElement)
Parameters:
- Date input: Specifies the protected date format data to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the date format data.
Result:
- The UDF returns the unprotected date format data.
Example:
create temporary function ptyProtectDate AS 'com.protegrity.hive.udf.ptyProtectDate';
create temporary function ptyUnprotectDate AS 'com.protegrity.hive.udf.ptyUnprotectDate';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val date) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val date) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue date) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as date) from temp_table;
insert overwrite table protected_data_table select ptyProtectDate(val, 'Token_Date') from test_data_table;
select ptyUnprotectDate(protectedValue, 'Token_Date') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectDate() | Date | No | No | Yes | No | Yes |
ptyReprotect() - Date Data
The UDF reprotects the date format protected data, which was earlier protected using the ptyProtectDate UDF, with a different data element.
Signature:
ptyReprotect(Date input, String oldDataElement, String newDataElement)
Parameters:
- Date input: Specifies the date format data to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the data earlier.
- String newDataElement: Specifies the name of the new data element to reprotect the data.
Result:
- The UDF returns the protected date format data.
Example:
create temporary function ptyProtectDate AS 'com.protegrity.hive.udf.ptyProtectDate';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val date) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val date) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val date) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as date) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectDate(val,'Token_Date') from test_data_table;
create table test_reprotected_data_table(val date) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotect(val, 'Token_Date', 'new_Token_Date') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() | Date | No | No | Yes | No | Yes |
ptyProtectDateTime()
The UDF protects the timestamp format data provided as an input.
Signature:
ptyProtectDateTime(Timestamp input, String dataElement)
Parameters:
- Timestamp input: Specifies the timestamp format data to protect.
- String dataElement: Specifies the name of the data element to protect the timestamp format data.
Result:
- The UDF returns the protected timestamp format data.
Example:
create temporary function ptyProtectDateTime AS 'com.protegrity.hive.udf.ptyProtectDateTime';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val timestamp) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val timestamp) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as timestamp) from temp_table;
select ptyProtectDateTime(val, 'Token_Timestamp') from test_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectDateTime() | Datetime | No | No | Yes | No | Yes |
ptyUnprotectDateTime()
The UDF unprotects the protected timestamp format data provided as an input.
Signature:
ptyUnprotectDateTime(Timestamp input, String dataElement)
Parameters:
- Timestamp input: Specifies the timestamp format protected data to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the timestamp format data.
Result:
- The UDF returns the unprotected timestamp format data.
Example:
create temporary function ptyProtectDateTime AS 'com.protegrity.hive.udf.ptyProtectDateTime';
create temporary function ptyUnprotectDateTime AS 'com.protegrity.hive.udf.ptyUnprotectDateTime';
drop table if exists test_data_table;
drop table if exists temp_table;
drop table if exists protected_data_table;
create table temp_table(val timestamp) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val timestamp) row format delimited fields terminated by ',' stored as textfile;
create table protected_data_table(protectedValue timestamp) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as timestamp) from temp_table;
insert overwrite table protected_data_table select ptyProtectDateTime(val, 'Token_Timestamp') from test_data_table;
select ptyUnprotectDateTime(protectedValue, 'Token_Timestamp') from protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectDateTime() | Datetime | No | No | Yes | No | Yes |
ptyReprotect() - DateTime Data
The UDF reprotects the timestamp format protected data, which was earlier protected using the ptyProtectDateTime UDF, with a different data element.
Signature:
ptyReprotect(Timestamp input, String oldDataElement, String newDataElement)
Parameters:
- Timestamp input: Specifies the data in the timestamp format to reprotect.
- String oldDataElement: Specifies the name of the data element that was used to protect the data earlier.
- String newDataElement: Specifies the name of the new data element to reprotect the data.
Result:
- The UDF returns the protected timestamp format data.
Example:
create temporary function ptyProtectDateTime AS 'com.protegrity.hive.udf.ptyProtectDateTime';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists temp_table;
create table temp_table(val timestamp) row format delimited fields terminated by ',' stored as textfile;
create table test_data_table(val timestamp) row format delimited fields terminated by ',' stored as textfile;
create table test_protected_data_table(val timestamp) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
insert overwrite table test_data_table select cast(val as timestamp) from temp_table;
insert overwrite table test_protected_data_table select ptyProtectDateTime(val, 'Token_Timestamp') from test_data_table;
create table test_reprotected_data_table(val timestamp) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table test_reprotected_data_table select ptyReprotect(val, 'Token_Timestamp', 'new_Token_Timestamp') from test_protected_data_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() | Datetime | No | No | Yes | No | Yes |
ptyProtectChar()
The UDF protects the char value.
Note: It is recommended to use the String UDFs, such as ptyProtectStr(), ptyUnprotectStr(), or ptyReprotect(), instead of the respective Char UDFs, such as ptyProtectChar(), ptyUnprotectChar(), or ptyReprotect(), unless it is required to use the char data type only.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls in the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
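One way to catch such values before calling a protect UDF is to filter them in a staging table. The sketch below is an illustration under stated assumptions, not a documented procedure: date_staging is a hypothetical table that holds dates as zero-padded yyyy-MM-dd strings. With that fixed format, string comparison orders the values chronologically, which avoids depending on how a particular Hive version parses dates in the cutover range.

```sql
-- Hypothetical staging table; not part of the product.
create table if not exists date_staging(val string) row format delimited fields terminated by ',' stored as textfile;

-- Rows that fall in the non-existent range 05-OCT-1582 to 14-OCT-1582.
select val
from date_staging
where val >= '1582-10-05' and val <= '1582-10-14';
```

Rows returned by this query would be expected to trigger the invalid input data error described in the note.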
Signature:
ptyProtectChar(Char input, String dataElement)
Parameters:
- Char input: Specifies the char value to protect.
- String dataElement: Specifies the name of the data element to protect the char value.
Warning: If you have fixed length data fields and the input data is shorter than the length of the field, then ensure that you truncate the trailing white spaces and leading white spaces, if applicable, before passing the input to the respective Protect and Unprotect UDFs. The truncation of the white spaces ensures that the protection and unprotection operations produce consistent data output across the Protegrity products.
Ensure that the lengths of the Char column in the source and target Hive tables are the same to avoid data corruption because, as per Hive behaviour, characters that exceed the defined Char column size are truncated.
The UDF only supports the Numeric, Alpha, Alpha Numeric, Upper-case Alpha, Upper Alpha-Numeric, and Email tokenization data elements, with length preservation selected.
Using any other data elements with this UDF is not supported.
Using non-length preserving data elements with this UDF is not supported.
Result:
- The UDF returns the protected char value.
Example:
create temporary function ptyProtectChar AS 'com.protegrity.hive.udf.ptyProtectChar';
drop table if exists temp_table;
create table temp_table(val char(10)) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE temp_table;
select ptyProtectChar(val, 'TOKEN_ELEMENT') from temp_table;
Exception:
- ptyHiveProtectorException: 21, Input or Output buffer too small. This exception occurs when a non-length preserving data element is provided.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectChar() | All length preserving tokens | No | No | Yes | No | Yes |
ptyUnprotectChar()
The UDF unprotects the char value.
Note: It is recommended to use the String UDFs, such as ptyProtectStr(), ptyUnprotectStr(), or ptyReprotect(), instead of the respective Char UDFs, such as ptyProtectChar(), ptyUnprotectChar(), or ptyReprotect(), unless you must use the char data type.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
Signature:
ptyUnprotectChar(Char input, String dataElement)
Parameters:
- Char input: Specifies the protected char value to unprotect.
- String dataElement: Specifies the name of the data element to unprotect the char value.
Warning:
- If you have fixed-length data fields and the input data is shorter than the length of the field, then ensure that you trim the trailing white spaces and leading white spaces, if applicable, before passing the input to the respective Protect and Unprotect UDFs. Trimming the white spaces ensures that the protection and unprotection operations produce consistent data output across the Protegrity products.
- Ensure that the lengths of the Char column in the source and target Hive tables are the same to avoid data corruption, because Hive truncates characters that exceed the defined Char column size.
- The UDF supports only Numeric, Alpha, Alpha Numeric, Upper-case Alpha, Upper Alpha-Numeric, and Email tokenization data elements, with length preservation selected.
- Using any other data elements with this UDF is not supported.
- Using non-length preserving data elements with this UDF is not supported.
Result:
- The UDF returns the unprotected char value.
Example:
create temporary function ptyProtectChar AS 'com.protegrity.hive.udf.ptyProtectChar';
create temporary function ptyUnprotectChar AS 'com.protegrity.hive.udf.ptyUnprotectChar';
drop table if exists test_data_table;
drop table if exists protected_data_table;
create table test_data_table(val char(10)) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE test_data_table;
create table protected_data_table(protectedValue char(10)) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table protected_data_table select ptyProtectChar(val, 'TOKEN_ELEMENT') from test_data_table;
select ptyUnprotectChar(protectedValue,'TOKEN_ELEMENT') FROM protected_data_table;
Exception:
- ptyHiveProtectorException: 21, Input or Output buffer too small: A non-length preserving data element is provided.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectChar() | All length preserving tokens | No | No | Yes | No | Yes |
ptyReprotect() - Char data
The UDF reprotects char format protected data with a different data element.
Note: It is recommended to use the String UDFs, such as ptyProtectStr(), ptyUnprotectStr(), or ptyReprotect(), instead of the respective Char UDFs, such as ptyProtectChar(), ptyUnprotectChar(), or ptyReprotect(), unless you must use the char data type.
Signature:
ptyReprotect(Char input, String oldDataElement, String newDataElement)
Parameters:
- Char input: Specifies the char value to reprotect.
- String oldDataElement: Specifies the name of the data element used to protect the char value.
- String newDataElement: Specifies the name of the new data element to reprotect the char value.
Warning:
- If you have fixed-length data fields and the input data is shorter than the length of the field, then ensure that you trim the trailing white spaces and leading white spaces, if applicable, before passing the input to the respective Protect and Unprotect UDFs. Trimming the white spaces ensures that the protection and unprotection operations produce consistent data output across the Protegrity products.
- Ensure that the lengths of the Char column in the source and target Hive tables are the same to avoid data corruption, because Hive truncates characters that exceed the defined Char column size.
- The UDF supports only Numeric, Alpha, Alpha Numeric, Upper-case Alpha, Upper Alpha-Numeric, and Email tokenization data elements, with length preservation selected.
- Using any other data elements with this UDF is not supported.
- Using non-length preserving data elements with this UDF is not supported.
Result:
- The UDF returns the protected char value.
Example:
create temporary function ptyProtectChar AS 'com.protegrity.hive.udf.ptyProtectChar';
create temporary function ptyUnprotectChar AS 'com.protegrity.hive.udf.ptyUnprotectChar';
create temporary function ptyReprotect AS 'com.protegrity.hive.udf.ptyReprotect';
drop table if exists test_data_table;
drop table if exists protected_data_table;
drop table if exists unprotected_data_table;
drop table if exists reprotected_data_table;
create table test_data_table(val char(10)) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH 'test_data.csv' OVERWRITE INTO TABLE test_data_table;
create table protected_data_table(val char(10)) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table protected_data_table select ptyProtectChar(val, 'TOKEN_ELEMENT') from test_data_table;
create table reprotected_data_table(val char(10)) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table reprotected_data_table select ptyReprotect(val,'TOKEN_ELEMENT', 'NEW_TOKEN_ELEMENT') from protected_data_table;
create table unprotected_data_table(val char(10)) row format delimited fields terminated by ',' stored as textfile;
insert overwrite table unprotected_data_table select ptyUnprotectChar(val,'NEW_TOKEN_ELEMENT') from reprotected_data_table;
Exception:
- ptyHiveProtectorException: 21, Input or Output buffer too small: A non-length preserving data element is provided.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotect() - Char data | All length preserving tokens | No | No | Yes | No | Yes |
ptyStringEnc()
The UDF encrypts the string value.
Signature:
ptyStringEnc(String input, String DataElement)
Parameters:
- String input: Specifies the string value to encrypt.
- String DataElement: Specifies the name of the data element to encrypt the string value.
Warning:
- The string encryption UDFs are limited to accept 2 GB data size at maximum as input.
- Ensure that the field size for the protected binary data post the required encoding does not exceed the 2 GB input limit.
- The field size to store the input data is dependent on the encryption algorithm selected, such as, AES-128, AES-256, 3DES, and CUSP, and the encoding type selected, such as No Encoding, Base64, and Hex.
- Ensure that you set the input data size based on the required encryption algorithm and encoding to avoid exceeding the 2 GB input limit.
Result:
- The UDF returns an encrypted binary value.
Example:
create temporary function ptyStringEnc as 'com.protegrity.hive.udf.ptyStringEnc';
DROP TABLE IF EXISTS stringenc_data;
DROP TABLE IF EXISTS stringenc_data_protect;
CREATE TABLE stringenc_data (stringdata String) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/tmp/stringdata.csv' OVERWRITE INTO TABLE stringenc_data;
CREATE TABLE stringenc_data_protect (stringdata String) stored as textfile;
INSERT OVERWRITE TABLE stringenc_data_protect SELECT base64(ptyStringEnc(stringdata,'AES128')) FROM stringenc_data;
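The example above wraps the binary output of ptyStringEnc() in base64() before writing it to a table stored as a text file. The Python sketch below illustrates one plausible reason for that step: raw binary data can contain newline or delimiter bytes that would break a delimited text file, while Base64 text cannot. The byte string used here is an arbitrary stand-in, not real Protegrity output.

```python
import base64

# Arbitrary stand-in for encrypted bytes: contains a newline (0x0a) and a comma (0x2c).
raw = bytes([0x0A, 0x2C, 0xFF, 0x00])

encoded = base64.b64encode(raw)      # text-safe representation
decoded = base64.b64decode(encoded)  # what unbase64() would hand to the decrypt UDF

print("raw contains newline or comma:", b"\n" in raw or b"," in raw)
print("encoded contains newline or comma:", b"\n" in encoded or b"," in encoded)
print("encoded length:", len(encoded), "for", len(raw), "raw bytes")
print("round trip intact:", decoded == raw)
```

Base64 output is 4 bytes for every 3 input bytes (rounded up), which is worth keeping in mind when you size the field that stores the encoded value.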
Exception:
- ptyHiveProtectorException: INPUT-ERROR: Tokenization or Format Preserving Data Elements are not supported: A data element, which is unsupported, is provided.
- java.io.IOException: Too many bytes before newline: 2147483648: The length of the input needs to be less than the maximum limit of 2 GB.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyStringEnc() | No | Yes | No | Yes | No | Yes |
Guidelines for Estimating Field Size of Data
The encryption algorithms and the field sizes in bytes required by features such as Key ID (KID), Initialization Vector (IV), and Integrity Check (CRC) are listed in the following table.
| Encryption Algorithm | KID (size in Bytes) | IV (size in Bytes) | CRC (size in Bytes) |
|---|---|---|---|
| AES | 16 | 16 | 4 |
| 3DES | 8 | 8 | 4 |
| CUSP_TRDES | 2 | N/A | 4 |
| CUSP_AES | 2 | N/A | 4 |
Note: The number of bytes considered for 1 GB and 2 GB are 1073741824 and 2147483648 respectively.
The byte sizes required by the input file, the encoding type selected, and the encryption algorithm with the features selected are listed in the following table:
| Encoding Type | AES and 3DES | CUSP_TRDES and CUSP_AES |
|---|---|---|
| No Encoding | (Input file size in Bytes) + (Bytes needed by Encryption Algorithm and Features) <= 2147483647 | (Input file size in Bytes) + (Bytes needed by Encryption Algorithm and Features) <= 2147483648 |
| Hex | (Input file size in Bytes) + (Bytes needed by Encryption Algorithm and Features) <= 1073741823 | (Input file size in Bytes) + (Bytes needed by Encryption Algorithm and Features) <= 1073741824 |
| Base64 | (Input file size in Bytes) + (Bytes needed by Encryption Algorithm and Features) <= 1610612735 | (Input file size in Bytes) + (Bytes needed by Encryption Algorithm and Features) <= 1610612736 |
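The size check described by the two tables above can be sketched in Python as follows. This is an estimate helper, not a Protegrity API. It assumes that the three limits in the table correspond to No Encoding (2 GB), Hex (1 GB), and Base64 (1.5 GB), that the lower limit in each row applies to AES and 3DES, and that only the KID, IV, and CRC overheads are counted (cipher padding is not modeled). The names `FEATURE_BYTES`, `size_limit`, and `fits_input_limit` are hypothetical.

```python
TWO_GB = 2147483648  # bytes, as stated in the note

# Per-feature overhead in bytes, copied from the first table (None = not applicable).
FEATURE_BYTES = {
    "AES": {"KID": 16, "IV": 16, "CRC": 4},
    "3DES": {"KID": 8, "IV": 8, "CRC": 4},
    "CUSP_TRDES": {"KID": 2, "IV": None, "CRC": 4},
    "CUSP_AES": {"KID": 2, "IV": None, "CRC": 4},
}

# ASSUMPTION: mapping of the table limits to encoding types.
ENCODED_LIMIT = {"none": TWO_GB, "hex": TWO_GB // 2, "base64": TWO_GB * 3 // 4}


def size_limit(algorithm: str, encoding: str) -> int:
    """Maximum of (input size + overhead) for the algorithm and encoding."""
    limit = ENCODED_LIMIT[encoding]
    # ASSUMPTION: AES and 3DES use the limit reduced by one byte.
    return limit - 1 if algorithm in ("AES", "3DES") else limit


def fits_input_limit(input_bytes: int, algorithm: str, encoding: str, features=()) -> bool:
    """Check (input size) + (bytes needed by the selected features) <= limit."""
    overhead = 0
    for feature in features:
        size = FEATURE_BYTES[algorithm][feature]
        if size is None:
            raise ValueError(f"{feature} is not applicable to {algorithm}")
        overhead += size
    return input_bytes + overhead <= size_limit(algorithm, encoding)


print(fits_input_limit(1000, "AES", "base64", ("KID", "IV", "CRC")))
print(fits_input_limit(2147483647, "AES", "none", ("CRC",)))
```

For example, a 2147483647-byte input with AES and no encoding fits on its own, but adding the 4-byte CRC pushes it over the limit.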
ptyStringDec()
The UDF decrypts the binary value.
Signature:
ptyStringDec(Binary input, String DataElement)
Parameters:
- Binary input: Specifies the protected binary value to decrypt.
- String DataElement: Specifies the name of the data element that was used to encrypt the string value and is used to decrypt the binary value.
Result:
- The UDF returns the decrypted string value.
Example:
create temporary function ptyStringEnc as 'com.protegrity.hive.udf.ptyStringEnc';
create temporary function ptyStringDec as 'com.protegrity.hive.udf.ptyStringDec';
DROP TABLE IF EXISTS stringenc_data;
DROP TABLE IF EXISTS stringenc_data_protect;
DROP TABLE IF EXISTS stringenc_data_unprotect;
CREATE TABLE stringenc_data (stringdata String) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/tmp/stringdata.csv' OVERWRITE INTO TABLE stringenc_data;
CREATE TABLE stringenc_data_protect (stringdata String) stored as textfile;
INSERT OVERWRITE TABLE stringenc_data_protect SELECT base64(ptyStringEnc(stringdata,'AES128')) FROM stringenc_data;
CREATE TABLE stringenc_data_unprotect (stringdata String) stored as textfile;
INSERT OVERWRITE TABLE stringenc_data_unprotect SELECT ptyStringDec(unbase64(stringdata),'AES128') FROM stringenc_data_protect;
Exception:
- ptyHiveProtectorException: INPUT-ERROR: First argument (Input Data to be unprotected) is not a valid Binary Datatype: The input data, which is not in binary format, is provided.
- ptyHiveProtectorException: INPUT-ERROR: Tokenization or Format Preserving Data Elements are not supported: A data element, which is unsupported, is provided.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyStringDec() | No | Yes | No | Yes | No | Yes |
ptyStringReEnc()
The UDF re-encrypts the binary format encrypted data, with a different data element.
Signature:
ptyStringReEnc(Binary input, String oldDataElement, String newDataElement)
Parameters:
- Binary input: Specifies the binary value to re-encrypt.
- String oldDataElement: Specifies the name of the data element used to encrypt the data earlier.
- String newDataElement: Specifies the name of the new data element to re-encrypt the data.
Result:
- The UDF returns the re-encrypted binary data.
Example:
create temporary function ptyStringEnc as 'com.protegrity.hive.udf.ptyStringEnc';
create temporary function ptyStringDec as 'com.protegrity.hive.udf.ptyStringDec';
create temporary function ptyStringReEnc as 'com.protegrity.hive.udf.ptyStringReEnc';
DROP TABLE IF EXISTS stringenc_data;
DROP TABLE IF EXISTS stringenc_data_protect;
DROP TABLE IF EXISTS stringenc_data_unprotect;
DROP TABLE IF EXISTS stringenc_data_reprotect;
DROP TABLE IF EXISTS stringenc_data_unprotect_after_reprotect;
CREATE TABLE stringenc_data (stringdata String) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/tmp/stringdata.csv' OVERWRITE INTO TABLE stringenc_data;
CREATE TABLE stringenc_data_protect (stringdata String) stored as textfile;
INSERT OVERWRITE TABLE stringenc_data_protect SELECT base64(ptyStringEnc(stringdata,'AES128')) FROM stringenc_data;
CREATE TABLE stringenc_data_unprotect (stringdata String) stored as textfile;
INSERT OVERWRITE TABLE stringenc_data_unprotect SELECT ptyStringDec(unbase64(stringdata),'AES128') FROM stringenc_data_protect;
CREATE TABLE stringenc_data_reprotect (stringdata String) stored as textfile;
INSERT OVERWRITE TABLE stringenc_data_reprotect SELECT base64(ptyStringReEnc(unbase64(stringdata),'AES128','AES128_KID')) FROM stringenc_data_protect;
CREATE TABLE stringenc_data_unprotect_after_reprotect (stringdata String) stored as textfile;
INSERT OVERWRITE TABLE stringenc_data_unprotect_after_reprotect SELECT ptyStringDec(unbase64(stringdata),'AES128_KID') FROM stringenc_data_reprotect;
Exception:
- ptyHiveProtectorException: INPUT-ERROR: First argument (Input Data to be reprotected) is not a valid Binary Datatype: The input data, which is not in binary format, is provided.
- java.io.IOException: Too many bytes before newline: 2147483648: The length of the input needs to be less than the maximum limit of 2 GB.
- com.protegrity.hive.udf.ptyHiveProtectorException: 26, Unsupported algorithm or unsupported action for the specific data element: The data element is not supported for this UDF.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyStringReEnc() | No | Yes | No | Yes | No | Yes |
3 - Pig UDFs
ptyGetVersion()
The function returns the current version of the protector.
Signature:
ptyGetVersion()
Parameters:
- None
Result:
- The function returns the version number in a chararray.
Example:
REGISTER </path/to/bdp/lib/>/peppig-<jar_version>.jar;
-- register the pep pig jar
DEFINE ptyGetVersion com.protegrity.pig.udf.ptyGetVersion;
-- define the UDF
employees = LOAD 'employee.csv' using PigStorage(',') AS (eid:chararray,name:chararray, ssn:chararray);
-- load employee.csv from the HDFS path
version = FOREACH employees GENERATE ptyGetVersion();
DUMP version;
ptyGetVersionExtended()
The function returns the extended version information of the protector.
Signature:
ptyGetVersionExtended()
Parameters:
- None
Result:
- The function returns a chararray in the following format:
BDP: <1>; JcoreLite: <2>; CORE: <3>;
where,
- <1> is the current version of the Protector
- <2> is the JcoreLite library version
- <3> is the Core library version
Example:
REGISTER </path/to/bdp/lib/>/peppig-<jar_version>.jar;
-- register the pep pig jar
DEFINE ptyGetVersionExtended com.protegrity.pig.udf.ptyGetVersionExtended;
-- define the UDF
employees = LOAD 'employee.csv' using PigStorage(',') AS (eid:chararray,name:chararray, ssn:chararray);
-- load employee.csv from the HDFS path
version = FOREACH employees GENERATE ptyGetVersionExtended();
DUMP version;
ptyWhoAmI()
The function returns the name of the user who is currently logged in.
Signature:
ptyWhoAmI()
Parameters:
- None
Result:
- The function returns the user name in a chararray.
Example:
REGISTER </path/to/bdp/lib/>/peppig-<jar_version>.jar;
DEFINE ptyWhoAmI com.protegrity.pig.udf.ptyWhoAmI;
employees = LOAD 'employee.csv' using PigStorage(',') AS (eid:chararray, name:chararray, ssn:chararray);
username = FOREACH employees GENERATE ptyWhoAmI();
DUMP username;
ptyProtectInt()
The function returns the protected value for integer data.
Signature:
ptyProtectInt(int data, chararray dataElement)
Parameters:
- int data: Specifies the data to protect.
- chararray dataElement: Specifies the name of the data element to use for data protection.
Result:
- The function returns the protected value for the given numeric data.
Example:
REGISTER </path/to/bdp/lib/>/peppig-<jar_version>.jar;
DEFINE ptyProtectInt com.protegrity.pig.udf.ptyProtectInt;
employees = LOAD 'employee.csv' using PigStorage(',') AS (eid:int, name:chararray, ssn:chararray);
data_p = FOREACH employees GENERATE ptyProtectInt(eid, 'token_integer');
DUMP data_p;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectInt() | Integer 4 Bytes | No | No | Yes | No | Yes |
ptyUnprotectInt()
The function returns the unprotected value for protected data in the integer format.
Signature:
ptyUnprotectInt(int data, chararray dataElement)
Parameters:
- int data: Specifies the protected data to unprotect.
- chararray dataElement: Specifies the name of the data element to unprotect the data.
Result:
- The function returns the unprotected value for the specified protected integer data.
Example:
REGISTER </path/to/bdp/lib/>/peppig-<jar_version>.jar;
DEFINE ptyProtectInt com.protegrity.pig.udf.ptyProtectInt;
DEFINE ptyUnprotectInt com.protegrity.pig.udf.ptyUnProtectInt;
employees = LOAD 'employee.csv' using PigStorage(',') AS (eid:int, name:chararray, ssn:chararray);
data_p = FOREACH employees GENERATE ptyProtectInt(eid, 'token_integer') AS eid:int;
data_u = FOREACH data_p GENERATE ptyUnprotectInt(eid, 'token_integer');
DUMP data_u;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectInt() | Integer 4 Bytes | No | No | Yes | No | Yes |
ptyProtectStr()
The function protects the string value.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
Signature:
ptyProtectStr(chararray input, chararray dataElement)
Parameters:
- chararray input: Specifies the string value to protect.
- chararray dataElement: Specifies the name of the data element to protect the string value.
Result:
- The function returns the protected string value in a chararray.
Example:
REGISTER </path/to/bdp/lib/>/peppig-<jar_version>.jar;
DEFINE ptyProtectStr com.protegrity.pig.udf.ptyProtectStr;
employees = LOAD 'employee.csv' using PigStorage(',') AS (eid:chararray, name:chararray, ssn:chararray);
data_p = FOREACH employees GENERATE ptyProtectStr(name, 'token_alphanumeric');
DUMP data_p;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectStr() | Yes | No | Yes | Yes | Yes | Yes |
ptyUnprotectStr()
The function unprotects the protected string value.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
Signature:
ptyUnprotectStr(chararray input, chararray dataElement)
Parameters:
- chararray input: Specifies the protected string value.
- chararray dataElement: Specifies the name of the data element to unprotect the string value.
Result:
- The function returns the unprotected value in a chararray.
Example:
REGISTER </path/to/bdp/lib/>/peppig-<jar_version>.jar;
DEFINE ptyProtectStr com.protegrity.pig.udf.ptyProtectStr;
DEFINE ptyUnprotectStr com.protegrity.pig.udf.ptyUnProtectStr;
employees = LOAD 'employee.csv' using PigStorage(',') AS (eid:chararray, name:chararray, ssn:chararray);
data_p = FOREACH employees GENERATE ptyProtectStr(name, 'token_alphanumeric') AS name:chararray;
DUMP data_p;
data_u = FOREACH data_p GENERATE ptyUnprotectStr(name, 'token_alphanumeric');
DUMP data_u;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectStr() | Yes | No | Yes | Yes | Yes | Yes |
4 - HBase Commands
HBase is a database that provides real-time random read and write access to tables consisting of rows and columns. HBase is designed to run on commodity servers, scales automatically as more servers are added, and is fault tolerant because data is divided across the servers in the cluster. HBase tables are partitioned into multiple regions. Each region stores a range of rows in the table. Regions contain a datastore in memory and a persistent datastore (HFile). The HBase Master assigns multiple regions to a region server. The Master manages the cluster, and the region servers store portions of the HBase tables and perform the work on the data.
Overview of the HBase Protector
The Protegrity HBase protector extends the functionality of the data storage framework. It provides transparent data protection and unprotection using coprocessors. These coprocessors provide the functionality to run code directly on the region servers. The Protegrity coprocessor for HBase runs on the region servers and protects the data stored in the servers. All clients which work with HBase are supported. The data is transparently protected or unprotected, as required, utilizing the coprocessor framework.
HBase Protector Usage
The Protegrity HBase protector utilizes the get, put, and scan commands and calls the Protegrity coprocessor for the HBase protector. The Protegrity coprocessor for the HBase protector locates the metadata associated with the requested column qualifier and the current logged in user. If the data element is associated with the column qualifier and the current logged in user, then the HBase protector processes the data in a row based on the data elements defined by the security policy deployed in the Big Data Protector.
Warning: The Protegrity HBase coprocessor only supports bytes converted from the string data type. If any other data type is directly converted to bytes and inserted in an HBase table, which is configured with the Protegrity HBase coprocessor, then data corruption might occur.
Adding Data Elements and Column Qualifier Mappings to a New Table
In an HBase table, every column family stores metadata for that family, which contains the mappings between column qualifiers and data elements. When a new HBase table is created, users need to add metadata to the column families to define the mappings between the data elements and the column qualifiers. The following command creates a new HBase table with one column family.
create 'table', { NAME => 'column_family_1', METADATA => {'DATA_ELEMENT:credit_card'=>'CC_NUMBER','DATA_ELEMENT:name'=>'TOK_CUSTOMER_NAME' } }
Parameters:
- table: Name of the table.
- column_family_1: Name of the column family.
- METADATA: Data associated with the column family.
- DATA_ELEMENT: Contains the column qualifier name. In the example, the column qualifier names credit_card and name correspond to the data elements CC_NUMBER and TOK_CUSTOMER_NAME respectively.
Adding Data Elements and Column Qualifier Mappings to an Existing Table
Users can add data elements and column qualifiers to an existing HBase table. Users need to alter the table to add metadata to the column families for defining mappings between the data element and column qualifier. The following command adds data elements and column qualifier mappings to a column in an existing HBase table.
alter 'table', { NAME => 'column_family_1', METADATA => { 'DATA_ELEMENT:credit_card'=>'CC_NUMBER', 'DATA_ELEMENT:name'=>'TOK_CUSTOMER_NAME' } }
Parameters:
- table: Name of the table.
- column_family_1: Name of the column family.
- METADATA: Data associated with the column family.
- DATA_ELEMENT: Contains the column qualifier name. In the example, the column qualifier names credit_card and name correspond to the data elements CC_NUMBER and TOK_CUSTOMER_NAME respectively.
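The create and alter commands above share the same METADATA shape, so the command text can be generated from a mapping of column qualifiers to data elements. The following Python sketch only builds the HBase shell command string; `hbase_metadata_command` is a hypothetical helper, and the output still has to be run in the HBase shell.

```python
def hbase_metadata_command(verb, table, column_family, mappings):
    """Build a create or alter command with DATA_ELEMENT mappings for one column family."""
    if verb not in ("create", "alter"):
        raise ValueError("verb must be 'create' or 'alter'")
    # One 'DATA_ELEMENT:<qualifier>'=>'<data element>' entry per mapping.
    entries = ", ".join(
        f"'DATA_ELEMENT:{qualifier}'=>'{element}'"
        for qualifier, element in mappings.items()
    )
    return (
        f"{verb} '{table}', {{ NAME => '{column_family}', "
        f"METADATA => {{ {entries} }} }}"
    )


command = hbase_metadata_command(
    "alter",
    "table",
    "column_family_1",
    {"credit_card": "CC_NUMBER", "name": "TOK_CUSTOMER_NAME"},
)
print(command)
```

Keeping the mappings in one dictionary makes it easier to review which column qualifiers are protected and with which data elements before the table is changed.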
Inserting Protected Data into a Protected Table
Users can ingest protected data into a protected table in HBase using the BYPASS_COPROCESSOR flag. If the BYPASS_COPROCESSOR flag is set while inserting data in the HBase table, then the Protegrity coprocessor for HBase is bypassed. The following command bypasses the Protegrity coprocessor for HBase and ingests protected data into an HBase table.
put 'table', 'row_2', 'column_family:credit_card', '3603144224586181', {ATTRIBUTES => {'BYPASS_COPROCESSOR'=>'1'}}
Parameters:
- table: Name of the table.
- column_family: Name of the column family.
- METADATA: Data associated with the column family.
- ATTRIBUTES: Additional parameters to consider when ingesting the protected data. In the example, the flag to bypass the Protegrity coprocessor for HBase is set.
Retrieving Protected Data from a Table
If users need to retrieve protected data from an HBase table, then they need to set the BYPASS_COPROCESSOR flag. This is necessary to retain the protected data as is, because the Protegrity coprocessor for HBase protects and unprotects the data transparently. The following command bypasses the Protegrity coprocessor for HBase and retrieves protected data from an HBase table.
scan 'table', { ATTRIBUTES => {'BYPASS_COPROCESSOR'=>'1'}}
Parameters:
- table: Name of the table.
- ATTRIBUTES: Additional parameters to consider when retrieving the protected data. In the example, the flag to bypass the Protegrity coprocessor for HBase is set.
Hadoop provides shell commands to ingest, extract, and display the data in an HBase table.
Warning: If you are using the HBase shell, it is not recommended to use Format Preserving Encryption (FPE). If you are using HBase Java API (Byte APIs), then ensure that the encoding, which is used to convert the string input data to bytes is set in the PTY_CHARSET operation attribute as shown in the following sections.
put
This command ingests the data provided by the user in protected form, using the configured data elements, into the required row and column of an HBase table. You can use this command to ingest data into all the columns for the required row of the HBase table.
For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar. For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
put '<table_name>','<row_number>', '<column_family>:<column_name>', '<data>'
If the data bytes are not in UTF-8 encoding, then ensure that you set the PTY_CHARSET attribute:
put '<table_name>','<row_number>', '<column_family>:<column_name>', '<data>', {ATTRIBUTES => {'PTY_CHARSET' => '<charset>'}}
The charset can be UTF-8, UTF-16LE, or UTF-16BE.
Put put = new Put(inputString.getBytes("<charset>"));
put.setAttribute("PTY_CHARSET", Bytes.toBytes("<charset>"));
// <charset> can be UTF-8, UTF-16LE or UTF-16BE
Parameters:
- table_name: Specifies the name of the table.
- row_number: Specifies the number of the row in the HBase table.
- column_family: Specifies the name of the column family.
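The PTY_CHARSET requirement above exists because the same string produces different bytes under different charsets. The Python sketch below illustrates this with plain string encoding; it does not call HBase, and `encode_for_put` is a hypothetical helper that pairs the encoded bytes with the attribute value that would accompany them.

```python
SUPPORTED_CHARSETS = ("UTF-8", "UTF-16LE", "UTF-16BE")  # values listed for PTY_CHARSET


def encode_for_put(text, charset="UTF-8"):
    """Encode a string and return the bytes with the matching PTY_CHARSET attribute."""
    if charset not in SUPPORTED_CHARSETS:
        raise ValueError(f"unsupported charset: {charset}")
    return text.encode(charset), {"PTY_CHARSET": charset}


for charset in SUPPORTED_CHARSETS:
    data, attributes = encode_for_put("A1", charset)
    print(charset, data.hex(), attributes)

# Decoding with the wrong charset does not return the original string.
utf16_bytes, _ = encode_for_put("A1", "UTF-16LE")
misread = utf16_bytes.decode("UTF-8")
print(repr(misread))
```

If the attribute does not match the charset that produced the bytes, the bytes are interpreted as a different string, so the protected or unprotected result will not match the original data.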
get
This command displays the protected data from the required row and column of an HBase table in the cleartext form. You can use this command to display the data contained in all the columns of the required row of the HBase table.
get '<table_name>','<row_number>', '<column_family>:<column_name>'
If the data bytes are not in UTF-8 encoding, then ensure that you set the PTY_CHARSET attribute:
get '<table_name>', '<row_number>', {COLUMN => '<column_family>:<column_name>', ATTRIBUTES => {'PTY_CHARSET' => '<charset>'}}
The charset can be UTF-8, UTF-16LE, or UTF-16BE.
Get get = new Get(Bytes.toBytes("<row_number>"));
get.setAttribute("PTY_CHARSET", Bytes.toBytes("<charset>"));
// <charset> can be UTF-8, UTF-16LE or UTF-16BE
Parameters:
- table_name: Specifies the name of the table.
- row_number: Specifies the number of the row in the HBase table.
- column_family: Specifies the name of the column family.
Ensure that the logged in user has the permissions to view the protected data in cleartext form. If the user does not have the permissions to view the protected data, then only the protected data appears.
scan
This command displays the data from the HBase table in the protected or unprotected form.
Scan scan = new Scan();
scan.setAttribute("PTY_CHARSET", Bytes.toBytes("<charset>"));
// <charset> can be UTF-8, UTF-16LE or UTF-16BE
You can use the following commands to view the data:
Protected Data:
scan '<table_name>', { ATTRIBUTES => {'BYPASS_COPROCESSOR'=>'1'}}
Unprotected Data:
scan '<table_name>'
If the data bytes are not in UTF-8 encoding, then ensure that you set the PTY_CHARSET attribute:
scan '<table_name>', {ATTRIBUTES => {'PTY_CHARSET' => '<charset>'}}
The charset can be UTF-8, UTF-16LE, or UTF-16BE.
Parameters:
- table_name: Specifies the name of the table.
- ATTRIBUTES: Specifies the additional parameters to consider when displaying the protected or unprotected data.
Ensure that the logged in user has the permissions to unprotect the protected data. If the user does not have the permissions to unprotect the protected data, then only the protected data appears.
5 - Impala UDFs
This section explains the Impala protector, the UDFs provided, and the commands for protecting and unprotecting data in an Impala table.
Overview of the Impala Protector
Impala is an MPP SQL query engine for querying the data stored in a cluster. The Protegrity Impala protector extends the functionality of the Impala query engine and provides UDFs which protect or unprotect the data as it is stored or retrieved.
Impala Protector Usage
The Protegrity Impala protector provides UDFs for protecting data using encryption or tokenization, and unprotecting data by using decryption or detokenization.
Ensure that the /user/impala path exists in HDFS with the Impala supergroup permissions. To verify the path, use the following command:
# hadoop fs -ls /user
Creating the /user/impala path in Impala with Supergroup permissions
If the /user/impala path does not exist or does not have supergroup permissions, then perform the following steps.
- To create the /user/impala directory in HDFS, run the following command:
# sudo -u hdfs hadoop fs -mkdir /user/impala
- To assign Impala supergroup permissions to the /user/impala path, run the following command:
# sudo -u hdfs hadoop fs -chown -R impala:supergroup /user/impala
Inserting Data from a File into a Table
To insert data from a file into an Impala table, ensure that the required user permissions for the directory path in HDFS are assigned for the Impala table.
Preparing the environment for the basic_sample.csv file
- To assign permissions to the path where data from the basic_sample.csv file needs to be copied, run the following command:
sudo -u hdfs hadoop fs -chown root:root /tmp/basic_sample/sample/
- To copy the basic_sample.csv file into HDFS, run the following command:
hdfs dfs -put basic_sample.csv /tmp/basic_sample/sample/
- To verify the presence of the basic_sample.csv file in the HDFS path, run the following command:
hdfs dfs -ls /tmp/basic_sample/sample/
- To assign permissions for Impala to the path where the basic_sample.csv file is located, run the following command:
sudo -u hdfs hadoop fs -chown impala:supergroup /path/
Populating the table sample_table from the basic_sample.csv file
You can use the following commands to populate the sample_table table with the data from the basic_sample.csv file:
create table sample_table(colname1 colname1_format, colname2 colname2_format, colname3 colname3_format) row format delimited fields terminated by ',';
LOAD DATA INPATH '/tmp/basic_sample/sample/basic_sample.csv' INTO TABLE sample_table;
Parameters:
- sample_table: Name of the Impala table created to load the data from the input CSV file from the required path.
- colname1, colname2, colname3: Name of the columns.
- colname1_format, colname2_format, colname3_format: The data types contained in the respective columns. The data types can only be of types STRING, INT, DOUBLE, or FLOAT.
- ATTRIBUTES: Additional parameters to consider when ingesting the data. In the example, the row format is delimited using the ',' character because the row format in the input file is comma separated. If the input file is tab separated, then the row format is delimited using '\t'.
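As an illustration of the delimiter parameter, the following shell sketch prints the two statements for a given table, input path, and delimiter. The `build_load_sql` name is hypothetical, and the STRING column types are an assumption; substitute the types that match your data.

```shell
#!/bin/sh
# Sketch only: build_load_sql is a hypothetical helper, and the STRING column
# types are an assumption (replace them with the types of your data).
build_load_sql() {
  table="$1"   # Impala table to create
  path="$2"    # HDFS path of the input file
  delim="$3"   # field delimiter, for example , or \t
  cat <<EOF
create table ${table}(colname1 string, colname2 string, colname3 string) row format delimited fields terminated by '${delim}';
LOAD DATA INPATH '${path}' INTO TABLE ${table};
EOF
}

# Usage: print the statements for a tab-separated input file.
# build_load_sql sample_table /tmp/basic_sample/sample/basic_sample.csv '\t'
```

The printed statements can be saved to a file and passed to `impala-shell` with its `-f` option, or pasted into an interactive session.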
Protecting Existing Data
To protect existing data, you must define the mappings between the columns and their respective data elements in the data security policy. The following commands ingest cleartext data from the basic_sample table to the basic_sample_protected table in protected form using Impala UDFs.
create table basic_sample_protected (colname1 colname1_format, colname2 colname2_format, colname3 colname3_format);
insert into basic_sample_protected(colname1, colname2, colname3) select pty_stringins(colname1, dataElement1), pty_stringins(colname2, dataElement2), pty_stringins(colname3, dataElement3) from basic_sample;
Parameters:
- basic_sample_protected: Table to store protected data.
- colname1, colname2, colname3: Name of the columns.
- dataElement1, dataElement2, dataElement3: The data elements corresponding to the columns.
- basic_sample: Table containing the original data in cleartext form.
Unprotecting Protected Data
To unprotect the protected data, you must specify the name of the table that contains the protected data, the table that stores the unprotected data, and the columns and their respective data elements. Ensure that the user performing the task has permissions to unprotect the data as required in the data security policy. The following commands unprotect the protected data in a table and store the data in cleartext form in a different table, if the user has the required permissions.
create table table_unprotected (colname1 colname1_format, colname2 colname2_format, colname3 colname3_format);
insert into table_unprotected (colname1, colname2, colname3) select pty_stringsel(colname1, dataElement1), pty_stringsel(colname2, dataElement2), pty_stringsel(colname3, dataElement3) from table_protected;
Parameters:
- table_unprotected: Table to store unprotected data.
- colname1, colname2, colname3: Name of the columns.
- dataElement1, dataElement2, dataElement3: The data elements corresponding to the columns.
- table_protected: Table containing protected data.
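The mapping between columns and data elements can be kept in one place and expanded into the insert statement. The following shell sketch does this for both the protect and unprotect cases; `build_protect_sql` is a hypothetical helper, and the data element names in the usage lines are placeholders that must exist in your data security policy.

```shell
#!/bin/sh
# Sketch only: build_protect_sql is a hypothetical helper. It turns
# column:dataElement pairs into an INSERT ... SELECT statement.
build_protect_sql() {
  udf="$1"; src="$2"; dst="$3"
  shift 3
  cols=""; exprs=""
  for pair in "$@"; do
    col="${pair%%:*}"   # text before the first colon
    de="${pair#*:}"     # text after the first colon
    cols="${cols:+$cols, }$col"
    exprs="${exprs:+$exprs, }$udf($col, '$de')"
  done
  printf 'insert into %s(%s) select %s from %s;\n' "$dst" "$cols" "$exprs" "$src"
}

# Usage (placeholder data element names):
# build_protect_sql pty_stringins basic_sample basic_sample_protected colname1:TOK_NAME colname2:TOK_PHONE
# build_protect_sql pty_stringsel table_protected table_unprotected colname1:TOK_NAME colname2:TOK_PHONE
```

Generating both statements from the same pairs helps keep the data element used to unprotect a column identical to the one used to protect it.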
Retrieving Data from a Table
To retrieve data from a table, you must have access to the table. The following command displays the data contained in the table.
select * from table;
Parameters:
table: Name of the table.
Impala UDFs
pty_GetVersion()
The UDF returns the PepImpala version.
Signature:
pty_getversion()
Parameters:
- None
Result:
- The UDF returns the PepImpala version.
Example:
select pty_GetVersion();
pty_GetVersionExtended()
The UDF returns the extended version information.
Signature:
pty_getversionextended();
Parameters:
- None
Result:
- The UDF returns a string in the following format:
Impala: <1>; CORE: <2>;
where,
1. Is the PepImpala version
2. Is the Core library version
Example:
select pty_getversionextended();
pty_WhoAmI()
The UDF returns the logged in user name.
Signature:
pty_WhoAmI()
Parameters:
- None
Result:
- The UDF returns the logged in user name.
Example:
select pty_WhoAmI();
pty_StringEnc()
The UDF returns the encrypted value for a column containing String format data.
Signature:
pty_StringEnc(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to encrypt in the table.
- dataElement: Specifies the name of the data element to encrypt the string value.
Result:
- The UDF returns the string value.
Example:
select pty_StringEnc(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_StringEnc() | No | | No | Yes | Yes | Yes |
pty_StringDec()
The UDF returns the decrypted value for a column containing String format data.
Signature:
pty_StringDec(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to decrypt in the table.
- dataElement: Specifies the name of the data element to decrypt the string value.
Result:
- The UDF returns the string value.
Example:
select pty_StringDec(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_StringDec() | No | | No | Yes | Yes | Yes |
pty_StringIns()
The UDF returns the tokenized value for a column containing String format data.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to the section Date and Datetime tokenization.
Signature:
pty_StringIns(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to tokenize in the table.
- dataElement: Specifies the name of the data element to protect the string value.
Result:
- The UDF returns the tokenized string value.
Example:
select pty_StringIns(column_name, 'TOK_NAME') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_StringIns() | | No | Yes | Yes | Yes | Yes |
pty_StringSel()
The UDF returns the detokenized value for a column containing String format data.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to the section Date and Datetime tokenization.
Signature:
pty_StringSel(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to detokenize in the table.
- dataElement: Specifies the name of the data element to unprotect the string value.
Result:
- The UDF returns the detokenized string value.
Example:
select pty_StringSel(column_name, 'TOK_NAME') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_StringSel() | | No | Yes | Yes | Yes | Yes |
pty_UnicodeStringIns()
The UDF returns the tokenized value for a column containing String (Unicode) format data.
Signature:
pty_UnicodeStringIns(data string, dataElement string)
Parameters:
- data: Specifies the column name of the string (Unicode) format data to tokenize in the table.
- dataElement: Specifies the name of the data element to protect the string (Unicode) value.
Warning: This UDF should be used only if you want to tokenize Unicode data in Impala, and migrate the tokenized data from Impala to a Teradata database and detokenize the data using the Protegrity Database Protector. Ensure that you use this UDF with a Unicode tokenization data element only.
Result:
- The UDF returns the protected string value.
Example:
select pty_UnicodeStringIns(column_name, 'Token_unicode') from temp_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_UnicodeStringIns() | Unicode (Legacy), Unicode (Base64) | No | No | Yes | No | Yes |
pty_UnicodeStringSel()
The UDF unprotects the existing protected String value.
Signature:
pty_UnicodeStringSel(data string, dataElement string)
Parameters:
- data: Specifies the column name of the string format data to detokenize in the table.
- dataElement: Specifies the name of the data element to unprotect the string value.
Warning: This UDF should be used only if you want to tokenize Unicode data in Teradata using the Protegrity Database Protector, and migrate the tokenized data from a Teradata database to Impala and detokenize the data using the Protegrity Big Data Protector for Impala. Ensure that you use this UDF with a Unicode tokenization data element only.
Result:
- The UDF returns the detokenized string (Unicode) value.
Example:
select pty_UnicodeStringSel(column_name, 'Token_unicode') from temp_table;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_UnicodeStringSel() | Unicode (Legacy), Unicode (Base64) | No | No | Yes | No | Yes |
pty_UnicodeStringFPEIns()
The UDF returns the encrypted value for a column containing String (Unicode) format data with Format Preserving Encryption (FPE) as the protection method.
Note: Ensure that you use this UDF with an FPE data element only.
Warning: The pty_UnicodeStringFPEIns() UDF will be deprecated in future releases. This UDF is retained in this build for backward compatibility purposes only.
Signature:
pty_UnicodeStringFPEIns(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to encrypt in the table.
- dataElement: Specifies the name of the FPE data element to protect the string value.
Result:
- The UDF returns the string value.
Example:
SELECT pty_unicodestringfpeins(column_name,'<DataElement>') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_UnicodeStringFPEIns() | No | No | FPE (All) | Yes | No | Yes |
pty_UnicodeStringFPESel()
The UDF unprotects the existing encrypted String value that was encrypted using the FPE enabled data element.
Note: Ensure that you use this UDF with an FPE data element only.
Warning: The pty_UnicodeStringFPESel() UDF will be deprecated in future releases. This UDF is retained in this build for backward compatibility purposes only.
Signature:
pty_UnicodeStringFPESel(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to decrypt in the table.
- dataElement: Is the variable specifying the unprotection method. Note: Ensure that the FPE data element used to protect and unprotect the data is the same.
Result:
- The UDF returns the decrypted string (Unicode) value.
Example:
select pty_unicodestringfpesel(NAME,'<DataElement>') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_UnicodeStringFPESel() | No | No | FPE (All) | Yes | No | Yes |
pty_IntegerEnc()
The UDF returns an encrypted value for a column containing Integer format data.
Signature:
pty_IntegerEnc(data integer, dataElement string)
Parameters:
- data: Specifies the column name of the data to encrypt in the table.
- dataElement: Specifies the name of the data element to encrypt the integer value.
Result:
- The UDF returns a string value.
Example:
select pty_IntegerEnc(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_IntegerEnc() | No | | No | Yes | No | Yes |
pty_IntegerDec()
The UDF returns the decrypted value for a column containing Integer format data.
Signature:
pty_IntegerDec(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to decrypt in the table.
- dataElement: Specifies the name of the data element to decrypt the integer value.
Result:
- The UDF returns an integer value.
Example:
select pty_IntegerDec(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_IntegerDec() | No | | No | Yes | No | Yes |
pty_IntegerIns()
The UDF returns the tokenized value for a column containing Integer format data.
Signature:
pty_IntegerIns(data integer, dataElement string)
Parameters:
- data: Specifies the column name of the data to tokenize in the table.
- dataElement: Specifies the name of the data element to protect the integer value.
Result:
- The UDF returns the tokenized integer value.
Example:
select pty_IntegerIns(column_name,'integer_de') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_IntegerIns() | Integer (4 Bytes) | No | No | Yes | No | Yes |
pty_IntegerSel()
The UDF returns the detokenized value for a column containing Integer format data.
Signature:
pty_IntegerSel(data integer, dataElement string)
Parameters:
- data: Specifies the column name of the data to detokenize in the table.
- dataElement: Specifies the name of the data element to unprotect the integer value.
Result:
- The UDF returns the detokenized integer value.
Example:
select pty_IntegerSel(column_name,'integer_de') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_IntegerSel() | Integer (4 Bytes) | No | No | Yes | No | Yes |
pty_FloatEnc()
The UDF returns the encrypted value for a column containing Float format data.
Signature:
pty_FloatEnc(data float, dataElement string)
Parameters:
- data: Specifies the column name of the data to encrypt in the table.
- dataElement: Specifies the name of the data element to encrypt the float value.
Result:
- The UDF returns a string value.
Example:
select pty_FloatEnc(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_FloatEnc() | No | | No | Yes | No | Yes |
pty_FloatDec()
The UDF returns the decrypted value for a column containing Float format data.
Signature:
pty_FloatDec(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to decrypt in the table.
- dataElement: Specifies the name of the data element to decrypt the float value.
Result:
- The UDF returns a string value.
Example:
select pty_FloatDec(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_FloatDec() | No | | No | Yes | No | Yes |
pty_FloatIns()
The UDF returns the tokenized value for a column containing Float format data.
Signature:
pty_FloatIns(data float, dataElement string)
Parameters:
- data: Specifies the column name of the data to tokenize in the table.
- dataElement: Specifies the name of the data element to protect the float value.
Result:
- The UDF returns the tokenized float value.
Example:
select pty_FloatIns(cast(12.3 as float), 'no_enc');
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element would return an error mentioning that the operation is not supported for that data type. If you want to tokenize the Float column, then load the Float column into a String column and use the pty_StringIns() UDF to tokenize the column. For more information about pty_StringIns() UDF, refer section pty_StringIns().
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_FloatIns() | No | No | No | Yes | No | Yes |
pty_FloatSel()
The UDF returns the detokenized value for a column containing Float format data.
Signature:
pty_FloatSel(data float, dataElement string)
Parameters:
- data: Specifies the column name of the data to detokenize in the table.
- dataElement: Specifies the name of the data element to unprotect the float value.
Result:
- The UDF returns the detokenized float value.
Example:
select pty_FloatSel(tokenized_value, 'no_enc');
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element would return an error mentioning that the operation is not supported for that data type.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_FloatSel() | No | No | No | Yes | No | Yes |
pty_DoubleEnc()
The UDF returns the encrypted value for a column containing Double format data.
Signature:
pty_DoubleEnc(data double, dataElement string)
Parameters:
- data: Specifies the double data column to encrypt in the table.
- dataElement: Specifies the name of the data element to encrypt the double value.
Result:
- The UDF returns a string value.
Example:
select pty_DoubleEnc(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_DoubleEnc() | No | | No | Yes | No | Yes |
pty_DoubleDec()
The UDF returns the decrypted value for a column containing Double format data.
Signature:
pty_DoubleDec(data string, dataElement string)
Parameters:
- data: Specifies the double data column to decrypt in the table.
- dataElement: Specifies the name of the data element to decrypt the double value.
Result:
- The UDF returns a double value.
Example:
select pty_DoubleDec(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_DoubleDec() | No | | No | Yes | No | Yes |
pty_DoubleIns()
The UDF returns the tokenized value for a column containing Double format data.
Signature:
pty_DoubleIns(data double, dataElement string)
Parameters:
- data: Specifies the column name of the data to tokenize in the table.
- dataElement: Specifies the name of the data element to protect the double value.
Result:
- The UDF returns the double value.
Example:
select pty_DoubleIns(cast(1.2 as double), 'no_enc');
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element would return an error mentioning that the operation is not supported for that data type. If you want to tokenize the Double column, then load the Double column into a String column and use the pty_StringIns() UDF to tokenize the column. For more information about pty_StringIns() UDF, refer pty_StringIns().
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_DoubleIns() | No | No | No | Yes | No | Yes |
pty_DoubleSel()
The UDF returns the detokenized value for a column containing Double format data.
Signature:
pty_DoubleSel(data double, dataElement string)
Parameters:
- data: Specifies the column name of the data to detokenize in the table.
- dataElement: Specifies the name of the data element to unprotect the double value.
Result:
- The UDF returns the detokenized double value.
Example:
select pty_DoubleSel(tokenized_value, 'no_enc');
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element would return an error mentioning that the operation is not supported for that data type.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_DoubleSel() | No | No | No | Yes | No | Yes |
pty_SmallIntEnc()
The UDF returns the encrypted value for a column containing SmallInt format data.
Signature:
pty_SmallIntEnc(data SmallInt, dataElement string)
Parameters:
- data: Specifies the column name of the data to encrypt in the table.
- dataElement: Specifies the name of the data element to encrypt the SmallInt value.
Result:
- The UDF returns a string value.
Example:
select pty_SmallIntEnc(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_SmallIntEnc() | No | | No | Yes | No | Yes |
pty_SmallIntDec()
The UDF returns the decrypted value for a column containing SmallInt format data.
Signature:
pty_SmallIntDec(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to decrypt in the table.
- dataElement: Specifies the name of the data element to decrypt the SmallInt value.
Result:
- The UDF returns a SmallInt value.
Example:
select pty_SmallIntDec(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_SmallIntDec() | No | | No | Yes | No | Yes |
pty_SmallIntIns()
The UDF returns the tokenized value for a column containing SmallInt format data.
Signature:
pty_SmallIntIns(data SmallInt, dataElement string)
Parameters:
- data: Specifies the column name of the data to tokenize in the table.
- dataElement: Specifies the name of the data element to protect the SmallInt value.
Result:
- The UDF returns the tokenized SmallInt value.
Example:
select pty_SmallIntIns(column_name,'integer_de') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_SmallIntIns() | Integer (2 Bytes) | No | No | Yes | No | Yes |
pty_SmallIntSel()
The UDF returns the detokenized value for a column containing SmallInt format data.
Signature:
pty_SmallIntSel(data SmallInt, dataElement string)
Parameters:
- data: Specifies the column name of the data to detokenize in the table.
- dataElement: Specifies the name of the data element to unprotect the SmallInt value.
Result:
- The UDF returns the detokenized SmallInt value.
Example:
select pty_SmallIntSel(column_name,'integer_de') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_SmallIntSel() | Integer (2 Bytes) | No | No | Yes | No | Yes |
pty_BigIntEnc()
The UDF returns the encrypted value for a column containing BigInt format data.
Signature:
pty_BigIntEnc(data BigInt, dataElement string)
Parameters:
- data: Specifies the column name of the data to encrypt in the table.
- dataElement: Specifies the name of the data element to encrypt the BigInt value.
Result:
- The UDF returns a string value.
Example:
select pty_BigIntEnc(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_BigIntEnc() | No | | No | Yes | No | Yes |
pty_BigIntDec()
The UDF returns the decrypted value for a column containing BigInt format data.
Signature:
pty_BigIntDec(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to decrypt in the table.
- dataElement: Specifies the name of the data element to decrypt the BigInt value.
Result:
- The UDF returns a BigInt value.
Example:
select pty_BigIntDec(column_name,'enc_3des') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_BigIntDec() | No | | No | Yes | No | Yes |
pty_BigIntIns()
The UDF returns the tokenized value for a column containing BigInt format data.
Signature:
pty_BigIntIns(data BigInt, dataElement string)
Parameters:
- data: Specifies the column name of the data to tokenize in the table.
- dataElement: Specifies the name of the data element to protect the BigInt value.
Result:
- The UDF returns the tokenized BigInt value.
Example:
select pty_BigIntIns(column_name,'BigInt_de') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_BigIntIns() | Integer (8 Bytes) | No | No | Yes | No | Yes |
pty_BigIntSel()
The UDF returns the detokenized value for a column containing BigInt format data.
Signature:
pty_BigIntSel(data BigInt, dataElement string)
Parameters:
- data: Specifies the column name of the data to detokenize in the table.
- dataElement: Specifies the name of the data element to unprotect the BigInt value.
Result:
- The UDF returns the detokenized BigInt value.
Example:
select pty_BigIntSel(column_name,'BigInt_de') from table_name;
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_BigIntSel() | Integer (8 Bytes) | No | No | Yes | No | Yes |
pty_DateEnc()
The UDF returns the encrypted value for a column containing Date format data.
Signature:
pty_DateEnc(data Date, dataElement string)
Parameters:
- data: Specifies the column name of the data to encrypt in the table.
- dataElement: Specifies the name of the data element to encrypt the date value.
Result:
- The UDF returns a string value.
Example:
select pty_DateEnc(column_name,'enc_3des') from table_name;
Note: For the Date UDFs:
- Impala supports the date range from 0001-01-01 to 9999-12-31.
- Protegrity supports the date range from 0600-01-01 to 3337-11-27.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_DateEnc() | No | | No | Yes | No | Yes |
pty_DateDec()
The UDF returns the decrypted value for a column containing Date format data.
Signature:
pty_DateDec(data string, dataElement string)
Parameters:
- data: Specifies the column name of the data to decrypt in the table.
- dataElement: Specifies the name of the data element to decrypt the date value.
Result:
- The UDF returns the Date value.
Example:
select pty_DateDec(column_name,'enc_3des') from table_name;
Note: For the Date UDFs:
- Impala supports the date range from 0001-01-01 to 9999-12-31.
- Protegrity supports the date range from 0600-01-01 to 3337-11-27.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_DateDec() | No | | No | Yes | No | Yes |
pty_DateIns()
The UDF returns the tokenized value for a column containing Date format data.
Signature:
pty_DateIns(data Date, dataElement string)
Parameters:
- data: Specifies the column name of the data to tokenize in the table.
- dataElement: Specifies the name of the data element to protect the date value.
Result:
- The UDF returns the tokenized Date value.
Example:
select pty_DateIns(column_name,'Date_de') from table_name;
Note: For the Date UDFs:
- Impala supports the date range from 0001-01-01 to 9999-12-31.
- Protegrity supports the date range from 0600-01-01 to 3337-11-27.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_DateIns() | Date Data Elements | No | No | Yes | No | Yes |
pty_DateSel()
The UDF returns the detokenized value for a column containing Date format data.
Signature:
pty_DateSel(data Date, dataElement string)
Parameters:
- data: Specifies the column name of the data to detokenize in the table.
- dataElement: Specifies the name of the data element to unprotect the date value.
Result:
- The UDF returns the detokenized Date value.
Example:
select pty_DateSel(column_name,'Date_de') from table_name;
Note: For the Date UDFs:
- Impala supports the date range from 0001-01-01 to 9999-12-31.
- Protegrity supports the date range from 0600-01-01 to 3337-11-27.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| pty_DateSel() | Date Data Elements | No | No | Yes | No | Yes |
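The date constraints stated in this section can be checked before data is passed to the UDFs. The following shell sketch combines the Protegrity date range noted for the Date UDFs with the non-existent range from 05-OCT-1582 to 14-OCT-1582 noted for Date and Datetime data elements. The `date_supported` name and the YYYY-MM-DD input format are assumptions made for this example; whether both constraints apply to a given call depends on the UDF and the data element used.

```shell
#!/bin/sh
# Sketch only: date_supported is a hypothetical helper. It expects a date in
# YYYY-MM-DD form and succeeds (exit status 0) when the date is inside the
# Protegrity range and outside the 1582 Gregorian cutover gap.
date_supported() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) return 1 ;;   # reject anything that is not YYYY-MM-DD
  esac
  # Zero-padded ISO dates sort correctly as plain strings.
  awk -v d="$1" 'BEGIN {
    ok  = (d >= "0600-01-01" && d <= "3337-11-27")
    gap = (d >= "1582-10-05" && d <= "1582-10-14")
    exit !(ok && !gap)
  }'
}

# Usage:
# date_supported 1582-10-10 || echo "date is not supported"
```

The check only validates the shape and range of the string. It does not verify that the day and month form a real calendar date.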
6 - Spark Java APIs
All the Spark Java APIs that are available for protection and unprotection in Big Data Protector to build secure Big Data applications are listed here.
Spark is an execution engine that carries out batch processing of jobs in-memory and handles a wider range of computational workloads. In addition to processing a batch of stored data, Spark is capable of manipulating data in real time.
Spark leverages the physical memory of the Hadoop system. It uses Resilient Distributed Datasets (RDDs) to store data in memory, which lowers latency when the data fits in the available memory. The data is saved on the hard drive only if required. Because RDDs are the basic units of abstraction and computation in Spark, you can use the Spark protection and unprotection APIs to perform transformation operations on an RDD.
If you want to use the Spark Protector API in a Spark Java job, then you must implement the function interface as per the Spark Java programming specifications. Subsequently, you can use it in the required transformation of an RDD to tokenize the data.
Overview of the Spark Protector
The Protegrity Spark protector extends the functionality of the Spark engine and provides APIs that protect or unprotect the data as it is stored or retrieved.
Spark Protector Usage
The Protegrity Spark protector provides APIs for protecting and reprotecting the data using encryption or tokenization, and unprotecting data by using decryption or detokenization. Note: Ensure that you configure the Spark protector after installing the Big Data Protector.
Spark Scala
The Protegrity Spark protector (Java) can be used with Scala to protect the data by using encryption or tokenization. You can also use it with Scala to unprotect the data using decryption or detokenization.
Sample Code Usage for Spark (Scala)
The Spark protector sample program described in this section is an example of how to use the Protegrity Spark protector APIs with Scala.
The sample program utilizes the following three Scala classes for protecting and unprotecting data:
- ProtectData.scala: This main class creates the Spark context object and calls the DataLoader class for reading cleartext data.
- UnProtectData.scala: This main class creates the Spark context object and calls the DataLoader class for reading protected data.
- DataLoader.scala: This loader class fetches the input from the input path, calls the ProtectFunction to protect the data, and stores the protected data as output in the output path. In addition, it fetches the input from the protected path, calls the UnProtectFunction to unprotect the data, and stores the cleartext content as output.
The following functions perform protection for every new line in the input or unprotection for every new line in the output.
- ProtectFunction: This class calls the Spark protector for every new line specified in the input to protect data.
- UnProtectFunction: This class calls the Spark protector for every new line specified in the input to unprotect data.
Main Job Class for Protect Operation – ProtectData.scala
ProtectData.scala
package com.protegrity.samples.spark.scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
object ProtectData {
  def main(args: Array[String]) {
    // Create a SparkContext object, which tells Spark how to access a cluster.
    val sparkContext = new SparkContext(new SparkConf())
    // Create a new object of the DataLoader class.
    val protector = new DataLoader(sparkContext)
    // Call the writeProtectedData method, which reads cleartext data from the input path (args(0))
    // and writes the data to the output path (args(1)) after the protect operation.
    protector.writeProtectedData(args(0), args(1), ",")
  }
}
Main Job Class for Unprotect Operation – UnProtectData.scala
UnProtectData.scala
package com.protegrity.samples.spark.scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
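object UnProtectData {
  def main(args: Array[String]) {
    // Create a SparkContext object, which tells Spark how to access a cluster.
    val sparkContext = new SparkContext(new SparkConf())
    // Create a new object of the DataLoader class.
    val protector = new DataLoader(sparkContext)
    // Call the unprotectData method, which reads protected data from the input path (args(0))
    // and writes the cleartext data to the output path (args(1)).
    protector.unprotectData(args(0), args(1), ",")
  }
}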
Utility to call Protect or Unprotect Function – DataLoader.scala
DataLoader.scala
package com.protegrity.samples.spark.scala
import org.apache.log4j.Logger
import org.apache.spark.SparkContext
object DataLoader {
private val logger = Logger.getLogger(classOf[DataLoader])
}
/**
* A Data loader utility for reading & writing protected and un-protected data
*/
class DataLoader(private var sparkContext: SparkContext) {
private var data_element_names: Array[String] = Array("TOK_NAME", "TOK_PHONE",
"TOK_CREDIT_CARD", "TOK_AMOUNT")
private var appid: String = sparkContext.getConf.getAppId
/**
* Writes protected data to the output path delimited by the input delimiter
*
* @param inputPath - path of the input employee info file
* @param outputPath - path where the output should be saved
* @param delim - denotes the delimiter between the fields in the file
*/
def writeProtectedData(inputPath: String, outputPath: String, delim: String) {
// read lines from the input path & create RDD
val rdd = sparkContext.textFile(inputPath)
//import ProtectFunction
import com.protegrity.samples.spark.scala.ProtectFunction._
//call ProtectFunction on rdd
rdd.ProtectFunction(delim, appid, data_element_names, outputPath)
}
/**
* Reads protected data from the input path delimited by the input delimiter
*
* @param protectedInputPath - path of the protected employee data
* @param unprotectedOutputPath - output path where unprotected data should be stored.
* @param delim - denotes the delimiter between the fields in the file
*/
def unprotectData(protectedInputPath: String, unprotectedOutputPath: String, delim: String)
{
// read lines from the protectedInputPath & create RDD
val protectedRdd = sparkContext.textFile(protectedInputPath)
//import UnProtectFunction
import com.protegrity.samples.spark.scala.UnProtectFunction._
//call UnprotectFunction on rdd
protectedRdd.UnprotectFunction(delim, appid, data_element_names, unprotectedOutputPath)
}
}
ProtectFunction.scala
package com.protegrity.samples.spark.scala
import java.util.ArrayList
import org.apache.spark.rdd.RDD
import com.protegrity.spark.Protector
import com.protegrity.spark.PtySparkProtector
object ProtectFunction {
/* Defining this class as implicit lets us add new functionality to an RDD on the fly.
Implicits are lexically scoped, that is, the functions of this class can be used only
where the class is imported. */
implicit class Protect(rdd: RDD[String]) {
def ProtectFunction(delim: String, appid: String, dataElement: Array[String],
protectoutputpath: String) =
{
val protectedRDD = rdd.map { line =>
// split the line into fields separated by the delimiter
val splits = line.split(delim)
// store first split in protectedString as we are not going to protect first split.
var protectedString = splits(0)
// Initialize input size
val input = Array.ofDim[String](splits.length)
// Initialize output size
val output = Array.ofDim[String](splits.length)
// Initialize errorList
val errorList = new ArrayList[Integer]()
// create the new object for class ptySparkProtector
var protector: Protector = new PtySparkProtector(appid)
// Iterate through the splits and call protect operation
for (i <- 1 until splits.length) {
input(i) = splits(i)
// To protect data, call the protect method with the dataElement, errorList,
// input array, and output array. The result is stored in output[]
protector.protect(dataElement(i - 1), errorList, input, output)
// Append the protected field to protectedString
protectedString += delim + output(i)
}
protectedString
}
// Save protectedRDD into output path
protectedRDD.saveAsTextFile(protectoutputpath)
}
}
}
UnprotectFunction.scala
package com.protegrity.samples.spark.scala
import java.util.ArrayList
import org.apache.spark.rdd.RDD
import com.protegrity.spark.Protector
import com.protegrity.spark.PtySparkProtector
object UnProtectFunction {
/* Defining this class as implicit lets us add new functionality to an RDD on the fly.
Implicits are lexically scoped, that is, the functions of this class can be used only where the class is imported. */
implicit class Unprotect(protectedRDD: RDD[String]) {
def UnprotectFunction(delim: String, appid: String, dataElement: Array[String], unprotectoutputpath: String) =
{
val unprotectedRDD = protectedRDD.map { line =>
// split the line into fields separated by the delimiter
val splits = line.split(delim)
// store first split in unprotectedString
var unprotectedString = splits(0)
// Initialize input size
val input = Array.ofDim[String](splits.length)
// Initialize output size
val output = Array.ofDim[String](splits.length)
// Initialize errorList
val errorList = new ArrayList[Integer]()
// create the object for class ptySparkProtector
var protector: Protector = new PtySparkProtector(appid)
// Iterate through the splits and call unprotect operation
for (i <- 1 until splits.length) {
input(i) = splits(i)
// To unprotect data, call unprotect method with parameter dataElement, errorList, input array and output array.output will be stored in output[]
protector.unprotect(dataElement(i - 1), errorList, input, output)
// Append the unprotected field to unprotectedString
unprotectedString += delim + output(i)
}
unprotectedString
}
// Save unprotectedRDD into output path
unprotectedRDD.saveAsTextFile(unprotectoutputpath)
}
}
}
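The per-line logic in ProtectFunction and UnProtectFunction (keep the first field as-is, transform every remaining field, rejoin with the delimiter) can be sketched on its own, without the Spark protector. The following is a minimal Java sketch under stated assumptions: `LineFieldMapper`, `mapFields`, and the `transform` argument are hypothetical names, and `transform` is only a stand-in for a protect or unprotect call. Unlike the sample, the sketch quotes the delimiter, because `String.split` interprets its argument as a regular expression, and it passes a limit of `-1` so that trailing empty fields are retained.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

final class LineFieldMapper {

    // Applies `transform` to every field except the first, then rejoins the fields.
    // `transform` is a hypothetical stand-in for a protect or unprotect call.
    static String mapFields(String line, String delim, UnaryOperator<String> transform) {
        // Pattern.quote treats the delimiter literally; -1 keeps trailing empty fields.
        String[] splits = line.split(Pattern.quote(delim), -1);
        StringBuilder result = new StringBuilder(splits[0]);
        for (int i = 1; i < splits.length; i++) {
            result.append(delim).append(transform.apply(splits[i]));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String line = "928724,Hultgren Caylor,9823750987";
        // Wraps each field in angle brackets instead of protecting it.
        System.out.println(mapFields(line, ",", field -> "<" + field + ">"));
    }
}
```

Quoting matters for delimiters such as `|`, which would otherwise be read as a regular-expression alternation and split the line between every character.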
Spark APIs and supported protection methods
The following table lists the Spark APIs, the input and output data types, and the supported Protection Methods:
| Operation | Input | Output | Protection Method Supported |
|---|---|---|---|
| Protect | Byte | Byte | Tokenization, Encryption, No Encryption, CUSP |
| Protect | Short | Short | Tokenization, No Encryption |
| Protect | Short | Byte | Encryption, CUSP |
| Protect | Int | Int | Tokenization, No Encryption |
| Protect | Int | Byte | Encryption, CUSP |
| Protect | Long | Long | Tokenization, No Encryption |
| Protect | Long | Byte | Encryption, CUSP |
| Protect | Float | Float | Tokenization, No Encryption |
| Protect | Float | Byte | Encryption, CUSP |
| Protect | Double | Double | Tokenization, No Encryption |
| Protect | Double | Byte | Encryption, CUSP |
| Protect | String | String | Tokenization, No Encryption |
| Protect | String | Byte | Encryption, CUSP |
| Unprotect | Byte | Byte | Tokenization, Encryption, No Encryption, CUSP |
| Unprotect | Short | Short | Tokenization, No Encryption |
| Unprotect | Byte | Short | Encryption, CUSP |
| Unprotect | Int | Int | Tokenization, No Encryption |
| Unprotect | Byte | Int | Encryption, CUSP |
| Unprotect | Long | Long | Tokenization, No Encryption |
| Unprotect | Byte | Long | Encryption, CUSP |
| Unprotect | Float | Float | Tokenization, No Encryption |
| Unprotect | Byte | Float | Encryption, CUSP |
| Unprotect | Double | Double | Tokenization, No Encryption |
| Unprotect | Byte | Double | Encryption, CUSP |
| Unprotect | String | String | Tokenization, No Encryption |
| Unprotect | Byte | String | Encryption, CUSP |
| Reprotect | Byte | Byte | Tokenization, Encryption, CUSP |
| Reprotect | Short | Short | Tokenization |
| Reprotect | Int | Int | Tokenization |
| Reprotect | Long | Long | Tokenization |
| Reprotect | Float | Float | Tokenization |
| Reprotect | Double | Double | Tokenization |
| Reprotect | String | String | Tokenization |
Note: If a protected value is generated using Byte as both Input and Output, then only Encryption/CUSP is supported.
Loading the Cleartext Data from a File to HDFS
You must first create a sample csv file that contains the cleartext data in comma-separated value format. For example, create the basic_sample_data.csv file with the contents listed below.
| ID | Name | Phone | Credit Card | Amount |
|---|---|---|---|---|
| 928724 | Hultgren Caylor | 9823750987 | 376235139103947 | 6959123 |
| 928725 | Bourne Jose | 9823350487 | 6226600538383292 | 42964354 |
| 928726 | Sorce Hatti | 9824757883 | 6226540862865375 | 7257656 |
| 928727 | Lorie Garvey | 9913730982 | 5464987835837424 | 85447788 |
| 928728 | Belva Beeson | 9948752198 | 5539455602750205 | 59040774 |
| 928729 | Hultgren Caylor | 9823750987 | 376235139103947 | 3245234 |
| 928730 | Bourne Jose | 9823350487 | 6226600538383292 | 2300567 |
| 928731 | Lorie Garvey | 9913730982 | 5464987835837424 | 85447788 |
| 928732 | Bourne Jose | 9823350487 | 6226600538383292 | 3096233 |
| 928733 | Hultgren Caylor | 9823750987 | 376235139103947 | 5167763 |
| 928734 | Lorie Garvey | 9913730982 | 5464987835837424 | 85447788 |
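One way to create this file is to write the rows out programmatically. The following is a minimal Java sketch under stated assumptions: `SampleCsvWriter` is a hypothetical helper name, the rows are copied from the table above, and the header row is left out on the assumption that the sample job treats every line as a data record.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class SampleCsvWriter {

    // Rows copied from the table above; no header row (assumption).
    static final List<String> ROWS = Arrays.asList(
        "928724,Hultgren Caylor,9823750987,376235139103947,6959123",
        "928725,Bourne Jose,9823350487,6226600538383292,42964354",
        "928726,Sorce Hatti,9824757883,6226540862865375,7257656",
        "928727,Lorie Garvey,9913730982,5464987835837424,85447788",
        "928728,Belva Beeson,9948752198,5539455602750205,59040774",
        "928729,Hultgren Caylor,9823750987,376235139103947,3245234",
        "928730,Bourne Jose,9823350487,6226600538383292,2300567",
        "928731,Lorie Garvey,9913730982,5464987835837424,85447788",
        "928732,Bourne Jose,9823350487,6226600538383292,3096233",
        "928733,Hultgren Caylor,9823750987,376235139103947,5167763",
        "928734,Lorie Garvey,9913730982,5464987835837424,85447788");

    // Writes one row per line, encoded as UTF-8, and returns the target path.
    static Path write(Path target) throws IOException {
        return Files.write(target, ROWS, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The output path defaults to the file name used in this section.
        Path target = Paths.get(args.length > 0 ? args[0] : "basic_sample_data.csv");
        System.out.println("Wrote " + write(target).toAbsolutePath());
    }
}
```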
To load the cleartext data from the basic_sample_data.csv file to HDFS, run the following command:
hadoop fs -put <Local_Filesystem_Path>/basic_sample_data.csv <Path_of_Cleartext_data_file>
where,
- basic_sample_data.csv: Specifies the name of the file containing cleartext data.
- <Local_Filesystem_Path>: Specifies the directory path on the local machine where the basic_sample_data.csv file is saved.
- <Path_of_Cleartext_data_file>: Specifies the HDFS directory path for the file with the cleartext data.
Note: Ensure that the user who is running the command has read and write access to this location.
Protecting the Existing Data
To protect cleartext data, you must specify the file that contains the cleartext data and the location where the protected data is to be stored. The following command reads the cleartext data from the basic_sample_data.csv file and stores it in the basic_sample_protected directory in protected form using the Spark APIs.
./spark-submit --master yarn --class com.protegrity.spark.ProtectData <PROTEGRITY_DIR>/samples/spark/lib/spark_protector_demo.jar
<Path_of_Cleartext_data_file>/basic_sample_data.csv
<Path_of_Protected_data_file>/basic_sample_protected
Note: Ensure that the user performing the task has the permissions to protect the data, as required, in the data security policy.
where,
- com.protegrity.spark.ProtectData: Specifies the Spark protector class for protecting the data.
- spark_protector_demo.jar: Specifies the sample .jar file utilizing the Spark protector API to protect the data in the .csv file. You must create this sample .jar file by compiling the Scala class files.
- <Path_of_Cleartext_data_file>: Specifies the HDFS directory path for the file with cleartext data.
- <Path_of_Protected_data_file>: Specifies the HDFS directory path for the file with protected data.
- basic_sample_data.csv: Specifies the name of the file from which the cleartext data is read.
Unprotecting the Protected Data
To unprotect the protected data, you must specify the location that contains the protected data and the location where the unprotected data is to be stored. To retrieve the protected data from the basic_sample_protected directory and save it in the basic_sample_unprotected_data directory in unprotected form, use the following command.
./spark-submit --master yarn --class com.protegrity.spark.UnProtectData <PROTEGRITY_DIR>/samples/spark/lib/spark_protector_demo.jar
<Path_of_Protected_data_file>/basic_sample_protected <Path_of_Unprotected_data_file>/basic_sample_unprotected_data
Note: Ensure that the user performing the task has the permissions to unprotect the data, as required, in the data security policy.
where,
- com.protegrity.spark.UnProtectData: Specifies the Spark protector class for unprotecting the data.
- spark_protector_demo.jar: Specifies the sample .jar file utilizing the Spark protector API to unprotect the data in the .csv file. You must create the sample .jar file by compiling the Scala class files.
- <Path_of_Protected_data_file>/basic_sample_protected: Specifies the HDFS directory path for the file with protected data.
- <Path_of_Unprotected_data_file>/basic_sample_unprotected_data: Specifies the HDFS directory path for the file to store the unprotected data.
Retrieving the Unprotected Data from a File
To retrieve data from a file containing protected data, you must have access to the file. To view the unprotected data contained in the file, use the following command.
hadoop fs -cat <Path_of_Unprotected_data_file>/basic_sample_unprotected_data/part*
where,
<Path_of_Unprotected_data_file>/basic_sample_unprotected_data: Specifies the HDFS directory path for the file that contains the unprotected data.
getVersion()
The function returns the current version of the protector.
Signature:
public String getVersion()
Parameters:
- None
Result:
- The function returns the current version of the protector.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector(applicationId);
String version = protector.getVersion();
Exception:
- The function throws the PtySparkProtectorException if it is unable to return the current version of the Spark protector.
getVersionExtended()
The function returns the extended version information of the protector.
Signature:
public String getVersionExtended()
Parameters:
- None
Result:
- The function returns a String in the following format:
"BDP: <1>; JcoreLite: <2>; CORE: <3>;"
where,
- <1>: Is the current version of the Protector
- <2>: Is the JcoreLite library version
- <3>: Is the Core library version
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector(applicationId);
String version = protector.getVersionExtended();
Exception:
- The function throws the PtySparkProtectorException if it is unable to return the current version of the Spark protector.
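The string returned by getVersionExtended() can be split into its components when only one of the versions is needed. The following is a minimal Java sketch, assuming the returned string follows the `name: value;` layout described in the Result section; `VersionInfoParser` is a hypothetical helper name and the version numbers in `main` are placeholders, not real product versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class VersionInfoParser {

    // Splits "BDP: <1>; JcoreLite: <2>; CORE: <3>;" into component name -> version pairs.
    static Map<String, String> parse(String versionExtended) {
        Map<String, String> components = new LinkedHashMap<>();
        for (String part : versionExtended.split(";")) {
            int colon = part.indexOf(':');
            if (colon < 0) {
                continue; // skip empty or malformed segments
            }
            components.put(part.substring(0, colon).trim(), part.substring(colon + 1).trim());
        }
        return components;
    }

    public static void main(String[] args) {
        // Placeholder version numbers, used only to exercise the parser.
        String sample = "BDP: 1.0.0; JcoreLite: 2.0.0; CORE: 3.0.0;";
        System.out.println(parse(sample));
    }
}
```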
checkAccess()
The function checks the access permissions of the user for the specified data element(s).
Signature:
public boolean checkAccess(String dataElement, Permission permission, String... newDataElement)
Parameters:
- dataElement: Specifies the name of the data element. When checking for reprotect access, this is the old data element.
- permission: Specifies the type of access of the user for the data element(s).
- newDataElement: Specifies the name of the new data element when checking for reprotect access.
Result:
- The function returns the following values:
  - true: If the user has access to the data element(s).
  - false: If the user does not have access to the data element(s).
Example:
import com.protegrity.bdp.protector.BDPProtector.Permission;
String dataElement = "dataelement";
Protector protector = new PtySparkProtector("protectAppId");
boolean accessProtectType = protector.checkAccess(dataElement, Permission.PROTECT);
boolean accessReprotectType = protector.checkAccess(dataElement, Permission.REPROTECT, dataElement);
boolean accessUnprotectType = protector.checkAccess(dataElement, Permission.UNPROTECT);
Exception:
- The function throws the PtySparkProtectorException if it is unable to verify the access of the user for the data element(s).
hmac()
Warning: The function is marked for deprecation and will be removed in a future release.
Warning: It is recommended to use the HMAC data element with the protect() Byte API for hashing byte array data, instead of using the hmac() API.
The function performs hashing of the data using the HMAC operation on a single data item with a data element, which is associated with HMAC. It returns the hmac value of the data with the data element.
Signature:
public byte[] hmac(String dataElement, byte[] input)
Parameters:
- dataElement: Specifies the name of the data element for HMAC.
- input: Specifies the byte array of data for HMAC.
Result:
- The function returns the byte array of HMAC data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector(applicationId);
byte[] output = protector.hmac("HMAC-SHA1", "test1".getBytes());
Exception:
- The function throws the PtySparkProtectorException if it is unable to protect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring | HMAC |
|---|---|---|---|---|---|---|---|
| hmac() | No | No | No | Yes | No | Yes | Yes |
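For readers unfamiliar with the HMAC operation itself, the following minimal Java sketch computes an HMAC-SHA1 value with the standard `javax.crypto.Mac` class. It is an illustration of keyed hashing in general, not of how the protector computes its result: the protector takes its key material from the data element, whereas the key here (`demo-key`) and the `HmacSketch` class are hypothetical and for demonstration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class HmacSketch {

    // Computes HMAC-SHA1 over `data` with the given key using the JDK provider.
    static byte[] hmacSha1(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is unavailable or the key is invalid", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical demo key; real keys are managed outside the application.
        byte[] key = "demo-key".getBytes(StandardCharsets.UTF_8);
        byte[] data = "test1".getBytes(StandardCharsets.UTF_8);
        byte[] digest = hmacSha1(key, data);
        // An HMAC-SHA1 value is always 20 bytes long, regardless of input size.
        System.out.println("HMAC length in bytes: " + digest.length);
    }
}
```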
protect() - Byte array data
The function protects the data provided as an array of byte arrays. The type of protection applied is defined by the data element.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, byte[][] input, byte[][] output, String... charset)
Parameters:
- dataElement: Specifies the name of the data element used for protection.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies an array of byte arrays that contains the data to protect.
- output: Specifies an array of byte arrays that contains the protected data.
- charset: Specifies the charset of the input data. The applicable charsets are UTF-8 (default), UTF-16LE, and UTF-16BE.
Note: The Protegrity Spark protector only supports bytes converted from the string data type. If any other data type is directly converted to bytes and passed as input to the API that supports byte as input and provides byte as output, then data corruption might occur.
Warning: If you are using the Protect API, which accepts byte as input and provides byte as output, then ensure that when unprotecting the data, the Unprotect API, with byte as input and byte as output is utilized. In addition, ensure that the byte data being provided as input to the Protect API has been converted from a string data type only.
Result:
- The output variable in the method signature contains the protected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "Binary";
byte[][] input = new byte[][]{"test1".getBytes(), "test2".getBytes()};
byte[][] output = new byte[input.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output, "UTF-8");
Exception:
- The function throws the PtySparkProtectorException if it is unable to protect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring | HMAC |
|---|---|---|---|---|---|---|---|
| protect() - Byte array data | | | FPE (All) | Yes | Yes | Yes | Yes |
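Because the byte API expects bytes that were converted from strings, and the charset argument must name the charset used for that conversion, it can help to see how the same strings encode under different charsets. The following minimal Java sketch uses only the standard library and does not call the protector; `ByteInputSketch`, `encodeAll`, and `decodeAll` are hypothetical helper names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteInputSketch {

    // Encodes each string with the given charset to build a byte[][] input.
    static byte[][] encodeAll(String[] values, Charset charset) {
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(charset);
        }
        return encoded;
    }

    // Decodes each byte array back to a string with the given charset.
    static String[] decodeAll(byte[][] values, Charset charset) {
        String[] decoded = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            decoded[i] = new String(values[i], charset);
        }
        return decoded;
    }

    public static void main(String[] args) {
        String[] values = {"test1", "test2"};
        byte[][] utf8 = encodeAll(values, StandardCharsets.UTF_8);
        byte[][] utf16le = encodeAll(values, StandardCharsets.UTF_16LE);

        // The same text yields different byte sequences per charset.
        System.out.println("UTF-8 length: " + utf8[0].length);
        System.out.println("UTF-16LE length: " + utf16le[0].length);

        // Decoding with the matching charset restores the original text.
        System.out.println(Arrays.toString(decodeAll(utf16le, StandardCharsets.UTF_16LE)));

        // The charset name to pass alongside UTF-16LE encoded input.
        System.out.println(StandardCharsets.UTF_16LE.name());
    }
}
```

Decoding the UTF-16LE bytes as UTF-8 would not give back the original strings, which is why the charset used to encode the input and the charset argument need to agree.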
protect() - Short array data
The function protects the short format data provided as a short array. The type of protection applied is defined by dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, short[] input, short[] output)
Parameters:
- dataElement: Specifies the name of the data element used for protection.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies the short array type that contains the data to protect.
- output: Specifies the short array type that contains the protected data.
Result:
- The output variable in the method signature contains the protected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "short";
short[] input = new short[] {1234, 4545};
short[] output = new short[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to protect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Short array data | Integer (2 Bytes) | No | No | Yes | No | Yes |
protect() - Short array data for encryption
The function encrypts the short format data provided as a short array. The type of encryption applied is defined by dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, short[] input, byte[][] output)
Parameters:
- dataElement: Specifies the name of the data element used for encryption.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies a short array type that contains the data to be encrypted.
- output: Specifies an array of byte arrays that contains the encrypted data.
Result:
- The output variable in the method signature contains the encrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement= "AES-256";
short[] input = new short[] {1234, 4545};
byte[][] output = new byte[input.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to encrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Short array data for encryption | No | | No | Yes | No | Yes |
protect() - Int array
The function protects the data provided as int array. The type of protection applied is defined by the dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, int[] input, int[] output)
Parameters:
- dataElement: Specifies the name of the data element to protect the data.
- errorIndex: Is the list of the Error Index.
- input: Is an int array of data to be protected.
- output: Is an int array containing the protected data.
Result:
- The output variable in the method signature contains the protected int data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "int";
int[] input = new int[]{1234, 4545};
int[] output = new int[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Int array | Integer (4 Bytes) | No | No | Yes | No | Yes |
protect() - Int array data for encryption
The function encrypts the data provided as int array. The type of encryption applied is defined by the dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, int[] input, byte[][] output)
Parameters:
- dataElement: Specifies the name of the data element to encrypt the data.
- errorIndex: Is the list of the Error Index.
- input: Is an int array of data to be encrypted.
- output: Is an array of byte arrays containing the encrypted data.
Result:
- The output variable in the method signature contains the encrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
int[] input = new int[]{1234, 4545};
byte[][] output = new byte[input.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to encrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Int array data for encryption | No | | No | Yes | No | Yes |
protect() - Long array data
The function protects the data provided as a long array. The type of protection applied is defined by the dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, long[] input, long[] output)
Parameters:
- dataElement: Specifies the name of the data element to protect the data.
- errorIndex: Is the list of the error index.
- input: Is the long array of data to be protected.
- output: Is the long array containing the protected data.
Result:
- The output variable in the method signature contains the protected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "long";
long[] input = new long[] {1234, 4545};
long[] output = new long[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to protect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Long array data | Integer (8 Bytes) | No | No | Yes | No | Yes |
protect() - Long array data for encryption
The function encrypts the data provided as a long array. The type of encryption applied is defined by the dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, long[] input, byte[][] output)
Parameters:
- dataElement: Specifies the name of the data element to encrypt the data.
- errorIndex: Is the list of the error index.
- input: Is the long array of data to be encrypted.
- output: Is an array of byte arrays containing the encrypted data.
Result:
- The output variable in the method signature contains the encrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
long[] input = new long[] {1234, 4545};
byte[][] output = new byte[input.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to encrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Long array data for encryption | No | | No | Yes | No | Yes |
protect() - Float array data
The function protects the data provided as a float array. The type of protection applied is defined by the dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, float[] input, float[] output)
Parameters:
- dataElement: Specifies the name of the data element to protect the data.
- errorIndex: Is the list of the Error Index.
- input: Specifies the float array of data to be protected.
- output: Specifies the float array containing the protected data.
Result:
- The output variable in the method signature contains the protected float data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "float";
float[] input = new float[] {123.4f, 454.5f};
float[] output = new float[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to protect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Float array data | No | No | No | Yes | No | Yes |
protect() - Float array data for encryption
The function encrypts the data provided as a float array. The type of protection applied is defined by the dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, float[] input, byte[][] output)
Parameters:
- dataElement: Specifies the name of the data element to encrypt the data.
- errorIndex: Is the list of the Error Index.
- input: Specifies the float array of data to be encrypted.
- output: Specifies the array of byte arrays containing the encrypted data.
Result:
- The output variable in the method signature contains the encrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
float[] input = new float[] {123.4f, 454.5f};
byte[][] output = new byte[input.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to encrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Float array data for encryption | No | | No | Yes | No | Yes |
protect() - Double array data
The function protects the data provided as a double array. The type of protection applied is defined by the dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, double[] input, double[] output)
Parameters:
- dataElement: Specifies the name of the data element to protect the data.
- errorIndex: Is the list of the error index.
- input: Is the double array of data to be protected.
- output: Is the double array containing the protected data.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause corruption of data.
Result:
- The output variable in the method signature contains the protected double data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "double";
double[] input = new double[] {123.4, 454.5};
double[] output = new double[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to protect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Double array data | No | No | No | Yes | No | Yes |
protect() - Double array data for encryption
The function encrypts the data provided as a double array. The type of protection applied is defined by the dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, double[] input, byte[][] output)
Parameters:
- dataElement: Specifies the name of the data element to encrypt the data.
- errorIndex: Is the list of the Error Index.
- input: Specifies the double array of data to be encrypted.
- output: Specifies an array of byte arrays containing the encrypted data.
Result:
- The output variable in the method signature contains the encrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
double[] input = new double[] {123.4, 454.5};
byte[][] output = new byte[input.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to encrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - Double array data for encryption | No | | No | Yes | No | Yes |
protect() - String array data
The function protects the data provided as a string array. The type of protection applied is defined by the dataElement.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, String[] input, String[] output)
Parameters:
- dataElement: Specifies the name of the data element to protect the data.
- errorIndex: Is the list of the error index.
- input: Is the String array of data to be protected.
- output: Is the String array containing the protected data.
Result:
- The output variable in the method signature contains the protected String data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AlphaNum";
String[] input = new String[] {"test1", "test2"};
String[] output = new String[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to protect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring | HMAC |
|---|---|---|---|---|---|---|---|
| protect() - String array data | | No | FPE (All) | Yes | Yes | Yes | Yes |
protect() - String array data for encryption
The function encrypts the data provided as a String array. The type of protection applied is defined by the dataElement.
Signature:
public void protect(String dataElement, List<Integer> errorIndex, String[] input, byte[][] output)
Parameters:
- dataElement: Specifies the name of the data element to encrypt the data.
- errorIndex: Is the list of the Error Index.
- input: Specifies the String array of data to be encrypted.
- output: Specifies the array of byte array containing the encrypted data.
Result:
- The output variable in the method signature contains the encrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
String[] input = new String[] {"test1", "test2"};
byte[][] output = new byte[input.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.protect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to encrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| protect() - String array data for encryption | No | | No | Yes | No | Yes |
unprotect() - Byte array data
The function unprotects the data provided as an array of a byte array. The type of unprotection applied is defined by the dataElement.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, byte[][] inputDataItems, byte[][] output, String... charset)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies an array of the byte array type that contains the data to unprotect.
- output: Specifies an array of the byte array type that contains the unprotected data.
- charset: Specifies the charset of the input data. The applicable charsets are UTF-8 (default), UTF-16LE, and UTF-16BE.
Warning: The Protegrity Spark protector only supports bytes converted from the string data type. If any other data type is directly converted to bytes and passed as input to the API that supports byte as input and provides byte as output, then data corruption might occur.
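The warning above implies that the bytes handed to the byte-in, byte-out APIs should come from String values, and that the same charset should be used to decode the result. The following is a minimal sketch of that encode and decode step using only the standard library. The class name CharsetBytes and its methods are hypothetical helpers and do not call the protector.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetBytes {

    // Encodes every String with one explicit charset, producing the
    // byte[][] shape used by the byte array overloads.
    static byte[][] encodeAll(String[] values, Charset charset) {
        byte[][] out = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i].getBytes(charset);
        }
        return out;
    }

    // Decodes every byte array with the same charset that was used to encode it.
    static String[] decodeAll(byte[][] values, Charset charset) {
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = new String(values[i], charset);
        }
        return out;
    }

    public static void main(String[] args) {
        Charset charset = StandardCharsets.UTF_16LE;
        byte[][] bytes = encodeAll(new String[] {"test1", "test2"}, charset);
        // charset.name() gives the value to pass as the charset argument.
        System.out.println(charset.name());
        System.out.println(Arrays.toString(decodeAll(bytes, charset)));
    }
}
```

Calling getBytes() without a charset uses the platform default, which can differ between the driver and the executors, so passing the charset explicitly on both sides is the safer choice in this sketch.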
Result:
- The output variable in the method signature contains the unprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "Binary";
byte[][] input = new byte[][] {"test1".getBytes(), "test2".getBytes()};
byte[][] output = new byte[input.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output, "UTF-8");
Exception:
- The function throws the PtySparkProtectorException if it is unable to unprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Byte array data | | | FPE (All) | Yes | Yes | Yes |
unprotect() - Short array data
The function unprotects the short format data provided as a short array. The type of protection applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, short[] input, short[] output)
Parameters:
- dataElement: Specifies the name of the data element used to unprotect the data.
- errorIndex: Is the list of the Error Index.
- input: Specifies the short array type that contains the data to unprotect.
- output: Specifies the short array type that contains the unprotected data.
Result:
- The output variable in the method signature contains the unprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "short";
short[] input = new short[]{1234, 4545};
short[] output = new short[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to unprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Short array data | Integer (2 Bytes) | No | No | Yes | No | Yes |
unprotect() - Short array data for decryption
The function decrypts the array of byte array to get short array. The type of encryption applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, byte[][] input, short[] output)
Parameters:
- dataElement: Specifies the name of the data element used to decrypt the data.
- errorIndex: Is the list of the Error Index.
- input: Specifies an array of the byte array type that contains the data to be decrypted.
- output: Specifies the short array that contains the decrypted data.
Result:
- The output variable in the method signature contains the decrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
// The input is the encrypted short array produced by the following API:
// public void protect(String dataElement, List<Integer> errorIndex, short[] input,
//     byte[][] output) throws PtySparkProtectorException;
byte[][] input = { <encrypted short array> };
short[] output = new short[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to decrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Short array data for decryption | No | | No | Yes | No | Yes |
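The decryption examples take as input the byte[][] that an earlier protect() call filled. The following is a minimal sketch of that data flow only. The methods toyProtect and toyUnprotect are hypothetical stand-ins that merely pack each short into two bytes. They perform no encryption, do not take a data element or error index list, and are not the Protegrity implementation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ShortRoundTripSketch {

    // Hypothetical stand-in for the short[] -> byte[][] protect() overload.
    // It only serializes each value into two bytes; NO encryption happens here.
    static void toyProtect(short[] input, byte[][] output) {
        for (int i = 0; i < input.length; i++) {
            output[i] = ByteBuffer.allocate(Short.BYTES).putShort(input[i]).array();
        }
    }

    // Hypothetical stand-in for the byte[][] -> short[] unprotect() overload.
    static void toyUnprotect(byte[][] input, short[] output) {
        for (int i = 0; i < input.length; i++) {
            output[i] = ByteBuffer.wrap(input[i]).getShort();
        }
    }

    // The byte[][] filled by the first call becomes the input of the second call.
    static short[] roundTrip(short[] clear) {
        byte[][] protectedData = new byte[clear.length][];
        toyProtect(clear, protectedData);
        short[] restored = new short[protectedData.length];
        toyUnprotect(protectedData, restored);
        return restored;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(new short[] {1234, 4545})));
    }
}
```

In both directions the caller allocates the output array with the same length as the input, which mirrors the allocation pattern in the examples of this section.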
unprotect() - Int array data
The function unprotects the data provided as int array. The type of unprotection applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, int[] input, int[] output)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- errorIndex: Is the list of the Error Index.
- input: Is an int array of data to be unprotected.
- output: Is an int array containing the unprotected data.
Result:
- The output variable in the method signature contains the unprotected int data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "int";
int[] input = new int[]{1234, 4545};
int[] output = new int[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Int array | Integer (4 Bytes) | No | No | Yes | No | Yes |
unprotect() - Int array data for decryption
The function decrypts an array of byte array to get an int array. The type of decryption applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, byte[][] input, int[] output)
Parameters:
- dataElement: Specifies the name of the data element to decrypt the data.
- errorIndex: Is the list of the Error Index.
- input: Is an array of a byte array containing the encrypted data.
- output: Is an int array containing the decrypted data.
Result:
- The output variable in the method signature contains the decrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
// The input is the encrypted int array produced by the following API:
// public void protect(String dataElement, List<Integer> errorIndex, int[] input,
//     byte[][] output) throws PtySparkProtectorException;
byte[][] input = {<encrypted int array>};
int[] output = new int[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to decrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Int array data for decryption | No | | No | Yes | No | Yes |
unprotect() - Long array data
The function unprotects the data provided as long array. The type of unprotection applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, long[] input, long[] output)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- errorIndex: Is the list of the error index.
- input: Is the long array of data to be unprotected.
- output: Is the long array containing the unprotected data.
Result:
- The output variable in the method signature contains the unprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "long";
long[] input = new long[] {1234, 4545};
long[] output = new long[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to unprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Long array data | Integer (8 Bytes) | No | No | Yes | No | Yes |
unprotect() - Long array data for decryption
The function decrypts an array of byte array to get a long array. The type of decryption applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, byte[][] input, long[] output)
Parameters:
- dataElement: Specifies the name of the data element to decrypt the data.
- errorIndex: Is the list of the error index.
- input: Is an array of byte array of data to be decrypted.
- output: Is a long array containing the decrypted data.
Result:
- The output variable in the method signature contains the decrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
// The input is the encrypted long array produced by the following API:
// public void protect(String dataElement, List<Integer> errorIndex, long[] input,
//     byte[][] output) throws PtySparkProtectorException;
byte[][] input = { <encrypted long array> };
long[] output = new long[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to decrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Long array data for decryption | No | | No | Yes | No | Yes |
unprotect() - Float array data
The function unprotects the data provided as a float array. The type of unprotection applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, float[] input, float[] output)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- errorIndex: Is the list of the Error Index.
- input: Specifies the float array of data to be unprotected.
- output: Specifies the float array containing the unprotected data.
Result:
- The output variable in the method signature contains the unprotected float data.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "float";
float[] input = new float[] {123.4f, 454.5f};
float[] output = new float[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to unprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Float array data | No | No | No | Yes | No | Yes |
unprotect() - Float array data for decryption
The function decrypts an array of byte array to get a float array. The type of decryption applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, byte[][] input, float[] output)
Parameters:
- dataElement: Specifies the name of the data element to decrypt the data.
- errorIndex: Is the list of the Error Index.
- input: Is an array of a byte array containing the encrypted data.
- output: Specifies the float array containing the decrypted data.
Warning: Ensure that you use the data element with either the No Encryption method or Encryption data element only. Using any other data element might cause data corruption.
Result:
- The output variable in the method signature contains the decrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
// The input is the encrypted float array produced by the following API:
// public void protect(String dataElement, List<Integer> errorIndex, float[] input,
//     byte[][] output) throws PtySparkProtectorException;
byte[][] input = { <encrypted float array> };
float[] output = new float[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to decrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Float array data for decryption | No | | No | Yes | No | Yes |
unprotect() - Double array data
The function unprotects the data provided as a double array. The type of unprotection applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, double[] input, double[] output)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- errorIndex: Is the list of the error index.
- input: Is the double array of data to be unprotected.
- output: Is the double array containing the unprotected data.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause corruption of data.
Result:
- The output variable in the method signature contains the unprotected double data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "double";
double[] input = new double[] {123.4, 454.5};
double[] output = new double[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to unprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Double array data | No | No | No | Yes | No | Yes |
unprotect() - Double array data for decryption
The function decrypts an array of byte array to get a double array. The type of decryption applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, byte[][] input, double[] output)
Parameters:
- dataElement: Specifies the name of the data element to decrypt the data.
- errorIndex: Is the list of the Error Index.
- input: Specifies an array of a byte array containing the encrypted data.
- output: Specifies the double array containing the decrypted data.
Warning: Ensure that you use the data element with either the No Encryption method or Encryption data element only. Using any other data element might cause data corruption.
Result:
- The output variable in the method signature contains the decrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
// The input is the encrypted double array produced by the following API:
// public void protect(String dataElement, List<Integer> errorIndex, double[] input,
//     byte[][] output) throws PtySparkProtectorException;
byte[][] input = { <encrypted double array> };
double[] output = new double[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to decrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - Double array data for decryption | No | | No | Yes | No | Yes |
unprotect() - String array data
The function unprotects the data provided as a String array. The type of protection applied is defined by the dataElement.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, String[] input, String[] output)
Parameters:
- dataElement: Specifies the name of the data element to unprotect the data.
- errorIndex: Is the list of the error index.
- input: Is the String array of data to be unprotected.
- output: Is the String array containing the unprotected data.
Result:
- The output variable in the method signature contains the unprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AlphaNum";
String[] input = new String[] {"test1", "test2"};
String[] output = new String[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to unprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - String array data | | No | FPE (All) | Yes | Yes | Yes |
unprotect() - String array data for decryption
The function decrypts an array of byte array to get a String array. The type of protection applied is defined by the dataElement.
Signature:
public void unprotect(String dataElement, List<Integer> errorIndex, byte[][] input, String[] output)
Parameters:
- dataElement: Specifies the name of the data element to decrypt the data.
- errorIndex: Is the list of the Error Index.
- input: Specifies the array of byte array containing the encrypted data.
- output: Specifies the String array containing the decrypted data.
Result:
- The output variable in the method signature contains the decrypted data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String dataElement = "AES-256";
// The input is the encrypted String array produced by the following API:
// public void protect(String dataElement, List<Integer> errorIndex, String[] input,
//     byte[][] output) throws PtySparkProtectorException;
byte[][] input = { <encrypted string array> };
String[] output = new String[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.unprotect(dataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it fails to decrypt the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| unprotect() - String array data for decryption | No | | No | Yes | No | Yes |
reprotect() - Byte array data
The function reprotects the array of byte array data, protected earlier, with a different data element.
Signature:
public void reprotect(String oldDataElement, String newDataElement, List<Integer> errorIndex, byte[][] input, byte[][] output, String... charset)
Parameters:
- oldDataElement: Specifies the name of the data element with which data was protected earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- errorIndex: Specifies the list of the Error Index.
- input: Is an array of a byte array that contains the data to be reprotected.
- output: Is an array of a byte array containing the reprotected data.
- charset: Specifies the charset of the input data. The applicable charsets are UTF-8 (default), UTF-16LE, and UTF-16BE.
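Because charset is an optional trailing argument with a documented default, a caller may want to normalize it before building the input bytes. The following is a minimal caller-side sketch. The class name CharsetArgument and the method resolve are hypothetical, and how the protector itself validates the argument is not described here, so the sketch makes no claim about that behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CharsetArgument {

    // The three charsets listed as applicable in the parameter description.
    private static final List<String> SUPPORTED =
            Arrays.asList("UTF-8", "UTF-16LE", "UTF-16BE");

    // Hypothetical helper: returns the charset name to use, falling back to
    // the documented default (UTF-8) when no argument is supplied.
    static String resolve(String... charset) {
        if (charset == null || charset.length == 0) {
            return "UTF-8";
        }
        String raw = charset[0];
        String name = raw == null ? null : raw.trim().toUpperCase(Locale.ROOT);
        if (name == null || !SUPPORTED.contains(name)) {
            throw new IllegalArgumentException("Unsupported charset: " + raw);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(resolve());
        System.out.println(resolve("utf-16le"));
    }
}
```

The resolved name can then be used both for String.getBytes(name) when preparing the input and as the charset argument of the call, so that the two always agree.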
Result:
- The output variable in the method signature contains the reprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String oldDataElement = "Binary";
String newDataElement = "Binary_1";
byte[][] input = new byte[][] {"test1".getBytes(), "test2".getBytes()};
byte[][] output = new byte[input.length][];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.reprotect(oldDataElement, newDataElement, errorIndexList, input, output, "UTF-8");
Exception:
- The function throws the PtySparkProtectorException if it fails to reprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - Byte array data | | | FPE (All) | Yes | Yes | Yes |
reprotect() - Short array data
The function reprotects the short array data that was protected earlier with a different data element.
Signature:
public void reprotect(String oldDataElement, String newDataElement, List<Integer> errorIndex, short[] input, short[] output)
Parameters:
- oldDataElement: Specifies the name of the data element with which data was protected earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies the short array of data to be reprotected.
- output: Specifies the short array containing the reprotected data.
Result:
- The output variable in the method signature contains the reprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String oldDataElement = "short";
String newDataElement = "short_1";
short[] input = new short[] {135, 136};
short[] output = new short[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.reprotect(oldDataElement, newDataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to reprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - Short array data | Integer (2 Bytes) | No | No | Yes | No | Yes |
reprotect() - Int array data
The function reprotects the int array data that was protected earlier with a different data element.
Signature:
public void reprotect(String oldDataElement, String newDataElement, List<Integer> errorIndex, int[] input, int[] output)
Parameters:
- oldDataElement: Specifies the name of the data element with which data was protected earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies the int array of data to be reprotected.
- output: Specifies the int array containing the reprotected data.
Result:
- The output variable in the method signature contains the reprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String oldDataElement = "int";
String newDataElement = "int_1";
int[] input = new int[] {234,351};
int[] output = new int[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.reprotect(oldDataElement, newDataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to reprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - Int array data | Integer (4 Bytes) | No | No | Yes | No | Yes |
reprotect() - Long array data
The function reprotects the long array data that was protected earlier with a different data element.
Signature:
public void reprotect(String oldDataElement, String newDataElement, List<Integer> errorIndex, long[] input, long[] output)
Parameters:
- oldDataElement: Specifies the name of the data element with which data was protected earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies the long array of data to be reprotected.
- output: Specifies the long array containing the reprotected data.
Result:
- The output variable in the method signature contains the reprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String oldDataElement = "long";
String newDataElement = "long_1";
long[] input = new long[] {1234, 135};
long[] output = new long[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.reprotect(oldDataElement, newDataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to reprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - Long array data | Integer (8 Bytes) | No | No | Yes | No | Yes |
reprotect() - Float array data
The function reprotects the float array data that was protected earlier with a different data element.
Signature:
public void reprotect(String oldDataElement, String newDataElement, List<Integer> errorIndex, float[] input, float[] output)
Parameters:
- oldDataElement: Specifies the name of the data element with which data was protected earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies the float array of data to be reprotected.
- output: Specifies the float array containing the reprotected data.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The output variable in the method signature contains the reprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String oldDataElement = "NoEnc";
String newDataElement = "NoEnc_1";
float[] input = new float[] {23.56f, 26.43f};
float[] output = new float[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.reprotect(oldDataElement, newDataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to reprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - Float array data | No | No | No | Yes | No | Yes |
reprotect() - Double array data
The function reprotects the double array data that was protected earlier with a different data element.
Signature:
public void reprotect(String oldDataElement, String newDataElement, List<Integer> errorIndex, double[] input, double[] output)
Parameters:
- oldDataElement: Specifies the name of the data element with which data was protected earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies the double array of data to be reprotected.
- output: Specifies the double array containing the reprotected data.
Warning: Ensure that you use the data element with the No Encryption method only. Using any other data element might cause data corruption.
Result:
- The output variable in the method signature contains the reprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String oldDataElement = "NoEnc";
String newDataElement = "NoEnc_1";
double[] input = new double[] {235.5, 1235.66};
double[] output = new double[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.reprotect(oldDataElement, newDataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to reprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - Double array data | No | No | No | Yes | No | Yes |
reprotect() - String array data
The function reprotects the String array data that was protected earlier with a different data element.
Signature:
public void reprotect(String oldDataElement, String newDataElement, List<Integer> errorIndex, String[] input, String[] output)
Parameters:
- oldDataElement: Specifies the name of the data element with which data was protected earlier.
- newDataElement: Specifies the name of the new data element to reprotect the data.
- errorIndex: Specifies the list of the Error Index.
- input: Specifies the String array of data to be reprotected.
- output: Specifies the String array containing the reprotected data.
Result:
- The output variable in the method signature contains the reprotected data.
Example:
String applicationId = sparkContext.getConf().getAppId();
Protector protector = new PtySparkProtector (applicationId);
String oldDataElement = "AlphaNum";
String newDataElement = "AlphaNum_1";
String[] input = new String[] {"test1", "test2"};
String[] output = new String[input.length];
List<Integer> errorIndexList = new ArrayList<Integer>();
protector.reprotect(oldDataElement, newDataElement, errorIndexList, input, output);
Exception:
- The function throws the PtySparkProtectorException if it is unable to reprotect the data.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| reprotect() - String array data | | No | FPE (All) | Yes | Yes | Yes |
7 - Spark SQL UDFs
All the Spark SQL UDFs that are available for protection and unprotection in Big Data Protector to build secure Big Data applications are listed here.
Introduction
The Spark SQL module provides relational data processing capabilities to Spark. The module allows you to run SQL queries with Spark programs. It provides DataFrames, which are RDDs with an associated schema and which support processing structured data in Hive tables.
Spark SQL enables structured data processing and programming of RDDs providing relational and procedural processing through a DataFrame API that integrates with Spark.
Note: The example code snippets provided in this section utilize SQL queries to invoke the UDFs, after they are registered, using the sqlContext.sql() method.
DataFrames
A DataFrame is a distributed collection of data, similar to an RDD, with a corresponding schema. DataFrames can be created from a wide array of sources, such as Hive tables, external databases, structured data files, or existing RDDs. A DataFrame can act as a distributed SQL query engine and is equivalent to a table in a relational database that can be manipulated in a way similar to RDDs. To optimize execution, DataFrames support relational operations and track their schema.
SQLContext
A SQLContext is a class that is used to initialize Spark SQL. It enables applications to run SQL queries, while running SQL functions, and provides the result as a DataFrame.
HiveContext extends the functionality of SQLContext and provides capabilities to use Hive UDFs, create Hive queries, and access and modify the data in Hive tables.
The Spark SQL CLI is used to run the Hive metastore service in local mode and execute queries. When we run Spark SQL (spark-sql), which is the client for running queries in Spark, it creates a SparkContext defined as sc and HiveContext defined as sqlContext.
Inserting Data from a File into a Table
The following commands create a class named Person with columns to store data.
scala> import sqlContext.implicits._
scala> case class Person(ID: Int, colname1: colname1_format, colname2: colname2_format, colname3: colname3_format)
The following command reads the local sample file basic_sample_data.csv:
scala> val input = sc.textFile("file:///opt/protegrity/samples/data/basic_sample_data.csv")
The following command creates a DataFrame by mapping the RDD to an RDD[Person] object.
scala> val df = input.map(x => x.split(",")).map(p => Person(p(0).toInt, p(1), p(2), p(3))).toDF()
The following command registers the temporary table sample_table.
scala> df.registerTempTable("sample_table")
The following commands save the table sample_table to a Parquet file.
scala> import org.apache.spark.sql.SaveMode
scala> df.write.mode(SaveMode.Ignore).save("sample_table.parquet")
where,
- sample_table: Specifies the name of the table created to load the data from the input CSV file from the required path.
- colname1, colname2, colname3: Specifies the names of the columns.
- colname1_format, colname2_format, colname3_format: Specifies the data types contained in the respective columns.
Protecting Existing Data
The following command creates a Spark SQL table with the protected data.
scala> sqlContext.sql(
"SELECT ID, " +
"ptyProtectStr(colname1, 'dataElement1') as colname1, " +
"ptyProtectStr(colname2, 'dataElement2') as colname2, " +
"ptyProtectStr(colname3, 'dataElement3') as colname3 " +
"FROM basic_sample"
).registerTempTable("basic_sample_protected")
Note: Ensure that the user performing the task has the permissions to protect the data, as required, in the data security policy.
where,
- basic_sample_protected: Specifies the table to store the protected data.
- colname1, colname2, colname3: Specifies the names of the columns.
- dataElement1, dataElement2, dataElement3: Specifies the data elements corresponding to the columns.
- basic_sample: Specifies the table containing the original data in the cleartext format.
Unprotecting and Viewing the Protected Data
To unprotect and view the protected data, you need to specify the name of the table which contains the protected data, and the columns and their respective data elements.
Ensure that the user performing the task has permissions to unprotect the data as required in the data security policy. The following commands unprotect the protected data from the table table_protected.
scala> drop table if exists table_unprotected;
scala> create table table_unprotected (colname1 colname1_format, colname2 colname2_format,
colname3 colname3_format) distributed randomly;
scala> sqlContext.sql(
"SELECT ID, " +
"ptyUnprotectStr(colname1, 'dataElement1') as colname1, " +
"ptyUnprotectStr(colname2, 'dataElement2') as colname2, " +
"ptyUnprotectStr(colname3, 'dataElement3') as colname3 " +
"FROM table_protected"
).show(false)
where,
- ptyUnprotectStr: Is the Protegrity Spark SQL UDF to unprotect the String data.
- colname1, colname2, colname3: Specifies the names of the columns.
- dataElement1, dataElement2, dataElement3: Specifies the data elements corresponding to the columns.
- table_protected: Specifies the table containing the protected data.
Retrieving Data from a Table
To retrieve data from a table, you must have access to the table.
The following command displays the data contained in the table.
scala> sqlContext.sql("SELECT * FROM table").show()
where,
table: Specifies the name of the table.
Calling Spark SQL UDFs from Domain Specific Language (DSL)
You can use the functions of the Domain-Specific Language (DSL) to call Spark SQL UDFs that protect or unprotect data from the DataFrame APIs. The following sample snippet describes how to call the Spark SQL UDFs from a DSL:
package com.protegrity.spark.dsl
import com.protegrity.spark.PtySparkProtectorException
import org.apache.spark.sql.{Column, DataFrame, UserDefinedFunction}
/**
* DSL API for applying protection on DataFrames implicitly.
*
* e.g
* import sqlContext.implicits._
* import com.protegrity.spark.dsl.PtySparkDSL._
* val df = sc.parallelize(List("hello", "world")).toDF()
* df.protect("_1", "AlphaNum")
* .withColumnRenamed("_1", "protected")
* .show()
*/
object PtySparkDSL {
implicit class PtySparkDSL(dataFrame: DataFrame) {
import org.apache.spark.sql.functions._
private def applyUDFOnColumns(colname: String,
dataElement: String,
func: UserDefinedFunction): Seq[Column] = {
dataFrame.schema.map { field =>
val name = field.name
if (name.equals(colname)) {
func(col(colname), lit(dataElement)).as(colname)
} else {
column(name)
}
}
}
private def applyUDFOnColumns(colname: String, oldDataElement: String, newDataElement: String, func: UserDefinedFunction): Seq[Column] = {
dataFrame.schema.map { field =>
val name = field.name
if (name.equals(colname)) {
func(col(colname), lit(oldDataElement), lit(newDataElement)).as(colname)
} else {
column(name)
}
}
}
/**
* Returns data type of input field from DataFrame
* @param colname
* @return data type of the column
*/
private def getFieldType(colname: String): String = {
try {
dataFrame.schema(colname).dataType.typeName
} catch {
case e: IllegalArgumentException =>
throw new PtySparkProtectorException(e.getMessage)
}
}
def protect(colname: String, dataElement: String): DataFrame = {
val dataType = getFieldType(colname)
val function = dataType match {
case "short" => udf(com.protegrity.spark.udf.ptyProtectShort _)
case "integer" => udf(com.protegrity.spark.udf.ptyProtectInt _)
case "long" => udf(com.protegrity.spark.udf.ptyProtectLong _)
case "float" => udf(com.protegrity.spark.udf.ptyProtectFloat _)
case "double" => udf(com.protegrity.spark.udf.ptyProtectDouble _)
case "decimal(38,18)" =>
udf(com.protegrity.spark.udf.ptyProtectDecimal _)
case "string" => udf(com.protegrity.spark.udf.ptyProtectStr _)
case "date" => udf(com.protegrity.spark.udf.ptyProtectDate _)
case "timestamp" => udf(com.protegrity.spark.udf.ptyProtectDateTime _)
case _ =>
throw new PtySparkProtectorException(
"Error!! DSL API invoked on unsupported column type - " + dataType)
}
val columns = applyUDFOnColumns(colname, dataElement, function)
dataFrame.select(columns: _*)
}
def protectUnicode(colname: String, dataElement: String): DataFrame = {
val function = udf(com.protegrity.spark.udf.ptyProtectUnicode _)
val columns = applyUDFOnColumns(colname, dataElement, function)
dataFrame.select(columns: _*)
}
def unprotect(colname: String, dataElement: String): DataFrame = {
val dataType = getFieldType(colname)
val function = dataType match {
case "short" => udf(com.protegrity.spark.udf.ptyUnprotectShort _)
case "integer" => udf(com.protegrity.spark.udf.ptyUnprotectInt _)
case "long" => udf(com.protegrity.spark.udf.ptyUnprotectLong _)
case "float" => udf(com.protegrity.spark.udf.ptyUnprotectFloat _)
case "double" => udf(com.protegrity.spark.udf.ptyUnprotectDouble _)
case "decimal(38,18)" =>
udf(com.protegrity.spark.udf.ptyUnprotectDecimal _)
case "string" => udf(com.protegrity.spark.udf.ptyUnprotectStr _)
case "date" => udf(com.protegrity.spark.udf.ptyUnprotectDate _)
case "timestamp" =>
udf(com.protegrity.spark.udf.ptyUnprotectDateTime _)
case _ =>
throw new PtySparkProtectorException(
"Error!! DSL API invoked on unsupported column type - " + dataType)
}
val columns = applyUDFOnColumns(colname, dataElement, function)
dataFrame.select(columns: _*)
}
def unprotectUnicode(colname: String, dataElement: String): DataFrame = {
val function = udf(com.protegrity.spark.udf.ptyUnprotectUnicode _)
val columns = applyUDFOnColumns(colname, dataElement, function)
dataFrame.select(columns: _*)
}
def reprotect(colname: String, oldDataElement: String, newDataElement: String): DataFrame = {
val dataType = getFieldType(colname)
val function = dataType match {
case "short" => udf(com.protegrity.spark.udf.ptyReprotectShort _)
case "integer" => udf(com.protegrity.spark.udf.ptyReprotectInt _)
case "long" => udf(com.protegrity.spark.udf.ptyReprotectLong _)
case "float" => udf(com.protegrity.spark.udf.ptyReprotectFloat _)
case "double" => udf(com.protegrity.spark.udf.ptyReprotectDouble _)
case "decimal(38,18)" =>
udf(com.protegrity.spark.udf.ptyReprotectDecimal _)
case "string" => udf(com.protegrity.spark.udf.ptyReprotectStr _)
case "date" =>
udf(com.protegrity.spark.udf.ptyReprotectDate _)
case "timestamp" =>
udf(com.protegrity.spark.udf.ptyReprotectDateTime _)
case _ =>
throw new PtySparkProtectorException(
"Error!! DSL API invoked on unsupported column type - " + dataType)
}
val columns = applyUDFOnColumns(colname, oldDataElement, newDataElement, function)
dataFrame.select(columns: _*)
}
def reprotectUnicode(colname: String, oldDataElement: String, newDataElement: String): DataFrame = {
val function = udf(com.protegrity.spark.udf.ptyReprotectUnicode _)
val columns = applyUDFOnColumns(colname, oldDataElement, newDataElement, function)
dataFrame.select(columns: _*)
}
}
}
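The protect, unprotect, and reprotect methods in the DSL snippet above all select a UDF from the Spark SQL type name of the column. The following Java sketch restates that lookup as a plain table, so that the mapping can be read and checked without a Spark session. The class name PtyUdfNameResolver and the method resolve are hypothetical and are not part of the Protegrity API; only the type names and the UDF names are taken from the snippet above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Protegrity API.
final class PtyUdfNameResolver {
    // Spark SQL type name -> UDF name suffix, as matched in the DSL snippet.
    private static final Map<String, String> SUFFIX_BY_TYPE = new LinkedHashMap<>();

    static {
        SUFFIX_BY_TYPE.put("short", "Short");
        SUFFIX_BY_TYPE.put("integer", "Int");
        SUFFIX_BY_TYPE.put("long", "Long");
        SUFFIX_BY_TYPE.put("float", "Float");
        SUFFIX_BY_TYPE.put("double", "Double");
        SUFFIX_BY_TYPE.put("decimal(38,18)", "Decimal");
        SUFFIX_BY_TYPE.put("string", "Str");
        SUFFIX_BY_TYPE.put("date", "Date");
        SUFFIX_BY_TYPE.put("timestamp", "DateTime");
    }

    private PtyUdfNameResolver() {
    }

    // operation is expected to be "Protect", "Unprotect", or "Reprotect".
    static String resolve(String operation, String sparkTypeName) {
        String suffix = SUFFIX_BY_TYPE.get(sparkTypeName);
        if (suffix == null) {
            // Mirrors the default case of the DSL, which rejects other types.
            throw new IllegalArgumentException(
                "DSL API invoked on unsupported column type - " + sparkTypeName);
        }
        return "pty" + operation + suffix;
    }
}
```

For example, resolve("Protect", "integer") yields ptyProtectInt. A column typed decimal(10,2) is rejected, because the DSL snippet matches only the type name decimal(38,18).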
ptyGetVersion()
The UDF returns the current version of the protector.
Signature:
ptyGetVersion()
Parameters:
- None
Result:
- The UDF returns the current version of the protector.
Example:
sqlContext.udf.register("ptyGetVersion", com.protegrity.spark.udf.ptyGetVersion _)
sqlContext.sql("select ptyGetVersion()").show()
ptyGetVersionExtended()
The UDF returns the extended version information of the protector.
Signature:
ptyGetVersionExtended()
Parameters:
- None
Result:
The UDF returns a String in the following format:
"BDP: <1>; JcoreLite: <2>; CORE: <3>;"
where,
- <1>: Is the current Protector version.
- <2>: Is the JcoreLite library version.
- <3>: Is the Core library version.
Example:
sqlContext.udf.register("ptyGetVersionExtended", com.protegrity.spark.udf.ptyGetVersionExtended _)
sqlContext.sql("select ptyGetVersionExtended()").show()
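If the individual version numbers are needed, the returned string can be split on its separators. The following is a minimal Java sketch that assumes the string follows the format shown above exactly. The class name PtyVersionInfo and its parse method are hypothetical and are not part of the Protegrity API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Protegrity API.
final class PtyVersionInfo {
    private PtyVersionInfo() {
    }

    // Splits "BDP: <1>; JcoreLite: <2>; CORE: <3>;" into name -> version pairs,
    // keeping the order in which the components appear.
    static Map<String, String> parse(String extendedVersion) {
        Map<String, String> components = new LinkedHashMap<>();
        for (String part : extendedVersion.split(";")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // skip the empty piece after the last separator
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException(
                    "Expected 'name: version' but found: " + trimmed);
            }
            String name = trimmed.substring(0, colon).trim();
            String version = trimmed.substring(colon + 1).trim();
            components.put(name, version);
        }
        return components;
    }
}
```

The resulting map is keyed by the component names from the format string (BDP, JcoreLite, and CORE), so a caller can look up, for example, the CORE entry without depending on its position.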
ptyWhoAmI()
The UDF returns the currently logged-in user.
Signature:
ptyWhoAmI()
Parameters:
- None
Result:
- The UDF returns the currently logged-in user.
Example:
sqlContext.udf.register("ptyWhoAmI", com.protegrity.spark.udf.ptyWhoAmI _)
sqlContext.sql("select ptyWhoAmI()").show()
ptyProtectStr()
The UDF protects the string format data that is provided as an input.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls within the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
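The date range described in the note above can be checked before a value is passed to the protect UDF. The following is a minimal Java sketch, assuming that both boundary dates belong to the rejected range. The class and method names are hypothetical and are not part of the Protegrity API.

```java
import java.time.LocalDate;

// Hypothetical helper for illustration only; not part of the Protegrity API.
final class GregorianCutover {
    // First and last dates of the non-existent range named in the note.
    // Assumption: both boundaries are inclusive.
    static final LocalDate GAP_START = LocalDate.of(1582, 10, 5);
    static final LocalDate GAP_END = LocalDate.of(1582, 10, 14);

    private GregorianCutover() {
    }

    // Returns true if the date lies inside the non-existent date range.
    static boolean isInCutoverGap(LocalDate date) {
        return !date.isBefore(GAP_START) && !date.isAfter(GAP_END);
    }
}
```

LocalDate uses the proleptic ISO calendar, so the dates inside the range can be represented and compared here even though they do not exist in the Gregorian Calendar as adopted.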
Signature:
ptyProtectStr(String colName, String dataElement)
Parameters:
- colName: Specifies the column that contains data in the string format to be protected.
- dataElement: Specifies the data element to protect the string format data.
Result:
- The UDF returns the protected string format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List("hello", "world")).toDF("string_col")
val protectStrUDF = sqlContext.udf
.register("ptyProtectStr", com.protegrity.spark.udf.ptyProtectStr _)
df.registerTempTable("string_test")
sqlContext
.sql( "select ptyProtectStr(string_col, 'Token_Alphanum') as protected from string_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectStr() | | No | Yes | Yes | Yes | Yes |
ptyProtectUnicode()
The UDF protects the string (Unicode) format data, which is provided as input.
Warning: This UDF should be used only if you want to tokenize the Unicode data in SparkSQL, and migrate the tokenized data from SparkSQL to a Teradata database and detokenize the data using the Protegrity Database Protector. Ensure that you use this UDF with a Unicode tokenization data element only.
Signature:
ptyProtectUnicode(String colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the string (Unicode) format to be protected.
- dataElement: Specifies the data element to protect the string (Unicode) format data.
Result:
- The UDF returns the protected string format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List("瀚聪Marylène", "瀚聪")).toDF("unicode_col")
val protectUnicodeUDF = sqlContext.udf.register(
"ptyProtectUnicode",
com.protegrity.spark.udf.ptyProtectUnicode _)
df.registerTempTable("unicode_test")
sqlContext
.sql(
"select ptyProtectUnicode(unicode_col, 'Token_Unicode') as protected from unicode_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectUnicode() | - Unicode (Legacy) - Unicode (Base64) | No | No | Yes | No | Yes |
ptyProtectInt()
The UDF protects the integer format data, which is provided as input.
Signature:
ptyProtectInt(Int colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the integer format to be protected.
- dataElement: Specifies the data element to protect the integer format data.
Result:
- The UDF returns the protected integer format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List(1234, 2345)).toDF("int_col")
val protectIntUDF = sqlContext.udf.register("ptyProtectInt", com.protegrity.spark.udf.ptyProtectInt _)
df.registerTempTable("int_test")
sqlContext.sql("select ptyProtectInt(int_col, 'Token_Int') as protected from int_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectInt() | Integer (4 Bytes) | No | No | Yes | No | Yes |
ptyProtectShort()
The UDF protects the short format data, which is provided as input.
Signature:
ptyProtectShort(Short colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the short format to be protected.
- dataElement: Specifies the data element to protect the short format data.
Result:
- The UDF returns the protected short format data.
Example:
import sqlContext.implicits._
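// ShortClass is not defined in this snippet; a one-field case class such as the following is assumed.
case class ShortClass(short_col: Short)
val df = sc.parallelize(List(1234, 2345)).map{x =>
ShortClass(x.toShort)
}.toDF("short_col")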
val protectShortUDF = sqlContext.udf.register("ptyProtectShort", com.protegrity.spark.udf.ptyProtectShort _)
df.registerTempTable("short_test")
sqlContext.sql("select ptyProtectShort(short_col, 'Token_Short') as protected from short_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectShort() | Integer (2 Bytes) | No | No | Yes | No | Yes |
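The example above narrows Int values with toShort, which wraps silently when a value does not fit in 2 bytes. The following is a minimal Java sketch of a range check that can be applied before narrowing. The class and method names are hypothetical and are not part of the Protegrity API.

```java
// Hypothetical helper for illustration only; not part of the Protegrity API.
final class ShortRange {
    private ShortRange() {
    }

    // Narrows an int to a short, failing instead of wrapping when the
    // value is outside the 2-byte range -32768 to 32767.
    static short toShortExact(int value) {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new ArithmeticException("Value does not fit in 2 bytes: " + value);
        }
        return (short) value;
    }
}
```

For example, toShortExact(40000) throws an exception, whereas a plain narrowing cast of 40000 produces -25536.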
ptyProtectLong()
The UDF protects the long format data, which is provided as input.
Signature:
ptyProtectLong(Long colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the long format to be protected.
- dataElement: Specifies the data element to protect the long format data.
Result:
- The UDF returns the protected long format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List(1234l, 2345l)).toDF("long_col")
val protectLongUDF = sqlContext.udf
.register("ptyProtectLong", com.protegrity.spark.udf.ptyProtectLong _)
df.registerTempTable("long_test")
sqlContext
.sql("select ptyProtectLong(long_col, 'Token_Long') as protected from long_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectLong() | Integer (8 Bytes) | No | No | Yes | No | Yes |
ptyProtectDate()
The UDF protects the date format data, which is provided as input.
Signature:
ptyProtectDate(Date colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the date format to be protected.
- dataElement: Specifies the data element to protect the date format data.
Result:
- The UDF returns the protected date format data.
Example:
import java.sql.Date
import sqlContext.implicits._
val d1 = Date.valueOf("2016-12-28")
val d2 = Date.valueOf("2016-12-28")
val df = sc.parallelize(Seq((d1, d2))).toDF("date_col1","date_col2")
val protectDateUDF = sqlContext.udf
.register("ptyProtectDate", com.protegrity.spark.udf.ptyProtectDate _)
df.registerTempTable("date_test")
sqlContext
.sql("select ptyProtectDate(date_col1, 'Token_Date') as protected from date_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectDate() | Date | No | No | Yes | No | Yes |
ptyProtectDateTime()
The UDF protects the timestamp format data, which is provided as input.
Signature:
ptyProtectDateTime(Timestamp colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the timestamp format to be protected.
- dataElement: Specifies the data element to protect the timestamp format data.
Result:
- The UDF returns the protected timestamp format data.
Example:
import java.sql.Timestamp
import sqlContext.implicits._
val d1 = Timestamp.valueOf("2016-12-28 13:09:38.104")
val d2 = Timestamp.valueOf("2016-12-29 12:09:38.104")
val df = sc.parallelize(Seq((d1, d2))).toDF("datetime_col1","datetime_col2")
val protectDateTimeUDF = sqlContext.udf.register(
"ptyProtectDateTime",com.protegrity.spark.udf.ptyProtectDateTime _)
df.registerTempTable("datetime_test")
sqlContext
.sql(
"select ptyProtectDateTime(datetime_col1, 'Token_Datetime') as protected from datetime_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectDateTime() | Datetime (YYYY-MM-DD HH:MM:SS) | No | No | Yes | No | Yes |
ptyProtectFloat()
The UDF protects the float format data, which is provided as input.
Signature:
ptyProtectFloat(Float colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the float format to be protected.
- dataElement: Specifies the data element to protect the float format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the protected float format data.
Example:
import sqlContext.implicits._
val input = Seq((1234.345f, 1343.3345f))
val df = sc.parallelize(input).toDF("float_col1","float_col2")
val protectFloatUDF = sqlContext.udf
.register("ptyProtectFloat", com.protegrity.spark.udf.ptyProtectFloat _)
df.registerTempTable("float_test")
sqlContext
.sql(
"select ptyProtectFloat(float_col1, 'Token_NoEncryption') as protected from float_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectFloat() | No | No | No | Yes | No | Yes |
ptyProtectDouble()
The UDF protects the double format data, which is provided as input.
Signature:
ptyProtectDouble(Double colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the double format to be protected.
- dataElement: Specifies the data element to protect the double format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the protected double format data.
Example:
import sqlContext.implicits._
val input = Seq((1234.345, 1343.3345))
val df = sc.parallelize(input).toDF("double_col1","double_col2")
val protectDoubleUDF = sqlContext.udf.register(
"ptyProtectDouble",com.protegrity.spark.udf.ptyProtectDouble _)
df.registerTempTable("double_test")
sqlContext.sql("select ptyProtectDouble(double_col1, 'Token_NoEncryption') as protected from double_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectDouble() | No | No | No | Yes | No | Yes |
ptyProtectDecimal()
The UDF protects the decimal format data, which is provided as input.
Signature:
ptyProtectDecimal(Decimal colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the Decimal format to be protected.
- dataElement: Specifies the data element to protect the Decimal format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the protected Decimal format data.
Example:
import sqlContext.implicits._
val input = Seq((math.BigDecimal.valueOf(1234.345), math.BigDecimal.valueOf(1343.3345)))
val df = sc.parallelize(input).toDF("decimal_col1","decimal_col2")
val protectDecimalUDF = sqlContext.udf.register("ptyProtectDecimal",com.protegrity.spark.udf.ptyProtectDecimal _)
df.registerTempTable("decimal_test")
sqlContext.sql("select ptyProtectDecimal(decimal_col1, 'Token_NoEncryption') as protected from decimal_test").show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyProtectDecimal() | No | No | No | Yes | No | Yes |
ptyUnprotectStr()
The UDF unprotects the protected string format data.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls within the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
Signature:
ptyUnprotectStr(String colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the string format to unprotect.
- dataElement: Specifies the data element to unprotect the string format data.
Result:
- The UDF returns the unprotected string format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List("A2yae", "2LbRS")).toDF("string_col")
val unprotectStrUDF = sqlContext.udf
.register("ptyUnprotectStr", com.protegrity.spark.udf.ptyUnprotectStr _)
df.registerTempTable("string_test")
sqlContext
.sql(
"select ptyUnprotectStr(string_col, 'Token_Alphanum') as unprotected from string_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectStr() | | No | Yes | Yes | Yes | Yes |
ptyUnprotectUnicode()
The UDF unprotects the protected string (Unicode) format data.
Warning: This UDF should be used only if you want to tokenize the Unicode data in Teradata using the Protegrity Database Protector, migrate the tokenized data from a Teradata database to SparkSQL, and detokenize the data using the Protegrity Big Data Protector for SparkSQL. Ensure that you use this UDF with a Unicode tokenization data element only.
Signature:
ptyUnprotectUnicode(String colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the string format to unprotect.
- dataElement: Specifies the data element to unprotect the string format data.
Result:
- The UDF returns the unprotected string (Unicode) format data.
Example:
import sqlContext.implicits._
val df =
sc.parallelize(List("jmR6Dw4Tqzlw441n5qEMtMEUKsI", "Q1dwK")).toDF("unicode_col")
val unprotectUnicodeUDF = sqlContext.udf.register(
"ptyUnprotectUnicode",
com.protegrity.spark.udf.ptyUnprotectUnicode _)
df.registerTempTable("unicode_test")
sqlContext
.sql(
"select ptyUnprotectUnicode(unicode_col, 'Token_Unicode') as unprotected from unicode_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectUnicode() | - Unicode (Legacy) - Unicode (Base64) | No | No | Yes | No | Yes |
ptyUnprotectInt()
The UDF unprotects the integer format data, which is provided as input.
Signature:
ptyUnprotectInt(Int colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data, in the integer format, to unprotect.
- dataElement: Specifies the data element to unprotect the integer format data.
Caution: If a user who has no privileges to unprotect data in the security policy, and for whom the output value is set to NULL, attempts to unprotect protected numeric data (Short, Int, Float, Long, Double, or Decimal values) using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected integer format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List(1234, 2345)).toDF("int_col")
val unprotectIntUDF = sqlContext.udf.register("ptyUnprotectInt", com.protegrity.spark.udf.ptyUnprotectInt _)
df.registerTempTable("int_test")
sqlContext.sql("select ptyUnprotectInt(int_col, 'Token_Int') as unprotected from int_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectInt() | Integer (4 Bytes) | No | No | Yes | No | Yes |
ptyUnprotectShort()
The UDF unprotects the short format data, which is provided as input.
Signature:
ptyUnprotectShort(Short colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data, in the short format, to unprotect.
- dataElement: Specifies the data element to unprotect the short format data.
Caution: If a user who has no privileges to unprotect data in the security policy, and for whom the output value is set to NULL, attempts to unprotect protected numeric data (Short, Int, Float, Long, Double, or Decimal values) using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected short format data.
Example:
import sqlContext.implicits._
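// ShortClass is not defined in this snippet; a one-field case class such as the following is assumed.
case class ShortClass(short_col: Short)
val df = sc.parallelize(List(-24453, 1827)).map(x =>
ShortClass(x.toShort)).toDF("short_col")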
val unprotectShortUDF = sqlContext.udf.register("ptyUnprotectShort", com.protegrity.spark.udf.ptyUnprotectShort _)
df.registerTempTable("short_test")
sqlContext.sql("select ptyUnprotectShort(short_col, 'Token_Short') as unprotected from short_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectShort() | Integer (2 Bytes) | No | No | Yes | No | Yes |
ptyUnprotectLong()
The UDF unprotects the long format data, which is provided as input.
Signature:
ptyUnprotectLong(Long colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data, in the long format, to unprotect.
- dataElement: Specifies the data element to unprotect the long format data.
Caution: If a user who has no privileges to unprotect data in the security policy, and for whom the output value is set to NULL, attempts to unprotect protected numeric data (Short, Int, Float, Long, Double, or Decimal values) using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected long format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List(4960833108022315290l, -1854566784751726548l)).toDF("long_col")
val unprotectLongUDF = sqlContext.udf.register("ptyUnprotectLong", com.protegrity.spark.udf.ptyUnprotectLong _)
df.registerTempTable("long_test")
sqlContext.sql("select ptyUnprotectLong(long_col, 'Token_Long') as unprotected from long_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectLong() | Integer (8 Bytes) | No | No | Yes | No | Yes |
ptyUnprotectDate()
The UDF unprotects the date format data, which is provided as input.
Signature:
ptyUnprotectDate(Date colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data, in the date format, to unprotect.
- dataElement: Specifies the data element to unprotect the date format data.
Result:
- The UDF returns the unprotected date format data.
Example:
import java.sql.Date
import sqlContext.implicits._
val d1 = Date.valueOf("1881-04-07")
val d2 = Date.valueOf("2016-12-28")
val df = sc.parallelize(Seq((d1, d2))).toDF("date_col1", "date_col2")
val unprotectDateUDF = sqlContext.udf.register("ptyUnprotectDate", com.protegrity.spark.udf.ptyUnprotectDate _)
df.registerTempTable("date_test")
sqlContext.sql("select ptyUnprotectDate(date_col1, 'Token_Date') as unprotected from date_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectDate() | Date | No | No | Yes | No | Yes |
ptyUnprotectDateTime()
The UDF unprotects the timestamp format data, which is provided as input.
Signature:
ptyUnprotectDateTime(Timestamp colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data, in the timestamp format, to unprotect.
- dataElement: Specifies the data element to unprotect the timestamp format data.
Result:
- The UDF returns the unprotected timestamp format data.
Example:
import java.sql.Timestamp
import sqlContext.implicits._
val d1 = Timestamp.valueOf("1197-02-10 13:09:38.104")
val d2 = Timestamp.valueOf("2016-12-29 12:09:38.104")
val df = sc.parallelize(Seq((d1, d2))).toDF("datetime_col1", "datetime_col2")
val unprotectDateTimeUDF = sqlContext.udf.register("ptyUnprotectDateTime", com.protegrity.spark.udf.ptyUnprotectDateTime _)
df.registerTempTable("datetime_test")
sqlContext.sql("select ptyUnprotectDateTime(datetime_col1, 'Token_Datetime') as unprotected from datetime_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectDateTime() | Datetime (YYYY-MM-DD HH:MM:SS) | No | No | Yes | No | Yes |
ptyUnprotectFloat()
The UDF unprotects the float format data, which is provided as input.
Signature:
ptyUnprotectFloat(Float colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data, in the float format, to unprotect.
- dataElement: Specifies the data element to unprotect the float format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Caution: If an unauthorized user, who has no privileges to unprotect data in the security policy and for whom the output value is set to NULL, attempts to unprotect protected numeric data (Short, Int, Float, Long, Double, or Decimal values) using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected float format data.
Example:
import sqlContext.implicits._
val input = Seq((1234.345f, 1343.3345f))
val df = sc.parallelize(input).toDF("float_col1","float_col2")
val unprotectFloatUDF = sqlContext.udf.register( "ptyUnprotectFloat", com.protegrity.spark.udf.ptyUnprotectFloat _)
df.registerTempTable("float_test")
sqlContext.sql("select ptyUnprotectFloat(float_col1, 'Token_NoEncryption') as unprotected from float_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectFloat() | No | No | No | Yes | No | Yes |
ptyUnprotectDouble()
The UDF unprotects the double format data, which is provided as input.
Signature:
ptyUnprotectDouble(Double colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data, in the double format, to unprotect.
- dataElement: Specifies the data element to unprotect the double format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Caution: If an unauthorized user, who has no privileges to unprotect data in the security policy and for whom the output value is set to NULL, attempts to unprotect protected numeric data (Short, Int, Float, Long, Double, or Decimal values) using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected double format data.
Example:
import sqlContext.implicits._
val input = Seq((1234.345, 1343.3345))
val df = sc.parallelize(input).toDF("double_col1", "double_col2")
val unprotectDoubleUDF = sqlContext.udf.register("ptyUnprotectDouble", com.protegrity.spark.udf.ptyUnprotectDouble _)
df.registerTempTable("double_test")
sqlContext.sql("select ptyUnprotectDouble(double_col1, 'Token_NoEncryption') as unprotected from double_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectDouble() | No | No | No | Yes | No | Yes |
ptyUnprotectDecimal()
The UDF unprotects the decimal format data, which is provided as input.
Signature:
ptyUnprotectDecimal(Decimal colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data, in the Decimal format, to unprotect.
- dataElement: Specifies the data element to unprotect the Decimal format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Caution: Before the ptyUnprotectDecimal() UDF is called, Spark SQL rounds off the decimal value in the table to 18 digits in scale, irrespective of the length of the data.
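The effect of rounding to a scale of 18, as described in the caution above, can be sketched in plain Python with the standard decimal module. This is an illustration only: the helper name round_to_scale_18 is hypothetical, and the half-up rounding mode is an assumption, because the source does not state which rounding mode Spark SQL applies.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

# Quantum with 18 digits after the decimal point.
SCALE_18 = Decimal("1e-18")


def round_to_scale_18(value: str) -> Decimal:
    """Round a decimal literal to 18 digits after the decimal point.

    ROUND_HALF_UP is an assumption made for this sketch.
    """
    with localcontext() as ctx:
        ctx.prec = 60  # working precision large enough for the quantized result
        return Decimal(value).quantize(SCALE_18, rounding=ROUND_HALF_UP)


# A short value is padded to 18 fractional digits.
short = round_to_scale_18("1234.345")
# A value with 20 fractional digits loses its last two digits.
long_ = round_to_scale_18("0.12345678901234567895")

print(short)
print(long_)
```

Values with more than 18 fractional digits are changed by this step, so a value read back after unprotection might differ from the value that was originally supplied.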
Caution: If an unauthorized user, who has no privileges to unprotect data in the security policy and for whom the output value is set to NULL, attempts to unprotect protected numeric data (Short, Int, Float, Long, Double, or Decimal values) using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected Decimal format data.
Example:
import sqlContext.implicits._
val input = Seq((math.BigDecimal.valueOf(1234.345), math.BigDecimal.valueOf(1343.3345)))
val df = sc.parallelize(input).toDF("decimal_col1","decimal_col2")
val unprotectDecimalUDF = sqlContext.udf.register("ptyUnprotectDecimal",com.protegrity.spark.udf.ptyUnprotectDecimal _)
df.registerTempTable("decimal_test")
sqlContext.sql("select ptyUnprotectDecimal(decimal_col1, 'Token_NoEncryption') as unprotected from decimal_test").show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyUnprotectDecimal() | No | No | No | Yes | No | Yes |
ptyReprotectStr()
The UDF reprotects the protected string format data, which was earlier protected using the ptyProtectStr UDF, with a different data element.
Signature:
ptyReprotectStr(String colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the string format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected string format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List("hello", "world")).toDF("string_col")
val reprotectStrUDF = sqlContext.udf
.register("ptyReprotectStr", com.protegrity.spark.udf.ptyReprotectStr _)
df.registerTempTable("string_test")
sqlContext
.sql("select ptyReprotectStr(string_col, 'Token_Alphanum', 'Token_Alphanum_1') as reprotected from string_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectStr() | | No | Yes | Yes | Yes | Yes |
ptyReprotectUnicode()
The UDF reprotects the protected string format data, which was earlier protected using the ptyProtectUnicode UDF, with a different data element.
Warning: This UDF should be used only if you want to tokenize the Unicode data in SparkSQL, and migrate the tokenized data from SparkSQL to a Teradata database and detokenize the data using the Protegrity Database Protector. Ensure that you use this UDF with a Unicode tokenization data element only.
Signature:
ptyReprotectUnicode(String colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the string format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected string format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List("##Marylène", "##")).toDF("unicode_col")
val reprotectUnicodeUDF = sqlContext.udf.register( "ptyReprotectUnicode", com.protegrity.spark.udf.ptyReprotectUnicode _)
df.registerTempTable("unicode_test")
sqlContext
.sql("select ptyReprotectUnicode(unicode_col, 'Token_Unicode', 'Token_Unicode_1') as reprotected from unicode_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectUnicode() | - Unicode (Legacy) - Unicode (Base64) | No | No | Yes | No | Yes |
ptyReprotectInt()
The UDF reprotects the protected integer format data, which was earlier protected with a different data element.
Signature:
ptyReprotectInt(Int colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the Integer format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected Integer format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List(1234, 2345)).toDF("int_col")
val reprotectIntUDF = sqlContext.udf
.register("ptyReprotectInt", com.protegrity.spark.udf.ptyReprotectInt _)
df.registerTempTable("int_test")
sqlContext
.sql("select ptyReprotectInt(int_col, 'Token_Int', 'Token_Int_1') as reprotected from int_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectInt() | Integer 4 bytes | No | No | Yes | No | Yes |
ptyReprotectShort()
The UDF reprotects the protected short format data, which was earlier protected with a different data element.
Signature:
ptyReprotectShort(Short colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the Short format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected Short format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List(1234, 2345)).map(x =>
ShortClass(x.toShort)).toDF("short_col")
val reprotectShortUDF = sqlContext.udf.register("ptyReprotectShort", com.protegrity.spark.udf.ptyReprotectShort _)
df.registerTempTable("short_test")
sqlContext
.sql("select ptyReprotectShort(short_col, 'Token_Short', 'Token_Short_1') as reprotected from short_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectShort() | Integer 2 Bytes | No | No | Yes | No | Yes |
ptyReprotectLong()
The UDF reprotects the protected long format data, which was earlier protected with a different data element.
Signature:
ptyReprotectLong(Long colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the long format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected long format data.
Example:
import sqlContext.implicits._
val df = sc.parallelize(List(1234L, 2345L)).toDF("long_col")
val reprotectLongUDF = sqlContext.udf.register("ptyReprotectLong", com.protegrity.spark.udf.ptyReprotectLong _)
df.registerTempTable("long_test")
sqlContext
.sql("select ptyReprotectLong(long_col, 'Token_Long', 'Token_Long_1') as reprotected from long_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectLong() | Integer 8 Bytes | No | No | Yes | No | Yes |
ptyReprotectDate()
The UDF reprotects the protected date format data, which was earlier protected with a different data element.
Signature:
ptyReprotectDate(Date colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the date format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected date format data.
Example:
import java.sql.Date
import sqlContext.implicits._
val d1 = Date.valueOf("2016-12-28")
val d2 = Date.valueOf("2016-12-28")
val df = sc.parallelize(Seq((d1, d2))).toDF("date_col1", "date_col2")
val reprotectDateUDF = sqlContext.udf.register("ptyReprotectDate", com.protegrity.spark.udf.ptyReprotectDate _)
df.registerTempTable("date_test")
sqlContext.sql("select ptyReprotectDate(date_col1, 'Token_Date', 'Token_Date_1') as reprotected from date_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectDate() | Date | No | No | Yes | No | Yes |
ptyReprotectDateTime()
The UDF reprotects the protected timestamp format data, which was earlier protected with a different data element.
Signature:
ptyReprotectDateTime(Timestamp colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the timestamp format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected timestamp format data.
Example:
import java.sql.Timestamp
import sqlContext.implicits._
val d1 = Timestamp.valueOf("2016-12-28 13:09:38.104")
val d2 = Timestamp.valueOf("2016-12-29 12:09:38.104")
val df = sc.parallelize(Seq((d1, d2))).toDF("datetime_col1", "datetime_col2")
val reprotectDateTimeUDF = sqlContext.udf.register( "ptyReprotectDateTime", com.protegrity.spark.udf.ptyReprotectDateTime _)
df.registerTempTable("datetime_test")
sqlContext
.sql("select ptyReprotectDateTime(datetime_col1, 'Token_Datetime', 'Token_Datetime_1') as reprotected from datetime_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectDateTime() | DateTime (YYYY-MM-DD HH:MM:SS) | No | No | Yes | No | Yes |
ptyReprotectFloat()
The UDF reprotects the protected float format data, which was earlier protected with a different data element.
Signature:
ptyReprotectFloat(Float colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the float format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the protected float format data.
Example:
import sqlContext.implicits._
val input = Seq((1234.345f, 1343.3345f))
val df = sc.parallelize(input).toDF("float_col1", "float_col2")
val reprotectFloatUDF = sqlContext.udf.register("ptyReprotectFloat", com.protegrity.spark.udf.ptyReprotectFloat _)
df.registerTempTable("float_test")
sqlContext
.sql("select ptyReprotectFloat(float_col1, 'Token_NoEncryption', 'Token_NoEncryption') as reprotected from float_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectFloat() | No | No | No | Yes | No | Yes |
ptyReprotectDouble()
The UDF reprotects the protected double format data, which was earlier protected with a different data element.
Signature:
ptyReprotectDouble(Double colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the double format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the protected double format data.
Example:
import sqlContext.implicits._
val input = Seq((1234.345, 1343.3345))
val df = sc.parallelize(input).toDF("double_col1", "double_col2")
val reprotectDoubleUDF = sqlContext.udf.register("ptyReprotectDouble", com.protegrity.spark.udf.ptyReprotectDouble _)
df.registerTempTable("double_test")
sqlContext
.sql("select ptyReprotectDouble(double_col1, 'Token_NoEncryption', 'Token_NoEncryption') as reprotected from double_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectDouble() | No | No | No | Yes | No | Yes |
ptyReprotectDecimal()
The UDF reprotects the protected decimal format data, which was earlier protected with a different data element.
Signature:
ptyReprotectDecimal(Decimal colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the Decimal format data to reprotect.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Caution: Before the ptyReprotectDecimal() UDF is called, Spark SQL rounds off the decimal value in the table to 18 digits in scale, irrespective of the length of the data.
Result:
- The UDF returns the protected Decimal format data.
Example:
import sqlContext.implicits._
val input = Seq((math.BigDecimal.valueOf(1234.345), math.BigDecimal.valueOf(1343.3345)))
val df = sc.parallelize(input).toDF("decimal_col1", "decimal_col2")
val reprotectDecimalUDF = sqlContext.udf.register("ptyReprotectDecimal", com.protegrity.spark.udf.ptyReprotectDecimal _)
df.registerTempTable("decimal_test")
sqlContext
.sql("select ptyReprotectDecimal(decimal_col1, 'Token_NoEncryption', 'Token_NoEncryption') as reprotected from decimal_test")
.show(false)
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyReprotectDecimal() | No | No | No | Yes | No | Yes |
ptyStringEnc()
The UDF encrypts a string value to get binary data.
Signature:
ptyStringEnc(String input, String DataElement)
Parameters:
- String input: Specifies the string value to encrypt.
- String DataElement: Specifies the name of the data element to encrypt the string value.
Result:
- The UDF returns an encrypted binary value.
Note: To store the binary output of the ptyStringEnc UDF in a string column, use the built-in Base64 Spark SQL function to convert the output encrypted bytes into a Base64 encoded string.
Example:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val protectStrEncUDF = sqlContext.udf.register("ptyStringEnc",com.protegrity.spark.udf.ptyStringEnc _)
val pepTest = sc.parallelize(List("hello", "world")).toDF("col1")
pepTest.registerTempTable("spark_clear_table")
val encr_spark = sqlContext.sql("select base64(ptyStringEnc(col1,'AES128_CRC')) as col1 from spark_clear_table").toDF()
encr_spark.show()
encr_spark.registerTempTable("encrypted_spark")
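The Base64 round trip that the note above relies on can be sketched outside Spark with the Python standard library. The helper names and the sample bytes are hypothetical stand-ins; real input would be the binary output of the ptyStringEnc UDF, and in Spark SQL the built-in base64 and unbase64 functions play the roles shown here.

```python
import base64


def to_string_column(encrypted: bytes) -> str:
    """Encode raw bytes as Base64 text that is safe to store in a string column."""
    return base64.b64encode(encrypted).decode("ascii")


def from_string_column(stored: str) -> bytes:
    """Decode the Base64 text back into the original bytes."""
    return base64.b64decode(stored)


# Stand-in bytes for illustration only; this is not real ciphertext.
sample = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F])

stored = to_string_column(sample)
restored = from_string_column(stored)

print(stored)
print(restored == sample)
```

Storing the raw bytes directly in a string column would force a character decoding step that arbitrary binary data is not guaranteed to survive, which is why the encoded form is used instead.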
Exception:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit: The length of the input needs to be less than the maximum limit of 512 MB.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyStringEnc() | No | | No | Yes | No | Yes |
Guidelines to estimate the field size of the data
The field sizes (in bytes) required by each encryption algorithm for features such as Key ID (KID), Initialization Vector (IV), and Integrity Check (CRC) are listed in the following table:
| Encryption Algorithm | KID (size in Bytes) | IV (size in Bytes) | CRC (size in Bytes) |
|---|---|---|---|
| AES | 16 | 16 | 4 |
| 3DES | 8 | 8 | 4 |
| CUSP_TRDES | 2 | N/A | 4 |
| CUSP_AES | 2 | N/A | 4 |
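As a rough aid for sizing a target field, the feature sizes in the table above can be summed per algorithm. The sketch below is an assumption-laden illustration: the names FEATURE_SIZES and feature_overhead are hypothetical, and the result covers only the listed features, not the ciphertext itself or any cipher padding, which the table does not describe.

```python
# Sizes in bytes, copied from the table above; None marks "N/A".
FEATURE_SIZES = {
    "AES": {"KID": 16, "IV": 16, "CRC": 4},
    "3DES": {"KID": 8, "IV": 8, "CRC": 4},
    "CUSP_TRDES": {"KID": 2, "IV": None, "CRC": 4},
    "CUSP_AES": {"KID": 2, "IV": None, "CRC": 4},
}


def feature_overhead(algorithm: str, features: set) -> int:
    """Return the total bytes added by the selected features for an algorithm."""
    sizes = FEATURE_SIZES[algorithm]
    total = 0
    for feature in features:
        size = sizes[feature]
        if size is None:
            raise ValueError(f"{feature} is not applicable to {algorithm}")
        total += size
    return total


print(feature_overhead("AES", {"KID", "IV", "CRC"}))
print(feature_overhead("CUSP_AES", {"KID", "CRC"}))
```

Requesting a feature that the table marks as N/A, such as IV for the CUSP algorithms, raises an error in this sketch rather than silently counting zero bytes.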
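The byte sizes required by the input file and the encryption algorithm, with the features selected, are listed in the following table:
| Encryption Algorithm | Maximum input size (in bytes) eligible for encryption | Maximum input size (in bytes) eligible for decryption and re-encryption |
|---|---|---|
| 3DES | <= 535000000 (approximately 512 MB) | <= 715120000 (approximately 682 MB) |
| AES-128 | <= 535000000 (approximately 512 MB) | <= 715120000 (approximately 682 MB) |
| AES-256 | <= 535000000 (approximately 512 MB) | <= 715120000 (approximately 682 MB) |
| CUSP 3DES | <= 535000000 (approximately 512 MB) | <= 715120000 (approximately 682 MB) |
| CUSP AES-128 | <= 535000000 (approximately 512 MB) | <= 715120000 (approximately 682 MB) |
| CUSP AES-256 | <= 535000000 (approximately 512 MB) | <= 715120000 (approximately 682 MB) |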
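A pre-check against the limits in the table above can help avoid the OutOfMemoryError described for these UDFs. The following Python sketch assumes the two limits apply to every listed algorithm; the constant and function names are hypothetical.

```python
# Limits in bytes, taken from the table above.
MAX_ENCRYPT_BYTES = 535_000_000
MAX_DECRYPT_BYTES = 715_120_000

LIMITS = {
    "encrypt": MAX_ENCRYPT_BYTES,
    "decrypt": MAX_DECRYPT_BYTES,
    "reencrypt": MAX_DECRYPT_BYTES,
}


def within_limit(size_in_bytes: int, operation: str) -> bool:
    """Return True if an input of this size is eligible for the operation."""
    return size_in_bytes <= LIMITS[operation]


# Measure the encoded byte length, not the character count.
payload = "hello".encode("utf-8")
print(within_limit(len(payload), "encrypt"))
print(within_limit(535_000_001, "encrypt"))
```

The size that matters is the byte length of the encoded value, so multi-byte characters count more than once toward the limit.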
ptyStringDec()
The UDF decrypts a binary value to get string data.
Signature:
ptyStringDec(Binary input, String DataElement)
Parameters:
- Binary input: Specifies the protected Binary value to unprotect.
- String DataElement: Specifies the name of the data element that was used to encrypt the string value, to decrypt the binary value.
Result:
- The UDF returns the decrypted string value.
Note: If you have previously stored the encrypted bytes as a Base64-encoded string, then decode them using the unbase64 Spark SQL built-in function before passing to the ptyStringDec UDF.
Example:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val protectStrDecUDF = sqlContext.udf.register("ptyStringDec",com.protegrity.spark.udf.ptyStringDec _)
val decrypt_spark = sqlContext.sql("select ptyStringDec(unbase64(col1),'AES128_CRC') as col1 from encrypted_spark").toDF()
decrypt_spark.show()
Exception:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit: The length of the input needs to be less than the maximum limit of 512 MB.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyStringDec() | No | | No | Yes | No | Yes |
ptyStringReEnc()
The UDF re-encrypts encrypted binary data with a different data element and returns the re-encrypted binary data.
Signature:
ptyStringReEnc(Binary input, String oldDataElement, String newDataElement)
Parameters:
- Binary input: Specifies the binary value to re-encrypt.
- String oldDataElement: Specifies the data element that was used to encrypt the data earlier.
- String newDataElement: Specifies the new data element to re-encrypt the data.
Result:
- The UDF returns the re-encrypted binary format data.
Note:
- If you have previously stored the encrypted bytes as a Base64 encoded string, then decode them using the unbase64 Spark SQL built-in function before passing them to the ptyStringReEnc UDF.
- To store the Binary output of the ptyStringReEnc UDF in a String column, use the Base64 Spark SQL built-in function to convert the output re-encrypted bytes into a Base64 encoded string.
Example:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val protectStrReEncUDF = sqlContext.udf.register("ptyStringReEnc",com.protegrity.spark.udf.ptyStringReEnc _)
val reencrypt_spark = sqlContext.sql("select base64(ptyStringReEnc(unbase64(col1),'AES128_CRC','AES128_CRC')) as col1 from encrypted_spark").toDF()
reencrypt_spark.show()
Exception:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit: The length of the input needs to be less than the maximum limit of 512 MB.
Supported Protection Methods:
| Function Name | Tokenization | Encryption | FPE | No Encryption | Masking | Monitoring |
|---|---|---|---|---|---|---|
| ptyStringReEnc() | No | | No | Yes | No | Yes |
8 - PySpark - Scala Wrapper UDFs
All the Spark Scala Wrapper UDFs that are available for protection and unprotection in Big Data Protector to build secure Big Data applications are listed here.
For each Spark SQL UDF listed in Spark SQL UDFs, a Scala UDF wrapper class is provided so that the UDF can be registered in PySpark and invoked using the spark.sql() method.
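The examples in this section appear to follow one naming pattern: the registered name is the UDF name with a ScalaWrapper suffix, and the class lives in the com.protegrity.spark.wrapper package. The Python sketch below builds those pairs. It is inferred from the examples only, so the pattern is an assumption that might not hold for every UDF, and the helper name wrapper_registration is hypothetical.

```python
# Package and suffix as they appear in the examples in this section.
WRAPPER_PACKAGE = "com.protegrity.spark.wrapper"
WRAPPER_SUFFIX = "ScalaWrapper"


def wrapper_registration(udf_name: str) -> tuple:
    """Return (name to register in PySpark, fully qualified wrapper class)."""
    return (udf_name + WRAPPER_SUFFIX, f"{WRAPPER_PACKAGE}.{udf_name}")


registrations = [
    wrapper_registration(name)
    for name in ("ptyGetVersion", "ptyWhoAmI", "ptyProtectStr")
]

for registered_name, class_name in registrations:
    print(registered_name, class_name)
```

Each pair corresponds to the first two arguments that the examples pass to spark.udf.registerJavaFunction(); the return type, where one is given, still has to be supplied for each UDF.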
ptyGetVersionScalaWrapper()
The UDF returns the current version of the protector.
Signature:
ptyGetVersionScalaWrapper()
Parameters:
- None
Result:
- The UDF returns the current version of the protector.
Example:
spark.udf.registerJavaFunction("ptyGetVersionScalaWrapper", "com.protegrity.spark.wrapper.ptyGetVersion")
spark.sql("select ptyGetVersionScalaWrapper()").show(truncate = False)
ptyGetVersionExtendedScalaWrapper()
The UDF returns the extended version information of the protector.
Signature:
ptyGetVersionExtendedScalaWrapper()
Parameters:
- None
Result:
- The UDF returns a String in the following format:
"BDP: <1>; JcoreLite: <2>; CORE: <3>;"
where,
- <1> is the current version of the Protector.
- <2> is the JcoreLite library version.
- <3> is the Core library version.
Example:
spark.udf.registerJavaFunction("ptyGetVersionExtendedScalaWrapper","com.protegrity.spark.wrapper.ptyGetVersionExtended")
spark.sql("select ptyGetVersionExtendedScalaWrapper()").show(truncate = False)
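If the individual versions are needed separately, the returned string can be split on its separators. The following Python sketch assumes the format shown in the Result section; the function name parse_extended_version is hypothetical and the version numbers in the sample are made up.

```python
def parse_extended_version(text: str) -> dict:
    """Split 'BDP: x; JcoreLite: y; CORE: z;' into a component-to-version mapping."""
    parts = {}
    for field in text.split(";"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece after the trailing semicolon
        key, _, value = field.partition(":")
        parts[key.strip()] = value.strip()
    return parts


# Made-up version numbers, for illustration only.
sample = "BDP: 1.2.3; JcoreLite: 4.5.6; CORE: 7.8.9;"
info = parse_extended_version(sample)
print(info)
```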
ptyWhoAmIScalaWrapper()
The UDF returns the currently logged-in user.
Signature:
ptyWhoAmIScalaWrapper()
Parameters:
- None
Result:
- The UDF returns the currently logged-in user.
Example:
spark.udf.registerJavaFunction("ptyWhoAmIScalaWrapper", "com.protegrity.spark.wrapper.ptyWhoAmI")
spark.sql("select ptyWhoAmIScalaWrapper()").show(truncate = False)
ptyProtectStrScalaWrapper()
The UDF protects the string format data that is provided as an input.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls within the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
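Input dates can be screened for the range named in the note above before they are passed to the protect API. The Python sketch below is one way to do that; the function name is hypothetical. Python's datetime.date uses the proleptic Gregorian calendar, so it can represent these dates even though they do not exist in the historical Gregorian calendar.

```python
from datetime import date

# Inclusive bounds of the range named in the note above.
GAP_START = date(1582, 10, 5)
GAP_END = date(1582, 10, 14)


def in_gregorian_cutover_gap(value: date) -> bool:
    """Return True if the date falls within 05-OCT-1582 to 14-OCT-1582."""
    return GAP_START <= value <= GAP_END


print(in_gregorian_cutover_gap(date(1582, 10, 10)))
print(in_gregorian_cutover_gap(date(1582, 10, 15)))
```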
Signature:
ptyProtectStrScalaWrapper(String colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the string format to protect.
- dataElement: Specifies the data element to protect the string format data.
Result:
- The UDF returns the protected data in the string format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectStrScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectStr", StringType())
spark.sql("select ptyProtectStrScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyProtectUnicodeScalaWrapper()
The UDF protects the string (Unicode) format data, which is provided as an input.
Warning: This UDF should be used only if you want to tokenize the Unicode data in PySpark, migrate the tokenized data from PySpark to a Teradata database, and detokenize the data using the Protegrity Database Protector. Ensure that you use this UDF with a Unicode tokenization data element only.
Signature:
ptyProtectUnicodeScalaWrapper(String colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the string (Unicode) format to protect.
- dataElement: Specifies the data element to protect the string (Unicode) format data.
Result:
- The UDF returns the protected data in the string format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectUnicodeScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectUnicode", StringType())
spark.sql("select ptyProtectUnicodeScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyProtectIntScalaWrapper()
The UDF protects the integer format data, which is provided as an input.
Signature:
ptyProtectIntScalaWrapper(Int colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the integer format to protect.
- dataElement: Specifies the data element to protect the integer format data.
Result:
- The UDF returns the protected data in the integer format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectIntScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectInt", IntegerType())
spark.sql("select ptyProtectIntScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyProtectShortScalaWrapper()
The UDF protects the short format data, which is provided as an input.
Signature:
ptyProtectShortScalaWrapper(Short colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the short format to protect.
- dataElement: Specifies the data element to protect the short format data.
Result:
- The UDF returns the protected data in the short format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectShortScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectShort", ShortType())
spark.sql("select ptyProtectShortScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyProtectLongScalaWrapper()
The UDF protects the long format data, which is provided as an input.
Signature:
ptyProtectLongScalaWrapper(Long colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the long format to protect.
- dataElement: Specifies the data element to protect the long format data.
Result:
- The UDF returns the protected data in the long format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectLongScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectLong", LongType())
spark.sql("select ptyProtectLongScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyProtectDateScalaWrapper()
The UDF protects the date format data, which is provided as an input.
Signature:
ptyProtectDateScalaWrapper(Date colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the date format to protect.
- dataElement: Specifies the data element to protect the date format data.
Result:
- The UDF returns the protected data in the date format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectDateScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectDate", DateType())
spark.sql("select ptyProtectDateScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyProtectDateTimeScalaWrapper()
The UDF protects the timestamp format data, which is provided as an input.
Signature:
ptyProtectDateTimeScalaWrapper(Timestamp colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the timestamp format to protect.
- dataElement: Specifies the data element to protect the timestamp format data.
Result:
- The UDF returns the protected data in the timestamp format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectDateTimeScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectDateTime", TimestampType())
spark.sql("select ptyProtectDateTimeScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyProtectFloatScalaWrapper()
The UDF protects the float format data, which is provided as an input.
Caution: The Float, Double, and Decimal UDFs will be deprecated in a future version of the Big Data Protector and should not be used.
It is recommended that you do not pass the Float, Double, or Decimal data type directly to the Float, Double, or Decimal UDFs of Protegrity.
If you want to protect Float data, including data that is currently passed to a Float UDF, then convert the Float data to the String data type and pass the converted String to the ptyProtectStrScalaWrapper() UDF with the Float tokenizer. Ensure that the precision and scale of the input data are maintained during conversion.
Warning: Protegrity will not be responsible for any type of data conversion error that might occur during conversion.
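One way to keep the precision of a 32-bit Float during conversion to a string is to format it with 9 significant digits, which is enough to identify any 32-bit float value. The Python sketch below illustrates this outside Spark; the helper names are hypothetical, and the struct round trip is only a stand-in for a Spark Float column, because Python floats are 64-bit.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_to_string(value: float) -> str:
    """Format with 9 significant digits so the 32-bit value can be recovered."""
    return format(to_float32(value), ".9g")


original = to_float32(1234.345)
text = float32_to_string(original)
restored = to_float32(float(text))

print(text)
print(restored == original)
```

The string differs from the literal 1234.345 because the 32-bit value stored for that literal is not exactly 1234.345, which is the kind of conversion effect the caution above asks you to account for.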
Signature:
ptyProtectFloatScalaWrapper(Float colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the float format to protect.
- dataElement: Specifies the data element to protect the float format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the protected data in the float format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectFloatScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectFloat", FloatType())
spark.sql("select ptyProtectFloatScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyProtectDoubleScalaWrapper()
The UDF protects the double format data, which is provided as an input.
Caution: The Float, Double, and Decimal UDFs will be deprecated in a future version of the Big Data Protector and should not be used.
It is recommended that you do not pass the Float, Double, or Decimal data type directly to the Float, Double, or Decimal UDFs of Protegrity.
If you want to protect Double data, including data that is currently passed to a Double UDF, then convert the Double data to the String data type and pass the converted String to the ptyProtectStrScalaWrapper() UDF with the Double tokenizer. Ensure that the precision and scale of the input data are maintained during conversion.
Warning: Protegrity will not be responsible for any type of data conversion error that might occur during conversion.
Signature:
ptyProtectDoubleScalaWrapper(Double colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the double format to protect.
- dataElement: Specifies the data element to protect the double format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the protected data in the double format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectDoubleScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectDouble", DoubleType())
spark.sql("select ptyProtectDoubleScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyProtectDecimalScalaWrapper()
The UDF protects the decimal format data, which is provided as an input.
Caution: The Float, Double, and Decimal UDFs will be deprecated in a future version of the Big Data Protector and should not be used.
It is recommended not to use the Float or Double or Decimal data type directly in the Float or Double or Decimal UDFs of Protegrity.
If you want to protect the Decimal data type, then convert the Decimal data to String data type and pass the Decimal converted String data type to the ptyProtectStrScalaWrapper() UDF with the Decimal tokenizer. Ensure that the right precision and scale of input data are maintained during conversion.
If there is a Decimal datatype UDF with the Decimal input, then convert the Decimal to string data type and pass the Decimal converted string data type to ptyProtectStrScalaWrapper() UDF with the decimal tokenizer.
Warning: Protegrity will not be responsible for any type of data conversion error that might occur during conversion.
Signature:
ptyProtectDecimalScalaWrapper(Decimal colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the Decimal format to protect.
- dataElement: Specifies the data element to protect the Decimal format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Caution: Before the ptyProtectDecimalScalaWrapper() UDF is called, Spark SQL rounds off the decimal value in the table to 18 digits in scale, irrespective of the length of the data.
Result:
- The UDF returns the protected data in the Decimal format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyProtectDecimalScalaWrapper", "com.protegrity.spark.wrapper.ptyProtectDecimal", DecimalType(precision=10, scale=4))
spark.sql("select ptyProtectDecimalScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
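The advice above to keep precision and scale intact when converting Decimal data to a string can be illustrated on the Python side. The sketch below is an assumption about one reasonable approach, not part of the Protegrity API: the helper name decimal_to_fixed_string and the choice to raise an error instead of rounding are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def decimal_to_fixed_string(value: Decimal, scale: int) -> str:
    """Render a Decimal with exactly `scale` fractional digits.

    Raises ValueError if that would drop digits, so no precision is
    lost silently during the Decimal-to-String conversion.
    """
    step = Decimal(1).scaleb(-scale)  # scale=4 gives Decimal('0.0001')
    quantized = value.quantize(step, rounding=ROUND_HALF_UP)
    if quantized != value:
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return format(quantized, "f")  # fixed-point notation, no exponent


print(decimal_to_fixed_string(Decimal("12.5"), 4))  # 12.5000
```

Failing loudly on excess digits is a design choice for this sketch; a pipeline that prefers rounding could return the quantized value instead.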
ptyUnprotectStrScalaWrapper()
The UDF unprotects the string format data, which is provided as an input.
Note: For Date and Datetime type of data elements, the protect API returns an invalid input data error if the input value falls between the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian Calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer Date and Datetime tokenization.
Signature:
ptyUnprotectStrScalaWrapper(String colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the string format to unprotect.
- dataElement: Specifies the data element to unprotect the string format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the unprotected data in the string format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectStrScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectStr", StringType())
spark.sql("select ptyUnprotectStrScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
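The note above about the non-existent date range can be turned into a pre-check that runs before Date or Datetime values are sent to a protect call. This is a minimal sketch: the function name in_gregorian_cutover_gap is hypothetical, and treating both end dates as inclusive is an assumption based on the "from 05-OCT-1582 to 14-OCT-1582" wording.

```python
from datetime import date

# Dates skipped at the Gregorian cutover; bounds assumed inclusive.
GAP_START = date(1582, 10, 5)
GAP_END = date(1582, 10, 14)


def in_gregorian_cutover_gap(value: date) -> bool:
    """Return True if `value` falls in the non-existent date range."""
    return GAP_START <= value <= GAP_END


print(in_gregorian_cutover_gap(date(1582, 10, 10)))  # True
print(in_gregorian_cutover_gap(date(1582, 10, 15)))  # False
```

Python's date type uses the proleptic Gregorian calendar, so it accepts these dates even though the protect API rejects them.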
ptyUnprotectUnicodeScalaWrapper()
The UDF unprotects the string (unicode) format data, which is provided as an input.
Warning: This UDF should be used only if you want to tokenize the Unicode data in Teradata using the Protegrity Database Protector, and migrate the tokenized data from a Teradata database to PySpark and detokenize the data using the Protegrity Big Data Protector for PySpark. Ensure that you use this UDF with a Unicode tokenization data element only.
Signature:
ptyUnprotectUnicodeScalaWrapper(String colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the string (unicode) format to unprotect.
- dataElement: Specifies the data element to unprotect the string (unicode) format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the unprotected data in the string (unicode) format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectUnicodeScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectUnicode", StringType())
spark.sql("select ptyUnprotectUnicodeScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyUnprotectIntScalaWrapper()
The UDF unprotects the integer format data, which is provided as an input.
Signature:
ptyUnprotectIntScalaWrapper(Int colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the integer format to unprotect.
- dataElement: Specifies the data element to unprotect the integer format data.
Caution: If an unauthorized user, with no privileges to unprotect data in the security policy, and the output value set to NULL, attempts to unprotect the protected data of Numeric type data containing Short, Int, Float, Long, Double, and Decimal format values using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected data in the integer format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectIntScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectInt", IntegerType())
spark.sql("select ptyUnprotectIntScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyUnprotectShortScalaWrapper()
The UDF unprotects the short format data, which is provided as an input.
Signature:
ptyUnprotectShortScalaWrapper(Short colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the short format to unprotect.
- dataElement: Specifies the data element to unprotect the short format data.
Caution: If an unauthorized user, with no privileges to unprotect data in the security policy, and the output value set to NULL, attempts to unprotect the protected data of Numeric type data containing Short, Int, Float, Long, Double, and Decimal format values using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected data in the short format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectShortScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectShort", ShortType())
spark.sql("select ptyUnprotectShortScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyUnprotectLongScalaWrapper()
The UDF unprotects the long format data, which is provided as an input.
Signature:
ptyUnprotectLongScalaWrapper(Long colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the long format to unprotect.
- dataElement: Specifies the data element to unprotect the long format data.
Caution: If an unauthorized user, with no privileges to unprotect data in the security policy, and the output value set to NULL, attempts to unprotect the protected data of Numeric type data containing Short, Int, Float, Long, Double, and Decimal format values using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected data in the long format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectLongScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectLong", LongType())
spark.sql("select ptyUnprotectLongScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyUnprotectDateScalaWrapper()
The UDF unprotects the date format data, which is provided as an input.
Signature:
ptyUnprotectDateScalaWrapper(Date colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the date format to unprotect.
- dataElement: Specifies the data element to unprotect the date format data.
Result:
- The UDF returns the unprotected data in the date format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectDateScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectDate", DateType())
spark.sql("select ptyUnprotectDateScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyUnprotectDateTimeScalaWrapper()
The UDF unprotects the timestamp format data, which is provided as an input.
Signature:
ptyUnprotectDateTimeScalaWrapper(Timestamp colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the timestamp format to unprotect.
- dataElement: Specifies the data element to unprotect the timestamp format data.
Result:
- The UDF returns the unprotected data in the timestamp format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectDateTimeScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectDateTime", TimestampType())
spark.sql("select ptyUnprotectDateTimeScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyUnprotectFloatScalaWrapper()
The UDF unprotects the float format data, which is provided as an input.
Caution: The Float, Double, and Decimal UDFs will be deprecated in a future version of the Big Data Protector and should not be used.
It is recommended not to use the Float or Double or Decimal data type directly in the Float or Double or Decimal UDFs of Protegrity.
If you want to protect the Float data type, then convert the Float data to String data type and pass the Float converted String data type to the ptyProtectStrScalaWrapper() UDF with the Float tokenizer. Ensure that the right precision and scale of input data are maintained during conversion.
If there is a Float datatype UDF with the Float input, then convert the Float to string data type and pass the Float converted string data type to ptyProtectStrScalaWrapper() UDF with the Float tokenizer.
Warning: Protegrity will not be responsible for any type of data conversion error that might occur during conversion.
Signature:
ptyUnprotectFloatScalaWrapper(Float colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the float format to unprotect.
- dataElement: Specifies the data element to unprotect the float format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Caution: If an unauthorized user, with no privileges to unprotect data in the security policy, and the output value set to NULL, attempts to unprotect the protected data of Numeric type data containing Short, Int, Float, Long, Double, and Decimal format values using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected data in the float format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectFloatScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectFloat", FloatType())
spark.sql("select ptyUnprotectFloatScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyUnprotectDoubleScalaWrapper()
The UDF unprotects the double format data, which is provided as an input.
Caution: The Float, Double, and Decimal UDFs will be deprecated in a future version of the Big Data Protector and should not be used.
It is recommended not to use the Float or Double or Decimal data type directly in the Float or Double or Decimal UDFs of Protegrity.
If you want to protect the Double data type, then convert the Double data to String data type and pass the Double converted String data type to the ptyProtectStrScalaWrapper() UDF with the Double tokenizer. Ensure that the right precision and scale of input data are maintained during conversion.
If there is a Double datatype UDF with the Double input, then convert the Double to string data type and pass the Double converted string data type to ptyProtectStrScalaWrapper() UDF with the Double tokenizer.
Warning: Protegrity will not be responsible for any type of data conversion error that might occur during conversion.
Signature:
ptyUnprotectDoubleScalaWrapper(Double colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the double format to unprotect.
- dataElement: Specifies the data element to unprotect the double format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the unprotected data in the double format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectDoubleScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectDouble", DoubleType())
spark.sql("select ptyUnprotectDoubleScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyUnprotectDecimalScalaWrapper()
The UDF unprotects the decimal format data, which is provided as an input.
Caution: The Float, Double, and Decimal UDFs will be deprecated in a future version of the Big Data Protector and should not be used.
It is recommended not to use the Float or Double or Decimal data type directly in the Float or Double or Decimal UDFs of Protegrity.
If you want to protect the Decimal data type, then convert the Decimal data to String data type and pass the Decimal converted String data type to the ptyProtectStrScalaWrapper() UDF with the Decimal tokenizer. Ensure that the right precision and scale of input data are maintained during conversion.
If there is a Decimal datatype UDF with the Decimal input, then convert the Decimal to string data type and pass the Decimal converted string data type to ptyProtectStrScalaWrapper() UDF with the decimal tokenizer.
Warning: Protegrity will not be responsible for any type of data conversion error that might occur during conversion.
Signature:
ptyUnprotectDecimalScalaWrapper(Decimal colName, String dataElement)
Parameters:
- colName: Specifies the column that contains the data in the Decimal format to unprotect.
- dataElement: Specifies the data element to unprotect the Decimal format data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Caution: Before the ptyUnprotectDecimalScalaWrapper() UDF is called, Spark SQL rounds off the decimal value in the table to 18 digits in scale, irrespective of the length of the data.
Caution: If an unauthorized user, with no privileges to unprotect data in the security policy, and the output value set to NULL, attempts to unprotect the protected data of Numeric type data containing Short, Int, Float, Long, Double, and Decimal format values using the respective Spark SQL UDFs, then the output is 0.
Result:
- The UDF returns the unprotected data in the Decimal format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyUnprotectDecimalScalaWrapper", "com.protegrity.spark.wrapper.ptyUnprotectDecimal", DecimalType(precision=10, scale=4))
spark.sql("select ptyUnprotectDecimalScalaWrapper(column1, 'Data_Element') from table1;").show(truncate = False)
ptyReprotectStrScalaWrapper()
The UDF reprotects the string format protected data that was earlier protected using the ptyProtectStrScalaWrapper UDF, with a different data element.
Signature:
ptyReprotectStrScalaWrapper(String colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the string format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected string format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectStrScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectStr", StringType())
spark.sql("select ptyReprotectStrScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
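The reprotect signature above takes the old and the new data element as separate arguments. A minimal sketch of a query builder that always supplies both is shown below; the helper name build_reprotect_query and the data element names Old_Data_Element and New_Data_Element are hypothetical placeholders, not values from a real policy.

```python
def build_reprotect_query(udf: str, column: str, old_de: str, new_de: str, table: str) -> str:
    """Build a Spark SQL query that calls a reprotect UDF with the
    column, the old data element, and the new data element."""
    for name in (udf, column, old_de, new_de, table):
        # Reject empty names and characters that could break out of the SQL text.
        if not name or any(ch in name for ch in "'\"\\;"):
            raise ValueError(f"unsafe name: {name!r}")
    return f"select {udf}({column}, '{old_de}', '{new_de}') from {table}"


reprotect_query = build_reprotect_query(
    "ptyReprotectStrScalaWrapper", "column1",
    "Old_Data_Element", "New_Data_Element", "table1",
)
print(reprotect_query)
```

The same builder could serve the other reprotect wrappers, because they share the three-argument shape.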
ptyReprotectUnicodeScalaWrapper()
The UDF reprotects the string format protected data that was earlier protected using the ptyProtectUnicodeScalaWrapper UDF, with a different data element.
Warning: This UDF should be used only if you want to tokenize the Unicode data in PySpark, and migrate the tokenized data from PySpark to a Teradata database and detokenize the data using the Protegrity Database Protector. Ensure that you use this UDF with a Unicode tokenization data element only.
Signature:
ptyReprotectUnicodeScalaWrapper(String colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the string format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected string format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectUnicodeScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectUnicode", StringType())
spark.sql("select ptyReprotectUnicodeScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
ptyReprotectIntScalaWrapper()
The UDF reprotects the integer format protected data that was earlier protected with a different data element.
Signature:
ptyReprotectIntScalaWrapper(Int colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the integer format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected integer format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectIntScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectInt", IntegerType())
spark.sql("select ptyReprotectIntScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
ptyReprotectShortScalaWrapper()
The UDF reprotects the short format protected data that was earlier protected with a different data element.
Signature:
ptyReprotectShortScalaWrapper(Short colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the short format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected short format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectShortScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectShort", ShortType())
spark.sql("select ptyReprotectShortScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
ptyReprotectLongScalaWrapper()
The UDF reprotects the long format protected data that was earlier protected with a different data element.
Signature:
ptyReprotectLongScalaWrapper(Long colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the long format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected long format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectLongScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectLong", LongType())
spark.sql("select ptyReprotectLongScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
ptyReprotectDateScalaWrapper()
The UDF reprotects the date format protected data that was earlier protected with a different data element.
Signature:
ptyReprotectDateScalaWrapper(Date colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the date format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected date format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectDateScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectDate", DateType())
spark.sql("select ptyReprotectDateScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
ptyReprotectDateTimeScalaWrapper()
The UDF reprotects the timestamp format protected data that was earlier protected with a different data element.
Signature:
ptyReprotectDateTimeScalaWrapper(Timestamp colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the timestamp format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Result:
- The UDF returns the protected timestamp format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectDateTimeScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectDateTime", TimestampType())
spark.sql("select ptyReprotectDateTimeScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
ptyReprotectFloatScalaWrapper()
The UDF reprotects the float format data, which is provided as an input.
Caution: The Float, Double, and Decimal UDFs will be deprecated in a future version of the Big Data Protector and should not be used.
It is recommended not to use the Float or Double or Decimal data type directly in the Float or Double or Decimal UDFs of Protegrity.
If you want to protect the Float data type, then convert the Float data to String data type and pass the Float converted String data type to the ptyProtectStrScalaWrapper() UDF with the Float tokenizer. Ensure that the right precision and scale of input data are maintained during conversion.
If there is a Float datatype UDF with the Float input, then convert the Float to string data type and pass the Float converted string data type to ptyProtectStrScalaWrapper() UDF with the Float tokenizer.
Warning: Protegrity will not be responsible for any type of data conversion error that might occur during conversion.
Signature:
ptyReprotectFloatScalaWrapper(Float colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the float format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the protected data in the float format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectFloatScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectFloat", FloatType())
spark.sql("select ptyReprotectFloatScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
ptyReprotectDoubleScalaWrapper()
The UDF reprotects the double format data, which is provided as an input.
Caution: The Float, Double, and Decimal UDFs will be deprecated in a future version of the Big Data Protector and should not be used.
It is recommended not to use the Float or Double or Decimal data type directly in the Float or Double or Decimal UDFs of Protegrity.
If you want to protect the Double data type, then convert the Double data to String data type and pass the Double converted String data type to the ptyProtectStrScalaWrapper() UDF with the Double tokenizer. Ensure that the right precision and scale of input data are maintained during conversion.
If there is a Double datatype UDF with the Double input, then convert the Double to string data type and pass the Double converted string data type to ptyProtectStrScalaWrapper() UDF with the Double tokenizer.
Warning: Protegrity will not be responsible for any type of data conversion error that might occur during conversion.
Signature:
ptyReprotectDoubleScalaWrapper(Double colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the double format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Result:
- The UDF returns the protected data in the double format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectDoubleScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectDouble", DoubleType())
spark.sql("select ptyReprotectDoubleScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
ptyReprotectDecimalScalaWrapper()
The UDF reprotects the decimal format data, which is provided as an input.
Caution: The Float, Double, and Decimal UDFs will be deprecated in a future version of the Big Data Protector and should not be used.
It is recommended not to use the Float or Double or Decimal data type directly in the Float or Double or Decimal UDFs of Protegrity.
If you want to protect the Decimal data type, then convert the Decimal data to String data type and pass the Decimal converted String data type to the ptyProtectStrScalaWrapper() UDF with the Decimal tokenizer. Ensure that the right precision and scale of input data are maintained during conversion.
If there is a Decimal datatype UDF with the Decimal input, then convert the Decimal to string data type and pass the Decimal converted string data type to ptyProtectStrScalaWrapper() UDF with the decimal tokenizer.
Warning: Protegrity will not be responsible for any type of data conversion error that might occur during conversion.
Signature:
ptyReprotectDecimalScalaWrapper(Decimal colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains the data in the Decimal format to be reprotected.
- oldDataElement: Specifies the data element that was used to protect the data earlier.
- newDataElement: Specifies the new data element that will be used to reprotect the data.
Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.
Caution: Before the ptyReprotectDecimalScalaWrapper() UDF is called, Spark SQL rounds off the decimal value in the table to 18 digits in scale, irrespective of the length of the data.
Result:
- The UDF returns the protected data in the Decimal format.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyReprotectDecimalScalaWrapper", "com.protegrity.spark.wrapper.ptyReprotectDecimal", DecimalType(precision=10, scale=4))
spark.sql("select ptyReprotectDecimalScalaWrapper(column1, 'Old_Data_Element', 'New_Data_Element') from table1;").show(truncate = False)
ptyStringEncScalaWrapper()
The UDF encrypts the string value, provided as an input, to get binary data.
Signature:
ptyStringEncScalaWrapper(String colName, String dataElement)
Parameters:
- colName: Specifies the column that contains data in String format to be encrypted.
- dataElement: The data element in the String format that will be used to encrypt the data.
Result:
- The UDF returns the encrypted binary format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyStringEncScalaWrapper", "com.protegrity.spark.wrapper.ptyStringEnc", BinaryType())
spark.sql("select ptyStringEncScalaWrapper (column1, 'Data_Element') from table1;").show(truncate = False)
ptyStringDecScalaWrapper()
The UDF decrypts the binary value, provided as an input, to get string data.
Signature:
ptyStringDecScalaWrapper(Binary colName, String dataElement)
Parameters:
- colName: Specifies the column that contains data in binary format to be decrypted.
- dataElement: The data element in the String format that will be used to decrypt the data.
Result:
- The UDF returns the decrypted string format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyStringDecScalaWrapper", "com.protegrity.spark.wrapper.ptyStringDec", StringType())
spark.sql("select ptyStringDecScalaWrapper (column1, 'Data_Element') from table1;").show(truncate = False)
ptyStringReEncScalaWrapper()
The UDF re-encrypts the binary value, provided as an input, to get another binary data.
Signature:
ptyStringReEncScalaWrapper (Binary colName, String oldDataElement, String newDataElement)
Parameters:
- colName: Specifies the column that contains data in the Binary format to be re-encrypted.
- oldDataElement: Specifies the data element name in the String format that was previously used to encrypt the data.
- newDataElement: Specifies the name of the new data element in the String format to re-encrypt the data.
Result:
- The UDF returns the re-encrypted binary format data.
Example:
from pyspark.sql.types import *
spark.udf.registerJavaFunction("ptyStringReEncScalaWrapper", "com.protegrity.spark.wrapper.ptyStringReEnc", BinaryType())
spark.sql("select ptyStringReEncScalaWrapper (column1, 'Old_Data_Element', 'New_Data_Element' ) from table1;").show(truncate = False)
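The PySpark examples in this section all register a wrapper whose Java class name is the wrapper name without the ScalaWrapper suffix, placed in the com.protegrity.spark.wrapper package. The sketch below captures that observed pattern. The helper names java_class_for and register_wrappers are hypothetical, and the naming rule is inferred from the examples only, so verify each class name against your installed build before relying on it.

```python
PACKAGE = "com.protegrity.spark.wrapper"
SUFFIX = "ScalaWrapper"


def java_class_for(wrapper_name: str) -> str:
    """Derive the Java class name used in the examples from a wrapper name."""
    if not wrapper_name.endswith(SUFFIX) or wrapper_name == SUFFIX:
        raise ValueError(f"not a ScalaWrapper UDF name: {wrapper_name!r}")
    return f"{PACKAGE}.{wrapper_name[:-len(SUFFIX)]}"


def register_wrappers(spark, return_types: dict) -> None:
    """Register each wrapper on an existing SparkSession.

    `return_types` maps a wrapper name to its Spark return type,
    for example {"ptyUnprotectIntScalaWrapper": IntegerType()}.
    """
    for name, return_type in return_types.items():
        spark.udf.registerJavaFunction(name, java_class_for(name), return_type)


print(java_class_for("ptyUnprotectIntScalaWrapper"))
```

Passing the return types in from the caller keeps the helper free of a hard PySpark import, which makes the naming rule easy to check without a cluster.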
9 - Unity Catalog Batch Python UDFs
The UDFs in this section apply only when the Big Data Protector is installed and configured on the Standard Compute in Databricks. The information in this section does not apply to the Dedicated Compute or the SQL Warehouse.
This version of the build only supports Unity Catalog Batch Python UDFs that use the Cloud Protect APIs. The Hive and Spark UDFs and APIs that provide native protection within the cluster nodes are not packaged in this build. If you want to use those features, please use the 9.1.0.0 builds.
pty_protect_binary()
This UDF protects the BINARY format data, which is provided as input.
Signature:
pty_protect_binary (input BINARY, data_element STRING)
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in BINARY format, which needs to be protected. |
| data_element | Specifies the data element used to protect the BINARY format data. |
Returns:
This UDF returns the BINARY format data, which is protected.
Example:
SELECT pty_protect_binary(<column_with_binary_data>, "<binary_data_element>");
pty_unprotect_binary()
This UDF unprotects the protected BINARY data, which is provided as an input.
Signature:
pty_unprotect_binary (input BINARY, data_element STRING)
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in BINARY format, which needs to be unprotected. |
| data_element | Specifies the data element used to unprotect the BINARY format data. |
Returns:
This UDF returns the BINARY format data, which is unprotected.
Example:
SELECT pty_unprotect_binary(<column_with_protected_binary_data>, "<binary_data_element>");
pty_protect_date()
This UDF protects the DATE format data, which is provided as input.
Signature:
pty_protect_date (input DATE, data_element STRING)
The supported DATE format is YYYY-MM-DD.
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in DATE format, which needs to be protected. |
| data_element | Specifies the data element used to protect the DATE format data. |
Returns:
This UDF returns the DATE format data, which is protected.
Example:
SELECT pty_protect_date(<column_with_date_data>, "de_date");
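Because the only supported DATE format is YYYY-MM-DD, it can help to check source values before they reach the UDF. The following is a minimal sketch of such a check, not part of the product: the helper name `is_supported_date` is hypothetical, and it assumes that the dates are available as strings on the client side before they are loaded into a DATE column.

```python
from datetime import datetime


def is_supported_date(value: str) -> bool:
    """Return True if value is a valid calendar date written as YYYY-MM-DD.

    Hypothetical pre-check; it does not call any Protegrity API.
    """
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Wrong layout, trailing text, or an impossible date such as 2023-02-29.
        return False
    # strptime also accepts unpadded fields such as "2024-1-5", so compare
    # against the zero-padded ISO rendering to enforce the exact layout.
    return parsed.date().isoformat() == value
```

Values that fail this check can be corrected or set aside before the column is passed to pty_protect_date().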
pty_unprotect_date()
This UDF unprotects the protected DATE data, which is provided as an input.
Signature:
pty_unprotect_date (input DATE, data_element STRING)
The supported DATE format is YYYY-MM-DD.
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in DATE format, which needs to be unprotected. |
| data_element | Specifies the data element used to unprotect the DATE format data. |
Returns:
This UDF returns the DATE format data, which is unprotected.
Example:
SELECT pty_unprotect_date(<column_with_protected_date_data>, "de_date");
pty_protect_int()
This UDF protects the INT format data, which is provided as input.
Signature:
pty_protect_int (input INT, data_element STRING)
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in INT format, which needs to be protected. |
| data_element | Specifies the data element used to protect the INT format data. |
Returns:
This UDF returns the INT format data, which is protected.
Example:
SELECT pty_protect_int(<column_with_int_data>, "de_int4");
pty_unprotect_int()
This UDF unprotects the protected INT data, which is provided as an input.
Signature:
pty_unprotect_int (input INT, data_element STRING)
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in INT format, which needs to be unprotected. |
| data_element | Specifies the data element used to unprotect the INT format data. |
Returns:
This UDF returns the INT format data, which is unprotected.
Example:
SELECT pty_unprotect_int(<column_with_protected_int_data>, "de_int4");
pty_protect_smallint()
This UDF protects the SMALLINT format data, which is provided as input.
Signature:
pty_protect_smallint (input SMALLINT, data_element STRING)
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in SMALLINT format, which needs to be protected. |
| data_element | Specifies the data element used to protect the SMALLINT format data. |
Returns:
This UDF returns the SMALLINT format data, which is protected.
Example:
SELECT pty_protect_smallint(<column_with_smallint_data>, "de_int2");
pty_unprotect_smallint()
This UDF unprotects the protected SMALLINT data, which is provided as an input.
Signature:
pty_unprotect_smallint (input SMALLINT, data_element STRING)
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in SMALLINT format, which needs to be unprotected. |
| data_element | Specifies the data element used to unprotect the SMALLINT format data. |
Returns:
This UDF returns the SMALLINT format data, which is unprotected.
Example:
SELECT pty_unprotect_smallint(<column_with_protected_smallint_data>, "de_int2");
pty_protect_string()
This UDF protects the STRING format data, which is provided as input.
For BIGINT, DATETIME, DECIMAL, DOUBLE, and FLOAT data types, it is recommended to use the pty_protect_string() UDF.
For example:
SELECT pty_protect_string(CAST(<column_with_input_data> AS STRING), "<data_element>");
It is recommended to use the following data elements corresponding to their input data type:
- For BIGINT input, use an integer data element.
  SELECT pty_protect_string(CAST(<column_with_bigint_data> AS STRING), "de_int8");
- For DATETIME input, use a date or date time data element.
  SELECT pty_protect_string(CAST(<column_with_datetime_data> AS STRING), "de_datetime");
  SELECT pty_protect_string(CAST(<column_with_datetime_data> AS STRING), "de_date");
- For DECIMAL input, use a decimal data element.
  SELECT pty_protect_string(CAST(<column_with_decimal_data> AS STRING), "de_decimal");
- For DOUBLE input, use a decimal, numeric, or no encryption data element.
  SELECT pty_protect_string(CAST(<column_with_double_data> AS STRING), "de_decimal");
  SELECT pty_protect_string(CAST(<column_with_double_data> AS STRING), "de_numeric");
- For FLOAT input, use a decimal, numeric, or no encryption data element.
  SELECT pty_protect_string(CAST(<column_with_float_data> AS STRING), "de_decimal");
  SELECT pty_protect_string(CAST(<column_with_float_data> AS STRING), "de_numeric");
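The type-to-data-element recommendations above can be captured in a small lookup when queries are generated programmatically. The following is a minimal sketch under stated assumptions: the helper `build_protect_query`, the `FROM` clause, and the table and column names are hypothetical, and the data element names are simply the ones used in the examples above, so the names in a real deployment depend on the configured policy. Where the list offers more than one option, the sketch picks one.

```python
# Example data element names taken from the examples in this section.
# One option is chosen where the documentation lists several.
DATA_ELEMENT_BY_TYPE = {
    "BIGINT": "de_int8",
    "DATETIME": "de_datetime",
    "DECIMAL": "de_decimal",
    "DOUBLE": "de_decimal",
    "FLOAT": "de_decimal",
}


def build_protect_query(column: str, column_type: str, table: str) -> str:
    """Build a SELECT that casts column to STRING and calls pty_protect_string().

    column and table are inserted as-is, so they must be trusted identifiers.
    """
    key = column_type.strip().upper()
    if key not in DATA_ELEMENT_BY_TYPE:
        raise ValueError(f"No recommended data element for type {column_type!r}")
    data_element = DATA_ELEMENT_BY_TYPE[key]
    return (
        f"SELECT pty_protect_string(CAST({column} AS STRING), "
        f'"{data_element}") FROM {table}'
    )
```

The returned string can then be passed to whichever SQL client runs the query.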
Signature:
pty_protect_string (input STRING, data_element STRING)
The UDF accepts a maximum input length of 4081 characters.
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in STRING format, which needs to be protected. |
| data_element | Specifies the data element used to protect the STRING format data. |
Returns:
This UDF returns the STRING format data, which is protected.
Example:
SELECT pty_protect_string(<column_with_string_data>, "de_alphanum");
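Since pty_protect_string() accepts at most 4081 characters, oversized values can be identified before the UDF is called. The following is a minimal sketch of such a pre-check: the helper `split_by_length` is hypothetical, and it assumes that counting Python string characters (Unicode code points) matches how the UDF measures length, which this section does not specify.

```python
MAX_INPUT_CHARS = 4081  # limit stated for pty_protect_string() in this section


def split_by_length(values):
    """Split values into (accepted, rejected) lists by the documented limit.

    Assumption: len() counts characters the same way the UDF does.
    """
    accepted, rejected = [], []
    for value in values:
        if len(value) <= MAX_INPUT_CHARS:
            accepted.append(value)
        else:
            rejected.append(value)
    return accepted, rejected
```

Rejected values can be logged or handled separately instead of being sent to the UDF.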
pty_unprotect_string()
This UDF unprotects the protected STRING data, which is provided as an input.
For BIGINT, DATETIME, DECIMAL, DOUBLE, and FLOAT data types, it is recommended to use the pty_unprotect_string() UDF.
For example:
SELECT pty_unprotect_string(CAST(<column_with_protected_data> AS STRING), "<data_element>");
It is recommended to use the following data elements corresponding to their input data type:
- For BIGINT input, use an integer data element.
  SELECT pty_unprotect_string(CAST(<column_with_protected_bigint_data> AS STRING), "de_int8");
- For DATETIME input, use a date or date time data element.
  SELECT pty_unprotect_string(CAST(<column_with_protected_datetime_data> AS STRING), "de_datetime");
  SELECT pty_unprotect_string(CAST(<column_with_protected_datetime_data> AS STRING), "de_date");
- For DECIMAL input, use a decimal data element.
  SELECT pty_unprotect_string(CAST(<column_with_protected_decimal_data> AS STRING), "de_decimal");
- For DOUBLE input, use a decimal, numeric, or no encryption data element.
  SELECT pty_unprotect_string(CAST(<column_with_protected_double_data> AS STRING), "de_decimal");
  SELECT pty_unprotect_string(CAST(<column_with_protected_double_data> AS STRING), "de_numeric");
- For FLOAT input, use a decimal, numeric, or no encryption data element.
  SELECT pty_unprotect_string(CAST(<column_with_protected_float_data> AS STRING), "de_decimal");
  SELECT pty_unprotect_string(CAST(<column_with_protected_float_data> AS STRING), "de_numeric");
Signature:
pty_unprotect_string (input STRING, data_element STRING)
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in STRING format, which needs to be unprotected. |
| data_element | Specifies the data element used to unprotect the STRING format data. |
Returns:
This UDF returns the STRING format data, which is unprotected.
Example:
SELECT pty_unprotect_string(<column_with_protected_string_data>, "de_alphanum");
pty_encrypt_string()
This UDF encrypts STRING format data, which is provided as input.
Signature:
pty_encrypt_string (input STRING, data_element STRING)
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains data in STRING format, which needs to be encrypted. |
| data_element | Specifies the data element used to encrypt the STRING format data. |
Returns:
This UDF returns the BINARY format data, which is encrypted.
Example:
SELECT pty_encrypt_string(<column_with_string_data>, "<encryption_data_element>");
pty_decrypt_string()
This UDF decrypts the encrypted BINARY data, which is provided as an input.
Signature:
pty_decrypt_string (input BINARY, data_element STRING)
Parameters:
| Name | Description |
|---|---|
| input | Specifies the column that contains the data in the BINARY format, which needs to be decrypted. |
| data_element | Specifies the data element used to decrypt the BINARY format data. |
Returns:
This UDF returns the STRING format data, which is decrypted.
Example:
SELECT pty_decrypt_string(<column_with_encrypted_string_data>, "<encryption_data_element>");
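The encrypt and decrypt UDFs pair up by type: pty_encrypt_string() takes STRING and returns BINARY, while pty_decrypt_string() takes that BINARY and returns STRING. The following is a minimal sketch that builds a query string illustrating the pairing. The helper `build_round_trip_query`, the `FROM` clause, the `round_trip` alias, and the sample names in the test are hypothetical, and whether the two UDFs can be nested in a single statement in a given environment is an assumption that this section does not confirm.

```python
def build_round_trip_query(column: str, data_element: str, table: str) -> str:
    """Build a SELECT that encrypts a STRING column and decrypts the result.

    Assumption: the same data element is used for both operations, and the
    arguments are trusted identifiers that are safe to insert as-is.
    """
    # STRING in, BINARY out
    encrypted = f'pty_encrypt_string({column}, "{data_element}")'
    # BINARY in, STRING out
    decrypted = f'pty_decrypt_string({encrypted}, "{data_element}")'
    return f"SELECT {column}, {decrypted} AS round_trip FROM {table}"
```

If nesting is not available, the same pairing can be applied in two steps by storing the encrypted output in a BINARY column first.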