Spark SQL UDFs

This section lists the Spark SQL UDFs that Big Data Protector provides for protecting and unprotecting data when building secure Big Data applications.

Introduction

The Spark SQL module adds relational data processing capabilities to Spark and allows you to run SQL queries from Spark programs. It provides DataFrames, which are RDDs with an associated schema, to support the processing of structured data, such as the data in Hive tables.

Spark SQL combines relational and procedural processing: you can query structured data with SQL, or process it programmatically through the DataFrame API, which integrates with the rest of Spark.

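Note: The example code snippets in this section invoke the UDFs through SQL queries, using the sqlContext.sql() method, after the UDFs are registered. When entering multi-line snippets in the Spark shell, use the :paste mode so that lines beginning with a dot are treated as continuations of the previous line.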

DataFrames

A DataFrame is a distributed collection of data, similar to an RDD, with a corresponding schema. DataFrames can be created from a wide array of sources, such as Hive tables, external databases, structured data files, or existing RDDs. A DataFrame is equivalent to a table in a relational database and can be manipulated much like an RDD. Because DataFrames support relational operations and track their schema, Spark SQL can optimize their execution.

SQLContext

A SQLContext is the class used to initialize Spark SQL. It enables applications to run SQL queries programmatically and returns the results as DataFrames.

HiveContext extends the functionality of SQLContext and provides capabilities to use Hive UDFs, create Hive queries, and access and modify the data in Hive tables.

The Spark SQL CLI (spark-sql) runs the Hive metastore service in local mode and executes queries entered from the command line. The examples in this section use the Spark shell (spark-shell), which, when it starts, creates a SparkContext available as sc and, in Hive-enabled deployments, a HiveContext available as sqlContext.

Inserting Data from a File into a Table

The following commands import the SQL implicits and define a case class named Person whose fields hold the values of each row.

scala> import sqlContext.implicits._
scala> case class Person(ID: Int, colname1: colname1_format, colname2: colname2_format, colname3: colname3_format)

The following command reads the local sample file basic_sample_data.csv:

scala> val input = sc.textFile("file:///opt/protegrity/samples/data/basic_sample_data.csv")

The following command splits each line of the file on commas, maps the fields to a Person object, and converts the resulting RDD[Person] to a DataFrame.

scala> val df = input.map(x => x.split(",")).map(p => Person(p(0).toInt, p(1), p(2), p(3))).toDF()

The following command registers the DataFrame as the temporary table sample_table.

scala> df.registerTempTable("sample_table")

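The following commands save the DataFrame to a Parquet file named sample_table.parquet. With SaveMode.Ignore, the save operation does nothing if data already exists at that path.

scala> import org.apache.spark.sql.SaveMode
scala> df.write.mode(SaveMode.Ignore).parquet("sample_table.parquet")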

where,

sample_table: Specifies the name of the temporary table that holds the data loaded from the input CSV file.
colname1, colname2, colname3: Specifies the name of the columns.
colname1_format, colname2_format, colname3_format: Specifies the data types contained in the respective columns.
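Because each line is split on commas and the first field is converted with toInt, a single malformed line in basic_sample_data.csv (a missing field or a non-numeric ID) makes the whole job fail. The following plain Scala sketch shows one way to parse lines defensively before converting them to a DataFrame. It assumes four comma-separated fields, an integer ID followed by three string columns; the Person fields and the CsvParsing.parsePerson helper are hypothetical stand-ins for the real colname1 to colname3 definitions.

```scala
import scala.util.Try

// Hypothetical layout: an integer ID followed by three string columns.
// Replace the field names and types with the real colname1..colname3 definitions.
case class Person(ID: Int, colname1: String, colname2: String, colname3: String)

object CsvParsing {
  // Returns Some(Person) for a well-formed line and None otherwise.
  def parsePerson(line: String): Option[Person] = {
    // limit = -1 keeps trailing empty fields, so "1,a,b," still yields four fields
    val fields = line.split(",", -1).map(_.trim)
    if (fields.length != 4) None
    else Try(fields(0).toInt).toOption.map { id =>
      Person(id, fields(1), fields(2), fields(3))
    }
  }
}
```

In the Spark shell, input.flatMap(line => CsvParsing.parsePerson(line)).toDF() would then drop malformed lines instead of failing; count or log the rejected lines if silently losing rows is not acceptable.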

Protecting Existing Data

The following command creates a temporary table, basic_sample_protected, that contains the protected data.

scala> sqlContext.sql(
"SELECT ID, " +
"ptyProtectStr(colname1, 'dataElement1') as colname1, " +
"ptyProtectStr(colname2, 'dataElement2') as colname2, " +
"ptyProtectStr(colname3, 'dataElement3') as colname3 " +
"FROM basic_sample"
).registerTempTable("basic_sample_protected")

Note: Ensure that the user performing the task has the permissions to protect the data, as required, in the data security policy.

where,

colname1, colname2, colname3: Specifies the name of the columns.
dataElement1, dataElement2, dataElement3: Specifies the data elements corresponding to the columns.
basic_sample: Specifies the table containing the original data in the cleartext format.
basic_sample_protected: Specifies the table to store the protected data.
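The SELECT statement above is assembled by string concatenation, which makes it easy to drop a space, leave a trailing comma before FROM, or pass the wrong column to a UDF. The following sketch generates the statement from a list of column and data element pairs instead. UdfQueryBuilder.buildUdfQuery is a hypothetical helper, not part of Big Data Protector; it assumes that the UDF has already been registered and that column and data element names are trusted identifiers that need no quoting or escaping.

```scala
object UdfQueryBuilder {
  // Hypothetical helper that builds:
  //   SELECT <passThrough...>, udf(col, 'dataElement') as col, ... FROM <table>
  // Identifiers are inserted verbatim: no quoting or escaping is performed.
  def buildUdfQuery(udfName: String,
                    passThrough: Seq[String],
                    columns: Seq[(String, String)],
                    table: String): String = {
    require(columns.nonEmpty, "at least one column to transform is required")
    val transformed = columns.map { case (col, dataElement) =>
      s"$udfName($col, '$dataElement') as $col"
    }
    // mkString places separators only between items, so there is no trailing comma
    (passThrough ++ transformed).mkString("SELECT ", ", ", s" FROM $table")
  }
}
```

For example, sqlContext.sql(UdfQueryBuilder.buildUdfQuery("ptyUnprotectStr", Seq("ID"), Seq("colname1" -> "dataElement1"), "table_protected")).show(false) would build and run an unprotect query in the same way.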

Unprotecting and Viewing the Protected Data

To unprotect and view the protected data, you need to specify the name of the table which contains the protected data, and the columns and their respective data elements.

Ensure that the user performing the task has permissions to unprotect the data as required in the data security policy. The following commands unprotect the protected data from the table table_protected.

scala> sqlContext.sql("DROP TABLE IF EXISTS table_unprotected")
scala> sqlContext.sql("CREATE TABLE table_unprotected (colname1 colname1_format, colname2 colname2_format, colname3 colname3_format)")
scala> sqlContext.sql(
"SELECT ID, " +
"ptyUnprotectStr(colname1, 'dataElement1') as colname1, " +
"ptyUnprotectStr(colname2, 'dataElement2') as colname2, " +
"ptyUnprotectStr(colname3, 'dataElement3') as colname3 " +
"FROM table_protected"
).show(false)

where,

ptyUnprotectStr: Is the Protegrity Spark SQL UDF to unprotect the String data.
colname1, colname2, colname3: Specifies the names of the columns.
dataElement1, dataElement2, dataElement3: Specifies the data elements corresponding to the columns.
table_protected: Specifies the table containing the protected data.

Retrieving Data from a Table

To retrieve data from a table, you must have access to the table.

The following command displays the data contained in the table.

scala> sqlContext.sql("SELECT * FROM table").show()

where,

table: Specifies the name of the table.

Calling Spark SQL UDFs from Domain Specific Language (DSL)

You can use the Domain-Specific Language (DSL) functions of the DataFrame API to call the Spark SQL UDFs and protect or unprotect data. The following sample snippet shows how to call the Spark SQL UDFs from a DSL:

package com.protegrity.spark.dsl

import com.protegrity.spark.PtySparkProtectorException
import org.apache.spark.sql.{Column, DataFrame, UserDefinedFunction}

/**
  * DSL API for applying protection on DataFrames implicitly.
  *
  * e.g
  * import sqlContext.implicits._
  * import com.protegrity.spark.dsl.PtySparkDSL._
  * val df = sc.parallelize(List("hello", "world")).toDF()
  * df.protect("_1", "AlphaNum")
  *    .withColumnRenamed("_1", "protected")
  *    .show()
  */
object PtySparkDSL {

  implicit class PtySparkDSL(dataFrame: DataFrame) {

    import org.apache.spark.sql.functions._

    private def applyUDFOnColumns(colname: String,
                                  dataElement: String,
                                  func: UserDefinedFunction): Seq[Column] = {
      dataFrame.schema.map { field =>
        val name = field.name
        if (name.equals(colname)) {
          func(col(colname), lit(dataElement)).as(colname)
        } else {
          column(name)
        }
      }
    }

    private def applyUDFOnColumns(colname: String, oldDataElement: String, newDataElement: String, func: UserDefinedFunction): Seq[Column] = {
      dataFrame.schema.map { field =>
        val name = field.name
        if (name.equals(colname)) {
          func(col(colname), lit(oldDataElement), lit(newDataElement)).as(colname)
        } else {
          column(name)
        }
      }
    }

    /**
      * Returns data type of input field from DataFrame
      * @param colname
      * @return data type of the column
      */
    private def getFieldType(colname: String): String = {
      try {
        dataFrame.schema(colname).dataType.typeName
      } catch {
        case e: IllegalArgumentException =>
          throw new PtySparkProtectorException(e.getMessage)
      }
    }

    def protect(colname: String, dataElement: String): DataFrame = {
      val dataType = getFieldType(colname)
      val function = dataType match {
        case "short" => udf(com.protegrity.spark.udf.ptyProtectShort _)
        case "integer" => udf(com.protegrity.spark.udf.ptyProtectInt _)
        case "long" => udf(com.protegrity.spark.udf.ptyProtectLong _)
        case "float" => udf(com.protegrity.spark.udf.ptyProtectFloat _)
        case "double" => udf(com.protegrity.spark.udf.ptyProtectDouble _)
        case "decimal(38,18)" =>
          udf(com.protegrity.spark.udf.ptyProtectDecimal _)
        case "string" => udf(com.protegrity.spark.udf.ptyProtectStr _)
        case "date" => udf(com.protegrity.spark.udf.ptyProtectDate _)
        case "timestamp" => udf(com.protegrity.spark.udf.ptyProtectDateTime _)
        case _ =>
          throw new PtySparkProtectorException(
            "Error!! DSL API invoked on unsupported column type - " + dataType)
      }
      val columns = applyUDFOnColumns(colname, dataElement, function)
      dataFrame.select(columns: _*)
    }

    def protectUnicode(colname: String, dataElement: String): DataFrame = {
      val function = udf(com.protegrity.spark.udf.ptyProtectUnicode _)
      val columns = applyUDFOnColumns(colname, dataElement, function)
      dataFrame.select(columns: _*)
    }

    def unprotect(colname: String, dataElement: String): DataFrame = {
      val dataType = getFieldType(colname)
      val function = dataType match {
        case "short" => udf(com.protegrity.spark.udf.ptyUnprotectShort _)
        case "integer" => udf(com.protegrity.spark.udf.ptyUnprotectInt _)
        case "long" => udf(com.protegrity.spark.udf.ptyUnprotectLong _)
        case "float" => udf(com.protegrity.spark.udf.ptyUnprotectFloat _)
        case "double" => udf(com.protegrity.spark.udf.ptyUnprotectDouble _)
        case "decimal(38,18)" =>
          udf(com.protegrity.spark.udf.ptyUnprotectDecimal _)
        case "string" => udf(com.protegrity.spark.udf.ptyUnprotectStr _)
        case "date" => udf(com.protegrity.spark.udf.ptyUnprotectDate _)
        case "timestamp" =>
          udf(com.protegrity.spark.udf.ptyUnprotectDateTime _)
        case _ =>
          throw new PtySparkProtectorException(
            "Error!! DSL API invoked on unsupported column type - " + dataType)
      }
      val columns = applyUDFOnColumns(colname, dataElement, function)
      dataFrame.select(columns: _*)
    }

    def unprotectUnicode(colname: String, dataElement: String): DataFrame = {
      val function = udf(com.protegrity.spark.udf.ptyUnprotectUnicode _)
      val columns = applyUDFOnColumns(colname, dataElement, function)
      dataFrame.select(columns: _*)
    }

    def reprotect(colname: String, oldDataElement: String, newDataElement: String): DataFrame = {
      val dataType = getFieldType(colname)
      val function = dataType match {
        case "short" => udf(com.protegrity.spark.udf.ptyReprotectShort _)
        case "integer" => udf(com.protegrity.spark.udf.ptyReprotectInt _)
        case "long" => udf(com.protegrity.spark.udf.ptyReprotectLong _)
        case "float" => udf(com.protegrity.spark.udf.ptyReprotectFloat _)
        case "double" => udf(com.protegrity.spark.udf.ptyReprotectDouble _)
        case "decimal(38,18)" =>
          udf(com.protegrity.spark.udf.ptyReprotectDecimal _)
        case "string" => udf(com.protegrity.spark.udf.ptyReprotectStr _)
        case "date" =>
          udf(com.protegrity.spark.udf.ptyReprotectDate _)
        case "timestamp" =>
          udf(com.protegrity.spark.udf.ptyReprotectDateTime _)
        case _ =>
          throw new PtySparkProtectorException(
            "Error!! DSL API invoked on unsupported column type - " + dataType)
      }
      val columns = applyUDFOnColumns(colname, oldDataElement, newDataElement, function)
      dataFrame.select(columns: _*)
    }

    def reprotectUnicode(colname: String, oldDataElement: String, newDataElement: String): DataFrame = {
      val function = udf(com.protegrity.spark.udf.ptyReprotectUnicode _)
      val columns = applyUDFOnColumns(colname, oldDataElement, newDataElement, function)
      dataFrame.select(columns: _*)
    }
  }
}
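The protect(), unprotect(), and reprotect() methods above choose a UDF from the Spark type name of the target column (DataType.typeName). The following plain Scala sketch restates the protect() dispatch as a lookup table, which may be useful for checking in advance whether every column of a DataFrame is supported. DslTypeSupport is a hypothetical name; the table only copies the cases from the code above. Because only "decimal(38,18)" is matched, decimal columns with any other precision or scale are rejected by the DSL.

```scala
object DslTypeSupport {
  // Spark type names handled by PtySparkDSL.protect, mapped to the
  // Protegrity UDF that the DSL applies for that type (copied from the code above).
  val protectUdfByType: Map[String, String] = Map(
    "short"          -> "ptyProtectShort",
    "integer"        -> "ptyProtectInt",
    "long"           -> "ptyProtectLong",
    "float"          -> "ptyProtectFloat",
    "double"         -> "ptyProtectDouble",
    "decimal(38,18)" -> "ptyProtectDecimal",
    "string"         -> "ptyProtectStr",
    "date"           -> "ptyProtectDate",
    "timestamp"      -> "ptyProtectDateTime"
  )

  // Left(message) corresponds to the PtySparkProtectorException thrown by the DSL.
  def protectUdfFor(typeName: String): Either[String, String] =
    protectUdfByType.get(typeName)
      .toRight("Error!! DSL API invoked on unsupported column type - " + typeName)
}
```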

ptyGetVersion()

The UDF returns the current version of the protector.

Signature:

ptyGetVersion()

Parameters:

None

Result:

The UDF returns the current version of the protector.

Example:

sqlContext.udf.register("ptyGetVersion", com.protegrity.spark.udf.ptyGetVersion _)
sqlContext.sql("select ptyGetVersion()").show()

ptyGetVersionExtended()

The UDF returns the extended version information of the protector.

Signature:

ptyGetVersionExtended()

Parameters:

None

Result:

The UDF returns a String in the following format:
```
"BDP: <1>; JcoreLite: <2>; CORE: <3>;"
```
where,
- <1>: Is the current protector version.
- <2>: Is the JcoreLite library version.
- <3>: Is the Core library version.

Example:

sqlContext.udf.register("ptyGetVersionExtended", com.protegrity.spark.udf.ptyGetVersionExtended _)
sqlContext.sql("select ptyGetVersionExtended()").show()
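Because ptyGetVersionExtended() returns all three versions in a single semicolon-delimited string, it can be convenient to split the result into its components, for example to log or compare library versions. The following is a minimal sketch that assumes the exact "Name: value;" layout shown above; VersionInfo.parseExtendedVersion is a hypothetical helper, not part of Big Data Protector.

```scala
object VersionInfo {
  // Parses "BDP: <1>; JcoreLite: <2>; CORE: <3>;" into a name -> version map.
  // Assumes entries are separated by ';' and each entry has the form "name: value".
  def parseExtendedVersion(s: String): Map[String, String] =
    s.split(";")
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split(":", 2)) // limit 2 keeps any ':' inside the value
      .collect { case Array(name, value) => name.trim -> value.trim } // skip entries without ':'
      .toMap
}
```

In the Spark shell, the input string could be obtained with sqlContext.sql("select ptyGetVersionExtended()").first().getString(0).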

ptyWhoAmI()

The UDF returns the currently logged-in user.

Signature:

ptyWhoAmI()

Parameters:

None

Result:

The UDF returns the currently logged-in user.

Example:

sqlContext.udf.register("ptyWhoAmI", com.protegrity.spark.udf.ptyWhoAmI _)
sqlContext.sql("select ptyWhoAmI()").show()

ptyProtectStr()

The UDF protects the string format data that is provided as an input.

Note: For Date and Datetime type data elements, the protect API returns an invalid input data error if the input value falls within the non-existent date range from 05-OCT-1582 to 14-OCT-1582 of the Gregorian calendar.
For more information about the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar, refer to Date and Datetime tokenization.
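Values in the skipped range can be screened for before they reach ptyProtectStr() with a Date or Datetime data element, so that they can be handled before the protect API rejects them. The following sketch assumes ISO-style YYYY-MM-DD input, optionally followed by a time part; other layouts, such as DD/MM/YYYY, would need a different parser. CutoverCheck is a hypothetical helper, not part of Big Data Protector. Because java.time.LocalDate follows the proleptic Gregorian calendar, it parses dates such as 1582-10-10 without complaint, which is why the range has to be checked explicitly.

```scala
import java.time.LocalDate
import java.time.format.DateTimeParseException

object CutoverCheck {
  // Dates skipped when the Gregorian calendar replaced the Julian calendar.
  private val gapStart = LocalDate.of(1582, 10, 5)
  private val gapEnd   = LocalDate.of(1582, 10, 14)

  // True if a "YYYY-MM-DD" string (optionally followed by a time part)
  // falls within 1582-10-05 to 1582-10-14, inclusive.
  def inCutoverGap(value: String): Boolean =
    try {
      val date = LocalDate.parse(value.trim.take(10))
      !date.isBefore(gapStart) && !date.isAfter(gapEnd)
    } catch {
      case _: DateTimeParseException => false // not ISO formatted: leave it to the UDF
    }
}
```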

Signature:

ptyProtectStr(String colName, String dataElement)

Parameters:

colName : Specifies the column that contains data in the string format to be protected.
dataElement : Specifies the data element to protect the string format data.

Result:

The UDF returns the protected string format data.

Example:

import sqlContext.implicits._
val df = sc.parallelize(List("hello", "world")).toDF("string_col")
val protectStrUDF = sqlContext.udf
.register("ptyProtectStr", com.protegrity.spark.udf.ptyProtectStr _)
df.registerTempTable("string_test")
sqlContext
.sql( "select ptyProtectStr(string_col, 'Token_Alphanum') as protected from string_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectStr()	Numeric (0-9), Credit Card, Alpha (A-Z), Upper-case Alpha (A-Z), Alpha-Numeric (0-9, a-z, A-Z), Upper Alpha-Numeric (0-9, A-Z), Lower ASCII, Date (YYYY-MM-DD, DD/MM/YYYY, MM.DD.YYYY), Datetime (YYYY-MM-DD HH:MM:SS), Decimal, Unicode (Gen2), Unicode (Legacy), Unicode (Base64), Email	No	Yes	Yes	Yes	Yes

ptyProtectUnicode()

The UDF protects the string (Unicode) format data, which is provided as input.

Warning: Use this UDF only if you want to tokenize Unicode data in Spark SQL, migrate the tokenized data from Spark SQL to a Teradata database, and detokenize it there using the Protegrity Database Protector. Use this UDF with a Unicode tokenization data element only.

Signature:

ptyProtectUnicode(String colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the String (Unicode) format to be protected.
dataElement: Specifies the data element to protect the string (Unicode) format data.

Result:

The UDF returns the protected string format data.

Example:

import sqlContext.implicits._

val df = sc.parallelize(List("瀚聪Marylène", "瀚聪")).toDF("unicode_col")

val protectUnicodeUDF = sqlContext.udf.register(
  "ptyProtectUnicode",
  com.protegrity.spark.udf.ptyProtectUnicode _)
  
df.registerTempTable("unicode_test")

sqlContext
  .sql(
"select ptyProtectUnicode(unicode_col, 'Token_Unicode') as protected from unicode_test")
  .show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectUnicode()	Unicode (Legacy), Unicode (Base64)	No	No	Yes	No	Yes

ptyProtectInt()

The UDF protects the integer format data, which is provided as input.

Signature:

ptyProtectInt(Int colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the integer format to be protected.
dataElement: Specifies the data element to protect the integer format data.

Result:

The UDF returns the protected integer format data.

Example:

import sqlContext.implicits._

val df = sc.parallelize(List(1234, 2345)).toDF("int_col")

val protectIntUDF = sqlContext.udf.register("ptyProtectInt", com.protegrity.spark.udf.ptyProtectInt _)

df.registerTempTable("int_test")

sqlContext.sql("select ptyProtectInt(int_col, 'Token_Int') as protected from int_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectInt()	Integer (4 Bytes)	No	No	Yes	No	Yes

ptyProtectShort()

The UDF protects the short format data, which is provided as input.

Signature:

ptyProtectShort(Short colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the short format to be protected.
dataElement: Specifies the data element to protect the short format data.

Result:

The UDF returns the protected short format data.

Example:

import sqlContext.implicits._

case class ShortClass(short_col: Short)
val df = sc.parallelize(List(1234, 2345)).map{x =>
ShortClass(x.toShort)
}.toDF("short_col")

val protectShortUDF = sqlContext.udf.register("ptyProtectShort", com.protegrity.spark.udf.ptyProtectShort _)

df.registerTempTable("short_test")

sqlContext.sql("select ptyProtectShort(short_col, 'Token_Short') as protected from short_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectShort()	Integer (2 Bytes)	No	No	Yes	No	Yes

ptyProtectLong()

The UDF protects the long format data, which is provided as input.

Signature:

ptyProtectLong(Long colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the long format to be protected.
dataElement: Specifies the data element to protect the long format data.

Result:

The UDF returns the protected long format data.

Example:

import sqlContext.implicits._
val df = sc.parallelize(List(1234L, 2345L)).toDF("long_col")
val protectLongUDF = sqlContext.udf
.register("ptyProtectLong", com.protegrity.spark.udf.ptyProtectLong _)
df.registerTempTable("long_test")
sqlContext
.sql("select ptyProtectLong(long_col, 'Token_Long') as protected from long_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectLong()	Integer (8 Bytes)	No	No	Yes	No	Yes

ptyProtectDate()

The UDF protects the date format data, which is provided as input.

Signature:

ptyProtectDate(Date colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the date format to be protected.
dataElement: Specifies the data element to protect the date format data.

Result:

The UDF returns the protected date format data.

Example:

import sqlContext.implicits._
import java.sql.Date
val d1 = Date.valueOf("2016-12-28")
val d2 = Date.valueOf("2016-12-28")
val df = sc.parallelize(Seq((d1, d2))).toDF("date_col1","date_col2")
val protectDateUDF = sqlContext.udf
.register("ptyProtectDate", com.protegrity.spark.udf.ptyProtectDate _)
df.registerTempTable("date_test")
sqlContext
.sql("select ptyProtectDate(date_col1, 'Token_Date') as protected from date_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectDate()	Date	No	No	Yes	No	Yes

ptyProtectDateTime()

The UDF protects the timestamp format data, which is provided as input.

Signature:

ptyProtectDateTime(Timestamp colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the timestamp format to be protected.
dataElement: Specifies the data element to protect the timestamp format data.

Result:

The UDF returns the protected timestamp format data.

Example:

import sqlContext.implicits._
import java.sql.Timestamp
val d1 = Timestamp.valueOf("2016-12-28 13:09:38.104")
val d2 = Timestamp.valueOf("2016-12-29 12:09:38.104")
val df = sc.parallelize(Seq((d1, d2))).toDF("datetime_col1","datetime_col2")
val protectDateTimeUDF = sqlContext.udf.register(
"ptyProtectDateTime",com.protegrity.spark.udf.ptyProtectDateTime _)
df.registerTempTable("datetime_test")
sqlContext
.sql(
"select ptyProtectDateTime(datetime_col1, 'Token_Datetime') as protected from datetime_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectDateTime()	Datetime (YYYY-MM-DD HH:MM:SS)	No	No	Yes	No	Yes

ptyProtectFloat()

The UDF protects the float format data, which is provided as input.

Signature:

ptyProtectFloat(Float colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the float format to be protected.
dataElement: Specifies the data element to protect the float format data.

Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.

Result:

The UDF returns the protected float format data.

Example:

import sqlContext.implicits._
val input = Seq((1234.345f, 1343.3345f))
val df = sc.parallelize(input).toDF("float_col1","float_col2")
val protectFloatUDF = sqlContext.udf
.register("ptyProtectFloat", com.protegrity.spark.udf.ptyProtectFloat _)
df.registerTempTable("float_test")
sqlContext
.sql(
"select ptyProtectFloat(float_col1, 'Token_NoEncryption') as protected from float_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectFloat()	No	No	No	Yes	No	Yes

ptyProtectDouble()

The UDF protects the double format data, which is provided as input.

Signature:

ptyProtectDouble(Double colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the double format to be protected.
dataElement: Specifies the data element to protect the double format data.

Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.

Result:

The UDF returns the protected double format data.

Example:

import sqlContext.implicits._
val input = Seq((1234.345, 1343.3345))
val df = sc.parallelize(input).toDF("double_col1","double_col2")
val protectDoubleUDF = sqlContext.udf.register(
"ptyProtectDouble",com.protegrity.spark.udf.ptyProtectDouble _)
df.registerTempTable("double_test")
sqlContext.sql("select ptyProtectDouble(double_col1, 'Token_NoEncryption') as protected from double_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectDouble()	No	No	No	Yes	No	Yes

ptyProtectDecimal()

The UDF protects the decimal format data, which is provided as input.

Signature:

ptyProtectDecimal(Decimal colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the Decimal format to be protected.
dataElement: Specifies the data element to protect the Decimal format data.

Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.

Result:

The UDF returns the protected Decimal format data.

Example:

import sqlContext.implicits._
val input = Seq((math.BigDecimal.valueOf(1234.345), math.BigDecimal.valueOf(1343.3345)))
val df = sc.parallelize(input).toDF("decimal_col1","decimal_col2")
val protectDecimalUDF = sqlContext.udf.register("ptyProtectDecimal",com.protegrity.spark.udf.ptyProtectDecimal _)
df.registerTempTable("decimal_test")
sqlContext.sql("select ptyProtectDecimal(decimal_col1, 'Token_NoEncryption') as protected from decimal_test").show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyProtectDecimal()	No	No	No	Yes	No	Yes

ptyUnprotectStr()

The UDF unprotects the protected string format data.

Signature:

ptyUnprotectStr(String colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the string format to unprotect.
dataElement: Specifies the data element to unprotect the string format data.

Result:

The UDF returns the unprotected string format data.

Example:

import sqlContext.implicits._
val df = sc.parallelize(List("A2yae", "2LbRS")).toDF("string_col")
val unprotectStrUDF = sqlContext.udf
.register("ptyUnprotectStr", com.protegrity.spark.udf.ptyUnprotectStr _)
df.registerTempTable("string_test")
sqlContext
.sql(
"select ptyUnprotectStr(string_col, 'Token_Alphanum') as unprotected from string_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectStr()	Numeric (0-9), Credit Card, Alpha (A-Z), Upper-case Alpha (A-Z), Alpha-Numeric (0-9, a-z, A-Z), Upper Alpha-Numeric (0-9, A-Z), Lower ASCII, Date (YYYY-MM-DD, DD/MM/YYYY, MM.DD.YYYY), Datetime (YYYY-MM-DD HH:MM:SS), Decimal, Unicode (Gen2), Unicode (Legacy), Unicode (Base64), Email	No	Yes	Yes	Yes	Yes

ptyUnprotectUnicode()

The UDF unprotects the protected string (Unicode) format data.

Warning: Use this UDF only if you tokenized Unicode data in Teradata using the Protegrity Database Protector, migrated the tokenized data from the Teradata database to Spark SQL, and want to detokenize it using the Protegrity Big Data Protector for Spark SQL. Use this UDF with a Unicode tokenization data element only.

Signature:

ptyUnprotectUnicode(String colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data in the string format to unprotect.
dataElement: Specifies the data element to unprotect the string format data.

Result:

The UDF returns the unprotected string (Unicode) format data.

Example:

import sqlContext.implicits._
val df =
sc.parallelize(List("jmR6Dw4Tqzlw441n5qEMtMEUKsI", "Q1dwK")).toDF("unicode_col")
val unprotectUnicodeUDF = sqlContext.udf.register(
"ptyUnprotectUnicode",
com.protegrity.spark.udf.ptyUnprotectUnicode _)
df.registerTempTable("unicode_test")
sqlContext
.sql(
"select ptyUnprotectUnicode(unicode_col, 'Token_Unicode') as unprotected from unicode_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectUnicode()	Unicode (Legacy), Unicode (Base64)	No	No	Yes	No	Yes

ptyUnprotectInt()

The UDF unprotects the integer format data, which is provided as input.

Signature:

ptyUnprotectInt(Int colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data, in the integer format, to unprotect.
dataElement: Specifies the data element to unprotect the integer format data.

Caution: If a user who has no privileges to unprotect data in the security policy, and for whom the output value is set to NULL, attempts to unprotect protected numeric data (Short, Int, Float, Long, Double, or Decimal values) using the respective Spark SQL UDFs, the output is 0 rather than NULL.

Result:

The UDF returns the unprotected integer format data.

Example:

import sqlContext.implicits._
// Sample input: replace these values with protected (tokenized) integer values
val df = sc.parallelize(List(1234, 2345)).toDF("int_col")
val unprotectIntUDF = sqlContext.udf.register("ptyUnprotectInt", com.protegrity.spark.udf.ptyUnprotectInt _)
df.registerTempTable("int_test")
sqlContext.sql("select ptyUnprotectInt(int_col, 'Token_Int') as unprotected from int_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectInt()	Integer (4 Bytes)	No	No	Yes	No	Yes

ptyUnprotectShort()

The UDF unprotects the short format data, which is provided as input.

Signature:

ptyUnprotectShort(Short colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data, in the short format, to unprotect.
dataElement: Specifies the data element to unprotect the short format data.

Result:

The UDF returns the unprotected short format data.

Example:

import sqlContext.implicits._
case class ShortClass(short_col: Short)
val df = sc.parallelize(List(-24453, 1827)).map(x =>
ShortClass(x.toShort)).toDF("short_col")
val unprotectShortUDF = sqlContext.udf.register("ptyUnprotectShort", com.protegrity.spark.udf.ptyUnprotectShort _)
df.registerTempTable("short_test")
sqlContext.sql("select ptyUnprotectShort(short_col, 'Token_Short') as unprotected from short_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectShort()	Integer (2 Bytes)	No	No	Yes	No	Yes

ptyUnprotectLong()

The UDF unprotects the long format data, which is provided as input.

Signature:

ptyUnprotectLong(Long colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data, in the long format, to unprotect.
dataElement: Specifies the data element to unprotect the long format data.

Result:

The UDF returns the unprotected long format data.

Example:

import sqlContext.implicits._
val df = sc.parallelize(List(4960833108022315290L, -1854566784751726548L)).toDF("long_col")
val unprotectLongUDF = sqlContext.udf.register("ptyUnprotectLong", com.protegrity.spark.udf.ptyUnprotectLong _)
df.registerTempTable("long_test")
sqlContext.sql("select ptyUnprotectLong(long_col, 'Token_Long') as unprotected from long_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectLong()	Integer (8 Bytes)	No	No	Yes	No	Yes

ptyUnprotectDate()

The UDF unprotects the date format data, which is provided as input.

Signature:

ptyUnprotectDate(Date colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data, in the date format, to unprotect.
dataElement: Specifies the data element to unprotect the date format data.

Result:

The UDF returns the unprotected date format data.

Example:

import sqlContext.implicits._
import java.sql.Date
val d1 = Date.valueOf("1881-04-07")
val d2 = Date.valueOf("2016-12-28")
val df = sc.parallelize(Seq((d1, d2))).toDF("date_col1", "date_col2")
val unprotectDateUDF = sqlContext.udf.register("ptyUnprotectDate", com.protegrity.spark.udf.ptyUnprotectDate _)
df.registerTempTable("date_test")
sqlContext.sql("select ptyUnprotectDate(date_col1, 'Token_Date') as unprotected from date_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectDate()	Date	No	No	Yes	No	Yes

ptyUnprotectDateTime()

The UDF unprotects the timestamp format data, which is provided as input.

Signature:

ptyUnprotectDateTime(Timestamp colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data, in the timestamp format, to unprotect.
dataElement: Specifies the data element to unprotect the timestamp format data.

Result:

The UDF returns the unprotected timestamp format data.

Example:

import sqlContext.implicits._
import java.sql.Timestamp
val d1 = Timestamp.valueOf("1197-02-10 13:09:38.104")
val d2 = Timestamp.valueOf("2016-12-29 12:09:38.104")
val df = sc.parallelize(Seq((d1, d2))).toDF("datetime_col1", "datetime_col2")
val unprotectDateTimeUDF = sqlContext.udf.register("ptyUnprotectDateTime", com.protegrity.spark.udf.ptyUnprotectDateTime _)
df.registerTempTable("datetime_test")
sqlContext.sql("select ptyUnprotectDateTime(datetime_col1, 'Token_Datetime') as unprotected from datetime_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectDateTime()	Datetime (YYYY-MM-DD HH:MM:SS)	No	No	Yes	No	Yes

ptyUnprotectFloat()

The UDF unprotects the float format data, which is provided as input.

Signature:

ptyUnprotectFloat(Float colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data, in the float format, to unprotect.
dataElement: Specifies the data element to unprotect the float format data.

Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.

Result:

The UDF returns the unprotected float format data.

Example:

import sqlContext.implicits._
val input = Seq((1234.345f, 1343.3345f))
val df = sc.parallelize(input).toDF("float_col1","float_col2")
val unprotectFloatUDF = sqlContext.udf.register("ptyUnprotectFloat", com.protegrity.spark.udf.ptyUnprotectFloat _)
df.registerTempTable("float_test")
sqlContext.sql("select ptyUnprotectFloat(float_col1, 'Token_NoEncryption') as unprotected from float_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectFloat()	No	No	No	Yes	No	Yes

ptyUnprotectDouble()

The UDF unprotects the double format data, which is provided as input.

Signature:

ptyUnprotectDouble(Double colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data, in the double format, to unprotect.
dataElement: Specifies the data element to unprotect the double format data.

Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.

Result:

The UDF returns the unprotected double format data.

Example:

import sqlContext.implicits._
val input = Seq((1234.345, 1343.3345))
val df = sc.parallelize(input).toDF("double_col1", "double_col2")
val unprotectDoubleUDF = sqlContext.udf.register("ptyUnprotectDouble", com.protegrity.spark.udf.ptyUnprotectDouble _)
df.registerTempTable("double_test")
sqlContext.sql("select ptyUnprotectDouble(double_col1, 'Token_NoEncryption') as unprotected from double_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectDouble()	No	No	No	Yes	No	Yes

ptyUnprotectDecimal()

The UDF unprotects the decimal format data, which is provided as input.

Signature:

ptyUnprotectDecimal(Decimal colName, String dataElement)

Parameters:

colName: Specifies the column that contains the data, in the Decimal format, to unprotect.
dataElement: Specifies the data element to unprotect the Decimal format data.

Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.

Caution: Before the ptyUnprotectDecimal() UDF is called, Spark SQL rounds off the decimal value in the table to 18 digits in scale, irrespective of the length of the data.
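The caution above can be illustrated outside Spark with java.math.BigDecimal. The following minimal sketch assumes two things. First, Spark stores the Scala or Java BigDecimal column as its default DecimalType(38, 18). Second, Spark rounds half-up when a value has more than 18 fractional digits. Verify both assumptions against your Spark version. The object and method names are hypothetical.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalScaleSketch {
  // Assumption: Spark's default decimal scale for BigDecimal columns is 18.
  val SparkDefaultScale = 18

  // Hypothetical helper: shows the value as it would look after Spark rounds it
  // to 18 digits of scale (half-up rounding is an assumption).
  def toSparkDefaultScale(value: JBigDecimal): JBigDecimal =
    value.setScale(SparkDefaultScale, RoundingMode.HALF_UP)
}

// A value with 20 fractional digits: the last two are rounded away
// before the UDF receives the value.
val original = new JBigDecimal("1234.345" + "0" * 15 + "89")
val rounded = DecimalScaleSketch.toSparkDefaultScale(original)
println(rounded) // prints 1234.345000000000000001
```

The unprotected value therefore cannot carry more than 18 fractional digits, even if the source data did.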

Result:

The UDF returns the unprotected Decimal format data.

Example:

import sqlContext.implicits._
val input = Seq((math.BigDecimal.valueOf(1234.345), math.BigDecimal.valueOf(1343.3345)))
val df = sc.parallelize(input).toDF("decimal_col1","decimal_col2")
val unprotectDecimalUDF = sqlContext.udf.register("ptyUnprotectDecimal", com.protegrity.spark.udf.ptyUnprotectDecimal _)
df.registerTempTable("decimal_test")
sqlContext.sql("select ptyUnprotectDecimal(decimal_col1, 'Token_NoEncryption') as unprotected from decimal_test").show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyUnprotectDecimal()	No	No	No	Yes	No	Yes

ptyReprotectStr()

The UDF reprotects the protected string format data, which was earlier protected using the ptyProtectStr UDF, with a different data element.

Signature:

ptyReprotectStr(String colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the string format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Result:

The UDF returns the protected string format data.

Example:

import sqlContext.implicits._
val df = sc.parallelize(List("hello", "world")).toDF("string_col")
val reprotectStrUDF = sqlContext.udf
.register("ptyReprotectStr", com.protegrity.spark.udf.ptyReprotectStr _)
df.registerTempTable("string_test")
sqlContext
.sql("select ptyReprotectStr(string_col, 'Token_Alphanum', 'Token_Alphanum_1') as reprotected from string_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectStr()	Numeric (0-9), Credit Card, Alpha (A-Z), Upper-case Alpha (A-Z), Alpha-Numeric (0-9, a-z, A-Z), Upper Alpha-Numeric (0-9, A-Z), Lower ASCII, Date (YYYY-MM-DD, DD/MM/YYYY, MM.DD.YYYY), Datetime (YYYY-MM-DD HH:MM:SS), Decimal, Unicode (Gen2), Unicode (Legacy), Unicode (Base64), Email	No	Yes	Yes	Yes	Yes

ptyReprotectUnicode()

The UDF reprotects the protected string format data, which was earlier protected using the ptyProtectUnicode UDF, with a different data element.

Signature:

ptyReprotectUnicode(String colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the string format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Result:

The UDF returns the protected string format data.

Example:

import sqlContext.implicits._
val df = sc.parallelize(List("##Marylène", "##")).toDF("unicode_col")
val reprotectUnicodeUDF = sqlContext.udf.register("ptyReprotectUnicode", com.protegrity.spark.udf.ptyReprotectUnicode _)
df.registerTempTable("unicode_test")
sqlContext
.sql("select ptyReprotectUnicode(unicode_col, 'Token_Unicode', 'Token_Unicode_1') as reprotected from unicode_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectUnicode()	Unicode (Legacy), Unicode (Base64)	No	No	Yes	No	Yes

ptyReprotectInt()

The UDF reprotects the protected integer format data with a different data element.

Signature:

ptyReprotectInt(Int colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the Integer format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Result:

The UDF returns the protected Integer format data.

Example:

import sqlContext.implicits._
val df = sc.parallelize(List(1234, 2345)).toDF("int_col")
val reprotectIntUDF = sqlContext.udf
.register("ptyReprotectInt", com.protegrity.spark.udf.ptyReprotectInt _)
df.registerTempTable("int_test")
sqlContext
.sql("select ptyReprotectInt(int_col, 'Token_Int', 'Token_Int_1') as reprotected from int_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectInt()	Integer 4 bytes	No	No	Yes	No	Yes

ptyReprotectShort()

The UDF reprotects the protected short format data with a different data element.

Signature:

ptyReprotectShort(Short colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the Short format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Result:

The UDF returns the protected Short format data.

Example:

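import sqlContext.implicits._
// Wrapper case class so that toDF() can build a DataFrame with a Short column
case class ShortClass(short_col: Short)
val df = sc.parallelize(List(1234, 2345)).map(x => ShortClass(x.toShort)).toDF("short_col")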
val reprotectShortUDF = sqlContext.udf.register("ptyReprotectShort", com.protegrity.spark.udf.ptyReprotectShort _)
df.registerTempTable("short_test")
sqlContext
.sql("select ptyReprotectShort(short_col, 'Token_Short', 'Token_Short_1') as reprotected from short_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectShort()	Integer 2 Bytes	No	No	Yes	No	Yes

ptyReprotectLong()

The UDF reprotects the protected long format data with a different data element.

Signature:

ptyReprotectLong(Long colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the long format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Result:

The UDF returns the protected long format data.

Example:

import sqlContext.implicits._
val df = sc.parallelize(List(1234L, 2345L)).toDF("long_col")
val reprotectLongUDF = sqlContext.udf.register("ptyReprotectLong", com.protegrity.spark.udf.ptyReprotectLong _)
df.registerTempTable("long_test")
sqlContext
.sql("select ptyReprotectLong(long_col, 'Token_Long', 'Token_Long_1') as reprotected from long_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectLong()	Integer 8 Bytes	No	No	Yes	No	Yes

ptyReprotectDate()

The UDF reprotects the protected date format data with a different data element.

Signature:

ptyReprotectDate(Date colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the date format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Result:

The UDF returns the protected date format data.

Example:

import java.sql.Date
import sqlContext.implicits._
val d1 = Date.valueOf("2016-12-28")
val d2 = Date.valueOf("2016-12-28")
val df = sc.parallelize(Seq((d1, d2))).toDF("date_col1", "date_col2")
val reprotectDateUDF = sqlContext.udf.register("ptyReprotectDate", com.protegrity.spark.udf.ptyReprotectDate _)
df.registerTempTable("date_test")
sqlContext.sql("select ptyReprotectDate(date_col1, 'Token_Date', 'Token_Date_1') as reprotected from date_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectDate()	Date	No	No	Yes	No	Yes

ptyReprotectDateTime()

The UDF reprotects the protected timestamp format data with a different data element.

Signature:

ptyReprotectDateTime(Timestamp colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the timestamp format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Result:

The UDF returns the protected timestamp format data.

Example:

import java.sql.Timestamp
import sqlContext.implicits._
val d1 = Timestamp.valueOf("2016-12-28 13:09:38.104")
val d2 = Timestamp.valueOf("2016-12-29 12:09:38.104")
val df = sc.parallelize(Seq((d1, d2))).toDF("datetime_col1", "datetime_col2")
val reprotectDateTimeUDF = sqlContext.udf.register("ptyReprotectDateTime", com.protegrity.spark.udf.ptyReprotectDateTime _)
df.registerTempTable("datetime_test")
sqlContext
.sql("select ptyReprotectDateTime(datetime_col1, 'Token_Datetime', 'Token_Datetime_1') as reprotected from datetime_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectDateTime()	DateTime (YYYY-MM-DD HH:MM:SS)	No	No	Yes	No	Yes

ptyReprotectFloat()

The UDF reprotects the protected float format data with a different data element.

Signature:

ptyReprotectFloat(Float colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the float format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.

Result:

The UDF returns the protected float format data.

Example:

import sqlContext.implicits._
val input = Seq((1234.345f, 1343.3345f))
val df = sc.parallelize(input).toDF("float_col1", "float_col2")
val reprotectFloatUDF = sqlContext.udf.register("ptyReprotectFloat", com.protegrity.spark.udf.ptyReprotectFloat _)
df.registerTempTable("float_test")
sqlContext
.sql("select ptyReprotectFloat(float_col1, 'Token_NoEncryption', 'Token_NoEncryption') as reprotected from float_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectFloat()	No	No	No	Yes	No	Yes

ptyReprotectDouble()

The UDF reprotects the protected double format data with a different data element.

Signature:

ptyReprotectDouble(Double colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the double format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.

Result:

The UDF returns the protected double format data.

Example:

import sqlContext.implicits._
val input = Seq((1234.345, 1343.3345))
val df = sc.parallelize(input).toDF("double_col1", "double_col2")
val reprotectDoubleUDF = sqlContext.udf.register("ptyReprotectDouble", com.protegrity.spark.udf.ptyReprotectDouble _)
df.registerTempTable("double_test")
sqlContext
.sql("select ptyReprotectDouble(double_col1, 'Token_NoEncryption', 'Token_NoEncryption') as reprotected from double_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectDouble()	No	No	No	Yes	No	Yes

ptyReprotectDecimal()

The UDF reprotects the protected decimal format data with a different data element.

Signature:

ptyReprotectDecimal(Decimal colName, String oldDataElement, String newDataElement)

Parameters:

colName: Specifies the column that contains the Decimal format data to reprotect.
oldDataElement: Specifies the data element that was used to protect the data earlier.
newDataElement: Specifies the new data element that will be used to reprotect the data.

Warning: Ensure that you use the No Encryption data element only. Using any other data element might cause corruption of data.

Caution: Before the ptyReprotectDecimal() UDF is called, Spark SQL rounds off the decimal value in the table to 18 digits in scale, irrespective of the length of the data.

Result:

The UDF returns the protected Decimal format data.

Example:

import sqlContext.implicits._
val input = Seq((math.BigDecimal.valueOf(1234.345), math.BigDecimal.valueOf(1343.3345)))
val df = sc.parallelize(input).toDF("decimal_col1", "decimal_col2")
val reprotectDecimalUDF = sqlContext.udf.register("ptyReprotectDecimal", com.protegrity.spark.udf.ptyReprotectDecimal _)
df.registerTempTable("decimal_test")
sqlContext
.sql("select ptyReprotectDecimal(decimal_col1, 'Token_NoEncryption', 'Token_NoEncryption') as reprotected from decimal_test")
.show(false)

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyReprotectDecimal()	No	No	No	Yes	No	Yes

ptyStringEnc()

The UDF encrypts a string value to get binary data.

Signature:

ptyStringEnc(String input, String DataElement)

Parameters:

String input: Specifies the string value to encrypt.
String DataElement: Specifies the name of the data element to encrypt the string value.

Result:

The UDF returns an encrypted binary value.

Note: To store the binary output of the ptyStringEnc UDF in a string column, use the built-in base64 Spark SQL function to convert the encrypted output bytes into a Base64-encoded string.
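Standard Base64 encoding produces 4 characters for every 3 input bytes, so a Base64-encoded string column needs to be about one third wider than the encrypted bytes. The following stdlib-only sketch uses java.util.Base64 to show the round trip and to estimate the encoded width. It assumes that Spark's base64 and unbase64 functions use the standard padded alphabet. The byte array only stands in for ptyStringEnc output, and the object and method names are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object Base64ColumnSketch {
  // Standard padded Base64 emits 4 characters per started 3-byte group.
  def encodedLength(byteCount: Long): Long = 4L * ((byteCount + 2) / 3)

  def encode(bytes: Array[Byte]): String = Base64.getEncoder().encodeToString(bytes)

  def decode(text: String): Array[Byte] = Base64.getDecoder().decode(text)
}

// Stand-in for the binary value that ptyStringEnc would return (not real ciphertext).
val fakeCiphertext: Array[Byte] = "not really encrypted".getBytes(StandardCharsets.UTF_8)
val stored = Base64ColumnSketch.encode(fakeCiphertext)   // what base64(...) would store
val restored = Base64ColumnSketch.decode(stored)         // what unbase64(...) would return
println(stored.length) // prints 28
```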

Example:

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val protectStrEncUDF = sqlContext.udf.register("ptyStringEnc",com.protegrity.spark.udf.ptyStringEnc _)
val pepTest = sc.parallelize(List("hello", "world")).toDF("col1")
pepTest.registerTempTable("spark_clear_table")
val encr_spark = sqlContext.sql("select base64(ptyStringEnc(col1, 'AES128_CRC')) as col1 from spark_clear_table").toDF()
encr_spark.show()
encr_spark.registerTempTable("encrypted_spark")

Exception:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit: The length of the input needs to be less than the maximum limit of 512 MB.

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyStringEnc()	No	AES-128, AES-256, 3DES, CUSP	No	Yes	No	Yes

Guidelines to estimate the field size of the data

The following table lists the field size, in bytes, that each encryption algorithm requires for features such as the Key ID (KID), Initialization Vector (IV), and Integrity Check (CRC):

Encryption Algorithm	KID (size in Bytes)	IV (size in Bytes)	CRC (size in Bytes)
AES	16	16	4
3DES	8	8	4
CUSP_TRDES	2	N/A	4
CUSP_AES	2	N/A	4
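As a rough guideline, you can add the overheads in the table above to the input length. The following sketch computes a conservative upper bound on the encrypted field size. It assumes that padding, if any, adds at most one block of the underlying cipher: 16 bytes for AES-based algorithms and 8 bytes for 3DES-based algorithms. These block sizes are not given in the table, so confirm the exact output layout for your protector version. The object, field, and method names are hypothetical.

```scala
object FieldSizeEstimate {
  // kid, iv, and crc come from the table above; blockBytes is an assumption
  // (block size of the underlying cipher, used as a worst-case padding allowance).
  final case class Overheads(kid: Int, iv: Int, crc: Int, blockBytes: Int)

  val Algorithms: Map[String, Overheads] = Map(
    "AES"        -> Overheads(kid = 16, iv = 16, crc = 4, blockBytes = 16),
    "3DES"       -> Overheads(kid = 8,  iv = 8,  crc = 4, blockBytes = 8),
    "CUSP_TRDES" -> Overheads(kid = 2,  iv = 0,  crc = 4, blockBytes = 8),  // IV is N/A
    "CUSP_AES"   -> Overheads(kid = 2,  iv = 0,  crc = 4, blockBytes = 16)  // IV is N/A
  )

  /** Upper bound on the encrypted size, assuming padding adds at most one block. */
  def maxEncryptedBytes(inputBytes: Long, algorithm: String,
                        useKid: Boolean, useIv: Boolean, useCrc: Boolean): Long = {
    val o = Algorithms(algorithm)
    inputBytes +
      (if (useKid) o.kid else 0) +
      (if (useIv) o.iv else 0) +
      (if (useCrc) o.crc else 0) +
      o.blockBytes
  }
}

// 100 bytes of input encrypted with AES, with KID, IV, and CRC enabled.
val estimate = FieldSizeEstimate.maxEncryptedBytes(100L, "AES", useKid = true, useIv = true, useCrc = true)
println(estimate) // prints 152
```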

The following table lists, for each encryption algorithm with the selected features, the maximum input size in bytes that is eligible for encryption and for decryption and re-encryption:

Encryption Algorithm	Maximum Input Size for Encryption (bytes)	Maximum Input Size for Decryption and Re-Encryption (bytes)
3DES, AES-128, AES-256, CUSP 3DES, CUSP AES-128, CUSP AES-256	<= 535000000 (approximately 512 MB)	<= 715120000 (approximately 682 MB)
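To avoid the java.lang.OutOfMemoryError listed for these UDFs, you can check input lengths against the limits above before you call the UDFs. The following sketch assumes that the limit applies to the UTF-8 byte length of a string input. The object and method names are hypothetical.

```scala
import java.nio.charset.StandardCharsets

object InputSizeCheck {
  // Limits from the table above, in bytes.
  val MaxEncryptBytes: Long = 535000000L
  val MaxDecryptOrReEncryptBytes: Long = 715120000L

  // Assumption: a string input is measured by its UTF-8 encoded length.
  def utf8Length(value: String): Long = value.getBytes(StandardCharsets.UTF_8).length.toLong

  def fitsEncryptLimit(byteCount: Long): Boolean = byteCount <= MaxEncryptBytes

  def fitsDecryptOrReEncryptLimit(byteCount: Long): Boolean = byteCount <= MaxDecryptOrReEncryptBytes
}

// Non-ASCII characters take more than one byte in UTF-8, so count bytes, not characters.
val name = "Maryl\u00e8ne"
println(InputSizeCheck.utf8Length(name)) // prints 9
```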

ptyStringDec()

The UDF decrypts a binary value to get string data.

Signature:

ptyStringDec(Binary input, String DataElement)

Parameters:

Binary input: Specifies the encrypted binary value to decrypt.
String DataElement: Specifies the name of the data element that was used to encrypt the string value. The UDF uses the same data element to decrypt the binary value.

Result:

The UDF returns the decrypted string value.

Note: If you have previously stored the encrypted bytes as a Base64-encoded string, then decode them using the unbase64 Spark SQL built-in function before passing them to the ptyStringDec UDF.

Example:

// Run in the same session as the ptyStringEnc example, which created sqlContext
// and registered the encrypted_spark temporary table.
import sqlContext.implicits._
val protectStrDecUDF = sqlContext.udf.register("ptyStringDec", com.protegrity.spark.udf.ptyStringDec _)
val decrypt_spark = sqlContext.sql("select ptyStringDec(unbase64(col1), 'AES128_CRC') as col1 from encrypted_spark").toDF()
decrypt_spark.show()

Exception:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit: The length of the input needs to be less than the maximum limit of 512 MB.

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyStringDec()	No	AES-128, AES-256, 3DES, CUSP	No	Yes	No	Yes

ptyStringReEnc()

The UDF re-encrypts binary format encrypted data with a different data element and returns the result as binary data.

Signature:

ptyStringReEnc(Binary input, String oldDataElement, String newDataElement)

Parameters:

Binary input: Specifies the binary value to re-encrypt.
String oldDataElement: Specifies the data element that was used to encrypt the data earlier.
String newDataElement: Specifies the new data element to re-encrypt the data.

Result:

The UDF returns the re-encrypted binary format data.

Note:

If you have previously stored the encrypted bytes as a Base64-encoded string, then decode them using the unbase64 Spark SQL built-in function before passing them to the ptyStringReEnc UDF.
To store the binary output of the ptyStringReEnc UDF in a string column, use the base64 Spark SQL built-in function to convert the re-encrypted output bytes into a Base64-encoded string.
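The expression base64(ptyStringReEnc(unbase64(col1), ...)) in the following example decodes the stored string, re-encrypts the bytes, and encodes the result again. The stdlib-only sketch below shows the same order of steps outside Spark. The reEncrypt parameter is a placeholder for the UDF call, not a Protegrity API. The usage line substitutes a byte-reversing function for it, which is not encryption. The object and method names are hypothetical.

```scala
import java.util.Base64

object ReEncryptStoredValue {
  // Mirrors base64(reEncrypt(unbase64(stored))).
  // reEncrypt is a hypothetical stand-in for the ptyStringReEnc UDF call.
  def reEncryptStored(stored: String, reEncrypt: Array[Byte] => Array[Byte]): String = {
    val ciphertext = Base64.getDecoder().decode(stored)   // unbase64(col1)
    val reEncrypted = reEncrypt(ciphertext)               // ptyStringReEnc(...)
    Base64.getEncoder().encodeToString(reEncrypted)       // base64(...)
  }
}

// Usage with a stand-in transformation that only reverses the bytes.
val before = Base64.getEncoder().encodeToString(Array[Byte](1, 2, 3))
val after = ReEncryptStoredValue.reEncryptStored(before, (b: Array[Byte]) => b.reverse)
```

Decoding must happen before the UDF call, and encoding after it. Otherwise the UDF receives Base64 text instead of the encrypted bytes.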

Example:

// Run in the same session as the ptyStringEnc example, which created sqlContext
// and registered the encrypted_spark temporary table.
import sqlContext.implicits._
val protectStrReEncUDF = sqlContext.udf.register("ptyStringReEnc", com.protegrity.spark.udf.ptyStringReEnc _)
val reencrypt_spark = sqlContext.sql("select base64(ptyStringReEnc(unbase64(col1), 'AES128_CRC', 'AES128_CRC')) as col1 from encrypted_spark").toDF()
reencrypt_spark.show()

Exception:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit: The length of the input needs to be less than the maximum limit of 512 MB.

Supported Protection Methods:

Function Name	Tokenization	Encryption	FPE	No Encryption	Masking	Monitoring
ptyStringReEnc()	No	AES-128, AES-256, 3DES, CUSP	No	Yes	No	Yes

Last modified : December 18, 2025