This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Tokenization Types

It describes the tokenization type properties for different protectors. It also provides some examples for tokenized values for different token types.

1: Numeric (0-9)
2: Integer (0-9)
3: Credit Card
4: Alpha (A-Z)
5: Upper-Case Alpha (A-Z)
6: Alpha-Numeric (0-9, a-z, A-Z)
7: Upper-Case Alpha-Numeric (0-9, A-Z)
8: Lower ASCII
9: Datetime (YYYY-MM-DD HH:MM:SS)
10: Decimal
11: Unicode Gen2
12: Binary
13: Email
14: Printable
15: Date (YYYY-MM-DD, DD/MM/YYYY, MM.DD.YYYY)
16: Unicode
17: Unicode Base64

1 - Numeric (0-9)

Details about the Numeric (0-9) token type.

The Numeric token type tokenizes digits from 0 to 9.

Table: Numeric Tokenization Type properties

Tokenization Type Properties	Settings
Name	Numeric
Token type and Format	Digits 0 through 9
Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length
SLT_1_3 SLT_2_3	Yes	Yes	1	4096
		No, return input as it is	3
		No, generate error	3
	No	NA	1	3933
SLT_1_6 SLT_2_6	Yes	Yes	1	4096
		No, return input as it is	6
		No, generate error	6
	No	NA	1	3933
Possibility to set Minimum/ maximum length	No
Left/Right settings	Yes
Internal IV	Yes, if Left/Right settings are non-zero
External IV	Yes
Return of Protected value	Yes
Token specific properties	None

The following table lists the examples of numeric tokenization values.

Table: Examples of Numeric tokenization values

Input Value	Tokenized Value	Comments
123	977	Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes The value has minimum length for SLT_1_3 tokenizer.
1	555241	Numeric, SLT_1_6, Left=0, Right=0, Length Preservation=No The value is padded up to 6 characters which is minimum length for SLT_1_6 tokenizer.
-7634.119	-4306.861	Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes Decimal point and sign are treated as delimiters and not tokenized.
12+38=50	98+24=62	Numeric, SLT_2_6, Left=0, Right=0, Length Preservation=Yes Arithmetic signs are treated as delimiters and not tokenized.
704-BBJ	134-BBJ	Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes Alpha characters are treated as delimiters and not tokenized.
704-BBJ	Error. Input too short.	Numeric, SLT_2_6, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, generate error Input value has only three numeric characters to tokenize, which is short for SLT_2_6 tokenizer when Length Preservation=Yes and Allow Short Data=No, generate error.
704-BBJ 704356	704-BBJ 134432	Numeric, SLT_2_6, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, return input as it is If the input value has less than six characters to tokenize, then it is returned as is else it is tokenized.
704-BBJ	134-BBJ	Numeric, SLT_2_6, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=Yes Input value has three numeric characters to tokenize, which meets minimum length requirement for SLT_2_6 tokenizer when Length Preservation=Yes and Allow Short Data=Yes.
704	134	Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, return input as it is If the input value has less than three characters to tokenize, then it is returned as is else it is tokenized.
704-BBJ	669-BBJ642	Numeric, SLT_1_6, Left=0, Right=0, Length Preservation=No Input value is padded up to 6 characters because Length Preservation=No. Alpha characters are treated as delimiters and not tokenized.
704-BBJ	764-6BBJ	Numeric, SLT_2_3, Left=1, Right=3, Length Preservation=No 1 character from left and 3 from right are left in clear. Two numeric characters left for tokenization “04” were padded and tokenized as “646”.

Numeric Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Numeric token.

Table: Supported input data types for Application protectors with Numeric token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	STRING CHAR[] BYTE[]	STRING BYTES

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protector only supports bytes converted from the string data type. If any other data type is directly converted to bytes and passed as input to the Application Protectors APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Numeric token.

Table: Supported input data types for Big Data protectors with Numeric token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	CHAR^*3 STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

^*3 – If you are using the Char tokenization UDFs in Hive, then ensure that the data elements have length preservation selected. In Char tokenization UDFs, using data elements without length preservation selected, is not supported.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Numeric token.

Table: Supported input data types for Data Warehouse protectors with Numeric token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Numeric token.
Table: Supported input data types for Oracle Database protector with Numeric token

Protector	Oracle
Supported Input Data Types	VARCHAR2 CHAR

Note: For numeric data elements where length preservation is not enabled, the maximum supported length is 3,842 characters. Data up to this length can be tokenized and de-tokenized without errors.

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Numeric token.
Table: Supported input data types for Oracle Database protector with Numeric token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

2 - Integer (0-9)

Details about the Integer token type.

The Integer token type tokenizes 2, 4, or 8 byte size integers.

Table: Integer Tokenization Type properties

Tokenization Type Properties	Settings
Name	Integer
Token type and Format	2, 4, or 8 byte size integers
Tokenizer	Length Preservation	Minimum Length	Maximum Length
SLT_1_3	Yes	2 bytes	8 bytes
Possibility to set Minimum/ maximum length	No
Left/Right settings	No
Internal IV	No
External IV	Yes
Return of Protected value	Yes
Token specific properties	Size 2, 4, or 8 bytes

The following table shows examples of the way in which a value will be tokenized with the Integer token.

Table: Examples of Integer tokenization values

Input Value	Tokenized Value	Comments
12	31345	Integer, SLT_1_3, Left=0, Right=0, Length Preservation=Yes
3	1465	For 2 bytes, the values can range from -32768 to 32767.
3	782939681	For 4 bytes, the values can range from -2147483648 to 2147483647.
3	7268379031142372719	For 8 bytes, the value range can range from -9223372036854775808 to 9223372036854775807.

The pty.ins_integer UDF in the Oracle, Teradata, and Impala Protectors, supports input data length of 4 bytes only. For 2 bytes, the following error is returned: Invalid input size.

Integer Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Integer token.

Table: Supported input data types for Application protectors with Integer token

Application Protectors	AP Java	AP Python
Supported input data types	SHORT: 2 bytes INT: 4 bytes LONG: 8 bytes	INT: 4 bytes and 8 bytes

If the user passes a 4-byte integer with values ranging from -2,147,483,648 to +2,147,483,647, the data element for the protect, unprotect, or reprotect APIs should be an 4-byte integer token type. However, if the user uses 2-byte integer token type, the data protection operation will not be successful. For a Bulk call using the protect, unprotect, and reprotect APIs, the error code, 44, appears. For a single call using the protect, unprotect, and reprotect APIs, an exception will be thrown and the error message, 44, Content of input data is not valid appears.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Integer token.

Table: Supported input data types for Big Data protectors with Integer token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	INT: 4 bytes LONG: 8 bytes	INT: 4 bytes BIGINT: 8 bytes	INT: 4 bytes	BYTE[]	SMALLINT: 2 bytes INT: 4 bytes BIGINT: 8 bytes	SHORT: 2 bytes INT: 4 bytes LONG: 8 bytes	SHORT: 2 bytes INT: 4 bytes LONG: 8 bytes	SMALLINT: 2 bytes INT: 4 bytes BIGINT: 8 bytes

^*1 – If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Bytes as input that are not generated from string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes should be passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Integer token.

Table: Supported input data types for Data Warehouse protectors with Integer token

Data Warehouse Protectors	Teradata
Supported input data types	SMALLINT: 2 bytes INTEGER: 4 bytes BIGINT: 8 bytes

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Integer token.

Table: Supported input data types for Oracle Database protector with Integer token

Protector	Oracle
Supported Input Data Types	INTEGER

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Integer token.

Table: Supported input data types for MSSQL Database protector with Integer token

Protector	MSSQL
Supported Input Data Types	INTEGER

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

3 - Credit Card

Details about the Credit Card token type.

The Credit Card token type helps maintain transparency. It provides ways to clearly distinguish a token from the real value which is a recommendation of the PCI DSS. The Credit Card token type supports only numeric input (no separators are allowed as input).

Table: Credit Card Tokenization properties

Tokenization Type Properties	Settings
Name	Credit Card
Token type and Format	Digits 0 through 9 (no separators are allowed as input)
Tokenizer	Length Preservation	Minimum Length	Maximum Length
SLT_1_3 SLT_2_3	Yes	3	4096
SLT_1_6 SLT_2_6	Yes	6	4096
Possibility to set Minimum/ maximum length	No
Left/Right settings	Yes
Internal IV	Yes, if Left/Right settings are non-zero
External IV	Yes
Return of Protected value	Yes
Token specific properties	Invalid LUHN Checksum Invalid Card Type Alphabetic Indicator

The credit card number real value is distinguished from the tokenized value based on the token value validation properties.

Table: Specific Properties of the Credit Card Token Type

Credit Card Token Value Validation Properties	Left in Clear	Right in Clear	Comments	Validation Properties Compatibility
Invalid Luhn Checksum (On/Off)	Yes	Yes	Right characters which are to be left in the clear can be specified. This usually requires specifying a group of up to four characters.	Can be used together.
Invalid Card Type (On/Off)	0	Yes	Left cannot be specified, it is zero by default.	Can be used together.
Alphabetic Indicator (On/Off)	Yes	Yes	The indicator will be in the token, which means that left and right can be specified.	Can be used only separately from the other token validation properties.

You can create a Credit Card token element and select no validation property for it. If the Credit Card token is involved, it will be handled similar to a Numeric token. However, additional checks will be applied to the input based on the properties detailed in the Credit Card token general properties column in the table above.

To enable the Credit Card token properties, such as, Invalid LUHN checksum and Invalid Card Type, with the SLT Tokenizers, refer to Credit Card Properties with SLT Tokenizers.

Invalid Luhn Checksum

The purpose of the Luhn checksum is to detect incorrectly entered card details. If you enable Invalid Luhn Checksum token validation, then you must use valid credit cards otherwise tokenization will be denied for an invalid credit card number.

A valid credit card has a valid Luhn checksum. Upon tokenization, the tokenized value will have an invalid Luhn checksum. Here is an example of the tokenized credit card with the invalid Luhn digit.

Table: Credit Card Number with Luhn Checksum Examples

Credit Card Number	Tokenized Values	Comments
4067604564321453	Token is not generated due to invalid input value. Error is returned.	The input value contains invalid Luhn checksum. The value cannot be tokenized with Luhn enabled.
4067604564321454	2009071778438613	The Luhn in the input value is correct, the value is tokenized. Tokenized value has invalid Luhn checksum.

Invalid Card Type

An invalid credit card indicates an issue with the credit card details. An invalid card type will result in token values not starting with the digits that real credit card numbers begin with. The first digit in a real credit card number is the Major Industry Identifier. Thus, digits 3,4,5,6, and 0 can be the first digits of the real credit card number, which are then substituted during tokenization.

Table: Real Credit Card Values with Tokenized Values

Real Credit Card Value	3	4	5	6	0
Tokenized Value	2	7	8	9	1

Here is an example of the tokenized credit card with the invalid card type.

Table: Credit Card Number with Invalid Card Type Examples

Credit Card Number	Tokenized Values	Comments
4067604564321454	7335610268467066	The credit card type is valid, the tokenization is successful.
2067604564321454	Token is not generated due to invalid input value. Error is returned.	The credit card type is invalid since the first digit of the value “2” does not belong to a real credit card. The value cannot be tokenized.

Alphabetic Indicator

The alphabetic indicator replaces the tokenized value with an alphabet. If you enable Alphabetic Indicator validation, then the resulting token value will have one alphabetic character.

You will need to choose the position of the alphabetic character before tokenizing a credit card number otherwise the resulting token will have no alphabetic indicator.

The alphabetic indicator will substitute the tokenized value according to the following rule:

Table: Alphabetic Indicator with Tokenized Digits

Tokenized digit	0	1	2	3	4	5	6	7	8	9
Alphabetic indicator	A	B	C	D	E	F	G	H	I	J

In the following table, the Visa Card Number “4067604564321454” is tokenized. A tokenized value, represented by “7594107411315001”, is substituted with an alphabetic character in a selected position.

Table: Examples of Credit Card Tokenization with Alphabetic Indicator

Credit Card Number (Input Value)	Position	Tokenized Values	Comments
4067604564321454	-	7594107411315001	No substitution since the position is undefined.
4067604564321454	14	7594107411315A01	Digit “0” is substituted with character “A” at position 14.

Credit Card Properties with SLT Tokenizers

The Credit Card Properties with SLT Tokenizers explains the minimum data length required for tokenization. This occurs when the Credit Card token properties is used in combination with the SLT Tokenizers.

If you enable Credit Card token properties for tokenization, such as Invalid LUHN checksum and Invalid Card Type, you need to select an appropriate SLT Tokenizer. This is required to ensure the minimum data length is available for successful tokenization.

The following table represents the minimum data length required for tokenization as per the usage of Credit Card token properties with the SLT Tokenizers.

Table: Minimum Data Length - Credit Card Token Properties with SLT Tokenizers

Enabled Credit Card Token Property	Minimum Data Length (in digits) Required for Tokenization
Enabled Credit Card Token Property	SLT_1_3/SLT_2_3	SLT_1_6/SLT_2_6
Invalid LUHN Checksum	4	7
Invalid Card Type	4	7
Invalid LUHN Checksum and Invalid Card Type	5	8

Credit Card Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Credit Card token.

Table: Supported input data types for Application protectors with Credit Card token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	STRING CHAR[] BYTE[]	STRING BYTES

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protector only supports bytes converted from the string data type. If any other data type is directly converted to bytes and passed as input to the Application Protectors APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Credit Card token.

Table: Supported input data types for Big Data protectors with Credit Card token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Bytes as input that are not generated from string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes should be passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Credit Card token.

Table: Supported input data types for Data Warehouse protectors with Credit Card token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Credit card token.

Table: Supported input data types for Oracle Database protector with Credit Card token

Protector	Oracle
Supported Input Data Types	VARCHAR2 CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Credit card token.

Table: Supported input data types for MSSQL Database protector with Credit Card token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

4 - Alpha (A-Z)

Details about the Alpha (A-Z) token type.

The Alpha token type tokenizes both uppercase and lowercase letters.

Table: Alpha Tokenization Type properties

Tokenization Type Properties	Settings
Name	Alpha
Token type and Format	Lowercase letters a through z Uppercase letters A through Z
Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length
SLT_1_3 SLT_2_3	Yes	Yes	1	4096
		No, return input as it is	3
		No, generate error	3
	No	NA	1	4076
Possibility to set Minimum/ maximum length	No
Left/Right settings	Yes
Internal IV	Yes, if Left/Right settings are non-zero
External IV	Yes
Return of Protected value	Yes
Token specific properties	None

The following table shows examples of the way in which a value will be tokenized with the Alpha token.

Table: Examples of Numeric tokenization values

Input Value	Tokenized Value	Comments
abc	nvr	Alpha, SLT_1_3, Left=0, Right=0, Length Preservation=Yes The value has minimum length for SLT_1_3 tokenizer.
MA	TGi	Alpha, SLT_2_3, Left=0, Right=0, Length Preservation=No The value is padded up to 3 characters which is minimum length for SLT_2_3 tokenizer.
MA	Error. Input too short.	Alpha, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, generate error Input value has only two alpha characters to tokenize, which is short for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=No, generate error.
MA MAC	MA TGH	Alpha, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, return input as it is If the input value has less than three characters to tokenize, then it is returned as is else it is tokenized.
MA	TG	Alpha, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=Yes Input value has only two alpha characters, which meets minimum length requirement for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=Yes.
131 Summer Street, Bridgewater	131 VDYgAK q vMDUn, zAEXmwqWYNQG	Alpha, SLT_2_3, Left=0, Right=0, Length Preservation=No Numeric characters, spaces and comma are treated as delimiters and not tokenized. Output value is longer than initial value.
Albert Einstein	SldGzm OOCTzSFo	Alpha, SLT_1_3, Left=0, Right=0, Length Preservation=Yes Space is treated as delimiters and not tokenized. Output value is the same length as initial value.
Albert Einstein	AjAkqD vvBFYLdo	Alpha, SLT_1_3, Left=1, Right=0, Length Preservation=Yes 1 character from left remains in the clear.

Alpha Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Alpha token.

Note: For both SLT_1_3 and SLT_2_3, the maximum length of the protected data is 4096 bytes. This occurs for the Alpha token element for Application Protector with no length preservation.

Table: Supported input data types for Application protectors with Alpha token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	BYTE[] CHAR[] STRING	BYTES STRING

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protector only supports bytes converted from the string data type. If any other data type is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Alpha token.

Table: Supported input data types for Big Data protectors with Alpha token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	CHAR^*3 STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2– The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data that is not converted to bytes from string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

^*3 – If you are using the Char tokenization UDFs in Hive, then ensure that the data elements have length preservation selected. In Char tokenization UDFs, using data elements without length preservation selected, is not supported.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Alpha token.

Table: Supported input data types for Data Warehouse protectors with Alpha token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Alpha token.

Table: Supported input data types for Oracle Database protector with Alpha token

Protector	Oracle
Supported Input Data Types	VARCHAR2 CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Alpha token.

Table: Supported input data types for MSSQL Database protector with Alpha token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

5 - Upper-Case Alpha (A-Z)

Details about the Upper-Case Alpha (A-Z) token type.

The Upper-Case Alpha token type tokenizes all alphabetic symbols as uppercase. After de-tokenization, all alphabetic symbols are returned as uppercase. This means that initial and detokenized values would not match if the input contains lowercase letters.

Table: Upper-Case Alpha Tokenization Type properties

Tokenization Type Properties	Settings
Name	Upper-Case Alpha
Token type and Format	Upper-Case letters A through Z
Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length
SLT_1_3 SLT_2_3	Yes	Yes	1	4096
		No, return input as it is	3
		No, generate error	3
	No	NA	1	4049
Possibility to set Minimum/ maximum length	No
Left/Right settings	Yes
Internal IV	Yes, if Left/Right settings are non-zero
External IV	Yes
Return of Protected value	Yes
Token specific properties	Lower case characters are accepted in the input but they will be converted to upper-case in output value.

The following table shows examples of the way in which a value will be tokenized with the Upper-case Alpha token.

Table: Examples of Upper Case Alpha tokenization values

Input Value	Tokenized Value	Comments
abc	OIM	Upper-case Alpha, SLT_2_3, Left=0, Right=0, Length Preservation=Yes The value has minimum length for SLT_2_3 tokenizer. Lowercase characters in the input are converted to uppercase in output. De-tokenization will return “ABC”.
NY	ZIZ	Upper-case Alpha, SLT_1_3, Left=0, Right=0, Length Preservation=No The value is padded up to 3 characters which is minimum length for SLT_1_3 tokenizer.
NY	Error. Input too short.	Upper-case Alpha, SLT_2_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, generate error Input value has only two alpha characters to tokenize, which is short for SLT_2_3 tokenizer when Length Preservation=Yes and Allow Short Data=No, generate error.
NY NYA	NY ZIO	Upper-case Alpha, SLT_2_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, return input as it is If the input value has less than three characters to tokenize, then it is returned as is else it is tokenized.
NY	ZI	Upper-case Alpha, SLT_2_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=Yes Input value has only two alpha characters to tokenize, which meets minimum length requirement for SLT_2_3 tokenizer when Length Preservation=Yes and Allow Short Data=Yes.
131 Summer Street, Bridgewater	131 ZBXDPW G FYTZP, CRTTPXPLYGCU	Upper-case Alpha, SLT_1_3, Left=0, Right=0, Length Preservation=No Numeric characters, spaces and comma are treated as delimiters and not tokenized. Output value is longer than initial value.
Albert Einstein	AOALXO POHLFHMU	Upper-case Alpha, SLT_2_3, Left=0, Right=0, Length Preservation=Yes Space is treated as delimiters and not tokenized. Output value is the same length as initial value.
704-BBJ	704-GTU	Upper-case Alpha, SLT_1_3, Left=3, Right=0, Length Preservation=Yes Three characters from left are left in clear. Dash is treated as delimiter.

Upper-case Alpha Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Upper-case Alpha token.

Table: Supported input data types for Application protectors with Upper-case Alpha token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	BYTE[] CHAR[] STRING	BYTES STRING

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Upper-Case Alpha token.

Table: Supported input data types for Big Data protectors with Upper-Case Alpha token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	CHAR^*3 STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

^*3 – If you are using the Char tokenization UDFs in Hive, then ensure that the data elements have length preservation selected. In Char tokenization UDFs, using data elements without length preservation selected, is not supported.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Upper-case Alpha token.

Table: Supported input data types for Data Warehouse protectors with Upper-case Alpha token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Upper-Case Alpha token.

Table: Supported input data types for Database protector with Upper-Case Alpha token

Protector	Oracle
Supported Input Data Types	VARCHAR2 CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Upper-Case Alpha token.

Table: Supported input data types for MSSQL Database protector with Upper-Case Alpha token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

6 - Alpha-Numeric (0-9, a-z, A-Z)

Details about the Alpha-Numeric (0-9, a-z, A-Z) token type.

The Alpha-numeric token type tokenizes all alphabetic symbols, including lowercase and uppercase letters. It also tokenizes digits from 0 to 9.

Table: Alpha-Numeric Tokenization Type properties

Tokenization Type Properties	Settings
Name	Alpha-Numeric
Token type and Format	Digits 0 through 9 Lowercase letters a through z Uppercase letters A through Z
Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length
SLT_1_3 SLT_2_3	Yes	Yes	1	4096
		No, return input as it is	3
		No, generate error	3
	No	NA	1	4080
Preserve Case	Yes, if SLT_2_3 tokenizer is selected If you select the Preserve Case or Preserve Position property on the ESA Web UI, the Preserve Length property is enabled. If you set the Allow Short Data property to Yes, it is also enabled by default. In addition, these two properties are not modifiable.
Preserve Position
Possibility to set Minimum/ maximum length	No
Left/Right settings	Yes If you are selecting the Preserve Case or Preserve Position property on the ESA Web UI, then the retention of characters or digits from the left and the right are disabled, by default. In addition, the From Left and From Right properties are both set to zero.
Internal IV	Yes, if Left/Right settings are non-zero If you are selecting the Preserve Case or Preserve Position property on the ESA Web UI, then the alphabetic part of the input value is applied as an internal IV to the numeric part of the input value prior to tokenization.
External IV	Yes If you are selecting the Preserve Case or Preserve Position property on the ESA Web UI, then the external IV property is not supported.
Return of Protected value	Yes
Token specific properties	None

The following table shows examples of the way in which a value will be tokenized with the Alpha-Numeric token.

Table: Examples of Tokenization for Alpha-Numeric Values

Input Value	Tokenized Value	Comments
123	sQO	Alpha-Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes Input is numeric but tokenized value contains uppercase and lowercase alpha characters.
NY	1DT	Alpha-Numeric, SLT_2_3, Left=0, Right=0, Length Preservation=No The value is padded up to 3 characters which is minimum length for SLT_2_3 tokenizer.
j1	4t	Alpha-Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=Yes The minimum length meets the requirement for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=Yes.
j1	Error. Input too short.	Alpha-Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, generate error The input has two characters to tokenize, which is short for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=No, generate error.
j1 j1Y	j1 4tD	Alpha-Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, return input as it is If the input value has less than three characters to tokenize, then it is returned as is else it is tokenized.
131 Summer Street, Bridgewater	ikC ejCxxp kLa 2ZZ, 5x8K2IMubcn	Alpha-Numeric, SLT_2_3, Left=0, Right=0, Length Preservation=No Spaces and comma are treated as delimiters and not tokenized.
704-BBJ	jf7-oVY	Alpha-Numeric, SLT_1_3, Left=3, Right=0, Length Preservation=Yes Dash is treated as delimiter. The rest of value is tokenized.
704-BBJ	uHq-fTr	Alpha-Numeric, SLT_2_3, Left=3, Right=0, Length Preservation=Yes Dash is treated as delimiter. The rest of value is tokenized.
Protegrity2012	Pr3CYMPilr9n12	Alpha-Numeric, SLT_1_3, Left=2, Right=2, Length Preservation=Yes Two characters from left and 2 characters from right are left in clear. The rest of value is tokenized.

Alpha-Numeric Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Alpha-Numeric token.

Table: Supported input data types for Application protectors with Alpha-Numeric token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	STRING CHAR[] BYTE[]	STRING BYTES

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Alpha-Numeric token.

Table: Supported input data types for Big Data protectors with Alpha-Numeric token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	CHAR^*3 STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

^*3 – If you are using the Char tokenization UDFs in Hive, then ensure that the data elements have length preservation selected. In Char tokenization UDFs, using data elements without length preservation selected, is not supported.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Alpha-Numeric token.

Table: Supported input data types for Data Warehouse protectors with Alpha-Numeric token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Alpha-Numeric token.

Table: Supported input data types for Oracle Database protector with Alpha-Numeric token

Protector	Oracle
Supported Input Data Types	VARCHAR2 CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Alpha-Numeric token.

Table: Supported input data types for MSSQL Database protector with Alpha-Numeric token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

7 - Upper-Case Alpha-Numeric (0-9, A-Z)

Details about the Upper-Case Alpha-Numeric (0-9, A-Z) token type.

The Upper-Case Alpha-Numeric token type tokenizes uppercase letters A through Z and digits 0 to 9. It tokenizes all alphabetic symbols as uppercase. After de-tokenization, all alphabetic symbols are returned as uppercase. This means that initial and detokenized values would not match if the input contains lowercase letters.

Table: Upper-Case Alpha-Numeric Tokenization Type properties

Tokenization Type Properties	Settings
Name	Upper-Case Alpha-Numeric
Token type and Format	Digits 0 through 9 Uppercase letters A through Z
Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length
SLT_1_3 SLT_2_3	Yes	Yes	1	4096
		No, return input as it is	3
		No, generate error	3
	No	NA	1	4064
Possibility to set Minimum/ maximum length	No
Left/Right settings	Yes
Internal IV	Yes, if Left/Right settings are non-zero
External IV	Yes
Return of Protected value	Yes
Token specific properties	Lower case characters are accepted in the input but they will be converted to upper-case in output value.

The following table shows examples of the way in which a value will be tokenized with the Upper-Case Alpha-Numeric token.

Table: Examples of Tokenization for Upper-Case Alpha-Numeric Values

Input Value	Tokenized Value	Comments
123	STD	Upper-Case Alpha-Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes Input is numeric but tokenized value contains uppercase alpha characters.
J1	4T	Upper Alpha-Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=Yes The minimum length meets the requirement for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=Yes.
J1	Error. Input too short.	Upper-Case Alpha-Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, generate error The input has two characters to tokenize, which is short for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=No, generate error.
J1 J1Y	J1 4TD	Upper-Case Alpha-Numeric, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, return input as it is If the input value has less than three characters to tokenize, then it is returned as is else it is tokenized.
NY	AOZ	Upper-Case Alpha-Numeric, SLT_2_3, Left=0, Right=0, Length Preservation=No The value is padded up to 3 characters which is minimum length for SLT_2_3 tokenizer.
131 Summer Street, Bridgewater	8C9 CSD5PS 1X5 ZJH, 231JHXW8CVF	Upper-Case Alpha-Numeric, SLT_2_3, Left=0, Right=0, Length Preservation=No Spaces and comma are treated as delimiters and not tokenized. Lowercase characters in the input are converted to uppercase in output. De-tokenization will return all alpha characters in uppercase.
704-BBJ	704-EC0	Upper-Case Alpha-Numeric, SLT_1_3, Left=3, Right=0, Length Preservation=Yes Dash is treated as delimiter. The rest of value is tokenized.
704-BBJ	704-HHT	Upper-Case Alpha-Numeric, SLT_2_3, Left=3, Right=0, Length Preservation=Yes Dash is treated as delimiter. The rest of value is tokenized.
support@protegrity.com	FKNKHHQ@72CN84UKEI.com	Upper-Case Alpha-Numeric, SLT_2_3, Left=0, Right=3, Length Preservation=Yes Three characters from right are left in clear. “@” and “.” are treated as delimiters. The rest of value is tokenized. De-tokenization will return all alpha characters in uppercase.

Upper-Case Alpha-Numeric Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Upper-Case Alpha-Numeric token.

Table: Supported input data types for Application protectors with Upper-Case Alpha-Numeric token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	STRING CHAR[] BYTE[]	STRING BYTES

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Upper-Case Alpha-Numeric token.

Table: Supported input data types for Big Data protectors with Upper-Case Alpha-Numeric token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	CHAR^*3 STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

^*3 – If you are using the Char tokenization UDFs in Hive, then ensure that the data elements have length preservation selected. In Char tokenization UDFs, using data elements without length preservation selected, is not supported.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Upper-Case Alpha-Numeric token.

Table: Supported input data types for Data Warehouse protectors with Upper-Case Alpha-Numeric token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Alpha Numeric token.

Table: Supported input data types for Oracle Database protector with Alpha Numeric token

Protector	Oracle
Supported Input Data Types	VARCHAR2 CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Upper-Case Alpha token.

Table: Supported input data types for MSSQL Database protector with Upper-Case Alpha token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

8 - Lower ASCII

Details about the Lower ASCII token type.

The Lower ASCII token type is used to tokenize printable ASCII characters.

Table: Lower ASCII Tokenization Type properties

Tokenization Type Properties	Settings
Name	Lower ASCII
Token type and Format	The lower part of ASCII table. Hex character codes from 0x21 to 0x7E. For the list of ASCII characters supported by Lower ASCII token, refer to ASCII Character Codes.
Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length
SLT_1_3	Yes	Yes	1	4096
		No, return input as it is	3
		No, generate error	3
	No	NA	1	4086
Possibility to set Minimum/ maximum length	No
Left/Right settings	Yes
Internal IV	Yes, if Left/Right settings are non-zero
External IV	Yes
Return of Protected value	Yes
Token specific properties	Space character is treated as delimiter

The following table shows examples of the way in which a value will be tokenized with the Lower ASCII token.

Table: Examples of Tokenization for Lower ASCII Values

Input Value	Tokenized Value	Comments
La Scala 05698	:H HnwqP v/Q`>	All characters in the input value are tokenized. Spaces are excluded from the tokenization process.
Ford Mondeo CA-0256TY M34 567 K-45	j`1$ nRSD<X T]!(~4MWF l:f cF+ R?V{	All characters in the input value are tokenized. Spaces are excluded from the tokenization process.
ac	;H	Lower ASCII, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=Yes The minimum length meets the requirement for the SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=Yes.
ac	Error. Input too short.	Lower ASCII, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, generate an error The input has two characters to tokenize, which is short for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=No, generate an error.
ac aca	ac ;HH	Lower ASCII, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, return input as it is If the input value has less than three characters to tokenize, then it is returned as is else it is tokenized.

Lower ASCII Tokenization Properties for different protectors

Lower ASCII tokenization should not be used with JSON or XML UDFs.

Application Protector

The following table shows supported input data types for Application protectors with the Lower ASCII token.

Table: Supported input data types for Application protectors with Lower ASCII token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	STRING CHAR[] BYTE[]	STRING BYTES

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Lower ASCII token.

Table: Supported input data types for Big Data protectors with Lower ASCII token

Big Data Protectors	MapReduce^*3	Hive^*2	Pig^*2	HBase^*3	Impala^*2	Spark^*3	Spark SQL	Trino^*2
Supported input data types^*1	BYTE[]	STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2 – Ensure that you use the Horizontal tab “\t” as the field or column delimiter when loading data that is tokenized using Lower ASCII tokens for Hive, Pig, Impala, and Trino.

^*3 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Lower ASCII token.

Table: Supported input data types for Data Warehouse protectors with Lower ASCII token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Lower ASCII token.

Table: Supported input data types for Oracle Database protector with Lower ASCII token

Protector	Oracle
Supported Input Data Types	VARCHAR2, CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Lower ASCII token.

Table: Supported input data types for MSSQL Database protector with Lower ASCII token

Protector	MSSQL
Supported Input Data Types	VARCHAR, CHAR

Note: Lowercase ASCII tokenization is not applicable to JSON or XML UDFs.

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

9 - Datetime (YYYY-MM-DD HH:MM:SS)

Details about the Datetime (YYYY-MM-DD HH:MM:SS) token type.

The Datetime token type was introduced in response to requirements to allow specific date parts to remain in the clear and for date tokens to be distinguishable from real dates. The Datetime token type allows time to be tokenized (HH:MM:SS) in fractions of a second, including milliseconds (MMM), microseconds (mmmmmm), and nanoseconds (nnnnnnnnn).

Extended DateTime Tokenization with Timezone Offsets

The ISO 8601 DateTime format with timezone offsets are supported only for the following protectors:

Protector	Version
Application Protector .NET	10.0.1
Teradata Data Warehouse Protector	10.1.0

Note: No other protectors support the ISO 8601 formatted DateTime inputs.

The extended DateTime tokenization now supports ISO 8601 formatted dates including timezone offsets. For example, +05:30. The tokenizer applies protection only to the date and time element up to seconds. The fractional seconds and additional identifiers remains unchanged. The Delimiters are preserved.

Supported Format:

Examples of valid input:
- YYYY-MM-DD HH:MM:SS+hh:mm
- YYYY.MM.DD HH:MM:SS+hh:mm
Maximum length: 37 bytes
For example, the longest supported format is -
```
2019-11-07 13:37:00:000000000+05:30
```

Table: Datetime Tokenization Type properties

Tokenization Type Properties	Settings
Name	Datetime
Token type and Format	Datetime in the following formats: YYYY-MM-DD HH:MM:SS.MMM YYYY-MM-DDTHH:MM:SS.MMM YYYY-MM-DD HH:MM:SS.mmmmmm YYYY-MM-DDTHH:MM:SS.mmmmmm YYYY-MM-DD HH:MM:SS.nnnnnnnnn YYYY-MM-DDTHH:MM:SS.nnnnnnnnn YYYY-MM-DD HH:MM:SS YYYY-MM-DDTHH:MM:SS YYYY-MM-DD YYYY-MM-DD +05:30^1 YYYY-MM-DD HH:MM:SS.MMM +05:30^1 YYYY-MM-DDTHH:MM:SS.MMM +05:30^1 YYYY-MM-DD HH:MM:SS.mmmmmm +05:30^1 YYYY-MM-DDTHH:MM:SS.mmmmmm +05:30^1 YYYY-MM-DD HH:MM:SS.nnnnnnnnn +05:30^1 YYYY-MM-DDTHH:MM:SS.nnnnnnnnn +05:30^1 YYYY-MM-DD HH:MM:SS +05:30^1 YYYY-MM-DDTHH:MM:SS +05:30^*1
Input separators "delimiter" between date, month and year	dot ".", slash "/", or dash "-"
Input separators "delimiter" between hours, minutes, and seconds	colon ":" only
Input separator "delimiter" between date and hour	space " " or letter "T"
Input separator "delimiter" between seconds and fractions of a second	For DATE datatype dot "."
	For CHAR, VARCHAR, and STRING datatypes dot "." and comma ","
Input separator "delimiter" between hours, minutes, seconds, and fractions of second and timezone offset^*1	space " " or "+" or "-"
Tokenizer	Length Preservation	Minimum Length	Maximum Length
SLT_DATETIME	Yes	10	29
Possibility to set Minimum/ maximum length	No
Left/Right settings	No
Internal IV	No
External IV	No
Return of Protected value	Yes
Token specific properties
Tokenize time	Yes/No
Distinguishable date	Yes/No
Date in clear	Month/Year/None
Supported range of input dates	From "0600-01-01" to "3337-11-27"
Non-supported range of Gregorian cutover dates	From "1582-10-05" to "1582-10-14"

Note:
^*1 - Limitation. For more information, refer to Extended DateTime Tokenization with Timezone Offsets.

The Tokenize Time property defines whether the time part (HH:MM:SS) will be tokenized. If Tokenize Time is set to “No”, the time part will be treated as a delimiter. It will be added to the date after tokenization.

The Distinguishable Date property defines whether the tokenized values will be outside of the normal date range.

If the Distinguishable Date option is enabled, then all tokenized dates will be in the range from year 5596-09-06 to 8334-08-03. The tokenized value will become recognizable. As an example, tokenizing “2012-04-25” can result in “6457-07-12”, which is distinguishable.

If the Distinguishable Date option is disabled, then the tokenized dates will be in the range from year 0600-01-01 to 3337-11-27. As an example, tokenizing “2012-04-25” will result in “1856-12-03”, which is non-distinguishable.

The Date in Clear property defines whether Month or Year will be left in the clear in the tokenized value.

Note: You cannot use enabled Distinguishable Date and select month or year to be left in the clear at the same time.

The following points are applicable when you tokenize the Dates with Year as 3337 by setting the Year part to be in clear:

The tokenized Date value can be outside of the accepted Date range.
The tokenized Date value can be de-tokenized to obtain the original Date value.

For example, if the Date 3337-11-27 is tokenized by setting the Year part 3337 in clear, then the resultant tokenized value 3337-12-15 is outside of the accepted Date range. The detokenization of this tokenized value returns the original Date 3337-11-27.

The following table shows examples of the way in which a value will be tokenized with the Datetime token.

Table: Examples of Tokenization for DateTime Values

Input Values	Tokenized Values	Comments
2009.04.12 12:23:34.333	1595.06.19 14:31:51.333	YYYY-MM-DD HH:MM:SS.MMM. The milliseconds value is left in the clear.
2009.04.12 12:23:34.333666	1595.06.19 14:31:51.333666	YYYY-MM-DD HH:MM:SS.mmmmmm. The microseconds value is left in the clear.
2009.04.12 12:23:34.333666999	1595.06.19 14:31:51.333666999	YYYY-MM-DD HH:MM:SS.nnnnnnnnn. The nanoseconds value is left in the clear.
2009.04.12 12:23:34	1595.06.19 14:31:51	YYYY-MM-DD HH:MM:SS with space separator between day and hour.
2234.10.12T12:23:23	2755.08.04T22:33:43	YYYY-MM-DDTHH:MM:SS with T separator between day and hour values.
2009.04.12 12:23:34.333	5150.05.14T17:49:34.333	Datetime with distinguishable date property enabled and the year value is outside the normal date range.
2234.12.22 22:53:34	2755.03.15 19:03:21	Datetime token in any format with distinguishable date property enabled and the year value is within the normal date range in the tokenized output.
2009.04.12 12:23:34.333	1595.04.19 14:31:51.333	Datetime token with month in the clear.
2009.04.12 12:23:34.333	2009.06.19 14:31:51.333	Datetime token with year in the clear.
2009.04.12 12:23:34.333666999+05:30^*1	2009.06.19 14:31:51.333666999+05:30	Extended DateTime token with nanoseconds value and timezone identifier left in the clear.

Note:
^*1 - Limitation. For more information, refer to Extended DateTime Tokenization with Timezone Offsets.

Datetime Tokenization for Cutover Dates of the Proleptic Gregorian Calendar
The data systems, such as, Oracle or Java-based systems, do not accept the cutover dates of the Proleptic Gregorian Calendar. The cutover dates of the Proleptic Gregorian Calendar fall in the interval 1582-10-05 to 1582-10-14. These dates are converted to 1582-10-15. When using Oracle, conversion occurs by adding ten days to the source date. Due to this conversion, data loss occurs as the system is not capable to return the actual date value after the de-tokenization.

Note: The tokenization of the Date values in the cutover Date range of the Proleptic Gregorian Calendar results in an “Invalid Input” error.

The following points are applicable when the Distinguishable Date option is disabled:

If the Distinguishable Date option is disabled, then the tokenized dates are in the range 0600-01-01 to 3337-11-27, which also includes the cutover date range. During tokenization, an internal validation is performed to check whether the value is tokenized to the cutover date. If it is a cutover date, then the Year part (1582) of the tokenized value is converted to 3338 and then returned.
During de-tokenization, an internal check is performed to validate whether the Year is 3338. If the Year is 3338, then it is internally converted to 1582.

The following points are applicable when you tokenize the dates from the Year 1582 by setting the Year part to be in clear:

The tokenized value can result in the cutover Date range. In such a scenario, the Year part of the tokenized Date value is converted to 3338.
During de-tokenization, the Year part of the Date value is converted to 1582 to obtain the original date value.

For example, if the date 1582.04.30 12:12:12 is tokenized by setting the Year part in clear and the resultant tokenized value falls in the cutover Date range, then the Year part is converted to 3338 resulting in a tokenized value as 3338.10.10 12:12:12. The de-tokenization of this tokenized value returns the original Date 1582.04.30 12:12:12.

Note:
The tokenization accepts the date range 0600-01-01 to 3337-11-27 excluding the cutover date range.
The de-tokenization accepts the date range 0600-01-01 to 3337-11-27 and date values from the Year 3338. The year 3338 is accepted due to our support for tokenized value from the cutover date range.

Consider a scenario where you are migrating the protected data from Protector 1 to Protector 2. The Protector 1 includes the Datetime tokenizer update to process the cutover dates of the Proleptic Gregorian Calendar as input. The Protector 2 does not include this update. In such a scenario, an “Invalid Date Format” error occurs in Protector 2, when you try to unprotect the protected data as it fails to accept the input year 3338. The following steps must be performed to mitigate this issue:

Unprotect the protected data from Protector 1.
Migrate the unprotected data to Protector 2.
Protect the data from Protector 2.

Time zone Normalization for Datetime Tokens
The Datetime tokenizer does not normalize the timestamp with respect to the timezone before protecting the data.

In a few Protectors, the timezone normalization is done by the APIs that are used by the Protectors to retrieve the timestamp. However, this behavior can also be configured.

There are differences in handling timestamps. Therefore, you cannot rely on Datetime tokens for migration or transfer to different systems or timezones.

So, before migrating the Datetime tokens, ensure that the timestamps are normalized for timezones so that unprotecting the token value returns the original expected value.

Datetime Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Datetime token.

Table: Supported input data types for Application protectors with Datetime token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	DATE STRING CHAR[] BYTE[]	DATE BYTES STRING

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Decimal token.

Table: Supported input data types for Big Data protectors with Decimal token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE [], the customer application should convert the input to a byte array. Then, call the API and convert the output from the byte array.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Decimal token.

Table: Supported input data types for Data Warehouse protectors with Decimal token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Decimal token.

Table: Supported input data types for Oracle Database protector with Decimal token

Protector	Oracle
Supported Input Data Types	NUMBER (p,s) VARCHAR2 CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Decimal token.

Table: Supported input data types for MSSQL Database protector with Decimal token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

11 - Unicode Gen2

Details about the Unicode Gen2 token type.

The Unicode Gen2 token type can be used to tokenize multi-byte code point character strings. The input Unicode data after protection returns a token value in the same Unicode character format. The Unicode Gen2 token type gives you the liberty to customize how the protected token value is returned. It allows you to leverage existing built-in alphabets or create custom alphabets by defining code points. The Unicode Gen2 token type preserves code point length. If the length preservation option is selected, the protected token length will be equal to the input data length in code points.

For instance, the respective lengths for UTF-8 and UTF-16 in bytes, is described in the following table. The input is protected with the Unicode Gen2 tokenizer. The example alphabet used is Basic Latin combined with Japanese characters. The code point length is preserved.

Table: Lengths for UTF-8 and UTF-16

Input Value	Code Points	UTF-8	UTF-16	Output Value	UTF-8	UTF-16
データ保護	5	15	10	睯窯闒懻辶	15	10
Protegrity	10	10	20	鑹晓侐晊秦龡箳蕛矱蝠	30	20
Protegrity_データ保護	16	26	32	门醆湏鞄眡莧閲楌蹬鑹_晓箳麻京眡	46	32

As the token type provides customizations through defining code points and creating custom token values, there are some considerations that must be taken before using such custom alphabets.

Note: For more information about the considerations, refer to Considerations while creating custom Unicode alphabets.

The performance benefits of this token type are higher compared to the other Unicode token types.

Table: Unicode Gen2 Tokenization Type properties

Tokenization Type Properties	Settings
Name	Unicode Gen2
Token type and Format	Application Protectors support UTF-8, UTF-16LE and UTF-16BE encoding. Code points from U+0020 to U+3FFFF excluding D800-DFFF. Encoding supported by the Unicode Gen2 data element is UTF-8,UTF-16LE, and UTF-16BE.
Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length^*1
SLT_1_3^2 SLT_X_1^3	Yes	Yes	1 Code Point	4096 Code Points
		No, return input as it is	3 Code Points
		No, generate error	3 Code Points
Possibility to set Minimum/Maximum length	No
Left/Right settings	Yes
Internal IV	Yes
External IV	Yes
Return of Protected value	Yes
Token specific properties	Result is based on the alphabets selected while creating the token.

^*1 – The maximum input length to safely tokenize and detokenize the data is 4096 code points, which is irrespective of the byte representation.

^*2 - The SLT_1_3 tokenizer supports small alphabet size from 10-160 code points.

^*3 - The SLT_X_1 tokenizer supports large alphabet size from 161-100k code points.

The following table shows examples of the way in which a value will be tokenized with the Unicode Gen2 token.

Table: Examples of Tokenization for Unicode Gen2 Values

Input Values	Tokenized Values	Comments
даних	Ухбыш	Input value contains Cyrillic characters. Tokenization results include Cyrillic characters as the data element is created with the Cyrillic alphabet in its definition. The length of the tokenized value is equal to the length of the input data.
Protegrity	93VbLvI12g	Input value contains English characters. Tokenization results include English characters as the data element is created with the Basic Latin Alpha Numeric alphabet in its definition. Algorithm is length preserving. Hence, the length of the tokenized value is equal to the length of the input data.
ЕЖ	ao	Input value contains Cyrillic characters. Tokenization results include Cyrillic characters as the data element is created with the Cyrillic alphabet in its definition. Allow Short Data=Yes Algorithm is length preserving. The length of the tokenized value is equal to the length of the input data.

Considerations while creating custom Unicode alphabets

This section describes the important considerations to be aware of while working with Unicode. When creating a custom alphabet, a combination of existing alphabets, individual code points or ranges of code points can be used. The alphabet determines which code points are considered for tokenization. The code points not in the alphabet function as delimiters.

While this feature gives you the flexibility to generate token values in Unicode characters, the data element creation does not validate if the code point is defined or undefined. For example, consider that you create a data element that protects Greek and Coptic Unicode block. Though not recommended, a way you might consider to create the custom alphabet would be using the code point range option to include the whole Unicode block that ranges from U+0370 to U+03FF. As seen from the following image, this range includes both defined and undefined code points.

Greek and Coptic Code Points

The code point, U+0378 in the defined Greek and Coptic code point range is an undefined code point. When any input data is protected, since the code point range includes both defined and undefined code points, it might result in a corrupted token value if the entire code point range is defined.

It is hence recommended that for Unicode code point ranges where both defined and undefined code points exist, you must create code points ranges excluding any undefined code points. So, in case of the Greek and Coptic characters, a recommended strategy to define alphabets would be to create multiple alphabet entries, such as a range to cover U+0371 to U+0377, another range to cover U+037A to U+037F, and so on, thus skipping undefined code points.

Note: Only the alphabet characters that are supported by the OS fonts are displayed on the Web UI.

Note: Ensure that code points in the alphabet are supported by the protectors using this alphabet.

Unicode Gen2 Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Unicode Gen2 token.

Note: The string as an input and byte as an output API is unsupported by Unicode Gen2 data elements for AP Java and AP Python.

Table: Supported input data types for Application protectors with Unicode Gen2 token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	BYTE[] CHAR[] STRING	BYTES STRING

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Unicode Gen2 token.

Table: Supported input data types for Big Data protectors with Unicode Gen2 token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	STRING	Not supported	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE [], the customer application should convert the input to a byte array. Then, call the API and convert the output from the byte array.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The External IV is not supported in Data Warehouse Protector.

The following table shows the supported input data types for the Teradata protector with the Unicode Gen2 token.

Table: Supported input data types for Data Warehouse protectors with Unicode Gen2 token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR UNICODE

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Unicode Gen2 token.

Table: Supported input data types for Oracle Database protector with Unicode Gen2 token

Protector	Oracle
Supported Input Data Types	VARCHAR2 and NVARCHAR2

The maximum input lengths supported for the Oracle database protector are as described by the following points:

Unicode Gen2 – Data type : VARCHAR2:
1. If the tokenizer length preservation parameter is selected as Yes, then the maximum limit that can be safely tokenized and detokenized is 4000 bytes.
2. If the tokenizer length preservation parameter is selected as No, then the maximum limit that can be safely tokenized and detokenized is 3000 bytes.
Unicode Gen2 – Data type : NVARCHAR2:
1. If the tokenizer length preservation parameter is selected as Yes, then the maximum limit that can be safely tokenized and detokenized is 4000 bytes.
2. If the tokenizer length preservation parameter is selected as No, then the maximum limit that can be safely tokenized and detokenized is 3000 bytes.
For more information about the Oracle Database protector, refer to Oracle Database Protector.
MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Unicode Gen2 token.

Table: Supported input data types for MSSQL Database protector with Unicode Gen2 token

Protector	MSSQL
Supported Input Data Types	NVARCHAR

Unicode Gen2 - Tokenizers
- The Unicode Gen2 data element supports SLT_1_3 and SLT_X_1 tokenizers.
- The SLT_1_3 tokenizer supports small alphabet size from 10-160 code points.
- The SLT_X_1 tokenizer supports large alphabet size from 161-100K code points.
For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

12 - Binary

Details about the Binary token type.

The Binary token type can be used to tokenize binary data with Hex codes from 0x00 to 0xFF.

Table: Binary Tokenization Type properties

Tokenization Type Properties	Settings
Name	Binary
Token type and Format	Hex character codes from 0x00 to 0xFF.
Tokenizer	Length Preservation	Minimum Length	Maximum Length
SLT_1_3 SLT_2_3	No	3	4095
Possibility to set Minimum/ maximum length	No
Left/Right settings	Yes
Internal IV	Yes, if Left/Right settings are non-zero.
External IV	Yes
Return of Protected value	No
Token specific properties	Tokenization result is binary.

The following table shows examples of the way in which a value will be tokenized with the Binary token.

Table: Examples of Tokenization for Binary Values

Input Values	Tokenized Values	Comments
Protegrity	0x05C1CF0C310B2D38ACAD4C	Tokenization result is returned as a binary stream.
123	0x19707E	Tokenization of the value with Minimum supported length.

Binary Tokenization Properties for different protectors

Application Protector

It is recommended to use Binary tokenization only with APIs that accept BYTE[] as input and provide BYTE[] as output. If Binary tokens are generated using APIs that accept BYTE[] as input and provide BYTE[] as output, and uniform encoding is maintained across protectors, then the tokens can be used across various protectors.

The following table shows supported input data types for Application protectors with the Binary token.

Table: Supported input data types for Application protectors with Binary token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	BYTE[]	BYTES

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Binary token.

Table: Supported input data types for Big Data protectors with Binary token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]^*3	Not supported	Not supported	BYTE[]^*3	Not supported	BYTE[]^*3	Not supported	Not supported

^*1 – If the input and output types of the API are BYTE [], the customer application should convert the input to a byte array. Then, call the API and convert the output from the byte array.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

^*3 – It is recommended to use Binary tokenization only with APIs that accept BYTE[] as input and provide BYTE[] as output. If Binary tokens are generated using APIs that accept input and provide output as BYTE[], these tokens can be used across various protectors. The Binary tokens is assumed to have uniform encoding across protectors.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Binary token.

Table: Supported input data types for Data Warehouse protectors with Binary token

Data Warehouse Protectors	Teradata
Supported input data types	Not Supported

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Binary token.

Table: Supported input data types for Oracle Database protector with Binary token

Protector	Oracle
Supported Input Data Types	Unsupported

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Binary token.

Table: Supported input data types for MSSQL Database protector with Binary token

Protector	MSSQL
Supported Input Data Types	Unsupported

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

13 - Email

Details about the Email token type.

Email token type allows tokenization of an email address. Email tokens keep the domain name and all characters after the “@” sign in the clear. The local part, which is the part before the “@” sign, gets tokenized.

The table lists minimum and maximum length requirements for this token type, which should be applied for the local part, domain part and the entire e-mail.

Table: Email Tokenization Type Properties

Tokenization Type Properties	Settings
Name	Email
Token type and Format	Alphabetic and numeric only. The rest of the characters will be treated as delimiters.
Tokenizer	Length Preservation	Minimum Length			Maximum Length
Tokenizer	Length Preservation	Local	Domain	Entire	Local	Domain	Entire
SLT_1_3 SLT_2_3	No	1	1	3	63	252	256
SLT_1_3 SLT_2_3	No	1	1	3	63	252	256
SLT_1_3 SLT_2_3	Yes	3^*1	1	5	64	252^*2	256
SLT_1_3 SLT_2_3	Yes	3^*1	1	5	64	252^*2	256
Possibility to set minimum/ maximum length	No
Left/Right settings	No
Internal IV	N/A
External IV	Yes
Return of Protected value	Yes
Token specific properties	At least one @ character is required in the input. The right most @ character defines the delimiter between the local and domain parts.

^*1 – If the settings for short data tokenization is set to Yes, then the minimum tokenizable length for the local part of an email is one else it is three.

^*2 – If the settings for short data tokenization is set to Yes, then the maximum length for the domain part of an email is 253 else it is 252.

Email Token Format

An Email token format indicates the tokenization format for email. The email address consists of a local part and a domain, local-part@domain. The local part can be up to 64 characters and the domain name can be up to 254 characters, but the entire email address cannot be longer than 256 characters.

The following table explains email token format input requirements and tokenized output format:

Table: Output Values for Email Token Format

Local Part Input value can consist	Output value can consist
Commonly used: Uppercase and lower case characters through a-z/A-Z. Digits 0-9 Special characters !#$%&'*+-/=?^_`\|}{~ and ASCII: 33, 35-39, 42, 43, 45, 47, 61, 63, 94-96, 123-126 Comments are allowed with parentheses. Used with restrictions: dot character "." when it is not the first or the last and it does not appear more than one time consecutively. Special characters, ASCII: 32, 34, 40, 41, 44, 58, 59, 60, 62, 64, 91-93 are allowed with restrictions. They must only be used when contained between quotation marks. These are the space "32", backslash "92", and quotation mark "34". It must also be preceded by a backslash, for example, "\ \\\". International characters above U+007F are permitted by RFC 6531, though mail systems may restrict which characters to use when assigning local parts.	The part before “@” sign will be tokenized. The following will be tokenized: All valid characters will be tokenized by the same rules as alpha-numeric token Comments will be tokenized. The following characters will be considered as delimiters and not tokenized: “.” dot character “()” left and right parenthesis Special characters in local part.
@ Part The “@” character defines the delimiter between the local and domain parts, and will be left in clear.
Domain Part Input value can consist	Output value can consist
Letters and digits Hyphens and dots IP address within square brackets, for example, john.smith@[1.1.1.1]. Non-ASCII domain, internationalized domain parts. Comments are allowed within parentheses	The part after “@” sign will not be tokenized.

Note:
Comments are allowed both in local and domain part of the e-mail token, and comments will be tokenized only if they are in the local part. Here are the examples of comments usage for the e-mail - john.smith@example.com:

john.smith(comment)@example.com
“john(comment).smith@example.com”
john(comment)n.smith@example.com
john.smith@(comment)example.com
john.smith@example.com(comment)

The following table shows examples of the way in which a value will be tokenized with the Email token.

Table: Examples of Tokenization for Email Token Formats

Input Values	Tokenized Values	Comments
Protegrity1234@gmail.com	UNfOxcZ51jWbXMq@gmail.com	All characters before @ symbol are tokenized.
john.smith!@#@$%$%^&@gmail.com	hX3p.yDcwD!@#@$%$%@gmail.com	All symbols except alphabetic are distinguish as delimiters.
email@protegrity@gmail.com	F00CJ@RjDEX9LMDq@gmail.com	The right most @ character defines the delimiter between the local and domain parts.
q@a	asj@a	Min 3 symbols in local part for none length preserving tokens
qdd@a	S0Y@a	Min 5 symbols in local part for length preserving tokens
a@protegrity.com	o@protegrity.com	Email, SLT_1_3, Length Preservation=Yes, Allow Short Data=Yes The local part of the email has at least one character to tokenize, which meets the minimum length requirement for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=Yes.
a@protegrity.com email@protegrity.com	a@protegrity.com F00CJ@protegrity.com	Email, SLT_1_3, Length Preservation=Yes, Allow Short Data=No, return input as it is If the input value has less than three characters to tokenize, then it is returned as is else it is tokenized.
a@protegrity.com	Error. Input too short.	Email, SLT_1_3, Length Preservation=Yes, Allow Short Data=No, generate an error The local part of the email has one character to tokenize, which is short for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=No, generate an error.

Email Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Email token.

Table: Supported input data types for Application protectors with Email token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	STRING CHAR[] BYTE[]	STRING BYTES

^*1 – The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 – The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Email token.

Table: Supported input data types for Big Data protectors with Email token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	CHAR^*3 STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE [], the customer application should convert the input to a byte array. Then, call the API and convert the output from the byte array.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

^*3 – If you are using the Char tokenization UDFs in Hive, then ensure that the data elements have length preservation selected. In Char tokenization UDFs, using data elements without length preservation selected, is not supported.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Email token.

Table: Supported input data types for Data Warehouse protectors with Email token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Email token.

Table: Supported input data types for Oracle Database protector with Email token

Protector	Oracle
Supported Input Data Types	VARCHAR2 CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Email token.

Table: Supported input data types for MSSQL Database protector with Email token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

14 - Printable

Details about the Printable token type.

Deprecated

Starting from v10.0.x, the Printable token type is deprecated.
It is recommended to use the Unicode Gen2 token type instead of the Printable token type.

The Printable token type tokenizes ASCII printable characters from the ISO 8859-15 alphabet, which include letters, digits, punctuation marks, and miscellaneous symbols.

Table: Printable Tokenization Type properties

Tokenization Type Properties	Settings
Name	Printable
Token type and Format	ASCII printable characters, which include letters, digits, punctuation marks, and miscellaneous symbols. Hex character codes from 0x20 to 0x7E and from 0xA0 to 0xFF. Refer to ASCII Character Codes for the list of ASCII characters supported by Printable token.
Tokenizer^12	Length Preservation	Allow Short Data		Minimum Length		Maximum Length
SLT_1_3	Yes	Yes		1		4096
		No, return input as it is		3
		No, generate error		3
	No	NA	1		4091
Possibility to set Minimum/ maximum length	No
Left settings	Yes
Internal IV	Yes, if Left/Right settings are non-zero
External IV	Yes
Return of Protected value	Yes
Token specific properties	Token tables are large in size, approximately 27MB. Refer to SLT Tokenizer Characteristics for the exact numbers.

^*1 – The character column “CHAR” to protect is configured to remove trailing spaces before the tokenization. This means that the space character can be lost in translation for Printable tokens. To avoid this consider using Lower ASCII token instead of Printable for CHAR columns and input data having spaces.

^*2 – Printable tokenization is not supported on databases where the character set is UTF.

The following table shows examples of the way in which a value will be tokenized with the Printable token.

Table: Examples of Tokenization for Printable Values

Input Values	Tokenized Values	Comments
La Scala 05698	F\|ZpÙç\|Ôä%s^¦4	All characters in the input value, including spaces, are tokenized.
Ford Mondeo CA-0256TY M34 567 K-45	§)%ß#)ðYjt{Â¬ÓÊEµV²ù²	All characters in the input value, including spaces, are tokenized.
qw	rD	Printable, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=Yes The minimum length meets the requirement for the SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=Yes.
qw	Error. Input too short.	Printable, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, generate an error The input has two characters to tokenize, which is short for SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=No, generate an error.
qw qwa	qw rDZ	Printable, SLT_1_3, Left=0, Right=0, Length Preservation=Yes, Allow Short Data=No, return input as it is. If the input value has less than three characters to tokenize, then it is returned as is else it is tokenized.

Printable Tokenization Properties for different protectors

Application Protector

Printable tokenization is recommended for APIs that accept BYTE [] as input and provide BYTE [] as output. If uniform encoding is maintained across protectors, tokens generated by these APIs can be used across various protectors.

To ensure accurate tokenization results, user must use ISO 8859-15 character encoding when converting String data to Byte. This input should then be passed to Byte APIs.

Note: If Printable tokens are generated using APIs or UDFs that accept STRING or VARCHAR as input, then the protected values can only be unprotected using the protector with which it was protected. If you are unprotecting the protected data using any other protector, then you could get inconsistent results.

The following table shows supported input data types for Application protectors with the Printable token.

Table: Supported input data types for Application protectors with Printable token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	STRING CHAR[] BYTE[]	STRING BYTES

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protector only supports bytes converted from the string data type. If any other data type is directly converted to bytes and passed as input to the Application Protectors APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Printable token.

Table: Supported input data types for Big Data protectors with Printable token

Big Data Protectors	MapReduce^4^5	Hive	Pig	HBase^4^5	Impala^2^3	Spark^4^5	Spark SQL	Trino
Supported input data types^1^6	BYTE[]	Not supported	Not supported	BYTE[]	STRING	BYTE[]^*5	Not supported	VARCHAR

^*1 – If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2 – Ensure that you use the Horizontal tab “\t” as the field or column delimiter when loading data that is tokenized using Printable tokens for Impala.

^*3 – Though the tokenization results for Impala may not be formatted and displayed accurately, they will be unprotected to the original values, using the respective protector.

^*4 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

^*5 – It is recommended to use Printable tokenization with APIs that accepts BYTE[] as input and provides BYTE[] as output. If uniform encoding is maintained across protectors, Printable tokens generated by such APIs can be used across various protectors. To ensure accurate formatting and display of tokenization results, clients should use ISO 8859-15 character encoding. Before passing input to Byte APIs, clients must convert String data type to Byte and apply ISO 8859-15 character encoding.

^*6 – Printable tokens are generated using APIs or UDFs. These APIs or UDFs accept STRING or VARCHAR as input. Then, the protected values can only be unprotected using the protector with which it was protected. If you are unprotecting the protected data using any other protector, then you could get inconsistent results.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

Printable tokens are generated using APIs or UDFs. These APIs or UDFs accept STRING or VARCHAR as input. Then, the protected values can only be unprotected using the protector with which it was protected. If you are unprotecting the protected data using any other protector, then you could get inconsistent results.

Important: Tokenizing XML or JSON data with Printable tokenization will not return valid XML or JSON format output.

JSON and XML UDFs are supported for the Teradata Data Warehouse Protector.

The following table shows the supported input data types for the Teradata protector with the Printable token.

Table: Supported input data types for Data Warehouse protectors with Printable token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Printable token.

Table: Supported input data types for Oracle Database protector with Printable token

Protector	Oracle
Supported Input Data Types	VARCHAR2 CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Printable token.

Table: Supported input data types for MSSQL Database protector with Printable token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

Note: If Printable tokens are generated using APIs or UDFs that accept STRING or VARCHAR as input, then the protected values can only be unprotected using the protector with which it was protected. If you are unprotecting the protected data using any other protector, then you could get inconsistent results.

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

15 - Date (YYYY-MM-DD, DD/MM/YYYY, MM.DD.YYYY)

Details about the Date (YYYY-MM-DD, DD/MM/YYYY, MM.DD.YYYY) token type.

Deprecated

Starting from v10.0.x, the Date YYYY-MM-DD, Date DD/MM/YYYY, and Date MM.DD.YYYY tokenization types are deprecated.
It is recommended to use the Datetime (YYYY-MM-DD HH:MM:SS MMM) token type instead of the Date YYYY-MM-DD, Date DD/MM/YYYY, and Date MM.DD.YYYY token types.

The Date token type supports date formats corresponding to the big endian, little endian, and middle endian forms. It protects dates in one of the following formats:

YYYY<delim>MM<delim>DD
DD<delim>MM<delim>YYYY
MM<delim>DD<delim>YYYY

Where <delim> is one of the allowed separators: dot “.”, slash “/”, or dash “-”.

Table: Date Tokenization Type properties

Tokenization Type Properties	Settings
Name	Date
Token type and Format	Date in big endian form, starting with the year (YYYY-MM-DD). Date in little endian form, starting with the day (DD/MM/YYYY). Date in middle endian form, starting with the month (MM.DD.YYYY). The following separators are supported: dot ".", slash "/", or dash "-".
Tokenizer	Length Preservation	Minimum Length	Maximum Length
SLT_1_3 SLT_2_3 SLT_1_6 SLT_2_6	Yes	10	10
Possibility to set Minimum/ maximum length	No
Left/Right settings	No
Internal IV	No
External IV	No
Return of Protected value	Yes
Token specific properties	All separators, such as dot ".", slash "/", or dash "-" are allowed.
Supported range of input dates	From “0600-01-01” to “3337-11-27”
Non-supported range of Gregorian cutover dates	From "1582-10-05" to "1582-10-14"

The following table shows examples of the way in which a value will be tokenized with the Date token.

Table: Examples for Tokenization of Date

Input Values	Tokenized Values	Comments
2012-02-29 2012/02/29 2012.02.29	2150-02-20 2150/02/20 2150.02.20	Date (YYYY-MM-DD) token is used. All three separators are successfully accepted. They are treated as delimiters not impacting tokenized value.
31/01/0600	08/05/2215	Date (DD/MM/YYYY) token is used. Date in the past is tokenized.
10.30.3337	09.05.2042	Date (MM.DD.YYYY) token is used. Date in the future is tokenized.
2012:08:24 1975-01-32	Token is not generated due to invalid input value. Error is returned.	Date (YYYY-MM-DD) token is used. Input values with non-supported separators or with invalid dates produce error.

Date Tokenization for Cutover Dates of the Proleptic Gregorian Calendar

The data systems, such as, Oracle or Java-based systems, do not accept the cutover dates of the Proleptic Gregorian Calendar. The cutover dates of the Proleptic Gregorian Calendar fall in the interval 1582-10-05 to 1582-10-14. These dates are converted to 1582-10-15. When using Oracle, conversion occurs by adding ten days to the source date. Due to this conversion, data loss occurs as the system is not capable to return the actual date value after the de-tokenization.

The following points are applicable for the tokenization and de-tokenization of the cutover dates of the Proleptic Gregorian Calendar:

The tokenization of the date values in the cutover date range of the Proleptic Gregorian Calendar results in an ‘Invalid Input’ error.
During tokenization, an internal validation is performed to check whether the value is tokenized to the cutover date. If it is a cutover date, then the Year part (1582) of the tokenized value is converted to 3338 and then returned. During de-tokenization, an internal check is performed to validate whether the Year is 3338. If the Year is 3338, then it is internally converted to 1582.

Note:
The tokenization accepts the date range 0600-01-01 to 3337-11-27 excluding the cutover date range.
The de-tokenization accepts the date ranges 0600-01-01 to 3337-11-27 and 3338-10-05 to 3338-10-14.

Consider a scenario where you are migrating the protected data from Protector 1 to Protector 2. The Protector 1 includes the Date tokenizer update to process the cutover dates of the Proleptic Gregorian Calendar as input. The Protector 2 does not include this update. In such a scenario, an “Invalid Date Format” error occurs in Protector 2, when you try to unprotect the protected data as it fails to accept the input year 3338. The following steps must be performed to mitigate this issue:

Unprotect the protected data from Protector 1.
Migrate the unprotected data to Protector 2.
Protect the data from Protector 2.

Date Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Date token.

Table: Supported input data types for Application protectors with Date token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	DATE STRING CHAR[] BYTE[]	DATE BYTES STRING

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Date token.

Table: Supported input data types for Big Data protectors with Date token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	STRING DATE^*3	CHARARRAY	BYTE[]	STRING DATE^*3	BYTE[] STRING	STRING DATE^*3	DATE^*3

^*1 – If the input and output types of the API are BYTE [], the customer application should convert the input to a byte array. Then, call the API and convert the output from the byte array.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

^*3 – In the Big Data Protector, the date format supported for Hive, Impala, Spark SQL, and Trino is YYYY-MM-DD only.

Date input values are not fully validated to ensure they represent valid dates. For instance, entering a day value greater than 31 or a month value greater than 12 will result in an error. However, the date 2011-02-30 does not cause an error but is converted to 2011-03-02, which is not the intended date.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Date token.

Table: Supported input data types for Data Warehouse protectors with Date token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Date token.

Table: Supported input data types for Oracle Database protector with Date token

Protector	Oracle
Supported Input Data Types	DATE VARCHAR2 CHAR

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Date token.

Table: Supported input data types for MSSQL Database protector with Date token

Protector	MSSQL
Supported Input Data Types	VARCHAR CHAR

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

16 - Unicode

Details about the Unicode token type.

Deprecated

Starting from v10.0.x, the Unicode token type is deprecated.
It is recommended to use the Unicode Gen2 token type instead of the Unicode token type.

The Unicode token type can be used to tokenize multi-byte character strings. The input is treated as a byte stream, hence there are no delimiters. There are also no character conversions or code point validation done on the input. The token value will be alpha-numeric.

The encoding and unicode character set of the input data will affect the protected data length. For instance, the respective lengths for UTF-8 and UTF-16, in bytes, is described in the following table.

Table: Lengths for UTF-8 and UTF-16

Input Values	UTF-8	UTF-16
導字社導字會	18 bytes	12 bytes
Protegrity	10 bytes	20 bytes

Table: Unicode Tokenization Type properties

Tokenization Type Properties	Settings
Name	Unicode
Token type and Format	Application protectors support UTF-8, UTF-16LE, and UTF-16BE encoding. Hex character codes from 0x00 to 0xFF. For the list of supported characters, refer to ASCII Character Codes.
Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length^*2
SLT_1_3^1 SLT_2_3^1	No	Yes	1 byte	4096
		No, return input as it is	3 bytes
		No, generate error	3 bytes
Possibility to set Minimum/ maximum length	No
Left/Right settings	No
Internal IV	No
External IV	Yes
Return of Protected value	Yes
Token specific properties	Tokenization result is Alpha-Numeric.

^*1 - If the input and output types of the API are BYTE[], then the customer application should convert the input to and output from the byte array, before calling the API.

^*2 - The maximum input length to safely tokenize and detokenize the data is 4096 bytes, which is irrespective of the byte representation.

The following table shows examples of the way in which a value will be tokenized with the Unicode token.

Table: Examples of Tokenization for Unicode Values

Input Value	Tokenized Value	Comments
Протегріті	WurIeXLFZPApXQorkFCKl3hpRaGR28K	Input value contains Cyrillic characters. Tokenization result is Alpha-Numeric.
安全	xM2EcAQ0LVtQJ	Input value contains characters in Simplified Chinese. Tokenization result is Alpha-Numeric.
Protegrity	RsbQU8KdcQzHJ1	Algorithm is non-length preserving. Tokenized value is longer than initial one.
a	V2wU	Unicode, Allow Short Data=Yes Algorithm is non-length preserving. Tokenized value is longer than initial one.
a9c	A0767Vo

Unicode Tokenization Properties for different protectors

Unicode tokenization is supported only by Application Protectors, Big Data Protector and Data Warehouse Protector.

Application Protector

The following table shows supported input data types for Application protectors with the Unicode token.

Table: Supported input data types for Application protectors with Unicode token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	BYTE[] CHAR[] STRING	BYTES STRING

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The minimum and maximum lengths supported for the Big Data Protector are as described by the following points:

MapReduce: The maximum limit that can be safely tokenized and detokenized back is 4096 bytes. The user controls the encoding, as required.
Spark: The maximum limit that can be safely tokenized and detokenized back is 4096 bytes. The user controls the encoding, as required.
Hive: The ptyProtectUnicode and ptyUnprotectUnicode UDFs convert data to UTF-16LE encoding internally. These encoding has a minimum requirement of four bytes of data in UTF-16LE encoding. Additionally, it has a maximum limit of 4096 bytes in UTF-16LE encoding for safely tokenizing and detokenizing the data. The pty_ProtectStr and pty_UnprotectStr UDFs convert data to UTF-8 encoding internally. This encoding has a minimum requirement of three bytes for data in UTF-8 encoding. Additionally, it has a maximum limit of 4096 bytes for safely tokenizing and detokenizing the data.
Impala: The pty_UnicodeStringIns and pty_UnicodeStringSel UDFs convert data to UTF-16LE encoding internally. These encoding has a minimum requirement of four bytes of data in UTF-16LE encoding. Additionally, it has a maximum limit of 4096 bytes in UTF-16LE encoding for safely tokenizing and detokenizing the data. The pty_StringIns and pty_StringSel UDFs convert data to UTF-8 encoding internally. This encoding has a minimum requirement of three bytes for data in UTF-8 encoding. Additionally, it has a maximum limit of 4096 bytes for safely tokenizing and detokenizing the data.

The following table shows supported input data types for Big Data protectors with the Unicode token.

Table: Supported input data types for Big Data protectors with Unicode token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	STRING	Not supported	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE [], the customer application should convert the input to a byte array. Then, call the API and convert the output from the byte array.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

If short data tokenization is not enabled, the minimum length for Unicode tokenization type is 3 bytes. The input value in Teradata Unicode UDF is encoded using UTF16 due to which internally the data length is multiplied by 2 bytes. Hence, the Teradata Unicode UDF is able to tokenize a data length that is less than the minimum supported length of 3 bytes.

The External IV is not supported in Data Warehouse Protector.

The following table shows the supported input data types for the Teradata protector with the Unicode token.

Table: Supported input data types for Data Warehouse protectors with Unicode token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR UNICODE

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Unicode token.

Table: Supported input data types for Oracle Database protector with Unicode token

Protector	Oracle
Supported Input Data Types	VARCHAR2

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Unicode token.

Table: Supported input data types for MSSQL Database protector with Unicode token

Protector	MSSQL
Supported Input Data Types	NVARCHAR

Note:
For the MSSQL database protector, if Unicode UDFs are provided with an input data exceeding 4000 characters, then SQL Server internally processes only the first 4000 characters, truncating any additional characters.
Cross-product data migration for Unicode token type is compatible between products that use the same encoding technique. For example, the Teradata database cross product data migration for Unicode token type is compatible with the MSSQL database protector, because, both the protectors use the UTF-16 encoding technique. However, it is not compatible with the Oracle database protector, because, it uses the UTF-8 encoding.

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.

17 - Unicode Base64

Details about the Unicode Base64 token type.

Deprecated

Starting from v10.0.x, the Unicode Base64 token type is deprecated.
It is recommended to use the Unicode Gen2 token type instead of the Unicode Base64 token type.

The Unicode Base64 token type can be used to tokenize multi-byte character strings. The input is treated as a byte stream, hence there are no delimiters. Any character conversions or code point validation are not performed on the input. This token element uses Base64 encoding. This encoding results in better performance compared to Unicode token element. It includes three additional characters, namely +, /, and = along with alpha numeric characters. The token value generated includes alpha numeric, +, /, and =.

The encoding and unicode character set of the input data will affect the protected data length. For instance, the respective lengths for UTF-8 and UTF-16, in bytes, is described in the following table.

Table: Lengths for UTF-8 and UTF-16

Input Values	UTF-8	UTF-16
導字社導字會	18 bytes	12 bytes
Protegrity	10 bytes	20 bytes

Table: Unicode Base64 Tokenization Type properties

Tokenization Type Properties	Settings
Name	Unicode Base64
Token type and Format	Application protectors support UTF-8, UTF-16LE, and UTF-16BE encoding. Hex character codes from 0x00 to 0xFF. For the list of supported characters, refer to ASCII Character Codes.
Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length^*1
SLT_1_3 SLT_2_3	No	Yes	1 byte	4096
		No, return input as it is	3 bytes
		No, generate error	3 bytes
Possibility to set Minimum/Maximum length	No
Left/Right settings	No
Internal IV	No
External IV	Yes
Return of Protected value	Yes
Token specific properties	Tokenization result is Alpha-Numeric, "+", "/", and "=".

^*1 - The maximum input length to safely tokenize and detokenize the data is 4096 bytes, which is irrespective of the byte representation.

The following table shows examples of the way in which a value will be tokenized with the Unicode Base64 token.

Table: Examples of Tokenization for Unicode Base64 Values

Input Values	Tokenized Values	Comments
захист даних	B/ftgx=VysiXmq0t+O+I8v	Input value contains Cyrillic characters. Tokenization result include alpha numeric characters, such as =, /, and +.
Protegrity	9NHI=znyLfgRiRvD	Algorithm is non-length preserving. Tokenized value is longer than initial one.
aÈ	=+bg	Unicode Base64 token element Algorithm is non-length preserving. Tokenized value is longer than initial one.
P+	+BIN	Unicode Base64 token element, Allow Short Data=Yes Algorithm is non-length preserving. Tokenized value is longer than initial one.

Unicode Base64 Tokenization Properties for different protectors

The Unicode Base64 tokenization is supported only by Application Protectors, Big Data Protector, Data Warehouse Protector, and Data Security Gateway.

Application Protector

The following table shows supported input data types for Application protectors with the Unicode Base64 token.

Table: Supported input data types for Application protectors with Unicode Base64 token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	BYTE[] CHAR[] STRING	BYTES STRING

^*1 - The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 - The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which utilizes Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, and users and business processes can continue to utilize the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The minimum and maximum lengths supported for the Big Data Protector are as described by the following points:

MapReduce: The maximum limit that can be safely tokenized and detokenized back is 4096 bytes. The user controls the encoding, as required.
Spark: The maximum limit that can be safely tokenized and detokenized back is 4096 bytes. The user controls the encoding, as required.
Hive: The ptyProtectUnicode and ptyUnprotectUnicode UDFs convert data to UTF-16LE encoding internally. These encoding has a minimum requirement of four bytes of data in UTF-16LE encoding. Additionally, it has a maximum limit of 4096 bytes in UTF-16LE encoding for safely tokenizing and detokenizing the data.
The pty_ProtectStr and pty_UnprotectStr UDFs convert data to UTF-8 encoding internally. This encoding has a minimum requirement of three bytes for data in UTF-8 encoding. Additionally, it has a maximum limit of 4096 bytes for safely tokenizing and detokenizing the data.
Impala: The pty_UnicodeStringIns and pty_UnicodeStringSel UDFs convert data to UTF-16LE encoding internally. These encoding has a minimum requirement of four bytes of data in UTF-16LE encoding. Additionally, it has a maximum limit of 4096 bytes in UTF-16LE encoding for safely tokenizing and detokenizing the data.
The pty_StringIns and pty_StringSel UDFs convert data to UTF-8 encoding internally. This encoding has a minimum requirement of three bytes for data in UTF-8 encoding. Additionally, it has a maximum limit of 4096 bytes for safely tokenizing and detokenizing the data.

The following table shows supported input data types for Big Data protectors with the Unicode Base64 token.

Table: Supported input data types for Big Data protectors with Unicode Base64 token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	STRING	Not supported	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE [], the customer application should convert the input to a byte array. Then, call the API and convert the output from the byte array.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data types that are not bytes converted from the string data type might cause data corruption to occur when:

Any other data type is directly converted to bytes and passed as input to the MapReduce or Spark API that supports byte as input and provides byte as output.
Any other data type is directly converted to bytes and inserted in an HBase table. Where the HBase table is configured with the Protegrity HBase coprocessor.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is an advanced security solution designed to protect sensitive data at the column level. This enables you to secure your data, while still permitting access to authorized users. Additionally, the Data Warehouse Protector integrates seamlessly with existing database systems using the User-Defined Functions for an enhanced security. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The External IV is not supported in Data Warehouse Protector.

The following table shows the supported input data types for the Teradata protector with the Unicode Base64 token.

Table: Supported input data types for Data Warehouse protectors with Unicode Base64 token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR UNICODE

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

Oracle Database Protector

The following table shows supported input data types for Oracle Database protector with the Unicode Base64 token.

Table: Supported input data types for Oracle Database protectors with Unicode Base64 token

Protector	Oracle
Supported Input Data Types	VARCHAR2 and NVARCHAR2

The maximum input lengths supported for the Oracle database protector are as described by the following points:

Base 64 – Data type : VARCHAR2: The maximum limit that can be safely tokenized and detokenized back is 3000 bytes.

For more information about the Oracle Database protector, refer to Oracle Database Protector.

MSSQL Database Protector

The following table shows supported input data types for MSSQL Database protector with the Unicode Base64 token.

Table: Supported input data types for MSSQL Database protectors with Unicode Base64 token

Protector	MSSQL
Supported Input Data Types	NVARCHAR

Note: Cross-product data migration for Unicode token type is compatible between products that use the same encoding technique. For example, the Teradata database cross product data migration for Unicode token type is compatible with the MSSQL database protector, because, both the protectors use the UTF-16 encoding technique. However, it is not compatible with the Oracle database protector, because, it uses the UTF-8 encoding.

For more information about the MSSQL Database protector, refer to MSSQL Database Protector.