Tokenization Properties

Data Type and Alphabet

Mon, 01 Jan 0001 00:00:00 +0000

An alphabet contains all the characters considered for tokenization and is derived from the tokenization type. Characters outside the alphabet are treated as delimiters.

Note: This applies only to the Unicode Gen2 token type.

Refer to Tokenization Types for the full list of supported token types.
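The following minimal sketch illustrates how an alphabet separates tokenizable characters from delimiters. The alphabet (ASCII letters and digits) and the helper names `split_segments` and `tokenize_preserving_delimiters` are hypothetical and chosen only for illustration; the actual alphabet depends on the tokenization type, and this is not the product's implementation.

```python
import string
from itertools import groupby

# Hypothetical alphabet for illustration only: ASCII letters and digits.
ALPHABET = frozenset(string.ascii_letters + string.digits)


def split_segments(value, alphabet=ALPHABET):
    """Group consecutive characters into (text, is_tokenizable) runs."""
    return [("".join(run), in_alphabet)
            for in_alphabet, run in groupby(value, key=lambda ch: ch in alphabet)]


def tokenize_preserving_delimiters(value, tokenize_run, alphabet=ALPHABET):
    """Apply tokenize_run to alphabet runs only; delimiters pass through unchanged."""
    return "".join(tokenize_run(text) if in_alphabet else text
                   for text, in_alphabet in split_segments(value, alphabet))
```

For example, `split_segments("john.doe@example")` yields `[("john", True), (".", False), ("doe", True), ("@", False), ("example", True)]`, so only the letter runs would be passed to the tokenizer while "." and "@" stay in place.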

Static Lookup Table (SLT) Tokenizers

A static lookup table (SLT) contains a pre-generated list of all possible values over a given set of characters. For instance, an alphabetic lookup table might contain all values from “Aa” to “Zz”. All entries are then shuffled into random order.

An SLT tokenizer uses multiple SLTs to generate tokens. It first divides the input value into smaller pieces, called token blocks, which correspond to entries in the lookup tables. Each token block is then substituted with a value from an SLT, and the results are chained together to form the final token value. The token is therefore the result of multiple lookups in multiple SLTs.
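To make the block-and-lookup idea concrete, here is a toy sketch of an SLT-style tokenizer. Everything in it is a labeled assumption: a numeric alphabet instead of letters, a block size of two, four tables, a seeded random shuffle, and a chaining rule in which the previous token block selects the next table. It illustrates the general technique only and is not the product's actual SLT algorithm or table format.

```python
import itertools
import random

ALPHABET = "0123456789"  # hypothetical: numeric alphabet
BLOCK_SIZE = 2           # hypothetical: two characters per token block
TABLE_COUNT = 4          # hypothetical: number of SLTs


def build_tables(seed=0):
    """Build shuffled lookup tables over every BLOCK_SIZE-character value."""
    rng = random.Random(seed)
    values = ["".join(p) for p in itertools.product(ALPHABET, repeat=BLOCK_SIZE)]
    tables = []
    for _ in range(TABLE_COUNT):
        shuffled = values[:]
        rng.shuffle(shuffled)
        tables.append(dict(zip(values, shuffled)))  # a random permutation
    return tables


def _blocks(value):
    if len(value) % BLOCK_SIZE:
        raise ValueError("sketch only handles lengths that are a multiple of BLOCK_SIZE")
    return [value[i:i + BLOCK_SIZE] for i in range(0, len(value), BLOCK_SIZE)]


def _table_index(previous_token_block):
    # Hypothetical chaining rule: the previous token block picks the next table.
    return sum(ALPHABET.index(c) for c in previous_token_block) % TABLE_COUNT


def tokenize(value, tables):
    out, prev = [], ""
    for block in _blocks(value):
        token_block = tables[_table_index(prev)][block]
        out.append(token_block)
        prev = token_block
    return "".join(out)


def detokenize(token, tables):
    inverses = [{v: k for k, v in t.items()} for t in tables]
    out, prev = [], ""
    for token_block in _blocks(token):
        out.append(inverses[_table_index(prev)][token_block])
        prev = token_block  # the token block is known, so the chain can be replayed
    return "".join(out)
```

Because every table is a permutation and the table choice depends only on already-known token blocks, `detokenize(tokenize(v, tables), tables)` returns `v`.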

From Left and From Right Settings

These properties specify the number of characters from the left and from the right that remain in the clear and are therefore excluded from tokenization. Not all token types allow the end user to specify these values. The From Left and From Right settings can be configured in the Tokenize Options during Data Element creation in the ESA Web UI.

For example:
Input Value: 5511309239934975
Credit Card Token: Left=0 Right=4
Output Value: 8278278929904975
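The split implied by these settings can be sketched as follows. The function name `split_clear_parts` is hypothetical; the sketch only shows which characters stay in the clear and which would be passed to the tokenizer.

```python
def split_clear_parts(value, left, right):
    """Split value into (left clear part, part to tokenize, right clear part)."""
    if left < 0 or right < 0 or left + right > len(value):
        raise ValueError("left + right must fit within the value")
    middle_end = len(value) - right
    return value[:left], value[left:middle_end], value[middle_end:]
```

With the example above, `split_clear_parts("5511309239934975", 0, 4)` returns `("", "551130923993", "4975")`: only the first 12 digits are tokenized, and the trailing "4975" reappears unchanged at the end of the output value.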

Internal Initialization Vector (IV)

An internal IV is automatically applied to the input value when the token element’s left and right properties are non-zero, that is, when some characters are designated to remain in the clear. The internal IV provides additional security during the tokenization process.

The data to be tokenized can be logically divided into three components: left, middle, and right. If an IV is used, the left and right components are concatenated to form the IV. This IV is then added to the middle component before the value is tokenized.
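The text does not specify how the IV is "added" to the middle component. The sketch below assumes, purely for illustration, numeric data and digit-wise addition modulo 10 with the IV repeated along the middle component; the function names `apply_iv` and `remove_iv` are hypothetical. It shows why the clear left and right parts influence the token of the middle part.

```python
DIGITS = "0123456789"


def _iv_digits(left, right):
    # The IV is the concatenation of the left and right clear components.
    iv = left + right
    if any(c not in DIGITS for c in iv):
        raise ValueError("sketch assumes numeric data")
    return [int(c) for c in iv]


def apply_iv(middle, left, right):
    """Assumed scheme: add IV digits (cycled) to the middle digits, modulo 10."""
    iv = _iv_digits(left, right)
    if not iv:  # no clear characters, so no IV is applied
        return middle
    return "".join(str((int(d) + iv[i % len(iv)]) % 10) for i, d in enumerate(middle))


def remove_iv(middle, left, right):
    """Inverse of apply_iv: subtract the same IV digits, modulo 10."""
    iv = _iv_digits(left, right)
    if not iv:
        return middle
    return "".join(str((int(d) - iv[i % len(iv)]) % 10) for i, d in enumerate(middle))
```

For instance, `apply_iv("000", "1", "23")` gives `"123"`. Under this assumption, two inputs with the same middle digits but different clear digits would be tokenized from different intermediate values.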

Length Preserving

For data elements that offer the Preserve Length flag, you can choose to generate token values of the same length as the input data. With the flag enabled, the input data and the protected token value have the same length.

Note: When this option is enabled, the Unicode Gen2 token element preserves length in code points. The length in bytes can vary, depending on the alphabet selected during data element creation.
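The difference between code-point length and byte length can be seen with plain Python strings. The input and the "token" below are made-up values, not real tokenizer output, and UTF-8 is assumed as the byte encoding.

```python
def lengths(value):
    """Return (code points, UTF-8 bytes) for a string."""
    return len(value), len(value.encode("utf-8"))


original = "M\u00fcller"  # "Müller": 6 code points; "ü" takes 2 bytes in UTF-8
token = "Xqzbtr"          # hypothetical token: 6 code points, all ASCII

print(lengths(original))  # (6, 7)
print(lengths(token))     # (6, 6)
```

Both values have the same length in code points, so length is preserved in the Unicode Gen2 sense, while their byte lengths differ.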

Short Data Tokenization

When using tokenizers such as SLT_1_3, SLT_2_3, and SLT_X_1, the minimum input length is three tokenizable characters or bytes. When using tokenizers such as SLT_1_6 and SLT_2_6, the minimum input length is six tokenizable characters or bytes.

The possible flag values for short data tokenization are described in the following table.

Table: Short tokens flag values

Short Token Flag Value	Action
No, generate error	Do not tokenize the short input but generate an error code and an audit log stating that the data is too short.
Yes	Tokenize the data even if the input is short.
No, return input as it is	Do not tokenize the short input but return the input as it is.
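The three flag values can be sketched as a simple policy check. The enum, the exception class, and the `protect` function are hypothetical names; the minimum lengths come from the text above, and the sketch counts characters only (not bytes). Where the product would also write an audit log, this sketch just raises an exception.

```python
from enum import Enum

# Minimum tokenizable lengths stated in the text above.
MIN_LENGTH = {"SLT_1_3": 3, "SLT_2_3": 3, "SLT_X_1": 3, "SLT_1_6": 6, "SLT_2_6": 6}


class ShortDataPolicy(Enum):
    ERROR = "No, generate error"
    TOKENIZE = "Yes"
    RETURN_INPUT = "No, return input as it is"


class DataTooShortError(ValueError):
    """Hypothetical stand-in for the product's error code and audit log entry."""


def protect(value, tokenizer, policy, tokenize):
    if len(value) >= MIN_LENGTH[tokenizer] or policy is ShortDataPolicy.TOKENIZE:
        return tokenize(value)
    if policy is ShortDataPolicy.RETURN_INPUT:
        return value
    raise DataTooShortError(f"input shorter than {MIN_LENGTH[tokenizer]} for {tokenizer}")
```

For a two-character input with SLT_1_3, `RETURN_INPUT` returns the input unchanged, `TOKENIZE` calls the tokenizer anyway, and `ERROR` raises.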

The following tokens support short data tokenization:

Truncating Whitespaces

With fixed-length fields or columns, the input data may be shorter than the length of the field. In that case, the data may be padded with leading whitespace, trailing whitespace, or both. This whitespace is considered during tokenization and therefore affects the tokenization results.

For instance, consider a scenario where the name “Hultgren Caylor” is stored in a Hive Char(30) column.

Because the data is shorter than 30 characters, trailing whitespace is appended to it. Assume that this column must be protected with a data element that preserves the first and last characters (L=1, R=1). With this setting, the expectation is that the protected output keeps the character “H” at the start and the character “r” at the end. However, the actual data has trailing whitespace, so the output contains the character “H” at the start and a whitespace character “ ” at the end. The unnecessary trailing whitespace also causes the protected output to be a different token.
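The effect of padding on the L=1, R=1 example can be reproduced with a short sketch. It assumes the stored value is padded with trailing spaces to 30 characters, as described above, and uses `str.strip()` only to illustrate what trimming would change; how the product truncates whitespace is not described here. The helper name `clear_ends` is hypothetical.

```python
COLUMN_WIDTH = 30  # width of the Char(30) column in the example


def clear_ends(value, left, right):
    """Return the characters kept in the clear for L=left, R=right."""
    return value[:left], value[len(value) - right:]


name = "Hultgren Caylor"
stored = name.ljust(COLUMN_WIDTH)  # padded with trailing spaces to 30 characters

print(clear_ends(stored, 1, 1))          # ('H', ' ')
print(clear_ends(stored.strip(), 1, 1))  # ('H', 'r')
```

Only after the padding is removed do the preserved characters match the expected “H” and “r”.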