Data Type and Alphabet
The data type specifies the kind of data to tokenize, including the characters to expect as input and the characters to generate as output.
Table: Common Tokenization Properties
| Token Property | Description |
|---|---|
| User configured token properties | |
| Name | Unique name identifying the token element. Maximum length is 56 characters. |
| Data Type | Type of data to tokenize. Name of the alphabet, which indicates the specific characters to tokenize. |
| Static Lookup Table (SLT) Tokenizers | Type of SLT tokenizer to use: SLT_1_3, SLT_1_6, SLT_2_3, SLT_2_6, SLT_6_DECIMAL, SLT_DATETIME, or SLT_X_1. |
| Preserve Case | Whether the case of alphabetic characters must be preserved when tokenizing the value. Applies only to the Alpha-Numeric (0-9, a-z, A-Z) token type with the SLT_2_3 tokenizer. |
| Preserve Position | Whether the positions of alphabetic characters and digits must be preserved when tokenizing the value. Applies only to the Alpha-Numeric (0-9, a-z, A-Z) token type with the SLT_2_3 tokenizer. |
| Preserve Length | Whether tokens are the same length as the input. |
| Allow Short Data Tokenization | Whether short input data can be tokenized. The options are “Yes”, “No, generate error”, and “No, return input as it is”. |
| From Left | Number of characters, counted from the left, to keep in clear in the tokenized output. |
| From Right | Number of characters, counted from the right, to keep in clear in the tokenized output. |
| Minimum Input Length | Minimum length of the input data that can be tokenized. |
| Maximum Input Length | Maximum length of the input data that can be tokenized. |
| Alphabet | Name of the alphabet, which defines the specific set of characters to use for tokenization. |
| Automatically calculated token properties | |
| Internal Initialization Vector (IV) | Whether an internal initialization vector (IV) is used. |
| Other token properties | |
| External Initialization Vector (IV) | Whether an external initialization vector (IV) is used. |
The following table shows which properties can be set for each token type.
Table: Tokenization Properties for Token Types
| Tokenization Data Type | Tokenizer | Preserve length | Preserve Case/ Preserve Position | Allow Short Tokens | From Left, From Right | Minimum/ Maximum length | External IV | Internal IV |
|---|---|---|---|---|---|---|---|---|
| Numeric | SLT_1_3, SLT_2_3, SLT_1_6, SLT_2_6 | √ | X | √ | √ | X | √ | √ |
| Integer | SLT_1_3 | √ | X | X | X | X | X | X |
| Credit Card | SLT_1_3, SLT_2_3, SLT_1_6, SLT_2_6 | √ (always yes) | X | X | √ | X | √ | √ |
| Alpha | SLT_1_3, SLT_2_3 | √ | X | √ | √ | X | √ | √ |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | √ | X | √ | √ | X | √ | √ |
| Alpha-Numeric | SLT_1_3 | √ | X | √ | √ | X | √ | √ |
| Alpha-Numeric | SLT_2_3 | √ | √ | √ | √ | X | √ | √ |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | √ | X | √ | √ | X | √ | √ |
| Lower ASCII | SLT_1_3 | √ | X | √ | √ | X | √ | √ |
| Datetime | SLT_DATETIME | √ (always yes) | X | X | X (Left in clear = 0, Right in clear = 0) | X | X | X |
| Decimal | SLT_6_DECIMAL | X (always no) | X | X | X (Left in clear = 0, Right in clear = 0) | √ | X | X |
| Unicode Gen2 | SLT_1_3, SLT_X_1 | √ | X | √ | √ | X | √ | √ |
| Binary | SLT_1_3, SLT_2_3 | X (always no) | X | X | √ | X | √ | √ |
| | SLT_1_3, SLT_2_3 | √ | X | √ | X (Left in clear = 0, Right in clear = 0) | X | √ | X |
- X: the property is disabled and cannot be specified.
- √: the property is enabled or can be specified.
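As a rough illustration of how the table above can be read, the sketch below encodes four of its rows as sets of configurable properties and reports which requested properties a token type does not support. The dictionary, the property keys, and the function name are hypothetical and are not part of any product API. Only the √/X values come from the table.

```python
from __future__ import annotations

# Illustrative subset of the table above. A property is listed when its cell is √.
# All identifiers here are hypothetical names chosen for this sketch.
SUPPORTED_PROPERTIES = {
    "Numeric": {"preserve_length", "allow_short", "left_right",
                "external_iv", "internal_iv"},
    "Integer": {"preserve_length"},
    "Lower ASCII": {"preserve_length", "allow_short", "left_right",
                    "external_iv", "internal_iv"},
    "Decimal": {"min_max_length"},
}


def unsupported_properties(data_type: str, requested: set[str]) -> set[str]:
    """Return the requested properties that are marked X for this token type."""
    return requested - SUPPORTED_PROPERTIES[data_type]


if __name__ == "__main__":
    # Integer supports Preserve Length but not Allow Short Tokens.
    print(unsupported_properties("Integer", {"preserve_length", "allow_short"}))
```

A check like this could run before a token element is created, so that an unsupported combination is reported early.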
The following table shows which properties can be set for the deprecated token types.
Table: Tokenization Properties for Deprecated Token Types
| Tokenization Data Type | Tokenizer | Preserve length | Preserve Case/ Preserve Position | Allow Short Tokens | From Left, From Right | Minimum/ Maximum length | External IV | Internal IV |
|---|---|---|---|---|---|---|---|---|
| Printable | SLT_1_3 | √ | X | √ | √ | X | √ | √ |
| Date (YYYY-MM-DD) | SLT_1_3, SLT_2_3, SLT_1_6, SLT_2_6 | √ (always yes) | X | X | X (Left in clear = 0, Right in clear = 0) | X | X | X |
| Date (DD/MM/YYYY) | SLT_1_3, SLT_2_3, SLT_1_6, SLT_2_6 | √ (always yes) | X | X | X (Left in clear = 0, Right in clear = 0) | X | X | X |
| Date (MM.DD.YYYY) | SLT_1_3, SLT_2_3, SLT_1_6, SLT_2_6 | √ (always yes) | X | X | X (Left in clear = 0, Right in clear = 0) | X | X | X |
| Unicode | SLT_1_3, SLT_2_3 | X (always no) | X | √ | X (Left in clear = 0, Right in clear = 0) | X | √ | X |
| Unicode Base64 | SLT_1_3, SLT_2_3 | X (always no) | X | √ | X (Left in clear = 0, Right in clear = 0) | X | √ | X |
- X: the property is disabled and cannot be specified.
- √: the property is enabled or can be specified.
An SLT tokenizer is a method that uses multiple static lookup tables (SLTs) to generate tokens.
The From Left and From Right settings can be configured to specify the number of characters to leave in clear while tokenizing.
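To make the effect of From Left and From Right concrete, the following sketch splits a value into the part kept in clear on the left, the part that would be tokenized, and the part kept in clear on the right. The function name `split_clear` is hypothetical, and the sketch assumes that the two settings together must not exceed the input length. It does not perform tokenization.

```python
def split_clear(value: str, from_left: int, from_right: int) -> tuple:
    """Split value into (clear left part, part to tokenize, clear right part)."""
    if from_left < 0 or from_right < 0:
        raise ValueError("from_left and from_right must be non-negative")
    if from_left + from_right > len(value):
        # Assumption for this sketch: the clear parts may not overlap.
        raise ValueError("clear characters exceed the input length")
    end = len(value) - from_right
    return value[:from_left], value[from_left:end], value[end:]


if __name__ == "__main__":
    # Keep the first 4 and last 4 characters in clear.
    print(split_clear("1234567890123456", 4, 4))
    # → ('1234', '56789012', '3456')
```

Only the middle part would be passed to the tokenizer. The left and right parts would appear unchanged in the output.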
An Internal IV is used during the tokenization process to make it more difficult to detect patterns in multiple tokenized values.
The minimum and maximum input lengths are the boundaries that are used in input validation.
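A minimal sketch of such a boundary check is shown below, assuming that both limits are inclusive and that a violation is reported as an error. The function name and the error type are hypothetical choices for this illustration.

```python
def check_input_length(value: str, min_length: int, max_length: int) -> str:
    """Return value unchanged if its length lies within the inclusive bounds."""
    length = len(value)
    if length < min_length:
        raise ValueError(f"input length {length} is below the minimum {min_length}")
    if length > max_length:
        raise ValueError(f"input length {length} exceeds the maximum {max_length}")
    return value


if __name__ == "__main__":
    print(check_input_length("12345.67", 1, 36))
```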
The Preserve Length property controls whether generated tokens keep the same length as the input data.
Data is considered short when the number of tokenizable characters is below the tokenizer’s limit. The behavior for short input data can be configured, because short input generally produces weaker tokens.
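The three options for short input can be sketched as a small dispatch. In this sketch, the enum, the `limit` parameter, and the `tokenize` callback are all hypothetical stand-ins, and every character of the input is assumed to be tokenizable. The actual limit depends on the tokenizer and is not specified here.

```python
from enum import Enum
from typing import Callable


class ShortDataPolicy(Enum):
    YES = "Yes"
    ERROR = "No, generate error"
    PASSTHROUGH = "No, return input as it is"


def protect(value: str, limit: int, policy: ShortDataPolicy,
            tokenize: Callable[[str], str]) -> str:
    """Apply the configured short-data behavior before tokenizing."""
    if len(value) >= limit or policy is ShortDataPolicy.YES:
        return tokenize(value)
    if policy is ShortDataPolicy.ERROR:
        raise ValueError("input is too short to tokenize")
    return value  # PASSTHROUGH: the input is returned unchanged


if __name__ == "__main__":
    # Placeholder only: reversing a string is not tokenization.
    fake_tokenize = lambda s: s[::-1]
    print(protect("ab", 3, ShortDataPolicy.PASSTHROUGH, fake_tokenize))
```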
If you work with the Alpha-Numeric (0-9, a-z, A-Z) token type and SLT_2_3 tokenizer, you can specify additional tokenization options for case preservation and position preservation.
The External Initialization Vector (EIV) feature offers an additional level of security. It allows for different tokenized results across protectors for the same input data and token element. The tokenized results are based on the External IV setting on each protector.
Truncating whitespace ensures that only the actual data is considered during tokenization.