Minimum and Maximum Input Length
The minimum and maximum input lengths are the boundaries that are used in input validation.
In Protegrity tokenization, only the Decimal token type allows you to define the minimum and maximum length when the token element is created. Some token types, such as Datetime, have a fixed length. For the remainder, the minimum and maximum length depend on the token type, tokenizer, length preservation, and short data setting.
The following table illustrates length settings by token type.
Table: Minimum and Maximum Input Length for Token Types
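| Token Type | Tokenizer | Length Preservation | Allow Short Data | Minimum Length | Maximum Length |
|---|---|---|---|---|---|
| Numeric | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Numeric | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Numeric | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Numeric | SLT_1_3, SLT_2_3 | No | NA | 1 | 3933 |
| Numeric | SLT_1_6, SLT_2_6 | Yes | Yes | 1 | 4096 |
| Numeric | SLT_1_6, SLT_2_6 | Yes | No, return input as it is | 6 | 4096 |
| Numeric | SLT_1_6, SLT_2_6 | Yes | No, generate error | 6 | 4096 |
| Numeric | SLT_1_6, SLT_2_6 | No | NA | 1 | 3933 |
| Integer | SLT_1_3 | Yes | NA | 2 | 8 |
| Credit Card | SLT_1_3, SLT_2_3 | Yes | NA | 3 | 4096 |
| Credit Card | SLT_1_6, SLT_2_6 | Yes | NA | 6 | 4096 |
| Alpha | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Alpha | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Alpha | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Alpha | SLT_1_3, SLT_2_3 | No | NA | 1 | 4076 |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | No | NA | 1 | 4049 |
| Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Alpha-Numeric | SLT_1_3, SLT_2_3 | No | NA | 1 | 4080 |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | No | NA | 1 | 4064 |
| Lower ASCII | SLT_1_3 | Yes | Yes | 1 | 4096 |
| Lower ASCII | SLT_1_3 | Yes | No, return input as it is | 3 | 4096 |
| Lower ASCII | SLT_1_3 | Yes | No, generate error | 3 | 4096 |
| Lower ASCII | SLT_1_3 | No | NA | 1 | 4086 |
| Datetime | SLT_DATETIME | Yes | NA | 10 | 29 |
| Decimal | SLT_6_DECIMAL | No | NA | 1 | 36 |
| Unicode Gen2 | SLT_1_3, SLT_X_1 | Yes | Yes | 1 Code Point | 4096 Code Points |
| Unicode Gen2 | SLT_1_3, SLT_X_1 | Yes | No, return input as it is | 3 Code Points | 4096 Code Points |
| Unicode Gen2 | SLT_1_3, SLT_X_1 | Yes | No, generate error | 3 Code Points | 4096 Code Points |
| Binary | SLT_1_3, SLT_2_3 | No | NA | 3 | 4095 |
| Email | SLT_1_3, SLT_2_3 | Yes | Yes | 3 | 256 |
| Email | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 5 | 256 |
| Email | SLT_1_3, SLT_2_3 | Yes | No, generate error | 5 | 256 |
| Email | SLT_1_3, SLT_2_3 | No | NA | 3 | 256 |
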
- The minimum and maximum length validation is performed on the characters to tokenize, not on the full input value.
- The From Left and From Right clear characters are not counted. Characters outside the alphabet of the selected token type are also not counted.
- NULL values are accepted but not tokenized.
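
The counting rule in the notes above can be sketched in a few lines of Python. This is an illustrative sketch, not the Protegrity implementation: the function name `count_chars_to_tokenize`, the alphabet constants, and the treatment of From Left and From Right as raw character positions are assumptions. The error raised when the input is shorter than the combined left and right settings mirrors the `Input too short` rows in the Token Length Examples table.

```python
import string

# Hypothetical alphabets for illustration only.
NUMERIC = string.digits
ALPHA_NUMERIC = string.ascii_letters + string.digits


def count_chars_to_tokenize(value: str, alphabet: str, left: int = 0, right: int = 0) -> int:
    """Count the characters that are subject to length validation."""
    if len(value) < left + right:
        # Assumption: the left and right settings cannot be satisfied.
        raise ValueError("Input too short")
    # Drop the From Left / From Right clear characters (assumed raw positions).
    body = value[left:len(value) - right]
    # Characters outside the alphabet act as delimiters and are not counted.
    return sum(1 for ch in body if ch in alphabet)


print(count_chars_to_tokenize("ab1cd", NUMERIC))                 # 1
print(count_chars_to_tokenize("48ghdg83", NUMERIC, 2, 2))        # 0
print(count_chars_to_tokenize("345465", ALPHA_NUMERIC, left=5))  # 1
```

The resulting count, not the raw input length, is what gets compared against the minimum and maximum values in the table.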
Table: Minimum and Maximum Input Length for Deprecated Token Types
1 - Calculating Token Length
The Calculating Token Length process calculates the number of tokens and shows how text is divided into tokens.
Characters that the selected token type does not support are treated as delimiters and are left untokenized. For example, for a Numeric token type, non-numeric characters are considered delimiters. This applies whenever the input value contains characters that cannot be tokenized with the selected token type.
The number of characters to tokenize is calculated as shown in the following image:

If the input value does not contain characters to tokenize, then it is considered a zero-length token. Tokenizing a zero-length input value does not produce an error, and the input value is returned as the output.

If the input value has at least one character to tokenize and short data tokenization is enabled, then the source data can be tokenized. If short data tokenization is not enabled and the input is shorter than the minimum length, then, depending on the Allow Short Data setting, the source data is either returned as it is or an Input too short error is generated.
For more information on short data tokenization, refer to Short Data Tokenization.

If the input value contains more characters than the maximum for tokenization, then the input value is considered too long. The tokenization process returns an appropriate error message.

If the input value has a sufficient number of characters, the tokenization process is successful. This occurs when the character count falls between the minimum and maximum settings.
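
The four cases above can be summarized as a small decision function. The following Python sketch is illustrative only: the names `ShortData` and `length_outcome` are hypothetical, and it models token types whose minimum length is 1 when Allow Short Data is set to Yes (for example, Numeric). The `short_min` parameter stands for the minimum that applies when short data is not allowed, such as 3 for the SLT_1_3 tokenizer.

```python
from enum import Enum


class ShortData(Enum):
    YES = "Yes"
    NO_RETURN_INPUT = "No, return input as it is"
    NO_GENERATE_ERROR = "No, generate error"


def length_outcome(n_chars: int, short_min: int, max_length: int, short_data: ShortData) -> str:
    """Classify an input by its number of characters to tokenize."""
    if n_chars == 0:
        # Zero-length token: no error, input is returned as output.
        return "return input unchanged (zero-length token)"
    if n_chars > max_length:
        return "error: input too long"
    if n_chars < short_min:
        # Short data: the outcome depends on the Allow Short Data setting.
        if short_data is ShortData.YES:
            return "tokenize"
        if short_data is ShortData.NO_RETURN_INPUT:
            return "return input unchanged"
        return "error: input too short"
    # Character count falls between the minimum and maximum.
    return "tokenize"


print(length_outcome(1, 3, 4096, ShortData.YES))                # tokenize
print(length_outcome(1, 3, 4096, ShortData.NO_GENERATE_ERROR))  # error: input too short
print(length_outcome(2, 3, 4096, ShortData.NO_RETURN_INPUT))    # return input unchanged
```

The function takes the count of characters to tokenize as its input, so clear characters and delimiters must already be excluded before it is called.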

Table: Token Length Examples
| Token Properties | Input Value | Output Value | Comments |
|---|---|---|---|
| Numeric Token, Left/Right undefined, Allow Short Data=Yes | ab1cd | ab6cd | Non-numeric values are considered as delimiters. Input is tokenized because short data is enabled and the minimum length is 1 character. |
| Numeric Token, Left=0, Right=0, Allow Short Data=No, generate error | ab1cd | Error. Input too short. | Non-numeric values are considered as delimiters. Input is too short because short data is not enabled and the minimum number of characters to tokenize for this token type is 3. |
| Numeric Token, Left=0, Right=0, Allow Short Data=No, return input as it is | 12 | 12 | Input is returned as is, as per the short data settings. |
| Numeric Token, Left=2, Right=2 | 48ghdg83 | 48ghdg83 | The input value is left unchanged because it is an empty value for tokenization. The left and right settings together exclude all numeric characters from tokenization. |
| Numeric Token, Left=2, Right=2 | 4568 | 4568 | The input value is left unchanged because it is an empty value for tokenization. |
| Numeric Token, Left=0, Right=0 | ab123cd | ab857cd | Input value has enough characters for tokenization. Only the values supported by the Numeric token type are tokenized. |
| Alpha Numeric Token, Left=5, Right=0, Allow Short Data=Yes | 345465 | 34546c | Input is evaluated first for left and right settings. Since the left setting is 5, the first five digits are excluded and the sixth digit can be tokenized. As Allow Short Data is set to Yes, the sixth digit is tokenized. |
| Alpha Numeric Token, Left=5, Right=0, Allow Short Data=No, generate error | 345465 | Error | Input is evaluated first for left and right settings. Since the left setting is 5, the first five digits are excluded and the sixth digit can be tokenized. As Allow Short Data is set to No, generate error and the length of the data to be tokenized is less than 3, an Input too short error is generated. |
| Alpha Numeric Token, Left=5, Right=0, Allow Short Data=No, return input as it is | 345465 | 345465 | Input is evaluated first for left and right settings. Since the left setting is 5, the first five digits are excluded and the sixth digit can be tokenized. As Allow Short Data is set to No, return input as it is and the length of the data to be tokenized is less than 3, the data is passed as is. |
| Alpha Numeric Token, Left=5, Right=0, Allow Short Data=Yes | 34546 | 34546 | Input is evaluated first for left and right settings. Since the left setting is 5 and the input is five digits, no data exists to be tokenized. It is considered a zero-length token and the input is passed as is. |
| Alpha Numeric Token, Left=5, Right=0, Allow Short Data=No, generate error | 34546 | 34546 | Input is evaluated first for left and right settings. Since the left setting is 5 and the input is five digits, no data exists to be tokenized. It is considered a zero-length token and the input is passed as is. |
| Alpha Numeric Token, Left=5, Right=0, Allow Short Data=No, return input as it is | 34546 | 34546 | Input is evaluated first for left and right settings. Since the left setting is 5 and the input is five digits, no data exists to be tokenized. It is considered a zero-length token and the input is passed as is. |
| Alpha Numeric Token, Left=5, Right=0, Allow Short Data=Yes | 3454 | Error | Input is evaluated first for left and right settings. Since the left setting is 5 and the input is four digits, the left and right settings condition is not met. This results in an Input too short error. |
| Alpha Numeric Token, Left=5, Right=0, Allow Short Data=No, generate error | 3454 | Error | Input is evaluated first for left and right settings. Since the left setting is 5 and the input is four digits, the left and right settings condition is not met. This results in an Input too short error. |
| Alpha Numeric Token, Left=5, Right=0, Allow Short Data=No, return input as it is | 3454 | Error | Input is evaluated first for left and right settings. Since the left setting is 5 and the input is four digits, the left and right settings condition is not met. This results in an Input too short error. |
| Unicode Token (Cyrillic alphabet), Left=0, Right=0, Allow Short Data=Yes | abдаcd | abшcd | Non-Cyrillic values are considered as delimiters. Input data is tokenized because short data is enabled. |
| Unicode Token (Cyrillic alphabet), Left=0, Right=0, Allow Short Data=No | abдаcd | Error. Input too short. | Non-Cyrillic values are considered as delimiters. Input is too short since the word да (Cyrillic meaning yes, pronounced da) is only two code points. The minimum number of code points for this token type is 3. |
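
For Unicode token types, length is measured in code points rather than bytes. The following Python sketch illustrates this with the Cyrillic example from the table. The function name `cyrillic_code_points` is hypothetical, and it assumes that the token alphabet is the basic Cyrillic block (U+0400 to U+04FF); an actual Unicode Gen2 alphabet is defined when the token element is created.

```python
def cyrillic_code_points(value: str) -> int:
    """Count code points that fall inside the assumed Cyrillic alphabet."""
    # Assumption: the alphabet is the basic Cyrillic block U+0400..U+04FF.
    return sum(1 for ch in value if "\u0400" <= ch <= "\u04ff")


value = "ab\u0434\u0430cd"  # the input "abдаcd" from the table

# Two code points to tokenize, which is below the minimum of 3.
print(cyrillic_code_points(value))          # 2

# The same two characters occupy four bytes in UTF-8,
# which is not the unit used for length validation.
print(len("\u0434\u0430".encode("utf-8")))  # 4
```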