Calculating Token Length
This section describes the minimum and maximum input lengths that each token type accepts.
In Protegrity tokenization, only the Decimal token type lets you define the minimum and maximum length of the token element when it is created. Some token types, such as Datetime, have a fixed length. For the remaining token types, the minimum and maximum lengths depend on the token type, the tokenizer, the length preservation setting, and the Allow Short Data setting.
The following table lists the minimum and maximum input lengths by token type.
Table: Minimum and Maximum Input Length for Token Types
| Token Type | Tokenizer | Length Preservation | Allow Short Data | Minimum Length | Maximum Length |
|---|---|---|---|---|---|
| Numeric | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| | | | No, return input as it is | 3 | 4096 |
| | | | No, generate error | 3 | 4096 |
| | | No | NA | 1 | 3933 |
| | SLT_1_6, SLT_2_6 | Yes | Yes | 1 | 4096 |
| | | | No, return input as it is | 6 | 4096 |
| | | | No, generate error | 6 | 4096 |
| | | No | NA | 1 | 3933 |
| Integer | SLT_1_3 | Yes | NA | 2 | 8 |
| Credit Card | SLT_1_3, SLT_2_3 | Yes | NA | 3 | 4096 |
| | SLT_1_6, SLT_2_6 | Yes | NA | 6 | 4096 |
| Alpha | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| | | | No, return input as it is | 3 | 4096 |
| | | | No, generate error | 3 | 4096 |
| | | No | NA | 1 | 4076 |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| | | | No, return input as it is | 3 | 4096 |
| | | | No, generate error | 3 | 4096 |
| | | No | NA | 1 | 4049 |
| Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| | | | No, return input as it is | 3 | 4096 |
| | | | No, generate error | 3 | 4096 |
| | | No | NA | 1 | 4080 |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| | | | No, return input as it is | 3 | 4096 |
| | | | No, generate error | 3 | 4096 |
| | | No | NA | 1 | 4064 |
| Lower ASCII | SLT_1_3 | Yes | Yes | 1 | 4096 |
| | | | No, return input as it is | 3 | 4096 |
| | | | No, generate error | 3 | 4096 |
| | | No | NA | 1 | 4086 |
| Datetime | SLT_DATETIME | Yes | NA | 10 | 29 |
| Decimal | SLT_6_DECIMAL | No | NA | 1 | 36 |
| Unicode Gen2 | SLT_1_3, SLT_X_1 | Yes | Yes | 1 Code Point | 4096 Code Points |
| | | | No, return input as it is | 3 Code Points | 4096 Code Points |
| | | | No, generate error | 3 Code Points | 4096 Code Points |
| Binary | SLT_1_3, SLT_2_3 | No | NA | 3 | 4095 |
| | SLT_1_3, SLT_2_3 | Yes | Yes | 3 | 256 |
| | | | No, return input as it is | 5 | 256 |
| | | | No, generate error | 5 | 256 |
| | | No | NA | 3 | 256 |
- Minimum and maximum length validation is applied only to the characters that are tokenized.
- The From Left and From Right clear characters are not counted. Characters outside the alphabet of the selected token type are also not counted.
- NULL values are accepted but not tokenized.
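The counting rules in the notes above can be sketched in a few lines of Python. This is only an illustration, not Protegrity's implementation: the function names `countable_length` and `check_length`, their parameters, and the returned status strings are all hypothetical. What happens to an input that is too short depends on the Allow Short Data setting and is not modeled here.

```python
import string


def countable_length(value, alphabet, from_left=0, from_right=0):
    """Count only the characters that would be tokenized.

    Hypothetical helper: skips the From Left / From Right clear
    characters and any character outside the token type's alphabet.
    """
    end = len(value) - from_right
    middle = value[from_left:end] if end > from_left else ""
    return sum(1 for ch in middle if ch in alphabet)


def check_length(value, alphabet, minimum, maximum, from_left=0, from_right=0):
    """Classify an input against the minimum and maximum length (illustrative)."""
    if value is None:
        # NULL values are accepted but not tokenized.
        return "skipped"
    n = countable_length(value, alphabet, from_left, from_right)
    if n < minimum:
        return "too short"
    if n > maximum:
        return "too long"
    return "ok"


# Assumed alphabet for a Numeric token type: the digits 0-9.
DIGITS = set(string.digits)

# "12-34" has four digits; the hyphen is outside the alphabet and is not counted.
status = check_length("12-34", DIGITS, minimum=3, maximum=4096)
```

With one clear character kept on each side, only `2-3` is examined, which leaves two countable digits, so the same input falls below a minimum of 3.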
Table: Minimum and Maximum Input Length for Deprecated Token Types
| Token Type | Tokenizer | Length Preservation | Allow Short Data | Minimum Length | Maximum Length |
|---|---|---|---|---|---|
| Printable | SLT_1_3 | Yes | Yes | 1 | 4096 |
| | | | No, return input as it is | 3 | 4096 |
| | | | No, generate error | 3 | 4096 |
| | | No | NA | 1 | 4091 |
| Date YYYY-MM-DD, Date DD/MM/YYYY, Date MM.DD.YYYY | SLT_1_3, SLT_2_3, SLT_1_6, SLT_2_6 | Yes | NA | 10 | 10 |
| Unicode | SLT_1_3, SLT_2_3 | No | Yes | 1 byte | 4096 bytes |
| | | | No, return input as it is | 3 bytes | 4096 bytes |
| | | | No, generate error | 3 bytes | 4096 bytes |
| Unicode Base64 | SLT_1_3 | No | Yes | 1 byte | 4096 bytes |
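The tables express Unicode Gen2 limits in code points and the deprecated Unicode types in bytes, and the two measures differ for non-ASCII text. The following sketch shows the difference in Python. The helper names are hypothetical, and UTF-8 is an assumption made for the example; the tables do not state which encoding is used for byte counting.

```python
def length_in_code_points(text):
    """Number of Unicode code points (Python strings are code point sequences)."""
    return len(text)


def length_in_bytes(text, encoding="utf-8"):
    """Number of bytes after encoding; UTF-8 is assumed here for illustration."""
    return len(text.encode(encoding))


# "Zoë" written with a precomposed U+00EB: three code points,
# but four bytes in UTF-8 because U+00EB encodes as two bytes.
sample = "Zo\u00eb"
code_points = length_in_code_points(sample)
byte_count = length_in_bytes(sample)
```

An input can therefore satisfy a limit expressed in code points while exceeding the same number expressed in bytes.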