Short Data Tokenization
For tokenizers such as SLT_1_3, SLT_2_3, and SLT_X_1, the minimum tokenizable input length is three characters or bytes. For tokenizers such as SLT_1_6 and SLT_2_6, the minimum tokenizable input length is six characters or bytes.
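The minimum-length rules above could be enforced with a check like the following sketch. The function and mapping names are illustrative assumptions, not part of any real tokenization API:

```python
# Hypothetical sketch: minimum tokenizable input length per
# short-length tokenizer, as described above.
MIN_INPUT_LENGTH = {
    "SLT_1_3": 3,
    "SLT_2_3": 3,
    "SLT_X_1": 3,
    "SLT_1_6": 6,
    "SLT_2_6": 6,
}

def is_tokenizable(tokenizer: str, data: str) -> bool:
    """Return True if the input meets the tokenizer's minimum length."""
    return len(data) >= MIN_INPUT_LENGTH[tokenizer]
```

An input shorter than the minimum is considered "short data" and is handled according to the short token flag described below.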
The possible flag values for short data tokenization are described in the following table.
Table: Short tokens flag values
| Short Token Flag Value | Action |
|---|---|
| No, generate error | Do not tokenize the short input but generate an error code and an audit log stating that the data is too short. |
| Yes | Tokenize the data if the input is short. |
| No, return input as it is | Do not tokenize the short input but return the input as it is. |
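The three flag behaviors in the table can be summarized in pseudocode. This is a minimal sketch; the class and function names are assumptions for illustration, and real implementations would also write the audit log entry mentioned in the table:

```python
from enum import Enum

# Hypothetical names mirroring the three flag values in the table above.
class ShortTokenFlag(Enum):
    ERROR = "No, generate error"
    TOKENIZE = "Yes"
    RETURN_INPUT = "No, return input as it is"

class ShortInputError(ValueError):
    """Raised when short input is rejected (flag = 'No, generate error')."""

def handle_input(data: str, min_len: int, flag: ShortTokenFlag, tokenize):
    if len(data) >= min_len:
        return tokenize(data)   # long enough: tokenize normally
    if flag is ShortTokenFlag.TOKENIZE:
        return tokenize(data)   # tokenize even though the input is short
    if flag is ShortTokenFlag.RETURN_INPUT:
        return data             # pass the short input through unchanged
    # flag is ERROR: reject the input (an audit log entry would also be written)
    raise ShortInputError("input data is too short to tokenize")
```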
The following tokens support short data tokenization:
- Numeric (0-9)
- Alpha (a-z, A-Z)
- Upper-case Alpha (A-Z)
- Alpha-Numeric (0-9, a-z, A-Z)
- Upper-Case Alpha-Numeric (0-9, A-Z)
- Lower ASCII
- Unicode Gen2
The following deprecated tokens support short data tokenization:
Important: Short data tokenization is inherently risky: because the space of short inputs is small, a user can tokenize sample inputs to reconstruct the lookup table and recover the original data. Consider this carefully before using short data tokenization, and avoid short input data whenever possible.
For more information about the maximum length setting for non-length-preserving token elements, refer to Minimum and Maximum Input Length by Token Types.