Calculating Token Length

The token length calculation determines how many characters of an input value can be tokenized and, based on that count, how the value is handled.

Characters that the selected token type does not support are treated as delimiters and are left untokenized. For example, for a Numeric token type, non-numeric characters are considered delimiters. An input value that contains no tokenizable characters for the selected token type is therefore left entirely untokenized.
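The delimiter behavior can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names `split_tokenizable` and `reassemble` are hypothetical, and treating only the ASCII digits 0-9 as tokenizable for a Numeric token type is an assumption. The replacement digits `857` are taken from the example table on this page and stand in for a real token.

```python
def split_tokenizable(value: str) -> tuple[str, list[int]]:
    """Return the tokenizable characters of `value` and their positions.

    Assumption: for a Numeric token type only ASCII digits are tokenizable;
    every other character acts as a delimiter and stays where it is.
    """
    positions = [i for i, ch in enumerate(value) if ch in "0123456789"]
    digits = "".join(value[i] for i in positions)
    return digits, positions


def reassemble(value: str, positions: list[int], token: str) -> str:
    """Write the tokenized characters back, leaving delimiters untouched."""
    chars = list(value)
    for pos, ch in zip(positions, token):
        chars[pos] = ch
    return "".join(chars)


digits, positions = split_tokenizable("ab123cd")
print(digits)                                   # 123
print(reassemble("ab123cd", positions, "857"))  # ab857cd
```

Only the three digits are handed to the tokenizer in this sketch; the letters keep their positions in the output.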

The number of characters to tokenize is calculated as shown in the following image:

Figure: Number of characters to tokenize
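The examples in the Token Length Examples table suggest one way to read this calculation, sketched below in Python. It is an illustration under stated assumptions, not the product's algorithm: the function name `chars_to_tokenize` and the two alphabets are hypothetical, and the sketch assumes that the Left and Right settings count characters of the whole input value and that only supported characters between them are counted.

```python
import string
from typing import Optional

NUMERIC = string.digits                               # assumed Numeric alphabet
ALPHA_NUMERIC = string.ascii_letters + string.digits  # assumed Alpha Numeric alphabet


def chars_to_tokenize(value: str, supported: str,
                      left: int = 0, right: int = 0) -> Optional[int]:
    """Count the characters that remain to be tokenized.

    Returns None when the value is shorter than left + right, that is,
    when the Left and Right settings cannot be satisfied at all.
    """
    if len(value) < left + right:
        return None
    # Characters kept in the clear on the left and right are skipped.
    middle = value[left:len(value) - right]
    # Unsupported characters act as delimiters and are not counted.
    return sum(ch in supported for ch in middle)


print(chars_to_tokenize("ab123cd", NUMERIC))               # 3
print(chars_to_tokenize("48ghdg83", NUMERIC, 2, 2))        # 0
print(chars_to_tokenize("345465", ALPHA_NUMERIC, left=5))  # 1
print(chars_to_tokenize("3454", ALPHA_NUMERIC, left=5))    # None
```

A result of 0 corresponds to a zero-length token, and `None` corresponds to the case in the table where the Left and Right settings condition is not met.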

If the input value does not contain any characters to tokenize, it is considered a zero-length token. Tokenizing a zero-length value does not produce an error, and the input value is returned unchanged as the output.

Figure: Input value returned as a result of tokenization with zero-length token

If the input value has at least one character to tokenize but fewer than the minimum for the token type, the result depends on the short data setting. If short data tokenization is enabled, the source data is tokenized. If short data tokenization is not enabled, the source data is either returned as it is or an appropriate error is generated, depending on the Allow Short Data setting.

For more information on short data tokenization, refer to Short Data Tokenization.

Figure: Output returned when the input is too short

If the input value contains more characters than the maximum for tokenization, the value is considered too long, and the tokenization process returns an appropriate error message.

Figure: Error returned when the input is too long

If the number of characters to tokenize falls between the minimum and maximum settings, the input value has enough characters and the tokenization is successful.

Figure: Tokenized value returned when the input is enough for tokenization
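The four cases described in this section (zero-length, too short, too long, and long enough) can be summarized as a small decision function. This is a hedged sketch, not the product's logic: the names `Outcome` and `decide`, the string values used for the short data setting, and the maximum of 4096 are hypothetical. The minimum of 3 matches the value quoted in the example table on this page.

```python
from enum import Enum


class Outcome(Enum):
    RETURN_INPUT = "input returned as is"
    TOKENIZE = "tokenized"
    TOO_SHORT = "error: input too short"
    TOO_LONG = "error: input too long"


def decide(count: int, min_len: int, max_len: int, short_data: str) -> Outcome:
    """Map the number of characters to tokenize to an outcome.

    short_data is one of "yes", "no_error", "no_return_input"
    (hypothetical labels for the three Allow Short Data options).
    """
    if count == 0:
        return Outcome.RETURN_INPUT      # zero-length token, never an error
    if count > max_len:
        return Outcome.TOO_LONG
    if count < min_len:
        if short_data == "yes":
            return Outcome.TOKENIZE      # short data tokenization enabled
        if short_data == "no_return_input":
            return Outcome.RETURN_INPUT
        return Outcome.TOO_SHORT
    return Outcome.TOKENIZE


MIN_LEN, MAX_LEN = 3, 4096  # MAX_LEN is a placeholder, not a documented limit

print(decide(1, MIN_LEN, MAX_LEN, "yes").value)       # tokenized
print(decide(1, MIN_LEN, MAX_LEN, "no_error").value)  # error: input too short
print(decide(0, MIN_LEN, MAX_LEN, "no_error").value)  # input returned as is
```

Checking the zero-length case first reflects the statement above that a zero-length value is returned without an error regardless of the short data setting.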

Table: Token Length Examples

| Token Properties | Input Value | Output Value | Comments |
| --- | --- | --- | --- |
| Numeric Token; Left/Right undefined; Allow Short Data=Yes | ab1cd | ab6cd | Non-numeric values are considered as delimiters. The input is tokenized because short data is enabled and the minimum length is 1 character. |
| Numeric Token; Left=0; Right=0; Allow Short Data=No, generate error | ab1cd | Error. Input too short. | Non-numeric values are considered as delimiters. The input is too short because short data is not enabled and the minimum number of characters to tokenize for this token type is 3. |
| Numeric Token; Left=0; Right=0; Allow Short Data=No, return input as it is | 12 | 12 | The input is returned as is, as per the settings for short data. |
| Numeric Token; Left=2; Right=2 | 48ghdg83 | 48ghdg83 | The input value is left unchanged because it is an empty value for tokenization. The Left and Right settings together exclude all numeric characters from tokenization. |
| Numeric Token; Left=2; Right=2 | 4568 | 4568 | The input value is left unchanged because it is an empty value for tokenization. |
| Numeric Token; Left=0; Right=0 | ab123cd | ab857cd | The input value has enough characters for tokenization. Only the characters supported by the Numeric token type are tokenized. |
| Alpha Numeric Token; Left=5; Right=0; Allow Short Data=Yes | 345465 | 34546c | The input is evaluated first for the Left and Right settings. Since Left is set to 5, the first five digits are excluded and only the sixth digit can be tokenized. As Allow Short Data is set to Yes, the sixth digit is tokenized. |
| Alpha Numeric Token; Left=5; Right=0; Allow Short Data=No, generate error | 345465 | Error | The input is evaluated first for the Left and Right settings. Since Left is set to 5, the first five digits are excluded and only the sixth digit can be tokenized. As Allow Short Data is set to No, generate error and the length of the data to be tokenized is less than 3, an Input too short error is generated. |
| Alpha Numeric Token; Left=5; Right=0; Allow Short Data=No, return input as it is | 345465 | 345465 | The input is evaluated first for the Left and Right settings. Since Left is set to 5, the first five digits are excluded and only the sixth digit can be tokenized. As Allow Short Data is set to No, return input as it is and the length of the data to be tokenized is less than 3, the data is passed as is. |
| Alpha Numeric Token; Left=5; Right=0; Allow Short Data=Yes | 34546 | 34546 | The input is evaluated first for the Left and Right settings. Since Left is set to 5 and the input is five digits, no data exists to be tokenized. It is therefore considered a zero-length token and the input is passed as is. |
| Alpha Numeric Token; Left=5; Right=0; Allow Short Data=No, generate error | 34546 | 34546 | |
| Alpha Numeric Token; Left=5; Right=0; Allow Short Data=No, return input as it is | 34546 | 34546 | |
| Alpha Numeric Token; Left=5; Right=0; Allow Short Data=Yes | 3454 | Error | The input is evaluated first for the Left and Right settings. Since Left is set to 5 and the input is four digits, the Left and Right settings condition is not met. This results in an Input too short error. |
| Alpha Numeric Token; Left=5; Right=0; Allow Short Data=No, generate error | 3454 | Error | |
| Alpha Numeric Token; Left=5; Right=0; Allow Short Data=No, return input as it is | 3454 | Error | |
| Unicode Token (Cyrillic alphabet); Left=0; Right=0; Allow Short Data=Yes | abдаcd | abшcd | Non-Cyrillic values are considered as delimiters. The input data is tokenized because short data is enabled. |
| Unicode Token (Cyrillic alphabet); Left=0; Right=0; Allow Short Data=No | abдаcd | Error. Input too short. | Non-Cyrillic values are considered as delimiters. The input is too short because the word да (Cyrillic for "yes", pronounced "da") is only two code points. The minimum number of code points for this token type is 3. |
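For the Unicode token rows, the length is counted in code points rather than bytes. The short Python sketch below shows one way to count them. The function name `cyrillic_codepoints` is hypothetical, and treating the Unicode block U+0400 to U+04FF as the Cyrillic alphabet is an assumption; the product may define the alphabet differently.

```python
def cyrillic_codepoints(value: str) -> int:
    """Count code points in the basic Cyrillic block (U+0400 to U+04FF).

    Python strings are sequences of code points, so iterating over the
    string visits one code point at a time.
    """
    return sum(0x0400 <= ord(ch) <= 0x04FF for ch in value)


# "ab" + Cyrillic "de" (U+0434) + Cyrillic "a" (U+0430) + "cd"
sample = "ab\u0434\u0430cd"
print(cyrillic_codepoints(sample))  # 2
```

With a count of 2 and a minimum of 3, the value is short data, which is why the outcome in the last two rows depends on the Allow Short Data setting.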

Last modified : August 21, 2025