Minimum and Maximum Input Length

The minimum and maximum input lengths are the boundaries that are used in input validation.

In Protegrity tokenization, only the Decimal token type lets you define the minimum and maximum length when the token element is created. Some token types, such as Datetime, have a fixed length. For the remaining token types, the minimum and maximum lengths depend on the token type, tokenizer, length preservation, and the Allow Short Data setting.

The following table illustrates length settings by token type.

Table: Minimum and Maximum Input Length for Token Types

Token Type	Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length
Numeric	SLT_1_3 SLT_2_3	Yes	Yes	1	4096
			No, return input as it is	3
			No, generate error	3
		No	NA	1	3933
	SLT_1_6 SLT_2_6	Yes	Yes	1	4096
			No, return input as it is	6
			No, generate error	6
		No	NA	1	3933
Integer	SLT_1_3	Yes	NA	2	8
Credit Card	SLT_1_3 SLT_2_3	Yes	NA	3	4096
Credit Card	SLT_1_6 SLT_2_6	Yes	NA	6	4096
Alpha	SLT_1_3 SLT_2_3	Yes	Yes	1	4096
			No, return input as it is	3
			No, generate error	3
		No	NA	1	4076
Upper-case Alpha	SLT_1_3 SLT_2_3	Yes	Yes	1	4096
			No, return input as it is	3
			No, generate error	3
		No	NA	1	4049
Alpha-Numeric	SLT_1_3 SLT_2_3	Yes	Yes	1	4096
			No, return input as it is	3
			No, generate error	3
		No	NA	1	4080
Upper-Case Alpha-Numeric	SLT_1_3 SLT_2_3	Yes	Yes	1	4096
			No, return input as it is	3
			No, generate error	3
		No	NA	1	4064
Lower ASCII	SLT_1_3	Yes	Yes	1	4096
			No, return input as it is	3
			No, generate error	3
		No	NA	1	4086
Datetime	SLT_DATETIME	Yes	NA	10	29
Decimal	SLT_6_DECIMAL	No	NA	1	36
Unicode Gen2	SLT_1_3 SLT_X_1	Yes	Yes	1 Code Point	4096 Code Points
			No, return input as it is	3 Code Points
			No, generate error	3 Code Points
Binary	SLT_1_3 SLT_2_3	No	NA	3	4095
Email	SLT_1_3 SLT_2_3	Yes	Yes	3	256
			No, return input as it is	5
			No, generate error	5
		No	NA	3	256

Minimum and maximum length validation is applied only to the characters to tokenize.
The From Left and From Right clear characters are not counted, nor are characters outside the alphabet of the selected token type.
NULL values are accepted but not tokenized.
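
The counting rule above can be sketched in a few lines of Python. This is only an illustration, not Protegrity code: the function name `count_chars_to_tokenize`, its parameters, and the `NUMERIC` alphabet are hypothetical, and the sketch assumes the input is at least as long as the From Left and From Right settings combined.

```python
import string

# Assumed alphabet for the Numeric token type (illustrative only).
NUMERIC = set(string.digits)

def count_chars_to_tokenize(value, alphabet, left=0, right=0):
    """Count the characters that length validation would consider.

    Hypothetical helper: assumes len(value) >= left + right.
    """
    # The From Left / From Right clear characters are not counted.
    body = value[left:len(value) - right]
    # Characters outside the token type's alphabet are not counted either.
    return sum(1 for ch in body if ch in alphabet)

print(count_chars_to_tokenize("ab1cd", NUMERIC))                      # 1
print(count_chars_to_tokenize("48ghdg83", NUMERIC, left=2, right=2))  # 0
```

The resulting count, rather than the raw string length, is what would be compared against the minimum and maximum values in the tables.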

Table: Minimum and Maximum Input Length for Deprecated Token Types

Token Type	Tokenizer	Length Preservation	Allow Short Data	Minimum Length	Maximum Length
Printable	SLT_1_3	Yes	Yes	1	4096
			No, return input as it is	3
			No, generate error	3
		No	NA	1	4091
Date YYYY-MM-DD Date DD/MM/YYYY Date MM.DD.YYYY	SLT_1_3 SLT_2_3 SLT_1_6 SLT_2_6	Yes	NA	10	10
Unicode	SLT_1_3 SLT_2_3	No	Yes	1 byte	4096 bytes
			No, return input as it is	3 bytes
			No, generate error	3 bytes
Unicode Base64	SLT_1_3	No	Yes	1 byte	4096 bytes

1 - Calculating Token Length

The Calculating Token Length process calculates the number of tokens and shows how text is divided into tokens.

For a Numeric token type, non-numeric characters are treated as delimiters. More generally, any character that the selected token type does not support is treated as a delimiter and left untokenized.

The number of characters to tokenize is calculated as shown in the following image:

Number of characters to tokenize

If the input value does not contain any characters to tokenize, then it is considered a zero-length token. Tokenizing a zero-length input value does not produce an error; the input value is returned as the output.

Input value returned as a result of tokenization with zero-length token

If the input value has at least one character to tokenize and short data tokenization is enabled, then the source data can be tokenized. If short data tokenization is not enabled and the input is shorter than the minimum length, then, depending on the Allow Short Data setting, either the source data is returned as it is or an error is generated.

For more information on short data tokenization, refer to Short Data Tokenization.

Output returned when the input is too short

If the input value contains more characters to tokenize than the maximum length allows, then the input value is considered too long. The tokenization process returns an appropriate error message.

Error returned when the input is too long

If the input value has a sufficient number of characters, the tokenization process is successful. This occurs when the character count falls between the minimum and maximum settings.

Tokenized value returned when the input is enough for tokenization
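
The outcomes described above can be summarized as a small decision function. This is a sketch under stated assumptions, not the product's implementation: `ShortData`, `classify`, and the returned labels are hypothetical names, and the default `min_without_short=3` reflects the minimum listed for the SLT_1_3 and SLT_2_3 tokenizers when short data is not allowed.

```python
from enum import Enum

class ShortData(Enum):
    """The three Allow Short Data settings (names are illustrative)."""
    YES = "Yes"
    NO_RETURN_INPUT = "No, return input as it is"
    NO_GENERATE_ERROR = "No, generate error"

def classify(n_chars, max_len, short_data, min_without_short=3):
    """Label the outcome for an input with n_chars characters to tokenize."""
    if n_chars == 0:
        # Zero-length token: no error, input comes back unchanged.
        return "returned unchanged (zero-length token)"
    if n_chars > max_len:
        return "error: input too long"
    if n_chars < min_without_short:
        if short_data is ShortData.NO_RETURN_INPUT:
            return "returned unchanged (short data)"
        if short_data is ShortData.NO_GENERATE_ERROR:
            return "error: input too short"
        # ShortData.YES: the minimum is 1, so short input is tokenized.
    return "tokenized"

print(classify(1, 4096, ShortData.YES))                # tokenized
print(classify(1, 4096, ShortData.NO_GENERATE_ERROR))  # error: input too short
```

The function takes the number of characters to tokenize as its input, so clear characters and unsupported characters must already be excluded before it is called.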

Table: Token Length Examples

Token Properties	Input Value	Output Value	Comments
Numeric Token Left/Right undefined Allow Short Data=Yes	ab1cd	ab6cd	Non-numeric characters are treated as delimiters. The input is tokenized because short data is enabled and the minimum length is 1 character.
Numeric Token Left=0 Right=0 Allow Short Data=No, generate error	ab1cd	Error. Input too short.	Non-numeric characters are treated as delimiters. The input is too short because short data is not enabled and the minimum number of characters to tokenize for this token type is 3.
Numeric Token Left=0 Right=0 Allow Short Data=No, return input as it is	12	12	The input is returned as is, per the short data setting.
Numeric Token Left=2 Right=2	48ghdg83	48ghdg83	The input value is left unchanged because nothing remains to tokenize: the Left and Right settings exclude all numeric characters from tokenization.
Numeric Token Left=2 Right=2	4568	4568	The input value is left unchanged because the Left and Right settings leave no characters to tokenize.
Numeric Token Left=0 Right=0	ab123cd	ab857cd	The input value has enough characters for tokenization. Only the characters supported by the Numeric token type are tokenized.
Alpha Numeric Token Left=5 Right=0 Allow Short Data=Yes	345465	34546c	The Left and Right settings are evaluated first. Because Left is set to 5, the first five digits are excluded and only the sixth digit can be tokenized. Because Allow Short Data is set to Yes, the sixth digit is tokenized.
Alpha Numeric Token Left=5 Right=0 Allow Short Data=No, generate error	345465	error	The Left and Right settings are evaluated first. Because Left is set to 5, the first five digits are excluded and only the sixth digit can be tokenized. Because Allow Short Data is set to "No, generate error" and the length of the data to tokenize is less than 3, an Input too short error is generated.
Alpha Numeric Token Left=5 Right=0 Allow Short Data=No, return input as it is	345465	345465	The Left and Right settings are evaluated first. Because Left is set to 5, the first five digits are excluded and only the sixth digit can be tokenized. Because Allow Short Data is set to "No, return input as it is" and the length of the data to tokenize is less than 3, the input is passed through as is.
Alpha Numeric Token Left=5 Right=0 Allow Short Data=Yes	34546	34546	The Left and Right settings are evaluated first. Because Left is set to 5 and the input has five digits, no data remains to tokenize. The input is treated as a zero-length token and passed through as is.
Alpha Numeric Token Left=5 Right=0 Allow Short Data=No, generate error	34546	34546
Alpha Numeric Token Left=5 Right=0 Allow Short Data=No, return input as it is	34546	34546
Alpha Numeric Token Left=5 Right=0 Allow Short Data=Yes	3454	error	Input is evaluated first for left and right settings. Since left settings are set to 5 and the input is four digits, the left and right settings condition is not met. This results in an Input too short error.
Alpha Numeric Token Left=5 Right=0 Allow Short Data=No, generate error	3454	error
Alpha Numeric Token Left=5 Right=0 Allow Short Data=No, return input as it is	3454	error
Unicode Token (Cyrillic alphabet) Left=0 Right=0 Allow Short Data=Yes	abдаcd	abшcd	Non-Cyrillic characters are treated as delimiters. The input is tokenized because short data is enabled.
Unicode Token (Cyrillic alphabet) Left=0 Right=0 Allow Short Data=No	abдаcd	Error. Input too short.	Non-Cyrillic characters are treated as delimiters. The input is too short because the word да (Cyrillic for "yes", pronounced "da") has only two code points. The minimum for this token type is 3 code points.
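
The Left=5 rows in the table suggest that the Left and Right settings are checked before the short data setting is consulted. The sketch below illustrates that ordering. It is an assumption-laden illustration, not Protegrity code: `split_clear` is a hypothetical helper, and the rule that the input must be at least Left + Right characters long is inferred from the table rows rather than stated by the product.

```python
def split_clear(value, left, right):
    """Split a value into (left clear, part to tokenize, right clear).

    Hypothetical helper. Raises ValueError when the input is shorter than
    the Left and Right settings combined, mirroring the "Input too short"
    rows in the table, which occur regardless of the short data setting.
    """
    if len(value) < left + right:
        raise ValueError("Input too short")
    end = len(value) - right
    return value[:left], value[left:end], value[end:]

print(split_clear("345465", 5, 0))  # ('34546', '5', '')
print(split_clear("34546", 5, 0))   # ('34546', '', '')

try:
    split_clear("3454", 5, 0)
except ValueError as exc:
    print(exc)                      # Input too short
```

An empty middle part corresponds to the zero-length token case, where the input is passed through unchanged. A non-empty middle part is then subject to the minimum and maximum length checks.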