
Minimum and Maximum Input Length

The minimum and maximum input lengths are the boundaries that are used in input validation.

In Protegrity tokenization, only the Decimal token type lets you define the minimum and maximum length of the token element when the token element is created. Some token types, such as Datetime, have a fixed length. For the remaining token types, the minimum and maximum lengths depend on the token type, the tokenizer, the length preservation setting, and the Allow Short Data setting.
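If an application needs to check lengths before calling a protector, the limits can be kept in a lookup keyed by the same settings that the table below uses. The following is a minimal Python sketch under that assumption. `LIMITS`, `TOKENIZER_GROUPS`, `length_limits`, and the group labels such as `SLT_*_3` are hypothetical names, not part of any Protegrity API, and only a few rows of the table are transcribed.

```python
# Hypothetical lookup transcribed from a few rows of the
# "Minimum and Maximum Input Length for Token Types" table.
# Key: (token type, tokenizer group, length preservation, Allow Short Data).
LIMITS = {
    ("Numeric", "SLT_*_3", "Yes", "Yes"): (1, 4096),
    ("Numeric", "SLT_*_3", "Yes", "No, return input as it is"): (3, 4096),
    ("Numeric", "SLT_*_3", "Yes", "No, generate error"): (3, 4096),
    ("Numeric", "SLT_*_3", "No", "NA"): (1, 3933),
    ("Numeric", "SLT_*_6", "Yes", "Yes"): (1, 4096),
    ("Numeric", "SLT_*_6", "Yes", "No, return input as it is"): (6, 4096),
    ("Numeric", "SLT_*_6", "Yes", "No, generate error"): (6, 4096),
    ("Numeric", "SLT_*_6", "No", "NA"): (1, 3933),
    ("Datetime", "SLT_DATETIME", "Yes", "NA"): (10, 29),
    ("Decimal", "SLT_6_DECIMAL", "No", "NA"): (1, 36),
}

# Tokenizers that share a row in the table (hypothetical group labels).
TOKENIZER_GROUPS = {
    "SLT_1_3": "SLT_*_3",
    "SLT_2_3": "SLT_*_3",
    "SLT_1_6": "SLT_*_6",
    "SLT_2_6": "SLT_*_6",
}


def length_limits(token_type, tokenizer, length_preservation, allow_short_data):
    """Return (minimum, maximum) characters to tokenize for the given settings."""
    group = TOKENIZER_GROUPS.get(tokenizer, tokenizer)
    key = (token_type, group, length_preservation, allow_short_data)
    if key not in LIMITS:
        raise KeyError(f"no length limits transcribed for {key}")
    return LIMITS[key]


print(length_limits("Numeric", "SLT_2_6", "Yes", "No, generate error"))  # (6, 4096)
```

Keying on every setting, rather than on the token type alone, mirrors how the table varies the limits by tokenizer, length preservation, and short data handling.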

The following table illustrates length settings by token type.

Table: Minimum and Maximum Input Length for Token Types


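| Token Type | Tokenizer | Length Preservation | Allow Short Data | Minimum Length | Maximum Length |
|---|---|---|---|---|---|
| Numeric | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Numeric | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Numeric | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Numeric | SLT_1_3, SLT_2_3 | No | NA | 1 | 3933 |
| Numeric | SLT_1_6, SLT_2_6 | Yes | Yes | 1 | 4096 |
| Numeric | SLT_1_6, SLT_2_6 | Yes | No, return input as it is | 6 | 4096 |
| Numeric | SLT_1_6, SLT_2_6 | Yes | No, generate error | 6 | 4096 |
| Numeric | SLT_1_6, SLT_2_6 | No | NA | 1 | 3933 |
| Integer | SLT_1_3 | Yes | NA | 2 | 8 |
| Credit Card | SLT_1_3, SLT_2_3 | Yes | NA | 3 | 4096 |
| Credit Card | SLT_1_6, SLT_2_6 | Yes | NA | 6 | 4096 |
| Alpha | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Alpha | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Alpha | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Alpha | SLT_1_3, SLT_2_3 | No | NA | 1 | 4076 |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Upper-case Alpha | SLT_1_3, SLT_2_3 | No | NA | 1 | 4049 |
| Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Alpha-Numeric | SLT_1_3, SLT_2_3 | No | NA | 1 | 4080 |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | Yes | 1 | 4096 |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 3 | 4096 |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | Yes | No, generate error | 3 | 4096 |
| Upper-Case Alpha-Numeric | SLT_1_3, SLT_2_3 | No | NA | 1 | 4064 |
| Lower ASCII | SLT_1_3 | Yes | Yes | 1 | 4096 |
| Lower ASCII | SLT_1_3 | Yes | No, return input as it is | 3 | 4096 |
| Lower ASCII | SLT_1_3 | Yes | No, generate error | 3 | 4096 |
| Lower ASCII | SLT_1_3 | No | NA | 1 | 4086 |
| Datetime | SLT_DATETIME | Yes | NA | 10 | 29 |
| Decimal | SLT_6_DECIMAL | No | NA | 1 | 36 |
| Unicode Gen2 | SLT_1_3, SLT_X_1 | Yes | Yes | 1 code point | 4096 code points |
| Unicode Gen2 | SLT_1_3, SLT_X_1 | Yes | No, return input as it is | 3 code points | 4096 code points |
| Unicode Gen2 | SLT_1_3, SLT_X_1 | Yes | No, generate error | 3 code points | 4096 code points |
| Binary | SLT_1_3, SLT_2_3 | No | NA | 3 | 4095 |
| Email | SLT_1_3, SLT_2_3 | Yes | Yes | 3 | 256 |
| Email | SLT_1_3, SLT_2_3 | Yes | No, return input as it is | 5 | 256 |
| Email | SLT_1_3, SLT_2_3 | Yes | No, generate error | 5 | 256 |
| Email | SLT_1_3, SLT_2_3 | No | NA | 3 | 256 |
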
  • Minimum and maximum length validation is performed on the characters to tokenize.
  • Characters kept in the clear by the From Left and From Right settings are not counted. Characters outside the alphabet of the selected token type are not counted either.
  • NULL values are accepted but not tokenized.
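The counting rules in the notes above can be sketched in Python as follows. `count_chars_to_tokenize` is a hypothetical helper, not a Protegrity API. It assumes that From Left and From Right count character positions in the input, and it raises an error when the input is shorter than Left + Right, which matches the Input too short results in the Token Length Examples table later in this page. Python strings are sequences of code points, so the count is in code points.

```python
import string


def count_chars_to_tokenize(value: str, alphabet: str, left: int = 0, right: int = 0) -> int:
    """Count the characters that minimum/maximum length validation applies to.

    Assumptions: `left`/`right` are the From Left/From Right clear-character
    settings, counted as character positions; `alphabet` holds the characters
    the token type can tokenize. All other characters are delimiters.
    """
    if left < 0 or right < 0:
        raise ValueError("left and right must be non-negative")
    if left + right > len(value):
        # Mirrors the "Input too short" result shown in the examples table.
        raise ValueError("Input too short")
    # Characters kept in the clear on either side are not counted.
    middle = value[left:len(value) - right]
    # Characters outside the token type's alphabet are not counted either.
    return sum(1 for ch in middle if ch in alphabet)


print(count_chars_to_tokenize("ab123cd", string.digits))  # 3
print(count_chars_to_tokenize("48ghdg83", string.digits, left=2, right=2))  # 0
```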

Table: Minimum and Maximum Input Length for Deprecated Token Types


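| Token Type | Tokenizer | Length Preservation | Allow Short Data | Minimum Length | Maximum Length |
|---|---|---|---|---|---|
| Printable | SLT_1_3 | Yes | Yes | 1 | 4096 |
| Printable | SLT_1_3 | Yes | No, return input as it is | 3 | 4096 |
| Printable | SLT_1_3 | Yes | No, generate error | 3 | 4096 |
| Printable | SLT_1_3 | No | NA | 1 | 4091 |
| Date YYYY-MM-DD, Date DD/MM/YYYY, Date MM.DD.YYYY | SLT_1_3, SLT_2_3, SLT_1_6, SLT_2_6 | Yes | NA | 10 | 10 |
| Unicode | SLT_1_3, SLT_2_3 | No | Yes | 1 byte | 4096 bytes |
| Unicode | SLT_1_3, SLT_2_3 | No | No, return input as it is | 3 bytes | 4096 bytes |
| Unicode | SLT_1_3, SLT_2_3 | No | No, generate error | 3 bytes | 4096 bytes |
| Unicode Base64 | SLT_1_3 | No | Yes | 1 byte | 4096 bytes |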

1 - Calculating Token Length

This section describes how the number of characters to tokenize in an input value is calculated, and how the input is divided into characters that are tokenized and characters that are left as is.

For a Numeric token type, non-numeric characters are treated as delimiters. In general, any character in the input value that cannot be tokenized with the selected token type is treated as a delimiter and left untokenized.
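For a Numeric token, the split into delimiters and tokenizable characters can be illustrated with a short Python sketch. `split_runs` is a hypothetical helper for illustration only: it models which characters are candidates for tokenization, not how Protegrity generates token values.

```python
import string
from itertools import groupby


def split_runs(value: str, alphabet: str) -> list[tuple[str, bool]]:
    """Split value into consecutive runs, flagging whether each run is tokenizable.

    Characters in `alphabet` form tokenizable runs; every other character is a
    delimiter that is left as is.
    """
    return [
        ("".join(chars), is_tokenizable)
        for is_tokenizable, chars in groupby(value, key=lambda ch: ch in alphabet)
    ]


print(split_runs("ab123cd", string.digits))
# [('ab', False), ('123', True), ('cd', False)]
```

A value such as `abcd` yields a single non-tokenizable run, which corresponds to the zero-length case described below.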

The number of characters to tokenize is calculated as shown in the following figure:

Figure: Number of characters to tokenize
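The rule stated in the notes under the first table can also be written as a formula. This is a formalization under the assumption that From Left (L) and From Right (R) count character positions. For an input x = x_1 x_2 ... x_n and the alphabet Σ of the selected token type:

```latex
N(x) = \left|\,\{\, i \in \{1, \dots, n\} \;:\; L < i \le n - R,\ x_i \in \Sigma \,\}\,\right|
```

For example, for x = 48ghdg83 with L = R = 2 and Σ the decimal digits, positions 3 through 6 hold ghdg, none of which is a digit, so N(x) = 0. The formula does not cover inputs shorter than L + R, which the examples below report as Input too short.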

If the input value contains no characters to tokenize, it is treated as a zero-length token. Tokenizing a zero-length input value does not produce an error; the input value is returned as the output.

Figure: Input value returned as a result of tokenization with zero-length token

If the input value has at least one character to tokenize but fewer than the minimum length, the result depends on the Allow Short Data setting. If short data tokenization is enabled, the data is tokenized. If it is not enabled, the input is either returned as is or an error is returned, depending on the selected option.

For more information on short data tokenization, refer to Short Data Tokenization.

Figure: Output returned when the input is too short

If the number of characters to tokenize exceeds the maximum length, the input value is considered too long and tokenization returns an error.

Figure: Error returned when the input is too long

If the input value has enough characters to tokenize, that is, the count falls between the minimum and maximum lengths, tokenization succeeds.
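The four outcomes described above (zero-length, too short, too long, and within limits) can be summarized in a small decision function. This Python sketch is hypothetical: the names are illustrative, `min_len` and `max_len` are assumed to come from the table row that matches the Allow Short Data setting, and treating a value below the minimum under Allow Short Data=Yes as an error is an assumption that the table does not state directly.

```python
from enum import Enum


class AllowShortData(Enum):
    YES = "Yes"
    RETURN_INPUT = "No, return input as it is"
    GENERATE_ERROR = "No, generate error"


class Outcome(Enum):
    TOKENIZE = "tokenize"
    RETURN_INPUT = "return input as it is"
    ERROR_TOO_SHORT = "error: input too short"
    ERROR_TOO_LONG = "error: input too long"


def length_outcome(n: int, min_len: int, max_len: int, short_data: AllowShortData) -> Outcome:
    """Classify a value by its number of characters to tokenize (n).

    min_len/max_len are the limits from the table row matching short_data.
    """
    if n == 0:
        # Zero-length token: no error, the input value is returned as output.
        return Outcome.RETURN_INPUT
    if n > max_len:
        return Outcome.ERROR_TOO_LONG
    if n < min_len:
        # Assumption: below the minimum, only "No, return input as it is"
        # passes the value through; every other setting is treated as an error.
        if short_data is AllowShortData.RETURN_INPUT:
            return Outcome.RETURN_INPUT
        return Outcome.ERROR_TOO_SHORT
    return Outcome.TOKENIZE


# Numeric token, SLT_1_3, length preserving: minimum 3 unless short data is allowed.
print(length_outcome(1, 1, 4096, AllowShortData.YES))             # Outcome.TOKENIZE
print(length_outcome(1, 3, 4096, AllowShortData.GENERATE_ERROR))  # Outcome.ERROR_TOO_SHORT
print(length_outcome(2, 3, 4096, AllowShortData.RETURN_INPUT))    # Outcome.RETURN_INPUT
```

The zero-length check comes first so that an empty value is returned as is even when Allow Short Data is set to generate an error.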

Figure: Tokenized value returned when the input is enough for tokenization

Table: Token Length Examples

| Token Properties | Input Value | Output Value | Comments |
|---|---|---|---|
| Numeric Token; Left/Right undefined; Allow Short Data=Yes | ab1cd | ab6cd | Non-numeric characters are treated as delimiters. The input is tokenized because short data is enabled and the minimum length is 1 character. |
| Numeric Token; Left=0, Right=0; Allow Short Data=No, generate error | ab1cd | Error. Input too short. | Non-numeric characters are treated as delimiters. The input is too short because short data is not enabled and the minimum number of characters to tokenize for this token type is 3. |
| Numeric Token; Left=0, Right=0; Allow Short Data=No, return input as it is | 12 | 12 | The input is returned as is, according to the short data setting. |
| Numeric Token; Left=2, Right=2 | 48ghdg83 | 48ghdg83 | The input value is returned unchanged because it has no characters to tokenize: the Left and Right settings exclude all of its numeric characters (48 and 83), and the remaining characters are delimiters. |
| Numeric Token; Left=2, Right=2 | 4568 | 4568 | The input value is returned unchanged because the Left and Right settings leave no characters to tokenize. |
| Numeric Token; Left=0, Right=0 | ab123cd | ab857cd | The input value has enough characters for tokenization; only the characters supported by the Numeric token type are tokenized. |
| Alpha Numeric Token; Left=5, Right=0; Allow Short Data=Yes | 345465 | 34546c | The input is first evaluated against the Left and Right settings. With Left=5, the first five digits are excluded and only the sixth digit can be tokenized. Because Allow Short Data is Yes, the sixth digit is tokenized. |
| Alpha Numeric Token; Left=5, Right=0; Allow Short Data=No, generate error | 345465 | error | The input is first evaluated against the Left and Right settings. With Left=5, the first five digits are excluded and only the sixth digit can be tokenized. Because Allow Short Data is "No, generate error" and the data to tokenize is shorter than 3 characters, an Input too short error is generated. |
| Alpha Numeric Token; Left=5, Right=0; Allow Short Data=No, return input as it is | 345465 | 345465 | The input is first evaluated against the Left and Right settings. With Left=5, the first five digits are excluded and only the sixth digit can be tokenized. Because Allow Short Data is "No, return input as it is" and the data to tokenize is shorter than 3 characters, the data is passed through as is. |
| Alpha Numeric Token; Left=5, Right=0; Allow Short Data=Yes | 34546 | 34546 | The input is first evaluated against the Left and Right settings. With Left=5 and a five-digit input, no data remains to be tokenized. The input is treated as a zero-length token and passed through as is. |
| Alpha Numeric Token; Left=5, Right=0; Allow Short Data=No, generate error | 34546 | 34546 | |
| Alpha Numeric Token; Left=5, Right=0; Allow Short Data=No, return input as it is | 34546 | 34546 | |
| Alpha Numeric Token; Left=5, Right=0; Allow Short Data=Yes | 3454 | error | The input is first evaluated against the Left and Right settings. With Left=5 and a four-digit input, the input is shorter than the Left and Right settings require. This results in an Input too short error. |
| Alpha Numeric Token; Left=5, Right=0; Allow Short Data=No, generate error | 3454 | error | |
| Alpha Numeric Token; Left=5, Right=0; Allow Short Data=No, return input as it is | 3454 | error | |
| Unicode Token (Cyrillic alphabet); Left=0, Right=0; Allow Short Data=Yes | abдаcd | abшcd | Non-Cyrillic characters are treated as delimiters. The input data is tokenized because short data is enabled. |
| Unicode Token (Cyrillic alphabet); Left=0, Right=0; Allow Short Data=No | abдаcd | Error. Input too short. | Non-Cyrillic characters are treated as delimiters. The input is too short because the word да (Cyrillic for "yes", pronounced "da") is only two code points, while the minimum for this token type is 3 code points. |
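To check that the rules in this section agree with the examples, the following Python sketch predicts the outcome of each example row (tokenized, returned as is, or error) rather than the token value, which only the Protegrity tokenizer can produce. `outcome`, `LIMITS`, `ALNUM`, and `CYRILLIC` are hypothetical names. The limits are taken from the length-preserving SLT_1_3 rows of the first table; reading them as code points for the Cyrillic rows, as for Unicode Gen2, is an assumption. Where a row does not state Allow Short Data, the sketch picks a setting that does not change that row's result.

```python
import string

ALNUM = string.ascii_letters + string.digits
# Basic Cyrillic letters А..я (U+0410 to U+044F); Ё/ё are not included.
CYRILLIC = "".join(chr(cp) for cp in range(0x0410, 0x0450))

# (min, max) per Allow Short Data option, from the length-preserving SLT_1_3 rows.
LIMITS = {
    "Yes": (1, 4096),
    "No, return input as it is": (3, 4096),
    "No, generate error": (3, 4096),
}


def outcome(value, alphabet, allow_short, left=0, right=0):
    """Predict 'tokenize', 'return input', or 'error' for one example row."""
    if left + right > len(value):
        return "error"  # Fewer characters than Left + Right: Input too short.
    middle = value[left:len(value) - right]
    n = sum(1 for ch in middle if ch in alphabet)
    min_len, max_len = LIMITS[allow_short]
    if n == 0:
        return "return input"  # Zero-length token.
    if n > max_len:
        return "error"
    if n < min_len:
        return "return input" if allow_short == "No, return input as it is" else "error"
    return "tokenize"


EXAMPLES = [
    # (input, alphabet, Allow Short Data, Left, Right, expected result per the table)
    ("ab1cd", string.digits, "Yes", 0, 0, "tokenize"),
    ("ab1cd", string.digits, "No, generate error", 0, 0, "error"),
    ("12", string.digits, "No, return input as it is", 0, 0, "return input"),
    ("48ghdg83", string.digits, "Yes", 2, 2, "return input"),
    ("4568", string.digits, "Yes", 2, 2, "return input"),
    ("ab123cd", string.digits, "No, generate error", 0, 0, "tokenize"),
    ("345465", ALNUM, "Yes", 5, 0, "tokenize"),
    ("345465", ALNUM, "No, generate error", 5, 0, "error"),
    ("345465", ALNUM, "No, return input as it is", 5, 0, "return input"),
    ("34546", ALNUM, "No, generate error", 5, 0, "return input"),
    ("3454", ALNUM, "No, return input as it is", 5, 0, "error"),
    ("abдаcd", CYRILLIC, "Yes", 0, 0, "tokenize"),
    ("abдаcd", CYRILLIC, "No, generate error", 0, 0, "error"),
]

for value, alphabet, allow_short, left, right, expected in EXAMPLES:
    result = outcome(value, alphabet, allow_short, left, right)
    print(f"{value!r:12} {allow_short:28} -> {result} (table: {expected})")
```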