Email

Details about the Email token type.

The Email token type tokenizes email addresses. The local part, which is the part before the “@” sign, is tokenized. The “@” sign and the domain name after it remain in the clear.

The following table lists the minimum and maximum lengths for this token type. The limits apply to the local part, the domain part, and the entire email address.

Table: Email Tokenization Type Properties

Tokenization Type Properties	Settings
Name	Email
Token type and Format	Alphabetic and numeric only. The rest of the characters will be treated as delimiters.
Tokenizer	Length Preservation	Minimum Length (Local)	Minimum Length (Domain)	Minimum Length (Entire)	Maximum Length (Local)	Maximum Length (Domain)	Maximum Length (Entire)
SLT_1_3, SLT_2_3	No	1	1	3	63	252	256
SLT_1_3, SLT_2_3	Yes	3^*1	1	5	64	252^*2	256
Possibility to set minimum/maximum length	No
Left/Right settings	No
Internal IV	N/A
External IV	Yes
Return of Protected value	Yes
Token specific properties	At least one @ character is required in the input. The right most @ character defines the delimiter between the local and domain parts.

^*1 – If the setting for short data tokenization is set to Yes, the minimum tokenizable length for the local part of an email is one; otherwise, it is three.

^*2 – If the setting for short data tokenization is set to Yes, the maximum length for the domain part of an email is 253; otherwise, it is 252.
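
The limits in the table and its footnotes can be expressed as a small pre-check. The following is a minimal sketch, not part of any Protegrity API: the names `EmailLimits`, `limits_for`, `split_email`, and `check_lengths` are hypothetical, and the sketch assumes that lengths are counted in characters over the raw local and domain parts. The footnotes only describe how short data tokenization changes the local minimum and the domain maximum, so the other values are taken from the table unchanged.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EmailLimits:
    """Length limits copied from the Email Tokenization Type Properties table."""
    min_local: int
    min_domain: int
    min_entire: int
    max_local: int
    max_domain: int
    max_entire: int


def limits_for(length_preservation: bool, allow_short_data: bool = False) -> EmailLimits:
    """Return the table limits for the given settings (hypothetical helper)."""
    if not length_preservation:
        return EmailLimits(1, 1, 3, 63, 252, 256)
    # Footnotes *1 and *2: short data tokenization changes two of the limits.
    min_local = 1 if allow_short_data else 3
    max_domain = 253 if allow_short_data else 252
    return EmailLimits(min_local, 1, 5, 64, max_domain, 256)


def split_email(value: str) -> Tuple[str, str]:
    """Split on the rightmost '@', which separates the local and domain parts."""
    local, sep, domain = value.rpartition("@")
    if not sep:
        raise ValueError("at least one @ character is required")
    return local, domain


def check_lengths(value: str, limits: EmailLimits) -> bool:
    """Return True when the local part, domain part, and entire value fit the limits."""
    local, domain = split_email(value)
    return (
        limits.min_local <= len(local) <= limits.max_local
        and limits.min_domain <= len(domain) <= limits.max_domain
        and limits.min_entire <= len(value) <= limits.max_entire
    )
```

For example, `check_lengths("a@protegrity.com", limits_for(True))` is false because the local part has fewer than three characters, while the same call with `limits_for(True, allow_short_data=True)` is true.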

Email Token Format

The Email token format defines how an email address is tokenized. An email address consists of a local part and a domain, in the form local-part@domain. The local part can be up to 64 characters and the domain name can be up to 254 characters, but the entire email address cannot be longer than 256 characters.

The following table explains email token format input requirements and tokenized output format:

Table: Output Values for Email Token Format

Local Part Input value can consist of	Output value can consist of
Commonly used: uppercase and lowercase letters a-z and A-Z; digits 0-9; special characters !#$%&'*+-/=?^_`|}{~ (ASCII 33, 35-39, 42, 43, 45, 47, 61, 63, 94-96, 123-126). Comments are allowed within parentheses. Used with restrictions: the dot character "." is allowed when it is not the first or last character and does not appear more than once consecutively. The special characters with ASCII codes 32, 34, 40, 41, 44, 58, 59, 60, 62, 64, and 91-93 are allowed only between quotation marks. In addition, the space (ASCII 32), backslash (ASCII 92), and quotation mark (ASCII 34) must be preceded by a backslash, for example, "\ \\\". International characters above U+007F are permitted by RFC 6531, though mail systems may restrict which characters can be used when assigning local parts.	The part before the “@” sign is tokenized. All valid characters are tokenized by the same rules as the alpha-numeric token, and comments are tokenized. The following characters are treated as delimiters and are not tokenized: the dot character “.”, the left and right parentheses “()”, and the special characters in the local part.
@ Part The “@” character defines the delimiter between the local and domain parts and is left in the clear.
Domain Part Input value can consist of	Output value can consist of
Letters and digits; hyphens and dots; an IP address within square brackets, for example, john.smith@[1.1.1.1]; non-ASCII domains (internationalized domain parts). Comments are allowed within parentheses.	The part after the “@” sign is not tokenized.

Note:
Comments are allowed in both the local and domain parts of an email address, but they are tokenized only when they appear in the local part. The following examples show how comments can be used with the email address john.smith@example.com:

john.smith(comment)@example.com
“john(comment).smith@example.com”
john(comment)n.smith@example.com
john.smith@(comment)example.com
john.smith@example.com(comment)
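
The delimiter rules above can be visualized with a mask that marks which input characters would be tokenized. This is only an illustrative sketch under stated assumptions: `mark_local_part` is a hypothetical helper, "alphabetic and numeric" is assumed to mean ASCII letters and digits, and the mask describes positions in the input only (without length preservation, the tokenized output may have a different length).

```python
import string

# Assumption: "alphabetic and numeric" means ASCII letters and digits here.
TOKENIZABLE = set(string.ascii_letters + string.digits)


def mark_local_part(email: str) -> str:
    """Return a mask of the input: 'T' for each character that would be
    tokenized, and the original character for everything left in the clear."""
    # The rightmost '@' separates the local part from the domain part.
    local, sep, domain = email.rpartition("@")
    if not sep:
        raise ValueError("at least one @ character is required")
    # Delimiters (dots, parentheses, other special characters) stay as they are.
    masked_local = "".join("T" if ch in TOKENIZABLE else ch for ch in local)
    # The '@' sign and the whole domain part, including any comment, stay in the clear.
    return masked_local + sep + domain


print(mark_local_part("john.smith(comment)@example.com"))
print(mark_local_part("john.smith@example.com(comment)"))
```

A comment in the local part is marked for tokenization while its parentheses are kept, whereas a comment in the domain part is left untouched.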

The following table shows examples of the way in which a value will be tokenized with the Email token.

Table: Examples of Tokenization for Email Token Formats

Input Values	Tokenized Values	Comments
Protegrity1234@gmail.com	UNfOxcZ51jWbXMq@gmail.com	All characters before @ symbol are tokenized.
john.smith!@#@$%$%^&@gmail.com	hX3p.yDcwD!@#@$%$%@gmail.com	All characters other than letters and digits are treated as delimiters.
email@protegrity@gmail.com	F00CJ@RjDEX9LMDq@gmail.com	The right most @ character defines the delimiter between the local and domain parts.
q@a	asj@a	Minimum of three characters in the entire email for non-length-preserving tokens.
qdd@a	S0Y@a	Minimum of five characters in the entire email for length-preserving tokens.
a@protegrity.com	o@protegrity.com	Email, SLT_1_3, Length Preservation=Yes, Allow Short Data=Yes. The local part of the email has at least one character to tokenize, which meets the minimum length requirement for the SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=Yes.
a@protegrity.com email@protegrity.com	a@protegrity.com F00CJ@protegrity.com	Email, SLT_1_3, Length Preservation=Yes, Allow Short Data=No, return input as it is. If the input value has fewer than three characters to tokenize, it is returned as is; otherwise, it is tokenized.
a@protegrity.com	Error. Input too short.	Email, SLT_1_3, Length Preservation=Yes, Allow Short Data=No, generate an error. The local part of the email has one character to tokenize, which is too short for the SLT_1_3 tokenizer when Length Preservation=Yes and Allow Short Data=No, generate an error.
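
The three short data behaviors in the last rows of the table can be sketched as follows for the Length Preservation=Yes case. This is a hedged illustration, not the product's implementation: `protect_email`, the mode strings, and `placeholder_tokenize` are hypothetical names, the placeholder only swaps letter case so that the flow is visible, and tokenizable characters are assumed to be ASCII letters and digits.

```python
from typing import Callable

MIN_TOKENIZABLE = 3  # footnote *1: minimum when short data tokenization is off


def count_tokenizable(local: str) -> int:
    """Count ASCII letters and digits, the characters assumed to be tokenized."""
    return sum(1 for ch in local if ch.isascii() and ch.isalnum())


def protect_email(value: str, tokenize: Callable[[str], str], short_data: str) -> str:
    """Apply one of three hypothetical short data modes:
    'yes', 'no_return_input', or 'no_generate_error'."""
    local, sep, domain = value.rpartition("@")
    if not sep:
        raise ValueError("at least one @ character is required")
    too_short = count_tokenizable(local) < MIN_TOKENIZABLE
    if short_data != "yes" and too_short:
        if short_data == "no_generate_error":
            raise ValueError("Input too short.")
        return value  # 'no_return_input': hand the input back unchanged
    # Only the local part is passed to the tokenizer; the domain stays in the clear.
    return tokenize(local) + sep + domain


def placeholder_tokenize(local: str) -> str:
    """Stand-in for a real tokenizer; it only swaps letter case."""
    return local.swapcase()


print(protect_email("a@protegrity.com", placeholder_tokenize, "yes"))
print(protect_email("a@protegrity.com", placeholder_tokenize, "no_return_input"))
```

With the mode `no_generate_error`, the same one-character input raises an error instead of being returned.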

Email Tokenization Properties for different protectors

Application Protector

The following table shows supported input data types for Application protectors with the Email token.

Table: Supported input data types for Application protectors with Email token

Application Protectors^*2	AP Java^*1	AP Python
Supported input data types	STRING CHAR[] BYTE[]	STRING BYTES

^*1 – The API accepts and returns data in BYTE[] format. The customer application needs to convert the input into byte arrays before calling the API, and similarly, convert the output from byte arrays after receiving the response from the API.

^*2 – The Protegrity Application Protectors only support bytes converted from the string data type. If int, short, or long format data is directly converted to bytes and passed as input to the Application Protector APIs that support byte as input and provide byte as output, then data corruption might occur.
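
The conversion described in the two notes above can be sketched in plain Python. The helpers `to_api_bytes` and `from_api_bytes` are hypothetical and do not call any Protegrity API, and UTF-8 is an assumed encoding. The point of the sketch is that a value should pass through its string form before it becomes bytes.

```python
def to_api_bytes(value) -> bytes:
    """Convert a value to bytes by going through its string form first."""
    return str(value).encode("utf-8")  # assumption: UTF-8 encoding


def from_api_bytes(data: bytes) -> str:
    """Convert bytes returned by a byte-based API back to a string."""
    return data.decode("utf-8")


# Bytes converted from a string: the supported kind of input.
email_bytes = to_api_bytes("john.smith@example.com")
print(from_api_bytes(email_bytes))

# A number converted through its string form is still text ...
print(to_api_bytes(1234))
# ... while a direct integer-to-bytes conversion produces raw binary,
# which is the kind of input the note warns might be corrupted.
print((1234).to_bytes(2, "big"))
```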

For more information about Application protectors, refer to Application Protector.

Big Data Protector

Protegrity supports MapReduce, Hive, Pig, HBase, Spark, and Impala, which use the Hadoop Distributed File System (HDFS) or Ozone as the data storage layer. The data is protected from internal and external threats, while users and business processes can continue to use the secured data. Protegrity protects data inside the files using tokenization and strong encryption protection methods.

The following table shows supported input data types for Big Data protectors with the Email token.

Table: Supported input data types for Big Data protectors with Email token

Big Data Protectors	MapReduce^*2	Hive	Pig	HBase^*2	Impala	Spark^*2	Spark SQL	Trino
Supported input data types^*1	BYTE[]	CHAR^*3 STRING	CHARARRAY	BYTE[]	STRING	BYTE[] STRING	STRING	VARCHAR

^*1 – If the input and output types of the API are BYTE[], the customer application should convert the input to a byte array before calling the API and convert the output from a byte array after the call.

^*2 – The Protegrity MapReduce protector, HBase coprocessor, and Spark protector only support bytes converted from the string data type. Data corruption might occur when:

Any other data type is directly converted to bytes and passed as input to a MapReduce or Spark API that accepts bytes as input and returns bytes as output.
Any other data type is directly converted to bytes and inserted into an HBase table that is configured with the Protegrity HBase coprocessor.

^*3 – If you are using the Char tokenization UDFs in Hive, ensure that the data elements have length preservation selected. Char tokenization UDFs do not support data elements without length preservation selected.

For more information about Big Data protectors, refer to Big Data Protector.

Data Warehouse Protector

The Protegrity Data Warehouse Protector is a security solution designed to protect sensitive data at the column level. This enables you to secure your data while still permitting access to authorized users. The Data Warehouse Protector integrates with existing database systems through User-Defined Functions. Protegrity protects data inside the data warehouses using various tokenization and encryption methods.

The following table shows the supported input data types for the Teradata protector with the Email token.

Table: Supported input data types for Data Warehouse protectors with Email token

Data Warehouse Protectors	Teradata
Supported input data types	VARCHAR LATIN

For more information about Data Warehouse protectors, refer to Data Warehouse Protector.

Database Protectors

The following table shows supported input data types for Database protectors with the Email token.

Table: Supported input data types for Database protectors with Email token

Protector	Oracle	MSSQL
Supported Input Data Types	VARCHAR2 CHAR	VARCHAR CHAR

For more information about Database protectors, refer to Database Protectors.

Last modified : March 05, 2026