Mapping File Configuration

Key concepts for defining the mapping.json file

1: Mapping File
2: Column Mapping Rules

1 - Mapping File

The mapping.json file configures how S3 Protector transforms the input data.

Overview

S3 Protector uses a mapping.json file to determine how columns in the source file are mapped to Protegrity data elements for protection. The Lambda resolves the mapping file location using the following precedence order (highest to lowest):

Priority	Source	Description
1 — S3 Object Tag	AWS S3 object tag `MAPPING_LOCATION` on the source file	If the source S3 object has a tag with the key `MAPPING_LOCATION`, its value is used to locate the mapping file. The value can be a full S3 URI (`s3://bucket/path/to/mapping.json`) pointing to an exact file, or a bucket name for a hierarchical folder walk. This takes precedence over all other methods. Requires `s3:GetObjectTagging` permission on the source bucket.
2 — `MAPPING_CONFIG_BUCKET` (mirror bucket)	`MAPPING_CONFIG_BUCKET` environment variable	When set, S3 Protector looks for the mapping file in this dedicated bucket, using the same folder path as the source file (mirroring the source bucket’s folder structure). Use this to centralise mapping files without tagging every individual object or storing mapping files in the source bucket.
3 — Source bucket	Source S3 bucket (used by default)	If neither a tag nor `MAPPING_CONFIG_BUCKET` resolves a mapping file, S3 Protector falls back to loading `mapping.json` from the same bucket and folder as the source file.
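
The precedence order above can be sketched as a small pure function. This is only an illustration of the documented order, not the Lambda's actual code: the function name `mapping_lookup_order`, its parameters, and the bucket names are hypothetical, and the fall-through from a bucket-name tag to the next source is one reading of the table.

```python
from typing import List, Optional, Tuple


def mapping_lookup_order(
    tag_value: Optional[str],
    config_bucket: Optional[str],
    source_bucket: str,
) -> List[Tuple[str, str]]:
    """Return (source, location) pairs in the order they would be consulted.

    Hypothetical helper: it models the documented precedence only and does
    not call S3.
    """
    order: List[Tuple[str, str]] = []
    if tag_value:
        # Priority 1: the MAPPING_LOCATION object tag on the source file.
        order.append(("object tag", tag_value))
        if tag_value.startswith("s3://"):
            # A full S3 URI names one exact file; a missing file is an
            # error, so no lower-priority source is consulted.
            return order
    if config_bucket:
        # Priority 2: the MAPPING_CONFIG_BUCKET environment variable.
        order.append(("MAPPING_CONFIG_BUCKET", config_bucket))
    # Priority 3: the source bucket itself (default).
    order.append(("source bucket", source_bucket))
    return order


print(mapping_lookup_order(None, "my-config-bucket", "my-source-bucket"))
```

With no tag and no environment variable set, the only entry left is the source bucket, which matches the default behaviour described in the table.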

Using S3 Object Tags for Mapping File Resolution

Add a tag to the source S3 object to point to a specific mapping file. The tag key is always MAPPING_LOCATION. The tag value supports two formats:

Tag key	Tag value format	Example	Behaviour
`MAPPING_LOCATION`	Full S3 URI — `s3://bucket/path/to/mapping.json`	`s3://my-config-bucket/configs/customer_a/mapping.json`	Loads exactly that file. Raises an error if the file is not found. No hierarchical folder walk is performed.
`MAPPING_LOCATION`	Bucket name only	`my-config-bucket`	Performs the same hierarchical `mapping.json` folder walk as `MAPPING_CONFIG_BUCKET`, starting from the source file’s folder within the named bucket.

IAM note: The Lambda execution role must have s3:GetObjectTagging on the source bucket when using tag-based resolution.
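
As a rough sketch of the two tag value formats, the snippet below classifies a `MAPPING_LOCATION` value and builds a tag set for it. The helper names `parse_mapping_location` and `tagging_payload` are hypothetical. The payload follows the `Tagging` shape accepted by boto3's `put_object_tagging`; the call itself is not made here.

```python
from typing import Dict, List


def parse_mapping_location(tag_value: str) -> Dict[str, str]:
    """Classify a MAPPING_LOCATION value (hypothetical helper)."""
    value = tag_value.strip()
    if value.startswith("s3://"):
        # Full S3 URI: split into bucket and key; this exact file is loaded.
        bucket, _, key = value[len("s3://"):].partition("/")
        return {"mode": "exact", "bucket": bucket, "key": key}
    # Anything else is treated as a bucket name for the folder walk.
    return {"mode": "walk", "bucket": value}


def tagging_payload(tag_value: str) -> Dict[str, List[Dict[str, str]]]:
    """Build the tag set to pass as Tagging=... to put_object_tagging.

    Note that put_object_tagging replaces the object's whole tag set, so
    any existing tags would need to be included as well.
    """
    return {"TagSet": [{"Key": "MAPPING_LOCATION", "Value": tag_value}]}


print(parse_mapping_location("s3://my-config-bucket/configs/customer_a/mapping.json"))
print(parse_mapping_location("my-config-bucket"))
print(tagging_payload("my-config-bucket"))
```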

Configuration Structure

The mapping.json file must be valid JSON and contain the key-value pairs described below:

{
  "ignored-columns": ["<ignored-col-name-1>", "<ignored-col-name-n>"],
  "columns": {
    "<col-name-1>": {
      "operation": "[protect|unprotect]",
      "data_element": "<data-element-name>"
    },
    "<col-name-2>": {
      "operation": "[protect|unprotect]",
      "data_element": "<data-element-name>"
    }
  },
  "input": {
    "format": "<file-format>",
    "spec": { "<reader-arg>": "<value>" }
  },
  "output": {
    "format": "<file-format>",
    "spec": { "<writer-arg>": "<value>" }
  }
}
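
To make the template concrete, the sketch below builds a filled-in mapping as a Python dictionary and serialises it to JSON. All column names and data element names (`deEmail`, `deSSN`) are made up for illustration, and the format token `csv` is an assumption about the accepted value.

```python
import json

# Hypothetical mapping for a pipe-delimited file with four columns.
mapping = {
    # Columns written to the target file unchanged.
    "ignored-columns": ["order_id", "created_at"],
    # Columns that receive a Protegrity operation.
    "columns": {
        "email": {"operation": "protect", "data_element": "deEmail"},
        "ssn": {"operation": "protect", "data_element": "deSSN"},
    },
    # Optional blocks; "csv" is an assumed format token.
    "input": {"format": "csv", "spec": {"sep": "|"}},
    "output": {"format": "csv", "spec": {"sep": "|"}},
}

mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```

Generating the file through `json.dumps` guarantees the result is valid JSON, which is the one hard formatting requirement stated above.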

Data Columns Transformation

Every source file column must appear in either ‘columns’ or ‘ignored-columns’.

“columns” (required) - Maps input data columns to a Protegrity security operation, either ‘protect’ or ‘unprotect’. Each operation is applied using the provided data element.
“ignored-columns” (optional) - Lists the names of input data columns that do not require any Protegrity security operation. Data in these columns is left unprocessed and is written to the target file as is.

Input Data Configuration

The optional “input” configuration contains the following key-value pairs:

format

Specifies the format of the input data files. If the format is not provided in mapping.json, it is inferred from the file extension.

spec

Provides additional configuration for input file processing. This allows processing of non-default file layouts, for example pipe-delimited files, header-less files, and various JSON record structures.

Important

Supplying custom arguments might result in unexpected S3 Protector behavior. Protegrity is not responsible for any damages caused by the use of a custom Pandas configuration. Use this option at your own risk.

The properties within the input spec block correspond to the arguments of the Python Pandas reader functions. For more information about supported format arguments, refer to the Pandas documentation. The following list links to the official Pandas documentation for each format supported by S3 Protector:

CSV - read_csv
Note
The default configuration expects header record, comma-delimited fields, and double quotes for text-qualified fields.
Excel - read_excel
Parquet - read_parquet
Note
The default configuration reads Parquet files in batches to reduce memory usage. In this mode the storage_options argument is ignored, which affects non-AWS S3 implementations such as MinIO and LocalStack. To load the full file and enable storage_options, set "chunked": false in input.spec.
JSON - read_json
Note
The default configuration expects the JSON input file to represent tabular data. Common supported layouts are a flat JSON array of records or JSON Lines. Each record becomes one row, and the keys become the column names. See the JSON appendix example and the Known Limitations section.
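
A few possible “input” blocks are sketched below as Python dictionaries and printed as JSON. The arguments `sep`, `header`, and `names` are real `read_csv` parameters and `lines` is a real `read_json` parameter, but the format tokens, the column names, and the assumption that S3 Protector passes each spec key through unchanged are illustrative only.

```python
import json

# Pipe-delimited CSV without a header row; "names" supplies column names.
csv_input = {
    "format": "csv",
    "spec": {"sep": "|", "header": None, "names": ["id", "email", "ssn"]},
}

# JSON Lines input (one record per line), using read_json's "lines" flag.
jsonl_input = {"format": "json", "spec": {"lines": True}}

# Parquet loaded in full instead of in batches, as described in the note.
parquet_input = {"format": "parquet", "spec": {"chunked": False}}

for block in (csv_input, jsonl_input, parquet_input):
    # Python None/True/False serialise to JSON null/true/false.
    print(json.dumps({"input": block}))
```

Because the columns of a header-less file get their names from `names`, those are the names that the ‘columns’ and ‘ignored-columns’ sections would presumably need to reference.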

Output Data Configuration

The optional “output” configuration contains the following key-value pairs:

format

Specifies the format of the output data files. The format in mapping.json is used only when the S3 Protector Function deployment parameter OutputFileFormat is set to use_mapping_spec. See the CloudFormation installation section for the full list of output format options.

spec

Provides additional configuration for the output file processing.

Important

The properties within the output spec block correspond to the arguments of the Python Pandas DataFrame output functions. For more information about supported format arguments, refer to the Pandas documentation. The following list links to the official Pandas documentation for each format supported by S3 Protector:

CSV - DataFrame.to_csv
Note
The default configuration writes a header record based on the IncludeHeader deployment parameter.
Excel - DataFrame.to_excel
Parquet - DataFrame.to_parquet
Note
If a large Parquet file is being processed in chunks, only the index and compression arguments from the output spec are applied. Other arguments are ignored.
JSON - DataFrame.to_json
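
Two possible “output” blocks are sketched below. `sep` is a real `DataFrame.to_csv` argument, and `compression` and `index` are real `DataFrame.to_parquet` arguments; the format tokens and the chosen values are assumptions for illustration.

```python
import json

# Semicolon-delimited CSV output.
csv_output = {"format": "csv", "spec": {"sep": ";"}}

# Parquet output; index and compression are the two arguments that still
# apply when a large file is processed in chunks.
parquet_output = {
    "format": "parquet",
    "spec": {"compression": "gzip", "index": False},
}

for block in (csv_output, parquet_output):
    print(json.dumps({"output": block}, indent=2))
```

As stated above, the "format" value in either block takes effect only when the OutputFileFormat deployment parameter is set to use_mapping_spec.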

2 - Column Mapping Rules

To ensure the highest level of security, S3 Protector requires users to define processing rules for all data columns. Every column that appears in a source file must be listed in the ‘mapping.json’ file. A column may appear in either the ‘columns’ section or the ‘ignored-columns’ section, but not both.

Common Error Conditions

The table below summarises common error conditions that may occur when creating a ‘mapping.json’ file:

Mapping	Error Message
A column name appears in ‘mapping.json’ but does not exist in the source file.	Columns [‘column name’] in the mapping file have no matches in the input data columns
Source file column name appears neither in ‘columns’ nor ‘ignored-columns’ sections.	Input file contains data columns which are not defined in the mapping file.
Source file column name appears in both ‘columns’ and ‘ignored-columns’ sections.	Ignored column [‘column-name’] is present in ‘columns’ list. Column must be defined in either ‘columns’ or ‘ignored-columns’, but not both.
Source file column name appears more than once in either ‘columns’ or ‘ignored-columns’ section.	Duplicate column [“column-name”] found in ‘ignored-columns’.

Note

The column names in the mapping file are case-sensitive.
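
The rules in this section can be checked before upload with a small script. The sketch below is a hypothetical pre-flight check, not S3 Protector's own validation: the function name `check_mapping` and the wording of the returned messages are invented and differ from the product's error messages. Comparison is case-sensitive, in line with the note above.

```python
from collections import Counter
from typing import Dict, List


def check_mapping(mapping: Dict, source_columns: List[str]) -> List[str]:
    """Return a list of rule violations (empty when the mapping is consistent)."""
    columns = list(mapping.get("columns", {}))
    ignored = list(mapping.get("ignored-columns", []))
    problems: List[str] = []

    # A name listed more than once in ignored-columns.
    duplicates = sorted(n for n, count in Counter(ignored).items() if count > 1)
    if duplicates:
        problems.append(f"duplicate in ignored-columns: {duplicates}")

    # A name present in both sections.
    both = sorted(set(columns) & set(ignored))
    if both:
        problems.append(f"in both columns and ignored-columns: {both}")

    mapped = set(columns) | set(ignored)

    # A mapped name with no match in the source file.
    unknown = sorted(mapped - set(source_columns))
    if unknown:
        problems.append(f"not in source file: {unknown}")

    # A source column that the mapping does not mention.
    unmapped = sorted(set(source_columns) - mapped)
    if unmapped:
        problems.append(f"missing from mapping: {unmapped}")

    return problems


mapping = {
    "columns": {"email": {"operation": "protect", "data_element": "deEmail"}},
    "ignored-columns": ["order_id"],
}
# "Email" and "email" differ in case, so both directions are reported.
print(check_mapping(mapping, ["Email", "order_id"]))
```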