Mapping File

The mapping.json file configures how the S3 Protector Function transforms input data. The mapping file must be uploaded to the input S3 bucket before any files can be processed.

Overview

At minimum, one mapping.json file must be uploaded to the root folder of the S3 bucket. If the bucket contains multiple folders, each folder can have its own mapping.json. When folders are nested, S3 Protector looks for a mapping file starting in the same folder as the data file and moves up the directory tree until it finds one or reaches the root folder.
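The lookup order described above can be sketched as follows. This is a minimal illustration rather than the actual S3 Protector implementation: the function name find_mapping_key and the sample object keys are hypothetical, and the bucket contents are modeled as a plain list of keys instead of real S3 calls.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

MAPPING_NAME = "mapping.json"


def find_mapping_key(data_key: str, bucket_keys: Iterable[str]) -> Optional[str]:
    """Return the key of the nearest mapping.json for data_key, or None."""
    existing = set(bucket_keys)
    folder = PurePosixPath(data_key).parent
    while True:
        # PurePosixPath represents the bucket root folder as ".".
        at_root = str(folder) == "."
        candidate = MAPPING_NAME if at_root else f"{folder}/{MAPPING_NAME}"
        if candidate in existing:
            return candidate
        if at_root:
            return None  # reached the root folder without finding a mapping file
        folder = folder.parent


# Hypothetical bucket layout used only for illustration.
keys = ["mapping.json", "sales/mapping.json", "sales/2024/q1/orders.csv", "hr/staff.csv"]
print(find_mapping_key("sales/2024/q1/orders.csv", keys))  # sales/mapping.json
print(find_mapping_key("hr/staff.csv", keys))              # mapping.json
```

In this sketch the file under sales/2024/q1 picks up the mapping in sales, while the file under hr falls back to the root mapping.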

Configuration Structure

The mapping.json file must be valid JSON and contain the key-value pairs described below:

{
  "ignored-columns": ["<ignored-col-name-1>", ..., "<ignored-col-name-n>"],
  "columns": {
    "<col-name-1>": {
      "operation": "[protect|unprotect]",
      "data_element": "<data-element-name>"
    },
    "<col-name-2>": {
      "operation": "[protect|unprotect]",
      "data_element": "<data-element-name>"
    },
    ...
  },
  "input": {
    "format": "<file-format>",
    "spec": { <Pandas reader arguments for input format type> }
  },
  "output": {
    "format": "<file-format>",
    "spec": { <Pandas writer arguments for output format type> }
  }
}

Data Columns Transformation

Every source file column must appear in either ‘columns’ or ‘ignored-columns’.

“columns” (required) - Maps each input data column to a Protegrity security operation, either ‘protect’ or ‘unprotect’. Each operation is applied using the provided data element.
“ignored-columns” (optional) - Lists the names of input data columns that do not require any Protegrity security operation. Data in these columns is left unprocessed and written to the target file as is.
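The coverage rule above (every source column listed in either section) can be checked before upload with a small helper. This is a sketch only: unmapped_columns is a hypothetical name, and the column names and the data element deEmail are made-up sample values.

```python
from typing import Iterable, List


def unmapped_columns(source_columns: Iterable[str], mapping: dict) -> List[str]:
    """Return source columns listed in neither 'columns' nor 'ignored-columns'."""
    covered = set(mapping.get("columns", {})) | set(mapping.get("ignored-columns", []))
    return [name for name in source_columns if name not in covered]


mapping = {
    "ignored-columns": ["order_id"],
    "columns": {
        # "deEmail" is a hypothetical data element name.
        "email": {"operation": "protect", "data_element": "deEmail"},
    },
}
print(unmapped_columns(["order_id", "email", "phone"], mapping))  # ['phone']
```

An empty result means the mapping accounts for every column in the source file.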

Input Data Configuration

The optional “input” configuration contains the following key-value pairs:

format

Specifies the format of the input data files. If the format is not provided in mapping.json, it is inferred from the file extension.
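The fallback from an explicit format to the file extension might look like the sketch below. The extension table and the format names (csv, excel, parquet, json) are assumptions made for illustration; the values S3 Protector actually accepts may differ.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed extension-to-format table, not taken from the product documentation.
EXTENSION_FORMATS = {".csv": "csv", ".xlsx": "excel", ".parquet": "parquet", ".json": "json"}


def resolve_input_format(data_key: str, mapping: dict) -> Optional[str]:
    """Prefer the format in the mapping; otherwise infer it from the extension."""
    explicit = mapping.get("input", {}).get("format")
    if explicit:
        return explicit
    return EXTENSION_FORMATS.get(PurePosixPath(data_key).suffix.lower())


print(resolve_input_format("in/data.txt", {"input": {"format": "csv"}}))  # csv
print(resolve_input_format("in/data.parquet", {}))                         # parquet
```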

spec

Provides additional configuration for input file processing. This allows processing of files that deviate from the default layout, such as pipe-delimited files, header-less files, and various JSON record structures.
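As an illustration of a pipe-delimited, header-less file, the sketch below parses a mapping fragment and passes its spec block to pandas.read_csv as keyword arguments. The sep, header, and names arguments are real read_csv parameters; how S3 Protector forwards the spec internally is assumed here, and the sample data is invented.

```python
import io
import json

import pandas as pd

# Note that JSON null becomes Python None, which read_csv treats as "no header row".
mapping_text = """
{"input": {"format": "csv",
           "spec": {"sep": "|", "header": null, "names": ["id", "email"]}}}
"""
spec = json.loads(mapping_text)["input"]["spec"]

raw = "1|alice@example.com\n2|bob@example.com\n"
frame = pd.read_csv(io.StringIO(raw), **spec)  # assumed: spec keys map 1:1 to arguments
print(list(frame.columns))  # ['id', 'email']
print(frame.shape)          # (2, 2)
```

Supplying names gives the header-less columns identifiers that the “columns” and “ignored-columns” sections can refer to.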

Important

Supplying custom arguments might result in unexpected S3 Protector behavior. Protegrity is not responsible for any damage caused by the use of custom Pandas configuration. Use this option at your own risk.

The properties within the input spec block correspond to the arguments of the Python Pandas reader functions. For more information about the supported arguments, refer to the Pandas documentation. The following list names the Pandas reader function for each format supported by S3 Protector:

CSV - read_csv
Note
The default configuration expects a header record, comma-delimited fields, and double quotes around text-qualified fields.
Excel - read_excel
Parquet - read_parquet
JSON - read_json
Note
The default configuration expects the JSON input file to be in a list-like format representing tabular data. Each element of the list is a dictionary representing a row of data. The keys of the dictionaries become the column names. See the JSON appendix example. Also see the Known Limitations section.
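The list-like JSON layout described in the note above can be illustrated with the standard json module. The records are invented sample data, and the check shown is only a sketch of what "tabular" means here, not the validation S3 Protector performs.

```python
import json

# A list whose elements are dictionaries: one dictionary per row.
records_text = """
[
  {"id": 1, "email": "alice@example.com"},
  {"id": 2, "email": "bob@example.com"}
]
"""
rows = json.loads(records_text)

is_tabular = isinstance(rows, list) and all(isinstance(row, dict) for row in rows)
column_names = list(rows[0].keys())  # dictionary keys become the column names
print(is_tabular)    # True
print(column_names)  # ['id', 'email']
```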

Output Data Configuration

The optional “output” configuration contains the following key-value pairs:

format

Specifies the format of the output data files. The format in mapping.json is used only when the S3 Protector Function deployment parameter OutputFileFormat is set to use_mapping_spec. See the CloudFormation installation section for the full list of output format options.

spec

Provides additional configuration for the output file processing.

Important

The properties within the output spec block correspond to the arguments of the Python Pandas DataFrame output functions. For more information about the supported arguments, refer to the Pandas documentation. The following list names the Pandas writer function for each format supported by S3 Protector:

CSV - DataFrame.to_csv
Note
By default, whether a header record is written depends on the IncludeHeader deployment parameter.
Excel - DataFrame.to_excel
Parquet - DataFrame.to_parquet
JSON - DataFrame.to_json
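As a sketch of how an output spec might be applied, the example below passes a spec block to DataFrame.to_csv as keyword arguments. The sep and index arguments are real to_csv parameters; whether S3 Protector already sets either of them itself is not stated in this section, so treat the combination as an assumption. The data values are invented.

```python
import json

import pandas as pd

mapping_text = '{"output": {"format": "csv", "spec": {"sep": "|", "index": false}}}'
spec = json.loads(mapping_text)["output"]["spec"]  # JSON false becomes Python False

frame = pd.DataFrame({"id": [1, 2], "email": ["tok_a", "tok_b"]})
csv_text = frame.to_csv(**spec)  # no path given, so the CSV is returned as a string
print(csv_text.splitlines())     # ['id|email', '1|tok_a', '2|tok_b']
```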

Last modified : January 06, 2026
