Mapping File Configuration

Key concepts for defining the mapping.json file

1 - Mapping File

The mapping.json file configures how the S3 Protector Function transforms the input data. The mapping file must be uploaded to the input S3 bucket before any files can be processed.

Overview

At minimum, one mapping.json file must be uploaded to the root folder of the S3 bucket. If the bucket contains multiple folders, each folder can have its own mapping.json. When nested folders exist in the bucket, S3 Protector looks for a mapping file starting in the same folder as the data file and moves up the directory tree until the root folder is reached.
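
The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the search order, not the S3 Protector implementation: the function names `candidate_mapping_keys` and `find_mapping_key` are hypothetical, and the set of existing keys stands in for a real S3 listing.

```python
import posixpath

MAPPING_NAME = "mapping.json"


def candidate_mapping_keys(data_key):
    """Return the mapping.json keys to try, nearest folder first, bucket root last."""
    # S3 keys normally have no leading slash; strip one defensively.
    folder = posixpath.dirname(data_key.lstrip("/"))
    candidates = []
    while True:
        candidates.append(posixpath.join(folder, MAPPING_NAME))
        if folder == "":
            break  # the root folder has been checked
        folder = posixpath.dirname(folder)
    return candidates


def find_mapping_key(data_key, existing_keys):
    """Return the first candidate present in existing_keys, or None."""
    for candidate in candidate_mapping_keys(data_key):
        if candidate in existing_keys:
            return candidate
    return None


# Hypothetical bucket layout: a root mapping file plus one in the "sales" folder.
bucket_keys = {"mapping.json", "sales/mapping.json"}
print(find_mapping_key("sales/2024/data.csv", bucket_keys))
```

In this hypothetical layout the data file in sales/2024 has no mapping file of its own, so the file in the parent folder sales is selected before the root one.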

Configuration Structure

The mapping.json file must be valid JSON and contain the key-value configuration pairs described below:

{
  "ignored-columns": ["<ignored-col-name-1>", ..., "<ignored-col-name-n>"],
  "columns": {
    "<col-name-1>": {
      "operation": "[protect|unprotect]",
      "data_element": "<data-element-name>"
    },
    "<col-name-2>": {
      "operation": "[protect|unprotect]",
      "data_element": "<data-element-name>"
    },
    ...
  },
  "input": {
    "format": "<file-format>",
    "spec": { <Pandas reader arguments for input format type> }
  },
  "output": {
    "format": "<file-format>",
    "spec": { <Pandas writer arguments for output format type> }
  }
}
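
As a concrete illustration of the template above, the following Python sketch builds one possible mapping and serializes it to JSON. All column names (`order_id`, `email`, `ssn`) and data element names (`deEmail`, `deSSN`) are hypothetical, and the format value `csv` and the spec arguments are assumptions chosen for the example; use the names and formats defined in your own deployment.

```python
import json

# Hypothetical mapping for a pipe-delimited CSV file with three columns.
mapping = {
    "ignored-columns": ["order_id"],
    "columns": {
        "email": {"operation": "protect", "data_element": "deEmail"},
        "ssn": {"operation": "protect", "data_element": "deSSN"},
    },
    # "spec" entries are passed as Pandas reader/writer arguments.
    "input": {"format": "csv", "spec": {"sep": "|"}},
    "output": {"format": "csv", "spec": {"sep": "|", "index": False}},
}

# Serializing with json.dumps guarantees the file content is valid JSON.
mapping_json = json.dumps(mapping, indent=2)
parsed = json.loads(mapping_json)
print(mapping_json)
```

Generating the file this way avoids hand-editing mistakes such as a missing comma between top-level keys.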

Data Columns Transformation

Every source file column must appear in either ‘columns’ or ‘ignored-columns’.

  1. “columns” (required) - Maps input data columns to a Protegrity security operation, either ‘protect’ or ‘unprotect’. Each operation is applied using the provided data element.

  2. “ignored-columns” (optional) - Lists the names of input data columns that do not require any Protegrity security operation. Data in these columns is left unprocessed and written to the target file as is.

Input Data Configuration

The optional “input” configuration contains the following key-value pairs:

format

Specifies the format of the input data files. If the format is not provided in mapping.json, it is inferred from the file extension.
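
The fallback rule above can be sketched as follows. This is an assumption-laden illustration rather than the actual S3 Protector logic: the function name `resolve_input_format`, the format identifiers, and the extension table are all hypothetical.

```python
import posixpath

# Hypothetical extension-to-format table; the real set of recognized
# extensions and format names may differ.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".parquet": "parquet",
    ".json": "json",
}


def resolve_input_format(data_key, mapping):
    """Prefer the explicit input format; otherwise infer it from the extension."""
    explicit = mapping.get("input", {}).get("format")
    if explicit:
        return explicit
    extension = posixpath.splitext(data_key)[1].lower()
    if extension not in EXTENSION_TO_FORMAT:
        raise ValueError(f"Cannot infer input format from extension {extension!r}")
    return EXTENSION_TO_FORMAT[extension]


print(resolve_input_format("exports/customers.parquet", {}))
print(resolve_input_format("exports/customers.txt", {"input": {"format": "csv"}}))
```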

spec

Provides additional configuration for input file processing. This allows processing of non-default file formats. For example, pipe delimited files, header-less files, and various JSON record structures.

The properties within the input spec block correspond to the arguments of the Python Pandas reader functions. For more information about supported format arguments, refer to the Pandas documentation. The Pandas reader function for each format supported by S3 Protector is listed below:

  • CSV - read_csv

  • Excel - read_excel

  • Parquet - read_parquet

  • JSON - read_json
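
To show how an input spec maps onto a Pandas reader call, here is a minimal sketch for a pipe-delimited, header-less file. The dispatch table and the `read_input` helper are hypothetical and only illustrate the idea of forwarding the spec as keyword arguments; `sep`, `header`, and `names` are standard `read_csv` arguments.

```python
import io

import pandas as pd

# Hypothetical dispatch from format name to Pandas reader function.
READERS = {
    "csv": pd.read_csv,
    "excel": pd.read_excel,
    "parquet": pd.read_parquet,
    "json": pd.read_json,
}


def read_input(source, input_config):
    """Call the reader for the configured format, forwarding the spec as keyword arguments."""
    reader = READERS[input_config["format"]]
    return reader(source, **input_config.get("spec", {}))


# The "input" block as it might appear in mapping.json (JSON null becomes None).
input_config = {
    "format": "csv",
    "spec": {"sep": "|", "header": None, "names": ["id", "email"]},
}

raw = "1|alice@example.com\n2|bob@example.com\n"
df = read_input(io.StringIO(raw), input_config)
print(df)
```

Because the file has no header row, the `names` argument supplies the column names that the ‘columns’ and ‘ignored-columns’ sections can then refer to.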

Output Data Configuration

The optional “output” configuration contains the following key-value pairs:

format

Specifies the format of the output data files. The format in mapping.json is used only when the S3 Protector Function deployment parameter OutputFileFormat is set to use_mapping_spec. See the CloudFormation installation section for the full list of output format options.

spec

Provides additional configuration for the output file processing.

The properties within the output spec block correspond to the arguments of the Python Pandas DataFrame output functions. For more information about supported format arguments, refer to the Pandas documentation. The Pandas writer function for each format supported by S3 Protector is listed below:

  • CSV - to_csv

  • Excel - to_excel

  • Parquet - to_parquet

  • JSON - to_json
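
The output spec can be pictured the same way: its entries become keyword arguments of a DataFrame writer method. The sketch below is an illustration only; the `WRITERS` table and the `write_output` helper are hypothetical, while `sep` and `index` are standard `DataFrame.to_csv` arguments.

```python
import pandas as pd

# Hypothetical dispatch from format name to DataFrame writer method name.
WRITERS = {
    "csv": "to_csv",
    "excel": "to_excel",
    "parquet": "to_parquet",
    "json": "to_json",
}


def write_output(df, target, output_config):
    """Call the writer for the configured format, forwarding the spec as keyword arguments."""
    writer = getattr(df, WRITERS[output_config["format"]])
    return writer(target, **output_config.get("spec", {}))


# Placeholder values standing in for protected data.
df = pd.DataFrame({"id": [1, 2], "email": ["tok_a", "tok_b"]})

# The "output" block as it might appear in mapping.json.
output_config = {"format": "csv", "spec": {"sep": "|", "index": False}}

# Passing None as the target makes to_csv return the text instead of writing a file.
text = write_output(df, None, output_config)
print(text)
```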

2 - Column Mapping Rules

To ensure the highest level of security, S3 Protector requires users to define processing rules for all data columns. Every column that appears in a source file must be listed in the ‘mapping.json’ file. A column may appear in either the ‘columns’ section or the ‘ignored-columns’ section, but not both.

Common Error Conditions

The table below summarizes common error conditions that may occur when creating a ‘mapping.json’ file:

Mapping | Error Message
A column name appears in ‘mapping.json’ but does not exist in the source file. | Columns [‘column name’] in the mapping file have no matches in the input data columns
Source file column name appears neither in ‘columns’ nor ‘ignored-columns’ sections. | Input file contains data columns which are not defined in the mapping file.
Source file column name appears in both ‘columns’ and ‘ignored-columns’ sections. | Ignored column [‘column-name’] is present in ‘columns’ list. Column must be defined in either ‘columns’ or ‘ignored-columns’, but not both.
Source file column name appears more than once in either ‘columns’ or ‘ignored-columns’ section. | Duplicate column [“column-name”] found in ‘ignored-columns’.
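
These conditions can be checked locally before a mapping file is uploaded. The following is a minimal pre-flight sketch, not the validation code used by S3 Protector: the `MappingError` class and `validate_mapping` function are hypothetical, the order of the checks is an assumption, and the messages only approximate the ones listed above.

```python
from collections import Counter


class MappingError(ValueError):
    """Hypothetical error type for an inconsistent mapping."""


def validate_mapping(source_columns, mapping):
    """Raise MappingError if the mapping does not cover source_columns exactly once."""
    mapped = list(mapping.get("columns", {}))
    ignored = list(mapping.get("ignored-columns", []))

    # Duplicates can only be seen in the list: json.loads silently keeps the
    # last value when a key is repeated inside the "columns" object.
    duplicates = [name for name, count in Counter(ignored).items() if count > 1]
    if duplicates:
        raise MappingError(f"Duplicate column {duplicates} found in 'ignored-columns'.")

    in_both = [name for name in ignored if name in mapped]
    if in_both:
        raise MappingError(f"Ignored column {in_both} is present in 'columns' list.")

    unknown = [name for name in mapped + ignored if name not in source_columns]
    if unknown:
        raise MappingError(
            f"Columns {unknown} in the mapping file have no matches in the input data columns"
        )

    undefined = [name for name in source_columns if name not in mapped and name not in ignored]
    if undefined:
        raise MappingError(
            f"Input file contains data columns which are not defined in the mapping file: {undefined}"
        )


# Hypothetical source columns and mapping; this call passes without raising.
validate_mapping(
    ["order_id", "email"],
    {
        "ignored-columns": ["order_id"],
        "columns": {"email": {"operation": "protect", "data_element": "deEmail"}},
    },
)
```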