Mapping File
Overview
At minimum, one mapping.json file must be uploaded to the root folder of the S3 bucket. If the bucket contains multiple folders, each folder can have its own mapping.json. When nested folders exist in the bucket, S3 Protector looks for a mapping file starting in the folder that contains the data file and moves up the directory tree until it reaches the root folder.
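The lookup order described above can be sketched as follows. This is only an illustration of the search path, not S3 Protector's actual implementation; the function name `candidate_mapping_keys` and the sample object key are hypothetical.

```python
import posixpath
from typing import List

MAPPING_NAME = "mapping.json"


def candidate_mapping_keys(data_key: str) -> List[str]:
    """Return the mapping.json keys to try, nearest folder first, bucket root last."""
    # S3 keys normally have no leading slash; strip one defensively.
    folder = posixpath.dirname(data_key.lstrip("/"))
    candidates = []
    while folder:
        candidates.append(posixpath.join(folder, MAPPING_NAME))
        folder = posixpath.dirname(folder)  # move one level up
    candidates.append(MAPPING_NAME)  # the root-level mapping file is the last resort
    return candidates


print(candidate_mapping_keys("sales/2024/q1/orders.csv"))
# ['sales/2024/q1/mapping.json', 'sales/2024/mapping.json', 'sales/mapping.json', 'mapping.json']
```

The first key in the list that exists in the bucket would be the mapping file applied to the data file.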
Configuration Structure
The mapping.json file must be valid JSON and contain the configuration key-value pairs described below:
{
"ignored-columns": ["<ignored-col-name-1>", ..., "<ignored-col-name-n>"],
"columns": {
"<col-name-1>": {
"operation": "[protect|unprotect]",
"data_element": "<data-element-name>"
},
"<col-name-2>": {
"operation": "[protect|unprotect]",
"data_element": "<data-element-name>"
},
...
},
"input": {
"format": "<file-format>",
"spec": { <Pandas reader arguments for input format type> }
},
"output": {
"format": "<file-format>",
"spec": { <Pandas writer arguments for output format type> }
}
}
Data Columns Transformation
Every source file column must appear in either “columns” or “ignored-columns”.
“columns” (required) - Maps input data columns to Protegrity security operations such as “protect” or “unprotect”. Each operation is applied using the data element provided for that column.
“ignored-columns” (optional) - Lists the names of input data columns that do not require any Protegrity security operation. Data in these columns is left unprocessed and is written to the target file as is.
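The coverage rule above can be checked before a file is uploaded. The sketch below is a hypothetical pre-flight check, not part of S3 Protector; the column names and the data element name `deEmail` are invented for illustration.

```python
def uncovered_columns(source_columns, mapping):
    """Return source columns listed in neither "columns" nor "ignored-columns"."""
    mapped = set(mapping.get("columns", {}))
    ignored = set(mapping.get("ignored-columns", []))
    return [col for col in source_columns if col not in mapped and col not in ignored]


# Hypothetical mapping: one protected column, one ignored column.
mapping = {
    "ignored-columns": ["order_id"],
    "columns": {
        "email": {"operation": "protect", "data_element": "deEmail"},
    },
}

print(uncovered_columns(["order_id", "email", "signup_date"], mapping))
# ['signup_date']
```

An empty result means every source column is accounted for; any name returned would need to be added to one of the two keys.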
Input Data Configuration
The optional “input” configuration contains the following key-value pairs:
format
Specifies the format of the input data files. If the format is not provided in mapping.json, it is inferred from the file extension.
spec
Provides additional configuration for input file processing. This allows processing of non-default file layouts, for example pipe-delimited files, header-less files, and various JSON record structures.
Important
Supplying custom arguments might result in unexpected S3 Protector behavior. Protegrity is not responsible for any damages caused by the use of custom Pandas configuration. Use this option at your own risk.
The properties within the input spec block correspond to the arguments of the Python Pandas reader functions. For more information about supported format arguments, refer to the Pandas documentation. The following list links to the official Pandas online documentation for each format supported by S3 Protector:
CSV - read_csv
Note
The default configuration expects a header record, comma-delimited fields, and double quotes for text-qualified fields.
Excel - read_excel
Parquet - read_parquet
JSON - read_json
Note
The default configuration expects the JSON input file to be in a list-like format representing tabular data. Each element of the list is a dictionary representing a row of data. The keys of the dictionaries become the column names. See the JSON appendix example. Also see the Known Limitations section.
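As an illustration of the “input” block, the sketch below builds a configuration for a pipe-delimited, header-less file and shows how an explicit format could take precedence over the file extension. The format strings, the extension table, and the function name `resolve_input_format` are assumptions made for this example; `sep` and `header` are standard `read_csv` arguments, and `null` in JSON corresponds to `header=None`.

```python
import json
import posixpath

# Assumed extension table; the extensions S3 Protector recognizes are not listed here.
EXTENSION_TO_FORMAT = {".csv": "csv", ".xlsx": "excel", ".parquet": "parquet", ".json": "json"}


def resolve_input_format(mapping, data_key):
    """Prefer the explicit input format; otherwise fall back to the file extension."""
    explicit = mapping.get("input", {}).get("format")
    if explicit:
        return explicit
    extension = posixpath.splitext(data_key)[1].lower()
    return EXTENSION_TO_FORMAT.get(extension)  # None when the extension is unknown


# Hypothetical "input" block for a pipe-delimited file without a header record.
mapping = {"input": {"format": "csv", "spec": {"sep": "|", "header": None}}}

print(json.dumps(mapping["input"]))
# {"format": "csv", "spec": {"sep": "|", "header": null}}
print(resolve_input_format(mapping, "exports/orders.dat"))
# csv
```

With header-less input, Pandas assigns integer column positions instead of names, so how those columns are referenced under “columns” should be confirmed against the product documentation.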
Output Data Configuration
The optional “output” configuration contains the following key-value pairs:
format
Specifies the format of the output data files. The format in mapping.json is used only when the S3 Protector Function deployment parameter OutputFileFormat is set to use_mapping_spec. See the CloudFormation installation section for the full list of output format configuration options.
spec
Provides additional configuration for the output file processing.
Important
Supplying custom arguments might result in unexpected S3 Protector behavior. Protegrity is not responsible for any damages caused by the use of custom Pandas configuration. Use this option at your own risk.
The properties within the output spec block correspond to the arguments of the Python Pandas DataFrame output functions. For more information about supported format arguments, refer to the Pandas documentation. The following list links to the official Pandas online documentation for each format supported by S3 Protector:
CSV - DataFrame.to_csv
Note
The default configuration writes the header record based on the IncludeHeader deployment parameter.
Excel - DataFrame.to_excel
Parquet - DataFrame.to_parquet
JSON - DataFrame.to_json
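The interaction between the OutputFileFormat deployment parameter and the “output” block can be sketched as below. This only illustrates the rule stated above; the function name, the format string `csv`, and the placeholder parameter value `some_other_value` are hypothetical, and behavior for other parameter values is defined in the CloudFormation installation section. `sep` is a standard `DataFrame.to_csv` argument.

```python
def mapping_output_format(output_file_format_param, mapping):
    """Return the mapping's output format only when the deployment parameter asks for it."""
    if output_file_format_param == "use_mapping_spec":
        return mapping.get("output", {}).get("format")
    # For any other value the format in mapping.json is not consulted.
    return None


# Hypothetical "output" block requesting pipe-delimited CSV output.
mapping = {"output": {"format": "csv", "spec": {"sep": "|"}}}

print(mapping_output_format("use_mapping_spec", mapping))
# csv
print(mapping_output_format("some_other_value", mapping))
# None
```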