The Enhanced Adobe PDF codec extracts the PDF payload from the request and lets you define a regular expression (regex) to control exactly which text is extracted. This payload is available when the Action type is set to Extract.
When you construct the ruleset for this codec, you must include a child Text extract rule under the Enhanced Adobe PDF codec extract rule. Do not place any other type of rule under the Enhanced Adobe PDF codec extract rule.
In the DSG, some font files are already added to the /opt/protegrity/alliance/config/pdf_fonts directory. By default, the following font file is set in the gateway.json file.
"pdf_codec_default_font": {
    "name": "OpenSans-Regular.ttf"
}
Note: The Advanced Settings can be used to configure the default font file for a specific rule.
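The relationship between the gateway-wide default and a rule-level override can be pictured with a minimal Python sketch. It is only an illustration, not the DSG implementation: the function name effective_settings, the font name CustomFont.ttf, and the assumption that the Advanced Settings accept the same pdf_codec_default_font key as the gateway.json file are all hypothetical.

```python
import json
from copy import deepcopy

# Gateway-wide default, as shown in the gateway.json fragment above.
GATEWAY_DEFAULTS = {
    "pdf_codec_default_font": {"name": "OpenSans-Regular.ttf"},
}


def effective_settings(gateway_defaults, rule_advanced_settings):
    """Return the gateway defaults with one rule's Advanced Settings applied on top.

    Hypothetical helper: the real precedence logic in the DSG is not documented here.
    """
    merged = deepcopy(gateway_defaults)
    for key, value in rule_advanced_settings.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested objects so unrelated nested keys keep their defaults.
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = deepcopy(value)
    return merged


# Assumption: the rule-level setting uses the same key shape as gateway.json.
# "CustomFont.ttf" is a made-up file name.
rule_a = effective_settings(
    GATEWAY_DEFAULTS, {"pdf_codec_default_font": {"name": "CustomFont.ttf"}}
)
rule_b = effective_settings(GATEWAY_DEFAULTS, {})

print(json.dumps(rule_a))
print(json.dumps(rule_b))
```

In this sketch the override affects only rule_a; rule_b and the gateway defaults keep OpenSans-Regular.ttf, which mirrors the statement that Advanced Settings apply to a specific rule.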
If you want to process a PDF file that contains custom fonts, upload the font files to the /opt/protegrity/alliance/config/pdf_fonts directory. If the custom fonts are not uploaded to this directory, the OpenSans-Regular.ttf font file is used to process the PDF file.
For how-to examples of detokenizing a PDF, refer to the sections Using Amazon S3 to Detokenize a PDF and Using HTTP Tunnel to Detokenize a PDF in the Protegrity Data Security Gateway How-to Guide.
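The fallback described above can be sketched in Python as follows. This is a hedged illustration of the documented behavior, not the codec's actual code: the function resolve_font and the file names Uploaded.ttf and NotUploaded.ttf are hypothetical, and a temporary directory stands in for the pdf_fonts directory so that the sketch runs anywhere.

```python
from pathlib import Path
import tempfile

DEFAULT_FONT = "OpenSans-Regular.ttf"  # default font named in gateway.json


def resolve_font(font_name, font_dir, default_font=DEFAULT_FONT):
    """Return the path of font_name if it exists in font_dir, else the default font's path."""
    candidate = Path(font_dir) / font_name
    if candidate.is_file():
        return candidate
    return Path(font_dir) / default_font


# A temporary directory stands in for the pdf_fonts directory.
with tempfile.TemporaryDirectory() as font_dir:
    (Path(font_dir) / DEFAULT_FONT).touch()
    (Path(font_dir) / "Uploaded.ttf").touch()  # hypothetical custom font
    found = resolve_font("Uploaded.ttf", font_dir).name
    missing = resolve_font("NotUploaded.ttf", font_dir).name

print(found)    # the uploaded custom font is used
print(missing)  # falls back to the default font
```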
The following figure displays the Enhanced Adobe PDF payload fields.
The properties for the Enhanced Adobe PDF payload are explained in the following table.
Note: The configurations in the Advanced Settings are only applicable for that specific rule.
Properties | Description |
---|---|
Pattern | The regular expression pattern to be matched. If no pattern is specified, the whole input is considered for matching. |
Advanced Settings | Set the following additional configurations for the Enhanced Adobe PDF codec: (1) Set the margins that determine whether text in the PDF file is treated as a line or a paragraph. Note: The {"layout_analysis_config": {"char_margin": 0.1, "line_margin": 0.1}} settings can also be configured in the gateway.json file. (2) Set the default font file used to process the PDF file. |
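The behavior of the Pattern property described in the table can be illustrated with a short Python sketch. It assumes Python re syntax purely for illustration; the function select_text, the sample text, and the card-number pattern are hypothetical and do not come from the product.

```python
import re


def select_text(text, pattern=None):
    """Return the parts of text that are considered for matching.

    With no pattern, the whole input is returned as a single item.
    """
    if not pattern:
        return [text]
    return [m.group(0) for m in re.finditer(pattern, text)]


# Hypothetical text extracted from a PDF.
sample = "Card: 4111-1111-1111-1111, Ref: 12345"

with_pattern = select_text(sample, r"\d{4}-\d{4}-\d{4}-\d{4}")
without_pattern = select_text(sample)

print(with_pattern)     # only the card number matches
print(without_pattern)  # the whole input is considered
```

A narrow pattern limits processing to the matched values, while leaving the pattern empty passes the entire extracted text to the child rule.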
The following list describes the known limitations for this release.