Technical Architecture

Overview of the DSG technical architecture

1: Configuration over Programming (CoP) Architecture
2: Dynamic Configuration over Programming (CoP)

System architecture

Protegrity Gateway Technology products are assembled on a layered architecture. The lower layers provide the foundational aspects of the system, such as clustering and protocol stacks. The higher layers are specialized and provide various business functions. They are building blocks that instruct the gateway on how to act on data. These building blocks include decoders for various data formats as well as data transformations for cryptography.

The gateway architecture provides standard out-of-the-box building blocks. These building blocks can be extended by the customer at each layer as per their requirements. These requirements can be security-related or requirements that will aid the customer in processing data.

The following figure shows a view of the gateway system architecture.

Gateway System Architecture

Platform

The Platform Layer runs on top of customer-provided hardware or virtualization resources. It includes an operating system that has been security-hardened by Protegrity, along with an infrastructure layer above it known as the Protegrity Appliance Framework.

The Protegrity Appliance Framework is responsible for common services, such as inter-node communications mechanisms and clustering. Data communicated through the platform layer is passed onto the Data Collection Layer for further processing.

Data collection

The Data Collection Layer is the glue between the higher layers of the gateway and the external world. It is responsible for ingesting data into the gateway and passing it on to the higher layers for further processing. Likewise, it is responsible for receiving data from the higher layers and outputting it to the external world. In TCP/IP architecture terms, this is the transport/application protocol layer of the gateway architecture.

Since the primary method by which the gateway interfaces with the external world is through networking, data is typically transmitted to and from the gateway using application-layer protocols such as HTTP, SFTP, and SMTP. The gateway terminates these protocol stacks. These protocols can be extended to include any custom protocol developed by a company to meet its specific requirements, using the gateway’s built-in User Defined Function (UDF) service.

Data delivered through these protocols is passed to the Data Extraction Layer for further processing.

Data extraction layer

The Data Extraction Layer is at the heart of fine-grained data inspection capabilities of the gateway. The Data Extraction layer is split into two logical functions:

Codecs: These are the parsers, or data encoders/decoders, that target individual native formats such as XML, JSON, PDF, ZIP, and Office Open XML file formats such as DOCX, PPTX, and XLSX.
Extractors: These are responsible for fine-grained extraction of selected data from within the larger data sets produced by the codec components. These include mechanisms such as Regular Expressions, XPath, and JSONPath.

The subsets of data extracted by the Data Extraction Layer are passed up to the Action Layer. Here, they may be transformed for data security or acted upon for some other business logic. Transformed data subsets received from the Action Layer are substituted in their original place in the original payload. The modified payload is encoded and delivered down to the Data Collection layer for outputting to the external world.
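The decode, extract, substitute, and re-encode cycle described above can be sketched in a few lines of Python. This is a minimal illustration only, not the gateway's implementation: the function names, the key-path extractor, and the use of `str.upper` as a stand-in action are all assumptions made for this example.

```python
import json


def extract(document, path):
    """Extractor: follow `path` (a list of keys or indexes) to one value."""
    node = document
    for key in path:
        node = node[key]
    return node


def substitute(document, path, new_value):
    """Put a transformed value back in its original place, in place."""
    node = document
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = new_value


def process_payload(raw, path, action):
    document = json.loads(raw)                   # codec: decode the payload
    value = extract(document, path)              # extractor: select a subset
    substitute(document, path, action(value))    # substitute the action result
    return json.dumps(document)                  # codec: encode the payload


raw = '{"person": {"name": "Joe Smith", "age": 41}}'
# str.upper is only a placeholder for an Action Layer transformation.
result = process_payload(raw, ["person", "name"], str.upper)
print(result)
```

Only the selected value changes; the rest of the payload passes through the codec untouched.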

The building blocks in this layer can be extended to include custom requirements through UDFs. UDFs enable customers to build and extend the gateway with their own data decoding and extraction logic using the Python programming language.

Data extracted from payloads is passed to the Action Layer for further processing.

Action layer

The Action Layer is responsible for operating on the data passed on to it by the Data Extraction Layer. The data extracted is processed by actions in the Action Layer.

Operating on this data may include transforming the data for security purposes. This includes all the data security capabilities provided by the core Protegrity platform, such as encryption, tokenization, unprotection, re-protection, hashing, and masking.

This layer also includes a UDF component that enables customers to extend the system with their own action transformation logic using the Python programming language.
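As an illustration of the kind of logic a custom action might carry, the following is a plain Python masking function. The interface that the gateway expects from a UDF is not described in this section, so the function name, its parameters, and the way it would be registered are assumptions; treat this as a sketch of the transformation logic only.

```python
def mask_all_but_last(value, visible=4, mask_char="*"):
    """Mask every character except the last `visible` ones.

    Hypothetical helper: not part of any Protegrity API.
    """
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


masked = mask_all_but_last("4111111111111111")
print(masked)
```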

1 - Configuration over Programming (CoP) Architecture

Overview of the Configuration over Programming (CoP) concepts

CoP overview

CoP is a key paradigm used in the Protegrity Gateway Technology. The CoP technology enables a CoP administrator to create a set of rules that instructs the gateway on how to process data that traverses it.

The CoP technology is also a key component from a user experience perspective. The hierarchical structure of the rules is just as important as the rules themselves. The set of rules, their structure, and an easy-to-use interface result in a powerful toolset called the CoP.

The DSG is fundamentally architected on the CoP principle. CoP suggests that configuration should be the preferred way of extending or customizing a system as opposed to programming. Users configure rules in a Web UI to define step-by-step processing of incoming messages. This allows DSG users to handle any type of input message such as CSV, fixed-width, or plain text as long as corresponding rules exist within the DSG. The rules are generally categorized as extraction, such as message parsing, and transformation, such as data protection.

The DSG product evolution started with Static CoP, where the request processing rules are configured ahead of time. However, the DSG now incorporates Dynamic CoP, allowing JSON-structured rule definitions to be dynamically injected into request messages, such as an HTTP header field, and executed on the fly.

DSG users configure the CoP Rulesets to construct a REST API that is suitable to their environment. The DSG’s RESTful interface operates at a sufficiently high level that API users are not exposed to low-level cryptographic API message sequences, such as open and close session. Low-level parameters such as data element names, session handles, and similar details are not exposed either. User identity can either be pre-configured in the DSG, derived as a result of HTTP Basic Authentication, or dynamically provided through the API as an HTTP header, whose name is user configurable, or as part of the HTTP message body.
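The sources of user identity listed above can be sketched as a small resolver. The precedence shown here (identity header first, then HTTP Basic Authentication, then a pre-configured user) is an assumption made for this example, as is the header name `X-User-Id`; the text only states that the header name is user configurable.

```python
import base64


def user_from_basic_auth(authorization):
    """Return the username from an HTTP Basic Authorization value, or None."""
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return None
    username, separator, _password = decoded.partition(":")
    return username if separator else None


def resolve_user(headers, identity_header="X-User-Id", preconfigured=None):
    """Pick the user identity; the precedence order is an assumption."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if identity_header.lower() in lowered:
        return lowered[identity_header.lower()]
    if "authorization" in lowered:
        user = user_from_basic_auth(lowered["authorization"])
        if user is not None:
            return user
    return preconfigured


# "YWxpY2U6c2VjcmV0" is the base64 encoding of "alice:secret".
print(resolve_user({"Authorization": "Basic YWxpY2U6c2VjcmV0"}))
```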

The following figure shows high-level functionality of the DSG RESTful interface.

For simplicity, the DSG example above uses a plain text string that is tokenized word by word, with protected tokens returned in the 200 OK response. The DSG includes a wide range of codecs, which are message parsers that enable it to interpret and process complex payload bodies. DSG’s codecs include XML, JSON, Text, Binary, CSV, Fixed Width, MS Office, PDF, Google Protocol Buffers, HPE ArcSight CEF, Date-Time, and PGP. The DSG also allows custom extraction and transformation rules to be written in Python and integrated into the CoP Rulesets.

The following sections describe the DSG Rulesets, their structure, and the Ruleset engine, followed by an example.

CoP Ruleset

The DSG contains built-in standard protocol codecs that enable configuration-driven payload parsing and processing for most data security use cases encountered in typical networking protocols.

The Ruleset describes a set of instructions that the gateway uses to transform data as it traverses the gateway in any direction. The various kinds of Rule objects currently available in the gateway are illustrated in the following figure.

Rule Objects in a Ruleset

A typical Ruleset is constructed from the Extract and Transform rules.

The core rules available today are:

Extract: Extraction rules are responsible for extracting subsets of data from larger bodies of data. By way of engaging existing codecs, they are also capable of interpreting data per predefined encoding schemes. While the Extraction rules function as data filters, they do not actually manipulate data. Therefore, they are branch nodes in the Ruleset tree and have child rules below them.
Transform: Transformation rules are responsible for manipulating data passed into them. Typical data security use cases will employ pre-packaged Transformation rules for performing data protection, un-protection, re-protection, masking, or hashing.

Customers can extend the out-of-the-box transformations with custom Python-coded Transformation User-Defined Functions (UDFs) when the built-in security actions are insufficient.

Log: The Log rule object allows log entries to be added to the DSG log. The user can define the level of logging to be reflected in the log, such as Warning, Error, and so on.
Exit: The Exit option acts as a terminating action and the rules are not processed further.
Set User Identity: The Set User Identity rule object comes into effect if username details are part of the payload. The Protegrity Data Protection transformation leverages the value set in this rule so that subsequent transformation action calls are performed as the set user.
Profile Reference: An external profile can be referenced using the Profile Reference action. This rule transfers the control to a separate batch of rules grouped in a profile.
Error: Use this action to add a custom response message for any invalid content.
Dynamic Injection: Use Dynamic CoP to send rules for extraction and transformation as part of a request header along with the data for protection in the request message body.
Set Context Variable: Use this action type when you want to pass a value as input to the rule. The value set within this rule will be maintained throughout the rule’s lifecycle.

Ruleset Structure

Rulesets are organized in a hierarchical structure where Extract rules are branch nodes and other rules such as Transform rules are leaf nodes. In other words, extract specific data from the payload and then perform a Transform action on the data extracted.
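The branch and leaf relationship can be modeled with a small tree type. This is a sketch under stated assumptions: the `Rule` class, its `kind` strings, and the `validate` and `render` helpers are invented for illustration and do not reflect how the gateway stores Rulesets.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    name: str
    kind: str  # "extract" or "transform" in this sketch
    children: List["Rule"] = field(default_factory=list)


def validate(rule):
    """Extract rules must be branches; every other rule must be a leaf."""
    if rule.kind == "extract":
        return bool(rule.children) and all(validate(c) for c in rule.children)
    return not rule.children


def render(rule, depth=0):
    """Return the tree as indented lines, one per rule."""
    lines = ["  " * depth + f"{rule.kind}: {rule.name}"]
    for child in rule.children:
        lines.extend(render(child, depth + 1))
    return lines


ruleset = Rule("HTTP message body", "extract", [
    Rule("XPath for person name", "extract", [
        Rule("protect name", "transform"),
    ]),
])
print("\n".join(render(ruleset)))
```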

Example Ruleset Hierarchy

Rules are compartmentalized into Profile containers. Profile containers can be enabled or disabled and they can also be referenced by a Profile Reference rule.

Ruleset Tree of Trees (ToT)

Typical rulesets are recursively processed in sequence. With this mechanism, sibling rules under a given parent, along with all child rules belonging to each sibling, are also recursively executed in order. This occurs from top to bottom with no provision for conditional branching.

However, this disallows decision-based, mutually exclusive execution of individual child rules on various parts of extracted data within the same extraction context. Examples of such parts include rows in a CSV file, groups within a regular expression, or multiple XPaths within an XML document. When different parts of the extracted data within the same extraction context need to be processed differently, the same data must be extracted or parsed multiple times.

The RuleSet Tree of Trees (ToT) feature is an enhancement to the RuleSet algorithm that addresses this drawback. With the RuleSet ToT feature, an extraction parent rule can have multiple child rules that are executed mutually exclusively, based on a condition applied in the parent rule. The feature allows different parts of extracted data to be processed downstream using different profile references. Since the profile references are sub-trees in and of themselves, this feature adds a Tree-of-Trees structural notation to the CoP RuleSets.

The following compares the layout and execution paths of traditional rulesets with the ToT rulesets:

CoP vs CoP Tree of Trees

In the above example, a CSV payload needs to be processed as per the following requirements:

Column 1 needs to be protected using an Alphanumeric data element.
Column 6 needs to be protected using a Date data element.
Column 9 needs to be protected using a Unicode data element.

The traditional RuleSet strategy involved extracting or parsing the same CSV payload three times, once for each column requiring protection using different data elements, as illustrated on the left side. In contrast, a ToT-enabled RuleSet requires extracting the CSV payload only once where values extracted from different columns can be sent down different child rules that provide different protection data elements. Consequently, the overall CSV payload processing time reduces substantially.
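The single-pass idea can be sketched with Python's `csv` module: the payload is parsed once, and each column of interest is routed to its own handler. The three `protect_*` functions are placeholders that merely tag the value; they stand in for child rules that reference different data elements and are not real protection operations.

```python
import csv
import io


def protect_alnum(value):
    return f"ALNUM({value})"      # placeholder for an Alphanumeric data element


def protect_date(value):
    return f"DATE({value})"       # placeholder for a Date data element


def protect_unicode(value):
    return f"UNI({value})"        # placeholder for a Unicode data element


# Columns 1, 6, and 9 from the text, expressed as 0-based indexes.
COLUMN_ROUTES = {0: protect_alnum, 5: protect_date, 8: protect_unicode}


def process_csv(text, routes):
    """Parse the CSV once and send each routed column to its own handler."""
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(
            [routes[i](cell) if i in routes else cell for i, cell in enumerate(row)]
        )
    return output.getvalue()


sample = "a1,b,c,d,e,2024-01-31,g,h,name\n"
processed = process_csv(sample, COLUMN_ROUTES)
print(processed, end="")
```

A traditional layout would call the CSV parser once per routed column; here the routing table plays the role of the condition applied in the parent rule.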

In this release, the Ruleset ToT feature supports the payloads:

Ruleset execution engine

Rulesets are executed by the Ruleset engine that is built into the gateway. The Ruleset engine is responsible for cascaded execution of the Ruleset. The behaviors of Rule objects range from data processing, such as Extract and Transform, to controlling the execution flow of the rule tree, such as Exit, to supplementary activities, such as Log.

The Ruleset engine will recursively traverse the Ruleset node by node. For example, Extract nodes will extract data that will be transformed with a Transform rule node. Following this, the recursion stack is rolled up and the reverse process happens. Here, data is encoded and packaged back to its original format and sent to the intended recipient.

Ruleset and ruleset execution example

The following example illustrates the Ruleset structure and the Ruleset execution. The example starts with an HTTP POST carrying an XML payload of a person’s information. The Ruleset is a hierarchy of three Extract nodes with the Transform rule as the end leaf node.

Extract Rule: The Extract Rule extracts the XML document from the message body.
Extract Rule: A second Extract Rule will take the XML document and parse the data that is to be transformed – the person’s name. This is done by using XPath.
Extract Rule: A third Extract Rule will split out the name into individual words – in this example, the first and the last name. This is accomplished using REGEX.
Transform Rule: The Transform Rule will take each word and apply an action. In this example, both the first name and the last name are protected.

The next set of rules will perform operations in the reverse and prepare the contents to go back to the sender. The same Extraction rules would perform reverse processing as the recursion unwinds.

Extract Rule: On the return trip, an Extract Rule is used to combine the protected first and last name into a single string – Name.
Extract Rule: This rule will place the Name back into the XML document.
Extract Rule: The final Extract rule will place the XML document back into the message body to be sent back to the sender with the name protected.
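The forward and return trips can be sketched as nested functions, where each Extract step hands its output to a child and then puts the child's result back as the call returns. The XML payload, the `./name` path, the `\S+` pattern, and the reversing `transform` are assumptions for this example; reversing a word is a visible stand-in and is not data protection.

```python
import re
import xml.etree.ElementTree as ET


def transform(word):
    """Leaf rule: stand-in action. Reversing is NOT real protection."""
    return word[::-1]


def extract_words(text, child):
    """Third Extract rule: split into words, then rejoin on the way back."""
    return re.sub(r"\S+", lambda match: child(match.group(0)), text)


def extract_xpath(root, xpath, child):
    """Second Extract rule: select elements, then write results back."""
    for element in root.findall(xpath):
        element.text = child(element.text or "")


def extract_body(body, child):
    """First Extract rule: decode the XML body, then re-encode it."""
    root = ET.fromstring(body)
    child(root)
    return ET.tostring(root, encoding="unicode")


def run_ruleset(body):
    return extract_body(
        body,
        lambda root: extract_xpath(
            root, "./name", lambda text: extract_words(text, transform)
        ),
    )


request_body = "<person><name>Joe Smith</name><city>Stamford</city></person>"
response_body = run_ruleset(request_body)
print(response_body)
```

No separate set of return-trip rules is written: the same three Extract functions perform the reverse processing as the call stack unwinds.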

2 - Dynamic Configuration over Programming (CoP)

Types of CoP.

Ruleset execution can be segregated into Static CoP and Dynamic CoP. When the payload type and structure are predictable and known at system configuration time, you can define Rulesets for such payloads and process the data using Static CoP. It is assumed in Static CoP that a user who defines Rulesets is authorized and holds permission to access DSG nodes.

When organizations are divided into disparate systems or applications, and each system user needs to send custom payloads on the fly to DSG nodes with minimal predictability, granting users access to DSG nodes to define Rulesets becomes risky. In such situations, you can use Dynamic CoP to send extraction and transformation rules in the request header, along with the data to be protected in the request message body.

While creating Rulesets for Dynamic CoP, use the Profile Reference rule for data transformation instead of the Transform rule. The Profile Reference rule offers stronger security than the Transform rule because requests can originate from outside the secure network perimeter of an organization.

Dynamic CoP provides the following advantages:

Flexibility to send custom requests based on the payload at hand, without prior customization of the Ruleset configuration.
Ability to restrict or configure the actions that users are allowed to send in the request header.

The following figure illustrates how Static CoP RuleSets are combined with Dynamic CoP Rulesets as part of a given REST API or Gateway transaction:

Dynamic CoP

The Static CoP Administrator creates the tunnel configurations and Ruleset for the Static CoP rule execution. This static rule forms the base for the Dynamic rule to follow. Based on the URI defined in both the Static CoP rule and Dynamic CoP rule, the entire Ruleset structure is executed when a request is received.
The REST API or gateway clients can be application developers of multiple applications in an organization who need to protect their data on the fly.
The Dynamic CoP structure provides an outline of how the request header must be constructed.
When the request is sent, the header hooks to the Dynamic Injection action type that is part of the Ruleset structure. The Ruleset executes successfully and protected data is sent as a response.

Dynamic CoP structure

Based on the type of Ruleset execution to be achieved, Dynamic CoP can either be implemented with ToT or without ToT.

The following structure explains Ruleset structure when Dynamic CoP is implemented without ToT.

Dynamic CoP without ToT

The following structure explains Ruleset structure when Dynamic CoP is implemented with ToT.

In the figure, the profileName is the profile reference to the profile that the ToT structure follows. Ensure that you understand the Ruleset structure and hierarchy at the DSG node before configuring the Dynamic CoP with ToT rule. Refer to Dynamic rule and Dynamic rule injection.

Use case implemented using Static CoP

The following image explains how the use case would be implemented if static CoP is used.

PII Usecase with Static CoP

The individual steps are described as follows.

Step 1 – This step extracts the body of the HTTP request message. The extracted body content will be the entire JSON document in our example. The extracted output of this Rule will be fed to all its children sequentially. In this example, there is only one child of this extraction rule which is step 2.
Step 2 – This step parses the JSON input as a text document. This is done so that a regular expression can be evaluated to find sensitive data in the document. This step will yield the person name strings “Joe Smith” and “Alice Miller” to its child rule. In this example, there is only one child of this extraction rule, which is step 3.
Step 3 – This step splits the extracted data from the previous rule into words. Step 2 yielded all person names in the document as strings, and the rule in step 3 splits those strings into individual names so that they can be protected word by word. This is done by running a simple REGEX on the input. Each word, “Joe”, “Smith”, “Alice”, and “Miller”, will be fed into the child rule nodes of this rule one by one. In this example, there is only one child to this rule, which is step 4.
Step 4 – This step does the actual data protection. Since this rule is a transformation node - a leaf node without any children - the rule will return the resulting ciphertext or token to the parent.

At the end of Step 4, the RuleSet recursion stack will unwind. Each branch Rule node will reverse its previous action such that the overall data can be returned to its original format. Going back in the reverse direction, Step 4 will return tokens to Step 3 which will concatenate them together into a string. Step 2 will substitute the strings yielded from Step 3 into the original JSON document in place of the original plaintext strings. Step 1 that was responsible for extracting the body of the HTTP request will replace what has been extracted with the modified JSON document. A layer of platform logic outside the RuleSet tree execution will create an HTTP response message. This message will convey the modified JSON document back to the client.
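Steps 2 through 4 and the unwinding can be sketched with regular-expression substitution over the JSON text. The document layout, the `"name"` field, both patterns, and the `#` placeholder output are assumptions for this example; the actual JSON document and REGEX used in the use case appear only in the figure. The `seen` list records the order in which words reach the leaf rule.

```python
import re

# Step 2 (assumed pattern): find name strings in the JSON treated as text.
NAME_FIELD = re.compile(r'("name"\s*:\s*")([^"]*)(")')
# Step 3 (assumed pattern): split a name string into words.
WORD = re.compile(r"\w+")


def protect(word, seen):
    """Step 4 stand-in: a real deployment would call a protection action."""
    seen.append(word)
    return "#" * len(word)


def run(body):
    seen = []

    def step3(match):
        # Words are replaced in place, so the string is rejoined on return.
        words = WORD.sub(lambda w: protect(w.group(0), seen), match.group(2))
        return match.group(1) + words + match.group(3)

    # Step 2 substitutes each processed string back into the document text.
    return NAME_FIELD.sub(step3, body), seen


body = '{"people": [{"name": "Joe Smith"}, {"name": "Alice Miller"}]}'
modified, order = run(body)
print(modified)
print(order)
```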

Use case implemented using Dynamic CoP

The following image explains how the use case would be implemented if dynamic CoP is used.

PII Usecase with Dynamic CoP

Among the 4 steps described in implementing Static CoP, steps 2 and 3 are the ones that dictate the real business logic that may change on a request-by-request basis. Step 1 defines extraction of HTTP request message body, which is standard in any REST API request processing. Step 2 defines how sensitive data is extracted from an input JSON message. Step 3 defines how a string is split into words for word-by-word protection. Step 4 defines the data protection parameters.

The logic for step 4 can either be injected through Dynamic CoP or used through Static CoP using the profile references. The protection rule is statically configured in the system and can be referenced from step 3’s Dynamic CoP JSON rule. Users may choose to use statically configured protection rules. Profile references can be used for an added layer of security controls and governance.

In the example, step 4’s logic will be injected through Dynamic CoP. It shows how to convey data element name and policy user’s identity through Dynamic CoP.

Dynamic CoP Ruleset Configurations

The Dynamic CoP JSON uses the same JSON structure as the Static CoP JSON. The only difference is that Dynamic CoP JSON is dynamically injected. To start off with our Dynamic CoP JSON, parts of the corresponding Static CoP JSON have been copied. You can create the Dynamic CoP JSON programmatically or use canned JSON template strings and substitute the variable values in it on a request-by-request basis.

The RuleSet JSON fragment for steps 2, 3 and 4 is shown in the following figure. This JSON will be delivered as-is in an HTTP header. It is configured as “X-Protegrity-DCoP-Rules” in our example. The DSG will extract this configured header name and inject its value while executing the RuleSet tree.
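On the client side, preparing such a request could look like the following sketch. Only the header name `X-Protegrity-DCoP-Rules` comes from the example above. The rule fragment is a placeholder: its field names and nesting are hypothetical and do not reproduce the actual Dynamic CoP JSON schema, which is shown only in the figure. The rules are serialized compactly on one line because an HTTP header value cannot contain raw line breaks.

```python
import json

DCOP_HEADER = "X-Protegrity-DCoP-Rules"  # header name configured in the example


def build_request(rules, payload):
    """Return (headers, body) for a request that carries Dynamic CoP rules."""
    headers = {
        "Content-Type": "application/json",
        # Compact, single-line JSON so the value is a valid header field.
        DCOP_HEADER: json.dumps(rules, separators=(",", ":")),
    }
    body = json.dumps(payload).encode("utf-8")
    return headers, body


# Hypothetical rule fragment: these keys are placeholders, NOT the real schema.
rules = {
    "name": "step 2: extract names",
    "children": [
        {"name": "step 3: split words", "children": [{"name": "step 4: protect"}]}
    ],
}
payload = {"people": [{"name": "Joe Smith"}, {"name": "Alice Miller"}]}

headers, body = build_request(rules, payload)
print(headers[DCOP_HEADER])
```

A template-based variant would keep the serialized rules as a canned string and substitute only the variable values on each request.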

Dynamic CoP JSON Rules

The following figure shows the skeletal Static CoP RuleSet configuration in ESA WebUI for enabling Dynamic CoP.

CoP Ruleset configuration Step 1 to support Dynamic CoP (step 2, 3, and 4)

The following figure shows how the Dynamic CoP rules are conveyed to DSG in an HTTP header field and the JSON response output in the Postman tool.

Dynamic CoP request and response header in Chrome Postman tool

The JSON response output is the same in both our Static and Dynamic CoP examples.