User Defined Functions (UDFs)
The DSG provides built-in standard protocol CoP blocks. These blocks allow configuration-driven handling for most data security use cases. In addition, the DSG UDF capability is designed to address unique customer requirements that cannot be met through configuration alone. Such requirements may include extracting relevant data from proprietary application layer protocols and payload formats or altering data in some custom way.
The DSG UDF mechanism is designed for customization and extension of DSG product deployments in the field. Any UDF code is part of the customer-specific deployment and is not a part of the base DSG product delivery from Protegrity. Customers are responsible for the functionality, quality, and ongoing maintenance of their DSG UDF code.
User Defined Functions
The Extraction and Transformation rules are responsible for actual data processing. Thus, they are enabled with UDF functionality.
The concept of UDFs is not new. They are prevalent in the RDBMS world as a means of inserting custom logic in database queries or stored procedures. In comparison to APIs with strict client/server semantics, UDFs allow inserting a small piece of logic into an existing execution flow. UDFs are essentially callbacks, which means that they must comply with the calling program's interface. They must also not add enough latency to negatively affect the overall execution flow.
The DSG Extraction and Transformation UDFs are user-written pieces of logic. These must comply with the DSG Rules Engine interfaces to allow switching control and data between the main program and the UDF. However, beyond complying with the DSG’s interface, UDF writers have complete freedom in what they want to achieve within the UDF.
The following figure shows an example RuleSet tree with an Extraction and a Transformation Rule object that are defined as UDFs. In the example, the Extraction UDF performs word by word extraction of input data. The Transformation UDF toggles alphabet cases for each word passed into it.
RuleSet Tree Recursion and Generators
The DSG Rules Engine is responsible for executing the RuleSet tree. However, the actual DSG data processing behavior is an outcome of tree recursion, where Rule behaviors are executed in the order laid out in the tree. Since the design of the RuleSet tree is completely configurable, this approach is referred to as Configuration-over-Programming (CoP).
Extraction rules are branch nodes responsible for mining data, whereas Transformation rules are leaf nodes responsible for manipulating data. To achieve loose coupling between Rule objects, lazy searches over data, and simplicity of programming, Extraction rules are implemented as Generators. Currently, the DSG UDFs are programmable in Python. This means that Extraction UDFs are written with the Python yield keyword. This makes Extraction UDFs efficient, because each piece of data is produced only when it is requested. At the same time, it supports an iterator interface without building and returning a collection of all extracted items. The following figure shows how an Extraction rule works as a Generator.
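The lazy behavior of a generator can be illustrated with a minimal plain-Python sketch. This is not DSG code: the function name extract_fields, the produced list, and the sample data are hypothetical and exist only to show that each piece is produced when the caller asks for it. Python 3 is assumed.

```python
import re


def extract_fields(data, produced):
    """Lazily yield comma-separated fields from bytes `data`.

    `produced` is a list used only to record which fields have been
    generated so far (hypothetical, for demonstration).
    """
    for match in re.finditer(rb"[^,]+", data):
        produced.append(match.group())  # runs only when the caller pulls a field
        yield match.group()


produced = []
fields = extract_fields(b"alice,bob,carol", produced)
first = next(fields)   # the generator runs up to its first yield
second = next(fields)  # resumes and runs up to the next yield
# At this point the third field has not been extracted yet.
```

Because the generator pauses at each yield, the caller can process one piece, decide what to do with it, and only then request the next.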
Transformation UDFs require a simple Python class and typically only one method. Users implement a Python class called UserDefinedTransformation with a transform method in it. The transform method takes a dictionary as input (named context in the following example). This dictionary uses the following two keys:
context["input"] – Data input into the UDF
context["output"] – Data output from the UDF (transformed in some way)
The input and the output data must be in bytes.
class UserDefinedTransformation(object):
    def transform(self, context):
        input = context["input"]
        # Transform input in some way and return it in output
        output = input  # placeholder: passes the data through unchanged
        context["output"] = output
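As a concrete illustration of this interface, the following sketch masks ASCII digits in the input. The masking rule and the sample data are hypothetical. Only the class name, the transform method, and the input and output keys come from the description above. Python 3 is assumed, and the last lines stand in for the call that the DSG would make.

```python
import re


class UserDefinedTransformation(object):
    def transform(self, context):
        data = context["input"]                          # bytes in
        # Replace every ASCII digit with "*" (illustrative rule only)
        context["output"] = re.sub(rb"[0-9]", b"*", data)  # bytes out


# Local stand-in for the call made by the Rules Engine
context = {"input": b"card 4111-1111"}
UserDefinedTransformation().transform(context)
```

The re module is used here because it appears in the default allowed list described later in this section.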
Implementing an Extraction
Extraction UDF writers implement a Python class called UserDefinedExtraction with an extract method in it. The extract method must be implemented as a Python generator. Similar to Transformation UDFs, Extraction UDFs take a dictionary with input and output keys. In addition, Extraction UDFs use another dictionary, named item in the following example, to pass each extracted piece to the caller under the value key. The commented code listing in the following snippet describes the interfaces with the calling program.
class UserDefinedExtraction(object):
    def extract(self, context):
        input = context["input"]
        # Extract desired pieces of data from input
        # Return/yield extracted pieces (one by one) to caller
        for ….. :
            # Populate item dict. with value key in it with each extracted piece of data
            item = { "value": extractedData }
            # Yield extracted pieces of data. They will be passed on to Transformation rules
            yield item
            # Transformed data will be available in item dictionary with value key
            transformedData = item["value"]
        # Transformed data is assembled back in output and returned to caller
        context["output"] = output
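The exchange between an Extraction UDF and its caller can be tried locally with a small driver. In the sketch below, the comma-separated extraction and the run_extraction function are hypothetical. run_extraction only imitates the calling program by writing a transformed value back into each yielded item, and it is not the DSG Rules Engine. Python 3 is assumed.

```python
class UserDefinedExtraction(object):
    def extract(self, context):
        output = []
        for field in context["input"].split(b","):
            item = {"value": field}
            yield item                      # control passes to the caller here
            output.append(item["value"])    # caller may have replaced the value
        # Runs after the last item has been consumed
        context["output"] = b",".join(output)


def run_extraction(udf, data, transform):
    """Hypothetical stand-in for the calling program, for local testing only."""
    context = {"input": data}
    for item in udf.extract(context):
        item["value"] = transform(item["value"])  # plays the role of a child rule
    return context["output"]  # set once the generator is exhausted


result = run_extraction(UserDefinedExtraction(), b"a,b,c", bytes.upper)
```

Note that context["output"] is only populated after the caller has consumed the final item, because the code after the loop runs when the generator is resumed for the last time.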
User Defined Variables in the UDFs
The Extraction and Transformation UDFs allow users to define their own variables that are maintained throughout the scope of RuleSet execution. This is useful for passing information across different UDFs, for example, setting a variable in one UDF and retrieving it in another UDF. A specific key called cookies has been reserved in the context dictionary for this purpose.
For example, users may use the cookies key to set their own dictionary of parameters in one UDF and retrieve it in a UDF that is called subsequently.
context["cookies"] = { "customAuthCode": authCode }
authCode = context["cookies"]["customAuthCode"]
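A minimal sketch of this pattern follows. The class names SetAuthCode and UseAuthCode are hypothetical and are used only so that both UDFs fit in one listing. In a DSG deployment each would be a UserDefinedTransformation class in its own rule. The code that copies cookies from one context to the next is an assumption that imitates the engine keeping the key alive across calls. Python 3 is assumed.

```python
class SetAuthCode(object):
    """First UDF: stores a value for later UDFs (hypothetical example)."""
    def transform(self, context):
        # setdefault keeps any cookies that an earlier UDF already stored
        context.setdefault("cookies", {})["customAuthCode"] = b"abc123"
        context["output"] = context["input"]


class UseAuthCode(object):
    """Later UDF: reads the value stored by the first UDF."""
    def transform(self, context):
        auth_code = context["cookies"]["customAuthCode"]
        context["output"] = context["input"] + b"|" + auth_code


# Assumed engine behavior: the cookies dictionary is carried between calls
first_ctx = {"input": b"payload"}
SetAuthCode().transform(first_ctx)
later_ctx = {"input": first_ctx["output"], "cookies": first_ctx["cookies"]}
UseAuthCode().transform(later_ctx)
```

Using setdefault rather than assigning a new dictionary avoids discarding values that other UDFs may have placed under the cookies key.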
Passing input arguments in UDFs
The Transformation and Extraction UDF classes allow users to pass in a variable number of statically configured input arguments in their __init__() method as shown in the following screenshot.
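The following sketch shows one way a UDF might consume such arguments. The meaning of the first argument (a mask character) and the assumption that arguments arrive as Python strings are hypothetical. Only the __init__(self, *args) signature comes from the sample listing in this section. Python 3 is assumed.

```python
class UserDefinedTransformation(object):
    def __init__(self, *args):
        # Hypothetical convention: the first configured argument is the
        # mask character; "*" is used when no argument is configured.
        mask = args[0] if args else "*"
        self.mask = mask.encode()  # assumes the argument arrives as a str

    def transform(self, context):
        # Replace the whole input with the mask, one mask byte per input byte
        context["output"] = self.mask * len(context["input"])


# Local stand-ins for the object creation and call made by the DSG
configured = UserDefinedTransformation("#")
ctx_configured = {"input": b"secret"}
configured.transform(ctx_configured)

default = UserDefinedTransformation()
ctx_default = {"input": b"abc"}
default.transform(ctx_default)
```

Reading the arguments once in __init__ keeps per-call work in transform small, which matches the sample Extraction UDF that compiles its regular expression at object creation time.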
Advanced Rule Settings in UDFs
The gateway.json file includes a configuration that blocks vulnerable methods and modules from being imported in Extraction and Transformation UDFs. This default behavior can be overridden by setting the Rule Advanced Settings parameter. For more information, refer here.
In the following example source code, the code imports the os module. This module is one of the modules blocked by default in the gateway.json file. If the UDF rule configuration requires the os module to be unblocked, the Rule Advanced Settings parameter must be set as shown in the figure.
Currently, blocked methods cannot be overridden using Advanced settings.
Python code listing of Sample UDFs
This section provides a Python code listing of sample UDFs.
"""
Example custom extraction implementation. Extracts words from an
input string.
"""
class UserDefinedExtraction(object):
    def __init__(self, *args):
        """
        Import Python RE module and compile the RE at object creation
        time.
        """
        import re
        # Raw bytes literal so that \w reaches the RE module unchanged
        self.pattern = re.compile(rb"\w+")

    def extract(self, context):
        """
        Generator implementation. Takes input bytes and splits them
        into words using RE module. Words are yielded one at a time.
        """
        input = context["input"]
        cursor = 0
        output = list()
        for wordMatch in self.pattern.finditer(input):
            # Keep the non-word bytes that precede this word
            output.append(input[cursor:wordMatch.start()])
            item = { "value": wordMatch.group() }
            yield item
            # After the yield, item["value"] holds the transformed word
            output.append(item["value"])
            cursor = wordMatch.end()
        output.append(input[cursor:])
        context["output"] = b"".join(output)

"""
Custom Transformation UDF: Toggles alphabet cases.
"""
class UserDefinedTransformation(object):
    def transform(self, context):
        output = []
        for c in context["input"].decode():
            if c.islower():
                output.append(c.upper())
            else:
                output.append(c.lower())
        context["output"] = "".join(output).encode()
Blocked Modules and Methods in UDFs
Modules and methods that are unsafe to use in a UDF can be added to the blocked_modules and blocked_methods parameters respectively in the gateway.json file.
The following snippet shows how to add the vulnerable modules and methods to the gateway.json file.
"globalUDFSettings": {
    "blocked_methods": [
        "eval",
        "exec",
        "dir",
        "__import__",
        "memoryview"
    ],
    "blocked_modules": [
        "pip",
        "install",
        "commands",
        "subprocess",
        "popen2",
        "sys",
        "os",
        "platform",
        "signal",
        "asyncio"
    ]
}
When you reimage to DSG v3.2.0.0, the gateway.json file no longer lists blocked modules and methods; instead, it lists allowed modules and methods. The blocked lists are still supported, but the allowed list approach is recommended.
Allowed Modules and Methods in UDF
The modules and methods that are safe to use in a UDF are added to the allowed list. All other modules and methods that are not on the list are blocked.
These configurations are added to the globalUDFSettings parameter in the gateway.json file. By default, in the gateway.json file the following modules and methods are allowed.
"globalUDFSettings": {
    "allowed_modules": ["bs4", "common.logger", "re", "gzip", "fromstring", "cStringIO", "struct", "traceback"],
    "allowed_methods": ["BeautifulSoup", "find_all", "fromstring", "format_exc", "list", "dict", "str", "warning"]
}
If the source code in the UDF rule uses any other modules or methods, it is necessary to add them to the allowed list. If you want to allow any vulnerable modules or methods, then it is recommended to use the Rule Advanced Settings option instead.
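Before deploying a UDF, it can help to list the modules that its source imports and compare them with the allowed list. The following sketch uses the standard Python ast module for this. It is an offline helper built on assumptions, not the check that the DSG performs. The function names and the sample UDF are hypothetical, it only inspects import statements, and it does not examine method names.

```python
import ast


def imported_modules(source):
    """Return the set of module names imported anywhere in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


def modules_to_allow(source, allowed_modules):
    """Return imported modules that are missing from the allowed list."""
    return sorted(imported_modules(source) - set(allowed_modules))


# Hypothetical UDF source that imports one allowed and one unlisted module
udf_source = '''
class UserDefinedTransformation(object):
    def transform(self, context):
        import re
        import base64
        context["output"] = base64.b64encode(context["input"])
'''

# Default allowed_modules as listed in this section
allowed = ["bs4", "common.logger", "re", "gzip", "fromstring",
           "cStringIO", "struct", "traceback"]
missing = modules_to_allow(udf_source, allowed)
```

In this example, missing contains only base64, which would therefore need to be added to allowed_modules before the UDF can import it.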
The blocked and allowed lists are mutually exclusive. If methods or modules are listed in both the blocked and allowed list parameters, then the following error appears in the gateway.log file:
allowed module/methods '<module/method name>' cannot be used with blocked module/method
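A conflict between the two lists can be caught before the file is deployed. The sketch below is a hypothetical pre-flight check, not part of the DSG. It assumes that globalUDFSettings is a top-level key of the parsed gateway.json content, which may not match the actual file layout, and the sample settings are invented for the demonstration.

```python
import json


def find_conflicts(settings):
    """Return names that appear in both the allowed and the blocked lists."""
    udf = settings.get("globalUDFSettings", {})
    conflicts = {}
    for kind in ("modules", "methods"):
        allowed = set(udf.get("allowed_" + kind, []))
        blocked = set(udf.get("blocked_" + kind, []))
        overlap = sorted(allowed & blocked)
        if overlap:
            conflicts[kind] = overlap
    return conflicts


# Invented sample in which "os" is listed as both allowed and blocked
sample = json.loads('''
{
    "globalUDFSettings": {
        "allowed_modules": ["re", "os"],
        "blocked_modules": ["os", "sys"],
        "allowed_methods": ["str"],
        "blocked_methods": ["eval"]
    }
}
''')
conflicts = find_conflicts(sample)
```

An empty result means that no name appears in both lists. Here the result reports os under modules, which corresponds to the situation that triggers the error shown above.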