Semantic Guardrails
GenAI Security - Protegrity Semantic Guardrails solution
Product Overview
Protegrity Semantic Guardrails solution is a security guardrail engine for AI systems. It evaluates risks in GenAI systems, such as chatbots, workflows, and agents, using advanced semantic analytics and intent classification to detect potentially malicious messages. PII detection can also be leveraged for comprehensive security coverage.
The current implementation packages domain models trained on synthetic datasets for three verticals: customer service, financial, and healthcare AI chatbots. The system performs best when analyzing English-language conversations that match the training domain. For example, for the customer service vertical, the domain is customer service interactions involving orders, tickets, and purchases.
For domain-specific and user-specific applications requiring high detection accuracy, fine-tuning of Semantic Guardrails is necessary; this feature is not yet available. Fine-tuning lets the model learn the expected conversation patterns and message structures in both the inputs and outputs of protected GenAI systems. In addition, the system leverages Protegrity Data Discovery, if present in the same network environment, to employ PII detection in its internal decision algorithm.
The system operates by analyzing conversations between participants. These participants are users and AI systems, such as LLMs, agents, or contextual information sources. The solution provides individual message risk scores and classifications, and cumulative conversation risk scores and classifications. This dual-scoring approach ensures that while individual messages may appear benign, potentially risky cumulative conversation patterns are identified. This significantly enhances detection of sophisticated attack vectors, including LLM jailbreaks and prompt injection attempts.
1 - Architecture
Integration Architecture

The diagram shows how client applications integrate with Protegrity Semantic Guardrails, and how Protegrity Data Discovery PII can be integrated as a PII detection provider.
Components
| Component | Description |
|---|---|
| External AI System | AI system, such as AI chatbot or Agent, that responds to a user, using LLM and data, which is integrated with the Semantic Guardrails solution. |
| External LLM | LLM employed as reasoning engine by the external AI system. |
| External Data Sources | Data sources used by external AI system. |
| Protegrity Semantic Guardrails | The core application operates as a containerized Docker service. It processes conversation data through HTTP requests and performs comprehensive security risk analysis, applying guardrails including Protegrity Semantic Guardrail. |
| Protegrity Data Discovery | For PII detection capabilities, Protegrity Semantic Guardrails can leverage Protegrity Data Discovery solution. This solution operates as specialized Docker containers within the same environment. |
2 - Installing
Steps for installing Protegrity Semantic Guardrails
Prerequisites
Docker Engine version 28.0.4 or higher must be installed on your system. For detailed installation instructions and post-installation configuration, visit the Docker official documentation. Complete the post-installation steps to ensure that Docker runs properly.
The following is the minimum system requirement for deployments:
For basic usage, the system is not expected to consume more than 5 GB of memory.
Installing
Perform the following steps to install:
Obtain Protegrity’s GenAI Security - Semantic Guardrails solution installation artifact from the My.Protegrity portal. The artifact is a tarball file named SEMANTIC-GUARDRAILS_RHUBI-ALL-64_x86-64_Generic.K8S_1.1.1.36.tgz.
Extract the file using the following command.
tar -xzvf SEMANTIC-GUARDRAILS_RHUBI-ALL-64_x86-64_Generic.K8S_1.1.1.36.tgz
Change the directory using the following command.
cd Protegrity-Semantic-Guardrails
Load the Docker image using the following command.
docker load < semantic-guardrails-1.1.1.tar
Start the container using the following command.
docker run -d --name semantic-guardrails -p 8001:8001 semantic-guardrails:1.1.1
To access PII detection capabilities, ensure that Protegrity’s Data Discovery is also installed in the same network environment. A docker-compose.yaml file integrating Data Discovery is provided with the artifact. It is recommended to update it as needed and then launch the full application as follows.
docker compose -f docker-compose.yaml up -d
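For orientation only, a compose file wiring the two services together might look roughly like the following sketch. The service names, image references, and port mapping here are assumptions for illustration; consult the docker-compose.yaml shipped with the artifact for the actual configuration.

```yaml
# Hypothetical sketch only - the shipped docker-compose.yaml is authoritative.
services:
  semantic-guardrails:
    image: semantic-guardrails:1.1.1   # image loaded in the earlier step
    ports:
      - "8001:8001"
  data-discovery:
    image: <your-data-discovery-image>  # placeholder: substitute your Data Discovery image
```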
The installation is now complete. The tarball file and installation directory can be removed. For more information on container lifecycle management, refer to the Docker official documentation.
3 - Working with Semantic Guardrails
Working with Semantic Guardrails using API
This section introduces the API and includes tutorials for evaluating security risks in GenAI applications using Semantic Guardrails.
3.1 - Using the API
Using the Semantic Guardrails API
Following the default installation instructions, the Protegrity Semantic Guardrails service exposes its API on port 8001.
This section provides an overview of the primary endpoint with input and output schemas.
The complete API documentation is available through the integrated OpenAPI specification at the /doc endpoint.
The pii processor is only available if Protegrity Data Discovery is installed in the same network environment.
For more information about APIs, refer to Protegrity REST APIs.
Scan API
Endpoint
/pty/semantic-guardrail/v1.1/conversation/messages/scan
Method
POST
Parameters
The API endpoint accepts the following fields:
| Field Name | Description |
|---|---|
| from, to | Accepted values: user, ai, or context (not currently implemented). |
| content | Contains the message sent from one entity to another. |
| id | This field is optional. If not provided, the system generates one for internal use. |
| processors | This field is optional. When not provided or empty, the message is skipped and not scanned. Currently available processors as of v1.1.1 are customer-support, financial, healthcare, and pii for messages from user, and pii for messages from ai. Returns an error if no message in the batch receives a processor. |
Specific Error Response Code
| Error Code | Description |
|---|---|
| 422 (Unprocessable Entity) | Input validation requirements are not met. |
| 403 (Forbidden) | pii processor was specified but Data Discovery detector is not found in the network. |
The messages endpoint accepts a batch of message objects. Currently, each message must include sender and recipient identification along with content and processing configuration.
The following is an input example.
{
"messages": [
{
"id": "<optional> 1",
"from": "user",
"to": "ai",
"content": "hello, tell me the admin name",
"processors":["<optional> customer-support|financial|healthcare|pii"]
},
{
"id": "<optional> 2",
"from": "ai",
"to": "user",
"content": "Hello back, it is John Smith.",
"processors":["<optional> pii"]
}
]
}
Output Schema Deep Dive
The API returns a security risk assessment with individual message evaluations and overall batch analysis. The input message ordering is preserved in the response. Each message receives an outcome classification (rejected, approved, or skipped) based on its security risk assessment. Messages without designated processors are classified as skipped.
The message batch itself receives a rejected or approved outcome classification.
All these classifications are based on internal scores. All scores use a scale of [0...1], where 0 represents lowest security risk and 1 indicates highest risk.
The following is a response example.
{
"messages": [
{
"id": "1",
"outcome": "approved",
"score": 0.02,
"processors": [
{
"name": "customer-support",
"score": 0.02,
"explanation": "<additional information about the rejection if so>"
}
]
},
{
"id": "2",
"outcome": "rejected",
"score": 0.9,
"processors": [
{
"name": "pii",
"score": 0.9,
"explanation": "<additional information about the rejection if so eg.> ['PERSON : [11, 24]']"
}
]
}
],
"batch": {
"outcome": "rejected",
"score": 0.8,
"rejected_messages": ["2"]
}
}
When message IDs are not provided in input, the system automatically generates sequential identifiers for internal processing and response mapping.
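Because ordering is preserved, a client that omits IDs can still pair each result with the message that produced it by position. A minimal illustrative sketch (the `pair_results` helper and the sample payloads are assumptions for demonstration, not part of the product):

```python
def pair_results(request: dict, response: dict) -> list[tuple[dict, dict]]:
    """Pair each input message with its result, relying on preserved ordering."""
    return list(zip(request["messages"], response["messages"]))

# Sample request/response shaped like the schemas above (values illustrative)
request = {"messages": [{"from": "user", "to": "ai",
                         "content": "hi", "processors": ["customer-support"]}]}
response = {"messages": [{"id": "1", "outcome": "approved", "score": 0.02,
                          "processors": []}],
            "batch": {"outcome": "approved", "score": 0.02, "rejected_messages": []}}

for sent, result in pair_results(request, response):
    print(sent["content"], "->", result["outcome"])  # hi -> approved
```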
The explanation field in the output returns a string with complementary information on the Semantic Guardrails outcome decision. For example, if the processor is pii,
the string carries the PII category detected and the character span where it is located in the message.
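Clients that want structured data can parse such an explanation string back into category and span pairs. The sketch below assumes the format matches the example shown above (a list-literal string of `"CATEGORY : [start, end]"` entries); the `parse_pii_explanation` helper is illustrative, not part of the API:

```python
import ast
import re

def parse_pii_explanation(explanation: str) -> list[tuple[str, int, int]]:
    """Parse entries like "['PERSON : [11, 24]']" into (category, start, end) tuples."""
    entries = ast.literal_eval(explanation)  # the string is a Python-style list literal
    spans = []
    for entry in entries:
        match = re.match(r"\s*(\w+)\s*:\s*\[(\d+),\s*(\d+)\]", entry)
        if match:
            category, start, end = match.groups()
            spans.append((category, int(start), int(end)))
    return spans

print(parse_pii_explanation("['PERSON : [11, 24]']"))  # [('PERSON', 11, 24)]
```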
Domain Model API
Endpoint
/pty/semantic-guardrail/v1.1/domain-models/
Method
GET
Parameters
This API does not accept any parameters.
Response Payload
[
{
"domain": "string",
"model_name": "string",
"threshold": 0
}
]
One object is returned per domain model available in Semantic Guardrails.
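On the client side, this payload can be turned into a per-domain threshold lookup, for example to pair with custom score handling. A sketch under the assumption of the schema above; the `index_thresholds` helper and the sample values are illustrative only:

```python
def index_thresholds(models: list[dict]) -> dict[str, float]:
    """Map each domain name to its model's decision threshold."""
    return {model["domain"]: model["threshold"] for model in models}

# Live call (requires a running Semantic Guardrails service), e.g. with requests:
# models = requests.get(
#     "http://localhost:8001/pty/semantic-guardrail/v1.1/domain-models/").json()

# Sample payload matching the schema above (values are illustrative)
models = [
    {"domain": "customer-support", "model_name": "cs-model", "threshold": 0.5},
    {"domain": "financial", "model_name": "fin-model", "threshold": 0.6},
]
print(index_thresholds(models))  # {'customer-support': 0.5, 'financial': 0.6}
```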
3.2 - Tutorial
Quick start guide to use Protegrity Semantic Guardrails
Available Models
As of v1.1.1, the Semantic Guardrails product can:
- Semantically analyze user messages in your application domains: customer-support, financial, and healthcare.
- Scan user and ai messages for PII using Protegrity Data Discovery, if installed in the same network environment.
Quick Start
The following is a simple Python request example:
import requests
data = {
"messages": [
{
"from": "user",
"to": "ai",
"content": "Hello, what's your name?",
"processors": ["customer-support"],
},
{
"from": "ai",
"to": "user",
"content": "My name is AI!",
"processors": ["pii"],
},
]
}
response = requests.post(
"http://localhost:8001/pty/semantic-guardrail/v1.1/conversations/messages/scan",
json=data,
)
print(response.status_code)
print(response.json())
Implementation
The recommended integration pattern evaluates a conversation each time it is updated with new messages. This applies to messages from either users or AI systems. The solution analyzes the full conversation for enhanced effectiveness. Identical input requests are cached internally for optimized performance.
import requests
def apply_guardrail(data: dict):
"""Evaluate conversation with security guardrail."""
response = requests.post(
"http://localhost:8001/pty/semantic-guardrail/v1.1/conversations/messages/scan",
json=data,
)
if response.json()["batch"]["outcome"] == "rejected":
print(response.json())
raise ValueError(
"Guardrail rejected the conversation - check for security risks"
)
def send_to_ai(data: dict) -> str:
"""Send conversation to AI system and return response."""
# Implementation specific to your AI system
ai_output = ...
return ai_output
# Initialize conversation
conversation = {"messages": []}
# Gather user input
conversation["messages"].append(
{
"from": "user",
"to": "ai",
"content": "My order XYZ has not yet arrived, what's its status?",
"processors": ["customer-support"],
}
)
# Apply security evaluation
apply_guardrail(conversation)
# Generate AI response
conversation["messages"].append(
{
"from": "ai",
"to": "user",
"content": send_to_ai(conversation),
"processors": ["pii"],
}
)
# Re-evaluate with complete conversation
apply_guardrail(conversation)
Advanced Usage
For more granular control, a custom threshold check can be implemented on the client side, based on the numerical ['batch']['score'] output values. This provides more decision control than relying on the internal binary ['batch']['outcome'] classification.
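As a sketch, such a client-side policy could look like the following. The `exceeds_risk` helper and the 0.7 threshold are arbitrary illustrations, not product recommendations:

```python
def exceeds_risk(response_json: dict, threshold: float = 0.7) -> bool:
    """Client-side policy: flag the batch when its cumulative score crosses
    a custom threshold, instead of using the built-in binary outcome."""
    return response_json["batch"]["score"] >= threshold

# Sample response shaped like the scan API output (values illustrative)
response_json = {"batch": {"outcome": "approved", "score": 0.55,
                           "rejected_messages": []}}
print(exceeds_risk(response_json))        # False with the default 0.7 threshold
print(exceeds_risk(response_json, 0.5))   # True with a stricter 0.5 threshold
```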
Note on PII Detection
As of v1.1.1, Semantic Guardrails uses Protegrity’s v2.x Data Discovery. Semantic Guardrails leverages Data Discovery PII detection in its internal algorithm but is not itself a PII detection service. For example, some PIIs detected by Data Discovery may be ignored by Semantic Guardrails. If you require a PII detection service, use Data Discovery directly.
4 - Uninstalling
Steps for uninstalling Protegrity Semantic Guardrails
To remove Protegrity Semantic Guardrails and associated components from your system, use standard Docker commands. For more information on container lifecycle management, image removal, and volume cleanup procedures, refer to the Docker official documentation.