Semantic Guardrails

GenAI Security - Protegrity Semantic Guardrails solution

Product Overview

Protegrity Semantic Guardrails solution is a security guardrail engine for AI systems. It evaluates risks in GenAI systems such as chatbots, workflows, and agents, through advanced semantic analytics and intent classification to detect potentially malicious messages. PII detection can also be leveraged for comprehensive security coverage.

The current implementation packages domain models is trained on synthetic datasets for three different verticals: customer service, financial and health care AI chatbots. The system performs best when analyzing English-language conversations expected to match the training domain. For example, for customer service vertical, the domain is customer service interactions involving orders, tickets, and purchases.

For domain-specific and user-specific applications requiring high detection accuracy, fine-tuning of Semantic Guardrails is necessary – this feature is not yet available. This makes the model learn from expected conversation patterns and message structures in both the inputs and outputs of protected GenAI systems. Furthermore, the system leverages Protegrity’s Data Discovery, if present in the same network environment, to employ PII detection in its internal decision algorithm.

The system operates by analyzing conversations between participants. These participants are users and AI systems, such as LLMs, agents, or contextual information sources. The solution provides individual message risk scores and classifications, and cumulative conversation risk scores and classifications. This dual-scoring approach ensures that while individual messages may appear benign, potentially risky cumulative conversation patterns are identified. This significantly enhances detection of sophisticated attack vectors, including LLM jailbreaks and prompt injection attempts.

Feedback

Was this page helpful?

Last modified : March 18, 2026

Semantic Guardrails

Product Overview

Architecture

Installing

Working with Semantic Guardrails

Uninstalling

Feedback