Azure is currently in Private Preview and is not available for General Availability (GA). It should not be used in production environments, as features and functionality may change before the final GA release.

Azure Databricks

Big Data Protector on Azure Databricks

The Protegrity Big Data Protector for Azure Databricks delivers end‑to‑end data protection. Organizations deploying the Big Data Protector rely on modern, supported storage options such as Workspace storage, Unity Catalog Volumes, and cloud object storage like ADLS Gen2 or Azure Blob Storage.

Designed to secure sensitive data across analytics pipelines, the Big Data Protector applies advanced tokenization and encryption during Spark execution and enforces centralized, policy‑driven controls. Whether installed via Unity Catalog Volumes for init script and .tgz delivery, the Protector ensures resilient execution across Azure Databricks clusters.

By embracing cloud‑native storage paths, this approach ensures long‑term compatibility with Databricks platform changes while maintaining Protegrity’s standard of seamless and transparent protection. Organizations can continue to process high‑value datasets on Azure Databricks with confidence knowing that sensitive information is secured across its lifecycle, even as the underlying platform evolves.

The Protegrity Big Data Protector for Azure Databricks empowers organizations to secure sensitive data across their analytics pipelines by combining high‑performance protection mechanisms with flexible deployment models tailored for modern cloud architectures. Central to this capability are two approaches; Application Protector REST (AP REST) and Cloud Protector approach. Each approach is designed to address different customer requirements around scalability, infrastructure usage, and cost optimization.

Application Protector REST Approach

The AP REST model enables data protection directly within the Databricks cluster itself, eliminating the need for a separate Cloud API infrastructure. This approach is particularly suitable for customers who want to avoid maintaining additional cloud-native services for protection operations.

With AP REST, protection workflows are executed through REST endpoints running on the cluster, allowing seamless scaling along with Databricks’ auto-scaling compute. This ensures that sensitive data remains protected throughout processing while also adapting automatically to dynamically assigned IPs in auto-scaling environments. This results in an operationally efficient fit for Spark-driven workloads on Azure.

For the Application Protector REST Approach, the following cluster types are supported:

  • Databricks Dedicated Compute
  • Databricks Standard Compute

For the Application Protector REST approach, the following sections are applicable:

Cloud Protector Approach

The Cloud Protector approach extends protection capabilities by offering centralized, cloud-hosted security services for environments that require externally managed protection layers. It enables highly scalable, policy-driven tokenization and encryption without requiring protection logic to reside inside the Databricks compute itself.

In contexts where Cloud Protector is integrated with the Big Data Protector, organizations benefit from lifecycle-wide protection that spans storage, compute, and inter-system data transfers. Cloud Protector provides the foundation for UDF-driven protections (including Spark and Unity Catalog–level enforcement), ensuring centralized governance across distributed analytics ecosystems.

For the Cloud Protector approach, the following cluster types are supported:

  • Databricks Dedicated Compute
  • Databricks Standard Compute
  • Databricks SQL Warehouse

For the Cloud Protector approach, the following sections are applicable:

Conclusion

Together, these two approaches provide enterprises the flexibility to choose a data protection strategy aligned with their architectural, cost, and compliance requirements whether fully cluster-local using AP REST, centrally managed via Cloud Protector, or in hybrid deployments. This dual-path model ensures that Azure Databricks customers can achieve seamless, transparent, policy-based data protection while continuing to extract high-value insights from their data securely and efficiently.


Last modified : May 07, 2026