Performance

Performance benchmarks and considerations.

1: Function App Performance
2: Log Forwarder Performance

1 - Function App Performance

Guidance on Function App Performance settings and considerations.

Function App Performance

Overview

Azure Function apps offer different hosting plans that directly impact the performance, scalability, and cost of Cloud Protect deployments. Understanding these plans and their characteristics is essential for optimizing your data protection operations.

Azure Function App Service Plans

Azure Functions provides several hosting options, each with different characteristics:

Consumption Plan

The Consumption plan provides automatic scaling and charges only for compute resources used during function execution. While cost-effective for sporadic workloads, this plan has limitations:

Cold start latency: Functions may experience delays when starting after periods of inactivity
Limited execution time: Maximum execution duration of 10 minutes per function invocation
Shared infrastructure: Resources are shared across tenants, which can lead to variable performance
Memory constraints: Limited to 1.5 GB of memory per instance

Important

Not recommended for Cloud Protect due to cold start issues and limited resources for data protection operations.

Premium Plan (Recommended)

The Premium plan is the recommended option for Cloud Protect on Azure. It provides enhanced performance and enterprise-grade features:

Pre-warmed instances: Always-ready instances eliminate cold start delays, ensuring consistent performance
Enhanced compute resources: Flexible compute sizing, see App Service Premium version 3 plan
VNET integration: Secure connectivity to on-premises resources and Azure private networks
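Unlimited execution duration: No enforced maximum execution time for long-running protection operations (the default timeout still applies unless it is raised in the function app configuration)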
Predictable performance: Dedicated infrastructure ensures consistent throughput
Better scaling control: Minimum and maximum instance count configuration

Important

Use Premium plan for: Production workloads, high-volume data protection, latency-sensitive applications, and enterprise deployments.

Elastic Premium Plan (Recommended)

The Elastic Premium plan extends the Premium plan with additional elasticity and performance optimization:

Rapid scale-out: Faster scaling response to demand spikes
Greater instance limits: Support for larger-scale deployments
Optimized cold start: Even faster initialization compared to standard Premium
Event-driven scaling: More granular scaling based on event sources
All Premium features: Includes VNET integration, pre-warmed instances, and unlimited execution time

Important

Use Elastic Premium plan for: Large-scale deployments, highly variable workloads, mission-critical applications requiring maximum performance and availability.

Cloud Protect Recommendations

Cloud Protect on Azure recommends using either Premium or Elastic Premium plans for production deployments. These plans provide:

Consistent Performance: Pre-warmed instances ensure data protection operations execute immediately without cold start delays
Sufficient Resources: Memory and CPU resources adequate for cryptographic operations and high-volume data processing
Reliability: Dedicated infrastructure for predictable performance and SLA compliance
Security: VNET integration enables secure communication with ESA (Enterprise Security Administrator) and other protected resources
Scalability: Automatic scaling handles variable workloads while maintaining performance standards

Performance Considerations

When deploying Cloud Protect on Azure Functions, consider the following factors:

Instance Sizing

Select appropriate instance sizes based on your workload:

EP1 (Elastic Premium 1): 1 vCPU, 3.5 GB RAM - suitable for moderate workloads
EP2 (Elastic Premium 2): 2 vCPU, 7 GB RAM - recommended for standard production deployments
EP3 (Elastic Premium 3): 4 vCPU, 14 GB RAM - for high-volume or resource-intensive operations
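
As a rough illustration of the sizing choice, the sketch below picks the smallest Elastic Premium size that covers an estimated per-instance CPU and memory need. The size table is taken from the list above; the function name and the example requirements are hypothetical, and real sizing should be validated with load tests.

```python
from typing import NamedTuple, Optional


class Sku(NamedTuple):
    name: str
    vcpu: int
    memory_gb: float


# Sizes as listed in this section, ordered from smallest to largest.
ELASTIC_PREMIUM_SKUS = [
    Sku("EP1", 1, 3.5),
    Sku("EP2", 2, 7.0),
    Sku("EP3", 4, 14.0),
]


def smallest_fitting_sku(required_vcpu: int, required_memory_gb: float) -> Optional[Sku]:
    """Return the smallest SKU that meets both requirements, or None if none does."""
    for sku in ELASTIC_PREMIUM_SKUS:
        if sku.vcpu >= required_vcpu and sku.memory_gb >= required_memory_gb:
            return sku
    return None


if __name__ == "__main__":
    # Hypothetical requirement: 2 vCPU and 4 GB of memory per instance.
    print(smallest_fitting_sku(2, 4.0))
```

A result of None means that no single instance size covers the requirement, in which case scaling out to more instances is the remaining lever.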

Scaling Configuration

Configure scaling parameters to match your protection requirements:

Minimum instances: Set to at least one or two pre-warmed instances to eliminate cold starts
Maximum instances: Configure based on peak load expectations and budget constraints
Scale-out rules: Define appropriate triggers based on CPU, memory, or queue depth metrics

Network Considerations

Use VNET integration to reduce latency when communicating with ESA servers
Enable private endpoints for secure, high-performance connectivity to Azure services (Storage, Key Vault)
Consider proximity placement to co-locate Function apps with dependent resources

Monitoring and Optimization

Monitor execution duration metrics to identify bottlenecks
Track instance count and scaling patterns to optimize configuration
Review memory and CPU utilization to right-size your plan
Set up Application Insights for detailed performance telemetry and diagnostics

Cost vs. Performance Trade-offs

While Premium and Elastic Premium plans have higher baseline costs compared to Consumption, they provide:

Predictable performance and cost structure
Reduced total cost for steady-state workloads (no per-execution charges)
Better resource utilization through persistent instances
Lower operational overhead from consistent behavior

For Cloud Protect deployments handling sensitive data with compliance requirements, the Premium/Elastic Premium investment ensures reliable, performant data protection operations.

2 - Log Forwarder Performance

Guidance on Log Forwarder Performance settings and considerations.

Log Forwarder Performance

The log forwarder architecture is optimized to minimize the number of connections and reduce the overall network bandwidth required to send audit logs to ESA. This is achieved with batching and aggregation on two levels.

The first level is in the protect function instances, where audit logs from consecutive requests to an instance are batched and aggregated. The second level of batching takes place in the Azure Event Hub instance, where log records from different protect function instances are batched again and sent to the log forwarder function, where they are aggregated.

These sections show how to configure the deployment to accommodate different patterns of anticipated audit log streams. They also show how to monitor deployment resources to detect problems so that audit records are not lost.

Protect Service Function Environment Variables

PTY_CORE_FLUSHINTERVAL: Defines how long audit logs are aggregated before they are sent to Azure Event Hub. The default value is ten seconds. Audit logs are always aggregated into one-minute buckets, so a value greater than sixty seconds mostly affects the batching interval.
MAX_WAIT_TIME: Defines how long aggregated audit logs are batched before they are sent to Azure Event Hub. The default value is ten seconds.
Increasing MAX_WAIT_TIME may result in:
1. Increased latency or lag of audit logs arriving at Event Hub and therefore at ESA
2. Increased throughput rates due to fewer network requests overall
3. Increased aggregation rates for values up to one minute
Lowering MAX_WAIT_TIME may result in:
1. Reduced latency or lag of audit logs arriving at Event Hub and therefore at ESA
2. Reduced throughput rates due to a higher number of network requests overall
3. Reduced aggregation rates for values up to one minute
It is not recommended to lower MAX_WAIT_TIME in production workloads, as it may overload the Event Hubs service. Lowering MAX_WAIT_TIME may help speed up log delivery to ESA in dev/test environments.
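
The trade-off above can be made concrete with a back-of-the-envelope estimate. The sketch below assumes, as a simplification not confirmed by the product documentation, that each busy protect function instance sends at most one batch to Event Hub per MAX_WAIT_TIME interval. The function name and the instance counts are hypothetical.

```python
def sends_per_minute(instance_count: int, max_wait_time_s: float) -> float:
    """Estimate the upper bound of batch sends per minute to Event Hub.

    Simplifying assumption: every instance sends one batch each
    max_wait_time_s seconds while it is handling requests.
    """
    if instance_count < 0:
        raise ValueError("instance_count must not be negative")
    if max_wait_time_s <= 0:
        raise ValueError("max_wait_time_s must be positive")
    return instance_count * 60.0 / max_wait_time_s


if __name__ == "__main__":
    # Hypothetical deployment with 20 busy instances.
    for wait_s in (2, 10, 30):
        print(f"MAX_WAIT_TIME={wait_s}s -> {sends_per_minute(20, wait_s):.0f} sends/min")
```

Under this assumption, lowering MAX_WAIT_TIME from ten seconds to two multiplies the number of send requests by five for the same instance count, which is why low values are better kept to dev/test environments.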

Log Forwarder ARM Template Parameters

New Event Hub Namespace Sku Name and New Event Hub Namespace Sku Tier: These directly affect the quotas applied to new Event Hub instances. Review the quotas for the selected tier in the Azure documentation: Azure Event Hub Quotas
New Event Hub Namespace Sku Capacity: The number of Event Hubs throughput units for the Basic or Standard tiers, where the value should be 0 to 20 throughput units, or the number of Event Hubs premium units for the Premium tier, where the value should be 0 to 10 premium units. Capacity directly controls the purchased throughput of the Event Hub instance. Review details in the Azure documentation: Event Hub Instance Throughput
New Event Hub Partition Count: The number of partitions represents the level of parallel log streams in the Event Hub. It is proportional to the throughput capacity of the Event Hub instance. If the number of partitions is too low and the volume of audit logs is too high, a throughput ceiling may be reached on the Event Hub and some audit records sent from the protect function may be lost. Review details in the Azure documentation: Event Hub Scalability
New Event Hub Audit Log Retention In Days: The number of days audit logs remain available in Event Hub. Applies to both the primary Event Hub instance and the dead-letter queue Event Hub instance. Although the Log Forwarder processes audit logs in near real time, it may be beneficial to keep audit logs available in Event Hub for an extended period in case the Log Forwarder or ESA requires maintenance.
Event Hub Name DLQ: The dead-letter queue (DLQ) Event Hub name. The Log Forwarder uses this Event Hub when ESA is temporarily unavailable. Messages from the DLQ Event Hub can be re-processed by another instance of the Log Forwarder, either manually or on a schedule, once ESA connectivity is restored.
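
To reason about the capacity parameter, the sketch below estimates how many Standard-tier throughput units an expected audit log stream would need. The per-unit ingress limits used here (about 1 MB per second or 1000 events per second, whichever is reached first) are assumptions to verify against the Azure Event Hub Quotas documentation. The function name and the example rates are hypothetical.

```python
import math

# Assumed Standard-tier ingress limits per throughput unit; verify against
# the current Azure Event Hubs quota documentation before relying on them.
TU_INGRESS_EVENTS_PER_S = 1000
TU_INGRESS_BYTES_PER_S = 1_000_000


def required_throughput_units(events_per_s: float, avg_event_bytes: float) -> int:
    """Estimate throughput units needed for a steady ingress rate.

    Whichever limit (event count or byte volume) is hit first decides the result.
    """
    by_count = events_per_s / TU_INGRESS_EVENTS_PER_S
    by_bytes = events_per_s * avg_event_bytes / TU_INGRESS_BYTES_PER_S
    return max(1, math.ceil(max(by_count, by_bytes)))


if __name__ == "__main__":
    # Hypothetical stream: 2500 batched records per second, 200 bytes each.
    print(required_throughput_units(2500, 200))
```

The estimate covers steady-state ingress only. Bursts and the egress side, where the Log Forwarder reads from Event Hub, need their own headroom.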

Monitoring Log Forwarder Performance

Azure Event Hub Metrics: Any positive value in the ‘Throttled Requests’ metric indicates that the audit log rate from the protect function is too high. Recommended actions include:
- Increase the aggregation and batching intervals of the protect function by raising PTY_CORE_FLUSHINTERVAL and MAX_WAIT_TIME
- Increase the number of partitions for the Event Hub
- Purchase additional capacity units for the Event Hub
- Use a higher Event Hub namespace tier
Azure Event Hub Metrics for DLQ Event Hub: Any positive value in the ‘Incoming Messages’ metric indicates that not all audit logs are being delivered to ESA. Review whether the connection to ESA is set up as described in Audit Log Forwarder Installation.
Protect Function Logs: If protect function is unable to send logs to Event Hub, it will log the following message:
```
Failed to forward {n} events to Event Hub
```
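
If protect function logs are exported as plain text, a small script can total the events reported in these messages. This is a minimal sketch that assumes the placeholder {n} is rendered as a decimal integer and that each log entry sits on one line. The function name and the sample lines are hypothetical.

```python
import re
from typing import Iterable

# Matches the message shown above; assumes {n} appears as a decimal integer.
FAILED_FORWARD = re.compile(r"Failed to forward (\d+) events to Event Hub")


def count_unforwarded_events(log_lines: Iterable[str]) -> int:
    """Sum the event counts from all 'Failed to forward' messages."""
    total = 0
    for line in log_lines:
        match = FAILED_FORWARD.search(line)
        if match:
            total += int(match.group(1))
    return total


if __name__ == "__main__":
    # Hypothetical exported log lines.
    sample = [
        "2024-01-01T00:00:00Z Failed to forward 12 events to Event Hub",
        "2024-01-01T00:00:05Z Request completed",
        "2024-01-01T00:00:10Z Failed to forward 5 events to Event Hub",
    ]
    print(count_unforwarded_events(sample))
```

A non-zero total suggests that some audit records did not reach Event Hub and that the throttling metrics above are worth reviewing.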

Count the number of protect operations: Query the logs in the Log Analytics workspace of the Protect Service or Log Forwarder functions:
```
traces
| where timestamp >= ago(20h)
| where message has 'additional_info'
| parse message with * "cnt\":" Count: long * ",\"correlation" *
| summarize count_sum = sum(Count)
```

View the number of function instances on a graph: Query the logs in the Log Analytics workspace of the Protect Service or Log Forwarder functions:
```
requests
| where timestamp >= ago(2h)
| summarize InstanceCount = dcount(cloud_RoleInstance) by bin(timestamp, 1s)
| order by timestamp desc
| render timechart
```