Performance

Performance benchmarks and considerations.

1 - Function App Performance

Guidance on Function App Performance settings and considerations.

Function App Performance

Overview

Azure Function apps offer different hosting plans that directly impact the performance, scalability, and cost of Cloud Protect deployments. Understanding these plans and their characteristics is essential for optimizing your data protection operations.

Azure Function App Service Plans

Azure Functions provides several hosting options, each with different characteristics:

Consumption Plan

The Consumption plan provides automatic scaling and charges only for compute resources used during function execution. While cost-effective for sporadic workloads, this plan has limitations:

  • Cold start latency: Functions may experience delays when starting after periods of inactivity
  • Limited execution time: Maximum execution duration of 10 minutes per function invocation (the default is 5 minutes)
  • Shared infrastructure: Resources are shared across tenants, which can lead to variable performance
  • Memory constraints: Limited to 1.5 GB of memory per instance

Premium Plan

The Premium plan is the recommended option for Cloud Protect on Azure. It provides enhanced performance and enterprise-grade features:

  • Pre-warmed instances: Always-ready instances eliminate cold start delays, ensuring consistent performance
  • Enhanced compute resources: Flexible compute sizing; see the App Service Premium version 3 plan documentation
  • VNET integration: Secure connectivity to on-premises resources and Azure private networks
  • Unlimited execution duration: No time limits for long-running protection operations
  • Predictable performance: Dedicated infrastructure ensures consistent throughput
  • Better scaling control: Minimum and maximum instance count configuration

Elastic Premium Plan

The Elastic Premium plan extends the Premium plan with additional elasticity and performance optimization:

  • Rapid scale-out: Faster scaling response to demand spikes
  • Greater instance limits: Support for larger-scale deployments
  • Optimized cold start: Even faster initialization compared to standard Premium
  • Event-driven scaling: More granular scaling based on event sources
  • All Premium features: Includes VNET integration, pre-warmed instances, and unlimited execution time

Cloud Protect Recommendations

Cloud Protect on Azure recommends using either Premium or Elastic Premium plans for production deployments. These plans provide:

  1. Consistent Performance: Pre-warmed instances ensure data protection operations execute immediately without cold start delays
  2. Sufficient Resources: Memory and CPU resources adequate for cryptographic operations and high-volume data processing
  3. Reliability: Dedicated infrastructure for predictable performance and SLA compliance
  4. Security: VNET integration enables secure communication with ESA (Enterprise Security Administrator) and other protected resources
  5. Scalability: Automatic scaling handles variable workloads while maintaining performance standards

Performance Considerations

When deploying Cloud Protect on Azure Functions, consider the following factors:

Instance Sizing

Select appropriate instance sizes based on your workload:

  • EP1 (Elastic Premium 1): 1 vCPU, 3.5 GB RAM - suitable for moderate workloads
  • EP2 (Elastic Premium 2): 2 vCPU, 7 GB RAM - recommended for standard production deployments
  • EP3 (Elastic Premium 3): 4 vCPU, 14 GB RAM - for high-volume or resource-intensive operations
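
As a rough illustration of the sizing guidance above, the following sketch picks the smallest listed size that satisfies a CPU and memory requirement. The vCPU and RAM figures come from the list above; the `PlanSize` type and the `smallest_fitting_size` helper are hypothetical names used only for this example, and the requirement values must come from your own measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanSize:
    name: str
    vcpus: int
    memory_gb: float


# Sizes as listed in this section, ordered smallest to largest.
ELASTIC_PREMIUM_SIZES = [
    PlanSize("EP1", 1, 3.5),
    PlanSize("EP2", 2, 7.0),
    PlanSize("EP3", 4, 14.0),
]


def smallest_fitting_size(required_vcpus: int,
                          required_memory_gb: float) -> Optional[PlanSize]:
    """Return the smallest listed size meeting both requirements, or None."""
    for size in ELASTIC_PREMIUM_SIZES:
        if size.vcpus >= required_vcpus and size.memory_gb >= required_memory_gb:
            return size
    return None  # No listed size is large enough.


if __name__ == "__main__":
    # Hypothetical requirement: 2 vCPUs and 4 GB of memory per instance.
    print(smallest_fitting_size(2, 4.0))
```

A `None` result means the per-instance requirement exceeds the largest listed size, in which case scaling out across more instances is the remaining option.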

Scaling Configuration

Configure scaling parameters to match your protection requirements:

  • Minimum instances: Set to at least 1-2 pre-warmed instances to eliminate cold starts
  • Maximum instances: Configure based on peak load expectations and budget constraints
  • Scale-out rules: Define appropriate triggers based on CPU, memory, or queue depth metrics
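
One possible way to derive the minimum and maximum instance counts from a peak load estimate is sketched below. This is a planning aid under stated assumptions, not a Cloud Protect or Azure API: the function name, the 20% headroom default, and the per-instance throughput figure are all hypothetical, and the per-instance figure in particular has to be measured in your own environment.

```python
import math
from typing import Tuple


def plan_instance_bounds(peak_requests_per_sec: float,
                         per_instance_requests_per_sec: float,
                         headroom: float = 0.2,
                         min_prewarmed: int = 1) -> Tuple[int, int]:
    """Return (minimum, maximum) instance counts for a plan.

    per_instance_requests_per_sec is an assumed, measured value;
    headroom adds a safety margin on top of the expected peak.
    """
    if peak_requests_per_sec < 0 or per_instance_requests_per_sec <= 0:
        raise ValueError("load must be >= 0 and per-instance rate must be > 0")
    if min_prewarmed < 1:
        raise ValueError("keep at least one pre-warmed instance")
    needed = math.ceil(
        peak_requests_per_sec * (1 + headroom) / per_instance_requests_per_sec
    )
    # The maximum can never be lower than the pre-warmed minimum.
    return min_prewarmed, max(needed, min_prewarmed)


if __name__ == "__main__":
    # Hypothetical numbers: 900 requests/s at peak, 100 requests/s per instance.
    print(plan_instance_bounds(900, 100))
```

The resulting maximum should still be checked against budget constraints before it is applied to the plan.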

Network Considerations

  • Use VNET integration to reduce latency when communicating with ESA servers
  • Enable private endpoints for secure, high-performance connectivity to Azure services (Storage, Key Vault)
  • Consider proximity placement to co-locate Function apps with dependent resources

Monitoring and Optimization

  • Monitor execution duration metrics to identify bottlenecks
  • Track instance count and scaling patterns to optimize configuration
  • Review memory and CPU utilization to right-size your plan
  • Set up Application Insights for detailed performance telemetry and diagnostics

Cost vs. Performance Trade-offs

While Premium and Elastic Premium plans have higher baseline costs compared to Consumption, they provide:

  • Predictable performance and cost structure
  • Reduced total cost for steady-state workloads (no per-execution charges)
  • Better resource utilization through persistent instances
  • Lower operational overhead from consistent behavior

For Cloud Protect deployments handling sensitive data with compliance requirements, the Premium/Elastic Premium investment ensures reliable, performant data protection operations.

2 - Log Forwarder Performance

Guidance on Log Forwarder Performance settings and considerations.

    Log Forwarder Performance

    The Log Forwarder architecture is optimized to minimize the number of connections and reduce the overall network bandwidth required to send audit logs to ESA. This is achieved with batching and aggregation at two levels.

    The first level is in the protect function instances, where audit logs from consecutive requests to an instance are batched and aggregated. The second level is in the Azure Event Hub instance, where log records from different protect function instances are batched again and sent to the log forwarder function, which aggregates them.
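
The two aggregation levels can be pictured with a minimal sketch. It assumes audit records are grouped into one-minute buckets, as this page describes for the flush interval, and that records with the same bucket and attributes are merged by summing their counts. The `AuditRecord` fields and the `aggregate` function are hypothetical and do not reflect the actual audit log schema.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class AuditRecord(NamedTuple):
    epoch_seconds: int  # hypothetical field: event time, or bucket start once aggregated
    operation: str      # hypothetical field, e.g. "protect"
    data_element: str   # hypothetical field
    count: int


def aggregate(records: Iterable[AuditRecord]) -> List[AuditRecord]:
    """Merge records that share a one-minute bucket and attributes."""
    totals: Counter = Counter()
    for r in records:
        bucket_start = r.epoch_seconds - r.epoch_seconds % 60
        totals[(bucket_start, r.operation, r.data_element)] += r.count
    return [AuditRecord(b, op, de, n) for (b, op, de), n in sorted(totals.items())]


if __name__ == "__main__":
    # Level 1: each protect function instance aggregates its own requests.
    instance_a = aggregate([
        AuditRecord(120, "protect", "ssn", 1),
        AuditRecord(150, "protect", "ssn", 1),
        AuditRecord(185, "protect", "ssn", 1),
    ])
    instance_b = aggregate([AuditRecord(130, "protect", "ssn", 1)])
    # Level 2: the log forwarder aggregates records from all instances.
    print(aggregate(instance_a + instance_b))
```

Because aggregation only sums counts within a bucket, applying it a second time across instances reduces the number of records further without changing the totals.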

    The following sections show how to configure the deployment to accommodate different anticipated audit log patterns. They also show how to monitor deployment resources to detect problems so that audit records are not lost.

    Protect Service Function Environment Variables

    • PTY_CORE_FLUSHINTERVAL: Defines how long audit logs are aggregated before they are sent to Azure Event Hub. The default value is ten seconds. Audit logs are always aggregated into one-minute buckets, so a value greater than sixty seconds mostly affects the batching interval.

    • MAX_WAIT_TIME: Defines how long aggregated audit logs are batched before they are sent to Azure Event Hub. The default value is ten seconds.

      Increasing MAX_WAIT_TIME may result in:

      1. Increased latency or lag of audit logs arriving at Event Hub and therefore ESA

      2. Increased throughput rates due to fewer network requests overall

      3. Increased aggregation rates for values up to one minute

      Lowering MAX_WAIT_TIME may result in:

      1. Reduced latency or lag of audit logs arriving at Event Hub and therefore ESA

      2. Reduced throughput rates due to a higher number of network requests overall

      3. Reduced aggregation rates for values up to one minute

      Lowering MAX_WAIT_TIME is not recommended for production workloads because it may overload the Event Hubs service. It may be beneficial for speeding up log delivery to ESA in dev/test environments.
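
The trade-off between latency and the number of network requests can be estimated with simple arithmetic. The sketch below assumes a simplified model in which every busy protect function instance sends at most one batch per MAX_WAIT_TIME interval; the real sender may behave differently (for example, if batches are also flushed by size), so treat the numbers as an upper-bound estimate. The function name is hypothetical.

```python
def batching_estimate(max_wait_seconds: float, instances: int) -> dict:
    """Estimate send volume and added delay for a MAX_WAIT_TIME value.

    Simplified model: one batch per instance per interval.
    """
    if max_wait_seconds <= 0 or instances < 0:
        raise ValueError("max_wait_seconds must be > 0 and instances >= 0")
    return {
        # Upper bound on batches sent to Event Hub per hour.
        "sends_per_hour": instances * 3600 / max_wait_seconds,
        # A record may wait up to one full interval before it is sent.
        "worst_case_batching_delay_s": max_wait_seconds,
    }


if __name__ == "__main__":
    # Hypothetical deployment with 5 active instances.
    for wait in (5, 10, 30):
        print(wait, batching_estimate(wait, instances=5))
```

In this model, tripling the interval from ten to thirty seconds cuts the number of sends to one third while adding up to twenty seconds of delay per record.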

    Log Forwarder ARM Template Parameters

    • New Event Hub Namespace Sku Name and New Event Hub Namespace Sku Tier directly affect the quotas applied to new Event Hub instances. Review the quotas for the selected tier in the Azure documentation: Azure Event Hub Quotas
    • New Event Hub Namespace Sku Capacity: For the Basic or Standard tiers, the number of Event Hubs throughput units, from 0 to 20. For the Premium tier, the number of Event Hubs premium units, from 0 to 10. Capacity directly controls the purchased throughput of the Event Hub instance. Review details in the Azure documentation: Event Hub Instance Throughput
    • New Event Hub Partition Count: The number of partitions represents the level of parallel log streams in the Event Hub. It is proportional to the throughput capacity of the Event Hub instance. If the number of partitions is too low and the volume of audit logs is too high, a throughput ceiling may be reached on the Event Hub and some audit records sent from the protect function may be lost. Review details in the Azure documentation: Event Hub Scalability
    • New Event Hub Audit Log Retention In Days: The number of days audit logs remain available in Event Hub. Applies to both the primary Event Hub instance and the dead-letter queue Event Hub instance. While audit logs are processed by Log Forwarder in near real time, it may be beneficial to keep them available in Event Hub for an extended period in case Log Forwarder or ESA requires maintenance.
    • Event Hub Name DLQ: The dead-letter queue (DLQ) Event Hub name. This Event Hub is used by Log Forwarder when ESA is temporarily unavailable. Messages from the DLQ Event Hub can be re-processed by another instance of Log Forwarder, either manually or on a schedule, once ESA connectivity is restored.
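
To get a first estimate for the capacity parameter on the Standard tier, the expected ingress can be compared with the per-unit limits. The sketch below assumes the commonly documented ingress allowance of 1 MB per second or 1000 events per second per throughput unit, whichever is reached first; verify these figures against the current Azure documentation before relying on them. The function name is hypothetical, and "events" here means batched messages sent by protect function instances, not individual protect operations.

```python
import math

# Assumed Standard-tier ingress limits per throughput unit (verify in Azure docs).
EVENTS_PER_SEC_PER_TU = 1000
BYTES_PER_SEC_PER_TU = 1_000_000  # 1 MB, treated conservatively as 10^6 bytes


def required_throughput_units(events_per_sec: float, avg_event_bytes: float) -> int:
    """Estimate throughput units needed for a given ingress load."""
    if events_per_sec < 0 or avg_event_bytes < 0:
        raise ValueError("inputs must be non-negative")
    by_count = events_per_sec / EVENTS_PER_SEC_PER_TU
    by_bytes = events_per_sec * avg_event_bytes / BYTES_PER_SEC_PER_TU
    # Whichever limit is hit first determines the requirement.
    return max(1, math.ceil(max(by_count, by_bytes)))


if __name__ == "__main__":
    # Hypothetical load: 2500 messages/s averaging 200 bytes each.
    print(required_throughput_units(2500, 200))
```

The estimate covers ingress only. Partition count and egress toward the log forwarder function should be reviewed separately.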

    Monitoring Log Forwarder Performance

    • Azure Event Hub Metrics: Any positive value in the ‘Throttled Requests’ metric indicates that the audit log rate from the protect function is too high. Recommended actions include:

      • Increase aggregation and batching intervals of Protect function by increasing values of PTY_CORE_FLUSHINTERVAL and MAX_WAIT_TIME
      • Increase number of partitions for Event Hub
      • Purchase additional capacity units for Event Hub
      • Use a higher Event Hub namespace tier
    • Azure Event Hub Metrics for DLQ Event Hub: Any positive value in the ‘Incoming Messages’ metric indicates that not all audit logs are being delivered to ESA. Verify that the connection to ESA is set up as described in Audit Log Forwarder Installation

    • Protect Function Logs: If protect function is unable to send logs to Event Hub, it will log the following message:

      Failed to forward {n} events to Event Hub
      
    • Count number of protect operations: Query logs in Log Analytics workspace of Protect Service or Log Forwarder functions:

      traces
      | where timestamp >= ago(20h)
      | where message has 'additional_info'
      | parse message with * "cnt\":" Count: long *  ",\"correlation" *
      | summarize count_sum = sum(Count)
      
    • View number of function instances on a graph: Query logs in Log Analytics workspace of Protect Service or Log Forwarder functions:

      requests
      // Filter to the time window first so that less data is summarized
      | where timestamp >= ago(2h)
      | summarize InstanceCount = dcount(cloud_RoleInstance) by bin(timestamp, 1s)
      | order by timestamp desc
      | render timechart
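
When protect function logs are exported as plain text, the "Failed to forward {n} events to Event Hub" message described above can be tallied to estimate how many events were not delivered. The sketch below assumes that {n} is rendered as a plain decimal integer and that each log entry sits on one line; the function name and the sample lines are hypothetical.

```python
import re
from typing import Iterable

# Matches the message format documented on this page; {n} is assumed to be an integer.
FAILED_FORWARD = re.compile(r"Failed to forward (\d+) events to Event Hub")


def count_failed_events(log_lines: Iterable[str]) -> int:
    """Sum the event counts reported by failed-forward log messages."""
    total = 0
    for line in log_lines:
        match = FAILED_FORWARD.search(line)
        if match:
            total += int(match.group(1))
    return total


if __name__ == "__main__":
    # Hypothetical exported log lines.
    sample = [
        "2024-01-01T00:00:00Z Failed to forward 12 events to Event Hub",
        "2024-01-01T00:00:05Z request completed",
        "2024-01-01T00:00:10Z Failed to forward 3 events to Event Hub",
    ]
    print(count_failed_events(sample))
```

A non-zero total is a signal to review the Event Hub throttling metrics and capacity settings.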