Log Forwarder Performance
The log forwarder architecture is designed to minimize the number of connections and reduce the network bandwidth required to send audit logs to ESA. This is achieved through batching and aggregation at two levels.
The first level is in the protect function instances, where audit logs from consecutive requests to an instance are batched and aggregated. The second level is in the Azure Event Hub instance, where log records from different protect function instances are batched again and sent to the log forwarder function, which aggregates them.
The following sections show how to configure the deployment for different anticipated audit log volumes, and how to monitor deployment resources to detect problems so that audit records are not lost.
Protect Service Function Environment Variables
PTY_CORE_FLUSHINTERVAL: Defines how long audit logs are aggregated before they are sent to Azure Event Hub. The default value is ten seconds. Audit logs are always aggregated into one-minute buckets, so a value greater than sixty seconds mainly affects the batching interval.
MAX_WAIT_TIME: Defines how long aggregated audit logs are batched before they are sent to Azure Event Hub. The default value is ten seconds.
Increasing MAX_WAIT_TIME may result in:
- Increased latency or lag of audit logs arriving to Event Hub and therefore ESA
- Increased throughput rates due to fewer network requests overall
- Increased aggregation rates for values up to one minute
Lowering MAX_WAIT_TIME may result in:
- Reduced latency or lag for audit logs to arrive to Event Hub and therefore ESA
- Reduced throughput rates due to higher number of network requests overall
- Reduced aggregation rates for values up to one minute
It is not recommended to set MAX_WAIT_TIME lower in production workloads as it may overload the Event Hubs service. Lowering MAX_WAIT_TIME may be beneficial for speeding up log delivery to ESA in dev/test environments.
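The trade-off between interval length, aggregation, and request count can be illustrated with a minimal Python sketch. It is not the protect function's implementation: the `(epoch_seconds, key)` event layout, the function names, and the assumption of at most one send per wait interval are all hypothetical and used only for illustration.

```python
from collections import Counter


def aggregate_by_minute(events):
    """Merge events that share a key and fall into the same one-minute bucket.

    `events` is an iterable of (epoch_seconds, key) pairs. Both the tuple
    layout and the meaning of `key` are illustrative assumptions.
    """
    buckets = Counter()
    for ts, key in events:
        minute_start = int(ts) // 60 * 60  # round down to the start of the minute
        buckets[(minute_start, key)] += 1
    return dict(buckets)


def sends_per_hour(max_wait_seconds):
    """Approximate sends per instance per hour, assuming one send per interval."""
    return 3600 // max_wait_seconds


if __name__ == "__main__":
    events = [(0, "a"), (59, "a"), (60, "a"), (61, "b")]
    print(aggregate_by_minute(events))
    print(sends_per_hour(10), sends_per_hour(60))
```

Under these assumptions, raising the wait time from ten to sixty seconds cuts the number of sends per instance by a factor of six, while records that fall within the same minute collapse into a single aggregated entry.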
Log Forwarder ARM Template Parameters
- New Event Hub Namespace Sku Name and New Event Hub Namespace Sku Tier directly affect the quotas applied to new Event Hub instances. Review the quotas for the selected tier in the Azure documentation: Azure Event Hub Quotas
- New Event Hub Namespace Sku Capacity: The number of Event Hubs throughput units for the Basic or Standard tier (0 to 20), or the number of premium units for the Premium tier (0 to 10). Capacity directly controls the purchased throughput of the Event Hub instance. Review details in the Azure documentation: Event Hub Instance Throughput
- New Event Hub Partition Count: The number of partitions represents the level of parallel log streams in the Event Hub. It is proportional to the throughput capacity of the Event Hub instance. If the number of partitions is too low and the volume of audit logs is too high, a throughput ceiling may be reached on the Event Hub and some audit records sent from the protect function may be lost. Review details in the Azure documentation: Event Hub Scalability
- New Event Hub Audit Log Retention In Days: The number of days audit logs remain available in Event Hub. Applies to both the primary Event Hub instance and the dead-letter queue Event Hub instance. Although audit logs are processed by Log Forwarder in near real time, it may be beneficial to keep them available in Event Hub for an extended period in case Log Forwarder or ESA requires maintenance.
- Event Hub Name DLQ: The dead-letter queue (DLQ) Event Hub name. This Event Hub is used by Log Forwarder when ESA is temporarily unavailable. Messages from the DLQ Event Hub can be re-processed by another instance of Log Forwarder, either manually or on a schedule, once ESA connectivity is restored.
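As a rough aid for choosing a capacity value, the sketch below estimates how many throughput units an anticipated load might need. It assumes the published Standard-tier ingress limit of 1 MB per second or 1,000 events per second per throughput unit, whichever is reached first. The function name and inputs are hypothetical, and "events" here means Event Hub events after batching, not individual audit records. Verify the limits against the current Azure documentation before relying on the result.

```python
import math

# Assumed per-throughput-unit ingress limits for the Basic/Standard tiers.
# 1 MB is treated as 10**6 bytes, which is the more conservative reading.
INGRESS_BYTES_PER_TU = 1_000_000
INGRESS_EVENTS_PER_TU = 1_000


def estimate_throughput_units(events_per_second, avg_event_bytes):
    """Estimate throughput units needed for a given ingress load.

    Returns the larger of the event-rate and byte-rate requirements,
    rounded up, with a minimum of one unit.
    """
    by_events = events_per_second / INGRESS_EVENTS_PER_TU
    by_bytes = events_per_second * avg_event_bytes / INGRESS_BYTES_PER_TU
    return max(1, math.ceil(max(by_events, by_bytes)))


if __name__ == "__main__":
    # 2,500 events/s of 200 bytes each: limited by event rate.
    print(estimate_throughput_units(2500, 200))
    # 500 events/s of 4,000 bytes each: limited by byte rate.
    print(estimate_throughput_units(500, 4000))
```

The estimate covers ingress only and ignores partition count, egress limits, and burst behavior, so treat it as a starting point and confirm with the 'Throttled Requests' metric.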
Monitoring Log Forwarder Performance
Azure Event Hub Metrics: Any positive value in the ‘Throttled Requests’ metric indicates that the rate of audit logs from the protect function is too high. Recommended actions include:
- Increase the aggregation and batching intervals of the protect function by increasing the values of PTY_CORE_FLUSHINTERVAL and MAX_WAIT_TIME
- Increase the number of partitions for the Event Hub
- Purchase additional capacity units for the Event Hub
- Use a higher Event Hub namespace tier
Azure Event Hub Metrics for DLQ Event Hub: Any positive value in the ‘Incoming Messages’ metric indicates that not all audit logs are being delivered to ESA. Review whether the connection to ESA is set up as described in Audit Log Forwarder Installation.
Protect Function Logs: If protect function is unable to send logs to Event Hub, it will log the following message:
    Failed to forward {n} events to Event Hub
Count number of protect operations: Query logs in Log Analytics workspace of Protect Service or Log Forwarder functions:
    traces
    | where timestamp >= ago(20h)
    | where message has 'additional_info'
    | parse message with * "cnt\":" Count:long * ",\"correlation" *
    | summarize count_sum = sum(Count)
View number of function instances on a graph: Query logs in Log Analytics workspace of Protect Service or Log Forwarder functions:
    requests
    | summarize InstanceCount = dcount(cloud_RoleInstance) by bin(timestamp, 1s)
    | where timestamp >= ago(2h)
    | order by timestamp desc
    | render timechart
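When protect function logs are exported as plain text, the failure message can also be tallied offline. The following is a minimal sketch that assumes each log line contains the message exactly as "Failed to forward {n} events to Event Hub", with {n} as a decimal integer. The function name and the sample lines are hypothetical.

```python
import re

# Pattern assumed from the documented message; {n} is taken to be an integer.
FAILED_FORWARD = re.compile(r"Failed to forward (\d+) events to Event Hub")


def count_dropped_events(log_lines):
    """Sum the event counts reported by all matching failure messages."""
    total = 0
    for line in log_lines:
        match = FAILED_FORWARD.search(line)
        if match:
            total += int(match.group(1))
    return total


if __name__ == "__main__":
    sample = [
        "2024-01-01T00:00:00Z Failed to forward 12 events to Event Hub",
        "2024-01-01T00:00:10Z request completed",
        "2024-01-01T00:00:20Z Failed to forward 3 events to Event Hub",
    ]
    print(count_dropped_events(sample))
```

A non-zero total suggests reviewing the Event Hub metrics for throttling before adjusting the batching intervals.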