Resolving high load issues in AEM publish servers due to Time-to-Live (TTL) caching

This article addresses load spikes on AEM publish servers caused by a 5-minute TTL caching strategy in AEM as a Cloud Service - Sites. The short TTL makes cached content expire frequently, so the publish servers must handle many simultaneous requests for dynamic content each time the cache is refreshed.

Description

Environment

Product: Adobe Experience Manager (AEM) as a Cloud Service - Sites
Environment: Development and SQA environments
Configuration: 5-minute TTL set in Dispatcher TTL filter

Issue/Symptoms

  • Load spikes on publish servers within the 5-minute TTL interval.
  • Increased traffic to publish instances after cache invalidation.
  • Scalability and downtime concerns during high-load scenarios.
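
The symptoms above can be related to the TTL with some rough arithmetic. The sketch below estimates how many cache-refill requests per second reach the publish tier for a given TTL. The function name, the parameters, and the sample numbers are illustrative assumptions, not measurements, and the model assumes every cacheable URL is requested at least once per TTL window on every dispatcher.

```python
def refill_rate_per_second(cacheable_urls: int, ttl_seconds: int, dispatchers: int = 1) -> float:
    """Rough average rate of cache-refill requests reaching the publish tier.

    Assumption: each cacheable URL is requested at least once per TTL window
    on every dispatcher, so it is re-fetched once per window per dispatcher.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return cacheable_urls * dispatchers / ttl_seconds


# Hypothetical site: 6,000 cacheable URLs behind 2 dispatchers.
five_minute_ttl = refill_rate_per_second(6000, 300, dispatchers=2)   # 40.0 requests/s
one_hour_ttl = refill_rate_per_second(6000, 3600, dispatchers=2)     # about 3.3 requests/s
print(f"5-minute TTL: {five_minute_ttl:.1f} req/s, 1-hour TTL: {one_hour_ttl:.1f} req/s")
```

This is an average. If many entries were cached at about the same moment, their refills cluster at the end of the same TTL window, so the observed peak can be well above this figure.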

Resolution

To address the issue, follow these steps:

  1. Conduct performance tests in Development (DEV) or Software Quality Assurance (SQA) environments before applying changes to production. Use tools like Grafana to monitor server load and identify bottlenecks or excessive resource utilization.
  2. Verify that .ttl files are correctly generated in the dispatcher cache for all relevant content. Ensure that cached content expires and refreshes as expected without straining backend services.
  3. Confirm that your AEM environment is configured to scale based on traffic demands. Although AEM’s publish servers are designed for high-throughput scenarios, testing should validate their ability to manage increased loads from frequent cache invalidation.
  4. Consider alternative caching strategies like Sling Dynamic Include (SDI) if feasible. SDI can reduce load by dynamically including frequently changing components without relying solely on dispatcher-level caching.
  5. Collaborate with Adobe support engineers during testing to monitor critical metrics and optimize configurations. Share test results from tools like Splunk or Grafana for further analysis.
  6. After successful validation in lower environments, schedule a controlled rollout of changes to production. Continue monitoring server performance post-deployment to ensure stability and promptly address any unforeseen issues.
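
The check in step 2 can be spot-checked with a short script wherever the dispatcher cache directory is readable on disk, for example in a local Dispatcher setup. The sketch below is a minimal example under stated assumptions: with TTL-based caching enabled, Dispatcher writes an empty `<cached file>.ttl` file next to each cached file and sets its modification time to the expiry time. The function name, the return shape, and the handling of `.stat` and `.h` auxiliary files are illustrative choices, not part of any Adobe tooling.

```python
import time
from pathlib import Path


def audit_ttl_files(docroot, now=None):
    """Classify cached files under docroot by the state of their .ttl companion.

    Returns three lists of paths: (missing, expired, fresh).
    Assumption: the .ttl file's modification time is the cache expiry time.
    """
    now = time.time() if now is None else now
    missing, expired, fresh = [], [], []
    for path in Path(docroot).rglob("*"):
        # Skip directories and dispatcher auxiliary files (assumed names/suffixes).
        if not path.is_file() or path.suffix in (".ttl", ".h") or path.name == ".stat":
            continue
        ttl_file = path.with_name(path.name + ".ttl")
        if not ttl_file.exists():
            missing.append(path)      # cached without a TTL marker
        elif ttl_file.stat().st_mtime <= now:
            expired.append(path)      # due for refresh on the next request
        else:
            fresh.append(path)        # still served from cache
    return missing, expired, fresh
```

Calling `audit_ttl_files("/path/to/cache/docroot")` (a placeholder path) returns the three lists. A long `missing` list suggests that the publish tier is not sending cache headers for that content, or that TTL-based caching is not enabled for it.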

By following these steps, you can mitigate performance impacts from short TTL values, maintain scalability, and minimize downtime risks.
