Content Delivery Performance Issues

Incident Report for Amplience

Postmortem

29/10/2025

Incident Start Time: 09:08 BST

Incident End Time: 09:28 BST

Issue Summary:

On Monday, October 20, a major outage in the Amazon Web Services (AWS) US-EAST-1 region affected multiple global services. As a result, Amplience Content Delivery experienced elevated 5xx error rates, which prevented some content from displaying on customer websites.

AWS declared their incident at 08:11 BST. Because our infrastructure is designed with caching and resiliency in mind, the impact on customers was initially delayed: cached content continued to be served successfully until elevated errors began at 09:08 BST.
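
As a rough illustration of why cached content can mask an upstream outage for a while, here is a minimal Python sketch of a generic edge cache. It assumes a fixed freshness TTL plus a "stale-if-error" grace window during which an expired copy is still served if the origin fails. The class `EdgeCache`, the `OriginError` exception, and the parameter values are hypothetical and do not describe Amplience's actual caching configuration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class OriginError(Exception):
    """Hypothetical error raised when the upstream origin (e.g. a failing region) cannot respond."""


@dataclass
class Entry:
    body: str
    stored_at: float  # clock time when the entry was fetched from the origin


class EdgeCache:
    """Hypothetical edge cache: an entry is fresh for `ttl` seconds and may then be
    served stale for up to `stale_if_error` more seconds while the origin is failing."""

    def __init__(self, fetch_origin: Callable[[str], str], ttl: float,
                 stale_if_error: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_origin = fetch_origin
        self.ttl = ttl
        self.stale_if_error = stale_if_error
        self.clock = clock
        self.entries: Dict[str, Entry] = {}

    def get(self, key: str) -> str:
        now = self.clock()
        entry: Optional[Entry] = self.entries.get(key)
        # Fresh hit: the origin is not contacted, so an origin outage is invisible here.
        if entry is not None and now - entry.stored_at < self.ttl:
            return entry.body
        try:
            body = self.fetch_origin(key)
        except OriginError:
            # Origin failing: serve the expired copy while it is inside the grace window.
            if entry is not None and now - entry.stored_at < self.ttl + self.stale_if_error:
                return entry.body
            raise  # no usable copy: the client would see an error such as a 5xx
        self.entries[key] = Entry(body, now)
        return body
```

With a TTL of 300 seconds and a grace window of 1,800 seconds, for example, a page cached just before an outage would keep being served for up to 35 minutes before requests for it start failing, while content that was never cached fails immediately. Under this model, errors surface gradually as cached entries run out, rather than at the moment the upstream incident begins.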

Our engineers promptly redirected traffic away from the affected AWS region, distributing it across multiple healthy regions to restore service stability. Error rates subsided by 09:28 BST.
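
Redirecting traffic away from one region can be pictured as weighted routing in which the affected region's share drops to zero and the remaining regions absorb it in proportion to their existing weights. The minimal Python sketch below assumes exactly that model. The functions `redistribute` and `choose_region`, the example weights, and the regions other than US-EAST-1 are hypothetical, since this report does not say which regions took the traffic or how routing is implemented.

```python
import random
from typing import Dict, Set


def redistribute(weights: Dict[str, float], unhealthy: Set[str]) -> Dict[str, float]:
    """Hypothetical sketch: give unhealthy regions zero weight and rescale the
    remaining weights so they sum to 1.0, keeping their relative proportions."""
    healthy = {r: w for r, w in weights.items() if r not in unhealthy and w > 0}
    total = sum(healthy.values())
    if total == 0:
        raise ValueError("no healthy region left to receive traffic")
    return {r: (healthy[r] / total if r in healthy else 0.0) for r in weights}


def choose_region(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick a region for one request; a region with weight 0.0 is never chosen."""
    regions = list(weights)
    return rng.choices(regions, weights=[weights[r] for r in regions], k=1)[0]
```

For example, with hypothetical baseline weights of 0.5 for us-east-1 and 0.25 each for us-west-2 and eu-west-1, removing us-east-1 yields 0.5 for each of the other two regions. Reverting traffic once the region recovers then amounts to restoring the baseline weights.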

Beyond the direct impact on our services, several third-party platforms we rely on were also affected by the AWS outage. This temporarily limited how quickly we could share updates, and we also experienced delays in our Video Transcoding service.

Root Cause:

AWS later confirmed the root cause was a DNS resolution failure in their DynamoDB service, which disrupted downstream services hosted in the US-EAST-1 region.

You can view AWS’s official summary here: https://aws.amazon.com/fr/message/101925/

Corrective Actions:

- Reverted traffic to the US-EAST-1 region once AWS confirmed their incident was fully resolved.
- Enhanced internal dashboards to provide clearer visibility of regional traffic flow and routing behaviour.
- Initiated a review of our disaster recovery processes and regional redundancy configurations to further reduce the impact of future incidents.

21/10/2025

We understand that many of you are keen to receive our postmortem report. To provide a complete and transparent account, we need to allow AWS to complete their investigation and share their root cause analysis (RCA). Once we receive it, we’ll incorporate their findings into our own review, identify any actions we can take, and publish our report as soon as it is ready.

During the incident, we communicated that traffic had been routed away from the affected region. As part of our post-incident actions, we have since redirected traffic back to the original region and are not observing any ongoing issues.

Posted Oct 21, 2025 - 14:35 BST

Resolved

This incident is now resolved.

Posted Oct 20, 2025 - 10:43 BST

Monitoring

Our upstream partners have restored service, and we are monitoring.

Posted Oct 20, 2025 - 10:32 BST

Identified

We are currently seeing degraded Content Delivery performance in one region. Traffic is being redirected away from the affected region, and we are working with our upstream partners to restore normal service.

Posted Oct 20, 2025 - 10:04 BST

This incident affected: Dynamic Content.