According to the company’s Service Health Dashboard, this week’s Amazon Web Services Inc. outage was caused by a “configuration error”.
Although AWS has stayed quiet about the issue, which has affected popular sites such as GitHub and Heroku, and has not yet responded to multiple media requests for clarification, the dashboard provides a running record of the problems. One of those issues, initially misdiagnosed, affected Amazon S3 storage services in the US-STANDARD (US East) region in the early hours of Monday morning.
AWS stated on the dashboard that Amazon S3 experienced high error rates due to a configuration problem in one of the systems Amazon S3 uses for managing request traffic. “We tried restorative measures that didn’t solve our problem because we initially pursued the wrong root cause. We then identified the root cause and restored normal operations. The increased S3 API error rate caused significant impact on services that depended on Amazon S3 during the event. These included EMR [Elastic MapReduce], which relies upon S3 for object storage, and EC2, which relies upon S3 to store some AMIs [Amazon Machine Images].”
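For readers wondering what an “increased S3 API error rate” looks like from the client side, below is a minimal sketch of how an application might defend itself during an event like this one. It is not from AWS or the article; the bucket and key names are hypothetical, and it assumes the boto3 library is installed with AWS credentials configured. It uses botocore’s built-in retry modes, which apply exponential backoff rather than hammering an already degraded endpoint.

```python
# A hypothetical sketch (not AWS's guidance) of retrying S3 requests
# during elevated API error rates. Assumes boto3/botocore are installed.
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# botocore's "adaptive" retry mode adds exponential backoff with jitter
# and client-side rate limiting when the service starts returning errors.
retry_config = Config(
    region_name="us-east-1",  # the affected US-STANDARD / US East region
    retries={"max_attempts": 10, "mode": "adaptive"},
)

s3 = boto3.client("s3", config=retry_config)


def fetch_object(bucket: str, key: str) -> bytes | None:
    """Fetch an S3 object, returning None once retries are exhausted."""
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        return response["Body"].read()
    except ClientError as err:
        # 500/503 responses surface here only after botocore gives up.
        print(f"S3 request failed after retries: {err}")
        return None


if __name__ == "__main__":
    # Hypothetical bucket and key, for illustration only.
    data = fetch_object("example-bucket", "example-key")
```

Services like EMR and EC2 that depend on S3 face the same trade-off this sketch illustrates: retries smooth over brief error spikes, but during a prolonged outage they only delay the failure.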
Here is the complete series of dashboard notifications:
[RESOLVED] Elevated request errors in US-STANDARD (source: AWS).
