AWS S3 was offline from around 9:30 AM PST until roughly 2 PM PST.
The S3 failure caused further disruptions to AWS EBS, EC2, and ECR, because S3 stores volumes, container images, and AMIs. As a result, CircleCI was unable to start new servers or new build containers. We were not the only service affected: our upstream providers were also severely impacted. For example, we stopped receiving webhooks from GitHub due to the disruption.
The immediate impact of the outage was a growing backlog of new jobs. Because the outage occurred during the time of day when we typically ramp up to meet demand, our server count was below the capacity we needed, which slowed our ability to process jobs in the queue.
While we waited for AWS to restore S3, we worked to ensure we did not lose any running servers and prepared for the influx of build jobs that would arrive once webhooks were restored. We continued processing the backlog for approximately two hours after AWS restored services, working within the AWS-enforced recovery limits as we ramped up our server capacity.
AWS S3 Service disruption report: https://aws.amazon.com/message/41926/