Trusty fleet build queue
Incident Report for CircleCI

Support alerted us that multiple customers' builds were sitting in the queue for long periods. SRE investigated and found that our Trusty fleet was unbalanced: some jobs were starved of resources even though our monitoring system showed plenty of capacity.

The root cause was an alert that triggered during a short period when our on-call coverage was unavailable. The alert remained triggered even after full on-call coverage was restored.

We will review our on-call handoff process to prevent this from happening again.

Posted May 21, 2017 - 12:44 PDT

Over the weekend, some Trusty build jobs were starved of resources and unable to be queued.
Posted May 21, 2017 - 12:40 PDT