Yesterday we experienced an issue that degraded workflows for some customers beginning at 20:19 UTC, with service fully restored by 21:00 UTC.
The event was triggered by a period of heightened packet loss and latency at an upstream provider. We began to see increased API error rates within our system, which prevented workflow jobs from starting. Once the upstream provider recovered, workflows resumed processing and we were able to clear the backlog while continuing to monitor closely.