Final Update: Wednesday, November 8th 2017 18:20 UTC
We’ve confirmed that all systems are back to normal as of 11/8/2017 17:36. Our logs show the incident started on 11/8/2017 12:00 and took 5 hours and 36 minutes to resolve. During that time, customers experienced a delay between when their builds completed and when the release was triggered. Sorry for any inconvenience this may have caused.
- Root Cause: The failure was due to a stuck job in one of our backend services.
- Chance of Re-occurrence: High
- Lessons Learned: We are working both on minimizing resource-intensive activities in our post-deployment steps and on targeting monitors specifically to detect post-deployment issues in the future.
- Incident Timeline: 5 hours & 36 minutes – 11/8/2017 12:00 UTC through 11/8/2017 17:36 UTC
Sincerely,
Randy
Initial Update: Wednesday, November 8th 2017 16:34 UTC
- We're investigating a delay in builds triggering Release Management releases in West Europe.
- We have traced this back to a stuck job that processes these triggers.
- We have rebooted this job and collected a dump of it for further investigation of the issue.
- There is currently a backlog of triggers to process, so customers can expect a delay between when their builds complete and when the release is triggered.
- We expect to be caught up with the backlog within 1 to 2 hours. We will update this post when complete.
Next Update: Before Wednesday, November 8th 2017 17:05 UTC
Sincerely,
Randy