Channel: MSDN Blogs

Automated deployments Performance Degradation in West Europe – 11/08 – Mitigated

Final Update: Wednesday, November 8th 2017 18:20 UTC

We’ve confirmed that all systems are back to normal as of 11/8/2017 17:36. Our logs show the incident started on 11/8/2017 12:00 and that, during the 5 hours and 36 minutes it took to resolve the issue, customers experienced a delay between when their builds completed and when the release was triggered. Sorry for any inconvenience this may have caused.

  • Root Cause: The failure was due to a stuck job in one of our backend services.
  • Chance of Re-occurrence: High
  • Lessons Learned: We are working both on minimizing resource-intensive activities in our post-deployment steps and on targeting monitors specifically to detect post-deployment issues in the future.
  • Incident Timeline: 5 hours & 36 minutes – 11/8/2017 12:00 UTC through 11/8/2017 17:36

Sincerely,
Randy


Initial Update: Wednesday, November 8th 2017 16:34 UTC

  • We're investigating a delay in builds triggering Release Management releases in West Europe.
  • We have traced this back to a stuck job that processes these triggers.
  • We have rebooted this job and collected a dump of it for further investigation of the issue.
  • There is currently a backlog of triggers to process, so customers can expect a delay between when their builds complete and when the release is triggered.
  • We expect to be caught up with the backlog within 1 to 2 hours. We will update this post when complete.

Next Update: Before Wednesday, November 8th 2017 17:05 UTC

Sincerely,
Randy

