Over the last couple of months I was involved in a project to build a brand-new Remote Desktop Services (RDS) environment on Microsoft Azure. One of the advantages of running your RDS environment on Azure is that you can scale the number of Session Hosts based on user load. In this project the number of Session Hosts was defined up front, so there was no dynamic scaling based on the actual user load. Instead, we used tags on the Azure virtual machines to define the different Start/Stop profiles, and based on those profiles the servers were stopped and started at predefined times. The tags on the virtual machines needed to rotate so that the same servers don’t always stay on and every server gets shut down at some point. After building this solution we ran into a strange error.
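To make the tag rotation a bit more concrete, here is a minimal sketch of the idea in Python. It is not the runbook from this project: the VM names and profile names are made up for illustration, and the sketch only computes the new assignment. Writing the tags back to Azure is deliberately left out.

```python
from collections import deque


def rotate_profiles(assignments: dict[str, str], steps: int = 1) -> dict[str, str]:
    """Shift every Start/Stop profile to the next VM (in name order), wrapping around.

    `assignments` maps a VM name to the value of its profile tag.
    The set of profiles in use stays the same; only their owners change.
    """
    names = sorted(assignments)
    profiles = deque(assignments[name] for name in names)
    profiles.rotate(steps)  # a positive value shifts profiles towards later VMs
    return dict(zip(names, profiles))


# Hypothetical VM names and profile names, for illustration only.
current = {
    "rdsh-01": "AlwaysOn",
    "rdsh-02": "OfficeHours",
    "rdsh-03": "OfficeHours",
    "rdsh-04": "PeakOnly",
}

rotated = rotate_profiles(current)
for vm, profile in rotated.items():
    print(f"{vm}: {profile}")
```

Because the profiles only move from one server to the next, the number of servers per profile stays constant, and over repeated rotations every server takes a turn in each profile, including the ones that shut it down.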
When we applied tags to the RD Session Host virtual machines in Azure, this in some cases led to an unexpected scheduled shutdown. The shutdown was initiated by the user ‘NT AUTHORITY\SYSTEM’ and the process ‘c:\windows\system32\svchost.exe’. In other words, changing a tag on the virtual machine caused a reboot. Based on our own analysis, this reboot was not initiated by software installed on the Session Hosts. After reaching that conclusion we logged a case with Microsoft Support. Troubleshooting took some time because the issue was not reproducible on ‘non-RDSH’ servers. We also didn’t see these reboots in our test and acceptance environments.
Last week Microsoft let us know that they had found and fixed the bug. This is what they communicated:
‘The Microsoft Azure team has concluded our investigation of the reboots. We identified that a software defect signaled a reboot command to the VM. Our engineering team has deployed a worldwide fix to prevent further unnecessary reboots caused by this bug.’
Since this worldwide fix we’re back to testing the runbook. The first results are promising: we haven’t seen any reboots so far. If you’re seeing this behavior, please log a case with Microsoft so they can look into it. If you need more information on this case, please let me know!
Thanks to Microsoft for solving this issue!