Last month I created an OpsMgr ACS Collector failover script. The script worked, but it could not fail over the ACS Collector role in all scenarios. The main reason is that I relied on the existing ACS collector service monitor. This monitor checks the AdtServer service and alerts when the service fails. When the complete server is down, however, the monitor goes into a gray state instead of a critical state, so the failover is not triggered. I therefore decided to add some additional components to the failover script. An updated version of the failover script (management pack) can be found at the end of this blogpost.
The first conclusion is that we can’t re-use the existing monitors targeted at an ACS collector, because they go into a gray state when the complete host is down. I wanted the failover solution to be as clean as possible, so no additional classes, discoveries, etc. Instead, I created a new PowerShell failover monitor targeted at the existing ACS Collector class. It monitors the state of the collector service on the opposite ACS collector. The image below shows how this monitor works:
With the above configuration the monitor also detects a completely down host and, most importantly, fails over the ACS collector functionality to the standby host/collector. The monitor determines the active collector based on a registry key which is available on both ACS collectors. This key needs to be created manually before the ACS failover management pack is imported.
The original failover script is now configured as a recovery for this newly created monitor. Both the failover monitor and the recovery script are completely PowerShell based and use PowerShell remoting, so PowerShell remoting must be configured before you import this management pack.
The Failover monitor can have the following health states:
|Health state|Message|Description|
|---|---|---|
|Healthy|Disabled Collector! AdtServer Service State is Disabled!|The monitored collector is standby and correctly disabled|
|Healthy|Active Collector is UP! AdtServer Service State is running!|The monitored collector is active and correctly configured|
|Warning|Disabled collector inaccessible!|The monitored collector is standby but cannot be accessed|
|Critical|Active Collector is UP! AdtServer Service State is stopped!|The monitored collector is the active collector but the service is stopped. Failover will be executed|
|Critical|Active Collector inaccessible!|The monitored collector is the active collector but not accessible. Failover will be executed|
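The health states above can be read as a small lookup keyed on the collector's role and what the monitor observes. The sketch below is a Python illustration only: the real monitor in the management pack is PowerShell based, and the names `evaluate`, `STATE_TABLE` and `MonitorResult`, as well as the host names `ACS01` and `ACS02`, are hypothetical. Combinations that are not listed above (for example a standby collector with a running service) are deliberately left undefined here.

```python
from typing import NamedTuple, Optional


class MonitorResult(NamedTuple):
    health: str              # "Healthy", "Warning" or "Critical"
    message: str             # message text as listed in the table
    triggers_failover: bool  # True when the recovery (failover) should run


# Keys are (role, observation). The observation is either the AdtServer
# service state or "inaccessible" when the host cannot be reached.
# Only the five combinations listed in the table are included.
STATE_TABLE = {
    ("standby", "disabled"): MonitorResult(
        "Healthy", "Disabled Collector! AdtServer Service State is Disabled!", False),
    ("active", "running"): MonitorResult(
        "Healthy", "Active Collector is UP! AdtServer Service State is running!", False),
    ("standby", "inaccessible"): MonitorResult(
        "Warning", "Disabled collector inaccessible!", False),
    ("active", "stopped"): MonitorResult(
        "Critical", "Active Collector is UP! AdtServer Service State is stopped!", True),
    ("active", "inaccessible"): MonitorResult(
        "Critical", "Active Collector inaccessible!", True),
}


def evaluate(monitored_host: str, active_host: str,
             observation: str) -> Optional[MonitorResult]:
    """Return the table row for the monitored collector, or None if unlisted.

    active_host stands for the value of the "Active" registry key
    (hypothetical parameter name); host names are compared case-insensitively.
    """
    role = "active" if monitored_host.lower() == active_host.lower() else "standby"
    return STATE_TABLE.get((role, observation.lower()))
```

Calling `evaluate("ACS01", "ACS01", "inaccessible")` returns the second Critical row, which is the case the original service monitor missed because it went gray instead.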
I also updated the failover script presented in this blogpost. The following improvements have been made:
- All tasks are now using PowerShell remoting (same as the monitor)
- The registry key used by the monitor to determine the active collector is changed by the failover script
- The failover script supports a failover from a completely down ACS collector
- Startup type of the ACS collector is changed from ‘Automatic’ to ‘Automatic (delayed)’
When a failover is done from a completely down ACS collector, some manual actions need to be performed when the failed collector comes back online:
- Manually disable the AdtServer service
- Manually change the registry key used by the monitor to determine the active collector.
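To make the difference between the two failover cases concrete, here is a minimal Python sketch of the bookkeeping involved. It is an illustration under assumptions, not the recovery script itself (which is PowerShell based): the function name `plan_failover`, its parameters and the host names are hypothetical, and it only assumes that the manual follow-up is needed when the failed collector could not be reached during the failover.

```python
from typing import Dict, List, Sequence, Union


def plan_failover(active: str, collectors: Sequence[str],
                  failed_host_reachable: bool) -> Dict[str, Union[str, List[str]]]:
    """Work out the new active collector and any manual follow-up steps.

    The solution supports exactly two collectors, so the new active
    collector is simply the one that is not currently active.
    """
    if len(collectors) != 2 or active not in collectors:
        raise ValueError("expected two collectors, one of which is active")

    new_active = next(c for c in collectors if c != active)

    # When the failed collector was completely down, it could not be
    # updated remotely, so these steps remain for when it is back online.
    manual_follow_up: List[str] = []
    if not failed_host_reachable:
        manual_follow_up = [
            f"Disable the AdtServer service on {active}",
            f"Change the Active registry value on {active} to {new_active}",
        ]

    return {"new_active": new_active, "manual_follow_up": manual_follow_up}
```

For example, `plan_failover("ACS01", ["ACS01", "ACS02"], failed_host_reachable=False)` makes `ACS02` the new active collector and lists both manual actions for `ACS01`.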
The following steps need to be performed to activate this ACS collector failover solution:
- Configure both ACS Collectors as described here
- Enable PowerShell remoting between both ACS collectors
- Create the following registry key and value on both ACS collectors: [HKEY_LOCAL_MACHINE\SOFTWARE\SCOM\ACSCollector] -> “Active” (REG_SZ) = “<< ACTIVE COLLECTOR >>”
- Download the ACS Failover Collector Management Pack from this updated article
- Import the Management Pack in your OpsMgr environment
- Test the above solution first in your test environment
- The above ACS collector failover solution supports a failover scenario between 2 ACS collectors.
If you have any questions regarding this management pack, please let me know!