A couple of weeks ago I posted my ACS collector automatic failover script. I have since implemented this solution in my current ACS environment, and based on that experience I have 2 tips that help you avoid ending up with two active collectors.
It’s important to note that you don’t want 2 active collectors running and writing events to the database at the same time. The problem is that you won’t see any errors in your event logs, while some terrible things are happening in the database. A partition is created in the database for each active collector, so for each day you run 2 collectors you end up with 2 partitions. As you can see in the picture below, the partitions do not contain valid values.
The above scenario was caused by a failover from the active collector to the standby collector. After this failover the original active collector was rebooted and became active again, because no action was taken to disable the Collector service manually. In fact, this server had a blue screen and was restarted automatically. So if you have configured my ACS automatic failover script, I advise you to disable the automatic reboot after a blue screen. This needs to be done on both collectors:
- Open the System Properties of the server
- Go to Advanced and click on the Settings button of ‘Startup and Recovery’.
- Disable the Automatic Restart option. See the image on the right.
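If you prefer to script this instead of clicking through the dialog on each collector, the sketch below shows one possible way to do it. It assumes that the Automatic Restart checkbox corresponds to the `AutoReboot` value under the `CrashControl` registry key and that the script runs from an elevated prompt on the collector itself. The function names are my own for illustration and are not part of the failover script.

```python
import subprocess
import sys
from typing import List

# Registry key that holds the Startup and Recovery crash settings.
CRASH_CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"


def build_disable_autoreboot_command() -> List[str]:
    """Build the reg.exe call that sets AutoReboot to 0 (no automatic restart)."""
    return [
        "reg", "add", CRASH_CONTROL_KEY,
        "/v", "AutoReboot",
        "/t", "REG_DWORD",
        "/d", "0",
        "/f",  # overwrite the existing value without prompting
    ]


def disable_autoreboot() -> None:
    """Run the command; only meaningful on Windows with administrative rights."""
    if sys.platform != "win32":
        raise RuntimeError("This setting only exists on Windows servers.")
    subprocess.run(build_disable_autoreboot_command(), check=True)


if __name__ == "__main__":
    # Always show the command; only execute it when running on Windows.
    print(" ".join(build_disable_autoreboot_command()))
    if sys.platform == "win32":
        disable_autoreboot()
```

Run it once on each collector and verify the result in the Startup and Recovery dialog afterwards.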
Another improvement I made to my ACS collector failover management pack was setting the SuppressionSettings value in the UnitMonitorType. This is an optional parameter which, when set, indicates whether multiple matches are required before a data item is output. I added this setting with a value of 2, which means that 2 critical results are needed before the monitor goes into an unhealthy state. A failover will therefore take a little longer, but the number of false positives is reduced. An updated version can be downloaded here.
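To make the effect of a match count of 2 more concrete, here is a small Python sketch of the idea. It is not how Operations Manager implements suppression internally. It only models the behaviour under the assumption that the matches must be consecutive, and the class and function names are hypothetical.

```python
from typing import Iterable, List


class MatchCountSuppressor:
    """Report unhealthy only after `match_count` consecutive critical results."""

    def __init__(self, match_count: int = 2) -> None:
        if match_count < 1:
            raise ValueError("match_count must be at least 1")
        self.match_count = match_count
        self._streak = 0  # number of critical results seen in a row

    def observe(self, is_critical: bool) -> bool:
        """Feed one probe result; return True if the monitor should be unhealthy."""
        if is_critical:
            self._streak += 1
        else:
            self._streak = 0  # a healthy result resets the streak
        return self._streak >= self.match_count


def unhealthy_flags(results: Iterable[bool], match_count: int = 2) -> List[bool]:
    """Return the unhealthy decision after each probe result in order."""
    suppressor = MatchCountSuppressor(match_count)
    return [suppressor.observe(r) for r in results]


if __name__ == "__main__":
    # A single critical result is ignored; two in a row trigger the state change.
    print(unhealthy_flags([True, False, True, True, False]))
```

In this model a single failed check of the active collector never triggers a failover on its own. Only a second failed check directly after it does, which is where the extra delay comes from.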