Control SCOM Maintenance Mode from the Agent with SCOMAgentHelper Management Pack



Do you need to control maintenance mode from the agent-managed computer?

UPDATE: Immediately after posting this I received a ton of excellent suggestions for additional functionality. I have since improved functionality quite a bit but it will require a rip/replace of the previous version.

Previously your options to control maintenance mode from the agent side were limited. There was a PowerShell command introduced in 2016 referenced here but it is very clunky. You have to create an override for a rule and then manually import the module from a DLL. Ewwww. The command would write an entry to the registry, then you had to wait around for the entry to become noticed by the agent. The agent would eventually enter maintenance mode. There was no way to check/verify maintenance mode status from the agent.



Introducing: SCOMAgentHelper

This management pack will provide every agent-managed computer with a set of PowerShell commands to control and verify maintenance mode for the Windows Computer object.



How does it work?

The management pack contains a PowerShell module, “SCOMAgentHelper”. Among the many useful tools in the module are two specifically for maintenance mode:

Clear-SCOMCache
Compare-String
Export-SCOMEventsToCSV
Fast-Ping
Get-SCOMAgentMaintenanceModeStatus
Get-StringHash
Ping-AllHosts
Set-SCOMAgentMaintenanceModeStatus
Show-SCOMPropertyBag
Test-Port

The PowerShell module will be deployed automatically to the standard module path:
C:\Program Files\WindowsPowerShell\Modules

Once the module is available, you simply use the PowerShell commands.

Note: as of PowerShell v3 you no longer have to explicitly import modules. They should be imported automatically whenever you attempt to use commands contained in them.
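
If you want to confirm that the module has actually landed before relying on auto-import, a quick check along these lines may help. This is only a sketch: Test-ModuleAvailable is a hypothetical helper name used for illustration and is not part of SCOMAgentHelper. It relies on the built-in Get-Module -ListAvailable, which searches the folders listed in $env:PSModulePath.

```powershell
# Minimal sketch: confirm a module is discoverable before calling its commands.
# Test-ModuleAvailable is a hypothetical helper name, not part of SCOMAgentHelper.
function Test-ModuleAvailable {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Name
    )
    # Get-Module -ListAvailable searches every folder listed in $env:PSModulePath.
    $found = Get-Module -ListAvailable -Name $Name -ErrorAction SilentlyContinue
    return [bool]$found
}

if (Test-ModuleAvailable -Name 'SCOMAgentHelper') {
    # List the commands the module exports.
    Get-Command -Module SCOMAgentHelper
}
else {
    Write-Warning 'SCOMAgentHelper was not found in any folder listed in $env:PSModulePath.'
}
```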

The agent and management server basically talk to each other through events written to the agent Operations Manager event log.

Here’s the sequence of events when maintenance mode (MM) is enabled:
1) Use the PowerShell command to begin maintenance mode:
Set-SCOMAgentMaintenanceModeStatus -Start

# Example
Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 30 -Reason UnplannedOther -Comment "Emergency patch applied" -Verbose -Verify Workflows

2) A specific event is written to the OpsMan log which indicates a request to begin maintenance mode for the Windows Computer object.

3) A rule running on the agent machine detects the event, then triggers a write action on the management server which places the agent Computer object and all other contained objects into maintenance mode.

4) The PowerShell command will wait to verify that workflows have unloaded on the agent. Then it will display the current maintenance mode status to the screen.

Note: The command now includes the “-Verify” parameter which controls how the maintenance mode status is verified.
Your options are: None, Workflows, and MgmtPerspectiveOnly.

None – The event will be written to the local agent OpsMan event log. No verification will take place.

MgmtPerspectiveOnly – The Write Action will trigger the maintenance mode window and then verify that the mgmt server can detect that the agent is truly in maintenance mode. (Sometimes it takes a few seconds for the database, via the SDK, to reflect the true status.) The Write Action will then write an event to the agent OpsMan event log (with an agent task) which indicates the true status from the management server’s perspective. This does not verify whether the agent altered any workflow activity. This is useful if you simply want to reboot the computer without risking Heartbeat alerts from the corresponding Health Service Watcher object(s).

Workflows – This will verify that the workflows on the agent have actually stopped/started.
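
For scripted use, the parameters shown above can be collected in a hashtable and splatted onto the command. The sketch below is an illustration only: New-MaintenanceModeParams is a hypothetical helper (not part of the module), and its default of MgmtPerspectiveOnly is my choice for the example, not necessarily the module's own default. The ValidateSet simply mirrors the three -Verify values listed above.

```powershell
# Sketch: build a splatting table for Set-SCOMAgentMaintenanceModeStatus.
# New-MaintenanceModeParams is a hypothetical helper, not part of the module.
function New-MaintenanceModeParams {
    param(
        [Parameter(Mandatory = $true)]
        [ValidateRange(1, [int]::MaxValue)]
        [int]$DurationMinutes,

        # The three -Verify values described above; the default here is an assumption.
        [ValidateSet('None', 'Workflows', 'MgmtPerspectiveOnly')]
        [string]$Verify = 'MgmtPerspectiveOnly'
    )
    return @{
        Start           = $true
        DurationMinutes = $DurationMinutes
        Verify          = $Verify
    }
}

$mmParams = New-MaintenanceModeParams -DurationMinutes 30 -Verify Workflows

# Only call the command when it is actually present on this machine.
if (Get-Command -Name Set-SCOMAgentMaintenanceModeStatus -ErrorAction SilentlyContinue) {
    Set-SCOMAgentMaintenanceModeStatus @mmParams -Verbose
}
```

Rejecting an unknown -Verify value in the helper means a typo fails immediately on the agent, before any event is written to the log.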



Where do I start?

1) Import the management pack .mpb file. After a few minutes the PowerShell module should appear at this path on your agent-managed computers:
C:\Program Files\WindowsPowerShell\Modules\SCOMAgentHelper

The deployment rule interval is 86400 seconds (1 day), but no SyncTime parameter is used, so it should deploy shortly after the agent downloads the new configuration and activates it. (Look for events 1204 and 1210 in the event log.) If you experience problems or the module does not appear as it should, you can override the deployment rule (Deploy SCOMAgentHelper PowerShell Module .PSX1 Files Rule) and set WriteToEventLog = true. If you still cannot solve the issue, enable the alternative deployment rule (Deploy SCOMAgentHelper PowerShell Module .ZIP Rule). These two deployment rules use different methods to write the module files to the standard path.

2) Run the command:

# Example
Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 15 -Verify MgmtPerspectiveOnly
The event is written to the log.
The management server soon initiates maintenance mode for the Computer.
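
If the module does not show up after step 1, a quick check on the agent might look like the sketch below. It assumes the agent log is named 'Operations Manager' (the name on a standard agent install) and uses the built-in Get-WinEvent cmdlet to look for the 1204 and 1210 configuration events mentioned above. Treat it as a starting point, not a definitive diagnostic.

```powershell
# Sketch: list recent configuration events (1204, 1210) from the agent log.
# 'Operations Manager' is assumed to be the log name; adjust if yours differs.
$filter = @{
    LogName = 'Operations Manager'
    Id      = 1204, 1210
}

try {
    Get-WinEvent -FilterHashtable $filter -MaxEvents 10 -ErrorAction Stop |
        Select-Object TimeCreated, Id, Message
}
catch {
    # Get-WinEvent raises an error when no events match or the log is missing.
    Write-Warning "No matching events returned: $($_.Exception.Message)"
}

# Confirm the module folder exists at the path given in step 1.
Test-Path -Path 'C:\Program Files\WindowsPowerShell\Modules\SCOMAgentHelper'
```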


Can I extend or modify an existing maintenance mode window?

Yes. Simply use the “-ForceUpdate” switch. The following example will set the new “end” time to be 90 minutes from now and will verify the change by the mgmt server.
Note: It would be inappropriate to use the “-Verify Workflows” parameter value because if the object is already in maintenance mode then no workflow activity would occur; all relevant workflows would already be stopped.

Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 90 -Verify MgmtPerspectiveOnly -ForceUpdate -Verbose



If the Computer is in maintenance mode, how can I end the maintenance mode?

#Example
Set-SCOMAgentMaintenanceModeStatus -End -Verify Workflows

This command basically performs the same steps as the start sequence above except this causes the agent to exit the maintenance window. It will also verify that the relevant workflows have resumed on the agent.



How can I check if a Computer is already in maintenance mode?

Get-SCOMAgentMaintenanceModeStatus -Verbose



How do I end a maintenance window?

Run the command:

Set-SCOMAgentMaintenanceModeStatus -End -Verbose

I highly recommend use of the -Verbose switch for your viewing pleasure.


How does the agent continue to process rules if it is in maintenance mode?

The workflows involved target a special class called:
Microsoft.SystemCenter.ManagementService
An instance of this class lives on every computer. This appears to be the only instance on the Windows Computer that does not enter maintenance mode. What I mean by this is that workflows which target this special class are not affected when the Windows Computer is placed into maintenance mode. I believe it is the purpose of this class instance to remain awake as a viable target for workflows while all other instances sit dormant during the maintenance window. Think of this instance as “the butler”. Below is a graph of how it fits into the big picture. (graph index here)

Notice that this special class is not dependent on the Windows Computer class like HealthService is.


Where can I see maintenance mode activity history from this management pack?


The Event View will display all recent SCOMAgentHelper module activity.



This is cool! What’s the catch?

This approach relies on the parent mgmt server to execute a scripted workflow. If you cause a significant number of agents to trigger maintenance mode at the same time, you could really abuse your mgmt server as PowerShell workflows can be expensive. How many is “too many?” This depends entirely on your environment. Be sure to test this before using in a production environment.




How can I distribute the activity/load so I don’t cripple my management server(s)?

If you find that your management servers are suffering from large numbers of agents all triggering at the same time, here’s one creative idea: randomize the start time for the agents. Use a simple command like the following to delay the start of the MM action for a random period:

# Will trigger after a random delay of 0-10 minutes.
Start-Sleep -Seconds (Get-Random -Minimum 0 -Maximum 601); Set-SCOMAgentMaintenanceModeStatus -DurationMinutes 90 -Verbose -Verify MgmtPerspectiveOnly
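
The same idea can be wrapped in a small function so the upper bound is easier to change. This is a sketch: Get-StartDelaySeconds is a hypothetical helper name, not part of the module. It also makes explicit why the one-liner above passes 601: Get-Random treats -Maximum as exclusive, so 601 is needed for a full 600-second (10 minute) delay to be possible.

```powershell
# Sketch: compute a random start delay. Get-StartDelaySeconds is a hypothetical helper.
function Get-StartDelaySeconds {
    param(
        [ValidateRange(0, 86400)]
        [int]$MaxSeconds = 600
    )
    # Get-Random treats -Maximum as exclusive, so add 1 to make MaxSeconds reachable.
    return Get-Random -Minimum 0 -Maximum ($MaxSeconds + 1)
}

# Only sleep and trigger MM when the module command is present on this machine.
if (Get-Command -Name Set-SCOMAgentMaintenanceModeStatus -ErrorAction SilentlyContinue) {
    Start-Sleep -Seconds (Get-StartDelaySeconds -MaxSeconds 600)
    Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 90 -Verify MgmtPerspectiveOnly -Verbose
}
```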

In addition, there’s an override for the detection rule which will pause the Write Action on the management server for a random period from 0-x seconds. This will help spread out the load on the mgmt server and OpsDB.
Rule: Detect Agent MM Toggle Event Rule
Parameter: SpreadInitilizationOverIntervalSeconds
Default Value: 0
However, this will cause the PowerShell process to remain active/open for just that much longer. In theory, if you trigger enough agents at the same time, it’s possible that you might reach the PSScriptLimit and/or PSQueueMinutes limit of the HealthService on the management server. (These limits can be modified in the registry.) How many is too many? It will depend on your management group. Test this thoroughly at your own risk.



What other cool stuff is in the SCOMAgentHelper PowerShell module?

I’m glad you asked. Have a look for yourself with these commands:

# Show available commands
Get-Command -Module SCOMAgentHelper


# Show HELP document for a command
Get-Help Set-SCOMAgentMaintenanceModeStatus -Full



Is this PowerShell module available outside of this MP?

Since the core functionality of the module requires all of the other SCOM workflows contained in the management pack, I likely won’t publish this elsewhere. However, be sure to check out the SCOMHelper module. It’s packed with many of the same useful tools and more.

Download “SCOMAgentHelper Management Pack”: SCOMAgentHelper.1.0.0.10.zip (50 KB)

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

10 Replies to “Control SCOM Maintenance Mode from the Agent with SCOMAgentHelper Management Pack”

  1. Absolutely brilliant, Tyson! We’ve been looking for something similar to this for a couple months now. If I might ask, would it be possible to run this without unloading the workloads? The reason I’m asking is that we’ve been looking for a workable shutdown script (via GPO) which would automatically put any server into maintenance mode whenever it goes through a controlled shutdown, with the idea being that a “Failed Heartbeat” alert should ideally only trigger when a server crashes.

    Would it be possible to run the command without unloading the SCOM workloads? When we run this as a shutdown script, the server is in the process of going offline, so I wouldn’t think the unloading process would be needed, and I’m also thinking that would speed up the reboot & maintenance mode process overall.

    1. I’m approaching this solution with automation in mind and you make a good point. I’ve received a ton of helpful feedback in the brief time since I posted this first version (1.0.0.4) so I think you’ll like the next version which I am currently working on. It would be extremely easy to simply trigger maintenance mode on the mgmt server and end right there. It’s a tremendous amount of work and tricky dev to build in the ability to specify multiple degrees of verification (from both mgmt server AND agent perspective) combined with the ability to update/modify any existing/current maintenance mode window for the agent.

      Features I’m working on:

        Degrees of Verification: None, MgmtServerPerspectiveOnly, Workflows
        Option to force update of existing MM period. Make the existing MM window agree with the new DurationMinutes.
        Reduced chatter between agent/mgmtserver. Streamlined process. Faster

      Stay tuned.

  2. Hey Tyson, great work.
    I have a question though: if you trigger MM from the local agent and are setting a time for the agent to be in MM, surely you don’t need to then end the MM via the local call (unless you want to take it out before the set time)?
    We are looking for a better way to put agents in MM when they are going to reboot after WSUS. We can’t use maintenance windows in SCCM, as these occur every weekend and not just on the weekend when patching is done.
    So having this option to trigger it locally (hopefully using SCCM to initiate it once the patches are applied and reboot is due) could solve our problems of heartbeat alert storms during patch weekend…

  3. Awesome Work Tyson!

    Putting a SCOM agent into Maintenance Mode using PowerShell directly from the server itself is a big time help.

  4. Hi Tyson, this module is great and an excellent way to initiate Maintenance Mode from the Agent-managed server itself!
    I’ve been able to successfully use the Agent Maintenance Mode in my lab on recent Windows Server servers and that works well.

    Unfortunately it seems to require PowerShell v3 or higher (due to the use of the CmdletBinding attributes ‘PositionalBinding’ and ‘HelpUri’ in both the provided SCOMAgentHelper module and the management pack WriteAction modules), which makes things fail on OS versions prior to Windows Server 2012 (or prior to 2008 when Windows Management Framework 3.0 is installed).
