Control SCOM Maintenance Mode from the Agent with SCOMAgentHelper Management Pack



Do you need to control maintenance mode from the agent-managed computer?

Previously, your options to control maintenance mode from the agent side were limited. There was a PowerShell command introduced in 2016 (referenced here), but it is very clunky: you have to create an override for a rule and then manually import the module from a DLL. Ewwww. The command would write an entry to the registry, and then you had to wait around for the agent to notice the entry. The agent would eventually enter maintenance mode. There was no way to check or verify maintenance mode status from the agent.



Introducing: SCOMAgentHelper

This management pack will provide every agent-managed computer with a set of PowerShell commands to control and verify maintenance mode for the Windows Computer object.



How does it work?

The management pack contains a PowerShell module, “SCOMAgentHelper”. Among the many useful tools in the module are two specifically for maintenance mode:

Clear-SCOMCache
Compare-String
Export-SCOMEventsToCSV
Fast-Ping
Get-SCOMAgentMaintenanceModeStatus
Get-StringHash
Ping-AllHosts
Set-SCOMAgentMaintenanceModeStatus
Show-SCOMPropertyBag
Start-SCOMTrace
Test-Port

The PowerShell module will be deployed automatically to the standard module path:
C:\Program Files\WindowsPowerShell\Modules

Once the module is available, you simply use the PowerShell commands.

Note: as of PowerShell v3 you no longer have to explicitly import modules. They should be imported automatically whenever you attempt to use commands contained in them.
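
Before relying on auto-import, it can help to confirm that the module has actually landed on the agent. Here is a minimal sketch, assuming the module is discoverable under the name SCOMAgentHelper (the folder name used in this article). The helper name Test-SCOMAgentHelperDeployed is hypothetical and is not part of the module:

```powershell
# Hypothetical helper (not part of SCOMAgentHelper): returns $true when a module
# with the given name is discoverable on the PSModulePath.
function Test-SCOMAgentHelperDeployed {
    param(
        [string]$ModuleName = 'SCOMAgentHelper'
    )
    # Get-Module -ListAvailable searches every folder listed in $env:PSModulePath.
    $found = Get-Module -ListAvailable -Name $ModuleName -ErrorAction SilentlyContinue
    return [bool]$found
}

if (Test-SCOMAgentHelperDeployed) {
    # List the commands the module exports.
    Get-Command -Module SCOMAgentHelper | Select-Object -ExpandProperty Name
}
else {
    Write-Warning 'SCOMAgentHelper module not found yet; the deployment rule may not have run.'
}
```

If the warning appears long after the management pack was imported, the deployment rule overrides described under "Where do I start?" are the next place to look.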

The agent and management server basically talk to each other through events written to the agent Operations Manager event log.

Here’s a description of the sequence when MM is enabled:
1) Use the PowerShell command to begin maintenance mode:
Set-SCOMAgentMaintenanceModeStatus -Start

# Example
Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 30 -Reason UnplannedOther -Comment "Emergency patch applied" -Verbose -Verify Workflows

2) A specific event is written to the OpsMan log which indicates a request to begin maintenance mode for the Windows Computer object.

3) A rule running on the agent machine detects the event, then triggers a write action on the management server which places the agent Computer object (and all contained objects) into maintenance mode.

4) The PowerShell command will wait to verify that workflows have unloaded on the agent. Then it will display the current maintenance mode status to the screen.

The command now includes the “-Verify” parameter which controls how the maintenance mode status is verified.
Your options are: None, Workflows, and MgmtPerspectiveOnly.

  • None – The event will be written to the local agent OpsMan event log. No verification will take place.
  • Workflows – This will verify that the workflows on the agent have actually stopped/started.
  • MgmtPerspectiveOnly – The Write Action will trigger the maintenance mode window and then verify that the mgmt server can detect that the agent is truly in maintenance mode. (Sometimes it takes a few seconds for the database, via the SDK, to reflect the true status.) The Write Action will then write an event to the agent OpsMan event log (with an agent task) which indicates the true status from the management server’s perspective. This does not verify whether the agent altered any workflow activity. This is useful if you simply want to reboot the computer without risking Heartbeat alerts from the corresponding Health Service Watcher object(s).
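
For a pre-reboot script, the MgmtPerspectiveOnly option is the natural fit. The sketch below builds the arguments once and splats them onto the command. The helper New-MaintenanceModeStartArgs is hypothetical (it is not part of the module), and it only uses parameter names and -Verify values that appear in this article:

```powershell
# Hypothetical helper: builds a parameter set for Set-SCOMAgentMaintenanceModeStatus.
# Only parameter names and -Verify values mentioned in this article are used.
function New-MaintenanceModeStartArgs {
    param(
        [Parameter(Mandatory)]
        [int]$DurationMinutes,

        [ValidateSet('None', 'Workflows', 'MgmtPerspectiveOnly')]
        [string]$Verify = 'MgmtPerspectiveOnly'
    )
    return @{
        Start           = $true
        DurationMinutes = $DurationMinutes
        Verify          = $Verify
    }
}

$mmArgs = New-MaintenanceModeStartArgs -DurationMinutes 15

# Only call the real command when the module is present on this machine.
if (Get-Command Set-SCOMAgentMaintenanceModeStatus -ErrorAction SilentlyContinue) {
    Set-SCOMAgentMaintenanceModeStatus @mmArgs
    # Restart-Computer -Force   # uncomment for an actual pre-reboot script
}
```

The ValidateSet attribute rejects any -Verify value other than the three listed above, so a typo fails fast instead of reaching the agent.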



Where do I start?

  1. Import the management pack .mpb file. After a few minutes the PowerShell module should appear at this path on your agent-managed computers:
    C:\Program Files\WindowsPowerShell\Modules\SCOMAgentHelper

The deployment rule interval is 86400 seconds (1 day), but no SyncTime parameter is used, so it should deploy shortly after the agent downloads the new configuration and activates it. (Look for events 1204 and 1210 in the event log.) If you experience problems or the module does not appear as it should, you can override the deployment rule (Deploy SCOMAgentHelper PowerShell Module .PSX1 Files Rule) and set WriteToEventLog = true. If you still cannot solve the issue, enable the alternative deployment rule (Deploy SCOMAgentHelper PowerShell Module .ZIP Rule). These two deployment rules use different methods to write the module files to the standard path.
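
One way to look for those two events from PowerShell is sketched below. It assumes the agent log carries its usual name, "Operations Manager", and it only filters on the two event IDs mentioned above:

```powershell
# Filter for the configuration events mentioned above (1204 and 1210) in the
# agent's Operations Manager event log.
$filter = @{
    LogName = 'Operations Manager'
    Id      = 1204, 1210
}

# Get-WinEvent exists only on Windows; guard so the snippet fails soft elsewhere.
if (Get-Command Get-WinEvent -ErrorAction SilentlyContinue) {
    try {
        Get-WinEvent -FilterHashtable $filter -MaxEvents 10 -ErrorAction Stop |
            Select-Object TimeCreated, Id, Message
    }
    catch {
        Write-Warning "No matching events found (or the log is unavailable): $($_.Exception.Message)"
    }
}
```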

  2. Run the command:

# Example
Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 15 -Verify MgmtPerspectiveOnly
The event is written to the log.
The management server soon initiates maintenance mode for the Computer.


Can I extend or modify an existing maintenance mode window?

Yes. Simply use the “-ForceUpdate” switch. The following example will set the new “end” time to be 90 minutes from now and will verify the change by the mgmt server.
Note: It would be inappropriate to use the “-Verify Workflows” parameter value because if the object is already in maintenance mode then no workflow activity would occur; all relevant workflows would already be stopped.

Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 90 -Verify MgmtPerspectiveOnly -ForceUpdate -Verbose



If the Computer is in maintenance mode, how can I end the maintenance mode?

#Example1, will verify that workflows resume
Set-SCOMAgentMaintenanceModeStatus -End -Verify Workflows

#Example2, with verbose output
Set-SCOMAgentMaintenanceModeStatus -End -Verbose

This command basically performs the same steps as the start sequence above except this causes the agent to exit the maintenance window. It will also verify that the relevant workflows have resumed on the agent.
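
The start and end commands can also be paired around a piece of maintenance work so that the window is closed even when the work fails. This is only a sketch: Invoke-InMaintenanceMode is a hypothetical wrapper, not part of the module, and it assumes both commands behave as described in this article:

```powershell
# Hypothetical wrapper (not part of SCOMAgentHelper): runs a script block inside
# a maintenance window and always attempts to end the window afterwards.
function Invoke-InMaintenanceMode {
    param(
        [Parameter(Mandatory)]
        [scriptblock]$ScriptBlock,

        [int]$DurationMinutes = 30
    )
    Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes $DurationMinutes -Verify Workflows
    try {
        & $ScriptBlock
    }
    finally {
        # Runs even if the script block throws, so the agent is not left in
        # maintenance mode until the window expires.
        Set-SCOMAgentMaintenanceModeStatus -End -Verify Workflows
    }
}

# Usage (on an agent-managed computer with the module deployed):
# Invoke-InMaintenanceMode -DurationMinutes 45 -ScriptBlock { Restart-Service -Name 'Spooler' }
```

The DurationMinutes value still acts as a safety net: if the -End call never runs, for example because the machine loses power, the window expires on its own.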



How can I check if a Computer is already in maintenance mode?

Get-SCOMAgentMaintenanceModeStatus -Verbose

I highly recommend use of the -Verbose switch for your viewing pleasure.


How does the agent continue to process rules if it is in maintenance mode?

The workflows involved target a special class called:
Microsoft.SystemCenter.ManagementService
An instance of this class lives on every computer. This appears to be the only instance on the Windows Computer that does not enter maintenance mode. What I mean by this is that workflows which target this special class are not affected when the Windows Computer is placed into maintenance mode. I believe it is the purpose of this class instance to remain awake as a viable target for workflows while all other instances sit dormant during the maintenance window. Think of this instance as “the butler”. Below is a graph of how it fits into the big picture. (graph index here)

Notice that this special class is not dependent on the Windows Computer class like HealthService is.


Where can I see maintenance mode activity history from this management pack?


The Event View will display all recent SCOMAgentHelper module activity.



This is cool! What’s the catch?

This approach relies on the parent mgmt server to execute a scripted workflow. If you cause a significant number of agents to trigger maintenance mode at the same time, you could really abuse your mgmt server as PowerShell workflows can be expensive. How many is “too many?” This depends entirely on your environment. Be sure to test this before using in a production environment.




How can I distribute the activity/load so I don’t cripple my management server(s)?

If you find that your management servers are suffering from large numbers of agents all triggering at the same time, here’s one creative idea: randomize the initialization for the agents. Use a simple command like the following to delay the start of the MM action for a random period:

# Will trigger after a random delay of 0-10 minutes.
Start-Sleep -Seconds (Get-Random -Minimum 0 -Maximum 601); Set-SCOMAgentMaintenanceModeStatus -DurationMinutes 90 -Verbose -Verify MgmtPerspectiveOnly

In addition, there’s an override for the detection rule which will pause the Write Action on the management server for a random period from 0-x seconds. This will help spread out the load on the mgmt server and OpsDB.
Rule: Detect Agent MM Toggle Event Rule
Parameter: SpreadInitilizationOverIntervalSeconds
Default Value: 0
However, this will cause the PowerShell process to remain active/open for just that much longer. In theory, if you trigger enough agents at the same time, it’s possible that you might reach the PSScriptLimit and/or PSQueueMinutes limit of the HealthService on the management server. (These limits can be modified in the registry.) How many is too many? It will depend on your management group. Test this thoroughly at your own risk.



What other cool stuff is in the SCOMAgentHelper PowerShell module?

I’m glad you asked. Have a look for yourself with these commands:

# Show available commands
Get-Command -Module SCOMAgentHelper


# Show HELP document for a command
Get-Help Set-SCOMAgentMaintenanceModeStatus -Full



Is this PowerShell module available outside of this MP?

Since the core functionality of the module requires all of the other SCOM workflows contained in the management pack, I likely won’t publish this elsewhere. However, be sure to check out the SCOMHelper module. It’s packed with many of the same useful tools and more.

What is the difference between SCOMAgentHelper and SCOMHelper?

SCOMAgentHelper – This is a PowerShell module which includes the maintenance mode functions in addition to a few other useful functions (that also appear in SCOMHelper). It is not available on PowerShellGallery.com. This PowerShell module is distributed through a workflow within a sealed management pack for 2 reasons:

1) This is very convenient to deliver the PowerShell module to all of your agent-managed machines.
2) The MM functions should only ever be used on an agent-managed machine, never a mgmt server.

SCOMHelper – This is primarily for SCOM admin tasks and is typically installed on a management server or wherever you might have the Console/OperationsManager PowerShell Module installed. You have a couple of options for installation which are described near the bottom of the article. The easiest method is from the command line, which will automatically download and install from PowerShellGallery.com (if you have the required posh version, with the PowerShellGet module, and internet access from the server). Otherwise, manual installation is super easy too.


Troubleshooting

Any Errors:

  • Multiple ambiguous overloads found for “GetMonitoringObjects” and the argument count: “1”.
  • Exception calling “.ctor” with “2” argument(s): “Value cannot be null. Parameter name: managementPackClass”
  • The term ‘Get-SCClass’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
  • Could not load file or assembly ” or one of its dependencies.
  • Catastrophic failure (Exception from HRESULT: 0x8000FFFF (E_UNEXPECTED))

This server simply needed a reboot.

How can you tell if a server needs a reboot? Here’s one way, using PowerShell:

Function Test-PendingReboot 
{
  if (Get-ChildItem 'HKLM:\Software\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending' -EA Ignore) 
  {
    return $true 
  }
  if (Get-Item 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired' -EA Ignore) 
  {
    return $true 
  }
  if (Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager' -Name PendingFileRenameOperations -EA Ignore) 
  {
    return $true 
  }
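  try 
  { 
    # Ask the Configuration Manager (SCCM) client, if one is installed.
    $util = [wmiclass]'\\.\root\ccm\clientsdk:CCM_ClientUtilities'
    $status = $util.DetermineIfRebootPending()
    if (($null -ne $status) -and $status.RebootPending) 
    {
      return $true
    }
  }
  catch 
  {
    # No SCCM client (or the WMI call failed); ignore and fall through.
  }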

  return $false
}

Test-PendingReboot



Version History:

2023.01.23.1044 – SCOMTrace.ps1: Fixed CopyToThisRemotePath copy failure. (cast to boolean False if no value provided)
1.0.0.18 – 2022.04.29.1511 – Modified zip procedure to include only .log files
1.0.0.17 – 2022.04.20.1427 – Added FileTransport integration to SCOMTrace.ps1 and SCOMTrace.WA
1.0.0.15? – 2022.03.04 – Modified SCOMTrace.ps1 slightly to only delete trace files upon successful copy to destination.

1.0.0.13 – Small tweaks to tracing. Fixed ‘DurationMinutesMaxAllowed’ selector name in override.
1.0.0.11 –
Added tracing function.
Added override to control max MM duration allowed.
1.0.0.10 –
Added agent status verification functionality.
Added ability to update existing/active MM window.
Improved logging.
1.0.0.4 – 2020.07.31 – small tweaks
1.0.0.2 – 2020.07.31 – small tweaks
1.0.0.1 – 2020.07.31 – small tweaks
1.0.0.0 – v1 – 2020.07.30

Note: Please don’t automate downloads.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

16 Replies to “Control SCOM Maintenance Mode from the Agent with SCOMAgentHelper Management Pack”

  1. Absolutely brilliant, Tyson! We’ve been looking for something similar to this for a couple months now. If I might ask, would it be possible to run this without unloading the workloads? The reason I’m asking is that we’ve been looking for a workable shutdown script (via GPO) which would automatically put any server into maintenance mode whenever it goes through a controlled shutdown, with the idea being that a “Failed Heartbeat” alert should ideally only trigger when a server crashes.

    Would it be possible to run the command without unloading the SCOM workloads? When we run this as a shutdown script, the server is in the process of going offline, so I wouldn’t think the unloading process would be needed, and I’m also thinking that would speed up the reboot & maintenance mode process overall.

    1. I’m approaching this solution with automation in mind and you make a good point. I’ve received a ton of helpful feedback in the brief time since I posted this first version (1.0.0.4) so I think you’ll like the next version which I am currently working on. It would be extremely easy to simply trigger maintenance mode on the mgmt server and end right there. It’s a tremendous amount of work and tricky dev to build in the ability to specify multiple degrees of verification (from both mgmt server AND agent perspective) combined with the ability to update/modify any existing/current maintenance mode window for the agent.

      Features I’m working on:

        Degrees of Verification: None, MgmtServerPerspectiveOnly, Workflows
        Option to force update of existing MM period. Make the existing MM window agree with the new DurationMinutes.
        Reduced chatter between agent/mgmtserver. Streamlined process. Faster

      Stay tuned.

  2. Hey Tyson, great work.
    I have a question though, if you trigger MM from the local agent and are setting a time for the agent to be in MM, surely you don’t need to then end the MM via the local call (unless you want to take it out before the set time)
    We are looking for a better way to put agents in MM when they are going to reboot after WSUS, we cant use maint windows in SCCM as these are every weekend and not just the weekend patching is done.
    So having this option to trigger it locally (hopefully using SCCM to initiate it once the patches are applied and reboot is due) could solve our problems of heartbeat alert storms during patch weekend…

    1. @Anthony,
      Check it out now. v 1.0.0.10. Use parameter -Verify MgmtPerspectiveOnly Be sure to test thoroughly in non-prod.

  3. Awesome Work Tyson!

    Putting a SCOM agent into Maintenance Mode using PowerShell directly from the server itself is a big time help.

  4. Hi Tyson, this module is great and an excellent way to initiate Maintenance Mode from the Agent-managed server itself!
    I’ve been able to successfully use the Agent Maintenance Mode in my lab on recent Windows Server servers and that works well.

    Unfortunately it seems to require PowerShell v3 or higher (due to the use of the CmdletBinding Attributes ‘PositionalBinding’ and ‘HelpUri’ in both the provided SCOMAgentHelper Module and the Management Pack WriteAction Modules), which make things fail on OS versions prior to Windows Server 2012 (or prior to 2008 when Windows Management Framework 3.0 is installed).

  5. Does it work with agents behind Gateway Servers?
    Especially stopping Maintenance Mode is impossible with the built-in commands.

  6. Hi Tyson,

    thank you very much for this great PS module! But I also have a question: Is it possible to put other systems in maintenance mode with the command Set-SCOMAgentMaintenanceModeStatus? So not only the server where the cmdlet is executed? For example, if you want to put a whole bunch of systems into maintenance from one server? Unfortunately it is not an option to start the maintenance directly on the SCOM Mgmt. servers and there is no direct TCP connection from the agents to them. In addition, SCOM gateway servers are used in this construct.

    Best regards
    Niels

    1. @Niels,
      Technically you CAN initiate maintenance mode in this way by using PowerShell remoting. See example below. I’ve initiated maintenance mode from a domain member server to 3 servers total.

      $cred = Get-Credential
      $session = New-PSSession -ComputerName "LOCALHOST","DevDB01.contoso.com","2016b.contoso.com" -Credential $cred
      Invoke-Command -Session $session -ScriptBlock { Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 10 -Reason PlannedApplicationMaintenance }

      PowerShell Remoting

      1. Hi Tyson,

        thanks for your super fast feedback! Something like that I had already thought of as a solution in my mind. Thanks also for the code example. Works like a charm for me and I really appreciate your support!

        Best regards
        Niels

  7. Hi Tyson,

    Not sure if this is still an active page you’re monitoring, but if you are, I’m wondering if you might be able to help or have a suggestion. I have tried out your MP and it works great, love the verify part. I currently have a use case of needing to shut down servers and place them into MM as well, which your MP works great for. However, when starting back up, we have servers with services that are placed into MM long term and need to stay in MM, but the end function for your MP is recursive and pulls everything out. Wondering if you have any suggestions?

    1. @Steven, thanks for the feedback. I really do appreciate it. I had seen your comment a while ago but I didn’t get a chance to focus on a solution. Turns out, I still don’t have much free time. I’m open to suggestions.
