Do you need to control maintenance mode from the agent-managed computer?
UPDATE: Immediately after posting this I received a ton of excellent suggestions for additional functionality. I have since improved functionality quite a bit but it will require a rip/replace of the previous version.
Previously, your options to control maintenance mode from the agent side were limited. There was a PowerShell command introduced in 2016, referenced here, but it is very clunky. You have to create an override for a rule and then manually import the module from a DLL. Ewwww. The command would write an entry to the registry, then you had to wait around for the agent to notice the entry. The agent would eventually enter maintenance mode. There was no way to check/verify maintenance mode status from the agent.
This management pack will provide every agent-managed computer with a set of PowerShell commands to control and verify maintenance mode for the Windows Computer object.
How does it work?
The management pack contains a PowerShell module, “SCOMAgentHelper”. Among the many useful tools in the module are two specifically for maintenance mode:
The PowerShell module will be deployed automatically to the standard module path:
Once the module is available, you simply use the PowerShell commands.
Note: as of PowerShell v3 you no longer have to explicitly import modules. They should be imported automatically whenever you attempt to use commands contained in them.
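If you would rather confirm that the module has landed before relying on auto-import, a check along these lines may help. This is only a minimal sketch: Test-ModuleAvailable is a hypothetical helper name and is not part of the management pack. Only the module name "SCOMAgentHelper" comes from this post.

```powershell
# Hypothetical helper (not part of the management pack): returns $true when
# a module with the given name can be found on the module path.
function Test-ModuleAvailable {
    param(
        [Parameter(Mandatory)]
        [string]$Name
    )
    # Get-Module -ListAvailable searches every folder in $env:PSModulePath.
    return [bool](Get-Module -ListAvailable -Name $Name)
}

if (Test-ModuleAvailable -Name 'SCOMAgentHelper') {
    # Auto-import should also work; an explicit import just fails fast
    # if the deployed files are incomplete.
    Import-Module -Name 'SCOMAgentHelper' -ErrorAction Stop
    Get-Command -Module 'SCOMAgentHelper'
}
else {
    Write-Warning 'SCOMAgentHelper was not found on the module path yet.'
}
```

On a machine where the deployment rule has not run yet, this should only print the warning.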
The agent and management server basically talk to each other through events written to the agent Operations Manager event log.
Here’s a description of the sequence when MM is enabled:
1) Use the PowerShell command to begin maintenance mode:
# Example
Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 30 -Reason UnplannedOther -Comment "Emergency patch applied" -Verbose -Verify Workflows
2) A specific event is written to the OpsMan log which indicates a request to begin maintenance mode for the Windows Computer object.
3) A rule running on the agent machine detects the event, then triggers a write action on the management server which places the agent Computer object and all other contained objects into maintenance mode.
4) The PowerShell command will wait to verify that workflows have unloaded on the agent. Then it will display the current maintenance mode status to the screen.
Note: The command now includes the “-Verify” parameter which controls how the maintenance mode status is verified.
Your options are: None, Workflows, and MgmtPerspectiveOnly.
None – The event will be written to the local agent OpsMan event log. No verification will take place.
MgmtPerspectiveOnly – The Write Action will trigger the maintenance mode window and then verify that the mgmt server sees the agent as truly being in maintenance mode. (Sometimes it takes a few seconds for the database, via the SDK, to reflect the true status.) The Write Action will then write an event to the agent OpsMan event log (with an agent task) which indicates the true status from the management server’s perspective. This does not verify whether the agent altered any workflow activity. This is useful if you simply want to reboot the computer without risking Heartbeat alerts from the corresponding Health Service Watcher object(s).
Workflows – This will verify that the workflows on the agent have actually stopped/started.
Where do I start?
1) Import the management pack .mpb file. After a few minutes the PowerShell module should appear at this path on your agent-managed computers:
The deployment rule interval is 86400 seconds (1 day), but no SyncTime parameter is used, so it should deploy shortly after the agent downloads the new configuration and activates it. (Look for events 1204 and 1210 in the event log.) If you experience problems or the module does not appear as it should, you can override the deployment rule (Deploy SCOMAgentHelper PowerShell Module .PSX1 Files Rule) and set WriteToEventLog = true. If you still cannot solve the issue, enable the alternative deployment rule (Deploy SCOMAgentHelper PowerShell Module .ZIP Rule). These two deployment rules use different methods to write the module files to the standard path.
2) Run the command:
# Example
Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 15 -Verify MgmtPerspectiveOnly
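If the module does not show up after step 1, it can help to check whether the agent has logged events 1204 and 1210 recently. The following is a rough sketch of one way to look: Get-RecentLogEvent is a hypothetical helper name (not part of the module), and the 30 minute look-back window is an arbitrary assumption. The log name and event IDs are the ones mentioned above.

```powershell
# Hypothetical helper (not part of the module): returns recent events with the
# given IDs from an event log, or nothing when the cmdlet, the log, or
# matching events are missing.
function Get-RecentLogEvent {
    param(
        [string]$LogName = 'Operations Manager',
        [int[]]$Id = @(1204, 1210),
        [int]$LastMinutes = 30   # assumed look-back window
    )

    # Get-WinEvent only exists on Windows.
    if (-not (Get-Command -Name 'Get-WinEvent' -ErrorAction SilentlyContinue)) {
        return
    }

    $filter = @{
        LogName   = $LogName
        Id        = $Id
        StartTime = (Get-Date).AddMinutes(-$LastMinutes)
    }

    try {
        return Get-WinEvent -FilterHashtable $filter -ErrorAction Stop
    }
    catch {
        # Reached when the log does not exist or nothing matches the filter.
        return
    }
}

Get-RecentLogEvent | Select-Object TimeCreated, Id, Message
```

An empty result does not necessarily mean something is broken; the agent may simply not have received the new configuration within the look-back window.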
Can I extend or modify an existing maintenance mode window?
Yes. Simply use the “-ForceUpdate” switch. The following example will set the new “end” time to be 90 minutes from now and will verify the change by the mgmt server.
Note: It would be inappropriate to use the “-Verify Workflows” parameter value because if the object is already in maintenance mode then no workflow activity would occur; all relevant workflows would already be stopped.
Set-SCOMAgentMaintenanceModeStatus -Start -DurationMinutes 90 -Verify MgmtPerspectiveOnly -ForceUpdate -Verbose
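If you script this, the rule in the note above is easy to forget. Here is a small sketch of a wrapper that builds the parameters with splatting and quietly swaps "Workflows" for "MgmtPerspectiveOnly" whenever -ForceUpdate is requested. New-MaintenanceModeStartParams is a hypothetical helper and not part of the module, and its default of "Workflows" is my assumption for the sketch, not necessarily the command's default. The parameter names are taken from the examples in this post.

```powershell
# Hypothetical helper (not part of the module): builds the parameter set for a
# "start" request using parameter names from the examples in this post.
function New-MaintenanceModeStartParams {
    param(
        [Parameter(Mandatory)]
        [int]$DurationMinutes,

        [ValidateSet('None', 'Workflows', 'MgmtPerspectiveOnly')]
        [string]$Verify = 'Workflows',   # assumed default for this sketch only

        [switch]$ForceUpdate
    )

    # Extending an existing window causes no workflow activity on the agent,
    # so fall back to verifying from the mgmt server's perspective.
    if ($ForceUpdate -and $Verify -eq 'Workflows') {
        $Verify = 'MgmtPerspectiveOnly'
    }

    $params = @{
        Start           = $true
        DurationMinutes = $DurationMinutes
        Verify          = $Verify
    }
    if ($ForceUpdate) { $params['ForceUpdate'] = $true }
    return $params
}

$startParams = New-MaintenanceModeStartParams -DurationMinutes 90 -Verify Workflows -ForceUpdate

# Only call the real command when the module is actually present.
if (Get-Command -Name 'Set-SCOMAgentMaintenanceModeStatus' -ErrorAction SilentlyContinue) {
    Set-SCOMAgentMaintenanceModeStatus @startParams -Verbose
}
```

Splatting keeps the decision logic separate from the call itself, so the same hashtable can be logged or inspected before anything is sent to the agent.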
If the Computer is in maintenance mode, how can I end the maintenance mode?
# Example
Set-SCOMAgentMaintenanceModeStatus -End -Verify Workflows
This command basically performs the same steps as the start sequence above except this causes the agent to exit the maintenance window. It will also verify that the relevant workflows have resumed on the agent.
How can I check if a Computer is already in maintenance mode?
How do I end a maintenance window?
Run the command:
Set-SCOMAgentMaintenanceModeStatus -End -Verbose
I highly recommend use of the -Verbose switch for your viewing pleasure.
How does the agent continue to process rules if it is in maintenance mode?
The workflows involved target a special class called:
An instance of this class lives on every computer. This appears to be the only instance on the Windows Computer that does not enter maintenance mode. What I mean by this is that workflows which target this special class are not affected when the Windows Computer is placed into maintenance mode. I believe it is the purpose of this class instance to remain awake as a viable target for workflows while all other instances sit dormant during the maintenance window. Think of this instance as “the butler”. Below is a graph of how it fits into the big picture. (graph index here)
Notice that this special class is not dependent on the Windows Computer class like HealthService is.
Where can I see maintenance mode activity history from this management pack?
The Event View will display all recent SCOMAgentHelper module activity.
This is cool! What’s the catch?
This approach relies on the parent mgmt server to execute a scripted workflow. If you cause a significant number of agents to trigger maintenance mode at the same time, you could really abuse your mgmt server as PowerShell workflows can be expensive. How many is “too many?” This depends entirely on your environment. Be sure to test this before using in a production environment.
How can I distribute the activity/load so I don’t cripple my management server(s)?
If you find that your management servers are suffering from large numbers of agents all triggering at the same time, here’s one creative idea: randomize the initialization for the agents. Use a simple command like the following to delay the start of the MM action for a random period:
# Will trigger after a random delay of 0-10 minutes.
Start-Sleep -Seconds (Get-Random -Minimum 0 -Maximum 601); Set-SCOMAgentMaintenanceModeStatus -DurationMinutes 90 -Verbose -Verify MgmtPerspectiveOnly
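In case the 601 above looks like a typo: Get-Random treats -Maximum as exclusive for integers, so 601 is what allows a full 600 seconds. If you want to reuse the delay with different windows, a small function keeps that detail in one place. This is just a sketch: Get-JitterSeconds is a hypothetical helper name, not something shipped in the module.

```powershell
# Hypothetical helper (not part of the module): picks a random delay, in
# seconds, between 0 and $MaxMinutes minutes inclusive.
function Get-JitterSeconds {
    param(
        [ValidateRange(0, 1440)]
        [int]$MaxMinutes = 10
    )
    # -Maximum is exclusive, so add 1 to make the upper bound reachable.
    return Get-Random -Minimum 0 -Maximum (($MaxMinutes * 60) + 1)
}

$delay = Get-JitterSeconds -MaxMinutes 10

# Only sleep and call the real command when the module is actually present.
if (Get-Command -Name 'Set-SCOMAgentMaintenanceModeStatus' -ErrorAction SilentlyContinue) {
    Start-Sleep -Seconds $delay
    Set-SCOMAgentMaintenanceModeStatus -DurationMinutes 90 -Verbose -Verify MgmtPerspectiveOnly
}
```

A wider window spreads the load further but also makes it harder to predict when a given agent actually enters maintenance mode, so pick the smallest window that keeps your management servers comfortable.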
In addition, there’s an override for the detection rule which will pause the Write Action on the management server for a random period from 0-x seconds. This will help spread out the load on the mgmt server and OpsDB.
Rule: Detect Agent MM Toggle Event Rule
Default Value: 0
However, this will cause the PowerShell process to remain active/open for just that much longer. In theory, if you trigger enough agents at the same time, it’s possible that you might reach the PSScriptLimit and/or PSQueueMinutes limit of the HealthService on the management server. (These limits can be modified in the registry.) How many is too many? It will depend on your management group. Test this thoroughly at your own risk.
What other cool stuff is in the SCOMAgentHelper PowerShell module?
I’m glad you asked. Have a look for yourself with these commands:
# Show available commands
Get-Command -Module SCOMAgentHelper
# Show HELP document for a command
Get-Help Set-SCOMAgentMaintenanceModeStatus -Full
Is this PowerShell module available outside of this MP?
Since the core functionality of the module requires all of the other SCOM workflows contained in the management pack, I likely won’t publish this elsewhere. However, be sure to check out the SCOMHelper module. It’s packed with many of the same useful tools and more.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.