Have you ever encountered a question on a certification test that includes the statement “Your answer must incur the least administrative cost”? There might be several technically correct responses, but only one represents the “best” solution because it is the least costly.
This Management Pack is our best effort to create a “less costly” option for managing notifications in your Operations Manager environment. The usual guidance is to work through every Management Pack and create overrides to disable alerts or adjust thresholds as needed, which is effective but labor-intensive.
Introduction
In this Management Pack, we take a “bigger picture” look at the way that Operations Manager handles alerts, and especially the way that OpsMgr handles notifications. We do this by taking advantage of OpsMgr’s Alert Resolution States. Think of a Resolution State as a waypoint on an alert’s journey through OpsMgr. We’ll introduce alert workflows as a way to “guide” an Alert along that journey. Think of this Management Pack as a navigation system for alerts.
IMPORTANT: You STILL need to tune your management packs! We’re giving OpsMgr admins a tool to help manage alerts, so that you don’t end up spamming your administrators with unnecessary notifications.
This Series
This is the first of four related articles discussing the features and capabilities of the SCOM Alert Management Pack. The other articles in the series will discuss how to enable each feature in more detail.
The SCOM Alert Management Pack
The SCOM Alert Management Pack combines three different features into one overall alert management solution:
- Alert Ownership Assignment
- Alert Storm Mitigation
- Basic Alert Flows
This Management Pack is not about tuning alerts or disabling them. This Management Pack is about managing notifications and creating actionable Alert Views. By focusing on alerts that are in selected resolution states and which have been assigned to specific teams, we help administrators concentrate on only those alerts that need attention.
Alert Workflows
What is an Alert Workflow? How is it used in OpsMgr?
You’re already familiar with the default Alert Workflow used by OpsMgr:
- New alerts are created when certain conditions are met;
- When OpsMgr detects that the monitor that raised an alert is “healthy” again, it closes the alert.
If we diagram this using a flowchart, it would look like:
First, we will introduce a new Resolution State (“Assigned”) and we will add a step in the process. We will assign an owner to each alert based on rules in a configuration file. Now, our alert workflow looks like:
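The assignment step can be pictured as a first-match rule lookup. The sketch below is hypothetical: the Management Pack's actual configuration file format is not described here, so it invents a JSON rule list with glob patterns on the management pack name and alert name, plus a made-up catch-all owner. It models the logic only and does not call any OpsMgr API.

```python
import json
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional

# Hypothetical rule file contents; the real MP's config format may differ entirely.
CONFIG = """
[
  {"mp": "Microsoft.SQLServer.*",      "alert": "*",      "owner": "DBA Team"},
  {"mp": "Microsoft.Windows.Server.*", "alert": "*Disk*", "owner": "Storage Team"},
  {"mp": "Microsoft.Windows.Server.*", "alert": "*",      "owner": "Windows Team"}
]
"""
DEFAULT_OWNER = "SCOM Admins"  # hypothetical catch-all owner


@dataclass
class Alert:
    name: str
    management_pack: str
    resolution_state: str = "New"
    owner: Optional[str] = None


@dataclass
class OwnerRule:
    mp_pattern: str    # glob on the management pack name
    name_pattern: str  # glob on the alert name
    owner: str


def load_rules(text: str) -> List[OwnerRule]:
    """Parse the (hypothetical) JSON rule list, preserving file order."""
    return [OwnerRule(r["mp"], r["alert"], r["owner"]) for r in json.loads(text)]


def assign_owner(alert: Alert, rules: List[OwnerRule]) -> Alert:
    """Move a New alert to Assigned, taking the owner from the first matching rule."""
    if alert.resolution_state != "New":
        return alert  # only New alerts are assigned
    for rule in rules:
        # fnmatchcase gives the same case-sensitive matching on every OS.
        if fnmatchcase(alert.management_pack, rule.mp_pattern) and fnmatchcase(alert.name, rule.name_pattern):
            alert.owner = rule.owner
            break
    else:
        alert.owner = DEFAULT_OWNER
    alert.resolution_state = "Assigned"
    return alert
```

Ordering matters in this design: the more specific disk rule sits above the general Windows rule, so disk alerts reach the storage team first.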
Next, we’ll add some analysis for our open alerts. We’ll try to correlate alerts to see whether an Alert Storm is currently underway. We will then move those alerts into a new Resolution State (“Alert Storm”), tag them with an identifier, and generate a master alert that can be used to track the other alerts. Now, our Alert Workflow will look like:
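One simple way to correlate alerts is to count how many open alerts with the same name were raised within a short window. This is only a sketch under stated assumptions: the grouping key (alert name), the five-minute window, the threshold of ten, and the `storm_id` tag are all hypothetical choices, not the Management Pack's documented algorithm.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical tuning values; the article does not say how storms are detected.
STORM_WINDOW = timedelta(minutes=5)  # only correlate alerts raised this recently
STORM_THRESHOLD = 10                 # minimum same-name alerts that count as a storm


@dataclass
class Alert:
    name: str
    raised: datetime
    resolution_state: str = "New"
    storm_id: Optional[str] = None  # hypothetical tag linking members to a master alert


def detect_storms(open_alerts: List[Alert], now: datetime) -> List[Alert]:
    """Tag same-name bursts as an Alert Storm and return one master alert per storm."""
    groups = defaultdict(list)
    for alert in open_alerts:
        # Skip alerts already tagged and alerts older than the correlation window.
        if alert.storm_id is None and now - alert.raised <= STORM_WINDOW:
            groups[alert.name].append(alert)

    masters = []
    for name, members in groups.items():
        if len(members) < STORM_THRESHOLD:
            continue
        storm_id = uuid.uuid4().hex[:8]
        for alert in members:
            alert.resolution_state = "Alert Storm"
            alert.storm_id = storm_id
        # One master alert per storm, sharing the tag, tracks all of its members.
        masters.append(Alert(f"Alert Storm: {name} ({len(members)} alerts)", now, storm_id=storm_id))
    return masters
```

Because tagged alerts are skipped, running the check again on the next polling cycle does not create a second master alert for the same storm.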
Finally, we will add some additional Resolution States and some new alert workflows to the process. The goal is to verify an alert before we trigger a notification. To accomplish this, we are going to add a new Resolution State (“Verified”) and add some additional analysis after we assign an alert owner. We are also going to leverage an existing Resolution State (“Awaiting Evidence”) for certain alerts. When we’re done, the alert workflow will look like this:
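The resolution states in this workflow can be read as a small state machine. The transition table below is an illustrative reading of the workflow described in this article, not the Management Pack's definition; in particular, parking rule-based alerts in Awaiting Evidence is an assumption.

```python
from enum import Enum


class State(Enum):
    NEW = "New"
    ASSIGNED = "Assigned"                    # custom state added by the workflow
    ALERT_STORM = "Alert Storm"              # custom state added by the workflow
    AWAITING_EVIDENCE = "Awaiting Evidence"  # existing OpsMgr state
    VERIFIED = "Verified"                    # custom state added by the workflow
    CLOSED = "Closed"


# Illustrative allowed moves; any open alert can still be closed by OpsMgr.
TRANSITIONS = {
    State.NEW: {State.ASSIGNED, State.ALERT_STORM, State.CLOSED},
    State.ASSIGNED: {State.VERIFIED, State.AWAITING_EVIDENCE, State.ALERT_STORM, State.CLOSED},
    State.AWAITING_EVIDENCE: {State.VERIFIED, State.CLOSED},
    State.ALERT_STORM: {State.CLOSED},
    State.VERIFIED: {State.CLOSED},
    State.CLOSED: set(),
}


def transition(current: State, target: State) -> State:
    """Return the new state, refusing moves the workflow does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Encoding the workflow as a table makes the key rule explicit: nothing reaches Verified without first passing through assignment.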
Verified is the New “New”
Now, instead of using the “New” resolution state as the criteria for notifications, we’ll use the “Verified” resolution state. By using the “Verified” resolution state, we will NOT send notifications for:
- Alert Storm alerts
- Alerts that are open for less than a minimum threshold of time (typically 10 minutes)
- Transient rule-based alerts that occur once, but never repeat
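The three exclusions above can be expressed as a single verification check that runs against each open, assigned alert. In this sketch the 10-minute minimum comes from the text; the one-hour window for declaring a rule-based alert transient, and the exact use of Awaiting Evidence, are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MIN_OPEN_TIME = timedelta(minutes=10)  # "typically 10 minutes", per the article
TRANSIENT_WINDOW = timedelta(hours=1)  # hypothetical: how long a rule alert may wait for a repeat


@dataclass
class Alert:
    is_monitor_alert: bool  # False for rule-based alerts
    resolution_state: str   # e.g. "Assigned", "Awaiting Evidence", "Alert Storm"
    raised: datetime
    repeat_count: int = 0   # how many times a rule-based alert has fired again


def next_state(alert: Alert, now: datetime) -> Optional[str]:
    """Return the state an open alert should move to, or None to leave it where it is."""
    # Storm members, Verified, and Closed alerts are not re-evaluated.
    if alert.resolution_state not in ("Assigned", "Awaiting Evidence"):
        return None
    age = now - alert.raised
    if alert.is_monitor_alert:
        # A monitor alert that clears on its own is closed by OpsMgr, so one
        # still open after MIN_OPEN_TIME is treated as verified.
        return "Verified" if age >= MIN_OPEN_TIME else None
    # Rule-based alerts are only verified once they repeat.
    if alert.repeat_count > 0:
        return "Verified"
    if age >= TRANSIENT_WINDOW:
        return "Closed"  # closed as a transient alert that never repeated
    return None if alert.resolution_state == "Awaiting Evidence" else "Awaiting Evidence"
```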
Now, when we add our notifications step into our Workflow, it looks like this:
IMPORTANT!!!! This process introduces an automatic five-minute delay into your notifications. If you are in a time-sensitive environment where notifications cannot be delayed, then this solution is NOT for you.
Results
The combined impact of implementing these alert workflows at our pilot customer was substantial:
- A large percentage (44%) of our monitor-based alerts opened and closed within a 10-minute window. By introducing a slight delay, we eliminated a large number of notifications.
- We automatically closed out a large number of non-recurring Rule-based alerts as “transient” alerts.
- While alert storms were infrequent, they generated a large number of alerts (and notifications), which contributed disproportionately to the perception of OpsMgr as “noisy”.
- Ultimately, we ended up with a pool of alerts in Verified state that was about 60% smaller than when we started.
And we achieved that without tuning a single alert.
Benefits
There are two primary benefits to implementing this solution for customers:
- By decreasing the number of outbound notifications sent by OpsMgr, we reduce “alert fatigue” and increase the relevance of individual notifications that systems administrators see;
- By creating “focused” views (filtered to “Verified” alerts that belong to specific teams), we increase the relevance of the user experience.
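A focused view and a notification subscription both come down to the same filter: Verified alerts, grouped by owning team. The sketch below models that filter in plain Python with hypothetical field names; in OpsMgr itself this would be configured as view and subscription criteria rather than code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Alert:
    name: str
    owner: str             # team assigned by the ownership step
    resolution_state: str


def focused_view(alerts: Iterable[Alert], team: str) -> List[Alert]:
    """Alerts one team should look at: Verified and owned by that team."""
    return [a for a in alerts if a.resolution_state == "Verified" and a.owner == team]


def notifications_by_team(alerts: Iterable[Alert]) -> Dict[str, List[str]]:
    """Route only Verified alerts, grouped by owner, to each team's notification channel."""
    routed = defaultdict(list)
    for a in alerts:
        if a.resolution_state == "Verified":
            routed[a.owner].append(a.name)
    return dict(routed)
```

Alerts still in Assigned, Awaiting Evidence, or Alert Storm never appear in either output, which is what keeps both the views and the notifications short.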
Does it Scale?
We implemented an early version of these Workflows at a customer with 2,500 Windows agents. The customer typically averaged between 2,000 and 3,000 open alerts at any given time. It was NOT a well-tuned environment, but some tuning work had been done. After implementing these Alert Workflows, we reduced the number of open alerts to between 1,000 and 1,500, and we reduced the number of outbound notifications by about 75%.
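As a quick check on the open-alert figures, pairing the low and high ends of each range (an assumption, since only ranges are reported) shows that the backlog of open alerts was roughly cut in half:

```latex
\frac{2000 - 1000}{2000} = \frac{3000 - 1500}{3000} = 0.5
```

This is a separate measure from the roughly 75% drop in outbound notifications, which is reported directly.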
Conclusion
We believe that implementing the SCOM Alert Management Pack and enabling Alert Assignment and Alert Escalation will make Operations Manager more relevant for your organization and for the administrators responsible for managing systems in the environment. The Management Pack does not replace the need to tune your alerts. But it can be a helpful first step in controlling the flow of notifications.
Acknowledgements
Many people have helped make this management pack real. It would be impossible to thank all of them, but I would be remiss if I did not mention:
- Dan Reist
- Shane Hutchens
- Tyson Paul