The SCOM Maintenance MP that I published in 2022 made purging discovery data and config synch effortless. Newly added to the MP are the following agent tasks:
- Show StateChange Age
- Clean StateChange Data
State change history is recorded for every monitored object when the health state for any monitor changes between Healthy, Warning, and Critical. If the object is in an unhealthy state when the nightly grooming procedure occurs, the state data doesn’t get groomed and this can build up forever, filling the OperationsManager database. This condition is documented pretty well here. Years ago, Tao Yang created the Self Maintenance MP (now maintained by Cookdown here) with a rule to detect when stale data exists. This is great for awareness but it doesn’t solve the issue. You can now use this agent task to run the SQL script to clean up the data.
Examples
In the example screenshot above you can see by the results of the “Show StateChange Age” task that I have a tremendous amount of stale state change data and also my OperationsManager database is unhealthy due to low space. In my case I happen to know that the low free space condition is largely due to aggressive performance collections but in some cases this can be due to accumulation of old state change.
Below you can see the result of a cleaning moved the needle over 4%. The DB is still in Warning state so I’ve got more tuning to do.
Below is another example of a management group with over 86,000,000 stale state change records that was recently cleaned. You can see how the space measurements shifted dramatically by removing the old records. This doesn’t affect the current health state or any active monitors or alerts for the objects; it merely cleans out old state change history that is outside of your configured grooming window.
Below is the query used in the task. It’s similar to the version found here but with a few modifications to make it slightly faster, at least according to the nerd who modified it.
USE [OperationsManager]
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
BEGIN
SET NOCOUNT ON
DECLARE @Err INT
DECLARE @Ret INT
DECLARE @DaysToKeep TINYINT
DECLARE @GroomingThresholdLocal DATETIME
DECLARE @GroomingThresholdUTC DATETIME
DECLARE @TimeGroomingRan DATETIME
DECLARE @MaxTimeGroomed DATETIME
DECLARE @RowCount INT
SET @TimeGroomingRan = getutcdate()
SELECT @GroomingThresholdLocal = dbo.fn_GroomingThreshold(DaysToKeep, getdate())
FROM dbo.PartitionAndGroomingSettings
WHERE ObjectName = 'StateChangeEvent'
EXEC dbo.p_ConvertLocalTimeToUTC @GroomingThresholdLocal
,@GroomingThresholdUTC OUT
SET @Err = @@ERROR
IF (@Err <> 0)
BEGIN
GOTO Error_Exit
END
SET @RowCount = 1
-- This is to update the settings table
-- with the max groomed data
SELECT @MaxTimeGroomed = MAX(TimeGenerated)
FROM dbo.StateChangeEvent
WHERE TimeGenerated < @GroomingThresholdUTC
IF @MaxTimeGroomed IS NULL
GOTO Success_Exit
-- Instead of the FK DELETE CASCADE handling the deletion of the rows from
-- the MJS table, do it explicitly. Performance is much better this way.
DELETE MJS
FROM dbo.MonitoringJobStatus MJS
JOIN dbo.StateChangeEvent SCE ON SCE.StateChangeEventId = MJS.StateChangeEventId
JOIN dbo.STATE S
WITH (NOLOCK) ON SCE.[StateId] = S.[StateId]
WHERE SCE.TimeGenerated < @GroomingThresholdUTC
AND S.[HealthState] IN (
0
,1
,2
,3
)
SELECT @Err = @@ERROR
IF (@Err <> 0)
BEGIN
GOTO Error_Exit
END
-- create temp table to hold StateChangeEvent row ids to be deleted
IF (OBJECT_ID('tempdb..#StateChangeEventToGroom') IS NOT NULL)
DROP TABLE #StateChangeEventToGroom
SELECT @Err = @@ERROR
IF (@Err <> 0)
GOTO Error_Exit
CREATE TABLE #StateChangeEventToGroom (StateChangeEventId UNIQUEIDENTIFIER NOT NULL)
SELECT @Err = @@ERROR
IF (@Err <> 0)
GOTO Error_Exit
WHILE (@RowCount > 0)
BEGIN
-- Delete StateChangeEvents that are older than @GroomingThresholdUTC
-- We are doing this in chunks in separate transactions on
-- purpose: to avoid the transaction log to grow too large.
-- NOTE: The method when we select ids into temp table
-- and then groom in a separate statement works faster than just DELETE TOP...
INSERT #StateChangeEventToGroom (StateChangeEventId)
SELECT TOP (10000) SCE.StateChangeEventId
FROM dbo.StateChangeEvent SCE
JOIN dbo.STATE S
WITH (NOLOCK) ON SCE.[StateId] = S.[StateId]
WHERE TimeGenerated < @GroomingThresholdUTC
AND S.[HealthState] IN (
0
,1
,2
,3
)
SELECT @Err = @@ERROR
,@RowCount = @@ROWCOUNT
WAITFOR DELAY '00:00:01' -- waiting 1 seconds
IF (@Err <> 0)
BEGIN
GOTO Error_Exit
END
SET STATISTICS TIME ON;
DELETE dbo.StateChangeEvent
WHERE StateChangeEventId IN (
SELECT StateChangeEventId
FROM #StateChangeEventToGroom
)
SET STATISTICS TIME OFF
SELECT @Err = @@ERROR
IF (@Err <> 0)
GOTO Error_Exit
TRUNCATE TABLE #StateChangeEventToGroom
SELECT @Err = @@ERROR
IF (@Err <> 0)
GOTO Error_Exit
END
UPDATE dbo.PartitionAndGroomingSettings
SET GroomingRunTime = @TimeGroomingRan
,DataGroomedMaxTime = @MaxTimeGroomed
WHERE ObjectName = 'StateChangeEvent'
SELECT @Err = @@ERROR
,@RowCount = @@ROWCOUNT
IF (@Err <> 0)
BEGIN
GOTO Error_Exit
END
Success_Exit:
IF (OBJECT_ID('tempdb..#StateChangeEventToGroom') IS NOT NULL)
DROP TABLE #StateChangeEventToGroom
Error_Exit:
END