Discovery Data Purge and Snapshot Synchronization Agent Tasks for SCOM



If you do any management pack authoring then you are probably familiar with the headaches that can result from this sequence of activity:

  1. importing a management pack (which includes class discovery)
  2. removing that management pack
  3. changing the service model (classes and/or their existing properties and/or taxonomy)
  4. importing that management pack again

The agents can appear to get confused and refuse to discover your instances again or if they do get discovered, the rules/monitors may refuse to run for the instances (at least until the agents get their marching orders corrected by the next snapshot synchronization). This has to do with how the class instances get flagged as deleted when the MP is removed but when the MP is imported again the classes may be technically different because of the taxonomy changes but it shouldn’t matter because it’s a new MP but it does matter because obviously something is wrong but, but, but…

…headache.

I won’t attempt to explain the nerd science behind this behavior because I will probably butcher the details.
What I will do is provide a management pack with agent tasks to allow you to run these normally scheduled maintenance activities manually:

  • Discovery Data Purge – Force removal of items flagged as ‘IsDeleted=1’
  • Snapshot Synchronization – A config update for everything in the management group

These are maintenance activities that your management servers already perform automatically on an unchangeable schedule. However, the discovery purge normally happens at 2am and when you are in the middle of management pack development you may not want to wait around until the next day to test your MP. You may need those cobwebs cleaned out immediately to continue your testing/dev.

If my memory serves me correctly, the snapshot synchronization procedure occurs daily but on its own synch schedule. It can take 30 seconds in small environments but up to 30+ minutes in large environments.

Upon installation of this management pack, a new folder called “Maintenance” will appear in your Monitoring workspace. In this folder, there is a single state view called “OpsDB Watcher” which will display a single entity (the Operational Database Watcher is a system class). This class instance has four new tasks associated which will appear in the “Tasks” pane (screenshot below). These tasks will all use the existing “Operational Database Account” security profile. By default, this profile is empty and therefore the tasks will typically use the “Management Server Action Account“.


The Tasks

The tasks leverage a PowerShell datasource to run the SQL queries. They get executed on the management server where the OpsDB Watcher instance lives; in this lab: ms02.contoso.com.

OpsDB – 1) Execute DiscoveryDataPurge

This will purge entries from the OpsDB BaseManagedEntity table where “IsDeleted = 1”.

/* 
Script: DiscoveryDataPurge.SQL
Author: Tyson Paul (this is a fancier version of the classic purge procedure)
Version History:
2021.11.23.1703 - v1
*/

DECLARE @Err int
DECLARE @Ret int
DECLARE @Command nvarchar(MAX)
DECLARE @GroomHistoryId bigint
DECLARE @ItemsToDelete	INT
DECLARE @ItemsRemaining	INT
DECLARE @ItemsDeleted INT

-- Custom job memo
SET @Command = N'Manual Purge: dbo.p_DiscoveryDataPurgingByTypedManagedEntityInBatches, dbo.p_DiscoveryDataPurgingByRelationshipInBatches, dbo.p_DiscoveryDataPurgingByBaseManagedEntityInBatches'

-- Call the grooming history insert sproc. This will add a default job timestamp with 'Failed' Status. Only upon successful completion will the Status become updated to "Success"
EXEC @Ret = dbo.p_InternalJobHistoryInsert @Command, @GroomHistoryId OUT

SET @ItemsToDelete = (
	SELECT count(*) FROM BaseManagedEntity WHERE IsDeleted = 1
)
-- This only purges 10000 records.  If you have more it will require multiple runs
-- If it fails with an error set BatchSize to 500 and run multiple times
DECLARE @TimeGenerated DATETIME, @BatchSize INT, @RowCount INT
SET @BatchSize = 10000
-- These 3 procedures below require a DateTime param which is basically a watermark to delete items OLDER than the timestamp. 
-- Normally this timestamp is 2-ish days. Here we are setting @TimeGenerated to RIGHT NOW so all entries will be older than right now and will be deleted.
SET @TimeGenerated = GETUTCDATE()

EXEC @Err = p_DiscoveryDataPurgingByRelationship @TimeGenerated, @BatchSize, @RowCount
IF (@Err <> 0)
    GOTO Error_Exit

EXEC @Err = p_DiscoveryDataPurgingByTypedManagedEntity @TimeGenerated, @BatchSize, @RowCount
IF (@Err <> 0)
    GOTO Error_Exit

EXEC @Err = p_DiscoveryDataPurgingByBaseManagedEntity @TimeGenerated, @BatchSize, @RowCount 
IF (@Err <> 0)
    GOTO Error_Exit

-- Call the grooming history insert sproc to update status to "Success"
EXEC @Ret = dbo.p_InternalJobHistoryUpdate @GroomHistoryId, 1

SET @ItemsRemaining = (
	SELECT count(*) FROM BaseManagedEntity WHERE IsDeleted = 1
)
SET @ItemsDeleted = (
	SELECT @ItemsToDelete - @ItemsRemaining
)

Select @ItemsToDelete as 'ItemsToPurge'
	,@ItemsDeleted as 'ItemsDeleted'
	, @ItemsRemaining as 'ItemsRemaining'

GOTO _Exit

Error_Exit:
    Select 'Failed' as 'Status'

_Exit:


OpsDB – 2) Show DiscoveryDataPurgeHistory

This will show the discovery purge history; both scheduled and manual executions.

/* 
Script: ShowDiscoveryDataPurgeHistory.SQL
Author: Tyson Paul
Version History:
2021.11.23.1703 - v1
*/

-- build table list to contain friendly names for Status codes
IF (OBJECT_ID('tempdb..#tmpStatus') IS NOT NULL)
DROP TABLE #tmpStatus

CREATE TABLE #tmpStatus (
    StatusCode          int     NOT NULL
	,StatusName         sysname NOT NULL
)
      
INSERT #tmpStatus VALUES ( 0, 'Failed')
INSERT #tmpStatus VALUES ( 1, 'Success')
INSERT #tmpStatus VALUES ( 2, 'FailedOther')

select top 20 InternalJobHistoryId
,DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), TimeStarted) AS TimeStartedInLocalTime
,DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), TimeFinished) AS TimeFinishedInLocalTime
,DATEDIFF(second, (DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), TimeStarted) ), (DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), TimeFinished) )) as JobDurationSeconds
,InternalJobHistory.StatusCode
,StatusName
,Command
from InternalJobHistory
INNER JOIN #tmpStatus ON InternalJobHistory.StatusCode = #tmpStatus.StatusCode
--Where Command like '%p_DataPurging%'
order by InternalJobHistoryId DESC

IF (OBJECT_ID('tempdb..#RegularTableToGroom') IS NOT NULL)
DROP TABLE #RegularTableToGroom


OpsDB – 3) Execute Snapshot Synchronization

This triggers a configuration update for everything in the management group.

/* 
Script: SnapshotSynchForce.SQL
Author: Tyson Paul (this is a fancy wrapper for CS.SnapshotSynchronizationForce )
Version History:
2021.11.29.1457 - Improved output
2021.11.23.1703 - v1
*/

DECLARE 
     @ErrorInd        bit
    ,@ErrorMessage    nvarchar(max)
    ,@ErrorNumber     int
    ,@ErrorSeverity   int
    ,@ErrorState      int
    ,@ErrorLine       int
    ,@ErrorProcedure  nvarchar(256)
    ,@ErrorMessageText nvarchar(max)


-- Calculate the average time required to run the procedure. This is good info for the user to help manage expectations. 
DECLARE @AvgRunTime	INT
SET @AvgRunTime = (
	SELECT AVG([DurationSeconds]) as AverageRunTime
	FROM CS.WorkItem
	INNER JOIN CS.WorkItemState on WorkItem.WorkItemStateId = WorkItemState.WorkItemStateId
	WHERE WorkItemName like '%snap%'
)

BEGIN TRY
	-- Try to detect if the job is already running
	IF NOT EXISTS (
		SELECT TOP 1 1
		FROM CS.WorkItem
		INNER JOIN CS.WorkItemState on WorkItem.WorkItemStateId = WorkItemState.WorkItemStateId
		WHERE WorkItemName like '%snap%'
		AND WorkItemStateName like 'Running'
		ORDER BY WorkItemRowId DESC
	)
		BEGIN
			-- If not already running, run the procedure
			EXEC CS.SnapshotSynchronizationForce
			-- It often takes a few seconds for the job status to appear in the WorkItem table.
			WAITFOR DELAY '00:00:5';
		END 

	--snapshot sync
	SELECT [WorkItemRowId]
			,[WorkItemName]
			,CS.WorkItem.WorkItemStateId
			,WorkItemState.WorkItemStateName
			,[ServerName]
			,[InstanceName]
			,[StartedDateTimeUtc]
			,[LastActivityDateTimeUtc]
			,[CompletedDateTimeUtc]
			,[DurationSeconds]
			,@AvgRunTime AS 'Average Runtime (seconds)'
			,[ErrorMessage]
	FROM CS.WorkItem
	INNER JOIN CS.WorkItemState on WorkItem.WorkItemStateId = WorkItemState.WorkItemStateId
	WHERE WorkItemName like '%snap%'
	ORDER BY WorkItemRowId DESC

END TRY

BEGIN CATCH
	SELECT
    @ErrorNumber = ERROR_NUMBER()
    ,@ErrorSeverity = ERROR_SEVERITY()
    ,@ErrorState = ERROR_STATE()
    ,@ErrorLine = ERROR_LINE()
    ,@ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-')
    ,@ErrorMessageText = ERROR_MESSAGE()

    SET @ErrorInd = 1
END CATCH

  -- report error if any
IF (@ErrorInd = 1)
BEGIN
	DECLARE @AdjustedErrorSeverity int

	SET @AdjustedErrorSeverity = CASE
		WHEN @ErrorSeverity > 18 THEN 18
		ELSE @ErrorSeverity
		END
    
	RAISERROR ('Sql execution failed. Error %d, Level %d, State %d, Procedure %s, Line %d, Message: %s', @AdjustedErrorSeverity, 1
		,@ErrorNumber
		,@ErrorSeverity
		,@ErrorState
		,@ErrorProcedure
		,@ErrorLine
		,@ErrorMessageText
	)
END


OpsDB – 4) Show Snapshot Synchronization History

This will show the snapshot synch history; both scheduled and manual executions.

/* 
Script: ShowSnapshotSynchHistory.SQL
Author: Tyson Paul
Version History:
2021.11.23.1703 - v1
*/

 --Snapshot sync history
SELECT [WorkItemRowId]
      ,[WorkItemName]
      ,CS.WorkItem.WorkItemStateId
	  ,WorkItemState.WorkItemStateName
      ,[ServerName]
      ,[InstanceName]
      ,[DurationSeconds]
      ,[StartedDateTimeUtc]   
	  ,DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), [StartedDateTimeUtc]) AS TimeStartedLocalTime
      ,[LastActivityDateTimeUtc]
      ,[CompletedDateTimeUtc]
	  ,DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), [CompletedDateTimeUtc]) AS TimeStartedLocalTime
      ,[ErrorMessage]
FROM CS.WorkItem
INNER JOIN CS.WorkItemState on WorkItem.WorkItemStateId = WorkItemState.WorkItemStateId
WHERE WorkItemName like '%snap%'
ORDER BY WorkItemRowId DESC



Examples

In this example I removed a management pack which resulted in several class instances discovered by that pack being marked for deletion. 30 entities were purged. The stored procedures used to do the heavy lifting will remove flagged items in batches of 10,000. If more flagged items exist than the batch limit, the remaining number of flagged items will be shown in the task result. If items remain, simply run the task again until no items remain.



In this example, the Snapshot Synchronization task triggers the “CS.SnapshotSynchronizationForce” procedure and returns the procedure history which includes the current “running” procedure status.


Extra Info

A stored procedure is a prepared SQL snippet (sortof like a function) that you can save, so the code can be reused over and over again. CS.SnapshotSynchronizationForce is a stored procedure found in folder shown below.


Download

ALWAYS CREATE (AND VERIFY) A FULL BACKUP OF YOUR DATABASES BEFORE ATTEMPTING TO MODIFY YOUR DATABASE(S) MANUALLY.

The terms you agree to by using this offering:
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Download “SCOM Maintenance”

SCOMMaintenance_1.0.0.43.zip – Downloaded 408 times – 23 KB

Leave a Reply

Your email address will not be published. Required fields are marked *