If you do any management pack authoring then you are probably familiar with the headaches that can result from this sequence of activity:
- importing a management pack (which includes class discovery)
- removing that management pack
- changing the service model (classes and/or their existing properties and/or taxonomy)
- importing that management pack again
The agents can appear to get confused and refuse to discover your instances again or if they do get discovered, the rules/monitors may refuse to run for the instances (at least until the agents get their marching orders corrected by the next snapshot synchronization). This has to do with how the class instances get flagged as deleted when the MP is removed but when the MP is imported again the classes may be technically different because of the taxonomy changes but it shouldn’t matter because it’s a new MP but it does matter because obviously something is wrong but, but, but…
I won’t attempt to explain the nerd science behind this behavior because I will probably butcher the details.
What I will do is provide a management pack with agent tasks to allow you to run these normally scheduled maintenance activities manually:
- Discovery Data Purge – Force removal of items flagged as ‘IsDeleted=1’
- Snapshot Synchronization – A config update for everything in the management group
These are maintenance activities that your management servers already perform automatically on an unchangeable schedule. However, the discovery purge normally happens at 2am and when you are in the middle of management pack development you may not want to wait around until the next day to test your MP. You may need those cobwebs cleaned out immediately to continue your testing/dev.
If my memory serves me correctly, the snapshot synchronization procedure occurs daily but on its own synch schedule. It can take 30 seconds in small environments but up to 30+ minutes in large environments.
Upon installation of this management pack, a new folder called “Maintenance” will appear in your Monitoring workspace. In this folder, there is a single state view called “OpsDB Watcher” which will display a single entity (the Operational Database Watcher is a system class). This class instance has four new tasks associated which will appear in the “Tasks” pane (screenshot below). These tasks will all use the existing “Operational Database Account” security profile. By default, this profile is empty and therefore the tasks will typically use the “Management Server Action Account“.
The tasks leverage a PowerShell datasource to run the SQL queries. They get executed on the management server where the OpsDB Watcher instance lives; in this lab: ms02.contoso.com.
OpsDB – 1) Execute DiscoveryDataPurge
This will purge entries from the OpsDB BaseManagedEntity table where “IsDeleted = 1”.
/* Script: DiscoveryDataPurge.SQL Author: Tyson Paul (this is a fancier version of the classic purge procedure) Version History: 2021.11.23.1703 - v1 */ DECLARE @Err int DECLARE @Ret int DECLARE @Command nvarchar(MAX) DECLARE @GroomHistoryId bigint DECLARE @ItemsToDelete INT DECLARE @ItemsRemaining INT DECLARE @ItemsDeleted INT -- Custom job memo SET @Command = N'Manual Purge: dbo.p_DiscoveryDataPurgingByTypedManagedEntityInBatches, dbo.p_DiscoveryDataPurgingByRelationshipInBatches, dbo.p_DiscoveryDataPurgingByBaseManagedEntityInBatches' -- Call the grooming history insert sproc. This will add a default job timestamp with 'Failed' Status. Only upon successful completion will the Status become updated to "Success" EXEC @Ret = dbo.p_InternalJobHistoryInsert @Command, @GroomHistoryId OUT SET @ItemsToDelete = ( SELECT count(*) FROM BaseManagedEntity WHERE IsDeleted = 1 ) -- This only purges 10000 records. If you have more it will require multiple runs -- If it fails with an error set BatchSize to 500 and run multiple times DECLARE @TimeGenerated DATETIME, @BatchSize INT, @RowCount INT SET @BatchSize = 10000 -- These 3 procedures below require a DateTime param which is basically a watermark to delete items OLDER than the timestamp. -- Normally this timestamp is 2-ish days. Here we are setting @TimeGenerated to RIGHT NOW so all entries will be older than right now and will be deleted. SET @TimeGenerated = GETUTCDATE() EXEC @Err = p_DiscoveryDataPurgingByRelationship @TimeGenerated, @BatchSize, @RowCount IF (@Err <> 0) GOTO Error_Exit EXEC @Err = p_DiscoveryDataPurgingByTypedManagedEntity @TimeGenerated, @BatchSize, @RowCount IF (@Err <> 0) GOTO Error_Exit EXEC @Err = p_DiscoveryDataPurgingByBaseManagedEntity @TimeGenerated, @BatchSize, @RowCount IF (@Err <> 0) GOTO Error_Exit -- Call the grooming history insert sproc to update status to "Success" EXEC @Ret = dbo.p_InternalJobHistoryUpdate @GroomHistoryId, 1 SET @ItemsRemaining = ( SELECT count(*) FROM BaseManagedEntity WHERE IsDeleted = 1 ) SET @ItemsDeleted = ( SELECT @ItemsToDelete - @ItemsRemaining ) Select @ItemsToDelete as 'ItemsToPurge' ,@ItemsDeleted as 'ItemsDeleted' , @ItemsRemaining as 'ItemsRemaining' GOTO _Exit Error_Exit: Select 'Failed' as 'Status' _Exit:
OpsDB – 2) Show DiscoveryDataPurgeHistory
This will show the discovery purge history; both scheduled and manual executions.
/* Script: ShowDiscoveryDataPurgeHistory.SQL Author: Tyson Paul Version History: 2021.11.23.1703 - v1 */ -- build table list to contain friendly names for Status codes IF (OBJECT_ID('tempdb..#tmpStatus') IS NOT NULL) DROP TABLE #tmpStatus CREATE TABLE #tmpStatus ( StatusCode int NOT NULL ,StatusName sysname NOT NULL ) INSERT #tmpStatus VALUES ( 0, 'Failed') INSERT #tmpStatus VALUES ( 1, 'Success') INSERT #tmpStatus VALUES ( 2, 'FailedOther') select top 20 InternalJobHistoryId ,DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), TimeStarted) AS TimeStartedInLocalTime ,DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), TimeFinished) AS TimeFinishedInLocalTime ,DATEDIFF(second, (DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), TimeStarted) ), (DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), TimeFinished) )) as JobDurationSeconds ,InternalJobHistory.StatusCode ,StatusName ,Command from InternalJobHistory INNER JOIN #tmpStatus ON InternalJobHistory.StatusCode = #tmpStatus.StatusCode --Where Command like '%p_DataPurging%' order by InternalJobHistoryId DESC IF (OBJECT_ID('tempdb..#RegularTableToGroom') IS NOT NULL) DROP TABLE #RegularTableToGroom
OpsDB – 3) Execute Snapshot Synchronization
This triggers a configuration update for everything in the management group.
/* Script: SnapshotSynchForce.SQL Author: Tyson Paul (this is a fancy wrapper for CS.SnapshotSynchronizationForce ) Version History: 2021.11.29.1457 - Improved output 2021.11.23.1703 - v1 */ DECLARE @ErrorInd bit ,@ErrorMessage nvarchar(max) ,@ErrorNumber int ,@ErrorSeverity int ,@ErrorState int ,@ErrorLine int ,@ErrorProcedure nvarchar(256) ,@ErrorMessageText nvarchar(max) -- Calculate the average time required to run the procedure. This is good info for the user to help manage expectations. DECLARE @AvgRunTime INT SET @AvgRunTime = ( SELECT AVG([DurationSeconds]) as AverageRunTime FROM CS.WorkItem INNER JOIN CS.WorkItemState on WorkItem.WorkItemStateId = WorkItemState.WorkItemStateId WHERE WorkItemName like '%snap%' ) BEGIN TRY -- Try to detect if the job is already running IF NOT EXISTS ( SELECT TOP 1 1 FROM CS.WorkItem INNER JOIN CS.WorkItemState on WorkItem.WorkItemStateId = WorkItemState.WorkItemStateId WHERE WorkItemName like '%snap%' AND WorkItemStateName like 'Running' ORDER BY WorkItemRowId DESC ) BEGIN -- If not already running, run the procedure EXEC CS.SnapshotSynchronizationForce -- It often takes a few seconds for the job status to appear in the WorkItem table. WAITFOR DELAY '00:00:5'; END --snapshot sync SELECT [WorkItemRowId] ,[WorkItemName] ,CS.WorkItem.WorkItemStateId ,WorkItemState.WorkItemStateName ,[ServerName] ,[InstanceName] ,[StartedDateTimeUtc] ,[LastActivityDateTimeUtc] ,[CompletedDateTimeUtc] ,[DurationSeconds] ,@AvgRunTime AS 'Average Runtime (seconds)' ,[ErrorMessage] FROM CS.WorkItem INNER JOIN CS.WorkItemState on WorkItem.WorkItemStateId = WorkItemState.WorkItemStateId WHERE WorkItemName like '%snap%' ORDER BY WorkItemRowId DESC END TRY BEGIN CATCH SELECT @ErrorNumber = ERROR_NUMBER() ,@ErrorSeverity = ERROR_SEVERITY() ,@ErrorState = ERROR_STATE() ,@ErrorLine = ERROR_LINE() ,@ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-') ,@ErrorMessageText = ERROR_MESSAGE() SET @ErrorInd = 1 END CATCH -- report error if any IF (@ErrorInd = 1) BEGIN DECLARE @AdjustedErrorSeverity int SET @AdjustedErrorSeverity = CASE WHEN @ErrorSeverity > 18 THEN 18 ELSE @ErrorSeverity END RAISERROR ('Sql execution failed. Error %d, Level %d, State %d, Procedure %s, Line %d, Message: %s', @AdjustedErrorSeverity, 1 ,@ErrorNumber ,@ErrorSeverity ,@ErrorState ,@ErrorProcedure ,@ErrorLine ,@ErrorMessageText ) END
OpsDB – 4) Show Snapshot Synchronization History
This will show the snapshot synch history; both scheduled and manual executions.
/* Script: ShowSnapshotSynchHistory.SQL Author: Tyson Paul Version History: 2021.11.23.1703 - v1 */ --Snapshot sync history SELECT [WorkItemRowId] ,[WorkItemName] ,CS.WorkItem.WorkItemStateId ,WorkItemState.WorkItemStateName ,[ServerName] ,[InstanceName] ,[DurationSeconds] ,[StartedDateTimeUtc] ,DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), [StartedDateTimeUtc]) AS TimeStartedLocalTime ,[LastActivityDateTimeUtc] ,[CompletedDateTimeUtc] ,DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), [CompletedDateTimeUtc]) AS TimeStartedLocalTime ,[ErrorMessage] FROM CS.WorkItem INNER JOIN CS.WorkItemState on WorkItem.WorkItemStateId = WorkItemState.WorkItemStateId WHERE WorkItemName like '%snap%' ORDER BY WorkItemRowId DESC
In this example I removed a management pack which resulted in several class instances discovered by that pack being marked for deletion. 30 entities were purged. The stored procedures used to do the heavy lifting will remove flagged items in batches of 10,000. If more flagged items exist than the batch limit, the remaining number of flagged items will be shown in the task result. If items remain, simply run the task again until no items remain.
In this example, the Snapshot Synchronization task triggers the “CS.SnapshotSynchronizationForce” procedure and returns the procedure history which includes the current “running” procedure status.
A stored procedure is a prepared SQL snippet (sortof like a function) that you can save, so the code can be reused over and over again. CS.SnapshotSynchronizationForce is a stored procedure found in folder shown below.
ALWAYS CREATE (AND VERIFY) A FULL BACKUP OF YOUR DATABASES BEFORE ATTEMPTING TO MODIFY YOUR DATABASE(S) MANUALLY.
The terms you agree to by using this offering:
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Download “SCOM Maintenance”SCOMMaintenance_188.8.131.52.zip – Downloaded 594 times – 22.70 KB