Workflow Analyzer (v2) for SCOM




Early last year (March 2021) the System Center product group released a long-awaited reboot of a priceless tool called the Workflow Analyzer for SCOM. After a bunch of hard work by the PG folks and a few weeks of testing and feedback by this author and a few other lucky engineers, the PG has released the next iteration. They didn’t just update it; they really went out of their way to polish it and even included fancy graphs that reveal CPU and memory data for the traced workflow process. It can also analyze saved traces, giving the user a “time slider” experience that makes debugging easier.

I was planning to include a walkthrough and tutorial but the PG did such a good job on the System Center WFAnalyzer documentation that I simply included it below (with permission, of course).

DOWNLOAD


System Center WFAnalyzer documentation – (June, 2022)

Workflow Analyzer

What’s New!

  • Enhanced UI to give better user experience.
  • Now you can view historical data. (You have the option to export the traces and store them for future analysis.)
  • Now you can see the CPU and memory usage of a particular workflow by looking at the live charts. You can also see the average CPU utilization, handle count, and active threads of the running processes.
  • Now you have the option to refresh the workflows in the Workflow Selection window.

Overview

Workflow Analyzer (WFAnalyzer) was developed to show how a workflow passes data between its modules. It helps you quickly troubleshoot and identify issues in a workflow, and so decide what to change in a management pack to get the workflow working. WFAnalyzer allows you to:

  • Understand the data flow within a workflow and read through traces produced by each module in a workflow: Learning how workflows are composed is typically a process of trial and error, combined with researching countless blogs, forum posts and search results. This is due to the fact that workflows are effectively a “black box”. While it is true that we want our users to be able to take “how workflows work” for granted, they need to be able to dig in deeper and learn or triage where appropriate. WFAnalyzer will give you the means to investigate workflows in your production and development environments both for the purposes of troubleshooting and learning. Authors will also become more knowledgeable in building powerful module compositions, which encourages MP production and innovation.
  • Quicker troubleshooting: Currently, when writing an MP, you must write the MP, import it into a management group, run it against an agent, and review event logs and state views to see whether your rule functions correctly. A manual process such as this is very error-prone, and in many cases may be skipped entirely when time is critical. WFAnalyzer provides deeper insight into these workflows in a live environment.


Installation

(Installation screenshots added by MonitoringGuys)

Note: The download page indicates version “3.0.0” but I’ve been informed by the PG that this is version 2. The 2021 release of this tool was more of a “revival”.

Notice the installer indicates “V2”.

See CPU/Memory charts below!


Running Workflow Analyzer

WFAnalyzer can analyze workflows running on the Management Server (MS) and workflows running on the Operations Manager agent.


Analyzing Workflows on a Management Server

You can analyze workflows on an MS if the Workflow Analyzer is installed and the Operations console is present on the MS.


To analyze the workflows on an MS

  1. Install WFAnalyzer on the MS.
  2. Start WorkflowAnalyzer, installed at the following path: “%Program Files%/Microsoft System Center/Management Pack Tools”
  3. Choose the MS that the Agent is monitored by and click ‘Fetch’. Then, in Select HealthService to analyze, select the Health Service that has the workflow you want to analyze and click Start.
  4. The Workflow Analyzer window displays a list of workflows on that Health Service. Right-click a Failed or Running workflow in the list and click Trace.
  5. In the Select instance to trace window, select an instance and click Start.

NOTE: If there are multiple instances of an object (such as SQL Database), there may be multiple instances of a workflow (one for each object instance). You can only trace one instance of the workflow at a time.


Analyzing Workflows on an Agent

WFAnalyzer can now be run directly from the Agent. To analyze workflows running on an agent, .NET 3.0 or later must be installed on the computer on which the agent is installed.

There are two methods to do this:

  1. On the MS, start WFAnalyzer tool. Connect to the Agent Health Service after selecting it from the dropdown menu.
  2. The Workflow Analyzer window displays a list of workflows on that Health Service. Right-click a Failed or Running workflow in the list and click Trace.
  3. In the Select instance to trace window, select an instance and click Start.
  4. In the TraceWorkflow window, you will see two traces: submitting trace override and trace override has been submitted successfully. All subsequent trace outputs are output to an ETL file on the computer on which the agent is running.
  5. You have two options for the TraceWorkflow window on the MS:
    • You can leave the TraceWorkflow window open (recommended). Close the window after you are done troubleshooting the workflow.
    • You can close the TraceWorkflow window. A Stop window appears, asking if you want to stop debugging the current session. Click No. The override will remain intact, and the workflow will continue to trace until the WFAnalyzer is launched again and another workflow is traced. There can only be one workflow traced for a management group at a time. Since the override persists in the database, it will survive Agent Health Service restarts and will always trace.
      Warning: If you click Yes, workflow tracing on the agent will stop. When you attempt to connect to an existing trace session on the agent, WFAnalyzer will fail to show any traces or the traces will be incomplete.
  6. On the computer on which the agent is running, open WFAnalyzer.exe.
  7. In the Start a new session window, choose Connect to an existing Workflow Analysis. Click Start.
  8. The TraceWorkflow window appears and displays all the traces that have been logged, even if the workflow has already hit the point where the bug happened.


Analyzing Historical Data

WFAnalyzer can also be used to view historical data.

  1. Start the trace using the steps above. After a successful run, the traces appear on your screen. Leave the tool running and export the trace with the “Export Traces” button whenever you want.
  2. Select the folder where you want to store the exported file. You can now close the tool if you want to analyse the exported file, or leave it running to collect further traces.
  3. Start the tool again and select the third radio button (Import existing trace log file for analysis). Select the exported trace file (filename.wftrace) to view the traces using the time slider.
  4. Use the slider to see the traces, CPU graph, and running threads at a particular time. Clicking any trace also shows the CPU utilization, memory utilization, and running threads at the time that trace was executed.


Starting a New Workflow Trace Session

If you click Close or choose Exit in WFAnalyzer and a Stop window appears asking whether you want to stop debugging the current session, a trace is currently running.


If you click YES, you can choose to Export the Trace Data and view it later by using Import existing trace log file for analysis from the Trace Start Form.

If you click CANCEL, the current Workflow trace session is resumed.

If you click NO, to Stop Debugging Current Session, the window is closed, and you can choose the Connect to an existing Workflow Trace session option from the Trace Start Form. When choosing this option, the Workflow Trace Viewer window is opened and connected to the existing trace session. Data will only be shown if data is coming through the workflow.


If this is done on the MS and the current workflow trace session was for an agent workflow, no traces will be shown. There is no indication that the current workflow session is for a remote agent Health Service.


Troubleshooting a Workflow That Is Not Running

Assume you have a discovery that is not running.

  1. On the computer, open Workflow Analyzer

NOTE: You will need Operations Manager Admin User Role rights and Administrator rights on the machine where the workflow is not running.

  2. In the Start a new session window, select Start a new Workflow Analysis session and enter the MS name. Then click the Fetch HS button. In Select HealthService to analyze, select the Health Service where the discovery is not running and click Start.

NOTE: It may take a few seconds for the Select HealthService to analyze drop-down menu to populate if you have a large number of agents.

  3. The Workflow Analyzer window displays a list of workflows on that Health Service. In the Filter field, type in the name of the discovery to find it.
  4. When you find the discovery and confirm that it is in a Not Running state, check if the Enabled column is set to true. If it is not set to true, you must enable the discovery. Although Enabled can be set to true here, the actual discovery when downloaded to the agent may have an override that disables it. You can check for this by right-clicking the discovery and choosing Analyze. A new window opens and shows you relevant information about this discovery, such as whether this discovery has been disabled through an override targeted to a group.

Another area to troubleshoot is trying to figure out why your monitor or rule workflow is in a Not Running state.


To troubleshoot monitor or rule workflow

  1. In the WFAnalyzer, choose the Health Service where the rule or monitor is supposed to be running.
  2. Search for the Rule or Monitor name and verify the Status column. If the status is Not Running, right-click and click Analyse. A new window will appear with details about that workflow.
  3. Verify if there are any workflow instances. If there are none, check the Target Discoveries in the Summary pane. Search for that discovery to figure out if it is enabled.
  4. After searching for the discovery, check if it is running.
  5. If the discovery is running, right-click the discovery and click Analyse to verify if there are any workflow instances.
  6. If there are workflow instances, close the Workflow Details window. In Workflow Analyzer, right-click the discovery and click Trace. At this point, you want to confirm if the discovery actually discovered anything.
  7. In the TraceWorkflow window, you can view what is being run in the discovery and see the resulting output.
  8. Eventually, you will see something like the following if nothing was discovered because that server does not have the application installed or there is a bug in the discovery logic:
Module column: DiscoveryFilter

Method column: OnNewDataItems

Trace column: Received DataItem <DataItem type="System.DiscoveryData" time="2009-09-24T09:30:56.1004335-07:00" sourceHealthServiceId="BF70F9EF-FB93-E9F5-DE1A-0139E12012E2"><DiscoveryType>0</DiscoveryType><DiscoverySourceType>0</DiscoverySourceType><DiscoverySourceObjectId>{767A7E59-3E04-3946-6FAD-642143A7FEBD}</DiscoverySourceObjectId><DiscoverySourceManagedEntity>{EEA5AF38-AECC-6FFC-A828-976825162E7B}</DiscoverySourceManagedEntity></DataItem>

If there is no <CreateInstance/> XML element in the Trace column output, then an instance will not be created.

9. Going up in the trace, you can see what is performed by the discovery. Below is an example from the VMM MP:

Module column: BatchResponse

Method column: RunProcess

Trace column: Creating application 'C:\WINDOWS\System32\cscript.exe' with command line '"C:\WINDOWS\System32\cscript.exe" /nologo "C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 42\6132\DiscoverVMMEngine.vbs" {767A7E59-3E04-3946-6FAD-642143A7FEBD} {EEA5AF38-AECC-6FFC-A828-976825162E7B} learnopsmgr.redmond.corp.microsoft.com' in working directory 'C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 42\6132\'
  • In the above, you must now figure out what is happening in the script that is causing this failure on this computer. You can debug the script locally on that computer by launching the above script via cscript.exe with the //x flag to launch it under the debugger.
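
The check described in step 8 (whether the Trace column output contains a <CreateInstance/> element) can be automated once the trace text has been copied out of the tool. The following is only a minimal Python sketch under stated assumptions: the helper name has_create_instance is hypothetical, the element name is taken from the text above, and the trace text is assumed to end with the closing </DataItem> tag.

```python
import xml.etree.ElementTree as ET

# Trace text copied from the DiscoveryFilter example above.
TRACE = (
    'Received DataItem <DataItem type="System.DiscoveryData" '
    'time="2009-09-24T09:30:56.1004335-07:00" '
    'sourceHealthServiceId="BF70F9EF-FB93-E9F5-DE1A-0139E12012E2">'
    "<DiscoveryType>0</DiscoveryType>"
    "<DiscoverySourceType>0</DiscoverySourceType>"
    "<DiscoverySourceObjectId>{767A7E59-3E04-3946-6FAD-642143A7FEBD}</DiscoverySourceObjectId>"
    "<DiscoverySourceManagedEntity>{EEA5AF38-AECC-6FFC-A828-976825162E7B}</DiscoverySourceManagedEntity>"
    "</DataItem>"
)


def has_create_instance(trace_text: str) -> bool:
    """Return True if the DataItem in the trace text contains a CreateInstance element.

    Hypothetical helper: assumes the XML starts at the first '<DataItem'
    and that nothing follows the closing </DataItem> tag.
    """
    start = trace_text.find("<DataItem")
    if start == -1:
        raise ValueError("no <DataItem> element found in trace text")
    root = ET.fromstring(trace_text[start:])
    # iter() walks the root element and all of its descendants.
    return any(elem.tag == "CreateInstance" for elem in root.iter())


# The example trace has no CreateInstance element, so no instance is created.
print(has_create_instance(TRACE))  # prints False
```

A result of False matches the situation described above: the discovery ran but submitted nothing to create.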


Troubleshooting an Unloaded or Failed Workflow

If your workflow is enabled but it is being unloaded, that workflow will be displayed in the WFAnalyzer as Failed. For failed workflows, you will have to trace them to determine why they are failing and being unloaded.

  1. On the computer, open Workflow Analyzer installed at the following path: “%Program Files%/Microsoft System Center/Management Pack Tools”
  2. Locate the workflow where the status is Failed.
  3. Right-click the workflow and click Trace. You will be presented with a list of instances of the class that the monitor is running against, as well as their status. For example, if there are multiple databases discovered, you will see a workflow for each instance.
  4. Click the failed workflow instance and click Start. The TraceWorkflow window appears and displays all the traces that have been logged, even if the workflow has already hit the point where the bug happened.

You can now export the data to a file and share it with Microsoft Support for further analysis. To choose the delimiter, change the delimiter setting in the config file before starting WFAnalyzer. The default delimiter is a comma, which produces CSV output.


Capturing Trace for a Failing Workflow

To capture a trace

  1. On the computer, open Workflow Analyzer installed at the following path: “%Program Files%/Microsoft System Center/Management Pack Tools”
  2. Locate the workflow that you want to trace.
  3. Right-click the workflow and click Trace. You will be presented with a list of instances of the class that the monitor is running against, as well as their status. For example, if there are multiple databases discovered, you will see a workflow for each instance.
  4. Click the workflow instance that you want to trace and click Start.
  5. The TraceWorkflow window appears and displays all the traces that have been logged. Allow the trace to continue until you see a NotifyError.

  6. Export the trace logs and send them to Microsoft Customer Service and Support or your management pack developer.
  7. Close WFAnalyzer on the computer on which the agent is running, and then on the MS as well if you triggered it from there.

NOTE: To any prompts that ask if you want to stop the session or stop debugging, click Yes or OK.
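
Once a trace has been exported with the default comma delimiter, it may be quicker to search the file for NotifyError than to scroll through it. The following Python sketch is illustrative only: the column layout of the exported file is not documented here, so the code makes no assumption about column names and simply checks every field. The function name find_notify_errors and the sample rows are made up for the example.

```python
import csv
import io


def find_notify_errors(lines, delimiter=","):
    """Return (row_number, row) pairs for rows where any field mentions NotifyError.

    'lines' is any iterable of text lines, such as an open file.
    The export's column layout is an assumption-free unknown here,
    so every field in each row is checked.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    return [
        (number, row)
        for number, row in enumerate(reader, start=1)
        if any("NotifyError" in field for field in row)
    ]


# Made-up rows standing in for an exported trace file (not real tool output).
SAMPLE = io.StringIO(
    "ModuleA,OnNewDataItems,Received DataItem\n"
    "ModuleB,NotifyError,Hypothetical failure text\n"
)

for number, row in find_notify_errors(SAMPLE):
    print(number, row)
```

For a real export, pass an open file instead of the sample, for example `with open(path, newline="") as f: find_notify_errors(f)`, where path is wherever you saved the export. If you changed the delimiter setting in the config file, pass the same character as the delimiter argument.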


Troubleshooting a Running Workflow That Has a Bug

In some cases, there is no failed workflow, but there is a bug in the logic of the workflow that may prevent that workflow from discovering something or even prevent it from monitoring. The WFAnalyzer is key to understanding the data and how it is being processed by each module in the workflow. The steps below walk you through troubleshooting a monitor.


To troubleshoot a monitor

  1. On the computer, open Workflow Analyzer installed at the following path: “%Program Files%/Microsoft System Center/Management Pack Tools”
  2. Locate the workflow that you want to trace.
  3. Right-click the workflow and click Trace.

You will be presented with a list of instances of the class that the monitor is running against, as well as their status. For example, if there are multiple databases discovered, you will see a workflow for each instance.

4. Click the workflow instance that you want to trace and click Start.

5. In the TraceWorkflow window, look for DiscoveryFilter module traces.

Module column: ExpressionEvaluatorCondition

Method column: FilterEvaluationTrace

6. Cross-reference this with the MP to check the RegularDetections section of the MonitorType for that UnitMonitor. Note the criteria for the critical state.

7. Look for the criteria in the trace and see how that criteria was evaluated against the live data on the agent.

8. Depending on the number of states that the monitor can have, you will see at least two of the following:

Data Item given to the RegularDetection workflow:

Received DataItem <DataItem type="System.PropertyBagData" time="2009-09-24T09:53:29.6247210-07:00" sourceHealthServiceId="BF70F9EF-FB93-E9F5-DE1A-0139E12012E2">
	<Property Name="__CLASS" VariantType="8">SqlServiceAdvancedProperty</Property>
	<Property Name="__DERIVATION" VariantType="8"/>
	<Property Name="__DYNASTY" VariantType="8">SqlServiceAdvancedProperty</Property>
	<Property Name="__GENUS" VariantType="3">2</Property>
	<Property Name="__NAMESPACE" VariantType="8">root\Microsoft\SQLServer\ComputerManagement</Property>
	<Property Name="__PATH" VariantType="8">\\LEARNOPSMGR\root\Microsoft\SQLServer\ComputerManagement:SqlServiceAdvancedProperty.PropertyIndex=2,PropertyName="SPLEVEL",ServiceName="MSSQLSERVER",SqlServiceType=1</Property>
	<Property Name="__PROPERTY_COUNT" VariantType="3">8</Property>
	<Property Name="__RELPATH" VariantType="8">SqlServiceAdvancedProperty.PropertyIndex=2,PropertyName="SPLEVEL",ServiceName="MSSQLSERVER",SqlServiceType=1</Property>
	<Property Name="__SERVER" VariantType="8">LEARNOPSMGR</Property>
	<Property Name="IsReadOnly" VariantType="11" Type="Boolean">true</Property>
	<Property Name="PropertyIndex" VariantType="19">2</Property>
	<Property Name="PropertyName" VariantType="8">SPLEVEL</Property>
	<Property Name="PropertyNumValue" VariantType="19">3</Property>
	<Property Name="PropertyValueType" VariantType="19">2</Property>
	<Property Name="ServiceName" VariantType="8">MSSQLSERVER</Property>
	<Property Name="SqlServiceType" VariantType="19">1</Property>
</DataItem>


Expression Trace of that given RegularDetection workflow:

Filter Instance (0000000007589040):
Evaluating Simple Expression:
<ValueExpression>
	<XPathQuery Type='Integer'>Property[@Name='PropertyNumValue']</XPathQuery>
</ValueExpression>
<Operator>GREATER_EQUAL</Operator>
<ValueExpression>
	<Value Type='Integer'>1</Value>
</ValueExpression>

Resolves to:
3
GREATER_EQUAL
1
Using Compare Type Integer

RESULT = MATCH


An example trace where the expression does not match looks like:

Filter Instance (00000000077D77F0):
Evaluating Simple Expression:
<ValueExpression>
	<XPathQuery Type='Integer'>Property[@Name='PropertyNumValue']</XPathQuery>
</ValueExpression>
<Operator>LESS</Operator>
<ValueExpression>
	<Value Type='Integer'>1</Value>
</ValueExpression>
Resolves to:
3
LESS
1
Using Compare Type Integer

RESULT = NOT A MATCH
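
To make the two expression traces above concrete, the following Python sketch reproduces the same comparison outside the tool. It is not how the ExpressionEvaluatorCondition module is implemented; it only mirrors the inputs and results shown in the traces. The function name evaluate and the OPERATORS table are hypothetical, the property bag is trimmed to two properties, and only the two operators that appear in the traces are mapped.

```python
import operator
import xml.etree.ElementTree as ET

# Trimmed copy of the property bag from the example above.
DATA_ITEM = """
<DataItem type="System.PropertyBagData">
  <Property Name="PropertyName" VariantType="8">SPLEVEL</Property>
  <Property Name="PropertyNumValue" VariantType="19">3</Property>
</DataItem>
"""

# Only the operators seen in the traces above; any other name raises KeyError.
OPERATORS = {"GREATER_EQUAL": operator.ge, "LESS": operator.lt}


def evaluate(data_item_xml: str, xpath: str, op_name: str, value: int) -> str:
    """Resolve the XPath query against the data item and compare as integers."""
    root = ET.fromstring(data_item_xml.strip())
    node = root.find(xpath)
    if node is None or node.text is None:
        raise ValueError(f"query {xpath!r} did not resolve to a value")
    left = int(node.text)  # corresponds to "Using Compare Type Integer"
    return "MATCH" if OPERATORS[op_name](left, value) else "NOT A MATCH"


XPATH = "Property[@Name='PropertyNumValue']"
print(evaluate(DATA_ITEM, XPATH, "GREATER_EQUAL", 1))  # prints MATCH
print(evaluate(DATA_ITEM, XPATH, "LESS", 1))  # prints NOT A MATCH
```

Both results agree with the traces: the query resolves to 3, so 3 GREATER_EQUAL 1 matches and 3 LESS 1 does not.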


In case of errors:

Issue: Format information not found
Possible solution: This can happen when the All.tmf file is missing or not properly constructed. Navigate to the Server or Agent folder on the installation path, and delete the All.tmf file if present. If not, open an elevated command prompt and run, in this order: StartTracing.cmd, StopTracing.cmd, FormatTracing.cmd. The last command may take some time, so please be patient. Once this is complete, you should have a new All.tmf file and the error should not occur.

Issue: Tracing does not seem to be working for Agent
Possible solution: You must wait for the override to be delivered to the Agent. Wait until the WorkflowTraceOverrideMP Management Pack has been delivered (i.e. Event ID: 1201) and activated on the Agent.

Issue: Changing filter causes tool to hang
Possible solution: We do not recommend changing filters in the middle of a trace session. Please choose your filter (Default or None) at the beginning of the session. This is because of the large volume of noisy logs that tend to accumulate. Hence, we recommend sticking to the default filter to avoid noise and hanging issues.
