Launch a task, see what the agent is doing. It’s that easy.
This post is going to be short and sweet, right to the point with brief instructions on how to get agent trace data in front of your eyeballs as fast as possible.
There are basically two types of SCOM traces:
1) General – Capture any activity from given categories of agent activity, such as APM, Native, Script, Managed, ConfigService, DAS, Failover, UI, Advisor, ApmConnector, BID, and NASM.
2) Specific – Capture any activity related to a specific workflow for a specific object instance only.
- Select object. (Any object.)
- Launch task: Run Agent Trace
- Set trace duration.
- Specify destination path to copy trace files.
The trace runs, collects data, formats the data for your viewing pleasure, compresses the files into a .zip archive, then copies the .zip file to a location of your choosing, ideally a shared folder on your management server.
1) Create an override for a specific workflow, for a specific object instance.
This is no ordinary override. It’s a special override that cannot be created directly from the Console. The override is for a little-known property, “TraceEnabled”. The easiest way to create this override is to:
a) Create a new management pack in the Console. Name it something extremely simple like, “TRACE”.
b) Override a workflow for a specific object instance; set “Enabled” = True. Save it in your new MP.
c) Export your new MP to somewhere simple like, C:\Temp.
d) Open the MP with Notepad, find the “Enabled” property, change it to “TraceEnabled”.
e) Save the .XML file. Import the modified management pack.
Need more explanation? See this post: HERE
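If you would rather not hand-edit the file in step d, the same change can be scripted. This is a minimal sketch, not part of the original task: it assumes the exported override stores the property name as a `Property="Enabled"` attribute in the XML, and the function name `enable_trace_override` is hypothetical. Check the exported file in Notepad first to confirm it matches that assumption.

```python
def enable_trace_override(xml_text: str) -> tuple[str, int]:
    """Rename the overridden property from Enabled to TraceEnabled.

    Assumption: the exported management pack stores the property name
    as the attribute Property="Enabled" on the override element.
    Returns the modified XML text and the number of replacements made.
    """
    old = 'Property="Enabled"'
    new = 'Property="TraceEnabled"'
    count = xml_text.count(old)
    # Property="TraceEnabled" does not contain the old attribute text,
    # so running this twice leaves the file unchanged the second time.
    return xml_text.replace(old, new), count
```

Read the exported .XML file into a string, pass it through the function, and write the result back before importing. A replacement count of zero means the file did not contain the attribute in the assumed form, and a count above one means the MP holds more overrides than the single one described above.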
The agent that owns the specific object instance will soon receive this management pack. You will see events in the Operations Manager event log related to management pack receipt and digestion: 1200, 1201, 1204, 1210, 1109. Now the agent is ready.
2) Launch task: Run Agent Trace
a) Override task parameters:
TraceSeconds: When do you expect the workflow activity to occur? Set the trace duration accordingly. In addition, set the WriteActionTimeoutSeconds to be at least twice the TraceSeconds value. Not only must the trace collect the data for the specified duration, but the data logs must be formatted so humans can read them; this takes quite some time.
CopyToThisRemotePath: Ideally this is a shared folder path which has already been configured to allow the target agent machine to write to the folder.
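The timing guidance in this post boils down to simple arithmetic, sketched below. The function name `plan_trace_overrides` and its argument names are hypothetical. The 10-second margin over the workflow interval and the factor of two for the timeout come from the parameter descriptions in this post, and both are treated as minimums rather than exact requirements.

```python
def plan_trace_overrides(
    workflow_interval_seconds: int,
    margin_seconds: int = 10,
    timeout_factor: int = 2,
) -> dict[str, int]:
    """Suggest TraceSeconds and WriteActionTimeoutSeconds override values.

    TraceSeconds is the workflow interval plus a small margin so the
    trace window covers at least one run of the workflow.
    WriteActionTimeoutSeconds is a multiple of TraceSeconds to leave
    time for formatting the trace data after collection ends.
    """
    if workflow_interval_seconds <= 0:
        raise ValueError("workflow interval must be a positive number of seconds")
    trace_seconds = workflow_interval_seconds + margin_seconds
    return {
        "TraceSeconds": trace_seconds,
        "WriteActionTimeoutSeconds": trace_seconds * timeout_factor,
    }
```

For a workflow interval of 120 seconds, this suggests a TraceSeconds of 130 and a WriteActionTimeoutSeconds of 260. Raise the factor if formatting on a busy agent takes longer than expected.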
“Specific” trace task output example:
Note: if the ‘all.tmf’ file does not yet exist, the task will automatically create it.
|AgentToolsPath||The path to your agent “Tools” folder. You would only use this if your agent tools path is nonstandard and the agent install path is incorrect in the registry. This would be highly unusual.||D:\SCOMAgent\Tools|
|CopyToThisRemotePath||Path where you want the logfile .zip archive copied. Ideally this would be a shared folder on your SCOM mgmt server. |
Note: If no value is provided for this parameter, the script will automatically attempt to copy the archive to this path on the parent management server:
Will clean up any trace files created during the task execution.
This is only applicable if the trace Type is “General”, otherwise this is ignored. This will determine the categories of activity that the trace will collect.
– Basic: Native, Scripted, Managed
– Extended: Basic + ConfigService, DAS, Failover, UI
– Full: Basic + Extended + Advisor, APM, ApmConnector, BID, NASM.
Maximum size allowed for .etl trace files.
|OutPath||The path where the trace files get written. The default value is typically fine.||D:\Temp|
|SpecificTraceName||This is only applicable if the trace Type is “Specific”, otherwise it is ignored. This is what the trace file will be named.||SQLAgentJobsDiscoveryTrace|
|TraceSeconds||For how long should the trace run? If you are spying on a timed workflow, override the workflow interval to something more aggressive, only temporarily. Then set the TraceSeconds to 10 seconds higher than your workflow interval to ensure you capture the activity. When done, don’t forget to revert your workflow interval override.|
Make sure you set WriteActionTimeoutSeconds for at least double this value.
(WF Interval: 120)
(WF Interval: 200)
|Type||General | Specific|
– General will capture all workflow activity from the categories you include with GeneralGuidLevel.
– Specific will capture only activity from a specific workflow for a specific object instance. Specific requires a special override to exist. See instructions above.
|WriteActionTimeoutSeconds||This is how long the workflow will run before the agent abandons the effort and terminates the task. This prevents tasks from hanging if something fails. This value should be at least double the TraceSeconds value because the script can take a while to format all of the trace data into .log files.||<at least double the TraceSeconds value>|
|WriteToEventLog||true | false|
Controls task activity logging.
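The three GeneralGuidLevel tiers in the table are nested: each level adds categories on top of the one before it. The lookup below is only an illustration of that nesting, using the category names exactly as the table lists them. The names `LEVELS` and `categories_for` are hypothetical and are not part of the task itself.

```python
# Categories added at each tier, as listed in the parameter table.
BASIC = ["Native", "Scripted", "Managed"]
EXTENDED_EXTRA = ["ConfigService", "DAS", "Failover", "UI"]
FULL_EXTRA = ["Advisor", "APM", "ApmConnector", "BID", "NASM"]

# Each level includes everything from the levels below it.
LEVELS = {
    "Basic": BASIC,
    "Extended": BASIC + EXTENDED_EXTRA,
    "Full": BASIC + EXTENDED_EXTRA + FULL_EXTRA,
}


def categories_for(level: str) -> list[str]:
    """Return the trace categories a General trace collects at a level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level {level!r}; expected one of {sorted(LEVELS)}")
    # Return a copy so callers cannot modify the shared lists.
    return list(LEVELS[level])
```

Calling `categories_for("Extended")` returns the three Basic categories followed by ConfigService, DAS, Failover, and UI. Broader levels mean larger trace files, so start with Basic unless you know the activity you want lives in one of the higher tiers.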