So I've been seeing some issues in our environment where the Symantec Management Agent is installed but broken in some way: either WMI is broken or there are problems with the agent itself. Symptoms vary: the agent shows no tasks as ever having run, lots of tasks are stuck in the running state, patches don't run, no policies are assigned, no task server is assigned, the agent crashes in the event log, and so on. I found that a combination of four things gives me a good indication of whether the agent or WMI is broken: the number of tasks still running, when the agent last checked for tasks, when the agent last requested policy, and which task server it is assigned to (broken agents usually show no task server assigned).
I knew I wouldn't be able to rely on the CMDB to find agents that are really busted, so I set out to write a PowerShell script to collect some basic data about the agents for me. What I have is a script that takes a list of computers, checks whether they're online, and then tries to collect data about the agent via the WMI classes that it exposes. If a WMI query fails for some reason, I write that out so I can go back later and either reinstall those agents or run the WMI fix. So far this has worked pretty well at identifying computers that are having issues. Further down the road I may automate sending the agent reinstall job via aexschedule, since we still have DS 6.9 around.
I tried to add enough comments to the script to make it easy to follow, but here’s an outline of what’s being done.
First we need to set up a few things, starting with our list of computers, one per line in a plain text file. Next we specify our output files: $OutFile is where everything gets written as a CSV for easy import into Excel, and $BrokenOutFile holds just the PCs that failed the WMI queries. Since we still have DS 6.9, I may at some point automate sending the agent reinstall job with aexschedule using this list, but for now I want to know how and why the agent is broken.
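As a rough illustration of that setup, here is a minimal sketch. $OutFile and $BrokenOutFile are the names used in the script; the paths, the $ComputerList variable and the Read-ComputerList helper are assumptions made up for this example.

```powershell
# Placeholder paths: adjust these to your environment.
$ComputerList  = 'C:\Temp\computers.txt'     # hypothetical name; one computer per line
$OutFile       = 'C:\Temp\AgentHealth.csv'   # full results, for import into Excel
$BrokenOutFile = 'C:\Temp\BrokenAgents.csv'  # only the PCs that failed the WMI queries

# Hypothetical helper: read the list, trim whitespace and skip blank lines.
function Read-ComputerList {
    param([string]$Path)
    @(Get-Content -Path $Path |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -ne '' })
}

if (Test-Path -Path $ComputerList) {
    $Computers = Read-ComputerList -Path $ComputerList
}
else {
    Write-Warning "Computer list not found: $ComputerList"
}
```

Trimming and dropping blank lines up front avoids wasting a ping on an empty name when the text file has stray whitespace.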
After setting the paths for the input and output files, I set a date threshold for how far back to go when checking for running tasks. I have mine set to go back one week (-7 days), but you can go back as far as you want. Next we build some empty arrays to hold the data, followed by an old function I found that provides an alternate way to query WMI. Sometimes WMI can be so broken that queries hang indefinitely; this function works around that with a timeout.
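The threshold, the arrays and a timeout-aware WMI query could be sketched like this. This is not the function from the script itself: the names $DateThreshold, $Results, $Broken and Get-WmiWithTimeout are assumptions, and the approach shown (a System.Management searcher with an enumeration timeout) is one common way to do it on Windows PowerShell.

```powershell
# Look back one week; change -7 to go back further.
$DateThreshold = (Get-Date).AddDays(-7)

# Empty arrays to hold the collected data while the script runs.
$Results = @()
$Broken  = @()

# Hypothetical helper: run a WQL query with a timeout on result enumeration.
function Get-WmiWithTimeout {
    param(
        [string]$ComputerName,
        [string]$Namespace,
        [string]$Query,
        [int]$TimeoutSeconds = 15
    )

    $connection  = New-Object System.Management.ConnectionOptions
    $enumeration = New-Object System.Management.EnumerationOptions
    # Give up on fetching results after this long instead of waiting forever.
    $enumeration.Timeout = New-TimeSpan -Seconds $TimeoutSeconds

    $scope = New-Object System.Management.ManagementScope("\\$ComputerName\$Namespace", $connection)
    $scope.Connect()

    $objectQuery = New-Object System.Management.ObjectQuery($Query)
    $searcher = New-Object System.Management.ManagementObjectSearcher($scope, $objectQuery, $enumeration)

    # Enumerate inside the function so failures surface at the call site.
    @($searcher.Get())
}
```

A call would look something like `Get-WmiWithTimeout -ComputerName 'PC01' -Namespace 'root\cimv2' -Query 'SELECT Caption FROM Win32_OperatingSystem'`. The timeout here applies to retrieving results, so it is still worth pinging the machine first.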
Now we can get into the meat of the script. First we start looping through the list of computers and ping each one to make sure it’s online. If it is, I run my WMI queries in a try-catch block so that if any query fails, execution jumps down to the catch block and writes out that WMI failed. The first query gets the task history and keeps the entries where Status = 0 and ReturnCode = -1, which correspond to tasks that are still in the running state, and then uses the Count property to get the number of tasks still running. You’ll see in the script that we need to convert from the WMI time format to something more human readable, which is accomplished via some .NET wizardry.
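The shape of that loop might look like the sketch below, assuming Windows PowerShell. The agent's WMI namespace and task history query are not given in this outline, so $AgentNamespace and $TaskHistoryQuery are placeholders you would need to fill in; Get-RunningTaskCount and ConvertFrom-WmiTime are hypothetical helper names.

```powershell
# Placeholders: the agent's actual namespace and query are not shown here.
$AgentNamespace   = '<agent WMI namespace>'
$TaskHistoryQuery = '<WQL query for the task history class>'

# Count tasks still in the running state (Status = 0 and ReturnCode = -1).
function Get-RunningTaskCount {
    param([object[]]$TaskHistory)
    @($TaskHistory | Where-Object { $_.Status -eq 0 -and $_.ReturnCode -eq -1 }).Count
}

# Convert a WMI (DMTF) timestamp string into a local .NET DateTime.
function ConvertFrom-WmiTime {
    param([string]$WmiTime)
    [System.Management.ManagementDateTimeConverter]::ToDateTime($WmiTime)
}

foreach ($Computer in $Computers) {
    if (Test-Connection -ComputerName $Computer -Count 1 -Quiet) {
        try {
            # -ErrorAction Stop turns a failed query into an exception for catch.
            $history = Get-WmiObject -ComputerName $Computer -Namespace $AgentNamespace `
                -Query $TaskHistoryQuery -ErrorAction Stop
            $running = Get-RunningTaskCount -TaskHistory $history
            Write-Output "$Computer : $running task(s) still running"
        }
        catch {
            Write-Output "$Computer : WMI failed ($($_.Exception.Message))"
        }
    }
    else {
        Write-Output "$Computer : offline"
    }
}
```

Without `-ErrorAction Stop`, a failed Get-WmiObject call may only raise a non-terminating error and the catch block would never run.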
Once all the data is gathered from WMI, it’s added to a custom object that I created to hold the data while the script is running. The catch block and the else statement for a failed ping use the same technique. Once the list of computers is done, the files are written out to disk as CSVs. At the bottom I left in some commented-out code that could be used to send the completed file in an email, since the script can take a while to run against large groups of computers and it can be nice to just let it run overnight or over a weekend.
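A minimal sketch of the custom object and CSV output follows. The column names, the New-AgentRecord helper and the sample values are assumptions for illustration only, and the output paths point at the temp folder so the example runs on its own.

```powershell
# For this example only, write to the temp folder.
$OutFile       = Join-Path ([System.IO.Path]::GetTempPath()) 'AgentHealth.csv'
$BrokenOutFile = Join-Path ([System.IO.Path]::GetTempPath()) 'BrokenAgents.csv'

# Hypothetical helper: one record per computer, with a fixed column order.
function New-AgentRecord {
    param($ComputerName, $Online, $WmiStatus, $RunningTasks, $TaskServer)
    New-Object PSObject -Property @{
        ComputerName = $ComputerName
        Online       = $Online
        WmiStatus    = $WmiStatus
        RunningTasks = $RunningTasks
        TaskServer   = $TaskServer
    } | Select-Object ComputerName, Online, WmiStatus, RunningTasks, TaskServer
}

$Results = @()
# Made-up sample rows: a healthy agent and one whose WMI queries failed.
$Results += New-AgentRecord -ComputerName 'PC01' -Online $true -WmiStatus 'OK' -RunningTasks 0 -TaskServer 'TS01'
$Results += New-AgentRecord -ComputerName 'PC02' -Online $true -WmiStatus 'Failed' -RunningTasks $null -TaskServer $null

# Everything goes to $OutFile; only the WMI failures go to $BrokenOutFile.
$Results | Export-Csv -Path $OutFile -NoTypeInformation
$Results | Where-Object { $_.WmiStatus -eq 'Failed' } |
    Export-Csv -Path $BrokenOutFile -NoTypeInformation

# Optional: mail the finished report (addresses and server are placeholders).
# Send-MailMessage -To 'admin@example.com' -From 'script@example.com' `
#     -Subject 'Agent health report' -Attachments $OutFile -SmtpServer 'smtp.example.com'
```

Using the same record shape for the success, catch and offline cases keeps the CSV columns consistent, which makes filtering in Excel much easier.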
Further down the road I might turn this into a server-side script that populates the data into the CMDB using a custom data class.
Let me know if there are any questions or suggestions for improvements or feature enhancements.