Intermittent Application Hangs, One Machine Only

Started by DAG_P6, August 22, 2013, 11:24:34 PM

DAG_P6

Several weeks ago, I posted a query, "Unchanged Application Suddenly Hangs on One Machine Only" (http://forum.winbatch.com/index.php?topic=138.0). This topic is, more or less, a continuation of that topic. Both topics concern the same machine, which is installed in an office in St. Louis, MO, where it has run without issues since 2010, until approximately 19 July.

Initially, we thought the hangs were caused by a fairly old (circa 2005) compiled WinBatch script. After running a new version, compiled with WBC 2013B, which opened with a DebugTrace statement, we discovered two things.

  • When it stalled, there was nothing in the DebugTrace log.
  • When it ran, it never had anything to do, because a newer program, written in 2009, in Microsoft QuickBasic, had taken over its work on installations that had been upgraded from Falcon 2000 to StorLogix.

After we changed the batch file that launched the problem script to skip it on machines that run StorLogix, which includes the problem machine in St. Louis, the stalls started moving around. Initially, we thought they were confined to two other WBC programs, both newer than the program that started it all. However, in the last few days, we discovered that the stalls are not confined to the constellation of WinBatch programs that run on this machine and 15 others in various cities around the US.

Although one recent incident appeared to happen with the CPU pegged at 100% utilization, the last two documented incidents happened when the CPU was idle: the System Idle process had 99% of the CPU utilization, the LogMeIn client had 1%, and everything else had 0%.

This morning, when we reviewed the various Windows event logs, the only significant messages logged during the six-hour run of the main batch file, NTCOMM.BAT, were hourly reports about the Adobe Flash update service waking up to check for updates, of which there were none during the run.

Using a complete C++ program that I copied from the MSDN article, "Enumerating All Processes," at http://msdn.microsoft.com/en-us/library/windows/desktop/ms682623(v=vs.85).aspx, I built a plain Jane console program to enumerate all processes, and ran it, via LogMeIn, on the machine in St. Louis, piping its output to a text file, Processes_CHIPPEWA_20130823_001221.TXT, which is attached to this article. Nothing jumps out at me; perhaps one of you will notice something that I've overlooked.

Since these stalls are happening only at this location, and not at about seven others that run the same constellation of applications and services, we are considering replacing the machine, which is a fairly significant undertaking. Before we make such a rash decision, we decided to solicit additional input.

Short of replacing the machine, I am considering suggesting that we disable the Flash update service, since the machine is locked down, and nothing that runs on it needs or uses Flash.

My ears are open wide.
David A. Gray
You are more important than any technology.

DAG_P6

Last night, the troublesome application stalled again at our St. Louis office. For the first time, I saw something that may suggest the source of the problem.

In the course of investigating previous active stalls, in which I connected to the machine while the program was stalled, I noticed that some action usually causes the stalled program, EZXFER.EXE, a compiled WinBatch script, to come unstuck, complete its assigned task, and terminate normally. For example, it will take off and finish while I cycle through the tabs of the Windows Task Manager. I have even seen it take off when I start the Task Manager.

This morning, I connected following an overnight stall, in which the program eventually completed normally, though it did so well past the ten-minute window that I allow before gathering the nightly activity report from the FTP server where the file exchanges between the home office in Fort Worth and the field offices take place. Comparing the text log file generated by the problem program, EZXFER.LOG, with the three core Windows event logs, I noticed a very tight correlation between two events.

The following events are reported.

Time                 Log         Message
2013/09/21 23:00:00  DAYEND.LOG  Initiating EZXFER task GetFromHomeOffice to get new data
2013/09/22 02:31:27  System      Service Control Manager: The MpKsle1022c42 service was successfully sent a start control.
2013/09/22 02:31:43  EZXFER.LOG  EZXFER, version 1.12.17.1 (compiled 2012/10/11 13:38:01) Begin

These logs originate from separate sources, as follows.

  • DAYEND.LOG is maintained by WWLogger.EXE, a 32 bit console mode program, implemented in C, and compiled by the Microsoft Visual C++ 6.0 compiler. It is responding to commands that it gets from a batch file, NTCOMM.BAT.
  • System is the standard Windows System Event Log.
  • EZXFER.LOG is maintained by EZXFER.EXE, the compiled WinBatch script.
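
The gaps between the three log entries quoted above can be checked with a short Python sketch. The timestamps are copied from those entries; the variable names are illustrative only and are not part of any of the programs discussed here.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"  # timestamp layout used in the three log entries

# Timestamps copied from the log entries quoted above.
launch_logged = datetime.strptime("2013/09/21 23:00:00", FMT)  # DAYEND.LOG
scm_start = datetime.strptime("2013/09/22 02:31:27", FMT)      # System event log
ezxfer_begin = datetime.strptime("2013/09/22 02:31:43", FMT)   # EZXFER.LOG

# Time from the DAYEND.LOG entry to the first line EZXFER wrote to its own log.
stall = ezxfer_begin - launch_logged
# Time from the Service Control Manager event to that same first line.
after_scm = ezxfer_begin - scm_start

print("DAYEND.LOG entry to EZXFER Begin:", stall)
print("SCM event to EZXFER Begin:", after_scm.total_seconds(), "seconds")
```

Run as written, this reports a gap of 3:31:43 between the DAYEND.LOG entry and the first EZXFER.LOG entry, and 16 seconds between the Service Control Manager event and that same entry.
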

The relevant section of NTCOMM.BAT is as follows.


wwlogger C:\XFERHOLD\DAYEND.LOG Initiating EZXFER task GetFromHomeOffice to get new data
md C:\XFERHOLD\HomeOfficeToRemote
EZXFER.EXE -p@C:\XFERHOLD\FTP_LIST.TXT -tGetFromHomeOffice


Directory C:\XFERHOLD\HomeOfficeToRemote is created the first time the script runs, and is never deleted. Therefore, it always exists on subsequent runs. Since EZXFER.EXE was installed in late June of this year, last night qualifies as a subsequent run.
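
One way to narrow down where the time goes would be to launch the program from a small wrapper that records when the process was started and how long it ran, and that flags any run exceeding a time budget. The following is a minimal Python sketch of that idea, not something NTCOMM.BAT does today. The helper name `run_with_watchdog` and its parameters are hypothetical, and the example command simply runs the Python interpreter as a stand-in so that the sketch works on any machine.

```python
import subprocess
import sys
import time
from datetime import datetime


def run_with_watchdog(command, budget_seconds):
    """Run command and return (exit_code, elapsed_seconds, over_budget).

    Hypothetical helper: it waits for the process to end and reports
    whether the run exceeded the budget. It does not kill the process.
    """
    started = datetime.now()          # wall-clock time, for the log line
    t0 = time.monotonic()             # monotonic clock, for the duration
    completed = subprocess.run(command)
    elapsed = time.monotonic() - t0
    over_budget = elapsed > budget_seconds
    status = "OVER" if over_budget else "within"
    print(f"{started:%Y/%m/%d %H:%M:%S} started {command[0]}; "
          f"ran {elapsed:.1f} s; exit code {completed.returncode}; "
          f"{status} budget")
    return completed.returncode, elapsed, over_budget


if __name__ == "__main__":
    # Stand-in command; a real run would pass the program's own command line.
    run_with_watchdog([sys.executable, "-c", "pass"], budget_seconds=600)
```

On the problem machine, the command list would hold the EZXFER.EXE command line from NTCOMM.BAT, and a budget of 600 seconds would correspond to the ten minute window allowed before the nightly activity report is gathered.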

The production version of the script runs against WIL DLL version 6.12blb. Detailed information about the machine in the St. Louis office is in the attachment to this post, which contains two files.

  • Windows_System_Information_CHIPPEWA.TXT is the raw output of the Windows XP version of the System Information accessory. Due to its size, this file is compressed into Windows_System_Information_CHIPPEWA.ZIP, a standard ZIP file.
  • Windows_System_Information_CHIPPEWA.XLSX is selected sections of the System Information output, organized into tables for easier reading. The workbook opens in a tab named Index, which contains hyperlinks to the other sheets. A workbook scoped named range, also named INDEX, expedites navigating from any sheet to the index, even when the index tab is invisible.

Is it possible that something in the WIL start-up code is stalling, and the context switch caused by the Service Control Manager gets it unstuck?
David A. Gray
You are more important than any technology.

Deana

So the MpKsle1022c42 service might be involved. What is that service exactly?

A Google search found that MPKSL.... is maybe part of Microsoft Security Essentials protection? Have you considered whitelisting your WinBatch Exes in Microsoft Security Essentials?
Deana F.
Technical Support
Wilson WindowWare Inc.

DAG_P6

Dear Deana,

Quote from: Deana on September 23, 2013, 09:50:59 AM
So the MpKsle1022c42 service might be involved. What is that service exactly?

A Google search found that MPKSL.... is maybe part of Microsoft Security Essentials protection? Have you considered whitelisting your WinBatch Exes in Microsoft Security Essentials?

Thanks much for calling the articles about MPKSL... to my attention. I neglected to say that I had already investigated them, and found that they are apparently related to UPnP devices. It just so happened that the SCM sent a start command to one at that moment. Shortly thereafter, there is another message, whose source is Microsoft Antimalware; that message records a signature file update.

FWIW, I'm still leaning toward a context switch being responsible for breaking the program stall. At the time of night when it runs, there is very little else happening on the machine.
David A. Gray
You are more important than any technology.