Unchanged Application Suddenly Hangs on One Machine Only

Started by DAG_P6, June 26, 2013, 05:57:53 PM


DAG_P6

This morning, I learned that an application that has been running trouble free on about a half-dozen machines in various locations around the US is suddenly hanging at one of them.

Environment details follow.

  • The application is a compiled EXE, compiled with WinBatch Compiler 2002K by David Gray (#xxxxxx). According to its file system property sheet, it was compiled Friday, 20 May, 2005, 12:59:30.

  • The operating system is Microsoft Windows XP, with Service Pack 3, fully updated through last week.

  • The same application is installed in all locations.

  • The location in question is one of two that were visited last week by home office personnel. On both visits, the machines at those locations were added to local area networks that have stealth Internet gateways, and were brought completely up to date with regard to OS and application patches.

  • This program runs frequently during the day, including numerous times during the course of the visits.

  • Beginning yesterday, the St. Louis office, the second of the two visited last week, began reporting that the program hangs. An End Task is required to force the main program to resume and to return control of the custom POS software, of which the WIL script is a part, to the operator.

  • This morning, we replaced the program file from a known good copy, which had no effect.

  • Almost immediately after it starts, the program opens a box, and begins writing messages into it. We were observing, via LogMeIn, and saw the box appear at launch, but no messages followed.

  • The only records in the Application Event Log are the Application Hang records, written when the location manager uses Task Manager to end the program.

As WIL scripts go, this one is very elementary. It opens a file, reads a handful of records from it, reformats the records, and writes them into a new file. It is one of maybe two of my production WIL scripts that doesn't use any extenders.

We are at a loss to explain why it is suddenly hanging, at this location, where it has run without incident since the office opened in 2010.
David A. Gray
You are more important than any technology.

Deana

These types of issues can be difficult to track down. My first recommendation is to try running the code using DebugTrace. This will require a special debug version to be compiled to run on this system. Run the script until it hangs. Then inspect the trace file for the last successful line of execution. You should then locate the very next line of code in your source code. This will tell us exactly what line of code it is hanging on. This might give us a clue.
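As a rough illustration, enabling the trace might look something like the sketch below. It assumes the two-parameter form of DebugTrace (a mode followed by a trace file name), and the trace file path is only a placeholder, not something taken from your script.

```winbatch
; Sketch only. TRACE_FILE is a placeholder path; use any folder the
; script can write to on the affected machine.
TRACE_FILE = 'C:\Temp\trace.txt'

; Turn tracing on ahead of the first statement of interest. Statements
; executed after this point should be recorded in the trace file, so
; the last entry shows how far the script got before it hung.
DebugTrace ( @ON , TRACE_FILE )

; ... the rest of the script follows unchanged ...
```

Placing the DebugTrace call at the very top of the script gives the most complete picture, at the cost of a larger trace file.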

There have been a few fixes to WinBatch since version 2002K that might address the issue:

WB 2003K  Nov 19, 2003 : If a BoxDataClear error was suppressed, it could cause the program to hang.
WB 2004B  May 19, 2004 : Fixed a problem with BoxDrawText hanging with "alignment" mode 32.
WB 2006B  Apr   3, 2006 : Add Work-around for problem in WebBrowser control that sometimes causes WIL dialogs that use the control to hang on exit.
WB 2009B  Jan 28, 2009 : Fixed problem in TimeDelay function that caused the function to sometimes hang in dialog user-defined-callback procedures.


First, let's determine the exact line of code it's hanging on...
Deana F.
Technical Support
Wilson WindowWare Inc.

DAG_P6

Quote from: Deana on June 27, 2013, 08:25:41 AM
These types of issues can be difficult to track down. My first recommendation is to try running the code using DebugTrace. This will require a special debug version to be compiled to run on this system. Run the script until it hangs. Then inspect the trace file for the last successful line of execution. You should then locate the very next line of code in your source code. This will tell us exactly what line of code it is hanging on. This might give us a clue.

To get a valid test, it seems to me that the special debug build should be compiled with the same version of the compiler that built the original. The oldest compiler to which I have ready access is 2005G, and my installed compiler is 2013B. Since having two compiler versions installed side by side is probably risky, I would need to uninstall the newer one, install the older one, create the build, then reverse the whole process. This isn't a fun prospect.

Quote from: Deana on June 27, 2013, 08:25:41 AM
First, let's determine the exact line of code it's hanging on...

Without even a debug build, I can tell you this much. The first few executable lines of the script are as follows.


BLANK_STRING_P6C = ' '
COMMA = ','
EVENT_DATE_POS = 4
EVENT_LAST_NAME_POS = 8
NULL_STRING_P6C = ''
FALCON_DIR = 'C:\F2000'
FALCON_EVENT_LOG = 'LocalEvent.tmp'
LOG_FILE_BASE_NAME = 'PTIEVTLG.LG'
NEWEST_BACKUP_LOG = 'PTIEVTLG.LOG'
FAIL_SAFE_BACKUP = StrCat ( NEWEST_BACKUP_LOG , '.BACKUP' )
OLDEST_FALCON_LOG = 'PTIEVTLG.LG9'
PIPE_CHAR = '|'
QUOTE_CHAR = '"'
STRINDEX_START_AT_BEGINNING = 0
WINDOW_CAPTION = 'PTIEVTLG'

BoxTitle ( WINDOW_CAPTION )
if !WinActivate ( NULL_STRING_P6C )
    Dummy = MessageBox_P6C ( WINDOW_CAPTION , 'Aborting: Cannot activate window.' , 16 )
    exit
endif
Dummy = BoxText ( StrCat ( 'Checking for required directory ' , FALCON_DIR ) )


Although the box appears in the foreground, the first message is never displayed. This suggests to me that it hangs during the startup phase, or, perhaps, given your list of fixed bugs, in the course of displaying the first message.


Dummy = BoxText ( StrCat ( 'Checking for required directory ' , FALCON_DIR ) )
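Short of a debug build, one way to test that theory might be to bracket the suspect call with markers written to a scratch file. The following is only a sketch: MARKER_FILE and hMarker are hypothetical names that are not part of the production script, and it assumes that FileOpen in APPEND mode creates the file when it does not already exist.

```winbatch
; Hypothetical scratch file; not part of the production script.
MARKER_FILE = 'C:\F2000\PTIEVTLG_markers.txt'
FALCON_DIR = 'C:\F2000'

; Record that execution reached the statement just before the first message.
hMarker = FileOpen ( MARKER_FILE , 'APPEND' )
FileWrite ( hMarker , StrCat ( TimeYmdHms ( ) , ' before first BoxText' ) )
FileClose ( hMarker )

Dummy = BoxText ( StrCat ( 'Checking for required directory ' , FALCON_DIR ) )

; If the first marker is present but this one never appears, the script
; stalled inside the BoxText call.
hMarker = FileOpen ( MARKER_FILE , 'APPEND' )
FileWrite ( hMarker , StrCat ( TimeYmdHms ( ) , ' after first BoxText' ) )
FileClose ( hMarker )
```

If neither marker is written, the stall would be earlier, somewhere in the startup phase or in the file access itself.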


Moreover, the very same script (same build, same libraries, etc.) runs without issue at a half-dozen or so other locations, including the home office in Fort Worth, where I am working.

Quote from: Deana on June 27, 2013, 08:25:41 AM
There have been a few fixes to WinBatch since version 2002K that might address the issue:

WB 2003K  Nov 19, 2003 : If a BoxDataClear error was suppressed, it could cause the program to hang.
WB 2004B  May 19, 2004 : Fixed a problem with BoxDrawText hanging with "alignment" mode 32.
WB 2006B  Apr   3, 2006 : Add Work-around for problem in WebBrowser control that sometimes causes WIL dialogs that use the control to hang on exit.
WB 2009B  Jan 28, 2009 : Fixed problem in TimeDelay function that caused the function to sometimes hang in dialog user-defined-callback procedures.


Although it won't identify the root cause, perhaps I should just create a new release build that runs against the 2013B interpreter, which I'll be distributing soon, or against the 2012B interpreter that is already installed at that location for use by a newer script. That newer script also hangs, but eventually executes, when called upon to perform one of its three assigned tasks; the other two tasks run without issue. It is installed at 14 other locations, where it has run without issue for about 1 1/2 years.

What do you think of this idea?
David A. Gray
You are more important than any technology.

Deana

Yes, I definitely recommend using the latest version. There is no point in going through the hassle of reinstalling and recompiling an older version. If the issue is in fact a bug that has already been addressed, then you will need to update anyway. Try recompiling using the latest version of the software. Test and see if you are able to reproduce the hang. If you are, then create a DebugTrace compile and contact us with the offending code and trace output.
Deana F.
Technical Support
Wilson WindowWare Inc.

DAG_P6

Will do, next week. The client and I discussed your reply and my response over lunch today, and he agreed that we should use the latest compiler.

He also provided some new information, which the location manager brought to his attention this morning. She said that the program works OK at the beginning of the day, following the required morning restart of the machine. The program becomes progressively less responsive as the day progresses. I told Kevin that this suggests to me that something else, such as a device driver or service, that runs continuously, may be leaking memory. Does that make sense to you?
David A. Gray
You are more important than any technology.

td

Symptoms like that could be caused by almost anything. It could be malware, software, a driver, or hardware. You can use Task Manager to check for any processes running amok. I am sure you already know how to check for malware. Hardware problems can be a bit tricky. It could be as simple as a blocked or failed fan causing the system to overheat, or something more difficult to diagnose, like a failing hard drive, NIC, SIMM, or motherboard.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

DAG_P6

I've mentioned many of the above to Kevin, although I had forgotten about failing memory chips, and hadn't thought about a fan, although both make sense. Since the machine is locked down and pretty well shielded from the Internet, malware is pretty far down my list of suspects. However, since the machine is 4 years old, and runs 24/7, the first theory that I put forth is a failing hard drive.
David A. Gray
You are more important than any technology.

td

You have probably already done this, but you could have him check the event logs for anything out of the ordinary. I can't say that I have ever had much luck on that front, but you never know.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

DAG_P6

I've had the same experience with the event logs, and you are correct that I had already looked there. The only messages of interest were the "Application Hang" messages that get written when she uses the task manager to close it. Until we get this resolved, we've shown her how to keep the task manager open, so that she can end it quickly when it stalls.
David A. Gray
You are more important than any technology.

td

Kevin. Her. I must be showing my inner hillbilly.  I also see you mentioned the event log in your original post.  Anyway, if the system has a S.M.A.R.T. drive you can query the system for drive diagnostic info.  I believe WMI offers support for this.  You didn't mention the Windows version but on 2008/Vista and newer systems you can type "perfmon /report" from the command line and eventually get a detailed diagnostic view of the system. I can't recall if XP/2003 has an equivalent. In any event there is no guarantee it will tell you anything useful but it might be worth a try.
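For what it is worth, a WMI query along those lines might look like the sketch below. It assumes a WinBatch release with COM automation support (ObjectGet and ForEach), so it would need one of the newer interpreters rather than 2002K. It uses the Win32_DiskDrive class, whose Status property can report values such as "OK" or "Pred Fail"; whether a given drive and driver actually surface S.M.A.R.T. predictions through it is not guaranteed.

```winbatch
; Sketch only: connect to the local WMI service (standard cimv2 namespace).
objWMI = ObjectGet ( 'winmgmts:\\.\root\cimv2' )

; Ask for the model and the reported status of each physical drive.
colDrives = objWMI.ExecQuery ( 'SELECT Model, Status FROM Win32_DiskDrive' )

; Show one message per drive; a status other than OK deserves a closer look.
ForEach objDrive In colDrives
   Message ( objDrive.Model , objDrive.Status )
Next
```

A status of "OK" does not prove the drive is healthy; it only means that nothing has been flagged through this interface.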
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

DAG_P6

Your inner hillbilly can relax. Kevin is in the Fort Worth office. The on site manager's name is Michelle.

The system is Windows XP.
David A. Gray
You are more important than any technology.

td

Quote from: DAG_P6 on July 01, 2013, 06:23:12 PM
Your inner hillbilly can relax. Kevin is in the Fort Worth office. The on site manager's name is Michelle.

Saved yet again from another faux pas.

Quote
The system is Windows XP.

With all the bloat in newer versions of Windows, MSFT did manage to add some occasionally useful diagnostic features. But you should be able to track down some diagnostic tools and techniques for the system, if necessary. If you haven't done so yet, you could consider checking with the PC's manufacturer. They are sometimes a good source.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

DAG_P6

After close of business on Monday, Kevin installed the new version, unchanged apart from being compiled against the latest WIL runtime library. As of mid-afternoon, on a day which was as busy as it usually is at this time of the month, when rent for the storage lockers is due, it was back to its usual speedy, in-and-out behavior. Barring something unexpected, there won't be any further investigation.

Nevertheless, all of the suggestions, including your latest, are appreciated, and call attention to why this is such a great board.
David A. Gray
You are more important than any technology.

td

So a new version of WinBatch fixed the problem?   We have had a small number of reports over the last few years from users having problems with ~2005 and older versions of WinBatch on XP/2003 systems.  The issue seems to revolve around file system access in some way but we have never been able to reproduce the problem and it has always been resolved via an upgrade to a later version.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

DAG_P6

I am relieved to know that there have been similar sporadic reports. Thank you, Sir!

The strangest part is that the program has been running since 2010 on the very same machine, sitting atop the same operating system. The difference between now and then is that, until last week, the machine had been disconnected from the Internet and had received no OS updates in that time. A few days before the problem surfaced, it was connected to the Internet, and the OS was brought up to date.

Although it was ostensibly up to date when Kevin left St. Louis, we all know that updates for the updates sometimes don't appear until a day or two later, on a subsequent Windows Update scan. Maybe a late update broke it; I haven't scanned their event logs with that in mind. With the problem apparently solved, I have no such plans.
David A. Gray
You are more important than any technology.

td

The signature of the events I mentioned was different in that they produced memory access violation exceptions instead of just a hung process. So the connection with your issue is not particularly strong, but it certainly isn't ruled out either.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

DAG_P6

Quote from: td on July 05, 2013, 02:08:38 PM
The signature of the events I mentioned was different in that they produced memory access violation exceptions instead of just a hung process. So the connection with your issue is not particularly strong, but it certainly isn't ruled out either.

I understand. Thanks for the further clarification.
David A. Gray
You are more important than any technology.