Script Restart on Error?

Started by vulture, December 21, 2017, 02:05:13 PM


vulture

We have a few scripts that copy many files to different network shares. Every so often, a copy fails and the script stops. We have not been able to determine why this happens, but we don't suspect WinBatch of being the problem. We think it may be a momentary lapse in connectivity. For these particular scripts, it's just a matter of firing them off again, and then everything completes without issue. Is there any error handling code that we could put at the beginning of the script that would cause it to automatically restart itself should this intermittent error occur?

td

Check out the ErrorMode function and IntControl 73 in the Consolidated WIL Help file. Both can be used for error handling. Which one you choose depends on the error level of the error and your personal preferences.
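
For illustration, a minimal ErrorMode sketch might look like the following. The paths are hypothetical placeholders, not taken from the original scripts. As far as I understand the WIL help, ErrorMode(@OFF) only suppresses minor and moderate errors, so treat this as a starting point rather than a complete handler.

```winbatch
; Hypothetical paths, for illustration only.
src = "C:\Data\*.csv"
dst = "\\server\share\incoming\"

; Suppress WIL error dialogs for this one call only.
; ErrorMode returns the previous mode so it can be restored afterwards.
prevMode = ErrorMode(@OFF)
LastError()                        ; reading LastError also resets it to zero
ok  = FileCopy(src, dst, @FALSE)   ; @FALSE = no overwrite warning
err = LastError()                  ; WIL error code from the copy, 0 if none
ErrorMode(prevMode)                ; restore normal error handling right away

If !ok
   Message("Copy failed", "WIL error code: %err%")
EndIf
```

Keeping the suppressed region down to a single line is what stops this from hiding unrelated errors elsewhere in the script.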
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

snowsnowsnow

I would recommend against the "general error trap" idea - that is, of trying to catch it in the script itself.  There are just too many ways a script can fail - you can't anticipate them all - and, as Marty used to say, "ERRORMODE(@OFF) turns a 5 minute debugging project into a 6 month debugging project."

Rather, I recommend the "two scripts" approach, which I've used for my own "mission critical" script projects.  What you do is have two scripts running - the first is very small and cannot fail.  All it does is run the other one repeatedly.  So, if the second script blows up for any reason, it just gets run again.  Basically, the first script consists of just:

:loop
RunWait("OtherScript.wbt","-go")
goto loop

Furthermore, you can and should make it all one script.  So, the above code becomes something like:

WHILE !IsDefined(Param1)
RunWait("ThisScript.wbt","-go")
ENDWHILE
; ... rest of script here.

Try it out.  It works quite well.


vulture

Thank you. Will do some testing when a little free time arises.

td

Quote from: snowsnowsnow on December 25, 2017, 10:40:07 AM
I would recommend against the "general error trap" idea - that is, of trying to catch it in the script itself.  There are just too many ways a script can fail - you can't anticipate them all - and, as Marty used to say, "ERRORMODE(@OFF) turns a 5 minute debugging project into a 6 month debugging project."

Rather, I recommend the "two scripts" approach, which I've used for my own "mission critical" script projects.  What you do is have two scripts running - the first is very small and cannot fail.  All it does is run the other one repeatedly.  So, if the second script blows up for any reason, it just gets run again.  Basically, the first script consists of just:

:loop
RunWait("OtherScript.wbt","-go")
goto loop

Furthermore, you can and should make it all one script.  So, the above code becomes something like:

WHILE !IsDefined(Param1)
RunWait("ThisScript.wbt","-go")
ENDWHILE
; ... rest of script here.

Try it out.  It works quite well.

Actually, using either of the WinBatch error handling mechanisms is not a "general error trap" idea. WinBatch error handling can be targeted to a specific line or even a specific error with a couple of lines added to a script. It is also more efficient than starting multiple processes.

Assuming that the OP's script is stopping because of a WIL error, using the second process will not fix the problem, because the error will still halt the script unless some form of error handling is used. If the script is stopping because of internal logic instead of a WIL error, then the logic needs to be changed, but a second process is not necessary to correct that problem.

vulture

The script usually runs fine. We believe the intermittent error to be due to something temporary in the network or Windows. It has been impossible to replicate the error manually, and it happens at most about 4 times per month on scripts that are run 5 days a week. Since these scripts can be run over and over again without issue, it has been requested that the script be modified to just restart itself if it encounters an error. It's surely my lack of thorough knowledge of WinBatch, but I'm not seeing anything built in that will cause the script to suppress the error from the user and restart itself. Is that correct?

stanl

Quote from: td on December 26, 2017, 08:50:16 AM
WinBatch error handling can be targeted to a specific line or even a specific error with a couple of lines added to a script.  It is also more efficient than starting multiple processes. 

I agree with Tony here. Of perhaps some interest: WB has a DirExist() function that handles UNC paths. I've used it in two scripts as part of a KeepAlive() UDF I wrote. I used it in conjunction with a log file, and if KeepAlive() returned false I would process accordingly.
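
A rough sketch of that kind of check is below. It is not the KeepAlive() UDF mentioned above, which was not posted. The function name WaitForShare, the share path, the log file location, and the retry counts are all hypothetical.

```winbatch
; Hypothetical helper: poll a UNC path until it is reachable or we give up.
#DefineFunction WaitForShare(uncPath, maxTries, delaySecs)
   For attempt = 1 To maxTries
      If DirExist(uncPath) Then Return @TRUE   ; share answered, carry on
      TimeDelay(delaySecs)                     ; wait before probing again
   Next
   Return @FALSE                               ; never became reachable
#EndFunction

; Hypothetical share and log file, for illustration only.
share   = "\\server\share\incoming"
logFile = "C:\Logs\copyjob.log"

If !WaitForShare(share, 6, 5)
   ; Record the failure in the log, then stop before any copying starts.
   hLog = FileOpen(logFile, "APPEND")
   FileWrite(hLog, StrCat(TimeYmdHms(), " share unreachable: ", share))
   FileClose(hLog)
   Exit
EndIf
; ... copy operations go here.
```

Checking the share before each batch of copies means a connectivity lapse gets logged in one known place instead of surfacing as an error dialog partway through.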

td

Quote from: vulture on December 26, 2017, 11:24:29 AM
The script usually runs fine. We believe the intermittent error to be due to something temporary in the network or Windows. It has been impossible to replicate the error manually, and it happens at most about 4 times per month on scripts that are run 5 days a week. Since these scripts can be run over and over again without issue, it has been requested that the script be modified to just restart itself if it encounters an error. It's surely my lack of thorough knowledge of WinBatch, but I'm not seeing anything built in that will cause the script to suppress the error from the user and restart itself. Is that correct?

You have been offered three or four solutions - depending on how you count. Perhaps you need to better understand the exact nature of the error when it occurs so you can choose the best solution for you. And no, you are not correct: you can use built-in WinBatch functionality to capture the error and then continue processing. This means there is likely no reason to restart your script to recover from an error.

ChuckC

Tossing in my own $0.02 worth of commentary...

On occasion, after handling an error or exception in, say, C++ code, I've considered the process to be "polluted" to the point where I wanted a complete termination of the process and a re-launch of the program in a new process, so that things like orphaned COM object references, leaked handles, etc., were cleaned up for me. I've had similar situations happen with WinBatch scripts. In those situations, my preferred way of handling things, especially when it comes to errors/exceptions of the fatal variety that can't be caught/handled, is to allow the process to terminate, but have the process running as a native NT service that is configured to be automatically restarted by the SCM [Service Control Manager] if it terminates [with an error]. This guarantees that the process will always be restarted no matter what caused it to terminate with an error.

td

WinBatch error handling can handle any WIL error, that is, any error that would result in a WinBatch error message dialog, and cleanup can be accomplished in a fairly straightforward fashion if the script is well written. However, it is possible to have a WinBatch process corrupted by something like a Windows bug or some WinBatch bug introduced by your friendly local developer. For example, the Windows SMB protocol has been known to trash file handles or memory in a process. Under those or similar circumstances, your solution or what snow++ suggested seems appropriate.

Based on the OP's description, the OP may have encountered the infamous Windows share caching problem. If that is the case, trapping the error on the offending line, adding a bit of a time delay, and retrying should be sufficient.
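
The trap, delay, and retry idea could be sketched roughly as follows. The paths, the retry count, and the delay are hypothetical values chosen for illustration, and this assumes the failing line is a FileCopy that raises an error ErrorMode(@OFF) is able to suppress.

```winbatch
; Hypothetical values, for illustration only.
src       = "C:\Data\*.csv"
dst       = "\\server\share\incoming\"
maxTries  = 5
delaySecs = 10

copied = @FALSE
For attempt = 1 To maxTries
   prevMode = ErrorMode(@OFF)            ; trap errors on the copy line only
   LastError()                           ; reset the last-error indicator
   copied = FileCopy(src, dst, @FALSE)   ; @FALSE = no overwrite warning
   err = LastError()                     ; error code from this attempt
   ErrorMode(prevMode)                   ; back to normal error handling
   If copied Then Break                  ; success, stop retrying
   TimeDelay(delaySecs)                  ; give the share time to recover
Next

If !copied
   Message("Copy failed", "Gave up after %maxTries% attempts. Last WIL error: %err%")
   Exit
EndIf
; ... rest of script here.
```

Because the retry happens in place, the script picks up where it left off rather than starting over in a new process.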

stanl

As an afterthought, and just out of curiosity: if the WB script simply terminates, would that show up in any of the event logs?

td

It should, but such events are almost always accompanied by some visual indicator like one of MSFT's famous exception message windows. It is conceivable that a bug could corrupt a process's program counter in such a way that the process simply terminates in what appears to be a normal way. Or a poorly written application could trap exceptions but not report the error. In both of these circumstances, an event might not be added to the event log. On newer versions of Windows, however, the OS may report to the event log even if the application traps the error.