Author Topic: Script execution speed  (Read 3733 times)

hdsouza

  • Full Member
  • ***
  • Posts: 142
Script execution speed
« on: January 21, 2018, 06:19:01 am »
I have been using the traditional approaches to read a file and filter the required lines out, ( like for example, filarray_LCExtract = ArrayFileGet(File_LCExtract)  .... Handle_PreAnalysed = FileOpen(File_PreAnalysed, "READ") ....  etc ) and they have worked fine on normal sized files.

But I have a CSV file with 400,000 lines and each line has 20 values to extract and filter on.
This makes processing extremely long.... hours. So I attempted using powershell. More specifically i used System.IO.File.ReadLines() with powershell. It returns all the lines of a file, but lets you begin iterating over the lines immediately which means it does not have to store the entire contents in memory (https://msdn.microsoft.com/en-us/library/dd383503.aspx). This  approach managed to get a total processing time of 9- 12 seconds (for 100,000 lines).

I could call the powershell script from winbatch, but I was thinking , can I do something similar inside Winbatch itself?

Here is the full Powershell script for reference:
Code: PowerShell
$File_NotesDownload = "C:\temp\NotesDownload.csv"
$File_NotesGood = "C:\temp\NotesGood.txt"
if (Test-Path $File_NotesGood) { remove-item $File_NotesGood }
$Start = Get-Date

$r = [IO.File]::OpenText($File_NotesDownload)
while ($r.Peek() -ge 0)
    {
     $Note = $r.ReadLine()
    $Note = ($Note -replace '"','').ToLower()
    $Loan_Id = ($Note -split ',')[0].trim()
    $Loan_Id = $Loan_Id -as [int]
    if (($Loan_Id -is [int])  -eq $false) {Continue}

    $Loan_Status     = ($Note -split ',')[5].trim()
    $Status_Valid = 0
    if ($Loan_Status.ToLower().Contains("current") -eq "true") {$Status_Valid = 1}
    if ($Status_Valid -eq 0) {Continue}

    $NeverLate         = ($Note -split ',')[13].trim()
    if ($NeverLate -ne "true") {Continue}

    $Ask_price       = ($Note -split ',')[6].trim()
    if ($Ask_price -gt 20) {Continue}

    $Loan_Maturity     = ($Note -split ',')[15].trim()
    if ($Loan_Maturity -ne 36) {Continue}

    $Markup_Discount = ($Note -split ',')[7].trim()
    if ($Markup_Discount -gt -0.01) {Continue}

    $NotesList += $Note + "`r`n"
    }
$NotesList | out-file $File_NotesGood
$r.Dispose()
$End = Get-Date
$TotalTime = New-Timespan -Start $Start -End $End
$TotalTimeSecs = $TotalTime.seconds
$TotalTimeSecs

 

td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #1 on: January 22, 2018, 07:02:43 am »
Reading files in blocks into memory instead of all at once is nothing new.  The concept has been around at least since computers first had disk drives.  It has little impact on performance on modern computers.   In fact, it is just likely than not to slow things down a bit as make processing proceed more quickly.  Also, WinBatch FileOpen related functions read large files in blocks just like the FCL stream class you are using in your Powershell script.  So "store the entire contents in memory" is not the issue.

WinBatch for and while loops tend to be a bit slow.  This is imperceptible for loops that only iterate a few thousand times but can be noticeable when the iterations approach hundreds of thousands.  However, the difference you describe here is to large to attribute to just WinBatch loop slowness.  It suggests that there is something wrong with the way your WinBatch script is written.
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


hdsouza

  • Full Member
  • ***
  • Posts: 142
Re: Script execution speed
« Reply #2 on: January 22, 2018, 03:22:38 pm »
Nope its a simple For/while loop. See my previous message.
We are missing the point though. I am not saying that winbatch is normally slow, but when you have 400,000 lines and each line has 20 values to filter on, thats when its slow

This is just one of the several options I have already tried
Code: Winbatch
filarray_LCExtract = ArrayFileGet(File_LCExtract)
Max_LCExtract = ArrInfo(filarray_LCExtract,1)
For AA_LCExtract = 1 to Max_LCExtract
      Line_LCExtract = strtrim(filarray_LCExtract[AA_LCExtract - 1])
      Line_LCExtract = strreplace(Line_LCExtract, '"', '')
      Loan_id              = strlower(strtrim(Itemextract(1, Line_LCExtract, ";")))
      Note_id              = strlower(strtrim(Itemextract(2, Line_LCExtract, ";")))
      Order_id             = strlower(strtrim(Itemextract(3, Line_LCExtract, ";")))
      Loan_Status          = strlower(strtrim(Itemextract(4, Line_LCExtract, ";")))
      Ask_price            = strlower(strtrim(Itemextract(5, Line_LCExtract, ";")))
     ;.. More filters and then followed by evaluation
Next
 

All i am asking for .. is there anything similar or can I call something similar to System.IO.File.ReadLines().. as in my previous post.
Thanks

kdmoyers

  • Full Member
  • ***
  • Posts: 227
Re: Script execution speed
« Reply #3 on: January 23, 2018, 04:07:23 am »
it would be interesting to know how much time that spends doing the ArrayFileGet and how much time doing the loop.
-K
The mind is everything; What you think, you become.

hdsouza

  • Full Member
  • ***
  • Posts: 142
Re: Script execution speed
« Reply #4 on: January 23, 2018, 04:43:27 am »
I ran ArrayFileGet (as well as a  Handle = FileOpen(File_name, "READ") ) on just a fraction of the data and after 30 minutes or so I terminated the job.
While with the PS command I mentioned earlier the 100,000 line file was done in in 9 seconds.
Even with using the regular file read operations with powershell  ($Notes = Get-Content $File_NotesDownload) would have taken 20 minutes for 100, 1000 lines.

Its not just the number of line (or file size) but also the number of values that have to be extracted and evaluated in each line, which adds to processing time.

stanl

  • Pundit
  • *****
  • Posts: 793
Re: Script execution speed
« Reply #5 on: January 23, 2018, 04:53:55 am »
Seems like a choice has to be made. WB can run the PS script through CLR, but not sure of your validation steps.   For the record I wrote a generic CSV parser in WB 2013 which reads parameters from a cfg file opens a given csv and places select field names into Access tables.  Currently using it for data downloaded from MicroStrategy - and process over 1 million rows daily.  The largest block is between 150-250,000 rows.  The WB script uses FileOpen(), FileRead()  and ADO for recordset inserts.  Even 250,000 rows takes around 5 minutes, which is an acceptable coffee break.

It is perhaps futile for a WB versus PS debate - and I should know having entered a few in the last couple of years.  just .02

hdsouza

  • Full Member
  • ***
  • Posts: 142
Re: Script execution speed
« Reply #6 on: January 23, 2018, 05:07:00 am »
Stan, you are right.
My intention was not to have a debate. All I was trying to find out was "Can System.IO.File.ReadLines()  somehow be used from inside winbatch". Is thats not possible then I could could just call the PS script from WB

td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #7 on: January 23, 2018, 08:13:10 am »
Nope its a simple For/while loop. See my previous message.
We are missing the point though. I am not saying that winbatch is normally slow, but when you have 400,000 lines and each line has 20 values to filter on, thats when its slow

A couple of points need to be clarified.  WinBatch's implementation of looping structures tends to be a bit slower because it is interpreted and cannot be optimized the same way native or intermediate jit compiled code can be.  That only partially explains why your WinBatch script takes longer.

A second reason why your script takes longer is that you are using ItemExtract.  Each time you call it scans the entire line from the beginning.  You are scanning each complete line 7 times - counting the first StrScan and StrTrim.

Did some file load time benchmarking using the System.IO.File class in a WinBatch script using the class's ReadLines and ReadLine methods.  Of course, the ReadLine method appears to be the fastest because the initial call only reads a small part of the file into process memory.  This is a bit misleading because the time taken to read the file is spread out over the execution of the script when processing the entire file instead entirely at the beginning. 

The WIL FileGet function is much slower than ReadLines and the WIL BinaryRead function is about the same speed as dotNet ReadLines method.   BinaryRead can consume a 90,000 record (85 MB) file in about .03 seconds on a modern system.  FileGet takes about 4 seconds.

Finally,  there is nothing magical about the System.IO.ReadLine.  It is likely splitting the task of reading the file into a couple of threads which can have performance advantages under certain conditions but can also add additional critical section overhead.   The fastest implementation of file parsing in pure WinBatch is to use the Binary function for fast loading into memory and fewer passes over the data.  The speed is not likely to match your PowerShell script but it may (or may not) be acceptable.   
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


kdmoyers

  • Full Member
  • ***
  • Posts: 227
Re: Script execution speed
« Reply #8 on: January 23, 2018, 08:45:13 am »
can you allocate a binary buffer big enough to hold the whole thing?

if so, then a few lines of code could replace all those semi colons with commas, which might render the file readable by ArrayFileGetCSV.  That would probably not be any faster than ArrayFileGet, but would make all the ItemExtracts unneeded.

or, skip the ArrayFileGet entirely and process via BinaryTag functions.  By specifying LF and CR has start and end tags, a BinaryTag loop can scan a text file line-by-line.  It will take plenty of time to do 400K lines though, not sure if the net effect would be faster.
The mind is everything; What you think, you become.

JTaylor

  • Pundit
  • *****
  • Posts: 919
    • Data & Stuff Inc.
Re: Script execution speed
« Reply #9 on: January 23, 2018, 09:07:17 am »
I realize this isn't an answer to your question but if looking for ways to optimize the WinBatch loop...You might consider using Arrayize() on the extracted line rather than using ItemExtract()s.   You would eliminate that part and you could also eliminate the variable assignment.   For clarity you could assign the array element to a variable name at the beginning:

      Loan_id              = 0
      Note_id              = 1
      Order_id             = 2
      Loan_Status       = 3
      Ask_price            = 4

and then in the loop

     If array[ask_price] > 45 Then....


Also, I remember reading somewhere that the use of Continue can be a big slowdown as well.  No idea if that is still the case or if I just dreamed it and it never was but maybe Tony can comment on that issue.  If it is, maybe adjust the code so it falls through naturally.

Jim

kdmoyers

  • Full Member
  • ***
  • Posts: 227
Re: Script execution speed
« Reply #10 on: January 23, 2018, 09:31:23 am »
Also, ugly tricks like the following can eliminate a big set of ItemExtracts, depending on your data:

Code: Winbatch
  a = "abc;big boy;123;querty;wxyz;87654;ax;ewrt;hhyyrf;big deal;1234"
  parsedata(strreplace(strreplace(a,' ','~'),';',' '))
  message(param2,param10)
The mind is everything; What you think, you become.

JTaylor

  • Pundit
  • *****
  • Posts: 919
    • Data & Stuff Inc.
Re: Script execution speed
« Reply #11 on: January 23, 2018, 09:34:51 am »
Also meant to suggest that if you try using ArrayFileGetCSV() (shouldn't need to replace semicolons unless I have missed something) rather than looping through the array just use ArraySort, find the break point and then ArrayRedim() to remove the rows you don't want.  Sort on the next criteria, rinse and repeat.    Also, if you want get fancy rather than looping through the Array looking for the breakpoint just jump halfway, or whatever makes sense, check the value and then jump back or forward as needed by a certain increment.

Jim

td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #12 on: January 23, 2018, 10:12:13 am »
can you allocate a binary buffer big enough to hold the whole thing?

You *might* be able to load up to about a 500MB or even larger file using BinaryRead.  It all depends on how much memory the script is using for other purposes.   
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #13 on: January 23, 2018, 10:28:50 am »
Got around to using a more reliable bench test and corrected the times in the post above.

The results of comparing the ReadLines method with the return object in a WIL Foreach loop to FileOpen and FileRead in a WIL While loop are interesting.   The ReadLines loop processed 90,000 lines in about 1.2 seconds while the FileRead loop took about 2.1 seconds.  The difference is likely attributable to the fact that FileRead does a lot of normalization checking on the file contents.   
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


kdmoyers

  • Full Member
  • ***
  • Posts: 227
Re: Script execution speed
« Reply #14 on: January 23, 2018, 11:23:42 am »
Just for grins, I tried the BinaryTag loop on a 1 million line log file I have laying around. 
113 seconds, including doing ParseData on every line. Your file might be larger though.

Code: Winbatch
  f = "G:\COMMON\LOG\BOELOG.LOG" ; 81Mb size, 1 million lines

debugdata("","dbg:":"start")

  fs = filesize(f)          ; size of file
  bb=binaryalloc(fs+1)      ; reserve space
  binarypokestr(bb,0,@lf)   ; lead with a LF
  binaryreadEX(bb,1,f,0,fs) ;   followed by the file itself

  xx = 0 ; count chars
  yy = 0 ; count lines

debugdata("","dbg:":"loop") ; one second to read in 81Mb

  struc = binarytaginit(bb,@lf,@cr)
  while 1
    struc = binarytagfind(struc)
    if struc == "" then break ; all done
    line = binarytagextr(struc,0)

    parsedata(strreplace(strreplace(line,' ','~'),';',' '))

    xx = xx + strlen(line)
    yy = yy + 1

  endwhile

debugdata("","dbg:":"end") ; 112 seconds to scan 1M lines

  message(yy,xx)

exit
 
The mind is everything; What you think, you become.

td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #15 on: January 23, 2018, 01:55:59 pm »
Out of an abundance of curiosity and have more time than good sense, I create a bench test using something similar to the OP's posted WinBatch script.  I used an old test text file with 6,166,632 lines.  For the test, I used a FileOpen handle with FileRead, ItemExtracts, StrTrims and StrReplaces inside the loop.  The script took almost exactly 9 mins to run.   That works out to about 11,419 lines per second if my arithmetic can be trusted. 
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


hdsouza

  • Full Member
  • ***
  • Posts: 142
Re: Script execution speed
« Reply #16 on: January 23, 2018, 04:29:07 pm »
Thanks Kd , Appreciate the script.
Just two issues:
1) If i use struc = binarytaginit(bb_NotesDownload,@lf,@cr) it breaks out of the loop at start
If i use struc = binarytaginit(bb_NotesDownload,@lf,@lf), it skips every alternate line. (I believe the LF has been used to extract the earlier line) So it ends up doing half the lines
2) time taken for half the lines in the file = 24 seconds for 50, 000 lines.
I tried to make this WBT code similar to the PS code. So comparing apples to apples  that to 9 seconds for 100,000 lines

I have saved the data file to https://we.tl/xdjgPq58t1

Code: Winbatch
File_LCExtract = "c:\temp\File_LCExtract.txt"
Handle_LCExtract = FileOpen(File_LCExtract,"write")
TimeStart_Test1 = TimeYmdHms()

File_NotesDownload = "C:\temp\SecondaryMarketAllNotes.csv"
fs_NotesDownload = filesize(File_NotesDownload)        
bb_NotesDownload =binaryalloc(fs_NotesDownload+1)    
binarypokestr(bb_NotesDownload,0,@lf)   ; lead with a LF
binaryreadEX(bb_NotesDownload,1,File_NotesDownload,0,fs_NotesDownload)

struc = binarytaginit(bb_NotesDownload,@lf,@lf)
;struc = binarytaginit(bb_NotesDownload,@lf,@cr)
while 1
    struc = binarytagfind(struc)
    if struc == "" then break ; all done
    Line_LCDownload = binarytagextr(struc,0)
    Line_LCDownload = strreplace(strreplace(Line_LCDownload, '"', ''), ',', ';')
    parsedata(strreplace(strreplace(Line_LCDownload,' ','#~#'),';',' '))
    Is_Number = IsNumber(param1)
    if Is_Number == 0 then continue

    Loan_Status          = strreplace(param6, '#~#', " ")
    if Loan_Status != "current" then Continue

    NeverLate            = param14
    if NeverLate != "true" then Continue

    Ask_price            = param7
    if Ask_price > 25 then Continue

    Loan_Maturity        = param16
    if Loan_Maturity != 36 then Continue

    Markup_Discount      = param8
    if Markup_Discount > -0.01 then Continue

    FileWrite (Handle_LCExtract, Line_LCDownload)
endwhile
BinaryFree(bb_NotesDownload)
fileclose (Handle_LCExtract)

TimeEnd_Test1 = TimeYmdHms() ; only for testing
TimeDiff_Test1 = TimeDiff( TimeEnd_Test1, TimeStart_Test1) ; only for testing
Message("Time Diff", TimeDiff_Test1)
 

I removed all the assignments (ex: NeverLate            = param14) and the time dropped from 24 secs to 21 secs

JTaylor

  • Pundit
  • *****
  • Posts: 919
    • Data & Stuff Inc.
Re: Script execution speed
« Reply #17 on: January 23, 2018, 07:33:31 pm »
Your file does not have a Carriage Returns...only Line Feeds.

Jim

hdsouza

  • Full Member
  • ***
  • Posts: 142
Re: Script execution speed
« Reply #18 on: January 23, 2018, 07:43:56 pm »
Your file does not have a Carriage Returns...only Line Feeds.

Jim
Thats correct. that the feature of a CSV file. I cannot change the source as its downloaded from a site.
I could add a CR to the end of each line, but thats increasing the processing time.

JTaylor

  • Pundit
  • *****
  • Posts: 919
    • Data & Stuff Inc.
Re: Script execution speed
« Reply #19 on: January 23, 2018, 07:46:06 pm »
A BinaryReplace() would probably be fast.   Just make sure you increase Buffer size as needed.

Jim

hdsouza

  • Full Member
  • ***
  • Posts: 142
Re: Script execution speed
« Reply #20 on: January 23, 2018, 08:57:07 pm »
A BinaryReplace() would probably be fast.   Just make sure you increase Buffer size as needed.

Jim
That was good one. Thanks Jim. 
Got all the lines in now.. But time taken is still long --- 44 secs for 100,000 lines.
That includes the time taken by the first binbuf  operation ---  6 secs

Code: Winbatch
File_LCExtract = "c:\temp\File_LCExtract.txt"
Handle_LCExtract = FileOpen(File_LCExtract,"write")
TimeStart_Test1 = TimeYmdHms()
;------------------------------------------
; adding a CR to the end of each line.. and removing unnecessary chars
File_NotesDownload = "C:\temp\SecondaryMarketAllNotes.csv"
File_NotesDownMod = "C:\temp\SecondaryMarketMod.txt"
Filedelete(File_NotesDownMod)

fs_NotesDownload = filesize(File_NotesDownload)        
bb_NotesDownload =binaryalloc(fs_NotesDownload + 100000)
BinaryRead( bb_NotesDownload, File_NotesDownload )

BinaryReplace( bb_NotesDownload, @lf, @crlf ,0)
BinaryReplace( bb_NotesDownload, '","', ';' ,0)
BinaryReplace( bb_NotesDownload, '"', '' ,0)

BinaryWrite(bb_NotesDownload, File_NotesDownMod)
BinaryFree(bb_NotesDownload)
;------------------------------------------

fs_NotesDownMod = filesize(File_NotesDownMod)        
bb_NotesDownMod =binaryalloc(fs_NotesDownMod+1)    
binarypokestr(bb_NotesDownMod,0,@lf)   ; lead with a LF
binaryreadEX(bb_NotesDownMod,1,File_NotesDownMod,0,fs_NotesDownMod)

struc = binarytaginit(bb_NotesDownMod,@lf,@cr)
while 1
    struc = binarytagfind(struc)
    if struc == "" then break ; all done
    Line_LCDownload = strlower(binarytagextr(struc,0))
    parsedata(strreplace(strreplace(Line_LCDownload,' ','#~#'),';',' '))

    if IsNumber(param1) == 0 then continue

    Loan_Status          = strreplace(param6, '#~#', " ")
    if Loan_Status != "current" then Continue
    if param14 != "true" then Continue
    if param7 > 25 then Continue
    if param16 != 36 then Continue
    if param8 > -0.01 then Continue

    FileWrite (Handle_LCExtract, Line_LCDownload)
endwhile
BinaryFree(bb_NotesDownMod)
fileclose (Handle_LCExtract)

TimeEnd_Test1 = TimeYmdHms() ; only for testing
TimeDiff_Test1 = TimeDiff( TimeEnd_Test1, TimeStart_Test1) ; only for testing
Message("Time Diff", TimeDiff_Test1)
 

td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #21 on: January 23, 2018, 09:04:43 pm »
You don't need to modify the file to use the BinaryTagInit.  For example:

Code: Winbatch
hBuf = BinaryAlloc(FileSize('C:\Temp\SecondaryMarketAllNotes.csv'))
BinaryRead(hBuf,'C:\Temp\SecondaryMarketAllNotes.csv')

;; Strip the column headers
Struct=BinaryTaginit(hBuf,'"', @Lf)
Struct=BinaryTagFind(Struct)

While 1
   Struct=BinaryTagFind(Struct)
   if Struct=="" then break
   strLine=binarytagextr(struct,0)
   parsedata(strreplace(strreplace(strline,' ','~'),';',' '))  
Endwhile

Using the above with your parsing produced a 22 second execution time on my system.  I think that could be reduced a bit but since your only concern seems to be the speed of execution you should stick with your Powershell script. As mentioned previously it is compiled before execution- at least as of V3.
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


JTaylor

  • Pundit
  • *****
  • Posts: 919
    • Data & Stuff Inc.
Re: Script execution speed
« Reply #22 on: January 23, 2018, 09:44:08 pm »
Yeah...I think I can beat 44 seconds (It's on Kirby  ;) ) if I can figure out why ArraySort() never ends on the third Sort but not sure if I will hit 9 seconds, although I may come close.   I thought going from terminating after 30 minutes to 44 seconds was a viable option if you wanted to keep it native WinBatch but if 9 seconds or better is the only acceptable solution then, as Tony says, you probably just want to go that route and we can stop hijacking your thread with other suggestions :)

Always like to help but this also interested me as I do a HUGE amount of data processing and am always looking for ways to improve things so thanks for the posting even though your thread has been somewhat hijacked.

Jim

kdmoyers

  • Full Member
  • ***
  • Posts: 227
Re: Script execution speed
« Reply #23 on: January 24, 2018, 04:41:05 am »
<< I think I can beat 44 seconds (It's on Kirby >>
Go for it man, I'm out of ideas. :-)
The mind is everything; What you think, you become.

td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #24 on: January 24, 2018, 08:36:17 am »
Yeah...I think I can beat 44 seconds (It's on Kirby  ;) ) if I can figure out why ArraySort() never ends on the third Sort but not sure if I will hit 9 seconds, although I may come close.   I thought going from terminating after 30 minutes to 44 seconds was a viable option if you wanted to keep it native WinBatch but if 9 seconds or better is the only acceptable solution then, as Tony says, you probably just want to go that route and we can stop hijacking your thread with other suggestions :)

Always like to help but this also interested me as I do a HUGE amount of data processing and am always looking for ways to improve things so thanks for the posting even though your thread has been somewhat hijacked.

Jim

I find this type of algorithm analysis discussion endlessly fascinating.  I always learn something and WinBatch enhancements have been known to come out of them.  For example, this discussion has provided some rationale for migrating the WIL interpreter to a JIT compiler model someday. 

It will be interesting to see if anyone comes up with a faster algorithm.  I have gotten it down to about 20 seconds on my systems using the binary buffer + ParseData approach and the OP's dataset but I suspect that more could be done.       
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #25 on: January 24, 2018, 08:54:19 am »
Take back the 20 seconds.  Using Arrayize and few other stupid programmer tricks was able to get it down to a little under 10 seconds.  Of course, your results may vary depending on your hardware and a few other variables.
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


JTaylor

  • Pundit
  • *****
  • Posts: 919
    • Data & Stuff Inc.
Re: Script execution speed
« Reply #26 on: January 24, 2018, 09:04:29 am »
Cool...thought Arrayize() might be a good option.

Is it expected for multiple ArraySort()s to get progressively slower and then crash WinBatch?

The first two are fast as expected.  The third is MUCH slower and the fourth eventually crashes WinBatch or gives an error about not being able to access array contents.

Jim

td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #27 on: January 24, 2018, 09:10:22 am »
The error is usually the result of passing in a zero element array but can be caused by other untoward events.  If that is not the case and you have a simple example that reproduces the problem, we will look into it a bit.
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


JTaylor

  • Pundit
  • *****
  • Posts: 919
    • Data & Stuff Inc.
Re: Script execution speed
« Reply #28 on: January 24, 2018, 09:23:32 am »
Here is what I was attempting.  Sounds like you have the data file.  I just renamed it.  Thanks.

Jim

Code: Winbatch

File_LCExtract = ".\File_LCExtract.txt"
TimeStart_Test1 = TimeYmdHms()

File_NotesDownload = ".\td.csv"

#DefineFunction For_Loop(arr, filter, operator_x)

  rows = ArrInfo(arr,1)-1
  For x = rows to 1 by -1
    If arr[x,filter] %operator_x% Then
      If rows != x Then ArrayRedim(arr,x+1,-1,-1,-1,-1)
      x = 0
    EndIf
  Next
  Message(Filter:" ":ArrInfo(arr,1), ArrInfo(arr,2))

#EndFunction

arr = ArrayFileGetCSV(File_NotesDownload,0)
;Message(ArrInfo(arr,1), ArrInfo(arr,2))

ask_price       = 6
loan_status     = 5
neverlate       = 13
loan_maturity   = 15
markup_discount = 7

;   Is_Number = IsNumber(param1)
;   if Is_Number == 0 then continue

    ArraySort(arr, @LOGICALSORT, ask_price, 1)
    For_Loop(arr, ask_price, "<= 25")

    ArraySort(arr, @LOGICALSORT, markup_discount, 1)
    For_Loop(arr, markup_discount, "<= -0.01")

    ArraySort(arr, @STRINGSORT, loan_status, 1)
    For_Loop(arr, loan_status, '== "current"')

    ArraySort(arr, @STRINGSORT|@DESCENDING, loan_status, 1)
    For_Loop(arr, loan_status, '== "current"')

    ArraySort(arr, @STRINGSORT|@DESCENDING, neverlate, 1)
    For_Loop(arr, neverlate, "== 'true'")

    ArraySort(arr, @LOGICALSORT, loan_maturity, 1)
    For_Loop(arr, loan_maturity, "<= 36")

    ArraySort(arr, @LOGICALSORT|@DESCENDING, loan_maturity, 1)
    For_Loop(arr, loan_maturity, "== 36")


    ArrayFilePutCSV(File_LCExtract, arr)

TimeEnd_Test1 = TimeYmdHms() ; only for testing
TimeDiff_Test1 = TimeDiff( TimeEnd_Test1, TimeStart_Test1) ; only for testing
Message("Time Diff", TimeDiff_Test1)
 

 

JTaylor

  • Pundit
  • *****
  • Posts: 919
    • Data & Stuff Inc.
Re: Script execution speed
« Reply #29 on: January 24, 2018, 09:38:28 am »
It is error# 1637.    Behavior described above is very consistent.  I have even changed the order of the columns being sorted but doesn't seem to make a difference.

Jim

td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #30 on: January 24, 2018, 11:56:21 am »
Reduced the script to this line:

ArraySort(arr, @STRINGSORT|@DESCENDING, loan_status, 1)

to reproduce the problem.   The combination of dataset size and the repetitive nature of the data in the column is more than the sort algorithm can handle.  The function is consuming all the process stack space.  The only fix is to completely rewrite the function using a different and slower algorithm.  Needless to say, this is not going to be a 'quick' fix. 

Thanks for reporting the problem.
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


td

  • Tech Support
  • *****
  • Posts: 2595
    • WinBatch
Re: Script execution speed
« Reply #31 on: January 24, 2018, 01:34:00 pm »
Things are not as dire as originally thought but MSFT's Spectre variant 1 bug fix makes debugging native code incredible slow sometimes.

The problem only occurs for sorts in descending order so the fix will be a lot simpler than stated above.
"Success is a lousy teacher. It seduces smart people into thinking they can't lose."
  - Bill Gates


hdsouza

  • Full Member
  • ***
  • Posts: 142
Re: Script execution speed
« Reply #32 on: January 24, 2018, 03:26:25 pm »
I find this type of algorithm analysis discussion endlessly fascinating.  I always learn something and WinBatch enhancements have been known to come out of them.  For example, this discussion has provided some rationale for migrating the WIL interpreter to a JIT compiler model someday. 

It will be interesting to see if anyone comes up with a faster algorithm.  I have gotten it down to about 20 seconds on my systems using the binary buffer + ParseData approach and the OP's dataset but I suspect that more could be done.       

Yes. Execution speed is always an interesting topic. I have several  loops  which can use some of the ideas discussed .. especially the PARAMS extraction.  I tend to like astatusbar.. I know its slow.. but I like to see what the script is doing.  Thanks all for your ideas

stanl

  • Pundit
  • *****
  • Posts: 793
Re: Script execution speed
« Reply #33 on: January 25, 2018, 03:35:33 am »
Having kept away from using large arrays it was interesting to see the algorithms in operation. Unfortunately, my use of WB is to support linked or PowerPivot data into Excel templates. Some of these connection updates to Microstrategy, SQL Server, Oracle can take 20-30 minutes - but everyone is on the same page with that [querying billing systems for over 21 million customers].

But a final question: could the original Powershell script the OP posted be converted to CLR code? And if so, how would it perform?

JTaylor

  • Pundit
  • *****
  • Posts: 919
    • Data & Stuff Inc.
Re: Script execution speed
« Reply #34 on: January 25, 2018, 05:57:44 pm »
Guess I should move on...the best I could do was 17 seconds :-(

Jim