BinaryRead vs FileRead

Started by jmburton2001, February 22, 2019, 05:40:36 PM

jmburton2001

Windows 7 Pro 64
Intel i3-3220 @3.3GHz (no OC)
8GB RAM

My primary use for WinBatch is building tools that let me pull the items I care about out of log files. I don't want to wade through thousands of lines, so I read the files with WinBatch and write the pertinent items to a text file. The files I'm interested in grow larger every day, and a few have been over the 65,535-line limit for a while. (That is why I worked on my astatusbar issue.)

When I asked for guidance on the "HUGE STRING" issue, JTaylor mentioned binary operations. I'd always heard that binary operations are much faster than "normal" operations. As my files grow, so does the processing time, so I thought I'd give BinaryRead a try.

I set up two scripts that are identical except for how they read the file, and both process the exact same file. In every case the BinaryRead version is slower than the FileRead version (see screenshots). This test runs from an SSD, but some of my files are on old spindle drives, where processing times can easily exceed five minutes (which is excruciating).

The file size is approximately 6.67 MB.

Is there anything I need to look into in order to speed up my processing time?

Note: "STUDIO" in the screenshots means that I ran it from "Debug -> Run" in Studio.

td

Without knowing how your scripts are written, it is impossible to comment.

jmburton2001

Since I'm self-taught, I get most of my "training" from the help files, forum posts, and the Tech Support database. Much of that content is outside my sphere of knowledge, so my reading comprehension is subpar. My rudimentary understanding is that binary operations are faster because they work in RAM rather than on disk. I might be misinterpreting that, but that's why I included my RAM and processor speed as a possible source of the speed difference.

The test file exceeded the attachment size limit for this post, so here is a link to the file I used.

This is the complete script for the FileRead operation.

Code (winbatch)
AddExtender("wwsop34i.dll")
AddExtender('WWHUG44I.DLL')

; Count total lines
FileName = "Test Log 65000.txt"
InputFilehandle = FileOpen(FileName, "READ")
FS=FileSize(FileName)
BinBuf = BinaryAlloc(FS + 1000)
if BinBuf == 0 then
   Message("Error", "Binary Allocation Failed")
   exit
end if
BinaryRead(BinBuf, FileName)
LogLines01 = BinaryStrCnt(BinBuf, 0, FS - 1, @CR)
BinaryFree(BinBuf)
FileClose (InputFilehandle)

MaxLines = "65535"
Result = huge_Divide(MaxLines,LogLines01)

;===== FILE READ ==================================

File = FileOpen(FileName,"READ")
linecount = Abs(huge_Multiply (LogLines01, Result))
StartTime = TimeYMDHMS()
Count = "0"
astatusbar(0,"Item Analysis","Reading Items",MaxLines,0)

For item = 1 To linecount

  line = FileRead(File)
  Count = Count + 1
  item = Abs(huge_Multiply (Count, Result))
  astatusbar(1,"Item Analysis","Reading item %Count% of %LogLines01%",MaxLines,item)

  ; Perform work
 
Next

FileClose(File)
astatusbar(2,"Item Analysis","Done",MaxLines,item)
EndTime = TimeYMDHMS()
TotalTime = TimeDiff (EndTime, StartTime)
astatusbar(2,"Item Analysis","Done",MaxLines,item)
Message ("File Read - STUDIO","Total lines processed = %Count%%@CRLF%Last line = %line%%@CRLF%%@CRLF%Elapsed time = %TotalTime%")
;Message ("File Read - COMPILED","Total lines processed = %Count%%@CRLF%Last line = %line%%@CRLF%%@CRLF%Elapsed time = %TotalTime%")

Exit


And this is the complete script for the BinaryRead operation.

Code (winbatch)
AddExtender("wwsop34i.dll")
AddExtender('WWHUG44I.DLL')

; Count total lines
FileName = "Test Log 65000.txt"
InputFilehandle = FileOpen(FileName, "READ")
FS=FileSize(FileName)
BinBuf = BinaryAlloc(FS + 1000)
if BinBuf == 0 then
   Message("Error", "Binary Allocation Failed")
   exit
end if
BinaryRead(BinBuf, FileName)
LogLines01 = BinaryStrCnt(BinBuf, 0, FS - 1, @CR)
BinaryFree(BinBuf)
FileClose (InputFilehandle)

MaxLines = "65535"
Result = huge_Divide(MaxLines,LogLines01)

;=== BINARY FILE READ ==================================

file = FileName
fsize = FileSize(file)
buffer = BinaryAlloc(fsize)

BinaryRead(buffer,file)
endoffile=BinaryEODGet(buffer)
StartTime = TimeYMDHMS()
astatusbar(0,"Item Analysis","Reading Items",MaxLines,0)
pos = 0
count = 0
While 1
   index = BinaryIndexEx( buffer, pos, @CRLF, @FWDSCAN,0)
   if index == -1 then break
   Count = Count + 1
   item = Abs(huge_Multiply (Count, Result))
   astatusbar(1,"Item Analysis","Reading item %Count% of %LogLines01%",MaxLines,item)
   linelen = index-pos
   linedata=BinaryPeekStr(buffer, pos, linelen)
   pos = index + 2
   eod = BinaryEodGet(buffer)-1
   if pos > eod then break
EndWhile

BinaryFree(buffer)

EndTime = TimeYMDHMS()
TotalTime = TimeDiff (EndTime, StartTime)
astatusbar(2,"Item Analysis","Done",MaxLines,item)
Message ("Binary Read - STUDIO","Total lines processed = %Count%%@CRLF%Last line = %linedata%%@CRLF%%@CRLF%Elapsed time = %TotalTime%")
;Message ("Binary Read - COMPILED","Total lines processed = %Count%%@CRLF%Last line = %linedata%%@CRLF%%@CRLF%Elapsed time = %TotalTime%")
Exit
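For comparison, here is a leaner version of the same binary scan. It is a sketch, not taken from either script above: it fetches the end-of-data offset once instead of calling BinaryEodGet on every pass, makes no astatusbar call inside the loop, and also picks up a last line that has no trailing CRLF. Like the original, it assumes CRLF line endings; the comment marks where the per-line work would go.

```winbatch
; Sketch: binary scan with the per-line overhead removed (assumes CRLF line endings)
FileName = "Test Log 65000.txt"
fsize = FileSize(FileName)
buffer = BinaryAlloc(fsize)
If buffer == 0 Then
   Message("Error", "Binary Allocation Failed")
   Exit
EndIf
BinaryRead(buffer, FileName)
eod = BinaryEodGet(buffer)          ; end-of-data offset, fetched once outside the loop

StartTime = TimeYMDHMS()
pos = 0
Count = 0
linedata = ""
While pos < eod
   index = BinaryIndexEx(buffer, pos, @CRLF, @FWDSCAN, 0)
   If index == -1 Then index = eod  ; last line has no trailing CRLF
   linedata = BinaryPeekStr(buffer, pos, index - pos)
   Count = Count + 1

   ; Perform work on linedata here

   pos = index + 2                  ; step past the CRLF
EndWhile
BinaryFree(buffer)

Secs = TimeDiffSecs(TimeYMDHMS(), StartTime)
Message("Binary Read", "Total lines processed = %Count%%@CRLF%Last line = %linedata%%@CRLF%Elapsed seconds = %Secs%")
Exit
```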


Thank you for helping me better understand these processes!

JTaylor

While there are things I would do differently, I don't see an issue with your FileRead option. It takes less than a second to run. If there is a slowdown, it must be in the part you chopped out.
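One way to confirm where the time goes is to time the bare read loop with nothing else in it. The minimal sketch below reads every line with FileRead and counts it, with no per-line work and no progress display, so the elapsed time reflects the reading alone. If it finishes in about a second while the full script takes minutes, the cost is in the per-line work or the status bar updates, not in FileRead. Note that TimeDiffSecs reports whole seconds.

```winbatch
; Sketch: time a bare FileRead loop (no work, no progress display)
FileName = "Test Log 65000.txt"
StartTime = TimeYMDHMS()
File = FileOpen(FileName, "READ")
Count = 0
While 1
   line = FileRead(File)
   If line == "*EOF*" Then Break   ; FileRead returns *EOF* at end of file
   Count = Count + 1
EndWhile
FileClose(File)
Secs = TimeDiffSecs(TimeYMDHMS(), StartTime)
Message("Bare FileRead loop", "Lines read = %Count%%@CRLF%Elapsed seconds = %Secs%")
Exit
```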

What is with the MaxLines stuff?  Is that to get around a limitation with the ShellOp Extender?   Why not just display the progress on a WinBatch window or dialog and skip all that?
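If MaxLines and huge_Divide exist only to keep the astatusbar position inside 0..65535, plain floating-point arithmetic can do that scaling without the Huge Math extender. Adding 0.0 forces a floating-point division (integer division of Count by the line count would give 0), and Int() converts the result back to an integer. The sketch below makes those assumptions. It also reads with a While loop until FileRead returns "*EOF*", so no For loop counter gets reassigned inside the loop the way `item` is in the original, and it redraws the bar only every 500 lines. That interval (UpdateEvery) is arbitrary and chosen only for illustration.

```winbatch
; Sketch: FileRead loop with throttled, float-scaled astatusbar updates
AddExtender("wwsop34i.dll")            ; ShellOp extender (provides astatusbar)

FileName = "Test Log 65000.txt"
MaxLines = 65535                       ; astatusbar range limit discussed in the thread
UpdateEvery = 500                      ; hypothetical interval: redraw every 500 lines

; Count lines once (CR characters), as the original scripts do
FS = FileSize(FileName)
BinBuf = BinaryAlloc(FS + 1000)
If BinBuf == 0 Then
   Message("Error", "Binary Allocation Failed")
   Exit
EndIf
BinaryRead(BinBuf, FileName)
LogLines01 = BinaryStrCnt(BinBuf, 0, FS - 1, @CR)
BinaryFree(BinBuf)
If LogLines01 < 1 Then LogLines01 = 1  ; guard against dividing by zero

astatusbar(0, "Item Analysis", "Reading Items", MaxLines, 0)
File = FileOpen(FileName, "READ")
Count = 0
While 1
   line = FileRead(File)
   If line == "*EOF*" Then Break
   Count = Count + 1

   ; Perform work on line here

   If Count MOD UpdateEvery == 0 Then
      ; (Count + 0.0) forces floating-point division; integer division would give 0
      item = Int(((Count + 0.0) / LogLines01) * MaxLines)
      If item > MaxLines Then item = MaxLines
      astatusbar(1, "Item Analysis", "Reading item %Count% of %LogLines01%", MaxLines, item)
   EndIf
EndWhile
FileClose(File)
astatusbar(2, "Item Analysis", "Done", MaxLines, MaxLines)
Message("File Read", "Total lines processed = %Count%")
Exit
```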

Jim

stanl

Quote from: jmburton2001 on February 22, 2019, 05:40:36 PM
Windows 7 Pro 64
Intel i3-3220 @3.3GHz (no OC)
8GB RAM

My primary use for Winbatch is to create tools that allow me to look at stuff in logfiles that I'm interested in.

LogParser. It is still, IMHO, the greatest thing Microsoft ever released for free.

JTaylor

If you want to drop the extenders, perhaps something like the following? I am assuming they are only there for the status bar, so you have a visual cue? The MOD checks are purely for performance: redrawing is resource intensive, so box 1's drawing data is cleared only every 50 lines and the caption and progress bar are updated only every 10. Again, this doesn't solve your problem, assuming there is one, because the problem isn't in this part of the code.

Jim

Code (winbatch)
WinTitle("","Item Analysis")
; Count total lines
FileName = "Test Log 65000.txt"
FS=FileSize(FileName)
BinBuf = BinaryAlloc(FS + 1000)
if BinBuf == 0 then
   Message("Error", "Binary Allocation Failed")
   exit
end if
BinaryRead(BinBuf, FileName)
lcnt = BinaryStrCnt(BinBuf, 0, FS - 1, @CR)
BinaryFree(BinBuf)

;===== FILE READ ==================================

File = FileOpen(FileName,"READ")
Count = 1

BoxesUp("300,300,500,400", @NORMAL)
  BoxColor(1,"0,0,128",7)
  BoxDrawRect(1,"0,0,1000,1000",2)

BoxNew(3,"50,250,950,500",0)
  BoxColor(3,"0,0,255",0)

BoxDataTag(1,"ACORN")
BoxDataTag(3,"WALNUT")


line = ""
While line != "*EOF*"

  If count MOD 50 == 0 Then
    BoxDataClear(1,"ACORN")
  EndIf
  line = FileRead(File)
  Box_Text = "Item %Count% of %lcnt%"

  ; Perform work

  Count += 1
  If count MOD 10 == 0 Then
    BoxDataClear(3,"WALNUT")
    b3_e = ItemExtract(1,((Count+0.00)/lcnt)*1000,".")+0
    BoxCaption (1, "Item Analysis - ":box_text)
    BoxDrawRect(3,"0,0,%b3_e%,1000",2)
  EndIf

; If IsKeyDown(@CTRL+@SHIFT) Then Break

EndWhile

FileClose(File)

Exit



kdmoyers

I've had good luck with using BinaryTag operations to scan big text files:

Code (winbatch)
  f = "G:\COMMON\LOG\BOELOG.LOG" ; 76Mb size, 1 million lines

  fs = filesize(f)          ; size of file
  bb=binaryalloc(fs)        ; reserve space
  binaryreadEX(bb,0,f,0,fs) ; gulp the file

  xx = 0 ; count chars
  yy = 0 ; count lines

  now = timeymdhms()

  struc = binarytaginit(bb,"",@lf) ; set up to scan text lines
  while 1
    struc = binarytagfind(struc)
    if struc == "" then break ; all done
    line = binarytagextr(struc,1)

    ;message('',line)

    xx = xx + strlen(line)
    yy = yy + 1

  endwhile

  now = timediffsecs(timeymdhms(), now)
  message(yy:'lines ':xx:'bytes',now:" seconds") ; 75 seconds on a slow machine

exit
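Because the original goal was to copy only the pertinent log lines into a text file, the same BinaryTag loop can filter as it scans. The sketch below reuses kdmoyers's BinaryTagInit/BinaryTagFind/BinaryTagExtr calls unchanged and uses StrIndexNC for a case-insensitive match. The keyword ("ERROR") and output file name ("Matches.txt") are hypothetical placeholders. It is also worth checking how the loop treats a final line that has no trailing line feed.

```winbatch
; Sketch: scan a log with BinaryTag and write matching lines to a new file
InFile  = "Test Log 65000.txt"
OutFile = "Matches.txt"            ; hypothetical output file
Keyword = "ERROR"                  ; hypothetical search term

fs = FileSize(InFile)
bb = BinaryAlloc(fs)
If bb == 0 Then
   Message("Error", "Binary Allocation Failed")
   Exit
EndIf
BinaryReadEx(bb, 0, InFile, 0, fs) ; read the whole file into memory

out = FileOpen(OutFile, "WRITE")
hits = 0
struc = BinaryTagInit(bb, "", @LF) ; same line-scanning setup as above
While 1
   struc = BinaryTagFind(struc)
   If struc == "" Then Break       ; all done
   line = BinaryTagExtr(struc, 1)
   If StrIndexNC(line, Keyword, 0, @FWDSCAN) > 0 Then
      FileWrite(out, line)         ; keep only the pertinent lines
      hits = hits + 1
   EndIf
EndWhile
FileClose(out)
BinaryFree(bb)
Message("Filter", "%hits% matching lines written to %OutFile%")
Exit
```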

jmburton2001

Good Morning!

Yesterday the "honey-do list" reared its ugly head and ruined my fun! I sincerely appreciate everyone's pointers, examples, and explanations!

Unfortunately real life kicks in today and I probably won't get back to my project for quite some time. I'll be re-reading and studying all the replies to my three recent posts (large strings, astatusbar, and this one) when I get a chance.

Once again... I want to thank everyone for all your assistance!

You're awesome!