Reading *EOF*

Started by oradba4u, August 01, 2015, 05:14:39 PM


oradba4u

All:

(Using Winbatch 2008c)

Is there a way to quickly read how many lines an ASCII text file has?
I want to be able to do this so I can dynamically dimension arrays, set countdown counters, etc. and do this quickly.
Some of these files can contain millions of lines.

Any ideas?

As always, thanks in advance

JTaylor

Two approaches come readily to mind...

   FileGet()/ItemCount using @CR or @LF as a delimiter (whichever is appropriate).

   BinaryStrCnt().  Searching for above delimiter.


Jim

oradba4u

Here's my code snippet:

F=FileGet("abc.txt")
Num=ItemCount(F,@LF)
MsgTxt=StrCat(Num," Records")
Message("There are:",MsgTxt)
exit

When I run it, I get:
"VMalloc error - VirtualAlloc failed" message.
(File has about 7,000,000 lines)

Any ideas? Thanks

JTaylor

That is why I also suggested the Binary approach.  If the files are VERY large then that would be the better approach.

Jim

JTaylor

If your system lacks the resources to handle the data in that fashion you will probably need to split it into smaller chunks.

Another approach would be to load it into an array and use the ArrInfo() function. Again, this assumes your system can handle the load.

Jim

stanl

Quote from: JTaylor on August 01, 2015, 06:22:40 PM
That is why I also suggested the Binary approach.  If the files are VERY large then that would be the better approach.

Jim

Might also consider the .NET StreamReader through the WB CLR. There are a couple of posts using that in the Tech DB. As far as I know StreamReader has no line-count property of its own, so the count would come from calling ReadLine() in a loop until it returns null.

td

Since the OP is using a 2008 version of WinBatch, he does not have access to CLR hosting. FileGet is limited to a file of somewhere around 80-90 MB max, so the error is not surprising. The suggested binary buffer approach would be good up to around 350 MB file size, depending on the execution environment. If a single binary buffer is too small, then it will be necessary to take the divide-and-conquer approach and process the file in pieces. The topic concerning replacing NULLs in this board has an example that can be adapted to the OP's purposes.

Possibly relevant topic:

http://forum.winbatch.com/index.php?topic=1429.0
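
For what it's worth, the divide-and-conquer idea can be sketched roughly as below. This is not the code from the linked topic. The function name CountLinesChunked, its variable names, and the 16 MB chunk size are hypothetical, and the sketch assumes that lines end in @LF or @CRLF (so counting @LF bytes is enough) and that the file size fits in a WIL integer (under 2 GB).

```winbatch
; Hypothetical helper: count lines by scanning the file one chunk at a time.
; Assumes @LF or @CRLF line endings and a file smaller than 2 GB.
#DefineFunction CountLinesChunked(sFile, nChunk)
   If !FileExist(sFile) Then Return -1
   nSize = FileSize(sFile)
   If nSize == 0 Then Return 0
   hBuf = BinaryAlloc(nChunk)
   nLines = 0
   nOffset = 0
   sLast = ""
   While nOffset < nSize
      ; Never ask for more bytes than remain in the file
      nWant = Min(nChunk, nSize - nOffset)
      nRead = BinaryReadEx(hBuf, 0, sFile, nOffset, nWant)
      If nRead <= 0
         BinaryFree(hBuf)
         Return -1
      EndIf
      ; @LF is a single byte, so it can never straddle a chunk boundary
      nLines = nLines + BinaryStrCnt(hBuf, 0, nRead - 1, @LF)
      ; Remember the last byte seen so far
      sLast = BinaryPeekStr(hBuf, nRead - 1, 1)
      nOffset = nOffset + nRead
   EndWhile
   BinaryFree(hBuf)
   ; Count a final line that has no trailing @LF
   If sLast != @LF Then nLines = nLines + 1
   Return nLines
#EndFunction

nLines = CountLinesChunked("abc.txt", 16777216)
Message("There are:", StrCat(nLines, " Records"))
```

Because only one chunk is in memory at a time, the buffer size stays fixed no matter how large the file is. A smaller chunk just means more read calls.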
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

td

Should have also mentioned that the ArrayFileGet approach is a good solution as long as your file isn't too big to get into memory all at once. The only downside is that the function has a bit more memory and CPU overhead than the binary buffer approach. The obvious upside is that your file is loaded and placed in an array in a single step.
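
A minimal sketch of the ArrayFileGet route, assuming the installed WinBatch version provides ArrayFileGet and that the whole file fits in memory. The file name abc.txt is taken from the earlier snippet; the variable names are illustrative only.

```winbatch
; Load each line of the file into one element of a one-dimension array.
; The second argument ("") asks for NULL bytes in the file to be removed.
aLines = ArrayFileGet("abc.txt", "")
; Request 1 returns the number of elements in dimension 1
nLines = ArrInfo(aLines, 1)
Message("There are:", StrCat(nLines, " Records"))
```

This also leaves the lines available in aLines for later processing, which is the main reason to prefer it over a plain count when memory allows.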

stanl

I just remembered WB works with PHP (even version 2008), and you can harness something as simple as:


<?php
$file = "somefile.txt";
$lines = count(file($file));
echo "There are $lines lines in $file";
?>


smarr

In case this helps:
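#definefunction NbLinesInFile(myfile)
   ; On any error, jump to :wberrorhandler below
   intcontrol (73, 1, 0, 0, 0)
   bb = 0
   if !fileexist(myfile) then return -1
   fs = Filesize(myfile)
   if fs == 0 then return 0
   ; Read the whole file into a binary buffer
   bb = binaryalloc(fs)
   fs = binaryread(bb, myfile)
   ; Count each candidate line terminator
   nblf     = Binarystrcnt(bb, 0, fs - 1, @lf)
   nbcrlf   = Binarystrcnt(bb, 0, fs - 1, @crlf)
   nbcr     = Binarystrcnt(bb, 0, fs - 1, @cr)

   ; The most frequent terminator wins; CRLF is preferred on a tie
   maxrc    = max(nblf, nbcrlf, nbcr)
   if maxrc == nbcrlf
      sep = @crlf
   else
      if maxrc == nblf
         sep = @lf
      else
         sep = @cr
      endif
   endif
   ; Count a final line that has no trailing terminator
   lsep  = strlen(sep)
   lastc = ""
   if fs >= lsep then lastc = Binarypeekstr(bb, fs - lsep, lsep)
   if lastc <> sep then maxrc = maxrc + 1
   binaryfree(bb)
   return maxrc
:wberrorhandler
   ; Release the buffer if it was allocated, then report failure
   if bb <> 0 then binaryfree(bb)
   return -1
#endfunction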