WinBatch® Technical Support Forum

All Things WinBatch => WinBatch => Topic started by: NateT on June 19, 2015, 12:36:42 PM

Title: Remove NULL char from entire files
Post by: NateT on June 19, 2015, 12:36:42 PM
I'm trying to find the best way to strip all NULL characters out of a file.  FileGet has an option that does this, so I simply went with:

FilePut(param1, FileGet(param1,""))

I just wanted to see if there were any arguments for using the Binary functions or any other ideas.  My testing accomplished what I wanted and it is very fast even on a 10 MB file.  I'm just questioning if it is really that easy or if there are caveats to doing it this way.
Title: Re: Remove NULL char from entire files
Post by: td on June 19, 2015, 02:00:36 PM
Count your blessings.
Title: Re: Remove NULL char from entire files
Post by: kdmoyers on June 22, 2015, 12:43:47 PM
Quote from: td on June 19, 2015, 02:00:36 PM
Count your blessings.

I think (might be wrong) that is Tony's short-winded way of saying:

Nope -- no caveats if it works.  You are fortunate that your files are that small, and that you are doing this in 2015 and not 2000, when the memory situation in most PCs was very different.  (Heck, I don't think FilePut was available in 2000).  When the files fit handily in memory, that method is safe and fast.

If your files get into the gigabytes, there are other less tidy options.
Title: Re: Remove NULL char from entire files
Post by: td on June 22, 2015, 09:50:24 PM
If memory serves, FileGet will run out of interpreter string space at somewhere around the 80-90 MB file size mark.
Title: Re: Remove NULL char from entire files
Post by: NateT on June 23, 2015, 06:10:58 AM
Quote from: td on June 22, 2015, 09:50:24 PM
If memory serves, FileGet will run out of interpreter string space at somewhere around the 80-90 MB file size mark.

Thanks.  I'll watch the file sizes.
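
Watching the file size could be as simple as a guard in front of the one-liner. The following is only a sketch: the 80,000,000-byte threshold is an assumption taken from the low end of the 80-90 MB estimate above, and the file path is hypothetical.

```winbatch
; Sketch only. nLimit is an assumed ceiling based on the 80-90 MB estimate
; quoted above; the real limit depends on the interpreter's string space.
strFile = "C:\Temp\FileWithNuls.txt"   ; hypothetical path
nLimit  = 80000000                     ; assumed safe ceiling for FileGet, in bytes

nSize = FileSize(strFile)
If nSize < nLimit
   ; The second FileGet parameter is the replacement for NULs; "" removes them.
   FilePut(strFile, FileGet(strFile, ""))
Else
   Message("Remove NULs", "File is %nSize% bytes; use a Binary buffer approach instead.")
EndIf
```

Anything that falls into the Else branch would need one of the Binary function approaches instead.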
Title: Re: Remove NULL char from entire files
Post by: DAG_P6 on June 29, 2015, 08:55:55 AM
Bigger files can be dispatched with a BinaryRead loop.
Title: Re: Remove NULL char from entire files
Post by: td on June 29, 2015, 10:17:27 AM
Slightly modified help file example from the 'BinaryReplace' function topic:
Code (winbatch):
; Should be good for files up to a little over 250 MB, depending on the execution environment.
str = ""  ; Search for NULs.
rep = ""  ; Replace with "nothing", i.e. delete them.
strFile = "C:\Temp\FileWithNuls.txt"
nFs   = FileSize( strFile )
hBuf  = BinaryAlloc( nFs + 100 )       ; Whole file plus a little slack.
ret   = BinaryRead( hBuf, strFile )
nReps = BinaryReplace( hBuf, str, rep, 0 )
Message( "Number of NUL characters replaced", nReps )
;;strFile = "C:\Temp\FileNoNuls.txt"   ; Uncomment to write to a separate file.
BinaryWrite( hBuf, strFile )           ; Otherwise the input file is overwritten.
BinaryFree( hBuf )
Title: Re: Remove NULL char from entire files
Post by: td on June 29, 2015, 01:07:27 PM
Here's another version that is much slower but should handle files of almost any size, assuming enough spare disk space is available.
Code (winbatch):
;; Lightly tested!  Use at your own risk!
AddExtender("WWHUG44I.DLL", 0, "WWHUG64I.DLL")  ; Huge Math extender, for offsets beyond 2 GB

str        = ""   ; Search for NULs.
rep        = ""   ; Replace with "nothing".
Offset     = "0"  ; Read position in the input file (huge number string)
OutOffset  = "0"  ; Write position in the output file (huge number string)
RepsTotal  = "0"
strFileOut = "C:\Temp\FileNoNuls.txt"
strFileIn  = "C:\Temp\FileWithNuls.txt"

; Could do file management task here like deleting an existing out file

Fs = FileSize( strFileIn, 1 )  ; Format 1 returns the size as a huge number string
if StrSub(huge_Subtract (Fs, "100000000"),1,1) !="-" then BufSize = 100000000 ;  Arbitrary binary buffer size
else BufSize = Fs
hBuf = BinaryAlloc( BufSize+100 )
; Loop while Offset < Fs, i.e. while (Offset - Fs) is negative, so that no
; extra zero-byte pass runs once Offset reaches the end of the file.
while StrSub(huge_Subtract (Offset, Fs),1,1) == "-"
   nRead = BinaryReadEx( hBuf, 0, strFileIn, Offset, BufSize)
   BinaryEodSet( hBuf, nRead )  ; Scan only the bytes just read, not leftovers from the previous chunk
   nReps = BinaryReplace( hBuf, str, rep ,0)
   nWritten  = BinaryWriteEx(hBuf,0, strFileOut, OutOffset, nRead - nReps)
   OutOffset = huge_Add(OutOffset, nWritten)
   RepsTotal = huge_Add(RepsTotal,nReps)
   Offset    = huge_Add(Offset,BufSize)
endwhile

; Could perform file delete and rename here.

Message( "Number of NUL characters replaced", RepsTotal )
BinaryFree( hBuf)
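
The two "file management" comments in the script above could be filled in roughly as follows. This is a sketch, not part of the posted code: it assumes BinaryWriteEx writes in place without truncating an existing output file (worth confirming in the help file), which is why a leftover output file is removed first. The paths are the ones used in the script.

```winbatch
; Sketch only: possible bodies for the two "file management" comments above.
strFileIn  = "C:\Temp\FileWithNuls.txt"
strFileOut = "C:\Temp\FileNoNuls.txt"

; Before the read/write loop: remove a leftover output file so that stale
; bytes from an earlier, longer run cannot survive past the new data.
If FileExist(strFileOut) Then FileDelete(strFileOut)

; ... the BinaryReadEx / BinaryReplace / BinaryWriteEx loop would run here ...

; After the loop: swap the cleaned copy into place, but only if it was written.
If FileExist(strFileOut)
   FileDelete(strFileIn)
   FileRename(strFileOut, strFileIn)
EndIf
```

Run on its own, the snippet only deletes an existing output file; the swap at the end does nothing unless the loop has produced a new one.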
Title: Re: Remove NULL char from entire files
Post by: kdmoyers on June 30, 2015, 05:58:13 AM
[Nice thread for the tech database]
Title: Re: Remove NULL char from entire files
Post by: DAG_P6 on July 13, 2015, 02:06:13 PM
Though it's been a few years since I did so, I once ran a series of benchmarks in which I used buffers of various sizes to perform sequential read and write operations on large files. Somewhat to my surprise, given the then-current sizes of disk sectors and cylinders, I found that there was a sweet spot around 8192 bytes, beyond which I saw very little gain in performance. The fact that I was using synchronous file I/O may have had some effect on the outcome, but I suspect not: the application was already I/O bound, with nothing else to keep it occupied while it waited for I/O operations to complete.

The reason this result surprised me was that when I wrote code for IBM mainframe computers, the sweet spot was almost always the number of bytes that fit onto a track (one full circle around one side of one platter). Even then, track sizes varied significantly, but the sweet spot was usually much higher than 8192 bytes. Conversely, when the output destination was a 9 track tape, the sweet spot was more like 8 KB, because if your blocks were much bigger than that, you spent a lot of time waiting for the tape drive to skip past bad spots in the tape, since it had to lay down the whole block in one contiguous run of usable tape. The other disadvantage of writing excessively large blocks onto a tape was also the result of those bad spots: you frequently got significantly less data onto one reel. In the worst cases, this meant that your job went into a holding pattern while a new tape was mounted and spun up.
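
A rough way to look for a buffer-size sweet spot in WinBatch itself is to time sequential reads of one file with several buffer sizes. The sketch below is an illustration only, not the benchmark described above: the candidate sizes and the test file path are assumptions, the file is assumed to be under 2 GB so plain integer offsets work, and interpreter overhead per BinaryReadEx call will probably dominate the small sizes.

```winbatch
; Sketch only: time sequential reads of one file with several buffer sizes.
strFile = "C:\Temp\FileWithNuls.txt"   ; hypothetical test file, assumed under 2 GB
nFs     = FileSize(strFile)
sizes   = "4096 8192 65536 1048576"    ; assumed candidate buffer sizes, in bytes
report  = ""

For i = 1 To ItemCount(sizes, " ")
   nBuf   = ItemExtract(i, sizes, " ") + 0   ; + 0 forces an integer
   hBuf   = BinaryAlloc(nBuf)
   nStart = GetTickCount()
   nOff   = 0
   While nOff < nFs
      nRead = BinaryReadEx(hBuf, 0, strFile, nOff, nBuf)
      If nRead <= 0 Then Break               ; stop if nothing more was read
      nOff = nOff + nRead
   EndWhile
   nMs = GetTickCount() - nStart
   BinaryFree(hBuf)
   report = StrCat(report, nBuf, " bytes: ", nMs, " ms", @CRLF)
Next

Message("Sequential read timings", report)
```

The operating system's file cache will favor whichever size runs after the file has already been read once, so the figures are only meaningful if each size is tried several times and in varying order.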
Title: Re: Remove NULL char from entire files
Post by: kdmoyers on July 14, 2015, 01:09:04 PM
[complicated joke comparing block size of 9 track mag tape and shoe box full of punch cards omitted]
Boy, I feel old.
-K
Title: Re: Remove NULL char from entire files
Post by: td on July 14, 2015, 02:56:19 PM
Saw a fellow student trip while walking down the stairs of one of the science buildings with a shoebox under his arm.  It was wet and windy....
Title: Re: Remove NULL char from entire files
Post by: JTaylor on July 14, 2015, 04:41:28 PM
Did the rain mask the tears?

Jim
Title: Re: Remove NULL char from entire files
Post by: kdmoyers on July 24, 2015, 04:10:13 AM
Speaking of the old days, here's an interesting article about how ghosts of the old days persist into the present: skeuomorphs
(it's from a SQL related forum, so there is an SQL tilt to the text)

https://www.simple-talk.com/opinion/opinion-pieces/sql-style-habits-attack-of-the-skeuomorphs/
Title: Re: Remove NULL char from entire files
Post by: stanl on July 24, 2015, 04:55:06 AM
Quote from: kdmoyers on July 24, 2015, 04:10:13 AM
Speaking of the old days, here's an interesting article

skeuomorphism  -  love it.

then there is meuomorphism  -  "my way or the highway"
Title: Re: Remove NULL char from entire files
Post by: JTaylor on July 24, 2015, 05:21:46 AM
Great article.

Jim