Search and replace within a file

Started by LouInova, December 08, 2019, 11:41:55 AM

LouInova

Hi,

I'm trying to parse the content of some configuration files and target configuration item values within the file.  What started out looking like such a simple thing has me a little stumped.

The target files are structured like INI files.  But they don't have a section header and, therefore, I can't really use the INI functions.

They have line item configuration settings:

ConfigSetting=value
ConfigSetting=value
etc.

Depending on the configuration setting, it can also have sub config and value items listed that are separated by a space.

ConfigSetting=value subsetting1=value subsetting2=value (etc.)

What I'm trying to do is first see if the setting exists (main or sub) and then I need to either add it or read the value and change it if it's not correct.

Since the value could change, there really isn't a static string to search for and replace (using the built-in replace options), so I'm not sure what approach to use.  The only real constants are that each setting and value pair is separated by the = sign, each main config item is on a separate line (cr/lf), and sub items have the space separator.

I've been looking at some of the examples in the Tech DB but there doesn't seem to be anything specific to what I'm looking to do.  I would really appreciate it if someone could point me in the right direction.

Thanks,
Lou

kdmoyers

Well, I have a few minutes, so...

If it were me, I'd first investigate the idea of making a local copy of the file with the all-important leading four bytes: "[","]",@cr,@lf
Being able to leverage the iniXXXXpvt functions is worth the hassle -- they do so much for you.
The modified file could be copied back to the original location.

Adding the bytes to the front is easy: make a binarybuffer, binarypoke the four bytes in, use binaryread to suck in the rest of the file behind it, then write the whole thing out.

Parsing the complex values is just some itemextract stuff.  Detailed work, but very doable.

(I'd use Regular Expressions (the chain saw of text operations) for the parsing, but that's a bit of a learning curve.)

How fast does this have to happen? Once per minute? 100 times a second? Once per day?  If it has to be super fast, then this gets trickier.

Also, is there a multi-user contention angle? Do you have to worry about the file being locked? You'll have to add extra code to handle that.

Finally: watch out for unicode. If the file is unicode, binary operations are slightly more tricky.
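
To give a rough idea of the itemextract part, here is a minimal sketch. It assumes the value string looks like the "value subsetting1=value subsetting2=value" layout described in the question; the sample string and the sub-setting name are made up for illustration.

```winbatch
; Hypothetical value string, as it might come back for one config line
strValue  = "value subsetting1=abc subsetting2=xyz"
strWanted = "subsetting2"      ; hypothetical sub-setting to look up
strFound  = ""

; Item 1 is the main value; items 2..n are name=value pairs
nItems = ItemCount(strValue, " ")
For i = 2 To nItems
   strPair = ItemExtract(i, strValue, " ")
   ; Compare the name part without regard to case
   If StriCmp(ItemExtract(1, strPair, "="), strWanted) == 0
      strFound = ItemExtract(2, strPair, "=")
      Break
   EndIf
Next

; strFound stays empty when the sub-setting is not present
Message(strWanted, strFound)
```

Note that two spaces in a row would show up as an empty item, so real files may need a little cleanup first.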

$0.02
-Kirby

The mind is everything; What you think, you become.

td

Perhaps I am missing something but it seems that this task could be done in a more or less brute force manner.  Use the FileOpen/FileRead functions to extract each line, StrIndex to check for the setting name, a couple of ItemExtract function calls to check the value or values, and finally, ItemReplace to set a new value or values as needed.
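
A rough sketch of that brute force loop, for illustration only. The paths, the setting name and the new value are all hypothetical, and it assumes an ANSI text file where the main setting is the first item on its line.

```winbatch
; Hypothetical paths and values for illustration only
infile  = "C:\Temp\in.conf"
outfile = "C:\Temp\out.conf"
strKey  = "ConfigSetting"
strWant = "newvalue"
bFound  = @FALSE

hIn  = FileOpen(infile, "READ")
hOut = FileOpen(outfile, "WRITE")
While @TRUE
   line = FileRead(hIn)
   If line == "*EOF*" Then Break
   ; A match at position 1 means the line starts with "ConfigSetting="
   If StrIndexNc(line, StrCat(strKey, "="), 0, @FWDSCAN) == 1
      bFound = @TRUE
      ; First space-delimited item is the main "name=value" pair
      strMain = ItemExtract(1, line, " ")
      If StriCmp(ItemExtract(2, strMain, "="), strWant) != 0
         ; Swap in the corrected pair, leaving any sub-settings alone
         line = ItemReplace(StrCat(strKey, "=", strWant), 1, line, " ")
      EndIf
   EndIf
   FileWrite(hOut, line)
EndWhile
; Setting was not present anywhere: add it at the end
If !bFound Then FileWrite(hOut, StrCat(strKey, "=", strWant))
FileClose(hIn)
FileClose(hOut)
```

Writing to a second file and copying it back afterwards keeps the original intact if something goes wrong partway through.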

Of course, consideration should also be given to Kirby's comments about speed and concurrency.   
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

LouInova

Thanks for the replies.

Oddly enough, Kirby's suggestion about adding in a section header and making it manageable through the INI functions is where I was headed until I decided to ask the question here to see if there was something else that I might be missing.

The files are small but there are a lot of them (~3k).  The updates don't happen often (once every couple of months or so) and I don't need to worry about usage lock.

Since there are so many, having to parse through line-by-line would slow the process.  So, I was more inclined to use the binary functions.  But, with that, and not converting them to a standard INI format, I was getting all tangled up in how to best single out and act on the target lines out of the buffer.

Another thought here was to come up with automation that I could delegate out instead of being stuck doing it myself.  Right now, I'm the one making the file changes...but, ideally, this should be farmed out to those that need the change made.

I guess, unless someone has a more elegant approach, I'll go ahead and use the INI method.

td

Performance is relative to the environment.  For example, my workstation has SSD drives and 6 X 2 cores.  Processing a few thousand small files one line at a time would be hard to notice, but mine is likely not a common execution environment.  So converting to an ini file allows you to perform more of the work in native code with the resulting performance gain.  As Kirby suggested, using regular expressions is likely the fastest approach.  WinBatch supports regular expressions through CLR hosting.

https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expressions

https://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+WinBatch/dotNet/System_Text/RegularExpressions+RegEx.txt
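
As a starting point, a regular expression replace through CLR hosting might look something like the sketch below. Treat it as untested: the paths, the setting name "Privilege" and the new value "All" are assumptions for illustration, and it assumes WinBatch resolves the two-string overload of the .NET Regex.Replace method.

```winbatch
; Hypothetical paths and values for illustration only
infile  = "C:\Temp\in.conf"
outfile = "C:\Temp\out.conf"

; Make the System assembly (home of the Regex class) available
ObjectClrOption("useany", "System")

strText = FileGet(infile)

; (?m) makes ^ match at the start of every line;
; \S* stops at the first space or line break, so sub-settings are untouched
objRegex = ObjectClrNew("System.Text.RegularExpressions.Regex", "(?m)^(Privilege=)\S*")
strText  = objRegex.Replace(strText, "${1}All")

FilePut(outfile, strText)
```

This only changes a setting that already exists; adding a missing setting would still need a separate check.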

td

One way to speed up the process, no matter which search-and-replace method you use, is to take the divide-and-conquer approach.  You could set up a single script that splits the list of files into 2 to 4 lists and restarts itself 2 to 4 times, with each instance processing one of the sublists.

LouInova

Sorry for the response delay and thanks for the additional pointers and suggestions.

I've pretty much settled on the INI method but am having some issues adding the section header.

I'm reading the file into a binary buffer and then I'm trying to add the dummy header and a @crlf.  My expectation would be that it would add the header/crlf combo and push everything else down one line.  Instead, it seems to be taking a chunk of the existing first line out.  I don't work with the binary functions too often...not sure what I'm doing wrong?

Here is what the beginning of each file looks like - there is a commented out area that defines the configuration section:

;*************************************************************
;*                         General 1                          *
;*************************************************************

Privilege=None Lockdown=yes EnableCancel=yes


Here is what it looks like after I add the header section:

[PlaceHolder]
***********************************************
;*                         General 1                          *
;*************************************************************

Privilege=None Lockdown=yes EnableCancel=yes


Seems to be taking a chunk out of the line instead of adding in?

Here is what I'm using:

infile = "C:\Temp\file1.txt"
procfile = "C:\Temp\procfile1.txt"
sect = "[PlaceHolder]"
sectin = StrCat (sect, @crlf)

fs1 = FileSize (infile)
buf1 = BinaryAlloc (fs1+100)
binaryread (buf1, infile)
binarypokestr (buf1, 0, sectin)
binarywrite (buf1, procfile)

binaryfree (buf1)



td

You are losing a chunk because that is what you are telling BinaryPokeStr to do: it overwrites the bytes at the given offset rather than inserting in front of them.  One fix uses 2 buffers, where the first buffer holds the file.  The second buffer would need to be larger so that you could first poke the faux section header with crlf and then poke the file contents after it.  Or you could use one buffer: poke the section header in first, then use the BinaryReadEx function to read the file into that same buffer just after the end of the header.
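
A minimal sketch of the single-buffer variant, reusing the file names from your post. It assumes the file is ANSI, so that StrLen gives the header size in bytes; the BinaryEodSet call is just a precaution to make sure the whole buffer gets written.

```winbatch
infile   = "C:\Temp\file1.txt"
procfile = "C:\Temp\procfile1.txt"
sectin   = StrCat("[PlaceHolder]", @CRLF)
hdrlen   = StrLen(sectin)          ; header size in bytes (ANSI assumed)

fs1  = FileSize(infile)
buf1 = BinaryAlloc(fs1 + hdrlen)

; Header goes in first, at the very start of the buffer
BinaryPokeStr(buf1, 0, sectin)

; File contents land right behind the header instead of under it
BinaryReadEx(buf1, hdrlen, infile, 0, fs1)

; Mark the end of data explicitly before writing
BinaryEodSet(buf1, hdrlen + fs1)
BinaryWrite(buf1, procfile)

BinaryFree(buf1)
```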

td

A simple example:
Code (winbatch):
strFile='c:\temp\dummy.conf'

strCon = FileGet(strFile)
strCon = '[faux section]':@crlf:strCon
strFile = FilePath(strFile):FileRoot(strFile):'.ini'
FilePut(strFile, strCon)
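
From there, the rest of the round trip could be sketched roughly as follows. The key name comes from the sample file posted earlier in this thread; the desired value "All" is hypothetical, and the last part assumes the header is still the first line of the file after the INI functions have written to it.

```winbatch
strIni  = 'c:\temp\dummy.ini'     ; file produced by the example above
strConf = 'c:\temp\dummy.conf'
strSect = 'faux section'          ; header text without the brackets
strKey  = 'Privilege'
strWant = 'All'                   ; hypothetical desired main value

; IniReadPvt returns everything after the "=", sub-settings included
strVal = IniReadPvt(strSect, strKey, '', strIni)
If strVal == ''
   ; Key missing (or empty): add it
   IniWritePvt(strSect, strKey, strWant, strIni)
Else
   If StriCmp(ItemExtract(1, strVal, ' '), strWant) != 0
      ; Replace only the main value and keep the sub-settings
      strVal = ItemReplace(strWant, 1, strVal, ' ')
      IniWritePvt(strSect, strKey, strVal, strIni)
   EndIf
EndIf

; Strip the faux header again before copying back
strCon = FileGet(strIni)
strHdr = '[faux section]':@crlf
nHdr   = StrLen(strHdr)
If StrSub(strCon, 1, nHdr) == strHdr Then strCon = StrSub(strCon, nHdr + 1, StrLen(strCon) - nHdr)
FilePut(strConf, strCon)
```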


LouInova

Appreciate the additional info...thanks!  I should be good to go now.