Search-Replace all instances of multiple spaces with a single space

Started by cssyphus, April 09, 2024, 03:12:12 PM

Previous topic - Next topic

cssyphus

Surprised I haven't encountered this before, but I haven't.

I am parsing a CLI terminal export file that uses a random number of spaces between columns, but zero spaces within column data. It would be great to arrayize() each line, using the space(s) as the delimiter.

To do that, I need to S/R all multiple-spaces with a single space, so that columns are exactly one space apart (recall that there are zero spaces in the actual data, so this will work fine).

I am guessing that this problem has been solved before, likely many times, and wondering if anyone has any advice on the best / fastest approach -- or a tried-and-true UDF they could share?

For the curious, here's an example of the input file. I only need the first two columns... the rest is chaff.

------------------------------------------------------------------------------------------------------------------
VLAN  Name                              Status  Reason                  Type        Interfaces
------------------------------------------------------------------------------------------------------------------
1     DEFAULT_VLAN_1                    up      ok                      default     1/1/1-1/1/4,1/1/8-1/1/9,
                                                                                    1/1/11-1/1/15,1/1/17-1/1/19,
                                                                                    1/1/36-1/1/38,1/1/44,
                                                                                    1/1/51-1/1/52,2/1/1-2/1/24,
                                                                                    2/1/27-2/1/28
10    10-SERVER-CDE                     up      ok                      static      1/1/1-1/1/3,1/1/7,1/1/16,
                                                                                    1/1/22,1/1/51-1/1/52,
                                                                                    2/1/1-2/1/24
11    11-VOICE-LEGACY                   up      ok                      static      1/1/1-1/1/3,1/1/7,1/1/16,
                                                                                    1/1/22,1/1/51-1/1/52,
                                                                                    2/1/1-2/1/24
12    VLAN12                            up      ok                      static      1/1/1-1/1/3,2/1/1-2/1/24

JTaylor

If you are open to using a 3rdParty Extender and the columns begin in fixed positions, there is a snCut() function in the wbOmnibus extender that would allow you to chop out the two columns you want and place a delimiter between them.

If not, something like:

rcnt = 10
txt = FileGet(file)
While StrIndex(txt,"  ",......) > 0

  txt = StrReplace(txt,StrFill(" ", rcnt)," ")

  If rcnt > 2 Then rcnt = rcnt -1

EndWhile

FileWrite(newfile,txt)


Jim

spl

Seems to be a fixed length for for data columns. Code below appears to work
file = "C:\temp\vlan.txt" ;your sample copied/pasted to file
file1 =  "C:\temp\vlan_1.txt"
If FileExist(file1) Then FileDelete(file1)
h = FileOpen(file, "READ")
h1 = FileOpen(file1, "WRITE")
While @True
   line = FileRead(h)
   If line == "*EOF*" Then Break
   If strsub(line,1,1)=="-" || strlen(strtrim(strsub(line,1,30)))==0 Then Continue
   fld1 = strtrim(strsub(line,1,6))
   fld2 = strtrim(strsub(line,7,30))
   FileWrite(h1,fld1:" ":fld2)
Endwhile
FileClose(h)
FileClose(h1)
Stan - formerly stanl [ex-Pundit]

cssyphus

Thanks Stan and Jim !

Stan, looking at your solution, that will work perfectly and I'll implement that immediately for this task.

Jim, I've long been looking for a use case to play with your Omnibus extender - I will experiment with that here to see how it works. I've been meaning to kick the tires on wbOmnibus for a long time - much too long.

---

However, I frequently need to S/R random-length strings of multiple spaces with a single space. Typically, I just do this (using periods in place of spaces for visibility):

strReplace(strReplace(strReplace(strReplace(strReplace(_txt, `..`, `.`), `..`, `.`), `..`, `.`), `..`, `.`), `..`, `.`)

It works - but it's slow.

I would think one could use Regular Expressions to solve this, but I haven't a clue how to implement that for a Search/Replace... I've looked at the usual RegEx example in the Tech DB, but it is merely for conducting a search and returning the found text -- how would one turn that into a reusable Search/Replace udf?

Thanks guys!

td

Assuming that I haven't misunderstood your needs, here is a simple regex example that appears to work:
OutFile = '\temp\Cleaned.txt'
if FileExist(OutFile ) then FileDelete(OutFile )

ObjectClrOption('useany','System.Text.RegularExpressions')
objRegEx = ObjectClrNew('System.Text.RegularExpressions.Regex', '123')

strOrig = FileGet(OutFile)
strCleaned = objRegEx.Replace(strOrig, '\s+', " ")
FilePut('\temp\Cleaned.txt', strCleaned)

RunApp('Notepad', OutFile )

If you wish to keep carriage returns, line feeds, and tabs, the following works:

strCleaned = objRegEx.Replace(strOrig, ' +', " ")
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

spl

Nice, but seems it does a fileget() on a file that has been deleted
Stan - formerly stanl [ex-Pundit]

spl

Quote from: cssyphus on April 10, 2024, 05:57:56 AMThanks Stan and Jim !

Stan, looking at your solution, that will work perfectly and I'll implement that immediately for this task.

Jim, I've long been looking for a use case to play with your Omnibus extender - I will experiment with that here to see how it works. I've been meaning to kick the tires on wbOmnibus for a long time - much too long.

Thanks guys!


Glad it worked. If the issue is parsing specific columns from a fixed length file and rewriting them a single-space delimited then I assumed mapping the columns start and length would be necessary. This could be accomplished with a config/ini file, a WB map, or a regex pattern. Parsing the file, assuming it could be huge [i.e. 100-200,000 rows] could be done with WB file functions, or .net [System.IO.StreamReader] / [System.IO.StreaWriter] / [System.Text.StringBuilder] and would be interesting to test performance on a 1gb file. Then, of course there are the lines that should not be parsed into final output. In your case you only wanted to parse 2 columns, but if those circumstances change or you need to parse other columns in other files, then a one of the mappings for all columns would in my opinion be provident.
Stan - formerly stanl [ex-Pundit]

td

Quote from: spl on April 10, 2024, 11:24:45 AMNice, but seems it does a fileget() on a file that has been deleted

You are correct. The script should be
OutFile = '\temp\Cleaned.txt'
if FileExist(OutFile ) then FileDelete(OutFile )

ObjectClrOption('useany','System.Text.RegularExpressions')
objRegEx = ObjectClrNew('System.Text.RegularExpressions.Regex', '123')

strOrig = FileGet('\temp\test.txt')  ; Reminder: check before you post.
strCleaned = objRegEx.Replace(strOrig, ' +', " ")
FilePut(OutFile, strCleaned)

RunApp('Notepad', OutFile )

And thanks for pointing out the blunder.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

spl

Bad weather here, keep losing power on and off. It is quiet now so thought I would contribute one more quick script to the thread - a sample for creating a strsub() mapping for the sample
cols = "VLAN,Name,Status,Reason,Type,Interfaces"
line = "VLAN  Name                              Status  Reason                  Type        Interfaces             "

cnt = ItemCount(cols,",")
llen = Strlen(line)
i=1
k=2
map = ""
While k<=cnt+1
   If k>i
      col = ItemExtract(i,cols,",")
      col2 = ItemExtract(k,cols,",")
      pos = Strindex(line,col,0,@FWDSCAN)
      If k>cnt
         pos2 = llen
      else
         pos2 = Strindex(line,col2,0,@FWDSCAN)
      endif
      map=map:col:"=strtrim(strsub(line,":pos:",":pos2-pos:"))":@CRLF
      i+=1
      k+=1
   Endif
Endwhile

Message("Mapping",map)
Exit

Stan - formerly stanl [ex-Pundit]