Delete multiple substrings within one large string

Started by jmburton2001, February 19, 2019, 06:03:26 PM


jmburton2001

Prologue -> I am not a programmer, just a simple-minded hobbyist. Please be gentle.

Winbatch+Compiler 2019A (all extenders installed and/or updated)
Windows 7 Pro x64

All the examples I've found concern file operations (binary or regular). I don't have a file, I have a large string already available within my script. The multiple substrings I want to delete are in a date pattern followed by an underscore. YYYY:MM:DD_

This is the string:
2017:02:05_ trace2017.02-2018.05.log                @TAB2/5/2017 to 5/5/2018@CRLF2017:02:05_ trace20170816.log                       @TAB2/5/2017 to 8/16/2017@CRLF2017:02:05_ trace20180505.log                       @TAB2/5/2017 to 5/5/2018@CRLF2017:08:16_ trace20171221_001508.log                @TAB8/16/2017 to 12/20/2017@CRLF2017:12:21_ trace20180504.log                       @TAB12/21/2017 to 5/10/2018@CRLF2018:05:05_ trace2018.05-2019.02.log                @TAB5/5/2018 to 2/14/2019@CRLF2018:05:05_ trace20180928.log                       @TAB5/5/2018 to 9/28/2018@CRLF2018:09:29_ trace20190214_001818.log                @TAB9/29/2018 to 2/14/2019@CRLF2019:01:24_ traceSoNic67.log                        @TAB1/24/2019 to 2/7/2019@CRLF2019:02:14_ trace.log                               @TAB2/14/2019 to 2/17/2019

This large string (assigned to a variable) produces this message box. It returns exactly what I want. I use that first column for sorting... but after I've sorted it I'd just like it gone.

Any pointers on how to delete all instances of "YYYY:MM:DD_" from the string would be GREATLY appreciated!

stanl

I think a search/replace with RegEx might work.... maybe  "\d{4}:\d{2}:\d{2}_"

JTaylor

If they are all in the first column, another option is a loop: use @LF as the delimiter to walk through the lines, use a SPACE as the delimiter for the items within each line, remove the first item with ItemRemove(), and rebuild the string as you process it.

Also, for future reference, you can use the BinaryPoke...() functions to place data into a BinaryBuffer whenever you have that need.

Jim
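
A minimal sketch of the loop Jim describes, assuming the lines are separated by @CRLF and the date column is always the first space-delimited item. The variable names (sOut, sLine, nLines) and the shortened sample string are made up for illustration. Splitting on @LF leaves the @CR at the end of each line, so rejoining with @LF restores the original line breaks.

Code (winbatch) Select
; Shortened sample of the string from the first post (illustration only)
sTxt = "2017:02:05_ trace20170816.log":@TAB:"2/5/2017 to 8/16/2017":@CRLF:"2019:02:14_ trace.log":@TAB:"2/14/2019 to 2/17/2019"
sOut = ""
nLines = ItemCount(sTxt, @LF)
For i = 1 To nLines
   sLine = ItemExtract(i, sTxt, @LF)
   ; Drop the first space-delimited item, i.e. the "YYYY:MM:DD_" column
   sLine = ItemRemove(1, sLine, " ")
   If i == 1
      sOut = sLine
   Else
      sOut = sOut : @LF : sLine
   EndIf
Next
Message("After ItemRemove", sOut)
Exit

This only works while the date really is the first item on every line. The regex and wildcard approaches later in the thread do not depend on that.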

jmburton2001

Good Morning stanl,

Thank you so much for pointing me in the right direction!

I found an example in the COM Help File under "How do I locate all matching strings?". It appears to be able to do what I need (with some minor tweaks). Currently the variable "sTxt" contains my long string above.

It uses the following:
Code (winbatch) Select
oRegex = ObjectCreate("VBScript.RegExp")
oRegex.Global = 1
oRegex.Multiline = 1
oRegex.IgnoreCase = 1
retval = "nothing"
sTxt = "MY HUGE STRING"
oRegex.Pattern = "\d{4}:\d{2}:\d{2}_"
If oRegex.Test(sTxt)
   retval = ""
   objMatches = oRegex.Execute(sTxt)
   ForEach objMatch In objMatches
      ;Each item in the SubMatches collection is the string found and captured by the regular expression.
      ForEach objSubMatch In objMatch.SubMatches
         If retval == "" Then  retval = objSubMatch
         Else retval = retval : @TAB : objSubMatch
      Next
   Next
EndIf
AskItemlist("RegExp - Matches", retval, @TAB, @UNSORTED, @SINGLE )
oRegex = 0 ;Close object
Exit


The only things I've changed in the example are inserting my string in "sTxt"  and your expression in "oRegex.Pattern". I also tried "\d\d\d\d:\d\d:\d\d_" as the pattern.

In both cases AskItemList returns an empty box...

Questions:

  • Do I need to add an extender to make this work?
  • Is VBScript available by default in Win7 Pro and how can I tell if I have access to it?
Fun time is over for now! Gotta get to my real work for awhile...

I just noticed that JTaylor responded too. Thank you for that! I'll look into your suggestions as well.

Thank you both!
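
A note on the empty box: the most likely cause is that the help-file example lists SubMatches, which only hold text captured by parenthesized groups. The pattern "\d{4}:\d{2}:\d{2}_" has no parentheses, so every SubMatches collection is empty and retval never gets filled. A minimal sketch, assuming the goal is just to list what matched, reads the Value property of each match instead. The sample string is shortened for illustration. No extender should be needed, since VBScript.RegExp is a COM object that ships with Windows.

Code (winbatch) Select
oRegex = ObjectCreate("VBScript.RegExp")
oRegex.Global = 1
oRegex.Pattern = "\d{4}:\d{2}:\d{2}_"
; Shortened sample of the big string (illustration only)
sTxt = "2017:02:05_ trace20170816.log":@CRLF:"2019:02:14_ trace.log"
retval = ""
objMatches = oRegex.Execute(sTxt)
ForEach objMatch In objMatches
   ; Value is the whole text matched by the pattern; no capture groups required
   If retval == ""
      retval = objMatch.Value
   Else
      retval = retval : @TAB : objMatch.Value
   EndIf
Next
AskItemlist("RegExp - Matches", retval, @TAB, @UNSORTED, @SINGLE)
oRegex = 0 ;Close object
Exit

Alternatively, wrapping the pattern in parentheses, "(\d{4}:\d{2}:\d{2}_)", should make the original SubMatches loop return the same text.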

stanl

I was thinking about something like

Code (WINBATCH) Select

....... [your other code]
ForEach objMatch In objMatches
   result = oRegex.Replace(sTxt,oRegEx.Pattern,"")
Next



[MODIFY]. 

and you might try the pattern using .NET as the regex is more powerful than VbScript and WB's CLR can use it. see
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+WinBatch/dotNet/System_Text/RegularExpressions+RegEx.txt

jmburton2001

Hi stanl,

I incorporated your solution, and when it went to execute the "result = " line I received a "1266 COM: Bad parameter count" error. So I went to Microsoft's documentation and found that the Regex.Replace parameters are (input, pattern, replacement). That's exactly what we have! I checked everything carefully and tried, and tried, and tried...

As I stepped into it I verified that the input and pattern were properly populated in the variables we used, and the replacement was simply "" null. I never could figure out how there was a bad parameter count. Since I'm not a programmer I was getting too far out of my element, so I moved on...

Hi JTaylor!

I was reading through the "Binary Operations" section of the help files and realized that it wouldn't allow me to use a wildcard in the BinaryReplace function, which would have been perfect. Since every instance that needed to be replaced with a "null" character was in the exact pattern of YYYY:MM:DD_, the easiest way would be to search for the pattern using wildcards. So I started Googling...

I happened across a website by Detlev Dalitz.

The UDF at that link took in my string along with this wildcard ????:??:??_  (UGH! Every way that I try to put 3 question marks together produces an emoji)

and produced this output. It works as I needed it to.

Since I'm so uneducated about how this all works, I have an idea in my head of what I want to accomplish. This is simply one more "bite of the elephant" in the unimportant project I'm working on.

Thank you both so much for taking the time to point me toward solutions I never would have found on my own!

stanl

Darn..... it seems only 2 parms are needed in replace [I probably confused the vbscript replace() function with the regex replace method].  I put this together with a section of your huge string and it seemed to work


Code (WINBATCH) Select


oRegex = ObjectCreate("VBScript.RegExp")
oRegex.Global = 1
oRegex.Multiline = 1
oRegex.IgnoreCase = 1
sTxt = "2017:02:05_ trace2017.02-2018.05.log":@TAB:"2/5/2017 to 5/5/2018":@CRLF:"2017:02:05_ trace20170816.log":@TAB:"2/5/2017 to 8/16/2017":@CRLF:"2017:02:05_ trace20180505.log":@TAB:"2/5/2017 to 5/5/2018":@CRLF:"2017:08:16_ trace20171221_001508.log":@TAB:"8/16/2017 to 12/20/2017":@CRLF:"2017:12:21_ trace20180504.log":@TAB:"12/21/2017 to 5/10/2018":@CRLF:"2018:05:05_ trace2018.05-2019.02.log":@TAB:"5/5/2018 to 2/14/2019":@CRLF:"2018:05:05_ trace20180928.log":@TAB:"5/5/2018 to 9/28/2018":@CRLF:"2018:09:29_ trace20190214_001818.log"
oRegex.Pattern = "\d{4}:\d{2}:\d{2}_"
result = sTxt ;Fall back to the original text if nothing matches
If oRegex.Test(sTxt)
   ;The VBScript.RegExp Replace method takes two parameters: (input, replacement)
   result = oRegex.Replace(sTxt,"")
EndIf
Message("",result)
oRegex = 0 ;Close object
Exit

kdmoyers

Regex is definitely the way to go here, and well worth learning. 

But I noticed that the pattern is particularly simple in this case, so a quick-n-dirty solution might also be possible, using the oft forgotten StrIndexWild

Code (winbatch) Select
st = "2017:02:05_ trace2017.02-2018.05.log":@TAB:"2/5/2017 to 5/5/2018":@CRLF:"2017:02:05_ trace20170816.log":@TAB:"2/5/2017 to 8/16/2017":@CRLF:"2017:02:05_ trace20180505.log":@TAB:"2/5/2017 to 5/5/2018":@CRLF:"2017:08:16_ trace20171221_001508.log":@TAB:"8/16/2017 to 12/20/2017":@CRLF:"2017:12:21_ trace20180504.log":@TAB:"12/21/2017 to 5/10/2018":@CRLF:"2018:05:05_ trace2018.05-2019.02.log":@TAB:"5/5/2018 to 2/14/2019":@CRLF:"2018:05:05_ trace20180928.log":@TAB:"5/5/2018 to 9/28/2018":@CRLF:"2018:09:29_ trace20190214_001818.log"
askitemlist('before',st,@lf,@unsorted,@single)
while 1
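  p = strindexwild(st,"????:??:??_",1) ; "?" matches any single character
  if p==0 then break
  ; the pattern is 11 characters long; the extra +1 also drops the space after the underscore
  st = strsub(st,1,p-1):strsub(st,p+11+1,-1)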
endwhile
askitemlist('after',st,@lf,@unsorted,@single)
exit
The mind is everything; What you think, you become.
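
The loop above could also be wrapped in a small reusable UDF. This is only a sketch: the name udfStrDeleteWild is hypothetical, it is not the UDF from Detlev Dalitz's site, and it assumes the pattern contains only "?" wildcards so that every match is exactly as long as the pattern itself. The sample string is shortened for illustration.

Code (winbatch) Select
#DefineFunction udfStrDeleteWild(sText, sPattern)
   ; Assumption: sPattern uses only "?" wildcards, so each match is StrLen(sPattern) characters long
   nLen = StrLen(sPattern)
   While @TRUE
      p = StrIndexWild(sText, sPattern, 1)
      If p == 0 Then Break
      ; Keep everything before the match and everything after it
      sText = StrSub(sText, 1, p-1) : StrSub(sText, p+nLen, -1)
   EndWhile
   Return sText
#EndFunction

; Shortened sample of the big string (illustration only)
sTxt = "2017:02:05_ trace20170816.log":@CRLF:"2019:02:14_ trace.log"
Message("After", udfStrDeleteWild(sTxt, "????:??:??_"))
Exit

Unlike the loop above, this version leaves the space that follows the underscore in place. Adding a trailing "?" to the pattern would remove it as well, assuming a character always follows the underscore.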

jmburton2001

Looks like I'm going to have to start learning about "regular expressions". The only problem is that I don't know anything about VBScript or .NET, so there's my stumbling block. I really love Winbatch because it seems to be so versatile, powerful, and usable by non-programmers.

stanl - I went directly to Microsoft's site and looked up the "Regex.Replace" syntax and the parameters were (input, pattern, replacement). Therefore I was stumped. Your expertise told you different. I would have struggled with this for the rest of my life because I would have always thought that I needed three because an "authoritative" source told me so.

Question -> Are "parms" and "parameters" the same thing?

kdmoyers - Thank you for your insights. Your solution is very graceful and I like that. I also noticed that the UDF from Detlev Dalitz's website gave attribution for your work. Thank you for that too!

td

Not sure what MSFT documentation you were looking at, because the "VBScript.RegExp" version of the "Replace" method takes only two parameters.  You can verify this using the WIL Type Viewer tool provided with your WinBatch installation.  On the other hand, the dotNet version of the regular expression object's "Replace" method has 12 overloads that take as few as 2 parameters and as many as 5.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

JTaylor

I was only referencing the Binary Stuff to show how you can place strings in a Buffer because you mentioned you didn't have a file and didn't know how to get a string into a buffer.  Wasn't a suggestion for a solution.   That was the loop and Item Functions.   Sorry for any confusion.

Jim

stanl

Quote from: JTaylor on February 21, 2019, 03:20:02 PM
I was only referencing the Binary Stuff.   Sorry for any confusion.

Jim


I think the confusion was entirely mine. I wrote my initial response without really checking and, as I wrote later, the 3 parms reflected the VBScript Replace() function, not the regex Replace method [and I posted an updated, working script].  I'll probably take a stab at coding it with the CLR and .NET assemblies.

stanl

and here is a WB script using the CLR to perform the replace.
Code (WINBATCH) Select


;Winbatch 2018B - CLR Regex Replace
;=================================================================================
ObjectClrOption("useany","System")
Pattern = "\d{4}:\d{2}:\d{2}_"
Text="2017:02:05_ trace2017.02-2018.05.log":@TAB:"2/5/2017 to 5/5/2018":@CRLF:"2017:02:05_ trace20170816.log":@TAB:"2/5/2017 to 8/16/2017":@CRLF:"2017:02:05_ trace20180505.log":@TAB:"2/5/2017 to 5/5/2018":@CRLF:"2017:08:16_ trace20171221_001508.log":@TAB:"8/16/2017 to 12/20/2017":@CRLF:"2017:12:21_ trace20180504.log":@TAB:"12/21/2017 to 5/10/2018":@CRLF:"2018:05:05_ trace2018.05-2019.02.log":@TAB:"5/5/2018 to 2/14/2019":@CRLF:"2018:05:05_ trace20180928.log":@TAB:"5/5/2018 to 9/28/2018":@CRLF:"2018:09:29_ trace20190214_001818.log"
oReg = ObjectClrNew('System.Text.RegularExpressions.Regex',Pattern)
oReg.CacheSize = ObjectType("ui2",30) ;2-byte unsigned integer.
newText = oReg.Replace(Text,"")
Message("Return after Replace",newText)
oReg=0


Exit