Regex Group Values

Started by spl, April 25, 2025, 10:25:01 AM


spl

The code below is a simple test of returning group names/values from text, based on a regex named-group pattern. The goal is to return a message like
Name: John Adams
Age: 204
Email: jqa@example.com

but the script, at this point, only returns the group names. I think I may have conflated the Match and Matches regex objects and their properties. Matches should return a collection, but I was unable to parse it; Match returns Groups (as does each item in Matches), but I was unable to parse that either. The regex pattern is rather strict, i.e. it works with John Adams but not John Quincey Adams (when parsed with PS or Python)... but that is a different issue. For now, where did I go wrong in obtaining the group values?
IntControl(73,1,0,0,0)
gosub udfs
ObjectClrOption( 'useany', 'System')
text = "John Adams, Age: 204, Email: jqa@example.com"
results = ""
pattern = '(?<Name>\w+\s\w+),\sAge:\s(?<Age>\d+),\sEmail:\s(?<Email>[^\s]+)'
regex(text,pattern,results)
Message("results",results)
Exit

:WBERRORHANDLER
geterror()
Terminate(@TRUE,"Error Encountered",errmsg)

;=====================================================
:udfs
#DefineSubRoutine geterror()
   wberroradditionalinfo = wberrorarray[6]
   lasterr = wberrorarray[0]
   handlerline = wberrorarray[1]
   textstring = wberrorarray[5]
   linenumber = wberrorarray[8]
   errmsg = "Error: ":lasterr:@LF:textstring:@LF:"Line (":linenumber:")":@LF:wberroradditionalinfo
   Return(errmsg)
#EndSubRoutine

#DefineSubRoutine regex(text,pattern,results)
   retval = 0
   opts = ObjectClrType('System.Text.RegularExpressions.RegexOptions',1)
   oReg = ObjectClrNew('System.Text.RegularExpressions.Regex',pattern,opts)
   oReg.CacheSize = ObjectType("ui2",30)
   ;couldn't go anywhere with .matches
   ;m = oReg.Matches(text) ;returns an array
   ;Message("Groups",m.Groups)  ;will fail
   m = oReg.Match(text)  ;will work
   if m<>0
      Message("Groups Object",m.Groups) ;show object exists
      names = oReg.GetGroupNames()
      i=1  ;used to obtain value from group
      foreach name in names
         if name<>"0"
            results := name:': <is>':@LF ;want <is> to be value
            ;not sure what works???
            ;results := m.Groups[i].Value ;tested/fails
            ;results := oReg.names[i].Value ;tested/fails
            i += 1
         endif
      Next
   else
      results := "No Group Matches Found"
   endif
   oReg=0
   Return(retval)
#EndSubRoutine

Return
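For comparison, here is a minimal Python sketch of the distinction the question circles around. It assumes Python's re module as a stand-in for System.Text.RegularExpressions, and the second record in the usage block is made-up sample data. A single match carries all of the groups, so each group name has to be paired with the value that same match captured for it; a Matches-style iteration yields one match per record, each with its own groups. Note that Python spells a named group (?P<Name>...) where .NET uses (?<Name>...).

```python
import re

# Python's re spells a named group (?P<Name>...); .NET spells it (?<Name>...).
PATTERN = re.compile(
    r"(?P<Name>\w+\s\w+),\sAge:\s(?P<Age>\d+),\sEmail:\s(?P<Email>[^\s]+)"
)


def first_match(text):
    """Roughly what Regex.Match gives you: the first match only."""
    m = PATTERN.search(text)
    if m is None:  # .NET would return a Match with Success == False instead
        return "No Group Matches Found"
    # Pair each group NAME with the VALUE that the same match captured for it.
    return "\n".join(f"{name}: {m.group(name)}" for name in PATTERN.groupindex)


def all_matches(text):
    """Roughly what Regex.Matches gives you: one match per record, each with its own groups."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    # The second record is hypothetical sample data for illustration.
    text = (
        "John Adams, Age: 204, Email: jqa@example.com\n"
        "Abigail Adams, Age: 73, Email: aa@example.com"
    )
    print(first_match(text))
    for record in all_matches(text):
        print(record)
```

The WIL equivalent of the pairing step is reading the value from the same Match object's Groups collection for each name, rather than from the Regex object.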
Stan - formerly stanl [ex-Pundit]

td

"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

spl

I had to remember to use var.Item(index) instead of var[index]. I also had some help modifying the pattern to accept names with a variable number of words. This works:
IntControl(73,1,0,0,0)
gosub udfs
ObjectClrOption( 'useany', 'System')
;chose from text below to check variable names
text = "John Adams, Age: 204, Email: jqa@example.com"
;text = "John Quincey Adams, Age: 204, Email: jqa@example.com"
;text = "The 4th Earl of Northumberland, Age: 19, Email: earl4@mykingdom.com"
results = ""
pattern = '^(?<Name>\w+\s\w+(?:\s\w+)*),\sAge:\s(?<Age>\d+),\sEmail:\s(?<Email>[^\s]+)'
regex(text,pattern,results)
Message("results",results)
Exit

:WBERRORHANDLER
geterror()
Terminate(@TRUE,"Error Encountered",errmsg)

;=====================================================
:udfs
#DefineSubRoutine geterror()
   wberroradditionalinfo = wberrorarray[6]
   lasterr = wberrorarray[0]
   handlerline = wberrorarray[1]
   textstring = wberrorarray[5]
   linenumber = wberrorarray[8]
   errmsg = "Error: ":lasterr:@LF:textstring:@LF:"Line (":linenumber:")":@LF:wberroradditionalinfo
   Return(errmsg)
#EndSubRoutine

#DefineSubRoutine regex(text,pattern,results)
   retval = 0
   opts = ObjectClrType('System.Text.RegularExpressions.RegexOptions',1)
   oReg = ObjectClrNew('System.Text.RegularExpressions.Regex',pattern,opts)
   oReg.CacheSize = ObjectType("ui2",30)
   m = oReg.Match(text)  ;Match never returns 0; a failed match has Success = False
   if m.Success
      names = oReg.GetGroupNames()
      i=1  ;used to obtain value from group
      foreach name in names
         if name<>"0"
            results := name:': ':m.Groups.Item(i).Value:@LF 
            i += 1
         endif
      Next
   else
      results := "No Group Matches Found"
   endif
   oReg=0
   Return(retval)
#EndSubRoutine

Return
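A quick way to see what the revised pattern changes is to run both patterns over the three sample lines from the script. This is a sketch in Python rather than WIL (an assumption; only the named-group syntax is rewritten as (?P<Name>...)). It also shows why the original pattern appeared to fail on longer names: an unanchored search does not reject "John Quincey Adams", it quietly captures only the last two words before the comma.

```python
import re

# The thread's two patterns; only .NET's (?<Name>...) is rewritten as Python's (?P<Name>...).
ORIGINAL = re.compile(r"(?P<Name>\w+\s\w+),\sAge:\s(?P<Age>\d+),\sEmail:\s(?P<Email>[^\s]+)")
REVISED = re.compile(
    r"^(?P<Name>\w+\s\w+(?:\s\w+)*),\sAge:\s(?P<Age>\d+),\sEmail:\s(?P<Email>[^\s]+)"
)

SAMPLES = [
    "John Adams, Age: 204, Email: jqa@example.com",
    "John Quincey Adams, Age: 204, Email: jqa@example.com",
    "The 4th Earl of Northumberland, Age: 19, Email: earl4@mykingdom.com",
]


def captured_names(pattern):
    """Return the Name group for each sample; search() scans the text unless the pattern is anchored with ^."""
    names = []
    for sample in SAMPLES:
        m = pattern.search(sample)
        names.append(m.group("Name") if m else None)
    return names


if __name__ == "__main__":
    print("original:", captured_names(ORIGINAL))
    print("revised: ", captured_names(REVISED))
```

The ^ anchor forces the name to start at the beginning of the line, and the non-capturing (?:\s\w+)* lets it absorb any number of extra words.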

spl

The UDF was a bit sloppy. This version is a little clearer: it looks each group up by name instead of keeping a counter.
#DefineSubRoutine regex(text,pattern,results)
   opts = ObjectClrType('System.Text.RegularExpressions.RegexOptions',1) ;1 = IgnoreCase
   oReg = ObjectClrNew('System.Text.RegularExpressions.Regex',pattern,opts)
   oReg.CacheSize = ObjectType("ui2",30)
   matches = oReg.Match(text) ;first match only; Success is False when nothing matched
   if matches.Success
      names = oReg.GetGroupNames()
      foreach name in names
         if !IsInt(name) Then results := name:': ':matches.Groups.Item(name).Value:@LF 
      Next
   else
      results := "No Group Matches Found"
   endif
   oReg=0
   Return(results)
#EndSubRoutine
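The same name-based lookup, sketched in Python for anyone following along outside WinBatch. regex_report is a hypothetical helper, not part of any library. One difference worth knowing: Python's re returns None when nothing matches, whereas .NET's Regex.Match always returns a Match object and reports failure through its Success property.

```python
import re


def regex_report(text, pattern):
    """Hypothetical Python analogue of the final WIL UDF: one 'Name: value' line per named group."""
    rx = re.compile(pattern)
    m = rx.search(text)
    # re returns None on failure; .NET's Regex.Match returns a Match whose Success is False.
    if m is None:
        return "No Group Matches Found"
    lines = ""
    # groupindex holds only NAMED groups, so numbered groups (0, 1, ...) are skipped
    # automatically, much like the !IsInt(name) test in the WIL loop.
    for name in rx.groupindex:
        lines += f"{name}: {m.group(name)}\n"  # value looked up by name, no counter
    return lines


if __name__ == "__main__":
    pattern = r"^(?P<Name>\w+\s\w+(?:\s\w+)*),\sAge:\s(?P<Age>\d+),\sEmail:\s(?P<Email>[^\s]+)"
    print(regex_report("John Quincey Adams, Age: 204, Email: jqa@example.com", pattern))
    print(regex_report("not a record", pattern))
```

Looking values up by name keeps working even if unnamed capturing groups are later added to the pattern, since those are simply never asked for.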

td

Quote from: spl on April 26, 2025, 02:59:28 AM
Quote from: td on April 25, 2025, 02:10:16 PM
If I use the expression and text in the example on the MSFT site, I get the same results MSFT does.
https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.getgroupnames?view=netframework-4.8.1

So use C# or PS.

The point was that your WIL script worked with the MSFT expression and input.

spl

Quote from: td on April 28, 2025, 08:05:24 AM
The point was that your WIL script worked with the MSFT expression and input.

The point was that the initial script failed because it could not combine the group name with Groups.Value, something I figured out, or should have known. And referencing C# code, while illustrating that I was using the correct logic, failed to distinguish the key points in iterating the results correctly.

kdmoyers

I appreciate the code sample, I'd not figured out Groups with System.Text.RegularExpressions.Regex
Thanks.
The mind is everything; What you think, you become.

spl

Quote from: kdmoyers on April 28, 2025, 11:51:55 AM
I appreciate the code sample, I'd not figured out Groups with System.Text.RegularExpressions.Regex
Thanks.

Well, I appreciate that you appreciate it. I had actually been contacted by an old client I had written code for in 2009. Back then I was parsing CSR 'notes' from .mdb note fields using SQL 'like' or 'contains' clauses. He asked about similar processing of plain-text notes rather than database fields. I suggested PS regex; he said not PS, but wondered about WB. In my opinion .NET regex is clearly superior to the older COM regex, so experimenting with multiple options through the CLR prompted some of my recent posts. My learning challenge has been the significance of '?' in patterns: greedy versus lazy quantifiers, look-aheads, etc. I'm satisfied with what I have learned and will post more code, even if you are the only one who appreciates it.
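Since the '?' question comes up, here is a small Python sketch contrasting a greedy quantifier, its lazy '?' form, and a look-ahead on the sample line from this thread. Python is an assumption standing in for .NET here; the constructs used (.+, .+?, (?=...)) are written the same way in both.

```python
import re

TEXT = "John Adams, Age: 204, Email: jqa@example.com"

# Greedy: .+ grabs as much as it can, then backs off only as far as the LAST comma.
greedy = re.match(r"^(.+),", TEXT).group(1)

# Lazy: the trailing ? makes .+? grab as little as it can, so it stops at the FIRST comma.
lazy = re.match(r"^(.+?),", TEXT).group(1)

# Look-ahead: (?=,) requires a comma next but does not consume it, so only the words are returned.
before_comma = re.findall(r"\w+(?=,)", TEXT)

if __name__ == "__main__":
    print(greedy)        # John Adams, Age: 204
    print(lazy)          # John Adams
    print(before_comma)  # ['Adams', '204']
```

The same character also opens group constructs such as (?<Name>...) and (?:...), which is part of why '?' is easy to misread in these patterns.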

td

Quote from: spl on April 28, 2025, 10:41:34 AM
Quote from: td on April 28, 2025, 08:05:24 AM
The point was that your WIL script worked with the MSFT expression and input.

The point was that the initial script failed because it could not combine the group name with Groups.Value, something I figured out, or should have known. And referencing C# code, while illustrating that I was using the correct logic, failed to distinguish the key points in iterating the results correctly.

I see your point.
