System.String needs some clarity

Started by stanl, November 17, 2020, 05:02:31 AM


stanl

On another thread, where Jim asked about the StrTrim() function, Chuck recommended looking at the .NET String class. Since I realized how little I knew about system strings, I thought I would give it a try. Here is what I found first:

       
  • oString = ObjectClrNew("System.String") fails to instantiate for me
  • strChar = ObjectClrNew("System.Char") works
  • oString = ObjectClrType("System.String",str) works with str= "!@#abcxYZ123%@CRLF%111%@TAB%"
So it seems the characters of a system string can be described with System.Char methods. The script below is a shot at an enumeration. It seems to work, but:

       
  • is there a way to get the .Length property of the system string?
  • could the loop be written as ForEach strChar in oString?
  • I'm still not sure the enumeration is correct
Code (WINBATCH)


;Winbatch 2020A - using CLR system string
str= "!@#abcxYZ123%@CRLF%111%@TAB%"
dalen = StrLen( str)
cFile = Dirscript():"sResult.txt"
If Fileexist(cFile) Then FileDelete(cFile)
ObjectClrOption('useany', 'System')
strChar = ObjectClrNew("System.Char")
oString = ObjectClrType("System.String",str)   


;fails
;slen =  ObjectClrNew("System.Int32",oString.Length)
           
result=""
For i=0 to dalen-1
   char = StrSub(str,i,1)
   result = result:"[":char:"] ,"
   result = result:"IsControl ":strChar.IsControl(oString,i):","
   result = result:"IsDigit ":strChar.IsDigit(oString,i):","
   result = result:"IsLower ":strChar.IsLower(oString,i):","
   result = result:"IsLetter ":strChar.IsLetter(oString,i):","
   result = result:"IsNumber ":strChar.IsNumber(oString,i):","
   result = result:"IsPunctuation ":strChar.IsPunctuation(oString,i):","
   result = result:"IsSeparator ":strChar.IsSeparator(oString,i):","
   result = result:"IsSymbol ":strChar.IsSymbol(oString,i):","
   result = result:"IsWhiteSpace ":strChar.IsWhiteSpace(oString,i):@CRLF
Next
FilePut(cFile,result)
Run("notepad.exe",cFile)


Exit
   

stanl

Just saw my dumb error: it should be char = StrSub(str,i+1,1), since StrSub() positions are 1-based while the loop counter starts at 0.

td

Just a few offhand comments. The System.String class object is immutable. That means it is read-only, and its constructors expect a pointer to a memory location that isn't accessible to WIL. Basically, none of System.String's constructors are available to WIL. System.String does have a Length property and an enumerator, but since you can't instantiate the object they are of little use. MSFT recommends using the StringBuilder class for creating mutable strings, but it is of limited usefulness.

Code (winbatch)
ObjectClrOption('useany', 'System')
objStr = ObjectClrNew('System.Text.StringBuilder', 'String built with StringBuilder')
wilStr = ''
for i = 0 to objStr.Length()-1
   wilStr := objStr.ToString(i, 1)
next
Message("Wil String", wilStr)
exit


Not a big deal, but you can simplify your example script by using "For i=1 to dalen" and going back to "char = StrSub(str,i,1)".

"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

stanl

Quote from: td on November 17, 2020, 08:27:56 AM
Not a big deal, but you can simplify your example script by using "For i=1 to dalen" and going back to "char = StrSub(str,i,1)".


I don't know about your test, but when I try that I get an mscorlib exception: specified argument is out of range of valid values. That is why I changed to For i = 0 to dalen-1.

ChuckC

Quote from: td on November 17, 2020, 08:27:56 AM
Just a few offhand comments. The System.String class object is immutable. That means it is read-only, and its constructors expect a pointer to a memory location that isn't accessible to WIL. Basically, none of System.String's constructors are available to WIL.



Is there a technical reason for not being able to obtain a reference to an actual instance of System.String, either as returned by a method or by allowing a WIL string variable or string literal to be coerced into a form suitable for use as a c'tor parameter?

The immutable nature of System.String isn't an issue if your reason for wanting to use one is to take advantage of the instance methods and static class methods used for string manipulation. For example, the Split(), Trim(), TrimEnd() and TrimStart() methods provide some capabilities not present in WIL's own string handling functions, such as character-array parameters that let you specify a user-defined set of characters to trim.



td

Quote from: stanl on November 17, 2020, 09:37:26 AM
Quote from: td on November 17, 2020, 08:27:56 AM
Not a big deal, but you can simplify your example script by using "For i=1 to dalen" and going back to "char = StrSub(str,i,1)".


I don't know about your test, but when I try that I get an mscorlib exception: specified argument is out of range of valid values. That is why I changed to For i = 0 to dalen-1.

I thought it was obvious that you needed to add a line like nIndex = i-1 and then call strChar.IsControl(oString, nIndex). It is more a style issue than anything else. Recalculating the max value in a For statement on every iteration is considered bad form in some circles and is very slightly inefficient, because WIL can cache the value if it doesn't need to recalculate it. As I said, it is no big deal. I regret even mentioning it.
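A minimal sketch of the indexing point above, assuming only WIL functions already used in this thread (StrLen, StrSub, Message). The sample string and variable names are illustrative, and no CLR calls are made. The loop bound is computed once, the 1-based loop counter feeds StrSub(), and the derived 0-based nIndex is the value a call such as strChar.IsControl(oString, nIndex) would take.

```winbatch
; Sketch only: pairs WIL's 1-based StrSub() position with the
; 0-based index that the .NET Char methods expect.
str = "abc"
dalen = StrLen(str)               ; computed once, outside the loop
report = ""
For i = 1 To dalen
   nIndex = i - 1                 ; 0-based index for the .NET side
   char = StrSub(str, i, 1)       ; 1-based position for the WIL side
   report = report:"WIL pos ":i:" / .NET index ":nIndex:" -> [":char:"]":@CRLF
Next
Message("Index mapping", report)
Exit
```

Keeping both counters visible in one place makes it harder to mix them up when the CLR calls are added back in.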

td

Quote from: ChuckC on November 17, 2020, 10:01:33 AM

Is there a technical reason for not being able to obtain a reference to an actual instance of System.String, either as returned by a method or by allowing a WIL string variable or string literal to be coerced into a form suitable for use as a c'tor parameter?

Yes, there is.  It is a complicated issue.


Quote
The immutable nature of System.String isn't an issue if your reason for wanting to use one is to take advantage of the instance methods and static class methods used for string manipulation. For example, the Split(), Trim(), TrimEnd() and TrimStart() methods provide some capabilities not present in WIL's own string handling functions, such as character-array parameters that let you specify a user-defined set of characters to trim.

The immutable nature of the object is an "issue" for WIL because of what MSFT does and doesn't permit some interfaces to access. You simply cannot project managed coding onto purely unmanaged CLR hosting.

stanl

For what it's worth:

       
  • rewrote the loop as For i=1 to dalen
  • added IsUpper() and a lookup for IsHexDigit
Code (WINBATCH)


;Winbatch 2020A - using CLR system string
str= "!@#abcxYZ123%@CRLF%111%@TAB%"
hex= "0123456789ABCDEF"
dalen = StrLen( str)
cFile = Dirscript():"sResult.txt"
If Fileexist(cFile) Then FileDelete(cFile)
ObjectClrOption('useany', 'System')
strChar = ObjectClrNew("System.Char")
oString = ObjectClrType("System.String",str)   




;fails
;slen =  ObjectClrNew("System.Int32",oString.Length)
           
result=""
For i=1 to dalen
   char = StrSub(str,i,1)
   n=i-1
   result = result:"[":char:"],"
   If StrIndex(hex,StrUpper(char),0,@FWDSCAN) Then result = result:"IsHexDigit,"
   If strChar.IsControl(oString,n)== -1 Then result = result:"IsControl,"
   If strChar.IsDigit(oString,n)==-1 Then result= result:"IsDigit,"
   If strChar.IsUpper(oString,n)==-1 Then result = result:"IsUpper,"
   If strChar.IsLower(oString,n)==-1 Then result = result:"IsLower,"
   If strChar.IsLetter(oString,n)==-1 Then result = result:"IsLetter,"
   If strChar.IsNumber(oString,n)==-1 Then result = result:"IsNumber,"
   If strChar.IsPunctuation(oString,n)==-1 Then result = result:"IsPunctuation,"
   If strChar.IsSeparator(oString,n)==-1 Then result = result:"IsSeparator,"
   If strChar.IsSymbol(oString,n)==-1 Then result = result:"IsSymbol,"
   If strChar.IsWhiteSpace(oString,n)==-1 Then result = result:"IsWhiteSpace,"
   result = Strsub(result,1,Strlen(result)-1):@CRLF
Next
FilePut(cFile,result)
Run("notepad.exe",cFile)




Exit

stanl

And here is a version that uses a WB Map [attached]
Code (WINBATCH)


;Winbatch 2020A - using CLR system string
str= "!@#abcxYZ123%@CRLF%111%@TAB%"
hex= "0123456789ABCDEF"
dalen = StrLen( str)
mapfile = Dirscript():"stringtypes.txt"
If ! Fileexist(mapfile) Then Terminate(@TRUE,"Map File Not Found",mapfile)
cFile = Dirscript():"sResult.txt"
If Fileexist(cFile) Then FileDelete(cFile)
ObjectClrOption('useany', 'System')
strChar = ObjectClrNew("System.Char")
oString = ObjectClrType("System.String",str)   
map = MapFileGetCSV(mapfile)
           
result=""
For i=1 to dalen
   char = StrSub(str,i,1)
   n=i-1
   result = result:"[":char:"],"
   If StrIndex(hex,StrUpper(char),0,@FWDSCAN) Then result = result:"IsHexDigit,"
   If MapKeyExist(map, strChar.GetUnicodeCategory(oString,n)) Then result=result:map[strChar.GetUnicodeCategory(oString,n)]:","
   result = Strsub(result,1,Strlen(result)-1):@CRLF
Next
FilePut(cFile,result)
Run("notepad.exe",cFile)




Exit

stanl

Beating the dead horse, again: attached is a map file for the TypeCodes associated with a char value. The previous script could easily be rewritten to open it as the map file and then, in the For loop, be coded as
Code (WINBATCH)


   If MapKeyExist(map, strChar.GetTypeCode(oString,n)) Then result=result:map[strChar.GetTypeCode(oString,n)]:","



but that fails with the error CLR: Member signature not found...


Oh, well...

td

I haven't tried it, but the Char class's GetTypeCode method is an instance method that takes no parameters, unlike static Char methods such as IsControl(String, Int32). That would explain the error.

stanl

Yeah, tried several attempts and just gave up. Sort of a who's on first...

td

You would need to initialize the "System.Char" object to a value in order to call the "GetTypeCode" method. Any object method that returns a "Char" to a WIL script would just return the character's Unicode code point as a UI2 variant, not a reference to a "System.Char" structure. I don't know of any way around that one. One may exist, but I just don't know what it might be.

stanl

Guess I'll settle on the Unicode category for results. Below I adapted the earlier script to use the CharUnicodeInfo class rather than the Char class. It uses the same map as the previous script and, according to the MSFT docs, offers more appropriate results.
Code (WINBATCH)


;Winbatch 2020A - return Unicode Category for elements in system string
str= "!@#abcxYZ123%@CRLF%111%@TAB%"
hex= "0123456789ABCDEF"
dalen = StrLen( str)
mapfile = Dirscript():"stringtypes.txt"
If ! Fileexist(mapfile) Then Terminate(@TRUE,"Map File Not Found",mapfile)
cFile = Dirscript():"sResult.txt"
If Fileexist(cFile) Then FileDelete(cFile)
ObjectClrOption('useany', 'System')
ObjectClrOption('useany', 'System.Globalization')
strInfo = ObjectClrNew("System.Globalization.CharUnicodeInfo")
oString = ObjectClrType("System.String",str)   
map = MapFileGetCSV(mapfile)
           
result=""
For i=1 to dalen
   char = StrSub(str,i,1)
   n=i-1
   result = result:"[":char:"],"
   If StrIndex(hex,StrUpper(char),0,@FWDSCAN) Then result = result:"IsHexDigit,"
   If MapKeyExist(map, strInfo.GetUnicodeCategory(oString,n)) Then result=result:map[strInfo.GetUnicodeCategory(oString,n)]:","
   result = Strsub(result,1,Strlen(result)-1):@CRLF
Next
FilePut(cFile,result)
Run("notepad.exe",cFile)




Exit