I need to come up with a UDF to determine the encoding of a text file that will be loaded onto a cloud server. I think the WB function ChrStringToHex() is the best way to read the BOM. I found the following in PowerShell, so I'm wondering whether the same parsing could be duplicated in WB using that function:
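# $bom must already hold the first 4 bytes of the file
$enc = [Text.Encoding]::ASCII
if ($bom[0] -eq 0x2b -and $bom[1] -eq 0x2f -and $bom[2] -eq 0x76)
{ $enc = [Text.Encoding]::UTF7 }
if ($bom[0] -eq 0xff -and $bom[1] -eq 0xfe)
{ $enc = [Text.Encoding]::Unicode }
if ($bom[0] -eq 0xfe -and $bom[1] -eq 0xff)
{ $enc = [Text.Encoding]::BigEndianUnicode }
# UTF-32 LE (FF FE 00 00) is tested after UTF-16 LE (FF FE) so it overrides it
if ($bom[0] -eq 0xff -and $bom[1] -eq 0xfe -and $bom[2] -eq 0x00 -and $bom[3] -eq 0x00)
{ $enc = [Text.Encoding]::UTF32 }
# 00 00 FE FF is the big-endian UTF-32 BOM; [Text.Encoding]::UTF32 is little-endian
if ($bom[0] -eq 0x00 -and $bom[1] -eq 0x00 -and $bom[2] -eq 0xfe -and $bom[3] -eq 0xff)
{ $enc = New-Object System.Text.UTF32Encoding -ArgumentList $true, $true }
if ($bom[0] -eq 0xef -and $bom[1] -eq 0xbb -and $bom[2] -eq 0xbf)
{ $enc = [Text.Encoding]::UTF8 }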
There are more approaches than I can imagine. Here is one that may be appropriate, depending on your specific requirements:
#DefineFunction GetEncoding(_hBin, _nCnt)
   ; A BOM is at most 4 bytes long, so only inspect that many.
   nByteMax = Min(_nCnt, 4)
   strRet = "ASCII"   ; default when no BOM is recognized
   if nByteMax < 2 then return strRet
   strHex = BinaryPeekHex(_hBin, 0, nByteMax)
   ; Test the longest signatures first: the UTF-32 LE BOM (FFFE0000)
   ; begins with the UTF-16 LE BOM (FFFE).
   if nByteMax >= 4
      strSig = StrSub(strHex, 1, 8)
      if strSig == "FFFE0000" then return "UTF32"
      if strSig == "0000FEFF" then return "UTF32BE"
   endif
   if nByteMax >= 3
      strSig = StrSub(strHex, 1, 6)
      if strSig == "2B2F76" then return "UTF7"
      if strSig == "EFBBBF" then return "UTF8"
   endif
   strSig = StrSub(strHex, 1, 4)
   if strSig == "FFFE" then return "Unicode"
   if strSig == "FEFF" then return "BigEndianUnicode"
   return strRet
#EndFunction
strFile = "c:\temp\Unicode-Test.txt"
;strFile = "c:\temp\UTF-8-Test.txt"
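; Only the first few bytes are needed, so allocate a 4-byte buffer
; instead of one the size of the whole file.
hBin = BinaryAlloc(4)
nBytes = BinaryReadEx(hBin, 0, strFile, 0, Min(FileSize(strFile), 4))
strEncoding = GetEncoding(hBin, nBytes)
BinaryFree(hBin)
Message(strFile : " Encoding", strEncoding)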
exit
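For comparison, the same signature checks can be written as a single table ordered longest signature first, which makes the UTF-32 LE versus UTF-16 LE ordering explicit. This is a sketch in Python rather than WinBatch, only to illustrate the lookup order; the names detect_bom and detect_file are hypothetical, and the returned labels simply mirror the .NET names used above.

```python
# BOM signatures, longest first: FF FE 00 00 (UTF-32 LE) must be tested
# before FF FE (UTF-16 LE), otherwise UTF-32 LE files are misreported.
BOMS = [
    (b"\xff\xfe\x00\x00", "UTF32"),
    (b"\x00\x00\xfe\xff", "UTF32BE"),
    (b"\x2b\x2f\x76", "UTF7"),
    (b"\xef\xbb\xbf", "UTF8"),
    (b"\xff\xfe", "Unicode"),
    (b"\xfe\xff", "BigEndianUnicode"),
]


def detect_bom(head: bytes) -> str:
    """Return an encoding label for the first bytes of a file (hypothetical helper)."""
    for sig, name in BOMS:
        if head.startswith(sig):
            return name
    return "ASCII"  # no BOM found: same default as the scripts above


def detect_file(path: str) -> str:
    """Read at most 4 bytes, the length of the longest BOM, and classify them."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

For example, detect_file(r"c:\temp\Unicode-Test.txt") should report "Unicode" for a file saved as UTF-16 LE with a BOM. Note that a file with no BOM is only assumed to be ASCII; it could still be UTF-8 without a signature.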
Thanks