Ascii replace not working

Started by MW4, March 19, 2014, 12:33:53 PM

MW4

I have this code that I am using to replace a registration mark (®) with an empty string "".
Instead, it is replacing the registration mark with Â.

This text:
Bluetooth® Wireless Technology
ends up as this text:
BluetoothÂ Wireless Technology
I want it to end up like this:
Bluetooth Wireless Technology


Any ideas??

Code (winbatch)

str = Num2Char (174)
rep = ""

infile="c:\Flee.txt"
outfile="c:\Flee_Fixed.txt"

fs = FileSize( infile )
binbuf = binaryalloc( fs+100 )
ret = BinaryRead( binbuf, infile )
num = BinaryReplace( binbuf, str, rep , 0)
BinaryWrite( binbuf, outfile )
binbuf=BinaryFree(binbuf)

Deana

The code you posted will work fine with a truly ANSI file. However, I suspect you are working with a Unicode or UTF-8 encoded file. Try opening the file in Notepad and make sure that you choose ANSI encoding when you save the file. Test the code again and see if that resolves your issue.
Deana F.
Technical Support
Wilson WindowWare Inc.

Deana

If you are actually dealing with a Unicode file your code "could" look something like this:

Code (winbatch)
; Unicode sample
str = ChrStringToUnicode( Num2Char(174) )
rep = ChrStringToUnicode( "" )

infile="c:\temp\data\unicode.txt"
outfile="c:\temp\data\_unicode.txt"

strU = FileGetW(infile)
data = StrReplace( strU, str, rep )
FilePutW( outfile, data )
Exit

MW4

Still does the same thing.


Deana

Did you try my suggestion to open and save your file in Notepad?

Post the input file so we can see what might be going on.

Also, I recommend using DebugTrace. Simply add DebugTrace(@on,"trace.txt") to the beginning of the script and inside any UDF, run it until the error or completion, then inspect the resulting trace file for clues as to the problem. Feel free to post the trace file here (removing any private info) if you need further assistance.

td

Quote from: MW4 on March 19, 2014, 02:22:44 PM
Still does the same thing

FileGetW will only load a file as Unicode if it contains a BOM for UTF-16.  If the file does not contain a BOM at the beginning or it has a BOM for UTF-8, it will treat the file as ANSI and attempt to convert it to Unicode UTF-16.  Generally, UTF-8 Unicode files do not have a BOM because it has no meaning, other than to indicate that the file is UTF-8.

There are several ways to determine the character encoding of a file. One fairly simple method is to load the file into a hex file viewer like the WinBatch Browser utility and look for a BOM and for indicators of UTF-16 and UTF-8 encoding. The BOM is the first two or three bytes of a file: FF FE or FE FF for UTF-16, and EF BB BF for UTF-8.

If a file does not contain a BOM, you can still get a good idea of the encoding by looking at the hex values of the text.  If you see a lot of values preceded by a 00 hex value then the file is likely UTF-16.  If you see hex values C2, C3, C4, etc., mixed in with regular ANSI values then the file is likely UTF-8.
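
If you would rather check for a BOM from a script than from a hex viewer, something along these lines might do. It is only a rough sketch: the path is the one from the first post, and the variable names (b0, b1, b2, bom) are made up for illustration. It only looks for a BOM and does not try to guess the encoding of a file that has none.

```winbatch
; Sketch: report which BOM, if any, starts a file.
infile = "c:\Flee.txt"                  ; path taken from the first post

fs     = FileSize(infile)
binbuf = BinaryAlloc(fs + 100)          ; same sizing as the original script
BinaryRead(binbuf, infile)

b0 = BinaryPeek(binbuf, 0)              ; first three bytes as numbers 0-255
b1 = BinaryPeek(binbuf, 1)
b2 = BinaryPeek(binbuf, 2)
binbuf = BinaryFree(binbuf)

bom = "no BOM found"
If b0 == 255 && b1 == 254 Then bom = "UTF-16 little-endian BOM (FF FE)"
If b0 == 254 && b1 == 255 Then bom = "UTF-16 big-endian BOM (FE FF)"
If b0 == 239 && b1 == 187 && b2 == 191 Then bom = "UTF-8 BOM (EF BB BF)"

Message("Encoding check", bom)
```

A result of "no BOM found" does not rule out UTF-8, so the hex inspection described above is still the fallback.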

If it turns out that your file is UTF-8 (which is likely, based on the evidence presented), the following Tech. DB article demonstrates one technique used to handle UTF-8 in WinBatch:

http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+winbatch/Strings+Convert~To~UTF-8.txt

[Edit] Another way to handle UTF-8 is to set the WinBatch code page to UTF-8 using the ChrSetCodepage function before calling FileGetW. If the file does not have a BOM, FileGetW will then assume that the file is UTF-8 and convert it to UTF-16. Make sure to set the code page back to the default after calling FileGetW. If you don't, the subsequent call to StrReplace will not work.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

MW4

OK, as always, you were right.
I saved the file as ANSI and it worked.

How can I force an ANSI save from within WinBatch at the start of my process?



Deana

Actually I recommend dealing with the data in the format that it was received. Please see our previous posts about ways to deal with differently encoded files.
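
For what it is worth, here is a rough sketch of handling the file as received at the byte level. It assumes the file really is UTF-8, where the registration mark is stored as the two bytes C2 AE (decimal 194 and 174). Searching for Num2Char(174) alone removes only the AE byte and leaves the C2 byte behind, which an ANSI viewer shows as Â. The paths are the ones from the original script.

```winbatch
; Sketch: remove the two-byte UTF-8 sequence for the registration mark.
str = StrCat(Num2Char(194), Num2Char(174))   ; C2 AE = UTF-8 bytes for the mark
rep = ""

infile  = "c:\Flee.txt"
outfile = "c:\Flee_Fixed.txt"

fs     = FileSize(infile)
binbuf = BinaryAlloc(fs + 100)
BinaryRead(binbuf, infile)
num    = BinaryReplace(binbuf, str, rep, @TRUE)   ; exact (case-sensitive) match
BinaryWrite(binbuf, outfile)
binbuf = BinaryFree(binbuf)
```

The output stays UTF-8, so any other non-ASCII characters in the file are left untouched.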

MW4

OK, I'm super confused.


I use this in my script to strip the registration mark, which it doesn't do because the file is UTF-8:

str = Num2Char (174)
rep = ""

infile="c:\Flee.txt"
outfile="c:\Flee_Fixed.txt"

fs = FileSize( infile )
binbuf = binaryalloc( fs+100 )
ret = BinaryRead( binbuf, infile )
num = BinaryReplace( binbuf, str, rep , 0)
BinaryWrite( binbuf, outfile )
binbuf=BinaryFree(binbuf)


All I want to do is strip out the registration mark


Are you suggesting this?  If so where would that go?

; Convert UTF-8 to ANSI.

strUTF8_    = "Â® â‚¬ Ã¾Ã¦Ã± HÃ¶llÃ«"          ; UTF-8 bytes of "® € þæñ Höllë" shown as ANSI text
strHex      = ChrStringToHex (strUTF8_)        ; "C2AE20E282AC20C3BEC3A6C3B12048C3B66C6CC3AB"
intVarType  = VarType (strUTF8_)               ; 2 string

ChrSetCodepage (65001)                         ; 65001 UTF-8 code page (missing from the pasted code; the hex below needs it)

strUTF16LE_ = ChrStringToUnicode (strUTF8_)    ; "® € þæñ Höllë"
strHex      = ChrUnicodeToHex (strUTF16LE_)    ; "AE002000AC202000FE00E600F10020004800F6006C006C00EB00"
intVarType  = VarType (strUTF16LE_)            ; 128 LPWSTR or "Unicode"

ChrSetCodepage (0)                             ; 0 ANSI code page

strANSI_    = ChrUnicodeToString (strUTF16LE_) ; "® € þæñ Höllë"
strHex      = ChrStringToHex (strANSI_)        ; "AE208020FEE6F12048F66C6CEB"
intVarType  = VarType (strANSI_)               ; 2 string





MW4

I pull the original file from a vendor's FTP server using:

iFtpGet(conhandle,KiaFileName,slocBoxFile,0,@ASCII, @TRUE)

Can the codepage be set here?
Should that be binary instead of ASCII?

Deana

Since the file is UTF-8, the code might look something like this:

Code (winbatch)
str = Num2Char (174)
rep = ""
utf8file = 'C:\TEMP\Data\utf8.txt'
intCP = ChrSetCodepage (65001) ; Translate using UTF-8
data = StrSub( FileGetW( utf8file ), 2, -1 ) ; Skip the BOM (assumes the file starts with one)
ChrSetCodepage (intCP) ; Restore the previous code page
newdata = StrReplace( data, str, rep )
Pause(data,newdata)
Exit

MW4

So is this my best course then?
Are there any issues with handling it like this?

Code (winbatch)
str = Num2Char (174)
rep = ""

utf8file = 'c:\FleetinvCheckit.txt'
outfile="c:\FleetinvCheckit_Fixed_xx.txt"

intCP = ChrSetCodepage (65001); Translate using UTF-8
data = StrSub( FileGetW( utf8file ), 2, -1 ) ; Ignore BOM
ChrSetCodepage (intCP)
newdata = StrReplace( data, str, rep )

handle = FileOpen(outfile, "WRITE")
FileWrite(handle, newdata)
FileClose(handle)

Deana

Deana F.
Technical Support
Wilson WindowWare Inc.

MW4

Ugh...
It chops off the first character of the file which throws off the first record.
First character is a number 0

Only the first record, the other lines are perfect

MW4

Is it because of this?

Make sure to set the code page back to the default after calling FileGetW.  If you don't, the subsequent call to StrReplace will not work.

td

As mentioned last week, UTF-8 encoded files often do not have a BOM. Notepad.exe puts a three-byte BOM in UTF-8 encoded files, but that is the exception and not the rule. When the file has no BOM, StrSub( ..., 2, -1 ) drops a real character instead.
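
One way around it might be to drop the first character only when it actually is a BOM. This is a sketch, not tested against your file. It assumes FileGetW leaves the BOM in the returned string as its first character (which is what the "Ignore BOM" line in the earlier code implies), and that ChrUnicodeToHex reports U+FEFF as "FFFE", matching the little-endian hex shown in the Tech. DB sample.

```winbatch
; Sketch: strip a leading BOM only if one is present.
utf8file = 'c:\FleetinvCheckit.txt'      ; path taken from the post above

intCP = ChrSetCodepage(65001)            ; read the file as UTF-8
data  = FileGetW(utf8file)
ChrSetCodepage(intCP)                    ; restore the previous code page

; U+FEFF appears as "FFFE" in the little-endian hex from ChrUnicodeToHex.
If ChrUnicodeToHex(StrSub(data, 1, 1)) == "FFFE" Then data = StrSub(data, 2, -1)

newdata = StrReplace(data, Num2Char(174), "")
```

With this check, a file that starts with the digit 0 keeps its first character, while a Notepad-style file loses only the BOM.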

MW4

So how do I get around that?
Should I change this: StrSub( FileGetW( utf8file ), 2, -1 )

to this: StrSub( FileGetW( utf8file ), 1, -1 )?

Deana

Deana F.
Technical Support
Wilson WindowWare Inc.