
FileOpen ANSI Output Format


I'm running into something odd. I have a script that processes a UTF-8 file, hashing the password within each line and writing the results to a separate file. The script was compiled in 2017, and I'm not sure which WinBatch version was used. It creates an ANSI-encoded file, which is what the vendor is expecting. See the attached OLD - Compiled in 2017.jpg image for how Notepad sees the file.
The old script uses these two FileOpen functions to read and write.

--- Code: Winbatch ---
readfile = fileopen(sourcepath : inputfile, "read")
writefile = fileopen(sourcepath : outputfile, "write")
When I run the code below in 2022B Studio, I expect to get an ANSI-encoded file since I'm not setting a Unicode flag, but it appears to be UTF-8. The output file from the code is attached, along with an image of how Notepad sees it.

--- Code: Winbatch ---
output = fileopen("C:\TempWork\Output_ANSI.txt", "write")
filewrite(output, "Line1")
filewrite(output, "Line2")
filewrite(output, "Line3")
Am I doing something wrong, has something changed, is there a bug?


Maybe try a third param on FileOpen: @FALSE, to indicate you do not want Unicode.
I thought that was the default, but maybe not...

--- Code: Winbatch ---
output = fileopen("C:\TempWork\Output_ANSI.txt", "write", @False)

I did try adding the @FALSE flag but got the same UTF-8 file.

FileOpen, FileRead, and FileWrite have not changed since support for long paths was added in version 2017A, and that did not affect file encoding in any way. The character-encoding behavior of these functions has not changed since 2005E.

Also, on the Windows OS, "Unicode" refers to UTF-16 LE and not UTF-8, because UTF-16 LE is Windows's native character encoding. WinBatch follows that convention. FileOpen's default behavior is to treat a file as ANSI unless it has a UTF-16 LE or UTF-16 BE BOM. WinBatch will never convert text to UTF-8 unless you first convert the text to Unicode (UTF-16 LE), specify UTF-8 using the ChrSetCodePage function, and then convert the text back to ANSI.
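For what it's worth, that deliberate conversion path might be sketched roughly like this. This is only a sketch, assuming WinBatch's ChrStringToUnicode, ChrSetCodePage, and ChrUnicodeToString functions and the standard Windows UTF-8 code page identifier 65001:

--- Code: Winbatch ---
; Sketch: convert ANSI text to UTF-8 by way of UTF-16 LE, per the steps above.
uniText = ChrStringToUnicode("Line1")   ; ANSI -> UTF-16 LE (current ANSI code page)
oldCP = ChrSetCodePage(65001)           ; 65001 = UTF-8; affects later conversions
utf8Text = ChrUnicodeToString(uniText)  ; UTF-16 LE -> UTF-8 bytes
ChrSetCodePage(oldCP)                   ; restore the original code page

Only after a round trip like this would FileWrite produce genuinely UTF-8 bytes; nothing in a plain FileOpen/FileWrite sequence does it for you.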

Also, an ANSI file with no high-bit characters is identical to a UTF-8 file unless someone adds a BOM to the beginning of the file. Since the Unicode standard makes the BOM in UTF-8 text optional, it is seldom added.
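To illustrate: every character in text like "Line1" is below 0x80, and ANSI and UTF-8 encode those characters as the same single byte. A quick way to see the bytes, assuming WinBatch's ChrStringToHex function:

--- Code: Winbatch ---
; "Line1" is pure 7-bit ASCII: 4C 69 6E 65 31. ANSI and UTF-8 agree on every
; byte value below 0x80, so the two encodings of this text are identical.
hexBytes = ChrStringToHex("Line1")
Message("Bytes of 'Line1'", hexBytes)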

Finally, keep in mind that Notepad simply guesses the character encoding of a file unless it has a BOM at the beginning, and the file you posted has neither a BOM nor high-bit characters.

<edit> When I refer to ANSI, I mean the Windows Latin code pages, which are more or less equivalent to ANSI.


So, using the venerable Notepad++ and the Hex Editor plugin,
I can confirm that, for this file, UTF-8 and ANSI are the same thing.

Either way, it is byte-for-byte the same -- see the attached photo.

