FileOpen ANSI Output Format

Started by chrislegarth, August 17, 2023, 07:46:56 AM

chrislegarth

I'm running into something odd.  I have a script that reads a UTF-8 file, hashes the passwords on each line, and writes the result to a separate file.  The script was compiled in 2017, and I'm not sure which WinBatch version was used.  It creates an ANSI-encoded file, which is what the vendor expects.  See the attached image "OLD - Compiled in 2017.jpg" for how Notepad sees that file.
The old script uses these two FileOpen calls to read and write.
Code (winbatch)
readfile = fileopen(sourcepath : inputfile, "read")
writefile = fileopen(sourcepath : outputfile, "write")


When I run the code below in 2022B Studio, I expect to get an ANSI-encoded file, since I'm not passing a Unicode flag, but the result appears to be UTF-8. The output file from the code is attached, along with an image of how Notepad sees the file.
Code (winbatch)
output = fileopen("C:\TempWork\Output_ANSI.txt","write")
filewrite(output, "Line1")
filewrite(output, "Line2")
filewrite(output, "Line3")
fileclose(output)


Am I doing something wrong, has something changed, is there a bug?

Thanks!!

kdmoyers

Maybe try a third param on the FileOpen: @False, to indicate you do not want Unicode.
I thought that was the default, but maybe not...

Code (winbatch)
output = fileopen("C:\TempWork\Output_ANSI.txt","write",@False)
The mind is everything; What you think, you become.

chrislegarth

I did try adding the @FALSE flag but got the same UTF-8 file.

td

FileOpen, FileRead, and FileWrite have not changed since support for long paths was added in version 2017A, and that change did not affect file encoding in any way. The character encoding functionality of these functions has not changed since 2005E.

Also, on Windows, "Unicode" refers to UTF-16 LE and not UTF-8, because UTF-16 LE is Windows' native character encoding. WinBatch follows that convention. FileOpen's default behavior is to treat a file as ANSI unless it has a UTF-16 LE or UTF-16 BE BOM. WinBatch will never convert text to UTF-8 unless you first convert the text to Unicode (UTF-16 LE), specify UTF-8 using the ChrSetCodePage function, and then convert the text back to ANSI.

Also, an ANSI file with no high-bit characters is identical to a UTF-8 file unless someone adds a BOM to the beginning of the file. Since the Unicode standard makes the BOM in UTF-8 text optional, it is seldom added.
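
The byte-level claim above can be checked outside WinBatch. This is a minimal Python sketch, not WinBatch code. It assumes "ANSI" means the Windows-1252 code page (`cp1252`) and that each written line ends in CR/LF, and neither assumption is confirmed by the thread.

```python
# Text resembling the test script's output; the CR/LF line endings are an assumption.
text = "Line1\r\nLine2\r\nLine3\r\n"

ansi_bytes = text.encode("cp1252")  # Windows Latin code page standing in for "ANSI"
utf8_bytes = text.encode("utf-8")   # UTF-8 with no BOM

# For pure 7-bit text the two encodings produce the same bytes.
print(ansi_bytes == utf8_bytes)

# A high-bit character is where the two encodings diverge.
accented = "caf\u00e9"
print(accented.encode("cp1252").hex())  # one byte for the accented letter
print(accented.encode("utf-8").hex())   # two bytes for the accented letter
```

If the first comparison prints `True`, no tool can tell from the bytes alone which of the two encodings was "intended".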

Finally, keep in mind that Notepad simply guesses the character encoding of a file unless it has a BOM at the beginning, and the file you posted has neither a BOM nor high-bit characters.
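
To see what a file's leading bytes can and cannot prove, here is a rough Python sketch (again, not WinBatch). The helper name `describe_encoding` is hypothetical, and it only looks at the UTF-8 and UTF-16 BOMs mentioned in this thread. It does not reproduce Notepad's actual guessing logic.

```python
import codecs

# BOM signatures to check. UTF-32 is deliberately not handled in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]

def describe_encoding(data: bytes) -> str:
    """Hypothetical helper: report only what the bytes themselves can prove."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    if all(b < 0x80 for b in data):
        return "no BOM, 7-bit only: valid as both ANSI and UTF-8"
    return "no BOM, high-bit bytes present: encoding must be guessed"

print(describe_encoding(b"Line1\r\nLine2\r\nLine3\r\n"))
```

For a file like the one posted, the helper lands in the middle case, which is why an editor's label of "UTF-8" or "ANSI" is only a guess.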

<edit> When I refer to ANSI, I mean Windows Latin code pages which are more or less equivalent to ANSI.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

kdmoyers

Fascinating.

So, using the venerable Notepad++ and the Hex Editor plugin,
I can confirm that, for this file, UTF-8 and ANSI are the same thing.

Either way, the bytes are identical; see the attached photo.


td

I got an identical result using the WinBatch Browser utility application.

chrislegarth

Thanks for everyone's input.  I sent off a file to the vendor and I'll see if they have any issues.

I should have figured it was not WinBatch leading me astray but rather Notepad "guessing".  ;D