Is there some way to use WIL to convert an entire file of this type, with double-byte characters, to all single-byte characters?
You cannot convert a file containing double-byte characters to one with single-byte characters only. That is because there are no representations in ANSI for some double-byte characters. You can, however, convert a file from double-byte to UTF-16 LE (Windows Unicode). All you need to do is read the file using FileGet, then make the conversion to Unicode using the function ChrStringToUnicode:
https://docs.winbatch.com/mergedProjects/WindowsInterfaceLanguage/html/WILAK_C__016.htm
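For readers who want to see what the conversion does at the byte level, here is a minimal sketch in Python rather than WIL. It assumes the file is Shift-JIS as implemented by Windows code page 932; the function name sjis_to_utf16le is made up for illustration and is not part of WinBatch.

```python
def sjis_to_utf16le(raw: bytes) -> bytes:
    """Decode code page 932 bytes and re-encode them as UTF-16 LE (no BOM)."""
    # Assumption: the input really is code page 932; other encodings will
    # either raise UnicodeDecodeError or decode to the wrong characters.
    text = raw.decode("cp932")
    return text.encode("utf-16-le")


if __name__ == "__main__":
    # 'A' is one byte in Shift-JIS; fullwidth 'A' (0x82 0x60) is two bytes.
    sample = b"A\x82\x60"
    print(sjis_to_utf16le(sample).hex(" "))
```

After the conversion every character is one 16-bit code unit, so plain Latin characters are the ones whose high byte is zero.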
Quote from: td on April 25, 2025, 08:42:36 AM
That is because there are no representations in ANSI for some double-byte characters.
I don't intend to "narrow" the kanji, only the Latin characters that are double-byte. Is there a way?
What you now state you want is not a file conversion. It is stripping a file of some characters.
For the most part, Latin characters have the same representation in Shift-JIS as in regular ANSI. Also, WinBatch is double-byte aware as long as your OS is configured to support double-byte code points. Regardless, the only issue with stripping your file is the lead bytes: the trail byte that follows a lead byte can be 0x40 or above, so on its own it can look like an ordinary Latin character. If you convert to Unicode, characters outside the ANSI range are easily detected using WinBatch string functions. You could also load your file into a binary buffer and manually detect lead bytes, each of which indicates that the following byte is not a Latin character.
Wikipedia has a chart that shows how Shift-JIS represents single and double-byte characters.
https://en.wikipedia.org/wiki/Shift_JIS
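The lead-byte approach can be sketched as follows. This is Python, not WIL, and only illustrates the byte logic that would have to be ported to binary buffer functions. It assumes the code page 932 lead-byte ranges 0x81-0x9F and 0xE0-0xFC; the function name strip_double_byte is hypothetical.

```python
def is_lead_byte(b: int) -> bool:
    """True if b starts a two-byte character in code page 932 (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def strip_double_byte(raw: bytes) -> bytes:
    """Drop every two-byte character and keep only 7-bit ASCII bytes."""
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if is_lead_byte(b):
            # Skip the lead byte and its trail byte together; the trail byte
            # may be 0x40 or above and must not be mistaken for a Latin letter.
            i += 2
        elif b < 0x80:
            out.append(b)
            i += 1
        else:
            # Single-byte halfwidth katakana (0xA1-0xDF) and anything else.
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(strip_double_byte(b"AB\x82\x60C\x93\xfaD"))
```

In the sample, the trail byte 0x60 is also the ASCII code for a backtick, which is why the scan has to consume the trail byte together with its lead byte.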
You may also need to use the function ChrSetCodePage with code page 932 so that the conversion to Unicode treats the file as Shift-JIS. It depends on your system's default code page.
Here is a simple and almost completely untested example - all bugs are provided at no extra charge.
TestFile = 'C:\temp\shift-jis.txt'
Stuff = FileGet(TestFile)
; Treat the file's bytes as Shift-JIS (code page 932) for the conversion.
DefPage = ChrSetCodepage(932)
UniStuff = ChrStringToUnicode(Stuff)
ChrSetCodepage(DefPage)
; Two bytes per UTF-16 code unit.
Size = StrLen(UniStuff) * 2
bin = BinaryAlloc(Size+2)
BinaryPokeStrW(bin, 0, UniStuff)
ansi = ''
for i = 0 to Size-2 by 2
   ; Windows Unicode is LE, so the high byte is at i+1.
   if BinaryPeek(bin, i+1) then continue
   ansi := Num2Char(BinaryPeek(bin, i))
next
BinaryFree(bin)
message('Double Bytes Removed', ansi)
exit
A slightly different untested version.
TestFile = 'C:\temp\shift-jis.txt'
Stuff = FileGet(TestFile)
; Treat the file's bytes as Shift-JIS (code page 932) for the conversion.
DefPage = ChrSetCodepage(932)
UniStuff = ChrStringToUnicode(Stuff)
ChrSetCodepage(DefPage)
; Two bytes per UTF-16 code unit.
Size = StrLen(UniStuff) * 2
bin = BinaryAlloc(Size+2)
BinaryPokeStrW(bin, 0, UniStuff)
ansi = ''
for i = 0 to Size-2 by 2
   ; Windows Unicode is LE, so the high byte is at i+1.
   if BinaryPeek(bin, i+1) then continue
   CodeP = BinaryPeek(bin, i)
   ; Keep printable ASCII only (32 through 126); this also drops line breaks.
   if CodeP > 31 && CodeP < 127 then ansi := Num2Char(CodeP)
next
BinaryFree(bin)
message('Double Bytes Removed', ansi)
exit
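If the aim is to narrow the double-byte Latin characters instead of dropping them, one possible approach is to map the Unicode fullwidth forms back to ASCII after converting to Unicode. The sketch below is Python, not WIL, and rests on two assumptions: the double-byte Latin characters in the file decode to the fullwidth block U+FF01 through U+FF5E, and the ideographic space U+3000 should become an ordinary space. The name narrow_fullwidth is made up for illustration.

```python
FULLWIDTH_FIRST = 0xFF01   # fullwidth exclamation mark
FULLWIDTH_LAST = 0xFF5E    # fullwidth tilde
OFFSET = 0xFEE0            # distance from a fullwidth form to its ASCII twin


def narrow_fullwidth(text: str) -> str:
    """Replace fullwidth Latin letters, digits and punctuation with ASCII."""
    out = []
    for ch in text:
        code = ord(ch)
        if FULLWIDTH_FIRST <= code <= FULLWIDTH_LAST:
            out.append(chr(code - OFFSET))
        elif code == 0x3000:
            # Assumption: the ideographic space should become a plain space.
            out.append(" ")
        else:
            # Kanji, kana and everything else pass through untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    raw = b"\x82\x60\x82\x61\x93\xfa"   # fullwidth A, fullwidth B, one kanji
    print(narrow_fullwidth(raw.decode("cp932")))
```

The result still contains the kanji, so it would have to be written back out as Shift-JIS or Unicode rather than as plain ANSI.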