#1 10 Mar 2021 14:26

Handling of UCS-2le unicode file names on XP


I'm having an interesting problem with unicode, so I wanted to ask for help.

Here's what I want to do:

1.) Read some data from input files (both the names of those files and their contents).
2.) Generate output data from those files (A/V streams into an MKV container, metadata from existing file names and text file contents).
3.) Derive the output file name (here's the tricky part) from a slightly modified input file name and/or from a text file.

There are quite a few steps involved here, but this is the main problem:

Say I have already generated a UCS-2le encoded text file called "filename.txt" that contains a single line of text: my target file name. Say that name is "MySweetTargetfile♡.mkv". Note that "MySweetTargetfile♡.mkv" can be created in Windows Explorer without any problems.

Now, I try to use that file name as variable content, and just create a dummy file containing only newline characters for testing. This should create a new file called "MySweetTargetfile♡.mkv":

SET /P outFile=<"filename.txt"
ECHO. >"%outFile%"

What it creates, though, is not "MySweetTargetfile♡.mkv" but just "M". It appears the string gets truncated at the second byte of the first character, which in this case is a null byte: because UCS-2le is a fixed-width 16-bit encoding, "M" is stored as 4D 00 in hexadecimal instead of just 4D.

It appears that reading the line into a variable stops at the null byte. I tested this by changing 4D 00 to 4D 01, which turns the "M" into a weird "ō" character. When I run the above code again, it now reads up to the "y" in the string "MySweetTargetfile♡.mkv", creating a file named "ōy". So it really does seem to terminate at the null byte.
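The byte-level reasoning above can be reproduced with a small sketch. It is written in Python rather than batch, simply because batch has no convenient way to look at raw bytes. It assumes the name is stored as UTF-16LE without a BOM (which is identical to UCS-2le for these characters). The helper `cut_at_nul` is a made-up name that only models "stop at the first 00 byte"; it is not a claim about what cmd.exe does internally.

```python
NAME = "MySweetTargetfile\u2661.mkv"  # \u2661 is the white heart character


def cut_at_nul(data: bytes) -> bytes:
    """Hypothetical model of the truncation: keep the bytes before the first 00."""
    return data.split(b"\x00", 1)[0]


# Content of filename.txt, assumed to be UTF-16LE without a BOM.
raw = NAME.encode("utf-16-le")
print(" ".join(f"{b:02x}" for b in raw[:4]))  # 4d 00 79 00
print(cut_at_nul(raw))                        # b'M'

# The experiment from the post: change the second byte from 00 to 01.
patched = bytearray(raw)
patched[1] = 0x01
first_char = bytes(patched[:2]).decode("utf-16-le")
print(hex(ord(first_char)))                   # 0x14d, the "o with macron"
print(cut_at_nul(bytes(patched)))             # b'M\x01y'
```

In this model the first null byte moves from offset 1 to offset 3 after the patch, so the surviving bytes cover the first code unit plus the "y", which matches the point where the observed name ends.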

I tried another approach:

FOR /F "usebackq tokens=1 delims=." %%I IN (`TYPE "filename.txt"`) DO SET "outFile=%%I.mkv"
ECHO. >"%outFile%"

The result is the same: it truncates at the null byte. I tried this with and without a UCS-2le BOM in "filename.txt". The only difference is that with the BOM, the BOM itself becomes part of the new file name, which is obviously undesirable. I also tried running the whole script with cmd.exe /U /C "script.bat", but that made no difference, even though /U should make the output of internal commands to pipes and files Unicode.
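The BOM observation can be illustrated outside of cmd.exe as well. The following Python sketch is only an illustration, not a batch solution. It assumes that "filename.txt" starts with the UTF-16LE BOM (bytes FF FE) and ends with a CRLF line ending; the file content is simulated in memory. It shows that a decoder that is told only "utf-16-le" keeps the BOM as a leading U+FEFF character, while Python's generic "utf-16" codec consumes it.

```python
import codecs

NAME = "MySweetTargetfile\u2661.mkv"

# Simulated content of filename.txt: BOM, then the name, then CRLF (assumption).
data = codecs.BOM_UTF16_LE + (NAME + "\r\n").encode("utf-16-le")

# Decoding as plain UTF-16LE leaves the BOM in the string as U+FEFF.
with_bom = data.decode("utf-16-le").rstrip("\r\n")

# The generic UTF-16 codec detects the BOM and strips it.
without_bom = data.decode("utf-16").rstrip("\r\n")

print(hex(ord(with_bom[0])))  # 0xfeff
print(without_bom == NAME)    # True
```

Any tool that reads the file name has to make the same choice: either interpret the first two bytes as a BOM and drop them, or treat them as part of the text.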

So, bottom line, here's my question:

How can I create files with UCS-2le encoded file names on NTFS file systems using cmd.exe / batch programming, when the names contain symbols that no ANSI codepage covers? How can I pass such file names to a batch script so that it creates them properly, even with highly uncommon characters like ☆★♡ ♥Ψ≤≥♪, among many others?

Or is this not possible in pure batch?

I hope I explained this in an understandable way...

Thank you very much!

Girls Love, Best Love!

