#1 16 Apr 2016 17:21

psyl0w
New Member
Registered: 16 Apr 2016
Posts: 1

Redirected streams - controlling output encoding

Here is what I want to do:
Properly redirect the output of console applications (I loosely call them "commands" in the rest of this message) into a file encoded in code page 1252, so that it is readable in any Notepad-like editor in its default configuration.

What I’ve observed*

CHCP is effective with internal commands and with some (recent) external commands.
First of all, it’s worth noting that CHCP behaves differently under Win7 and Win10.
If the following batch is run from a cmd prompt, the command output is displayed properly in a Win10 console, whereas a Win7 console renders non-ASCII characters badly.

for /f "tokens=2 delims=:" %%G in ('chcp') do Set _cp_=%%G
chcp 1252
@echo test an internal command
dir
@echo test an external (recent) command: Robocopy
robocopy .\ .\ /L
@echo test an external (legacy) command: Xcopy
xcopy test.txt 2>&1
chcp %_cp_% 

Incidentally, I would be interested to know what causes this difference, although that is not really the purpose of this message, since it is easily fixed by adding a PowerShell call, powershell [console]::outputencoding=[system.text.encoding]::getencoding(850), to the batch after the first chcp command.
Anyway, the real issue occurs when the batch output is redirected into a file: test.cmd > test.txt.
In that case the result is the same whatever the OS. The output of internal commands and newer external commands (Robocopy, Bcdedit, etc.) is properly encoded in 1252. The output of legacy commands (xcopy, chcp, etc.) is not: it stays in the OEM code page. In brief, most commands are not affected by CHCP or by the equivalent [console] change through PowerShell.
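
To make the symptom concrete, here is a minimal Python sketch (not part of my batch; Python is used only because it is easy to run anywhere). It assumes the OEM code page is 850, as on many Western European systems, and the sample message is made up. It shows why a file mixing both encodings looks broken when an editor opens it as 1252:

```python
# Hypothetical sample message; cp850 stands in for the OEM code page (an assumption).
sample = "Fichier copié"

# Bytes as a legacy command would write them (OEM) vs. the bytes expected in a 1252 file.
oem_bytes = sample.encode("cp850")
ansi_bytes = sample.encode("cp1252")


def read_as_1252(raw: bytes) -> str:
    """Decode bytes the way an editor defaulting to code page 1252 would."""
    return raw.decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(read_as_1252(ansi_bytes))  # accented letter shown correctly
    print(read_as_1252(oem_bytes))   # accented letter shown as a different glyph
```

Only the bytes above 0x7F differ, which is why pure ASCII output looks fine either way.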

Various speculations about that mess:

1- The code of legacy commands is based on the CRT, whereas internal commands and most recent external ones use the Win32 API. This is based on the last section, about console application development, of //msdn.microsoft.com/en-us/library/bb688114.aspx MSDN Globalization Step-by-Step
2- Since under Win10 what is displayed in the console (same encoding for all command output) differs from what is stored in a file (the output encoding changes depending on the command), output/input streams may be handled differently depending on the type of handle they point to. Console functions may be used for display, and file I/O functions in case of redirection. Based on //msdn.microsoft.com/en-us/library/windows/desktop/ms683457%28v=vs.85%29.aspx High-Level Console Input and Output Functions
2bis- MS recommends that the code of console applications force OEM encoding on the output stream. Ref.: //msdn.microsoft.com/en-us/library/windows/desktop/ms682060%28v=vs.85%29.aspx Console Application Issues
If the MS suggestion is applied in the code of external commands, that may explain why their output streams, once redirected into a file, are always encoded in OEM_CP whatever console code page is applied. Oddly, ReadFile and WriteFile are not mentioned among the functions affected by SetFileApisToOEM (://msdn.microsoft.com/en-us/library/windows/desktop/aa365534%28v=vs.85%29.aspx)
Finally, I don’t know whether the difference between legacy commands and recently introduced ones exists because their code follows the MS suggestion, or simply because their string literals are encoded as OEM vs ANSI.

Possible solutions/workaround

If 2bis is correct, there are certainly very few:
It’s possible to change the OEMCP value to 1252 under the registry key HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage. It’s not safe (do not try to set Unicode 65001, your system may refuse to boot) and it’s inconvenient (a reboot is necessary).
Or, fill the file with OEM-encoded content only and transcode the file with a PS script at the end of the batch. Simple, but not very elegant if the file is accessed and checked periodically.
If 2 is correct, there may exist a function that controls the encoding used by the file I/O functions ReadFile and WriteFile (/msdn.microsoft.com/en-us/library/windows/desktop/bb540537%28v=vs.85%29.aspx)
If 1 is correct, it should be possible to control the international settings or culture of the current user session, and so control the code page of CRT applications. Since Win8, this is possible through PowerShell: technet.microsoft.com/en-us/library/hh825705.aspx PowerShell Configure International Settings in Windows. Command line applications are also able to do such things: //devio.wordpress.com/2011/04/12/cmd-net-querying-the-net-environment/
In any case, the difficulty here is creating a “culture” with the OEM code page set to 1252, as that doesn’t exist in the pre-defined set: /www.microsoft.com/resources/msdn/goglobal/default.mspx
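
For the "transcode at the end of the batch" workaround, the idea can be sketched as follows. This is Python rather than the PS script I mention, purely as an illustration; the function name and the default code page 850 are assumptions, and it only works if the whole file is OEM encoded (no mix of encodings):

```python
from pathlib import Path


def transcode(src, dst, src_encoding="cp850", dst_encoding="cp1252"):
    """Re-encode a text file from the OEM code page to code page 1252.

    Works on raw bytes so that CR/LF line endings are left untouched.
    Characters with no 1252 equivalent (e.g. OEM box-drawing glyphs)
    are written as '?'.
    """
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode(dst_encoding, errors="replace"))


if __name__ == "__main__":
    # Hypothetical file names, matching the test.txt used in the batch above.
    transcode("test.txt", "test-1252.txt")
```

Writing to a second file rather than in place avoids leaving a half-converted file behind if something reads it while the conversion is running.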

*Note: Of course, what is discussed here only makes sense for systems with a locale other than en-US (or similar), whose local language uses glyphs outside the ASCII character set.
Sorry for the broken links, but I am not allowed to post links yet.
