CHCP transcoding to ANSI - controlling output encoding


Post by MigrationUser »

16 Apr 2016 17:21
psyl0w


Here is what I want to do:
Properly redirect the output of console applications (loosely called "commands" in the rest of this message) into a file encoded in code page 1252, so that it is readable in any Notepad-like editor in its default configuration.

What I’ve observed*

CHCP is effective with internal commands and with some (recent) external commands.
First of all, it is worth noting that CHCP behaves differently under Windows 7 and Windows 10.
If the following batch is run from a cmd prompt, the command outputs are displayed properly in the Windows 10 console, whereas the Windows 7 console renders non-ASCII characters incorrectly.

Code: Select all

for /f "tokens=2 delims=:" %%G in ('chcp') do Set _cp_=%%G
chcp 1252
@echo test an internal command
dir
@echo test an external (recent) command: Robocopy
robocopy .\ .\ /L
@echo test an external (legacy) command: Xcopy
xcopy test.txt 2>&1
chcp %_cp_% 
Incidentally, I would be interested to know what causes this difference, although that is not really the purpose of this message, since it is easily fixed by adding a PowerShell call, "powershell [console]::outputencoding=[system.text.encoding]::getencoding(850)", to the batch after the first chcp command.
Anyway, the real issue occurs when the batch output is redirected into a file: test.cmd > test.txt.
In that case the result is the same on both OSes. The output of internal commands and of newer external commands (Robocopy, Bcdedit, etc.) is properly encoded in 1252. The output of legacy commands (xcopy, chcp, etc.) is not: it stays in the OEM code page. In brief, most commands are not affected by CHCP or by the equivalent [console] change through PowerShell.
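
To make the symptom concrete, here is a small Python sketch (Python is used only for illustration, it is not part of the batch) of what happens when a legacy command writes OEM bytes into a file that an editor then opens as 1252. It assumes the OEM code page is 850 and the ANSI code page is 1252; the sample message and the helper name are made up.

```python
# Assumption: OEM code page 850, ANSI code page 1252 (Western Europe).
OEM_CP = "cp850"
ANSI_CP = "cp1252"

def as_seen_in_notepad(text: str) -> str:
    """Encode text as a legacy command would (OEM), then decode it the way
    an editor defaulting to the ANSI code page would."""
    return text.encode(OEM_CP).decode(ANSI_CP, errors="replace")

sample = "Fichier copié"           # hypothetical localized message
print(sample.encode(OEM_CP))        # the 'é' is stored as byte 0x82
print(as_seen_in_notepad(sample))   # the 'é' is shown as U+201A (a low quote)
```

Plain ASCII text survives the round trip unchanged, which is why the problem goes unnoticed on en-US systems.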

Various speculations about that mess:

1- The code of legacy commands is based on the CRT, whereas internal commands and most recent external ones use the Win32 API. This is based on the last section, about console application development, of //msdn.microsoft.com/en-us/library/bb688114.aspx MSDN Globalization Step-by-Step
2- Since on Windows 10 what is displayed in the console (same encoding for all command outputs) differs from what is stored in a file (the output encoding changes depending on the command), output/input streams may be handled differently depending on the type of handle they point to. Console functions may be used for display and file I/O functions in case of redirection. Based on //msdn.microsoft.com/en-us/library/windows/desktop/ms683457%28v=vs.85%29.aspx High-Level Console Input and Output Functions
2bis- MS recommends that the code of console applications force OEM encoding of the output stream. Ref.: //msdn.microsoft.com/en-us/library/windows/desktop/ms682060%28v=vs.85%29.aspx Console Application Issues
If the MS suggestion is applied in the code of external commands, that may explain why their output streams, when redirected into a file, are always encoded in OEM_CP whatever console code page is applied. Oddly, ReadFile and WriteFile are not mentioned among the functions affected by SetFileApisToOEM (://msdn.microsoft.com/en-us/library/windows/desktop/aa365534%28v=vs.85%29.aspx)
Finally, I don't know whether the difference between legacy commands and recently introduced ones is because their code follows the MS suggestion or just because string literals are encoded in OEM vs. ANSI.

Possible solutions/workaround

If 2bis is correct, there are certainly very few:
It is possible to set the registry value OEMCP to 1252 under HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage. It is not safe (do not try to set it to Unicode 65001, your system may refuse to boot) and inconvenient (a reboot is necessary).
Or fill the file with OEM-encoded content only and transcode it with a PowerShell script at the end of the batch. Simple, but not very elegant if the file is accessed and checked periodically.
If 2 is correct, there may exist a function that controls the encoding used by the file I/O functions ReadFile and WriteFile (/msdn.microsoft.com/en-us/library/windows/desktop/bb540537%28v=vs.85%29.aspx)
If 1 is correct, it should be possible to control the international settings or culture of the current user session, and thereby the code page of CRT applications. Since Windows 8 this is possible through PowerShell: technet.microsoft.com/en-us/library/hh825705.aspx PowerShell Configure International Settings in Windows. Command-line applications are also able to do such things: //devio.wordpress.com/2011/04/12/cmd-net-querying-the-net-environment/
In any case, the difficulty here is creating a "culture" whose OEM code page is 1252, as none exists in the predefined set: /www.microsoft.com/resources/msdn/goglobal/default.mspx
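
The "transcode the file at the end of the batch" workaround can be sketched as follows. This is only an illustration in Python, not the PowerShell script mentioned above; the function name and the default code pages (850 in, 1252 out) are assumptions. It works on raw bytes so that line endings are left alone.

```python
from pathlib import Path

def transcode_file(src, dst, src_cp="cp850", dst_cp="cp1252"):
    """Rewrite an OEM-encoded log as an ANSI-encoded one.

    Characters with no equivalent in the target code page (for example the
    box-drawing characters of code page 850) are replaced by '?'.
    """
    raw = Path(src).read_bytes()
    text = raw.decode(src_cp)
    Path(dst).write_bytes(text.encode(dst_cp, errors="replace"))
```

The replacement of unmappable characters is a design choice of this sketch: code page 1252 simply has no box-drawing glyphs, so some loss is unavoidable for output such as that of the tree command.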

*Note: Of course, all of this only makes sense for systems with a locale other than en-US (or similar) whose language uses characters outside ASCII.
Sorry for the broken links, but I am not allowed to post links so far.

----------------------------

#2 18 Feb 2021 15:07
psyl0w


Hi everyone,

While searching for some info on the forum, I came across this old post again and realized I had never posted the solution I routinely use in my scripts to transcode the OEM output of external commands to ANSI, the standard Windows encoding (the default one Notepad uses to open a text file).
It is a short batch that does not use any temporary file and acts as a stream transcoder.

Code: Select all

:: Description : Transcode the OEM output stream of external commands to ANSI chars in order to get notepad readable files with redirections
:: Usage: prog.exe | output1252
::        2>&1 prog.exe | output1252 > log.txt
@echo off
setlocal
:: Code page of the command and console programs (OEM code page)
set OEM_CP=850
chcp %OEM_CP% >NUL
:: The default local Windows codepage (ANSI code page), for Western Europe: 1252
set ANSI_CP=1252

>NUL chcp %ANSI_CP% & for /F delims^=^ eol^= %%A in ('more') do (
  call :WRITEOUT %%A)
echo:
goto :eof

:WRITEOUT
   echo %*
goto :eof
Limitations:
- Empty lines are dropped (by the FOR /F instruction); an extra line is written at the very end of the output to make the log files more readable
- Cannot process binary data because of how 'more' behaves: TAB characters are converted to spaces, etc.
- STDERR must be redirected to STDOUT in order to be transcoded
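
For comparison, a filter that keeps empty lines and TAB characters could look like the sketch below. It is written in Python rather than batch, so it is an alternative under stated assumptions (code page 850 in, 1252 out, hypothetical function name), not the method used above. It converts the stream chunk by chunk without a temporary file.

```python
import codecs

def transcode_stream(src, dst, src_cp="cp850", dst_cp="cp1252", chunk_size=4096):
    """Copy bytes from src to dst, converting src_cp to dst_cp chunk by chunk.

    src and dst are binary file-like objects. Characters that do not exist
    in dst_cp are replaced by '?'.
    """
    decoder = codecs.getincrementaldecoder(src_cp)()
    encoder = codecs.getincrementalencoder(dst_cp)(errors="replace")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush whatever the codecs may still be holding.
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))
```

Used as a pipe filter, it would be called as transcode_stream(sys.stdin.buffer, sys.stdout.buffer) after importing sys, for instance from a hypothetical script transcode.py: 2>&1 prog.exe | python transcode.py > log.txt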

[EDIT]

Another 'pipable' option I was thinking about:

Code: Select all

:: Description : Transcode OEM 850 input data to ANSI-1252 output
:: Usage: ThatBatch < 850.txt >1252.txt
::        prog.exe | ThatBatch
::        2>&1 prog.exe | ThatBatch > log.txt
@echo off
0<NUL chcp 850 >NUL
clip
0<NUL chcp 1252 >NUL
powershell Get-Clipboard
Limitations:
- Slightly slower than the previous one for short content, but probably faster for large chunks of data or big files.
- Requires PowerShell 5

But: the data stream is left untouched (it is even possible to use the raw option for binary data)

Enjoy

Last edited by psyl0w (19 Feb 2021 15:50)