Symptoms
In the output of a command line program (e.g.
KPScript),
some special characters are displayed correctly and some are not.
When redirecting the output to a TXT file using the '>'
console operator and opening the TXT file in a text editor,
special characters appear garbled.
These problems usually appear only under Windows, not Linux
(because Linux usually uses UTF-8 encoding).
Cause
The Windows command line window by default uses OEM code pages.
For general information about code pages, see
Wikipedia: Code page.
In the US, code page 437 is used;
in Western Europe, code page 850 is used;
etc.
A detailed list can be found here:
Code Page Identifiers.
These OEM code pages do not support all characters.
They include a small subset of foreign characters
(e.g. the US code page 437 includes some Greek characters),
thus some special characters display properly in a command line window.
Characters that are rarely used cannot be encoded though.
When redirecting the output to a TXT file using the '>'
console operator, the TXT file uses the same encoding as the command
line window. Text editors usually do not expect OEM code pages and thus render
special characters improperly (even the ones that display properly
in the command line window).
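The effect described above can be reproduced without a console. The following is a minimal sketch in Python (not part of the original text); it assumes the console uses code page 437 and that the text editor guesses Windows-1252, which is only one of several possible wrong guesses.

```python
# Illustration only: simulate console output that is encoded with an OEM
# code page and then opened by an editor that assumes another encoding.

text = "é α"

# The console (US default) encodes the output with code page 437.
oem_bytes = text.encode("cp437")

# An editor that assumes Windows-1252 misreads the very same bytes.
garbled = oem_bytes.decode("cp1252")

# Reading the bytes with the matching code page restores the text.
restored = oem_bytes.decode("cp437")

# A character that code page 437 lacks cannot be encoded at all;
# with errors="replace" it degrades to a question mark.
lossy = "€".encode("cp437", errors="replace")

print(repr(garbled))   # '‚ à'
print(repr(restored))  # 'é α'
print(lossy)           # b'?'
```

Both 'é' and the Greek 'α' exist in code page 437, so they display properly in the console, yet they turn into different characters once the bytes are interpreted as Windows-1252.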
Finding your code page
You can find out which code page your command line window is using by
typing 'Chcp' (without any parameters).
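If the code page is needed inside a script rather than interactively, it can also be queried programmatically. The sketch below is an illustration, not something the original text prescribes; it calls the Win32 function GetConsoleOutputCP through Python's ctypes module, and the helper name console_output_code_page is made up for this example. The result should usually match what 'Chcp' reports.

```python
import sys


def console_output_code_page():
    """Return the active console output code page, or None if unavailable."""
    if sys.platform != "win32":
        # Console code pages are a Windows concept.
        return None
    import ctypes
    # GetConsoleOutputCP returns 0 when the process has no console attached.
    code_page = ctypes.windll.kernel32.GetConsoleOutputCP()
    return code_page or None


print(console_output_code_page())
```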
Weak solution: Good text editor
The simplest solution is to tell the text editor which code page
the TXT file is using.
All characters supported by the code page are then loaded/displayed properly.
The disadvantage of this solution is that characters outside the console
code page are lost: they were already replaced when the output was written,
so no editor setting can bring them back.
Most advanced text editors support selecting the code page; some examples:
- PSPad.
In the 'Format' menu, click 'OEM', then open the TXT file
(it is important to select 'OEM' before opening the TXT file).
- Notepad2.
Open the TXT file and choose the code page under 'File' → 'Encoding' → 'Recode'.
- Notepad++.
Open the TXT file and choose the code page under 'Encoding' → 'Character Sets'.
- Microsoft Visual Studio.
Go to 'File' → 'Open' → 'File', select the TXT file, click the drop-down
arrow right of the 'Open' button in the file selection dialog,
choose 'Open With', select 'Source Code (Text) Editor With Encoding',
choose the correct code page and click 'OK'.
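What these editors do when a code page is selected can also be done once with a small script, so that the file opens correctly everywhere afterwards. The following Python sketch is only an illustration; the function name recode_oem_to_utf8, the file names and the default code page 437 are assumptions, and the code page must be the one the console was actually using.

```python
import tempfile
from pathlib import Path


def recode_oem_to_utf8(source, target, code_page="cp437"):
    """Read `source` as OEM-encoded text and write it to `target` as UTF-8."""
    text = Path(source).read_bytes().decode(code_page)
    # Write bytes directly so that line endings are preserved unchanged.
    Path(target).write_bytes(text.encode("utf-8"))
    return text


# Usage example with a temporary file that mimics redirected console output
# from a Western European console (code page 850).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "output.txt"
    dst = Path(tmp) / "output-utf8.txt"
    src.write_bytes("Größe".encode("cp850"))
    print(recode_oem_to_utf8(src, dst, code_page="cp850"))
```

Like selecting the code page in an editor, this only recovers characters that the console code page supports.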
Recommended solution: Change console code page
The console character encoding can be changed to
UTF-8,
which is identified by code page 65001 (on Windows systems).
UTF-8 allows encoding all Unicode characters, i.e. special characters of all
languages are supported.
In order to change the code page to UTF-8, run the following command:
Chcp 65001
This works fine under Windows 7 and higher.
Older operating systems might not support it.
The command must be executed in the command line window before running
the command that redirects the output to the TXT file.
Windows does not save the chosen code page, so the code page change command
must be executed in every command line window separately.
The output TXT file will be encoded using UTF-8. This encoding is supported
by almost every text editor. UTF-8 is usually detected automatically, i.e.
you do not have to select the encoding / code page manually;
you can "just open" the file.
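Automatic detection works well because byte sequences produced by OEM code pages are rarely valid UTF-8. The following Python sketch shows one possible heuristic; it is an assumption for illustration, and real text editors use their own (often more elaborate) detection logic. The function name looks_like_utf8 is made up for this example.

```python
import codecs


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if `data` has a UTF-8 BOM or decodes strictly as UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("Zürich, 東京, €".encode("utf-8")))  # True
print(looks_like_utf8("Zürich".encode("cp437")))          # False
```

In code page 437, 'ü' is the single byte 0x81, which cannot start a character in UTF-8, so the second file is rejected.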
After changing the encoding to UTF-8, special characters might be displayed
improperly in the command line window (but are written fine to the TXT file),
because the default raster font does not support the characters.
In this case, select a different font (by clicking on the command line
window's icon → 'Properties'), e.g.
'Consolas' or 'Lucida Console'.
PowerShell
When using Windows PowerShell instead of the standard command line window,
the TXT file will always be encoded using UTF-16 LE, regardless of
which console code page is selected.
PowerShell automatically converts the output of a command line program
from the currently active console code page to the UTF-16 LE representation.
So, PowerShell does not magically preserve special characters.
The command line program can only output characters that can be encoded
using the currently active console code page.
Thus, it is recommended to use the Chcp 65001 solution above
for PowerShell, too.
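The two-step conversion can be modeled in a few lines. The following Python sketch is a simplified simulation, not PowerShell's actual implementation; it assumes that the program replaces unsupported characters with '?' and that the file starts with a UTF-16 LE byte order mark. The function name simulate_powershell_redirect is made up for this example.

```python
import codecs


def simulate_powershell_redirect(program_output: str, console_code_page: str) -> bytes:
    """Model the redirect: console code page first, then UTF-16 LE."""
    # Step 1: the program can only emit what the console code page can encode.
    raw = program_output.encode(console_code_page, errors="replace")
    # Step 2: PowerShell decodes those bytes and writes them as UTF-16 LE.
    return codecs.BOM_UTF16_LE + raw.decode(console_code_page).encode("utf-16-le")


for cp in ("cp437", "utf-8"):
    data = simulate_powershell_redirect("Preis: 5 €", cp)
    print(cp, data.decode("utf-16"))
# cp437 Preis: 5 ?
# utf-8 Preis: 5 €
```

The euro sign is already lost in step 1 when code page 437 is active; the later conversion to UTF-16 LE cannot restore it.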