Foreign Characters for the Eclipse Build System


Having a problem with Eclipse and building files with foreign characters in the file name? If you are developing software, then read and follow this advice:

“Do NOT use foreign characters in file names, paths or for anything else!”

By ‘foreign characters’ I mean things like éöüàäü, or simply anything outside the 7-bit ASCII range, even if the file system of your operating system (e.g. Windows) allows it. Such characters do exist in code pages like Windows-1252, but other code pages put them at different positions, and that is where the trouble starts.

Or in other words: only use these characters for file or directory names:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_

Following that advice will keep you out of a lot of trouble, because many tool chains simply do not handle anything else well. You might be able to use spaces in file names, but to stay on the safe side: don’t use them.
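The rule above can be checked automatically before a file ever reaches the build. The following is a minimal sketch, not part of any Eclipse tooling: the function name `unsafe_components` is made up for illustration, it assumes relative project paths (a drive prefix such as `C:` would be flagged), and it additionally allows a dot as extension separator, which the character list above does not mention.

```python
import re

# Letters, digits and underscore from the rule above.
# Assumption: a dot is also accepted, but only as a separator between
# such runs (e.g. "main.c"), so that file extensions are possible.
SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*")


def unsafe_components(path: str) -> list[str]:
    """Return the path components that break the naming rule."""
    # Split on both separator styles so Windows and Unix paths work.
    components = [c for c in re.split(r"[\\/]", path) if c]
    return [c for c in components if not SAFE_COMPONENT.fullmatch(c)]
```

An empty result means every folder and file name in the path sticks to the safe set; anything returned is a candidate for renaming.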

If you follow that rule, you are fine and you can stop reading this article now :-).

Eclipse and Foreign Characters

You are still reading? :shock: So if you have foreign characters in your file or directory names, then here is how to work around at least some of the Eclipse (and Windows!) issues around it.

Eclipse itself deals pretty well with foreign characters. The issue is with Windows and the command prompt/DOS Shell :-(.

Failed Building

The issue is observed in Eclipse, e.g. in Texas Instruments Code Composer Studio v5. A file name with an umlaut fails the build:

CCS Build and File with Umlaut

Obviously, the ‘Ü’ character of the source file is handled properly by Eclipse, but not by the build (make) system, which runs command line tools on the DOS/cmd level.

The same thing happens with CodeWarrior and ARM gcc: the compiler is using the wrong file name:

GNU gcc build failure

Inspecting the make files shows that things are ok here:

Make File

So something is going wrong with calling make and the compiler. It looks like a wrong character code translation is happening between Eclipse and the command prompt (DOS command line) level, and that the code pages do not match on my machine :-(.

Code Pages

It turns out that it is all about ‘code pages’: how characters beyond ASCII, such as those in Windows-1252, are handled on the DOS/command prompt level on Windows. A Microsoft Tech Note explains code pages here.
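To see why mismatched code pages break a file name, it helps to look at the bytes. The sketch below is only an illustration of the effect, not what Eclipse or make do internally: it writes a name with one code page and reads the same bytes back with another. The helper name `reinterpret` and the file name are made up; `cp1252` and `cp850` are the two code pages that show up in this article.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    name = "Ü.c"  # hypothetical source file name
    # The same character is stored as a different byte in each code page.
    print(name.encode("cp1252"))
    print(name.encode("cp850"))
    # ascii() escapes the result so it prints on any console.
    print(ascii(reinterpret(name, "cp1252", "cp850")))   # mismatch
    print(ascii(reinterpret(name, "cp1252", "cp1252")))  # matching code pages
```

With matching code pages the name survives the round trip. With a mismatch the byte for ‘Ü’ is read as a completely different character, so the tool on the receiving side looks for a file that does not exist.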

How do I find out which code page cmd.exe uses? This excellent article shows that the command chcp (for change code page) shows the active code page:

chcp in the cmd.exe
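If the code page needs to be checked from a script rather than by hand, the output of chcp can be parsed. This is a rough sketch under a few assumptions: the function names are made up, `active_code_page` only works on Windows because it calls cmd.exe, and since the text label printed by chcp depends on the Windows language, only the number at the end of the line is used.

```python
import re
import subprocess
from typing import Optional


def parse_code_page(chcp_output: str) -> Optional[int]:
    """Extract the code page number from the text printed by chcp."""
    # Take the last run of digits; the label in front of it is localized.
    match = re.search(r"(\d+)\D*$", chcp_output.strip())
    return int(match.group(1)) if match else None


def active_code_page() -> Optional[int]:
    """Run chcp through cmd.exe (Windows only) and return the code page."""
    result = subprocess.run(
        ["cmd", "/c", "chcp"], capture_output=True, text=True, check=True
    )
    return parse_code_page(result.stdout)
```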

Eclipse Code Page

But which encoding does Eclipse use? It must be the code page set by the Java environment. I find the setting under the menu Window > Preferences (General > Workspace):

Eclipse Text File Encoding

For me it shows the default Windows code page 1252. The drop-down box lets me change the default code page of Eclipse (for the workspace):

Changing Default Code Page

For CodeWarrior, it is possible to use an Eclipse command line argument to define the code page. This is set in the ‘cwide.ini’ file inside the Eclipse installation folder. While trying to fix my problem, I added this line to it and restarted Eclipse:

-Dfile.encoding=cp850
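To compare what the IDE was told with what cmd.exe reports, the setting can be read back from the ini file. The sketch below is an assumption-laden helper, not a CodeWarrior feature: both function names are made up, the ini text is expected to contain the `-Dfile.encoding=` line exactly as shown above, and Python's codec registry is used only to normalize names such as `Cp1252` and `cp1252`.

```python
import codecs
from typing import Optional

PREFIX = "-Dfile.encoding="


def configured_encoding(ini_text: str) -> Optional[str]:
    """Return the value of the last -Dfile.encoding=... line, or None."""
    value = None
    for line in ini_text.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            value = line[len(PREFIX):]
    return value


def matches_code_page(encoding: str, code_page: int) -> bool:
    """True if the encoding name and the chcp number mean the same code page."""
    # codecs.lookup raises LookupError for names Python does not know.
    return codecs.lookup(encoding).name == codecs.lookup(f"cp{code_page}").name
```

For example, `matches_code_page("cp850", 850)` holds, while an IDE set to `Cp1252` talking to a console on code page 850 does not match.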

I say ‘trying’, because it fixed the error message reported back by the compiler, but the build still failed:

Build still fails with Code Page 850

Well, there must be something more. So I decided to revert my change in the cwide.ini file, and asked around for thoughts and help. And yes, someone came to the rescue and explained what is happening (Sluvy: thank you, thank you, thank you!).

The thing is that GNU make, and even the compiler and linker, internally call their own programs and batch files, invisible to me. These executables are likely using a different code page, which fails the build. But Sluvy has found a fix which requires a Windows registry change :-).

Windows Registry

The trick is to permanently set the code page used by the Windows Command Processor (DOS Shell, cmd.exe) using a small ‘autorun’ command. Whenever something is using the command processor, it will execute my command, which is to set the code page to the same one I’m using in Eclipse.

For this, I run regedit.exe and go to this setting:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Command Processor

Here I use the context menu to add a new Multi-String Value:

Note: In case I already have that value, I do not need to add it, of course :-)

Adding Registry Multi-String Value

I name the new value ‘Autorun’ and assign the command ‘chcp 1252’ to change the code page:

Autorun Value

This is how it should look:

Autorun in registry

:idea: To make that ‘autorun’ command invisible, I can use ‘@chcp 1252>nul’, see this link.
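The regedit steps above could also be scripted with the `reg add` command that ships with Windows. The sketch below only builds the command line and prints it; it does not touch the registry. The function name is made up, and the value type is a parameter on purpose: this article uses a Multi-String value, while the help text of cmd.exe (`cmd /?`) describes the autorun entry as REG_SZ/REG_EXPAND_SZ, so treat the default as an assumption to verify on your own machine.

```python
import subprocess
from typing import List

KEY = r"HKLM\SOFTWARE\Microsoft\Command Processor"


def autorun_command(
    code_page: int, quiet: bool = True, value_type: str = "REG_MULTI_SZ"
) -> List[str]:
    """Build (but do not run) a 'reg add' call that sets the Autorun value."""
    # The quiet form hides the chcp message, as in the tip above.
    data = f"@chcp {code_page}>nul" if quiet else f"chcp {code_page}"
    return ["reg", "add", KEY, "/v", "Autorun", "/t", value_type, "/d", data, "/f"]


if __name__ == "__main__":
    # Show the command line for review instead of executing it.
    print(subprocess.list2cmdline(autorun_command(1252)))
```

Actually running the command means writing to HKEY_LOCAL_MACHINE, which needs an elevated (administrator) prompt, and `/f` overwrites an existing Autorun value without asking. Reviewing the printed line first is the safer route.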

Building with Foreign Characters

Now time to try it out :-).

:idea: After changing the code page, it is advised to rebuild all the make files and do a clean build (menu Project > Clean).

And indeed: my project now compiles properly in Eclipse/CodeWarrior:

Building with Eclipse and Foreign Characters in File Names

Summary

I learned a lot about Windows and code pages. And as always, it looks like the shortcomings of the past (7-bit ASCII code, etc.) echo into our world today, making things fail. Luckily there is a way to overcome this if necessary: if the code page of the Windows command processor (cmd.exe) does not match the code page used by Eclipse, it can be changed with a registry setting.

But it reinforces my rule even more: “Do not use foreign characters in file names”, and simply stick with normal characters and letters. It will avoid a lot of trouble :grin:.

Happy Code Paging :-)
