AspNet 1.1 Char Encoding

revised: Wednesday, December 01, 2004

[Web Developers: ASP.NET Encoding and VisualStudio.NET 2003]

Unicode is a standard that solves the problems of multilingual content and currently supports 96,447 characters from the world's writing systems. It assigns a unique and unambiguous number to every known character. Some may argue that codepages and the dozens of encoding/charset standards (e.g. ISO-8859-1, which represents the ASCII characters 0-127 plus the Latin-1 Supplement) also solved the problem of language-specific characters. But you can't mix different languages, and you're stuck with one codepage or standard at a time. And when dealing with different platforms (e.g. Mac, Windows) you run into problems again.
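The "one codepage at a time" limitation is easy to reproduce. The following is a minimal Python sketch, not part of ASP.NET; the helper name can_encode and the sample text are made up for illustration.

```python
# Sketch: a single-byte charset such as ISO-8859-1 cannot hold mixed-language text.

def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

german = "Grüße"
mixed = "Grüße, こんにちは"  # German and Japanese in one string

print(can_encode(german, "iso-8859-1"))  # German alone fits into Latin-1
print(can_encode(mixed, "iso-8859-1"))   # adding Japanese breaks Latin-1
print(can_encode(mixed, "utf-8"))        # UTF-8 covers both at once
```

Only the Unicode encoding accepts the mixed string; the Latin-1 attempt fails as soon as the first Japanese character appears.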

Here we're only talking about the Web and browsers. For a web page in a Western European language it might be OK to use ISO-8859-1 (or ISO-8859-2 for Central European languages), but you would get the same result by using Unicode, or more precisely the Unicode Transformation Format UTF-8, which stores every ASCII character in a single byte and uses two or more bytes only for the remaining characters (an umlaut such as ä takes two).
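To see how many bytes UTF-8 actually spends per character, here is a small Python sketch; the helper utf8_len and the sample characters are chosen only for illustration.

```python
# Sketch: byte cost of individual characters in UTF-8.

def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 needs to store the given character."""
    return len(ch.encode("utf-8"))

for ch in ["A", "ä", "€", "こ"]:
    print(repr(ch), utf8_len(ch), "byte(s) in UTF-8")

# Pure ASCII text is byte-for-byte identical in UTF-8 and ISO-8859-1.
same_bytes = "Hello".encode("utf-8") == "Hello".encode("iso-8859-1")
print(same_bytes)
```

ASCII characters cost one byte, an umlaut two, and the Euro sign or a Japanese kana three, so mostly-English or mostly-German markup stays close to its single-byte size.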

How ASP.NET and VisualStudio.NET 2003 deal with encoding issues

ASP.NET has a globalization section in Web.config where response and request default encodings are set. This might look like the following:

<globalization 
    requestEncoding="utf-8" 
    responseEncoding="utf-8"     
/>

The good thing is...
ASP.NET will generate an HTTP header according to these settings, so you don't need to add one in code!

You don't have to set an encoding HTTP header at the web server level, either! This is a great advantage that ASP.NET has built in.
Try an HTTP header sniffing utility and you will see that you can switch to any other encoding just by changing the encoding setting in the globalization section.

This is of course a better means than putting a META tag <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> into your HTML code. You may add the tag as well, even though most clients will ignore it when an HTTP header is present. However, there are issues with search engine robots that do seem to obey the setting in the meta tag, even when the HTTP header tells a different story. That means you have to be consistent: know what your Web.config setting is and don't mix encodings! In most cases a page with mixed declarations will render perfectly fine, but it's an inconsistency anyway.
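A consistency check between the HTTP header and the meta tag could look roughly like the Python sketch below. The function names (header_charset, meta_charset, is_consistent) and the simple regular expression are assumptions for illustration; a real page would be better served by a proper HTML parser.

```python
import re
from email.message import Message

# Hypothetical pattern: finds charset=... inside a <meta ...> tag.
META_RE = re.compile(r'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)

def header_charset(content_type: str):
    """Charset named in a Content-Type header value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def meta_charset(html: str):
    """Charset named in the page's meta tag, lower-cased, or None."""
    match = META_RE.search(html)
    return match.group(1).lower() if match else None

def is_consistent(content_type: str, html: str) -> bool:
    """True if the page has no meta charset or it matches the HTTP header."""
    declared = meta_charset(html)
    return declared is None or declared == header_charset(content_type)

page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(is_consistent("text/html; charset=utf-8", page))       # same charset
print(is_consistent("text/html; charset=iso-8859-1", page))  # mixed encodings
```

Running such a check against the header that ASP.NET actually sends is one way to catch a forgotten meta tag after changing Web.config.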

How the encoding of the physically stored files is handled

You can set another attribute in the globalization section that can either mess up your output or deliver it correctly:

fileEncoding="utf-8" [...or something else instead of utf-8...]

The 'fileEncoding' attribute tells ASP.NET which encoding to expect in certain source files! In particular, it specifies the default encoding used when parsing .aspx, .asmx, and .asax files.

If you set this attribute, your files actually have to be saved according to your setting. If NOT, the file content will be misinterpreted, in particular the language-specific characters and symbols such as the Trademark, Registered and Euro signs and many others. Reason: ASP.NET looks up what it has to expect and parses the file accordingly. If the bytes in your source file are stored in another encoding, let's say the Western European Windows codepage 1252 (which of course includes your ä, ü, etc. umlauts), those bytes would be interpreted as utf-8 and therefore wrongly!
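The effect of such a mismatch can be simulated outside ASP.NET. The Python sketch below only illustrates the byte-level problem; it does not reproduce the ASP.NET parser, and the replacement behaviour (errors="replace") is an assumption chosen to make the damage visible.

```python
# Sketch: bytes saved as codepage 1252 but parsed as UTF-8.
source = "äüö ® € ™"

stored = source.encode("cp1252")  # what ends up on disk
misread = stored.decode("utf-8", errors="replace")  # parser expects UTF-8

print(misread)            # the special characters are lost
print(misread == source)  # the parsed text no longer matches the source

# The opposite mismatch: UTF-8 bytes parsed as codepage 1252.
garbled = "äüö".encode("utf-8").decode("cp1252")
print(garbled)  # each umlaut turns into two wrong characters
```

In the first case the single cp1252 bytes are not valid UTF-8 sequences, so every special character is replaced; in the second case each two-byte UTF-8 sequence is shown as two unrelated Latin characters.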

By default VisualStudio.NET 2003 sets the standard encoding according to your system locale, which is (probably) Western European. So don't change the 'fileEncoding' attribute of the globalization element to something different unless you are 100% sure that all your source files are stored the same way. To check what Visual Studio 2003 has set as the default encoding, open any .aspx page and switch to HTML view. You will then see an item in the 'File' menu called 'Advanced Save Options...', which allows you to force VisualStudio.NET 2003 to save your file in the selected encoding.

In most cases there's no need to change this for a file. Keep in mind: it has to be consistent with the fileEncoding attribute of the globalization element! There are exceptions, however: files saved in Unicode OR in 'UTF-8 with signature' will be recognized in any case, even when your fileEncoding attribute is set differently! For all other file formats the globalization setting will have priority, and if it differs from the file's actual encoding, your page might look messy.

Leave the 'fileEncoding' attribute of the globalization element alone unless you have a reason to change it and you're absolutely sure that all your pages and web controls are stored accordingly! If you have kept the default setting, you will not run into problems with the default way VS.NET 2003 stores files.

Unicode or UTF-8 Encoding with signature in VisualStudio.NET 2003
UTF-8 with signature means that the utf-8 encoded file starts with the so-called BOM (Byte Order Mark), which indicates which encoding was used. Even though it is normally not necessary for utf-8 files, it's useful for VS.NET 2003, since only this format (and Unicode) is recognized automatically, even if your globalization setting is the default or something different. If you save a file as utf-8 without signature, ASP.NET will not recognize that it is a UTF-8 file and will try to decode it with the default encoding or the one given in the fileEncoding attribute, which may result in trash characters. If you save a file in Unicode format you will have larger files, but ASP.NET will at least recognize the file format.
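How a signature makes a file self-describing can be sketched in a few lines of Python. This is an illustration of BOM sniffing in general, not the actual ASP.NET or VS.NET logic; the function name sniff_bom is made up, and the sketch assumes that the 'Unicode' save option means UTF-16 little-endian with a BOM.

```python
import codecs

def sniff_bom(data: bytes):
    """Return the encoding implied by a leading byte order mark, or None."""
    if data.startswith(codecs.BOM_UTF8):      # bytes EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    return None

page = "<p>äüö</p>"
with_signature = page.encode("utf-8-sig")  # UTF-8 with signature
without_signature = page.encode("utf-8")   # UTF-8 without signature
as_unicode = codecs.BOM_UTF16_LE + page.encode("utf-16-le")  # assumed 'Unicode'

print(sniff_bom(with_signature))     # recognizable from the first three bytes
print(sniff_bom(without_signature))  # nothing to go on, a default must be used
print(sniff_bom(as_unicode))         # recognizable from the first two bytes
print(len(with_signature), len(as_unicode))  # the UTF-16 copy is larger
```

Without a signature the reader has no hint at all and must fall back to a configured default, which is exactly where a wrong fileEncoding setting does its damage.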

Conclusion
fileEncoding and responseEncoding are two entirely different things!
The first exists on the server side only and is of interest when dealing with files that have been edited and stored in different formats. The other is the way you announce your character encoding to the client, the browser. The client takes this information and decodes the stream of bytes accordingly. Try it with a page that contains a lot of special characters and languages.



こんにちは   (konnichi-wa), Japanese, "Hello; Good afternoon"
®   (Registered Sign)
äüöß   (German Umlaute)
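The sample characters above can also be sent through a simulated response in Python. The sketch below is not ASP.NET code: serve and browse are hypothetical stand-ins for the server and the browser, and both the '?' substitution for unencodable characters and the ISO-8859-1 fallback are assumptions of this sketch.

```python
from email.message import Message

def charset_of(content_type: str) -> str:
    """Charset named in a Content-Type value (assumed fallback: iso-8859-1)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or "iso-8859-1"

def serve(body: str, response_encoding: str):
    """Hypothetical server: announce the charset and encode the body with it."""
    headers = {"Content-Type": f"text/html; charset={response_encoding}"}
    return headers, body.encode(response_encoding, errors="replace")

def browse(headers: dict, payload: bytes) -> str:
    """Hypothetical client: decode the bytes with the announced charset."""
    return payload.decode(charset_of(headers["Content-Type"]), errors="replace")

page = "こんにちは ® äüöß"

print(browse(*serve(page, "utf-8")))       # everything survives
print(browse(*serve(page, "iso-8859-1")))  # the Japanese text is lost
```

With utf-8 the client gets back exactly what the server sent, while ISO-8859-1 keeps the Registered sign and the umlauts but has no way to carry the Japanese characters.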

By the way: if you can't read the Japanese text, you are probably lacking the necessary font support on your system. Fonts must be available on a computer in order to display the sometimes 'complicated looking' foreign characters.
On most Windows XP systems Japanese is installed by default, along with many other languages, but on Windows 2000 systems, for example, it is not. On a Windows system you can check your settings in the Control Panel. On a Win2000 system go to "Regional Settings" and look at the "General" tab. Japanese is probably not selected. After you select it, the font support will be installed and you have to reboot. Then try again.


Cheers, best regards,
Frank
