[Web Developers: ASP.NET Encoding and VisualStudio.NET 2003]
Unicode is a standard that solves the problems of multilingual content; it currently
covers 96,447 characters of the world's writing systems. It assigns a unique
and unambiguous number to every known character. Some may say that code pages
and the dozens of encoding/charset standards (e.g. ISO-8859-1, which represents
the 0-127 ASCII characters plus the Latin-1 Supplement) also solved the problem
of language-specific characters. But you can't mix different languages, and you're
stuck with one code page or standard at a time. And when dealing with different
platforms (e.g. Mac, Windows) you run into problems again.
Here we're just talking about the Web and browsers. For a web page in a Western
European language it might be OK to use ISO-8859-1 or maybe ISO-8859-2, but you
would get the same result by using Unicode, or more precisely the Unicode Transformation
Format UTF-8, which stores the ASCII characters, and thus most of such a page, in
1-byte units.
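To make the byte counts concrete, here is a minimal sketch in Python (chosen only for illustration; it is not part of ASP.NET). The sample characters and the helper name fits are my own assumptions. It shows how many bytes UTF-8 needs per character and that a single legacy charset cannot hold mixed-language text:

```python
# Sketch: how many bytes UTF-8 needs per character, and why one
# legacy charset cannot hold mixed-language text.
samples = ["a", "ä", "®", "€", "こ"]

# Number of bytes each sample character occupies in UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}


def fits(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


mixed = "Grüße, こんにちは"  # German and Japanese in one string

for ch, n in utf8_lengths.items():
    print(ascii(ch), n)  # ascii() keeps the output console-safe
print(fits(mixed, "iso-8859-1"), fits(mixed, "utf-8"))
```

ASCII characters take one byte, the accented Latin-1 characters two, and the Euro sign and Japanese characters three. The mixed string only fits into UTF-8.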
How ASP.NET and VisualStudio.NET 2003 deal with encoding issues
ASP.NET has a globalization section in Web.config where response and request default
encodings are set. This might look like the following:
<globalization
    requestEncoding="utf-8"
    responseEncoding="utf-8"
/>
The good thing is...
ASP.NET will generate an HTTP header according to these settings, so you
don't need to add one in code!
You don't have to set an encoding HTTP header at web server level, either! This
is a great advantage ASP.NET has built in.
Try an HTTP header sniffing utility and you will see that you can switch to
any other encoding just by changing the encoding setting in the globalization
section.
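What such a sniffing utility shows is a Content-Type header carrying a charset parameter. As a rough sketch (Python, with a hypothetical helper name; the header values below are examples, not captured output), the charset can be pulled out of a header value like this:

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Returns the charset in lower case, or None if none is declared.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=utf-8"))
print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/html"))
```

After changing responseEncoding in Web.config, the value reported for your page should change accordingly.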
This is of course a better means than using a META tag <meta http-equiv="Content-Type"
content="text/html; charset=utf-8"> in your HTML code, but you may add one as well,
even though most clients will ignore it in favor of the HTTP header. However, robots
from search engines do seem to obey the setting in the meta tag,
even when the HTTP header tells a different story. That means you have to be consistent:
be aware of what your web.config setting is and don't mix encodings! In most cases
a mismatch will still render perfectly fine, but it's an inconsistency anyway.
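A consistency check of this kind can be automated. The following is only a sketch under simple assumptions: the function names are made up, and the regular expression is a rough match for META tags like the one above, not a full HTML parser:

```python
import re

# Rough pattern for a charset declaration inside a <meta ...> tag.
META_CHARSET = re.compile(
    r"""<meta[^>]*charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)


def meta_charset(html):
    """Return the charset declared in a META tag (lower case), or None."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None


def is_consistent(header_charset, html):
    """True if the page has no META charset or the same one as the header."""
    declared = meta_charset(html)
    return declared is None or declared == header_charset.lower()


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(meta_charset(page))
print(is_consistent("utf-8", page), is_consistent("iso-8859-1", page))
```

Here header_charset stands for whatever responseEncoding makes ASP.NET send; a False result flags exactly the inconsistency described above.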
Handling the format in which files are physically stored
You can set another attribute in the globalization section that might either mess
up your output or deliver it correctly:
fileEncoding="utf-8" [...or something else instead of utf-8...]
The 'fileEncoding' attribute tells ASP.NET which encoding to expect in certain
source files. In particular, it specifies the default encoding for .aspx, .asmx,
and .asax file parsing.
If you set this attribute, your files actually have to be saved according to your setting.
If NOT, the file content will be misinterpreted, in particular the language-specific
characters and symbols such as Trademark, Registered, the Euro currency symbol and
many others. Reason: ASP.NET looks up which encoding to expect
and parses the file accordingly. If the bytes in your source file are stored in
another encoding, let's say Western European Windows code page 1252 (which of
course includes your ä, ü, etc. Umlaute), the bytes would be interpreted as UTF-8
and therefore wrongly!
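The effect is easy to reproduce outside ASP.NET. The sketch below uses Python's decoders, which are an assumption standing in for the ASP.NET parser (its exact handling of invalid bytes may differ), and shows the mismatch in both directions:

```python
text = "ä ü € ®"

saved_as_cp1252 = text.encode("cp1252")  # bytes on disk, code page 1252
saved_as_utf8 = text.encode("utf-8")     # bytes on disk, UTF-8

# Case 1: file is Windows-1252, but the parser expects UTF-8.
try:
    saved_as_cp1252.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False  # the bytes are not valid UTF-8 at all
lenient = saved_as_cp1252.decode("utf-8", errors="replace")

# Case 2: file is UTF-8, but the parser expects Windows-1252.
garbled = saved_as_utf8.decode("cp1252")

print(strict_ok)
print(ascii(lenient))  # contains U+FFFD replacement characters
print(ascii(garbled))  # every special character became two or three others
```

In the second case nothing fails outright; the text just silently turns into sequences like "Ã¤", which is the typical trash you see in the browser.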
By default VisualStudio.NET 2003 saves files in the encoding that matches your
system locale, which is (probably) Western European. So don't change the 'fileEncoding'
attribute to something different unless you are 100% sure that all
your source files are stored the same way. To check which encoding VisualStudio.NET 2003
uses by default, open any .aspx page and switch to HTML view.
You will now see an item in the 'File' menu called 'Advanced Save Options...'.
It lets you force VisualStudio.NET 2003 to save your file in the
selected encoding.
In most cases there's no need to change this for a file. Keep in mind: it has to
be consistent with the fileEncoding attribute of the globalization element! There are exceptions, though:
files saved as Unicode OR as 'UTF-8 with signature' will be recognized
in any case, even when your fileEncoding attribute is set differently! For all other
file formats the fileEncoding attribute will have priority, and if it differs, your
page might look messy.
Leave the 'fileEncoding' attribute alone unless you have a reason to change
it and you're absolutely sure that all your pages and web controls are stored accordingly!
If you keep the default setting, you will not run into problems with the
default way VS.NET 2003 stores files.
Unicode or UTF-8 Encoding with signature in VisualStudio.NET 2003
UTF-8 with signature means that the UTF-8 encoded file starts with the so-called
BOM (Byte Order Mark), which indicates what encoding was used. Even though it is
normally not necessary for UTF-8 files, it's useful with VS.NET 2003, since only this
format (and Unicode) is recognized automatically, even if your globalization setting
is the default or something different. If you save as UTF-8 without signature, ASP.NET
will not recognize that it is a UTF-8 file and will try to decode it with the default
encoding or the one in the fileEncoding attribute, which may result in trash characters.
If you save a file in Unicode format you will have larger files, but ASP.NET will
at least recognize the file format.
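The signature is just a few fixed bytes at the start of the file. As a sketch of the detection logic described here (written in Python with made-up function names; it assumes that 'Unicode' in the save dialog means UTF-16, and it is not the actual ASP.NET code):

```python
import codecs

# Known signatures and the Python codec that decodes and strips them.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8-sig"),   # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16"),  # FE FF
]


def detect_signature(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name
    return None


def decode_source(data, file_encoding):
    """Decode by signature if present, else fall back to file_encoding."""
    return data.decode(detect_signature(data) or file_encoding)


with_sig = "ä€".encode("utf-8-sig")  # UTF-8 with signature
no_sig = "ä€".encode("utf-8")        # UTF-8 without signature

print(decode_source(with_sig, "cp1252") == "ä€")  # signature wins
print(decode_source(no_sig, "cp1252") == "ä€")    # falls back, garbled
print(len("abc".encode("utf-8-sig")), len("abc".encode("utf-16")))
```

The last line also shows the size difference: for plain ASCII text the UTF-16 file is larger than the UTF-8 file with signature.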
Conclusion
fileEncoding and responseEncoding are two entirely different things!
The first lives on the server side and matters when dealing with
files that have been edited and stored in different formats. The other is the way
you announce your character encoding to the client, the browser. The client
takes this information and decodes the stream of bytes accordingly. Try it with a
page that contains a lot of special characters and languages.
こんにちは (konnichi-wa), Japanese, "Hello; Good afternoon"
® (Registered Sign)
äüöß (German Umlaute)
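Using the sample strings above, the two independent steps can be sketched as follows. This is a simplified Python model with a hypothetical render function, not how ASP.NET is implemented: the source file is decoded with the file encoding, and the resulting text is encoded again with the response encoding.

```python
def render(file_bytes, file_encoding, response_encoding):
    """Model of the two steps: parse the source file, then encode the response."""
    text = file_bytes.decode(file_encoding)  # server side (fileEncoding)
    return text.encode(response_encoding)    # on the wire (responseEncoding)


# A page saved in code page 1252 can still be delivered as UTF-8.
latin_page = "äüöß ®".encode("cp1252")
wire_latin = render(latin_page, "cp1252", "utf-8")

# A page saved as UTF-8 with signature, delivered as UTF-8.
japanese_page = "こんにちは".encode("utf-8-sig")
wire_japanese = render(japanese_page, "utf-8-sig", "utf-8")

# The client decodes with the charset announced in the HTTP header.
print(wire_latin.decode("utf-8") == "äüöß ®")
print(wire_japanese.decode("utf-8") == "こんにちは")
```

As long as each step uses the encoding that matches its input, the file format on disk and the encoding sent to the browser do not have to be the same.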
By the way: if you can't read the Japanese text, you are probably lacking the necessary
font support on your system. Fonts must be available on a computer in order to display
the sometimes 'complicated looking' foreign characters.
On most Windows XP systems you will have "Japanese" installed by default, along with
many other languages. On Windows 2000 systems, for example, this is not the case. On a
Windows system you can check your settings in the Control Panel. On a Win2000 system go to "Regional
Settings" and examine the "General" tab. Japanese is probably not set. After setting
it, the font support will be installed and you have to reboot. Then try again.
Cheers, best regards,
Frank