- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Tue, 17 Dec 1996 11:09:40 +0100 (MET)
- To: Klaus Weide <kweide@tezcat.com>
- Cc: Larry Masinter <masinter@parc.xerox.com>, www-international@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
On Mon, 16 Dec 1996, Klaus Weide wrote: > Currently accept-charset is de facto used as an expression of _two_ > capabilities: (1) to decode a character encoding, (2) to be able to > display (or take responsibility for) a certain character repertoire. This is just because current clients don't actually do much of a decoding, they just switch to a font they have available. > The second aspect is not new, is has been around as long as MIME had > a charset parameter. It will be lost when/if everything moves to UTF-8 > (and starts using accept-charset: utf-8 / charset=utf-8). > > Example of a site where documents are provided in several charsets > (all for the same language): > see <URL: http://www.fee.vutbr.cz/htbin/codepage>. The list is impressive. It becomes less impressive if you realize that all (as far as I have checked) the English pages and some of the Check pages (MS Cyrillic/MS Greek/MS Hebrew,...) are just plain ASCII, and don't need a separate URL nor should be labeled as such in the HTTP header. They could add a long list of other encodings, and duplicate their documents, to e.g. serve them in Shift-JIS or so :-). Only the US-ASCII version contains the comment "bez diakritiky" (without diactritics), but that applies to many more pages. > It is certainly much easier to make a Web clients able to decode UTF-8 > to locally available character sets, than to upgrade all client > machines so that they have fonts available to display all of the 10646 > characters. The big problem is not fonts. A single font covering all current ISO 10646 characters can easily be bundeled with a browser. The main problem is display logic, for Arabic and Indic languages in particular. >I assume the former will be done much sooner, and that > use of UTF-8 should be encouraged before all the fonts (or knowledge > to choose culturally correct replacement representations) are > available to everyone. Definitely UTF-8 should be encouraged. But that's not done by introducing new protocol complications and requiring the servers to deal with unpredictable transliteration issues that can be dealt with more easily on the client side. Regards, Martin.
Received on Tuesday, 17 December 1996 02:15:32 UTC