- From: Richard L. Goerwitz <goer@mithra-orinst.uchicago.edu>
- Date: Thu, 2 Feb 1995 20:19:10 +0100
- To: Multiple recipients of list <www-html@www0.cern.ch>
>One would have expected that HTML, from its very early days, would
>have provided the construct
>    <CHARSET="XXX"> ...any_8_bit_characters... </CHARSET>
>where XXX could be Latin-2, Latin-3, Latin-4,...

SGML has no mechanism for doing this, so the word I keep hearing is
that we should strangle HTML with the same restrictions.

The fallacy I often hear uttered is that if we can stuff Unicode into
the MIME header as the charset, then we can avoid the problem of having
to define a CHARSET tag (since Unicode encompasses most national
characters). But this way of thinking is WRONG. Unicode doesn't provide
a mechanism for varying sort order and other things that vary according
to locale and language. To do this, THE UNICODE STANDARD ITSELF SAYS
THAT ADDITIONAL TAGS ARE NECESSARY for this sort of thing.

So although offering Unicode or UTF-8 as a default charset is a good
idea, it does not do away with the need for LANG and CHARSET tags.

Just to do away with one other fallacy: You can't have just LANG or
CHARSET tags. You need both. You can have two different charsets for
a single document (e.g., Shift-JIS and ISO 8859-1), and you can have
two different languages within the same charset (e.g. English and
German for ISO 8859-1; Urdu, Persian, and Arabic for Unicode - they all
use the same Unicode pages).

It may not make sense for all clients to allow all possible
combinations, but this is something they can negotiate with servers.
It is not a reason to cripple HTML.

If I'm misunderstanding the Unicode standards, HTML, or SGML, someone
please let me know. I'm doing my best to keep up :-).

Richard Goerwitz
goer@midway.uchicago.edu
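
A small sketch of the sort-order point made above, assuming Python 3. The
collation tables and the function name sort_for_language are hypothetical
and deliberately simplified (real German and Swedish collation has more
rules than this). It shows that the same ISO 8859-1 bytes sort differently
depending on the language, so the charset alone cannot tell a client how
to order the text:

```python
# Hypothetical, heavily simplified collation rules for illustration only.
# German (dictionary style): umlauts sort like their base vowels.
GERMAN_FOLD = {"ä": "a", "ö": "o", "ü": "u"}
# Swedish: å, ä, ö are separate letters placed after z.
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"


def german_key(word):
    """Sort key that folds umlauts onto their base vowels."""
    return "".join(GERMAN_FOLD.get(ch, ch) for ch in word.lower())


def swedish_key(word):
    """Sort key that ranks each letter by its position in the Swedish alphabet."""
    return [SWEDISH_ORDER.index(ch) for ch in word.lower()]


def sort_for_language(words, lang):
    """Sort words using the (simplified) rules for the language tag given."""
    keys = {"de": german_key, "sv": swedish_key}
    return sorted(words, key=keys[lang])


words = ["zebra", "ähnlich", "adler"]

# The byte sequences are the same whichever language the text is in.
encoded = [w.encode("iso-8859-1") for w in words]

german_sorted = sort_for_language(words, "de")
swedish_sorted = sort_for_language(words, "sv")

print(encoded)
print("de:", german_sorted)
print("sv:", swedish_sorted)
```

Nothing in the encoded bytes says which of the two orderings applies; that
information has to come from a separate language label.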
Received on Thursday, 2 February 1995 11:25:46 UTC