- From: Alan J. Flavell <flavell@a5.ph.gla.ac.uk>
- Date: Tue, 2 Feb 1999 21:57:02 +0000 (GMT)
- To: Charles McCathieNevile <charles@w3.org>
- cc: WAI Guidelines List <w3c-wai-gl@w3.org>
On Tue, 2 Feb 1999, Charles McCathieNevile wrote:

> OK, but this requires that the charset information is correct.

In theory, in HTML the charset and the language are two entirely independent issues. "charset" is a technical matter that relates only to the encoding of coded characters.

There are three valid ways of including characters in an HTML document: coded characters, "numeric character references" (&#number; representation), and named character entities where available. Only one of these three representations, the coded characters, is affected by the "charset". The other two could in theory (and in practice too, if Netscape had been conformant to published specifications) utilise an extensive repertoire of characters in a document whose "charset" was us-ascii, or whatever other charset was convenient to the author, just as they do in conforming browsers.

It would be feasible to transmit, for example, Japanese using solely &#number; representations of the Japanese characters, without any mention of an unusual "charset" in the Content-type header. While I'm not suggesting that this possibility would be attractive to a native Japanese author, it might very well be selected by a non-Japanese author as a more resiliently portable representation when they wished to include some Japanese content in an otherwise Roman-alphabet document.

I'm sorry if this seems pedantic, but there has been far too much confusion in the past when people have muddled up these issues; it would seem a pity to set off down that road again, in spite of the plausible heuristic reasons for wanting to do so.

(And then there's the question of what you would do with a document that contained English text written in Japanese characters, or vice versa.)

best regards
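
To illustrate the point about &#number; references, here is a minimal sketch in Python (not part of the original discussion). It assumes the standard library's "xmlcharrefreplace" error handler and html.unescape as stand-ins for an author's tool and a conforming browser respectively; the function name and the sample text are hypothetical.

```python
import html


def to_ascii_with_references(text: str) -> bytes:
    """Encode text as us-ascii, writing every non-ASCII character
    as a decimal numeric character reference (&#number;)."""
    return text.encode("ascii", errors="xmlcharrefreplace")


# Hypothetical sample: Japanese content inside a Roman-alphabet document.
original = "Hello, 日本語"
wire_bytes = to_ascii_with_references(original)

# Every byte is 7-bit, so a "charset" of us-ascii describes it correctly.
all_seven_bit = all(byte < 128 for byte in wire_bytes)

# A conforming parser resolves the references independently of the charset.
recovered = html.unescape(wire_bytes.decode("ascii"))

print(wire_bytes)   # b'Hello, &#26085;&#26412;&#35486;'
print(recovered == original)
```

Nothing in the encoded bytes says what language the text is in, which is why the charset cannot serve as a reliable indication of language.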
Received on Tuesday, 2 February 1999 16:57:08 UTC