- From: Charles McCathieNevile <charles@w3.org>
- Date: Wed, 3 Feb 1999 12:32:36 -0500 (EST)
- To: A.Flavell@physics.gla.ac.uk
- cc: WAI Guidelines List <w3c-wai-gl@w3.org>
Alan is right. The technique of guessing a language by charset is a 'Bad Idea' (TM?). So we should use LANG to specify the language. Analogously, the HTTP header 'content-language' defines a language for the whole document, not for bits of it.

Where languages are mixed in a document (I haven't seen this in any US-based document. It is much more common in places like Europe, Australia, Asia, and even Canada

Charles

On Tue, 2 Feb 1999, Alan J. Flavell wrote:

On Tue, 2 Feb 1999, Charles McCathieNevile wrote:

> OK, but this requires that the charset information is correct.

In theory, in HTML the charset and the language are two entirely independent issues. "charset" is a technical matter that relates only to the encoding of coded characters.

There are three valid ways of including characters into an HTML document: coded characters, "numerical character references" (&#number; representation), and named character entities where available. Only one of these three representations is affected by the "charset": the others could in theory (and in practice too, if Netscape had been conformant to published specifications) utilise an extensive repertoire of characters in a document whose "charset" was us-ascii, or whatever other charset was convenient to the author, just as it works in conforming browsers.

It would be feasible to transmit, for example, Japanese using solely &#number; representations of the Japanese characters, without any mention of an unusual "charset" in the Content-type header. While I'm not suggesting that this possibility would be attractive to a native Japanese author, it might very well be selected by a non-Japanese author as a more resiliently portable representation when they wished to include some Japanese content into an otherwise Roman-alphabet document.
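
As an illustration of the point about &#number; references, here is a minimal sketch (not part of the original message) of how Japanese text might be carried in a pure us-ascii document, with the language declared separately through the LANG attribute. The sample phrase, the helper name to_numeric_references, and the surrounding markup are all assumptions chosen for the example.

```python
import html

# Hypothetical sample text: the Japanese word for "Japanese language".
japanese = "日本語"


def to_numeric_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#number; reference.

    Uses the standard 'xmlcharrefreplace' error handler, so plain ASCII
    characters pass through unchanged.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_numeric_references(japanese)

# The language is stated with LANG; the charset of the document can stay us-ascii.
fragment = f'<P>Hello <SPAN LANG="ja">{encoded}</SPAN></P>'
print(fragment)

# Decoding the references recovers the original characters.
assert html.unescape(encoded) == japanese
```

The fragment contains only ASCII bytes, so its charset says nothing about the language of the text inside the SPAN; that information comes only from LANG.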
I'm sorry if this seems pedantic, but there has been far too much confusion in the past when people have muddled up these issues; it would seem a pity to set off down that road again, in spite of the plausible heuristic reasons for wanting to do so. (And then there's the question of what you would do with a document that contained English text written in Japanese characters, or vice versa.)

best regards

--Charles McCathieNevile    mailto:charles@w3.org
phone: +1 617 258 0992    http://purl.oclc.org/net/charles
W3C Web Accessibility Initiative    http://www.w3.org/WAI
MIT/LCS - 545 Technology sq., Cambridge MA, 02139, USA
Received on Wednesday, 3 February 1999 12:32:51 UTC