RE: Lang attribute not P1 ? from Charles McCathieNevile on 1999-02-03 (w3c-wai-gl@w3.org from January to March 1999)

From: Charles McCathieNevile <charles@w3.org>
Date: Wed, 3 Feb 1999 12:32:36 -0500 (EST)
To: A.Flavell@physics.gla.ac.uk
cc: WAI Guidelines List <w3c-wai-gl@w3.org>
Message-ID: <Pine.LNX.4.04.9902031229120.22013-100000@tux.w3.org>

Alan is right. The technique of guessing a language by charset is a 'Bad
Idea' (TM?). So we should use LANG to specify the language. Analogously,
the http header 'content-language' defines a language for the whole
document, not for bits of it. Where languages are mixed in a document (I
haven't seen this in any US-based document; it is much more common in
places like Europe, Australia, Asia, and even Canada), LANG is needed on
the individual parts.

Charles

On Tue, 2 Feb 1999, Alan J. Flavell wrote:

  On Tue, 2 Feb 1999, Charles McCathieNevile wrote:
  
  > OK, but this requires that the charset information is correct. 
  
  In theory, in HTML the charset and the language are two entirely
  independent issues.
  
  "charset" is a technical matter that relates only to the encoding of
  coded characters.  There are three valid ways of including characters
  into an HTML document: coded characters, "numerical character
  references" (&#number; representation), and named character entities
  where available. Only one of these three representations is affected
  by the "charset": the others could in theory (and in practice too, if
  Netscape had been conformant to published specifications) utilise an
  extensive repertoire of characters in a document whose "charset" was
  us-ascii, or whatever other charset was convenient to the author, just
  as it works in conforming browsers.
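
A minimal sketch of the three representations, in Python. The word "café" and the variable names are illustrative assumptions, and `html.unescape` from the standard library stands in for a conforming browser's handling of character references. Only the coded-character form depends on knowing the charset.

```python
import html

# 1. Coded character: the byte 0xE9 is only "é" if the charset is known.
coded = b"caf\xe9".decode("iso-8859-1")

# 2. Numeric character reference: pure ASCII, independent of the charset.
numeric = html.unescape("caf&#233;")

# 3. Named character entity: also pure ASCII, where a name is available.
named = html.unescape("caf&eacute;")

# All three yield the same text once the charset of (1) is applied correctly.
same_text = coded == numeric == named

# Decoding the same bytes under a different charset changes form (1) only.
misread = b"caf\xe9".decode("cp437")
```

Here `same_text` is true, while `misread` differs from `coded`, which is the sense in which only one of the three forms is affected by the "charset".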
  
  It would be feasible to transmit, for example, Japanese using solely
  &#number; representations of the Japanese characters, without any
  mention of an unusual "charset" in the Content-type header.  While I'm
  not suggesting that this possibility would be attractive to a native
  Japanese author, it might very well be selected by a non-Japanese
  author as a more resiliently portable representation when they wished
  to include some Japanese content into an otherwise Roman-alphabet
  document.
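
As a rough illustration of the paragraph above, the following Python sketch turns Japanese text into &#number; references so that the markup itself contains only us-ascii bytes. The sample strings and variable names are assumptions made for the example; `xmlcharrefreplace` is a standard Python codec error handler, not something the message prescribes.

```python
import html

# Sample mixed content: Roman-alphabet text with some Japanese in it.
mixed = "Japanese: 日本語"

# Encode to us-ascii, replacing every non-ASCII character with &#number;.
ascii_markup = mixed.encode("ascii", "xmlcharrefreplace")

# The result is plain ASCII bytes, safe to send with any ASCII-compatible
# charset declared in the Content-type header.
is_ascii = all(byte < 128 for byte in ascii_markup)

# A conforming reader resolves the references back to the same characters.
round_trip = html.unescape(ascii_markup.decode("ascii"))
```

With this input, `ascii_markup` is `b'Japanese: &#26085;&#26412;&#35486;'` and `round_trip` equals the original string. Note that nothing in the references says which language the text is in, which is why a separate language marker is still required.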
  
  I'm sorry if this seems pedantic, but there has been far too much
  confusion in the past when people have muddled up these issues;
  it would seem a pity to set off down that road again, in spite of
  the plausible heuristic reasons for wanting to do so.
  
  (And then there's the question of what you would do with a document
  that contained English text written in Japanese characters, or vice
  versa.)
  
  best regards
  

--Charles McCathieNevile            mailto:charles@w3.org
phone: +1 617 258 0992   http://purl.oclc.org/net/charles
W3C Web Accessibility Initiative    http://www.w3.org/WAI
MIT/LCS  -  545 Technology sq., Cambridge MA, 02139,  USA

Received on Wednesday, 3 February 1999 12:32:51 UTC