RE: Lang attribute not P1 ?

On Tue, 2 Feb 1999, Charles McCathieNevile wrote:

> OK, but this requires that the charset information is correct. 

In theory, in HTML the charset and the language are two entirely
independent issues.

"charset" is a technical matter that relates only to the encoding of
coded characters.  There are three valid ways of including characters
in an HTML document: coded characters, "numeric character
references" (the &#number; representation), and named character entities
where available. Only the first of these three representations is affected
by the "charset": the other two could in theory (and in practice too, if
Netscape had conformed to the published specifications) draw on an
extensive repertoire of characters in a document whose "charset" was
us-ascii, or whatever other charset was convenient to the author, just
as they already do in conforming browsers.
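To illustrate the point, here is a small Python sketch. It uses the standard library's html.unescape as a stand-in for a conforming parser. Note that this is an assumption: it follows today's HTML entity table, not the 1999 one. The sketch shows that the three representations of the same character decode to identical text, and that only the coded form depends on the charset:

```python
import html

# Three ways to put U+00E9 (LATIN SMALL LETTER E WITH ACUTE) into an HTML document.
coded = "caf\u00e9"      # coded character: its bytes depend on the document charset
numeric = "caf&#233;"    # numeric character reference (decimal code point)
named = "caf&eacute;"    # named character entity

# After parsing, all three denote the same character sequence.
assert html.unescape(coded) == html.unescape(numeric) == html.unescape(named)

# The reference forms are plain ASCII, so they survive a us-ascii document unchanged.
numeric.encode("ascii")
named.encode("ascii")

# Only the coded form needs a charset that can represent U+00E9.
try:
    coded.encode("ascii")
except UnicodeEncodeError:
    print("coded form needs a charset other than us-ascii")
```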

It would be feasible to transmit, for example, Japanese using solely
&#number; representations of the Japanese characters, without any
mention of an unusual "charset" in the Content-type header.  While I'm
not suggesting that this possibility would be attractive to a native
Japanese author, it might very well be selected by a non-Japanese
author as a more resiliently portable representation when they wished
to include some Japanese content into an otherwise Roman-alphabet
document.
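A minimal sketch of that approach in Python follows. The helper name to_ascii_ncr is hypothetical. The sketch relies on Python's standard "xmlcharrefreplace" codec error handler, which replaces every character outside the target encoding with a decimal &#number; reference. The output is therefore pure us-ascii, whatever the Content-type header says:

```python
import html

def to_ascii_ncr(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with &#number;."""
    # "xmlcharrefreplace" emits &#N; (decimal code point) for unencodable characters.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

japanese = "日本語"
encoded = to_ascii_ncr(japanese)
print(encoded)  # &#26085;&#26412;&#35486;

# A conforming parser recovers the original Japanese characters.
assert html.unescape(encoded) == japanese
```

ASCII text passes through unchanged, so the same call can be applied to a whole Roman-alphabet document containing a few Japanese phrases.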

I'm sorry if this seems pedantic, but there has been far too much
confusion in the past when people have muddled up these issues;
it would seem a pity to set off down that road again, in spite of
the plausible heuristic reasons for wanting to do so.

(And then there's the question of what you would do with a document
that contained English text written in Japanese characters, or vice
versa.)

best regards

Received on Tuesday, 2 February 1999 16:57:08 UTC