Re: charset & language priorities

"Kristi Schultz" <kristis@us.ibm.com> wrote:

> In the HTML spec/recommendations, a different priority order is given for
> language encodings than character sets for user agent
> interpretation....  I'd like to understand why - in particular, I'd like to
> understand the reasoning behind the priority for charset since
> it is (I think) counter-intuitive...  Especially given the priority for
> language encoding.

Those are quite different.  Character encoding must be known prior to
parsing the document, and it must be determined once - it cannot be
changed in the middle of a document.

There is an architectural principle in HTTP/MIME that the information
provided by the server/sender (Content-Type, Content-Encoding, charset
etc.) MUST be considered authoritative and recipients MUST NOT override
that information by sniffing the content.
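
As a rough illustration of that precedence, here is a minimal sketch in
Python. It assumes the recipient already has the raw Content-Type header
value and whatever charset a META declaration in the document claims. The
function name and its arguments are hypothetical, not taken from any
specification or library.

```python
import re
from typing import Optional


def pick_charset(content_type_header: Optional[str],
                 meta_charset: Optional[str]) -> Optional[str]:
    """Return the charset a recipient should use (hypothetical helper).

    The charset parameter sent by the server wins; the in-document META
    declaration is consulted only when the server sent no charset.
    """
    if content_type_header:
        # Look for a charset parameter, e.g. 'text/html; charset=Shift_JIS'.
        match = re.search(r'charset\s*=\s*"?([^\s;"]+)',
                          content_type_header, re.IGNORECASE)
        if match:
            return match.group(1)
    if meta_charset:
        return meta_charset
    # Nothing declared: this sketch leaves any fallback to the caller.
    return None
```

With this sketch, pick_charset("text/html; charset=Shift_JIS", "UTF-8")
picks the server's value, and the META value is used only for a header
such as "text/html" that carries no charset parameter.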

On the other hand, a document may specify different languages for
different parts of a document, thus the innermost language information
takes precedence for that particular sub-tree of a document, but that
doesn't override the language information provided by the parent element
or the HTTP "Content-Language" header for other parts of a document.
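
The language lookup can be sketched the same way. This is only an
illustration under simple assumptions: the Node class, its fields, and the
effective_lang function are hypothetical stand-ins for a parsed document
tree, not a real parser API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical element node with an optional lang attribute."""
    name: str
    lang: Optional[str] = None
    parent: Optional["Node"] = None


def effective_lang(node: Node,
                   content_language: Optional[str] = None) -> Optional[str]:
    """Resolve the language that applies to one element.

    The element's own lang wins, then the closest ancestor's lang, and
    only then the HTTP Content-Language value.
    """
    current: Optional[Node] = node
    while current is not None:
        if current.lang:
            return current.lang
        current = current.parent
    return content_language


# A French quotation inside a paragraph that declares no language itself.
body = Node("body")
paragraph = Node("p", parent=body)
quote = Node("q", lang="fr", parent=paragraph)
```

Here the quotation resolves to "fr" for its own sub-tree, while the
paragraph around it still falls back to the Content-Language value passed
in by the caller.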

> It seems to me that the document writer has more knowledge on the actual
> content than the server does, so is more likely to be
> accurate - hence should be given higher priority....

Since document writers know the actual content, they should tell the
server how to send that resource over the network, so that recipients
can process it reliably.

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Monday, 29 July 2002 03:21:23 UTC