XHTML1: Clarification of Appendix C.9

Dear HTML Working Group,

  Appendix C.9 of the XHTML 1.0 Second Edition Recommendation states:

[...]
  If this is not possible, a document that wants to set its character
  encoding explicitly must include both the XML declaration an encoding
  declaration and a meta http-equiv statement (e.g., <meta
  http-equiv="Content-type" content="text/html; charset=EUC-JP" />). In
  XHTML-conforming user agents, the value of the encoding declaration of
  the XML declaration takes precedence.
[...]

The meaning of the last sentence is not clear to me. As far as I can
see, you have refrained from making any normative statements on how user
agents are expected to process XHTML documents delivered as text/html,
yet the above statement suggests such behavioral expectations. Could you
please clarify to what situation the above informative statement
applies? It also seems to be erroneous: as far as I can tell, XHTML user
agents are expected to ignore such meta http-equiv statements entirely,
so what information has lower precedence than the encoding declaration?
The only information I can reasonably think of would be XML's defaulting
rules, but referring to precedence rules would then be rather confusing.
Also, a number of other things take precedence over the encoding
declaration, such as the byte order mark, MIME type defaulting rules, or
general higher-level encoding information. So, what does this mean
exactly?
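
To make the ordering I have in mind concrete, here is a rough Python
sketch. It reflects only my reading of the XML 1.0 and XML media type
rules, not anything the Recommendation states; the function name
pick_encoding and the reduced list of byte order marks are my own
assumptions, and the sketch only handles UTF-8 and UTF-16 marks.

```python
import re

# Byte order marks recognised by this sketch (UTF-32 is deliberately omitted).
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of an ASCII-compatible byte stream.
_XML_DECL = re.compile(
    rb'''^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']'''
)


def pick_encoding(data, http_charset=None):
    """Return (encoding, source) for an XML document under the assumed order:
    higher-level protocol information, byte order mark, XML encoding
    declaration, then the XML default."""
    if http_charset:
        return http_charset.lower(), "http"
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, "bom"
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower(), "xml-declaration"
    return "utf-8", "default"
```

Note that a meta http-equiv element is never consulted in this sketch,
so the only source ranked below the encoding declaration is the default.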

The section further states:

[...]
  Note: be aware that if a document must include the character encoding
  declaration in a meta http-equiv statement, that document may always
  be interpreted by HTTP servers and/or user agents as being of the
  internet media type defined in that statement.
[...]

I do not quite understand where HTML 4.01 allows HTML 4.01 user agents
to do such a thing. My understanding is that in order to apply such
semantics to such an element, the user agent must already have chosen
to consider the document HTML or XHTML, e.g. because the HTTP server
responded with a corresponding Content-Type header, which is, as far as
I understand, authoritative metadata that clients must not ignore, at
least not without the consent of the user. So do you mean here that
XHTML user agents may consider an XHTML document delivered with an
XHTML media type and containing e.g.

  <meta http-equiv="Content-type" content="text/html" />

to be text/html and thus e.g. show the > as textual content as required
by the HTML 4.01 Recommendation? That seems like a very bad idea, but I
am also not entirely certain where you state this (or prohibit this).

The section continues:

[...]
  If a document is to be served as multiple media types, the HTTP
  server must be used to set the encoding of the document.
[...]

I am not sure what exactly you mean here. It seems that you mean that
if a document is delivered e.g. as outlined in

  http://www.w3.org/2003/01/xhtml-mimetype/content-negotiation

then, regardless of the MIME type, the Content-Type header must have
a charset parameter. That, however, does not make much sense. Could
you please clarify what exactly you had in mind here?
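
For illustration, this is the reading I am guessing at, as a minimal
Python sketch. The function name negotiated_content_type is
hypothetical, and the Accept handling is deliberately naive (it ignores
q-values and wildcards), so it is not a faithful implementation of the
negotiation described at the URL above.

```python
def negotiated_content_type(accept_header, charset="utf-8"):
    """Build a Content-Type value that always carries a charset parameter,
    whichever of the two media types is selected."""
    # Naive parse: strip parameters such as q-values and compare type names.
    offered = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "application/xhtml+xml" in offered:
        media_type = "application/xhtml+xml"
    else:
        media_type = "text/html"
    return f"{media_type}; charset={charset}"
```

Under this reading both variants of the same document would be sent
with an explicit charset, e.g. "text/html; charset=EUC-JP" for a client
that only accepts text/html.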

regards.

Received on Monday, 12 July 2004 00:15:00 UTC