Re: 9 July 2003 draft of "Client handling of MIME headers" available from Roy T. Fielding on 2003-07-09 (www-tag@w3.org from July 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Wed, 9 Jul 2003 22:03:54 +0200
To: "Ian B. Jacobs" <ij@w3.org>
Cc: www-tag@w3.org
Message-Id: <77B0B452-B248-11D7-921F-000393753936@apache.org>

Sorry for the late comments, but the following is incorrect:

    For this reason, servers should only supply a character encoding
    header when there is complete certainty as to the encoding in use.

First, character encoding is a parameter of the media type value that
is provided by the MIME and HTTP Content-Type header field.  There
is no character encoding header.  Second, the charset parameter is
usually supplied by the server if the security checks it places on
the content are dependent upon the configured character encoding
for that content.  This strategy exists because of security flaws in
deployed browsers that allow auto-selection of character encoding
to change the interpretation of certain fields from raw data to
executable content.  So, contrary to this finding, such servers must
provide a default charset parameter to work around security flaws
and, in particular, the boneheaded way that browsers try to
autoselect character encoding, which is not recommended by HTTP.
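
To make the first point concrete, here is a minimal sketch of reading the
charset as a parameter of the Content-Type field value rather than as a
header of its own. It uses Python's standard email.message module; the
function name media_type_and_charset is hypothetical and only for
illustration.

```python
from email.message import Message


def media_type_and_charset(header_value):
    """Split a Content-Type field value into (media type, charset or None).

    The charset is not a separate header: it is a parameter carried
    inside the media type value.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # Both accessors normalize their result to lower case.
    return msg.get_content_type(), msg.get_content_charset()


print(media_type_and_charset("text/html; charset=ISO-8859-1"))
print(media_type_and_charset("application/xml"))
```

When the parameter is absent the second element is None, which is the case
where a server-configured default charset would be filled in.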

    Otherwise, an error will cause a perfectly usable representation
    to be rejected by an architecturally sound client.  Section 7.1
    of [RFC3023] states:

It isn't a perfectly usable representation.  It is a configuration
mismatch between the intended charset and the actual charset.  This
is no different from the error regarding mislabeling the media type --
the correct action is for the client to refuse to render the content
unless the workaround is approved by the user.  Otherwise, the content
will remain mislabeled.
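
A rough sketch of that client behavior, under stated assumptions: the body
is decoded strictly with the labeled charset, and a different charset is
tried only when the user has explicitly approved one. The function name
decode_representation and the user_override parameter are hypothetical.
Strict decoding only catches mismatches that produce invalid byte
sequences, so this is an illustration of the policy, not a complete
detector.

```python
def decode_representation(body, declared_charset, user_override=None):
    """Decode bytes using the labeled charset; refuse on a detectable mismatch.

    user_override is a hypothetical charset that the user has explicitly
    approved as a workaround. Without it, a mismatch is an error.
    """
    charset = user_override or declared_charset
    try:
        return body.decode(charset, errors="strict")
    except (UnicodeDecodeError, LookupError) as exc:
        # Do not guess: report the mislabeling instead of rendering.
        raise ValueError(
            f"content does not match charset {charset!r}"
        ) from exc


body = "caf\u00e9".encode("iso-8859-1")  # actually ISO-8859-1

try:
    decode_representation(body, "utf-8")  # mislabeled as UTF-8
except ValueError as err:
    print("refused:", err)

# Only after the user approves the workaround:
print(decode_representation(body, "utf-8", user_override="iso-8859-1"))
```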

      The use of the charset parameter is STRONGLY RECOMMENDED,
      since this information can be used by XML processors to
      determine authoritatively the charset of the XML MIME entity.

    However, a receiving application can, with very high reliability,
    determine the character encoding of an XML document by reading it

Sorry, that is completely false.  Folks should read the number of
security vulnerabilities caused by such thinking before declaring
that it is the case.  The purpose of the charset parameter is to
reduce the complexity of implementations so that they don't need
to read the content character-by-character to determine the
character encoding.  The only time it should not be provided is
when the content contains multiple character encodings, and even
then there should be a standard way of indicating that as part of
the media type value.
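
As an illustration of the class of vulnerability meant here, the sketch
below decodes the same bytes two ways. UTF-7 is used because it is one
well-known case where an auto-selected encoding turns inert ASCII data
into markup; it is an assumed example, not necessarily the specific flaw
behind any given server configuration. The function name interpretations
is hypothetical.

```python
def interpretations(payload):
    """Return the same bytes decoded as labeled (US-ASCII) and as UTF-7."""
    return {
        "us-ascii": payload.decode("us-ascii"),
        "utf-7": payload.decode("utf-7"),
    }


# Pure ASCII bytes that contain no angle brackets as labeled.
payload = b"+ADw-script+AD4-"

for charset, text in interpretations(payload).items():
    print(charset, "->", text)
```

Read with the labeled charset, the bytes contain nothing a filter for
angle brackets would reject. Read as UTF-7, they become a script start
tag, which is how raw data ends up as executable content when the client
picks the encoding by itself.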

BTW, on a related point, I will note that the W3C working groups
responsible for all of the exceptions requested on this point have
still failed to register their media types with IANA.  I just spent
an hour digging through the W3C site to pick up some of these types
for the Apache configuration file, since I am tired of waiting for
the appropriate authors.  People claiming that the registration
process is slow should be ashamed of themselves -- there are dozens
of new types since the last update with far less applicability and
deployment.  The only organization that seems incapable of
registering deployed types is the W3C.  Whatever the problem is,
it sure as heck isn't the IANA process.

Finally, referring to representation metadata as "MIME headers"
is only applicable to e-mail.  They are called different things
in HTTP and NNTP, even though they share the same field names.
In general, they should be referred to as metadata that defines
how the content is to be interpreted by the recipient.  Where
appropriate, the specific field name "Content-Type" should be
described in quotes, and its value should always be described as
an Internet media type (MIME type is a term that was deprecated
eight years ago).

....Roy
Received on Wednesday, 9 July 2003 16:03:54 UTC