Comments on mime-respect from Martin Duerst on 2003-09-19 (www-tag@w3.org from September 2003)

From: Martin Duerst <duerst@w3.org>
Date: Fri, 19 Sep 2003 16:16:45 -0400
To: WWW-Tag <www-tag@w3.org>
Cc: ietf-xml-mime@imc.org
Message-Id: <4.2.0.58.J.20030919141133.05106ee8@localhost>

These are my comments on http://www.w3.org/2001/tag/doc/mime-respect.html.
Various issues are mixed together a bit, sorry.
[I have cross-posted to ietf-xml-mime@imc.org because some of them are relevant
to the recent discussion about the charset parameter on Content-Type.]

- Headings: Is this a completed finding, or a draft finding?

- "HTTP/1.1 a a response": word duplication

- Overall, it seems difficult to tell what is general architecture
   and what is described that way only because that is how things
   (mostly) are today.

- My understanding is that one origin of the 'charset' parameter was
   that it was useful to invoke different applications for different
   values. That was definitely the case 10 years or so ago when MIME
   was designed. I remember reading my email that way. This has gone away.
   It may happen that in a somewhat similar way, a lot of what we now
   see as different XML types, in need of different applications, may
   go away in a few years.

- Section 4: "The Unicode encoding of a message body (XML document) is
   inconsistent with the value of the charset parameter in the message
   headers."
   - Please replace 'Unicode encoding' with 'character encoding'.
     It would be strange to call e.g. iso-8859-1 a 'Unicode encoding'.
   - Please remove, or reword "XML document", to not give the impression
     that message bodies are always XML documents.
   - I'm not clear why this is in section 4, entitled "Why user agent
     behavior that misrepresents the user is harmful". This is a
     server problem; the user is not in any way misrepresented.

- The big problem with wrong encoding information for XML and other
   documents is not in a server-user context (where the user has
   to be able to read the document, such problems are usually
   discovered very quickly), but with XML sent between machines.
   This probably should be noted.
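
A machine-to-machine receiver has no human reader to notice garbled text,
so the only check available to it is a mechanical comparison. The following
is a minimal sketch of such a check, assuming an ASCII-compatible body and
plain string comparison of encoding names (aliases such as "utf8" vs "utf-8"
are not normalized). The function names are hypothetical and are not taken
from the finding.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of an ASCII-compatible byte stream (assumption: no BOM, no UTF-16).
XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def header_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def declared_encoding(body):
    """Return the lowercased encoding named in the XML declaration, or None."""
    m = XML_DECL.match(body)
    return m.group(1).decode("ascii").lower() if m else None

def is_consistent(content_type, body):
    """True unless both sources name an encoding and the names differ."""
    h = header_charset(content_type)
    d = declared_encoding(body)
    return h is None or d is None or h == d

doc = b'<?xml version="1.0" encoding="UTF-8"?><order/>'
print(is_consistent("application/xml; charset=utf-8", doc))
print(is_consistent("text/xml; charset=iso-8859-1", doc))
```

The second call reports an inconsistency that no user would ever have seen,
because no user is involved in the exchange.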

- The structure of sections 3 and 4 should be improved. It is good
   style to have an introductory paragraph or two before the first subsection.
   It is confusing to have a few paragraphs in the first subsection
   of the section after a lot of text that is not in subsections.

- "For this reason, servers should only supply a character encoding
    header when there is complete certainty as to the encoding in use.
    Otherwise, an error will cause a perfectly usable representation
    to be rejected by an architecturally sound client."

    Why doesn't the document say e.g. that a MIME type should only be
    supplied when there is complete certainty that this type is
    appropriate? Why does this text assume that the XML is 'perfectly
    usable'? It might not be valid, it might have the wrong MIME type,
    or it might not have the right 'encoding' attribute.

- "Servers which generate representations MUST NOT generate the charset
    parameter unless there is certainty that the headers are correct.
    When correct, this information can be used by non-XML processors
    to determine authoritatively the character encoding of the XML MIME
    entity."

    How is a server ever going to know, or going to be able to check,
    what the right character encoding is? Making this a requirement
    on the server itself seems inadequate.
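
To illustrate why a server cannot reach certainty by inspection, here is a
small sketch of the only test that is cheap to run: does the byte stream
decode under the claimed encoding? The helper name is hypothetical. The
point is that iso-8859-1 assigns a character to every byte value, so the
test can never reject that label, while it can reject a wrong utf-8 label.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """True if data decodes under the given encoding without errors."""
    try:
        data.decode(encoding)  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False

utf8_bytes = "Zürich".encode("utf-8")
latin1_bytes = "Zürich".encode("iso-8859-1")

# A wrong utf-8 label is detectable: 0xFC is not a valid UTF-8 start byte.
print(decodes_cleanly(latin1_bytes, "utf-8"))       # False

# A wrong iso-8859-1 label is not detectable: every byte is a valid character.
print(decodes_cleanly(utf8_bytes, "iso-8859-1"))    # True
print(utf8_bytes.decode("iso-8859-1"))              # ZÃ¼rich
```

A successful decode therefore shows only that a label is possible, not that
it is right.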

- Section 5: "For instance, the http-equiv attribute of the HTML meta
   element is intended for servers (not clients)."
   Please change 'is' to 'was'. In particular with respect to character
   encoding, current practice is that it's used on the client. If you
   think that this should change, you should say so.

- SMIL 2.0 is "outmoded": I would prefer a different word here.
   I strongly agree that what SMIL 2.0 is saying on content types
   is a very bad idea, and I have said so to the SMIL WG (and more
   recently the Voice Browser WG, I think). But given the 2001
   date, I don't think 'outmoded' is the right word, because it was
   never in fashion in the first place.

- Section 6: There is advice to server managers and authors. But
   I think we need to go one more step back, to server implementers
   and the default settings when servers are shipped.
   For example, some servers have an easy way to explore configurations
   and check settings. Others don't. Some servers come with default
   configurations that may be suboptimal. For example (not to pick on
   it; it is simply the one I know), Apache at
   http://httpd.apache.org/docs-2.0/en/mod/core.html#adddefaultcharset
   says: "AddDefaultCharset On enables Apache's internal default charset
   of iso-8859-1 as required by the directive."
   Also, the default configuration file contains this:
    #
    # Specify a default charset for all pages sent out. This is
    # always a good idea and opens the door for future internationalisation
    # of your web site, should you ever want it. Specifying it as
    # a default does little harm; as the standard dictates that a page
    # is in iso-8859-1 (latin1) unless specified otherwise i.e. you
    # are merely stating the obvious. There are also some security
    # reasons in browsers, related to javascript and URL parsing
    # which encourage you to always set a default char set.
    #
    AddDefaultCharset ISO-8859-1

   This seems to be 180 degrees opposite to what the TAG is saying.
   It is more about text/html,... than about application/...+xml, but
   there is considerable potential for harm here, too, in particular
   when combined with the default setting that Apache comes with that
   does not allow people managing a directory to override file info.
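
The potential for harm can be sketched as follows, assuming a client that
treats the HTTP charset parameter as overriding any declaration inside the
document. The function effective_charset and the sample page are
hypothetical; the server default is modeled simply as a header value that
the author of the page did not choose.

```python
import re

# Finds a charset value inside an HTML meta element (simplified pattern).
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.I)

def effective_charset(header_charset, body):
    """Assumed client rule: the header charset, if present, wins over meta."""
    if header_charset:
        return header_charset.lower()
    m = META_CHARSET.search(body)
    return m.group(1).decode("ascii").lower() if m else None

# A page whose author declared utf-8 in the document itself.
page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8"></head>'
        '<body>日本語</body></html>').encode("utf-8")

without_default = effective_charset(None, page)          # author's choice applies
with_default = effective_charset("ISO-8859-1", page)     # server default applies

print(without_default, "日本語" in page.decode(without_default))
print(with_default, "日本語" in page.decode(with_default))
```

Under this assumption, an author who cannot override the server setting has
no way to make the in-document declaration take effect.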


Regards,     Martin.
Received on Friday, 19 September 2003 16:29:59 UTC