- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 19 Sep 2003 16:16:45 -0400
- To: WWW-Tag <www-tag@w3.org>
- Cc: ietf-xml-mime@imc.org
These are my comments on http://www.w3.org/2001/tag/doc/mime-respect.html, various issues mixed a bit, sorry.

[I have cross-posted ietf-xml-mime@imc.org because some of them are relevant to the recent discussion about the charset parameter on Content-Type.]

- Headings: Is this a completed finding, or a draft finding?

- "HTTP/1.1 a a response": word duplication

- Overall, it seems difficult to identify what is general architecture, and what is the way it is only because that is how things (mostly) are today.

- My understanding is that one origin of the 'charset' parameter was that it was useful to invoke different applications for different values. That was definitely the case 10 years or so ago when MIME was designed. I remember reading my email that way. This has gone away. It may happen that, in a somewhat similar way, a lot of what we now see as different XML types, in need of different applications, may go away in a few years.

- Section 4: "The Unicode encoding of a message body (XML document) is inconsistent with the value of the charset parameter in the message headers."
  - Please replace 'Unicode encoding' with 'character encoding'. It would be strange to e.g. call iso-8859-1 a 'Unicode encoding'.
  - Please remove, or reword, "XML document", so as not to give the impression that message bodies are always XML documents.
  - I'm not clear why this is in section 4, entitled "Why user agent behavior that misrepresents the user is harmful". This is a server problem; the user is not in any way misrepresented.
  - The big problem with wrong encoding information for XML and other documents is not in a server-user context (where the user has to be able to read the document, so such problems are usually discovered very quickly), but with XML sent between machines. This probably should be noted.

- The structure of sections 3 and 4 should be improved. It is good style to have an introductory paragraph or two before the first subsection.
  It is confusing to have a few paragraphs in the first subsection of the section after a lot of text that is not in subsections.

- "For this reason, servers should only supply a character encoding header when there is complete certainty as to the encoding in use. Otherwise, an error will cause a perfectly usable representation to be rejected by an architecturally sound client."
  Why doesn't the document say e.g. that a mime type should only be supplied when there is complete certainty that this type is appropriate? Why does this text assume that the XML is 'perfectly usable'? It might not be valid, it might be the wrong mime type, or it might not have the right 'encoding' attribute.

- "Servers which generate representations MUST NOT generate the charset parameter unless there is certainty that the headers are correct. When correct, this information can be used by non-XML processors to determine authoritatively the character encoding of the XML MIME entity."
  How is a server ever going to know, or going to be able to check, what the right character encoding is? Making this a requirement on the server itself seems inadequate.

- Section 5: "For instance, the http-equiv attribute of the HTML meta element is intended for servers (not clients)."
  Please change 'is' to 'was'. In particular with respect to character encoding, current practice is that it's used on the client. If you think that this should change, you should say so.

- SMIL 2.0 is "outmoded": I would prefer a different word here. I strongly agree that what SMIL 2.0 is saying on content types is a very bad idea, and I have said so to the SMIL WG (and more recently the Voice browser WG, I think). But given the 2001 date, I don't think 'outmoded' is the right word, because it was never in fashion in the first place.

- Section 6: There is advice to server managers and authors. But I think we need to go one more step back, to server implementers and the default settings when servers are shipped.
  For example, some servers have an easy way to explore configurations and check settings. Others don't. Some servers come with default configurations that may be suboptimal. For example (not picking on it, just because that's the one I know), Apache at http://httpd.apache.org/docs-2.0/en/mod/core.html#adddefaultcharset says:

    "AddDefaultCharset On enables Apache's internal default charset of iso-8859-1 as required by the directive."

  Also, the default configuration file contains this:

    #
    # Specify a default charset for all pages sent out. This is
    # always a good idea and opens the door for future internationalisation
    # of your web site, should you ever want it. Specifying it as
    # a default does little harm; as the standard dictates that a page
    # is in iso-8859-1 (latin1) unless specified otherwise i.e. you
    # are merely stating the obvious. There are also some security
    # reasons in browsers, related to javascript and URL parsing
    # which encourage you to always set a default char set.
    #
    AddDefaultCharset ISO-8859-1

  This seems to be 180 degrees opposite to what the TAG is saying. It is more about text/html,... than about application/...+xml, but there is considerable potential for harm here, too, in particular when combined with the default setting that Apache comes with that does not allow people managing a directory to override file info.

Regards,    Martin.
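
As an illustration of the inconsistency discussed under Section 4, and of what a server-wide default such as the one above can produce, here is a minimal sketch in Python. It is only an assumption about how such a check might look, not something taken from the finding or from Apache: the function names are hypothetical, and the XML declaration is sniffed with a simple pattern that only works for ASCII-compatible encodings.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Hypothetical helper: matches an XML declaration at the very start of the
# body and captures the value of its encoding pseudo-attribute.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param("charset")
    return value.lower() if isinstance(value, str) else None


def declared_encoding(body: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if there is one."""
    match = _XML_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def _canonical(name: str) -> str:
    """Map aliases (latin1, iso-8859-1, ...) to one name where known."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name


def charsets_agree(content_type: str, body: bytes) -> bool:
    """True unless header and XML declaration name different encodings."""
    from_header = header_charset(content_type)
    from_body = declared_encoding(body)
    if from_header is None or from_body is None:
        return True  # only one source of information, nothing to compare
    return _canonical(from_header) == _canonical(from_body)


if __name__ == "__main__":
    body = '<?xml version="1.0" encoding="UTF-8"?><p>\u00e9</p>'.encode("utf-8")
    # A UTF-8 document served with a blanket ISO-8859-1 default:
    print(charsets_agree("application/xhtml+xml; charset=ISO-8859-1", body))
    # The same document served without any charset parameter:
    print(charsets_agree("application/xhtml+xml", body))
```

The sketch treats a missing charset parameter as agreement, which mirrors the point that supplying no charset is safer than supplying one that may be wrong.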
Received on Friday, 19 September 2003 16:29:59 UTC