- From: Chris Lilley <chris@w3.org>
- Date: Mon, 1 Nov 2004 22:02:40 +0100
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- Cc: www-svg@w3.org, ietf-types@iana.org
On Monday, November 1, 2004, 9:46:15 PM, Boris wrote:

BZ> Chris Lilley wrote:

>> On the contrary! The +xml convention clearly indicates, for an unknown
>> media type, that it is xml; thus, that an XML processor should be used;
>> which will correctly determine the encoding from the xml encoding
>> declaration or lack thereof.

BZ> I think the concern was about what happens when someone sends the
BZ> following HTTP header:

BZ>   Content-Type: image/svg+xml; charset=iso-8859-1

BZ> combined with an XML document that has no encoding declaration (so
BZ> defaulting to UTF-8).

That is (for a random +xml media type) currently allowed. It is, as you
say, a problem.

(It defaults to UTF-8 or UTF-16 depending on the presence or absence of a
BOM and, if present, what bytes represent it).

BZ> Now per the type registration for "image/svg+xml", the above
BZ> Content-Type header is invalid, right?

Yes. Instead of an optional parameter which should not be used and, if
used, causes problems, the proposal is to not have the parameter.

BZ> So what's a UA to do? What encoding to use? Under which rules?

Currently, that is a malformed document that has been temporarily made
well formed while in transit. If saved, it needs to be rewritten (some
implementations do this, most do not). Note that if the document used
DSig, that would actually break it.

BZ> Using UTF-8 means hardcoding knowledge about the fact
BZ> that image/svg+xml, unlike most other character-based types used today,
BZ> doesn't have a charset parameter.

No, it doesn't. This is not specific to SVG; it could (and should) be
adopted by any non-text +xml registration.

>> No, they would not. RFC 3023 already allows the charset to be omitted,
>> and gives rules to follow for this case. SVG follows those rules, as the
>> registration document makes plain.

BZ> The problems arise when there IS a charset parameter.

Exactly. The code path for when there isn't one is well implemented and
interoperable, today.
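The two code paths discussed above (no charset parameter, versus a charset parameter that disagrees with the document) can be sketched roughly as follows. This is an illustrative Python sketch only, not taken from any UA: the helper names `sniff_xml_encoding` and `header_charset` are hypothetical, and the detection is a simplification of the autodetection notes in XML 1.0 Appendix F (it assumes an ASCII-compatible encoding when there is no BOM, and ignores UTF-32 and BOM-less UTF-16).

```python
import re
from email.message import Message

# Matches an XML declaration's encoding pseudo-attribute at the very start
# of the entity. Only works for ASCII-compatible encodings (an assumption).
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding from the document alone (charset omitted)."""
    if data.startswith(b"\xef\xbb\xbf"):   # UTF-8 BOM
        return "utf-8"
    if data.startswith(b"\xfe\xff"):       # UTF-16 big-endian BOM
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):       # UTF-16 little-endian BOM
        return "utf-16-le"
    match = _DECL.match(data)
    if match:                              # explicit encoding declaration
        return match.group(1).decode("ascii").lower()
    return "utf-8"                         # no BOM, no declaration


def header_charset(content_type: str):
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_param("charset")


# The case Boris describes: the header says one thing, the document another.
media_type, charset = header_charset("image/svg+xml; charset=iso-8859-1")
document = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
conflict = charset is not None and charset != sniff_xml_encoding(document)
```

Here `conflict` is true: the header claims iso-8859-1 while the document itself, having neither a BOM nor an encoding declaration, defaults to UTF-8. With no charset parameter, only `sniff_xml_encoding` is consulted and no conflict can arise.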

BZ> I don't think
BZ> anyone ever claimed there is a problem when the charset parameter is
BZ> omitted.

Correct. There is no problem when it's omitted, for SVG or for anything
else.

>> In general, a representation provider SHOULD NOT specify the
>> character encoding for XML data in protocol headers since the data is
>> self-describing

BZ> Given that this is not a MUST NOT,

It's a SHOULD NOT, because for text/* you have to unless your data is
guaranteed to always be US-ASCII (and even then, it is required to fall
back to text/plain; charset=us-ascii) and because it was not desired to
force a change on legacy formats, just to stop the problem spreading to
new formats.

BZ> people will continue to do this in
BZ> some cases (particularly as some web servers automatically tack on a
BZ> "charset" parameter to Content-Type headers).

Which leads us to

  Server software designers SHOULD NOT specify a default Internet media
  type in the default configuration shipped with the server.
  http://www.w3.org/2001/tag/doc/mime-respect.html#self-describing

Some web servers do that, agreed. Including the W3C one. It's wrong, and
it causes pain. Some of that is because of the requirements of the text/*
media type tree. For XML, that is being dealt with in the RFC 3023
revision by deprecating text/xml.

-- 
Chris Lilley mailto:chris@w3.org
Chair, W3C SVG Working Group
Member, W3C Technical Architecture Group

Received on Monday, 1 November 2004 21:02:40 UTC