- From: Chris Lilley <chris@w3.org>
- Date: Thu, 7 Oct 2004 16:44:24 +0200
- To: MURATA Makoto <EB2M-MRT@asahi-net.or.jp>(FAMILY Given), Dan Kohn <dan@dankohn.com>
- Cc: MURATA Makoto <eb2m-mrt@asahi-net.or.jp>, www-tag@w3.org
Hello all,

In the approved TAG finding

  Internet Media Type registration, consistency of use
  TAG Finding 3 June 2002 (Revised 4 September 2002)

a specific criticism of RFC 3023 is raised

  3. Consistency in Communicating Character Encoding
  http://www.w3.org/2001/tag/2002/0129-mime#char-encoding

and the conclusion is

>> Thus there is no ambiguity when the charset is omitted, and the
>> STRONGLY RECOMMENDED injunction to use the charset is misplaced for
>> application/xml and for non-text "+xml" types. Consequently, for XML
>> representations, server-side applications SHOULD only supply a
>> charset header when there is complete certainty as to the encoding in
>> use. Otherwise, an error will cause a perfectly usable representation
>> to be rejected by an architecturally sound client.

>> We recommend that section 7.1 of [RFC3023] be amended to something
>> like the following:

>> The use of the charset parameter, when the charset is reliably known
>> and agrees with the encoding declaration, is RECOMMENDED, since this
>> information can be used by non-XML processors to determine
>> authoritatively the charset of the XML MIME entity.

This is further backed up by another approved TAG finding

  Authoritative Metadata
  TAG Finding 25 February 2004
  4.2 Self-describing data and Risk of Inconsistency
  http://www.w3.org/2001/tag/doc/mime-respect.html#self-describing

>> Representation providers SHOULD NOT in general specify the character
>> encoding for XML data in protocol headers since the data is
>> self-describing.

However, the registration for application/xml still says

> Although listed as an optional parameter, the use of the charset
> parameter is STRONGLY RECOMMENDED, since this information can be used
> by XML processors to determine authoritatively the charset of the XML
> MIME entity. The charset parameter can also be used to provide
> protocol-specific operations, such as charset-based content
> negotiation in HTTP.
Since RFC 3023 was published, it has become clear that the +xml convention has taken off. One consequence is that a transcoding proxy can reliably distinguish XML from non-XML media types when it meets an unknown media type. Thus, it can know to either

  a) leave it alone, or
  b) transcode to another charset, at the same time fixing up the XML
     encoding declaration

in the same way that it knows not to transcode, say, an image/gif from Latin-1 to Shift-JIS.

Thus the generality argument (we want all encoding handled in the same way) can be applied to all the +xml types. Coupled with the deprecation of the text/xml and text/xml-external-parsed-entity types (and thus insulation from the particular encoding restrictions of text/*), we are now, in this revision of the document, in a position to be a little stronger:

  The encoding declaration in an XML document and the charset (if
  provided) MUST be consistent.

This removes the requirement on all XML tools, from wget on up, to rewrite XML instances when saving to a local filestore so that they are well formed. Instead, no rewriting is required.

In consequence, the wording on the optional charset parameter should be changed from STRONGLY RECOMMENDED. The main value of a charset parameter is as a duplicate copy of the encoding in use: for use by non-XML processors (full text search engines? content management systems?) and for use in content negotiation.

Thus, I would like to see language in the specification that removes the idea of charset as an override to the XML encoding declaration, and instead talks of charset as an optional parameter that may have certain uses and, if provided, MUST be consistent with the encoding declared by the instance (BOM, encoding declaration, or absence thereof).

I am of course happy to propose specific text, but wanted reactions to this first, to ensure all the editors are in agreement as to how to proceed. Due to the interplay between the draft and the two TAG findings, I have copied this to www-tag.
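To make the proposed consistency rule concrete, here is a rough sketch in Python of how a receiver might check a charset parameter against what the instance itself declares. It is only an illustration under stated assumptions, not proposed specification text: the function names (declared_encoding, charset_is_consistent) are hypothetical, only UTF-8 and UTF-16 BOMs are recognised, BOM-less UTF-16 and UTF-32 are ignored, and encoding names are compared through Python's codec alias table rather than any registry named in the draft.

```python
import codecs
import re
from typing import Optional

# Assumption: only these BOMs are recognised; UTF-32 is deliberately left out.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]

# Matches an encoding pseudo-attribute inside an XML declaration at the very
# start of an ASCII-compatible byte stream.
_DECL = re.compile(
    rb'<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def declared_encoding(body: bytes) -> str:
    """Return the encoding the instance declares: BOM, then declaration, else UTF-8."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # Absence of both BOM and encoding declaration means UTF-8.
    return "utf-8"


def _canonical(name: str) -> str:
    """Normalise aliases (e.g. UTF8, utf-8) using Python's codec registry."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.strip().lower()


def charset_is_consistent(charset: Optional[str], body: bytes) -> bool:
    """True when no charset is supplied, or when it agrees with the instance."""
    if charset is None:
        return True
    return _canonical(charset) == _canonical(declared_encoding(body))
```

Under this sketch an omitted charset is always acceptable, which matches the view that the instance is self-describing; a supplied charset is treated as a duplicate copy that must agree with the instance, never as an override.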
-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group
Received on Thursday, 7 October 2004 14:44:24 UTC