Re: SVG12: charset parameter for image/svg+xml

On Monday, November 1, 2004, 9:46:15 PM, Boris wrote:

BZ> Chris Lilley wrote:
>> On the contrary! The +xml convention clearly indicates, for an unknown
>> media type, that it is xml; thus, that an XML processor should be used;
>> which will correctly determine the encoding from the xml encoding
>> declaration or lack thereof.

BZ> I think the concern was about what happens when someone sends the 
BZ> following HTTP header:

BZ>    Content-Type:  image/svg+xml; charset=iso-8859-1

BZ> combined with an XML document that has no encoding declaration (so 
BZ> defaulting to UTF-8).

That is (for a random +xml media type) currently allowed. It is, as you
say, a problem. (Without the header, the document defaults to UTF-8 or
UTF-16 depending on the presence or absence of a BOM and, if present,
what bytes represent it.)
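
To make that defaulting concrete, here is a minimal sketch in Python of
what an XML processor does when there is no external charset. It is a
simplification, not a full implementation of the detection described in
XML 1.0 Appendix F: the function name sniff_xml_encoding is made up for
illustration, and UTF-32 and EBCDIC are deliberately ignored.

```python
import codecs
import re

# Matches an XML declaration carrying an encoding pseudo-attribute.
# Only usable when the document starts in an ASCII-compatible encoding.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity with no external charset.

    Hypothetical helper: checks the BOM first, then the encoding
    declaration, then falls back to the UTF-8 default.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: the document is UTF-8 by default.
    return "utf-8"
```

With no BOM and no declaration this returns "utf-8", which is exactly
the case that a charset=iso-8859-1 header then contradicts.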

BZ> Now per the type registration for "image/svg+xml", the above 
BZ> Content-Type header is invalid, right?

Yes. Instead of an optional parameter which should not be used and, if
used, causes problems, the proposal is to not have the parameter at all.

BZ>  So what's a UA to do?  What encoding to use?

Under which rules? Currently, that is a document which is not well
formed, but which has been temporarily made well formed while in
transit. If saved, it needs to be rewritten (some implementations do
this, most do not). Note that if the document used DSig, rewriting it
would actually break the signature.
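
A small Python sketch of the "well formed only in transit" case,
assuming a body sent as ISO-8859-1 with no encoding declaration. The
helper names is_utf8 and rewrite_for_saving are hypothetical, and
re-encoding to UTF-8 is just one way an implementation might rewrite
the document on save (adding an encoding declaration would be another).

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def rewrite_for_saving(body: bytes, transport_charset: str) -> bytes:
    """Re-encode a body so it stands alone once the header is gone.

    Assumes the body has no encoding declaration of its own, so the
    saved copy must be in the XML default (UTF-8 here).
    """
    text = body.decode(transport_charset)
    return text.encode("utf-8")

# No encoding declaration, but the bytes are ISO-8859-1.
body = "<svg><title>caf\u00e9</title></svg>".encode("iso-8859-1")

on_disk_as_sent = is_utf8(body)  # False: not readable as the default
saved = rewrite_for_saving(body, "iso-8859-1")
on_disk_rewritten = is_utf8(saved)  # True: now self-describing
```

The rewritten bytes differ from the bytes that were sent, which is why
a signature computed over the original would no longer verify.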

BZ>  Using UTF-8 means hardcoding knowledge about the fact
BZ> that image/svg+xml, unlike most other character-based types used today,
BZ> doesn't have a charset parameter.

No, it doesn't. This is not specific to SVG; it could (and should) be
adopted by any non-text +xml registration.
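
To illustrate why no SVG-specific knowledge is needed, here is a sketch
in Python that classifies a Content-Type value using only the +xml
suffix and the top-level type. The function name classify and the
returned keys are made up for this example; it only reports what was
sent and does not decide which encoding wins.

```python
from email.message import Message

def classify(content_type_value: str) -> dict:
    """Describe a Content-Type value without a per-type lookup table."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    media_type = msg.get_content_type()      # e.g. "image/svg+xml"
    charset = msg.get_param("charset")       # None when absent
    is_xml = media_type.endswith("+xml")
    is_text_tree = msg.get_content_maintype() == "text"
    return {
        "media_type": media_type,
        "charset": charset,
        # Any +xml type outside text/* is self-describing XML.
        "non_text_xml": is_xml and not is_text_tree,
    }

info = classify("image/svg+xml; charset=iso-8859-1")
```

Here info["non_text_xml"] is true and info["charset"] is present, which
is the combination the registration rules out. The same check applies
unchanged to any other non-text +xml type.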

>> No, they would not. RFC 3023 already allows the charset to be omitted,
>> and gives rules to follow for this case. SVG follows those rules, as the
>> registration document makes plain.

BZ> The problems arise when there IS a charset parameter.

Exactly. The code path for when there isn't one is well implemented and
interoperable, today.

BZ>   I don't think
BZ> anyone ever claimed there is a problem when the charset parameter is
BZ> omitted.

Correct. There is no problem when it's omitted, for SVG or for anything
else.

>>    In general, a representation provider SHOULD NOT specify the
>>    character encoding for XML data in protocol headers since the data is
>>    self-describing

BZ> Given that this is not a MUST NOT,

It's a SHOULD NOT because for text/* you have to, unless your data is
guaranteed to always be US-ASCII (and even then, it is required to fall
back to text/plain; charset=us-ascii), and because it was not desired to
force a change on legacy formats, just to stop the problem spreading to
new formats.

BZ> people will continue to do this in
BZ> some cases (particularly as some web servers automatically tack on a
BZ> "charset" parameter to Content-Type headers).

Which leads us to

  Server software designers SHOULD NOT specify a default Internet media
  type in the default configuration shipped with the server.
  http://www.w3.org/2001/tag/doc/mime-respect.html#self-describing


Some web servers do that, agreed. Including the W3C one. It's wrong, and
it causes pain.

Some of that is because of the requirements of the text/* media type
tree. For XML, that is being dealt with in the RFC 3023 revision by
deprecating text/xml.

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group

Received on Monday, 1 November 2004 21:02:40 UTC