Recent spec change to XMLHttpRequest default Content-Type from Carsten Orthbandt on 2007-06-15 (public-webapi@w3.org from June 2007)

From: Carsten Orthbandt <carsten@pixeltamer.net>
Date: Fri, 15 Jun 2007 07:58:30 +0200
To: public-webapi@w3.org
Message-ID: <46722A86.4090705@pixeltamer.net>

Hi!

I am writing to this list specifically to discuss this issue. I'm by no
means a web standards expert, so if I'm simply reading the specs wrong,
please let me know.

Executive summary:

The current specs clearly say that XMLHttpRequest responses without a
Content-Type header are to be treated as text/plain, not XML. There's
a change underway that says that such content should be treated as XML.
I think this is not a good idea.

Detail:

In the discussion of bug 384298 in the upcoming Firefox 3
(https://bugzilla.mozilla.org/show_bug.cgi?id=384298)
Anne van Kesteren pointed out a very recent spec change in the way
XMLHttpRequest is supposed to interpret responses that do not specify
a Content-Type header.

The currently published version at http://www.w3.org/TR/XMLHttpRequest/
says in section 2.1 (responseXML)

- that the response is only to be treated as XML if Content-Type is
"either text/xml, application/xml, or ends in +xml"
- "If Content-Type did not contain such a media type, or if the document
could not be parsed (due to an XML namespace well-formedness error or
unsupported character encoding, for instance), it must be null"

The upcoming version at
http://dev.w3.org/cvsweb/~checkout~/2006/webapi/XMLHttpRequest/Overview.html?content-type=text/html;%20charset=utf-8#dfn-responsexml
changed this to:

"If there is no Content-Type header or there is a Content-Type header
which contains a MIME type that is text/xml, application/xml or ends in
+xml (ignoring any parameters) use the rules set forth in the XML specification
to determine the character encoding. Let charset be the determined character
encoding."

and

"If a Content-Type is present and it does not contain a MIME type (ignoring
any parameters) that is text/xml, application/xml or ends in +xml terminate
these steps and return null. (Do not terminate these steps if there is no
Content-Type header at all.)"
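
The difference between the two versions can be sketched as a pair of small
decision functions. This is only an illustration of the wording quoted above,
written in TypeScript; the names isXmlMimeType, publishedRule and draftRule
are made up for this sketch and appear in neither version of the spec.

```typescript
// Hypothetical helpers illustrating the two rules quoted above.
// `contentType` is the raw header value, or null if the header is absent.

// True for text/xml, application/xml, or any type ending in +xml,
// ignoring parameters such as "; charset=utf-8".
function isXmlMimeType(contentType: string): boolean {
  const semi = contentType.indexOf(";");
  const mime = (semi === -1 ? contentType : contentType.slice(0, semi))
    .trim()
    .toLowerCase();
  return mime === "text/xml" || mime === "application/xml" || /\+xml$/.test(mime);
}

// Published version: attempt XML parsing only if an XML media type is declared.
function publishedRule(contentType: string | null): boolean {
  return contentType !== null && isXmlMimeType(contentType);
}

// Draft version: a missing Content-Type header is also treated as XML.
function draftRule(contentType: string | null): boolean {
  return contentType === null || isXmlMimeType(contentType);
}
```

In this sketch the two rules disagree only when the header is missing:
publishedRule(null) is false while draftRule(null) is true.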

Why do I think this is bad?

First, RFC 2616, Section 7.2.1 (I haven't checked whether a change to that
is underway) says:

"If and only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its content
and/or the name extension(s) of the URI used to identify the resource. If the
media type remains unknown, the recipient SHOULD treat it as type
'application/octet-stream'."

The introduction of the XMLHttpRequest spec specifically mentions that the
object is badly named and NOT an XML-only API.

So forcing XMLHttpRequest to default to XML is in conflict with both the
underlying HTTP protocol and the overall scope of the XMLHttpRequest API.

Why is this a problem at all?

The specific problem here is that Gran Paradiso (FF3) in its current version
logs XML parsing errors to the console when trying to parse non-XML responses
without a Content-Type header. Previous versions tried to parse them as well
but didn't log errors.
Given the importance of XMLHttpRequest for dynamic web applications, I think
it's very bad to log errors where there are none. But I was pointed at
the upcoming spec change and told that this should be the standard behaviour.
As stated above, XMLHttpRequest is NOT only for XML. We use it for very
lightweight data formats and try to keep header overhead to a minimum. For our
typical XMLHttpRequest the headers already make up 30-90% of the total message
length, so an additional header DOES matter.
On the other hand, if I'm already using XML as my data format, I either don't
care that much about protocol bloat or have to work against a fixed API anyway.
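
To put a rough number on the overhead, the TypeScript sketch below computes
what share of a message is header bytes and how many bytes one Content-Type
line adds. The header lines and the 20-byte body are invented for
illustration; they are not measurements from a real application, and the
helper names are hypothetical.

```typescript
// Hypothetical example data: a minimal response and a small payload.
const CRLF = "\r\n";

// Size in bytes of an ASCII header block, including the blank line ending it.
function headerBytes(lines: string[]): number {
  let total = CRLF.length; // the terminating empty line
  for (const line of lines) {
    total += line.length + CRLF.length;
  }
  return total;
}

// Fraction of the whole message (headers + body) taken up by headers.
function headerShare(lines: string[], bodyBytes: number): number {
  const h = headerBytes(lines);
  return h / (h + bodyBytes);
}

const base = ["HTTP/1.1 200 OK", "Content-Length: 20"];
const withType = base.concat(["Content-Type: text/plain"]);

// Bytes added by the extra header line (24 characters plus CRLF).
const extra = headerBytes(withType) - headerBytes(base);

const shareBefore = headerShare(base, 20);
const shareAfter = headerShare(withType, 20);
```

With these made-up numbers, the single extra header line costs 26 bytes,
which is more than the 20-byte body itself.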

Why should the spec be changed at all?

Obviously, for application compatibility it is a good idea to also XML-parse
responses that aren't marked as such. Doing otherwise would certainly break
a large number of existing AJAX apps.
So a clarification of the spec is indeed a good idea.

So what do I want?

- I'd like to avoid the implied header overhead of Content-Type for protocols
that don't use XML.
- I definitely don't want to see future browsers choke on that.

I propose to change the spec in a different way. Bullet points:
- If the XMLHttpRequest response does not specify a Content-Type, scan it for
the xml signature and ONLY parse it as XML if the signature is found.
- Do NOT log parsing errors when no Content-Type was given.
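
The two bullet points could be sketched roughly as follows, again in
TypeScript. The sketch assumes that "the xml signature" means an XML
declaration ("<?xml") at the very start of the response, optionally preceded
by a byte order mark, and that the body has already been decoded to a string.
The names Decision, hasXmlSignature and decide are hypothetical and not taken
from any spec or browser.

```typescript
// Hypothetical result type: what the client should do with a response.
interface Decision {
  parseAsXml: boolean;     // attempt to build responseXML
  logParseErrors: boolean; // report XML parse errors to the console
}

// Assumption: "xml signature" = an XML declaration at the very start,
// optionally preceded by a byte order mark (U+FEFF).
function hasXmlSignature(body: string): boolean {
  const start = body.charCodeAt(0) === 0xfeff ? 1 : 0;
  return body.slice(start, start + 5) === "<?xml";
}

// `contentType` is the raw header value, or null if the header is absent.
function decide(contentType: string | null, body: string): Decision {
  if (contentType === null) {
    // No header: sniff the content and never log parse errors.
    return { parseAsXml: hasXmlSignature(body), logParseErrors: false };
  }
  // Header present: only declared XML types are parsed, and errors
  // in such responses are real errors worth reporting.
  const semi = contentType.indexOf(";");
  const mime = (semi === -1 ? contentType : contentType.slice(0, semi))
    .trim()
    .toLowerCase();
  const isXml =
    mime === "text/xml" || mime === "application/xml" || /\+xml$/.test(mime);
  return { parseAsXml: isXml, logParseErrors: isXml };
}
```

A check as strict as this would skip XML documents that omit the declaration;
a real rule would have to say how far the scan should go.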

Please note that the spec has something to say about when to throw JavaScript
exceptions, but not about when to show an error message to the user. This
should be specified as well, I think.

Best regards,

Carsten Orthbandt

pixeltamer.net
c/o Carsten Orthbandt
Baumschulenstrasse 102
12437 Berlin
+49 (0) 30 34347690

Received on Friday, 15 June 2007 07:17:39 UTC