Re: Proposal for an HTTP ERR method from Henry Story on 2004-06-23 (ietf-http-wg@w3.org from April to June 2004)

From: Henry Story <henry.story@bblfish.net>
Date: Wed, 23 Jun 2004 17:58:34 +0200
To: ietf-http-wg@w3.org
Cc: Jamie Lokier <jamie@shareable.org>, Atom Syntax <atom-syntax@imc.org>
Message-Id: <2E83D4E7-C52E-11D8-8703-000A95D9FA7A@bblfish.net>
[I am also cc'ing this to atom-syntax, so that those there who are 
interested in continuing this discussion can move over here. A 
log of it can be found here:
http://lists.w3.org/Archives/Public/ietf-http-wg/2004AprJun/
]

On 23 Jun 2004, at 16:58, Jamie Lokier wrote:

> Henry Story wrote:
>> When a client receives a malformed server response it CAN (SHOULD?)
>> notify the resource that it is broken, by sending a ERR request,
>
> What kind of malformed server response?
>
> Broken HTTP headers are comparatively rare and should probably get an
> ERR, except perhaps for the Server header.

Thanks. One more good reason for ERR. :-)

> Malformed HTML is very common.  Sending ERR in response to malformed
> HTML would generate a flood of ERRs.  But -- what is malformed HTML
> anyway?

Yes. Presumably HTML would not warrant an ERR. But XHTML might very 
well.

> XML and XHTML do have well defined specifications.
> I admit, that despite reading many documents and specifications,
> I hadn't realised that text/xml needed to use ASCII characters only.

Neither had I, nor most of the people on the atom mailing list. There 
is a HUGE thread there going on and on about that, which led us to two 
proposals to solve this issue, of which this is the more generally 
applicable one.

> To your example:
>
>> GET /index.xml HTTP/1.x
>> Content-encoding: text/xml; charset=UTF-8
>> Accept: */*
>> Accept-Encoding: gzip, deflate;q=1.0, identity;q=0.5, *;q=0
>> Accept-Language: en-us, ja;q=0.62, de-de;q=0.93, de;
>> ...
>
> That's a malformed request.
> 400 Bad Request is the correct server response :)

Thanks. I am sorry I wrote this all out a little too fast.
The request would be the following:

------------8<---------------------
GET /index.xml HTTP/1.1
Host: example.com
Connection: keep-alive
User-Agent: BlogEx
Accept: text/xml
-----------8<-----------------------

The response would be something like

------------8<---------------------
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/xml
Server: SomeServer/2.1
Content-Length: 55
Date: Wed, 23 Jun 2004 15:36:05 GMT

<?xml version="1.0" encoding="iso-8859-1" ?>
<pløtz/>
-----------8<---------------------

I will fix it right away on the wiki at:
http://www.intertwingly.net/wiki/pie/PaceErrVerb#preview

> [...] I can't see what character that is after the "l" and before the 
> "t".
> It appears as a box in my mailer.  (Emacs says it's character code
> 0x8f8 but I am suspicious).

It's a Danish/Norwegian ø, I think (the o with a line through it), so 
it is not ASCII.
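
As a rough illustration of the check a client might run before deciding
that the response above is broken, here is a small Python sketch. The
function name is made up for this example, and it assumes the rule
discussed in this thread: text/xml served without a charset parameter is
read as US-ASCII, whatever the XML declaration says.

```python
from email.message import Message


def xml_charset_problem(content_type, body):
    """Return a short description of the charset problem, or None.

    Assumption (from this thread): text/xml without a charset parameter
    defaults to US-ASCII, regardless of the XML encoding declaration.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/xml":
        return None  # only the text/xml case is covered here
    if msg.get_param("charset") is not None:
        return None  # an explicit charset was given; nothing to report
    # Any byte above 0x7F cannot be a US-ASCII character.
    bad = [b for b in body if b > 0x7F]
    if bad:
        return "text/xml without charset contains %d non-US-ASCII byte(s)" % len(bad)
    return None


# The example response body, with the ø sent as the iso-8859-1 byte 0xF8.
body = b'<?xml version="1.0" encoding="iso-8859-1" ?>\n<pl\xf8tz/>\n'
problem = xml_charset_problem("text/xml", body)
```

Whether such a finding should trigger an ERR at all is of course the
question under debate; the sketch only shows that the condition is cheap
to detect.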

>> The response is broken though clearly interpretable. Clients (in the
>> wider of Consumer2C or B2C) will therefore attempt to accommodate the
>> standards due to market pressure. Market pressures are close to
>> physical laws in their ferocity. We cannot change them. As a result
>> more and more such breakages will occur, and the standards will be left
>> in the dust of this vicious whirlwind.[1] In any case fighting against
>> it is going to be very tiresome.

>
> It would be an easier fight if there were a central, high profile
> place where commonly needed implementation bugs and workarounds could
> be deposited -- and eventually removed a few years later when it's
> confirmed they're not required any more.

I imagine that in the body of the message (if one thinks it would be a 
good thing for ERR to have a body that is) one could have a URL that 
points to such a place. Perhaps a few will pop up as a result of 
creating such a method.

> Not sure if that would help or hinder the fight to get clean standard
> implementations out there, but it would certainly help with building
> interoperable code, and highlighting the problems of real 
> implementations.
> Name and shame, perhaps?

That can be something additional. But perhaps before shaming someone 
one should first alert them to the error of their ways.

>> Here is an example of the client's message:
>>
>> -------8<-------
>> ERR /index.xml HTTP/1.x
>> Content-encoding: text/xml; charset=UTF-8
>> Accept: */*
>> Accept-Encoding: gzip, deflate;q=1.0, identity;q=0.5, *;q=0
>> Accept-Language: en-us, ja;q=0.62, de-de;q=0.93, de;
>> Error-Message: XML is of incorrect content type
>> Error-Code: XXXX
>> Error-Spec: RFCXYZ,sec 3; RFCXXX, sec54
>> Error-Date:  Saturday 19 June 2004, 18:05:30 GMT (whatever encoding)
>> Error-Method: GET
>> Error-ContentLength: 63
>
> Again, why do you have a Content-Encoding header, and malformed at
> that, in the request?

Thanks for pointing that out. I will get it fixed right away.
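
To make the Content-Encoding point concrete, here is a minimal Python
sketch of how a client could assemble the ERR request with the body
described by Content-Type instead. Everything about it is hypothetical:
ERR and the Error-* headers exist only in this proposal, the function
name is invented, and the choice of text/plain in UTF-8 for the body is
an assumption.

```python
def build_err_request(path, host, error_message, body_text):
    """Assemble the bytes of a hypothetical ERR request.

    ERR and the Error-* headers come from the proposal under discussion;
    they are not part of HTTP/1.1.
    """
    body = body_text.encode("utf-8")
    lines = [
        "ERR %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Error-Message: %s" % error_message,
        "Error-Method: GET",
        # Content-Type, not Content-Encoding, describes the media type
        # and charset of the request body.
        "Content-Type: text/plain; charset=UTF-8",
        "Content-Length: %d" % len(body),
    ]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_err_request(
    "/index.xml",
    "example.com",
    "XML is of incorrect content type",
    "The MIME type was text/xml.",
)
```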

>
>> The Mime type of the content was text/xml. This requires the content 
>> to
>> be in ASCII format, but we found some UTF-8 characters in the message.
>> We could interpret the message at present but will not necessarily be
>> able to do so in the future. Please refer to RFCXYZ, sec 3 and RFCXXX,
>> sec54 for more information. These can be found at http://ietf.org/
>
> The XML file identifies itself as iso-8859-1.  Clearly it's intended
> that those bytes are understood as iso-8859-1 characters, not UTF-8
> characters.  A decent implementation would surely _either_ use the
> encoding declaration, when none appears in the Content-Type (i.e. the
> same as if "application/xml" were the content-type), or (conforming to
> RFC 2376) use us-ascii, and treat all the high byte characters as
> broken or single byte unexpected characters in a default encoding such
> as (so often the case) iso-8859-1?

I have to direct you to the huge thread that started this out on the
atom mailing list.
http://www.imc.org/atom-syntax/mail-archive/msg04656.html
Perhaps someone there can post a short summary of it.

>
> Also, shouldn't the text say US-ASCII as opposed to just ASCII? :)
>
>> ADVANTAGES:
>
> I quite like the idea.  Filling up logs of broken servers --
> excellent.  Perhaps you could take advantage of the Referer header to
> get a short message in there. :)

There should clearly be some good behavior rules.

> Note that some dubious servers ignore the method: they'll treat ERR
> the same as GET, or do even worse things.  (E.g. one server treats
> this request line as a GET of "HTTP/1.1": "ERR /GET HTTP/1.1", and
> treats this request as a GET of an empty URL: "ERR /index.html
> HTTP/1.1").

Interesting point. You can't do much about broken servers. Hopefully 
they will slowly die out.

>
> So you might not want to send ERRs to servers which haven't solicited
> them.

Yes. One could request which methods a server supports before sending 
the ERR.
There is an HTTP method for that, OPTIONS I think.
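
A minimal Python sketch of that idea follows, assuming a server that
supported ERR would list it in the Allow header of its OPTIONS response.
The function names and the default path are made up for illustration,
and nothing here is specified by the proposal itself.

```python
import http.client


def allows_method(allow_header, method):
    """True if `method` is listed in an Allow header value (may be None)."""
    if not allow_header:
        return False
    listed = [token.strip() for token in allow_header.split(",")]
    return method in listed  # method names are case-sensitive


def server_solicits_err(host, path="/index.xml", timeout=10):
    """Ask the server via OPTIONS whether it lists ERR for `path`."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("OPTIONS", path)
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return allows_method(response.getheader("Allow"), "ERR")
    finally:
        conn.close()
```

A client would call server_solicits_err("example.com") once per server
and send ERR only when it returns True, which also keeps the method away
from the servers Jamie describes that treat every request as a GET.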

> -- Jamie

Thanks a lot for the lengthy response. Looks like this is the right 
place to debate this proposal.

Henry Story
http://bblfish.net
Received on Wednesday, 23 June 2004 11:58:42 UTC