- From: Henry Story <henry.story@bblfish.net>
- Date: Wed, 23 Jun 2004 17:58:34 +0200
- To: ietf-http-wg@w3.org
- Cc: Jamie Lokier <jamie@shareable.org>, Atom Syntax <atom-syntax@imc.org>
[I am also cc'ing this to atom-syntax, so that those there who are
interested in continuing this discussion can move it over here. A log
of it can be found here:
http://lists.w3.org/Archives/Public/ietf-http-wg/2004AprJun/ ]

On 23 Jun 2004, at 16:58, Jamie Lokier wrote:
> Henry Story wrote:
>> When a client receives a malformed server response it CAN (SHOULD?)
>> notify the resource that it is broken, by sending a ERR request,
>
> What kind of malformed server response?
>
> Broken HTTP headers are comparatively rare and should probably get an
> ERR, except perhaps for the Server header.

Thanks. One more good reason for ERR. :-)

> Malformed HTML is very common. Sending ERR in response to malformed
> HTML would generate a flood of ERRs. But -- what is malformed HTML
> anyway?

Yes. Presumably HTML would not warrant an ERR. But XHTML might very
well.

> XML and XHTML do have well defined specifications.
> I admit, that despite reading many documents and specifications,
> I hadn't realised that text/xml needed to use ASCII characters only.

Neither had I, nor most of the people on the atom mailing list. There
is a HUGE thread there going on and on about that, which led us to 2
proposals to solve this issue, of which this is the more generally
applicable one.

> To your example:
>
>> GET /index.xml HTTP/1.x
>> Content-encoding: text/xml; charset=UTF-8
>> Accept: */*
>> Accept-Encoding: gzip, deflate;q=1.0, identity;q=0.5, *;q=0
>> Accept-Language: en-us, ja;q=0.62, de-de;q=0.93, de;
>> ...
>
> That's a malformed request.
> 400 Bad Request is the correct server response :)

Thanks. I am sorry, I wrote this all out a little too fast.
The request would be the following:

------------8<---------------------
GET /index.xml HTTP/1.1
Host: example.com
Connection: keep-alive
User-Agent: BlogEx
Accept: text/xml
-----------8<-----------------------

The response would be something like

------------8<---------------------
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/xml
Server: SomeServer/2.1
Content-Length: 55
Date: Wed, 23 Jun 2004 15:36:05 GMT

<?xml version="1.0" encoding="iso-8859-1" ?>
<pløtz/>
-----------8<---------------------

I will fix it right away on the wiki at:
http://www.intertwingly.net/wiki/pie/PaceErrVerb#preview

> [...] I can't see what character that is after the "l" and before the
> "t".
> It appears as a box in my mailer. (Emacs says it's character code
> 0x8f8 but I am suspicious).

It's a Danish or Norwegian o, I think (the one with a line through
it). So it is not ASCII.

>> The response is broken though clearly interpretable. Clients (in the
>> wider of Consumer2C or B2C) will therefore attempt to accommodate the
>> standards due to market pressure. Market pressures are close to
>> physical laws in their ferocity. We cannot change them. As a result
>> more an more such breakages will occur, and the standards will be left
>> in the dust of this vicious whirlwind.[1] In any case fighting against
>> it is going to be very tiresome.
>
> It would be an easier fight if there were a central, high profile
> place where commonly needed implementation bugs and workarounds could
> be deposited -- and eventually removed a few years later when it's
> confirmed they're not required any more.

I imagine that in the body of the message (if one thinks it would be a
good thing for ERR to have a body, that is) one could have a URL that
points to such a place. Perhaps a few will pop up as a result of
creating such a method.
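
As a rough sketch of the check a client might run on the example
response above before deciding to send an ERR: this assumes the rule
discussed in this thread, that text/xml with no charset parameter is
to be read as US-ASCII. The names effective_charset and
body_breaks_charset are hypothetical, chosen only for illustration, and
nothing here is part of the ERR proposal itself.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Charset implied by the Content-Type header alone, or None if the
    header does not settle it (e.g. application/xml with no charset)."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased charset parameter, or None
    if charset is not None:
        return charset
    # Assumption taken from the thread: text/xml without charset means US-ASCII.
    if msg.get_content_type() == "text/xml":
        return "us-ascii"
    return None


def body_breaks_charset(content_type: str, body: bytes) -> bool:
    """True if the body cannot be decoded in the charset the header implies."""
    charset = effective_charset(content_type)
    if charset is None:
        return False  # the header alone says nothing, so do not flag it
    try:
        body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Undecodable bytes, or a charset name Python does not know.
        return True
    return False


# Body of the example response; the "ø" is byte 0xF8 in iso-8859-1.
BODY = b'<?xml version="1.0" encoding="iso-8859-1" ?>\n<pl\xf8tz/>'

if __name__ == "__main__":
    print(body_breaks_charset("text/xml", BODY))
    print(body_breaks_charset("text/xml; charset=iso-8859-1", BODY))
```

Under that assumption the example response is flagged, while the same
body served with an explicit charset=iso-8859-1 parameter is not. The
sketch deliberately ignores the encoding declaration inside the XML
itself, since whether it may be used for text/xml is exactly what is in
dispute.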
> Not sure if that would help or hinder the fight to get clean standard
> implementations out there, but it would certainly help with building
> interoperable code, and highlighting the problems of real
> implementations.
> Name and shame, perhaps?

That can be something additional. But perhaps before shaming someone
one should first alert them to the error of their ways.

>> Here is an example of the clients message:
>>
>> -------8<-------
>> ERR /index.xml HTTP/1.x
>> Content-encoding: text/xml; charset=UTF-8
>> Accept: */*
>> Accept-Encoding: gzip, deflate;q=1.0, identity;q=0.5, *;q=0
>> Accept-Language: en-us, ja;q=0.62, de-de;q=0.93, de;
>> Error-Message: XML is of incorrect content type
>> Error-Code: XXXX
>> Error-Spec: RFCXYZ,sec 3; RFCXXX, sec54
>> Error-Date: Saturday 19 June 2004, 18:05:30 GMT (whatever encoding)
>> Error-Method: GET
>> Error-ContentLength: 63
>
> Again, why do you have a Content-Encoding header, and malformed at
> that, in the request?

Thanks for pointing that out. I will get it fixed right away.

>
>> The Mime type of the content was text/xml. This requires the content
>> to be in ASCII format, but we found some UTF-8 characters in the
>> message. We could interpret the message at present but will not
>> necessarily be able to do so in the future. Please refer to RFCXYZ,
>> sec 3 and RFCXXX, sec54 for more information. These can be found at
>> http://ietf.org/
>
> The XML file identifies itself as iso-8859-1. Clearly it's intended
> that those bytes are understood as iso-8859-1 characters, not UTF-8
> characters. A decent implementation would surely _either_ use the
> encoding declaration, when none appears in the Content-Type (i.e. the
> same as if "application/xml" were the content-type), or (conforming to
> RFC 2376) use us-ascii, and treat all the high byte characters as
> broken or single byte unexpected characters in a default encoding such
> as (so often the case) iso-8859-1?
I have to direct you to the huge thread that started this out on the
atom mailing list:
http://www.imc.org/atom-syntax/mail-archive/msg04656.html
Perhaps someone there can post a short summary of it.

>
> Also, shouldn't the text say US-ASCII as opposed to just ASCII? :)
>
>> ADVANTAGES:
>
> I quite like the idea. Filling up logs of broken servers --
> excellent. Perhaps you could take advantage of the Referer header to
> get a short message in there. :)

There should clearly be some good behavior rules.

> Note that some dubious servers ignore the method: they'll treat ERR
> the same as GET, or do even worse things. (E.g. one server treats
> this request line as a GET of "HTTP/1.1": "ERR /GET HTTP/1.1", and
> treats this request as a GET of an empty URL: "ERR /index.html
> HTTP/1.1").

Interesting point. You can't do much about broken servers. Hopefully
they will slowly die out.

>
> So you might not want to send ERRs to servers which haven't solicited
> them.

Yes. One could ask which methods a server supports before sending the
ERR. There is an HTTP method for that: OPTIONS, I think.

> -- Jamie

Thanks a lot for the lengthy response. Looks like this is the right
place to debate this proposal.

Henry Story
http://bblfish.net
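
A minimal sketch of the OPTIONS check suggested above, under the
assumption that a server willing to receive ERR would list it in the
Allow header of its OPTIONS response. ERR is only the method proposed
in this thread, not an existing HTTP method, and the function names are
hypothetical.

```python
import http.client


def allowed_methods(allow_header: str) -> set:
    """Split an Allow header value into a set of method names.
    Method names are case-sensitive, so they are not normalised."""
    return {m.strip() for m in allow_header.split(",") if m.strip()}


def advertises_err(allow_header: str) -> bool:
    """True if the (assumed) ERR method appears in the Allow value."""
    return "ERR" in allowed_methods(allow_header)


def server_accepts_err(host: str, path: str = "/", timeout: float = 5.0) -> bool:
    """Send OPTIONS for the resource and inspect the Allow header.
    A missing Allow header is treated as 'ERR not solicited'."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("OPTIONS", path)
        response = conn.getresponse()
        response.read()  # drain any body before closing
        return advertises_err(response.getheader("Allow", ""))
    finally:
        conn.close()
```

A client would call server_accepts_err("example.com", "/index.xml")
first and only send the ERR when it returns True, which keeps ERRs away
from servers that treat unknown methods as GET.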
Received on Wednesday, 23 June 2004 11:58:42 UTC