- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Wed, 28 Mar 2012 09:33:02 +0200
- To: Glenn Adams <glenn@skynav.com>
- CC: Boris Zbarsky <bzbarsky@mit.edu>, public-webapps@w3.org
On 2012-03-28 00:35, Glenn Adams wrote:
>
> On Tue, Mar 27, 2012 at 4:17 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
>
>     On 3/27/12 2:46 PM, Glenn Adams wrote:
>
>         Is this really a problem?
>
>     Yes. We've run into bug reports in the past of sites sending some
>     pretty random bytes in the HTTP status text, then reading
>     .statusText from script. If we want interop here, we need to define
>     the conversion.
>
>         HTTP defines the form and encoding of the status text
>
>     Except it doesn't, last I checked. Has that changed?
>
> RFC2616 states (on pages :
>
> Fielding, et al.            Standards Track            [Page 39]
>
>     Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
>
> Fielding, et al.            Standards Track            [Page 40]
>
>     Reason-Phrase = *<TEXT, excluding CR, LF>
>
> Fielding, et al.            Standards Track            [Page 15]
>
>     The TEXT rule is only used for descriptive field contents and values
>     that are not intended to be interpreted by the message parser. Words
>     of *TEXT MAY contain characters from character sets other than
>     ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
>     [14].
>
>     TEXT = <any OCTET except CTLs,
>            but including LWS>
>
> This makes it pretty clear that Reason Phrase must use ISO-8859-1
> (Latin1) unless it uses the encoded-word extension from RFC2047. If the
> latter is used, then a charset must be designated.
>
> Given this, I don't see any spec bug (though there may be implementation
> bugs in case the client side does not correctly implement the above HTTP
> requirements).

It's time to stop citing RFC 2616. Please have a look at
<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p2-semantics-19.html#rfc.section.4>.

Summary: HTTPbis does not attempt to define the character encoding anymore; if you use anything other than US-ASCII, you are on your own. RFC 2047 encoding never was used in practice, and has been removed.

The right thing to do is the same as for header field values: use a US-ASCII-compatible encoding that is most likely to work, and which is non-lossy, so that a UTF-8 field value *can* be retrieved when needed. That encoding is ISO-8859-1. (And HTTPbis doesn't talk about this because it defines octets on the wire, not an API.)

Best regards, Julian
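
To illustrate the "non-lossy" point made above, here is a minimal Python sketch. It assumes the reason phrase arrives as raw octets; the function names `decode_status_text` and `recover_utf8` are hypothetical and do not come from any spec or browser API. Because ISO-8859-1 maps each of the 256 possible octet values to exactly one code point, decoding never fails, and a caller that knows the server sent UTF-8 can get the original octets back and reinterpret them.

```python
def decode_status_text(raw: bytes) -> str:
    """Decode wire octets as ISO-8859-1.

    Every octet 0x00-0xFF maps to the code point with the same value,
    so this cannot raise and loses no information.
    """
    return raw.decode("iso-8859-1")


def recover_utf8(text: str) -> str:
    """Turn an ISO-8859-1-decoded string back into octets and read them as UTF-8.

    Raises UnicodeDecodeError if the original octets were not valid UTF-8.
    """
    return text.encode("iso-8859-1").decode("utf-8")


# Hypothetical server that sends a UTF-8 reason phrase on the wire.
wire_octets = "Não Encontrado".encode("utf-8")

status_text = decode_status_text(wire_octets)  # looks like mojibake, but is lossless
round_tripped = status_text.encode("iso-8859-1")
recovered = recover_utf8(status_text)
```

Here `round_tripped` equals `wire_octets` and `recovered` equals the original phrase. A plain US-ASCII phrase such as `b"Not Found"` decodes to the same text either way, which is the US-ASCII compatibility the message refers to.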
Received on Wednesday, 28 March 2012 07:33:38 UTC