- From: Glenn Adams <glenn@skynav.com>
- Date: Wed, 28 Mar 2012 01:48:54 -0600
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Boris Zbarsky <bzbarsky@mit.edu>, public-webapps@w3.org
- Message-ID: <CACQ=j+dndNXHxzyREWVDYAkng6t4FraMFJt4OwsgReeXTSEweQ@mail.gmail.com>
On Wed, Mar 28, 2012 at 1:33 AM, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2012-03-28 00:35, Glenn Adams wrote:
>
>> On Tue, Mar 27, 2012 at 4:17 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
>>
>>> On 3/27/12 2:46 PM, Glenn Adams wrote:
>>>
>>>> Is this really a problem?
>>>
>>> Yes. We've run into bug reports in the past of sites sending some
>>> pretty random bytes in the HTTP status text, then reading
>>> .statusText from script. If we want interop here, we need to define
>>> the conversion.
>>>
>>>> HTTP defines the form and encoding of the status text
>>>
>>> Except it doesn't, last I checked. Has that changed?
>>
>> RFC2616 states (on pages :
>>
>> Fielding, et al. Standards Track [Page 39]
>>
>> Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
>>
>> Fielding, et al. Standards Track [Page 40]
>>
>> Reason-Phrase = *<TEXT, excluding CR, LF>
>>
>> Fielding, et al. Standards Track [Page 15]
>>
>> The TEXT rule is only used for descriptive field contents and values
>> that are not intended to be interpreted by the message parser. Words
>> of *TEXT MAY contain characters from character sets other than
>> ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
>> [14].
>>
>> TEXT = <any OCTET except CTLs,
>>        but including LWS>
>>
>> This makes it pretty clear that Reason Phrase must use ISO-8859-1
>> (Latin1) unless it uses the encoded-word extension from RFC2047. If the
>> latter is used, then a charset must be designated.
>>
>> Given this, I don't see any spec bug (though there may be implementation
>> bugs in case the client side does not correctly implement the above HTTP
>> requirements).
>
> It's time to stop citing RFC 2616. Please have a look at
> <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p2-semantics-19.html#rfc.section.4>.

Since 2616 is published and HTTPbis is not, I will go on citing it.

> Summary: HTTPbis does not attempt to define the character encoding
> anymore; if you use anything other than US-ASCII, you are on your own. RFC
> 2047 encoding never was used in practice, and has been removed.
>
> The right thing to do is the same as for header field values: use a
> US-ASCII compatible encoding that is most likely to work, and which is
> non-lossy, so a UTF-8 field value *can* be retrieved when needed.
>
> That encoding is ISO-8859-1.

I'm not sure what you mean by citing ISO-8859-1 and UTF-8 in the same
context. Please elaborate.

> (And HTTPBis doesn't talk about this because it defines octets on the
> wire, not an API).

If HTTPbis doesn't define the character encoding of bytes on the wire when
serializing reason status, then it leaves much to be desired.
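
For reference, the "non-lossy" property in the quoted text can be sketched
as follows. This is only an illustration, not anything taken from RFC 2616,
HTTPbis, or the XHR draft: the function names and the sample reason phrase
are hypothetical. It relies on the fact that ISO-8859-1 maps each of the
256 possible octets to exactly one code point, so decoding with it never
discards a byte, and a caller who knows the server actually sent UTF-8 can
get the original octets back and reinterpret them.

```python
def decode_reason_phrase(raw: bytes) -> str:
    """Decode status-line octets as ISO-8859-1 (one code point per octet)."""
    return raw.decode("iso-8859-1")


def recover_utf8(text: str) -> str:
    """Undo the ISO-8859-1 decoding, then reinterpret the octets as UTF-8.

    Raises UnicodeDecodeError if the original octets were not valid UTF-8,
    and UnicodeEncodeError if text contains code points above U+00FF.
    """
    return text.encode("iso-8859-1").decode("utf-8")


# Hypothetical reason phrase that a server put on the wire as UTF-8 octets.
wire_octets = "Não Encontrado".encode("utf-8")

as_latin1 = decode_reason_phrase(wire_octets)  # looks garbled, but is lossless
assert as_latin1.encode("iso-8859-1") == wire_octets
print(recover_utf8(as_latin1))
```

Decoding the same octets directly as UTF-8 would instead have to reject or
replace any invalid sequence, which is where information could be lost.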
Received on Wednesday, 28 March 2012 07:49:45 UTC