Re: [xhr] statusText is underdefined from Julian Reschke on 2012-03-28 (public-webapps@w3.org from January to March 2012)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Wed, 28 Mar 2012 09:33:02 +0200
To: Glenn Adams <glenn@skynav.com>
CC: Boris Zbarsky <bzbarsky@mit.edu>, public-webapps@w3.org
Message-ID: <4F72BEAE.9000005@gmx.de>

On 2012-03-28 00:35, Glenn Adams wrote:
>
> On Tue, Mar 27, 2012 at 4:17 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
>
>     On 3/27/12 2:46 PM, Glenn Adams wrote:
>
>         Is this really a problem?
>
>
>     Yes.  We've run into bug reports in the past of sites sending some
>     pretty random bytes in the HTTP status text, then reading
>     .statusText from script.  If we want interop here, we need to define
>     the conversion.
>
>
>         HTTP defines the form and encoding of the status text
>
>
>     Except it doesn't, last I checked.  Has that changed?
>
>
> RFC2616 states:
>
> Fielding, et al. Standards Track [Page 39]
>
>     Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
>
> Fielding, et al. Standards Track [Page 40]
>
>     Reason-Phrase  = *<TEXT, excluding CR, LF>
>
> Fielding, et al. Standards Track [Page 15]
>
>     The TEXT rule is only used for descriptive field contents and values
>     that are not intended to be interpreted by the message parser. Words
>     of *TEXT MAY contain characters from character sets other than ISO-
>     8859-1 [22] only when encoded according to the rules of RFC 2047
>     [14].
>
>         TEXT           =<any OCTET except CTLs,
>                          but including LWS>
>
> This makes it pretty clear that Reason Phrase must use ISO-8859-1
> (Latin1) unless it uses the encoded-word extension from RFC2047. If the
> latter is used, then a charset must be designated.
>
> Given this, I don't see any spec bug (though there may be implementation
> bugs in case the client side does not correctly implement the above HTTP
> requirements).

It's time to stop citing RFC 2616. Please have a look at 
<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p2-semantics-19.html#rfc.section.4>.

Summary: HTTPbis no longer attempts to define the character encoding; 
if you use anything other than US-ASCII, you are on your own. 
RFC 2047 encoding was never used in practice, and has been removed.

The right thing to do is the same as for header field values: use a 
US-ASCII compatible encoding that is most likely to work, and which is 
non-lossy, so a UTF-8 field value *can* be retrieved when needed.

That encoding is ISO-8859-1.

(And HTTPbis doesn't talk about this because it defines octets on the 
wire, not an API.)
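
To illustrate the non-lossy property, here is a minimal sketch in Python. The helper names (decode_reason_phrase, recover_utf8) and the sample phrase are made up for illustration; they are not part of any spec or browser API. The sketch only assumes that ISO-8859-1 maps each octet 0x00-0xFF to the code point with the same number, so decoding can always be undone.

```python
def decode_reason_phrase(raw: bytes) -> str:
    """Decode status-text octets as ISO-8859-1 (one octet -> one code point)."""
    return raw.decode("iso-8859-1")


def recover_utf8(text: str) -> str:
    """Undo the ISO-8859-1 decoding and reinterpret the octets as UTF-8."""
    return text.encode("iso-8859-1").decode("utf-8")


# Hypothetical reason phrase ("Pas trouve" with an acute accent on the
# final e) that a server put on the wire as UTF-8 octets.
original = "Pas trouv\u00e9"
wire_octets = original.encode("utf-8")

status_text = decode_reason_phrase(wire_octets)

# Re-encoding returns exactly the octets that were on the wire.
assert status_text.encode("iso-8859-1") == wire_octets

# A caller that knows the server used UTF-8 can still get the real text.
assert recover_utf8(status_text) == original

# Every possible octet value survives the round trip.
all_octets = bytes(range(256))
assert decode_reason_phrase(all_octets).encode("iso-8859-1") == all_octets
```

A script that reads the ISO-8859-1 decoded string directly will see mojibake for non-ASCII phrases, but nothing is thrown away, which is the point: a lossy decoder (for example, strict UTF-8 with replacement characters) would make the original octets unrecoverable.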

Best regards, Julian

Received on Wednesday, 28 March 2012 07:33:38 UTC