Re: [xhr] statusText is underdefined from Glenn Adams on 2012-03-28 (public-webapps@w3.org from January to March 2012)

From: Glenn Adams <glenn@skynav.com>
Date: Wed, 28 Mar 2012 01:48:54 -0600
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Boris Zbarsky <bzbarsky@mit.edu>, public-webapps@w3.org
Message-ID: <CACQ=j+dndNXHxzyREWVDYAkng6t4FraMFJt4OwsgReeXTSEweQ@mail.gmail.com>

On Wed, Mar 28, 2012 at 1:33 AM, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2012-03-28 00:35, Glenn Adams wrote:
>
>>
>> On Tue, Mar 27, 2012 at 4:17 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
>>
>>    On 3/27/12 2:46 PM, Glenn Adams wrote:
>>
>>        Is this really a problem?
>>
>>
>>    Yes.  We've run into bug reports in the past of sites sending some
>>    pretty random bytes in the HTTP status text, then reading
>>    .statusText from script.  If we want interop here, we need to define
>>    the conversion.
>>
>>
>>        HTTP defines the form and encoding of the status text
>>
>>
>>    Except it doesn't, last I checked.  Has that changed?
>>
>>
>> RFC2616 states:
>>
>> Fielding, et al. Standards Track [Page 39]
>>
>>    Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
>>
>> Fielding, et al. Standards Track [Page 40]
>>
>>    Reason-Phrase  = *<TEXT, excluding CR, LF>
>>
>> Fielding, et al. Standards Track [Page 15]
>>
>>    The TEXT rule is only used for descriptive field contents and values
>>    that are not intended to be interpreted by the message parser. Words
>>    of *TEXT MAY contain characters from character sets other than ISO-
>>    8859-1 [22] only when encoded according to the rules of RFC 2047
>>    [14].
>>
>>        TEXT           =<any OCTET except CTLs,
>>                         but including LWS>
>>
>> This makes it pretty clear that Reason Phrase must use ISO-8859-1
>> (Latin1) unless it uses the encoded-word extension from RFC2047. If the
>> latter is used, then a charset must be designated.
>>
>> Given this, I don't see any spec bug (though there may be implementation
>> bugs in case the client side does not correctly implement the above HTTP
>> requirements).
>>
>
> It's time to stop citing RFC 2616. Please have a look at
> <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p2-semantics-19.html#rfc.section.4>.
>

Since 2616 is published and HTTPbis is not, I will go on citing it.


> Summary: HTTPbis does not attempt to define the character encoding
> anymore; if you use anything other than US-ASCII, you are on your own. RFC
> 2047 encoding never was used in practice, and has been removed.
>
> The right thing to do is the same as for header field values: use a
> US-ASCII compatible encoding that is most likely to work, and which is
> non-lossy, so a UTF-8 field value *can* be retrieved when needed.
>
> That encoding is ISO-8859-1.
>

I'm not sure what you mean by citing ISO-8859-1 and UTF-8 in the same
context. Please elaborate.
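
One possible reading of the quoted suggestion, sketched here in Python under stated assumptions: a client decodes the reason phrase octets as ISO-8859-1, which maps every byte to exactly one code point and so loses nothing, and a script that knows the server actually sent UTF-8 can reverse that step to recover the original text. The function names and the sample phrase are hypothetical and are not taken from HTTPbis or the XHR draft.

```python
def decode_reason_phrase(raw: bytes) -> str:
    """Decode reason-phrase octets as ISO-8859-1.

    Each of the 256 byte values maps to one code point, so the
    original octets can always be reconstructed from the result.
    """
    return raw.decode("iso-8859-1")


def recover_utf8(text: str) -> str:
    """Undo the ISO-8859-1 decoding and reinterpret the octets as UTF-8.

    Raises UnicodeDecodeError if the original octets were not valid UTF-8.
    """
    return text.encode("iso-8859-1").decode("utf-8")


# Hypothetical reason phrase that a server serialized as UTF-8.
wire = "Não encontrado".encode("utf-8")

as_latin1 = decode_reason_phrase(wire)

# The intermediate string looks garbled, but it round-trips to the same octets.
assert as_latin1.encode("iso-8859-1") == wire

# A caller that knows the server used UTF-8 can recover the intended text.
assert recover_utf8(as_latin1) == "Não encontrado"
```

The same round trip is not possible with a lossy decoding (for example, one that substitutes a replacement character for unexpected bytes), which is presumably why a byte-preserving encoding is being proposed.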


> (And HTTPBis doesn't talk about this because it defines octets on the
> wire, not an API).
>

If HTTPbis doesn't define the character encoding of bytes on the wire when
serializing the reason phrase, then it leaves much to be desired.

Received on Wednesday, 28 March 2012 07:49:45 UTC