Re: [xhr] statusText is underdefined

On Tue, Mar 27, 2012 at 3:23 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:

> The spec says:
>
>  Return the HTTP status text.
>
> But the HTTP status text is a sequence of bytes, while the return value
> for statusText is a DOMString.  The conversion from one to the other needs
> to be defined.


If I may summarize:

(1) although RFC 2616 prescribes the use of ISO-8859-1 for the on-the-wire
representation of status text, this has not been followed in practice, and
indeed, arbitrary character encodings are being used when serializing the
reason phrase;

(2) xhr client implementations have two options for exposing status text:

   - do not interpret status text in terms of character encoding; rather,
   simply expose the byte string to the user-defined code and leave encoding
   determination up to the user-defined code;
   - do interpret status text encoding, and convert to a semantically
   well-defined character string, possibly requiring sniffing of the
   serialized byte sequence;

(3) in both of these options, it is possible to use DOMString to return the
results:

   - in the first case, using what I have called "hyde mode", the DOMString
   merely serves as an unsigned short[] in which each byte of the originally
   serialized status text is stored in the low byte of one element (having
   no necessary relationship to a Unicode coded character sequence);
   - in the second case, using what I have called "jekyll mode", the
   DOMString is interpreted (as normal) as a UTF-16 encoded Unicode string
   (corresponding to a well-defined Unicode coded character sequence);
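
To make the distinction in (3) concrete, here is a minimal sketch in
TypeScript. The helper name `inflate` and the sample bytes are my own
assumptions for illustration, not spec text; the sketch assumes a server that
sent the reason phrase as UTF-8.

```typescript
// Hypothetical helper: zero-extend each byte to one 16-bit code unit.
// This is the "hyde mode" view: the string is only a container for bytes.
function inflate(bytes: number[]): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i] & 0xff);
  }
  return out;
}

// Assumed wire bytes: the UTF-8 serialization of U+00E9 (e with acute).
const wireBytes: number[] = [0xc3, 0xa9];

// "hyde mode": two code units, U+00C3 U+00A9, with no Unicode meaning intended.
const hydeText: string = inflate(wireBytes);

// "jekyll mode": what a client that decoded the bytes as UTF-8 would return,
// a single code unit U+00E9.
const jekyllText: string = "\u00e9";
```

The two strings differ in both length and content, which is why the spec
needs to say which one statusText returns.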

Is this an accurate summary?

I agree that if the first option above is chosen, then the inflate
algorithm is adequate. However, the specification text should make it
abundantly clear that the "hyde mode" flavor of DOMString is being
employed, and that the user defined code has the burden of decoding.
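
As a rough illustration of that burden, user-defined code would have to do
something like the following. This is only a sketch: `deflate` and
`decodeUtf8` are hypothetical helper names, and the choice of UTF-8 is an
assumption the page author would have to make out of band.

```typescript
// Hypothetical helper: recover the original bytes from a "hyde mode" string
// by taking the low byte of each 16-bit code unit.
function deflate(s: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < s.length; i++) {
    bytes.push(s.charCodeAt(i) & 0xff);
  }
  return bytes;
}

// Hypothetical helper: decode bytes as UTF-8 by percent-encoding every byte
// and handing the result to decodeURIComponent, which throws URIError if the
// bytes are not well-formed UTF-8.
function decodeUtf8(bytes: number[]): string {
  let escaped = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    escaped += "%" + (b < 16 ? "0" : "") + b.toString(16);
  }
  return decodeURIComponent(escaped);
}

// Stand-in for a "hyde mode" statusText carrying the UTF-8 bytes C3 A9.
const rawStatusText: string = "\u00c3\u00a9";
const decodedStatusText: string = decodeUtf8(deflate(rawStatusText));
```

Every caller that cares about non-ASCII status text would need this kind of
step, and would need to know (or guess) the encoding.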

As a web-content author and user, I would prefer that option #2 be adopted;
or, if I were very particular, I would prefer that two accessors be
provided: one for obtaining the raw input bytes (e.g., as a BLOB) and
another for obtaining the client's best guess at a decoded Unicode string.
In this latter case, I could make the decision on which to use.
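
A sketch of what I mean by two accessors follows. Nothing here exists in
XMLHttpRequest: the names `statusBytes` and `statusTextGuess` are
hypothetical, and the "best guess" rule (try strict UTF-8, otherwise treat
the bytes as ISO-8859-1) is just one assumed sniffing strategy.

```typescript
// Hypothetical pair of accessors; neither name is part of any spec.
interface StatusTextPair {
  statusBytes: Uint8Array;  // raw reason-phrase bytes as received
  statusTextGuess: string;  // client's best guess at a decoded string
}

// Percent-encode every byte so decodeURIComponent can act as a strict
// UTF-8 decoder.
function percentEncode(bytes: Uint8Array): string {
  let escaped = "";
  for (let i = 0; i < bytes.length; i++) {
    escaped += "%" + (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return escaped;
}

function makeStatusTextPair(raw: Uint8Array): StatusTextPair {
  let guess = "";
  try {
    // First guess: well-formed UTF-8.
    guess = decodeURIComponent(percentEncode(raw));
  } catch (e) {
    // Fallback: ISO-8859-1, where each byte maps to the same code point.
    guess = "";
    for (let i = 0; i < raw.length; i++) {
      guess += String.fromCharCode(raw[i]);
    }
  }
  return { statusBytes: raw, statusTextGuess: guess };
}

// The same character arriving in two different encodings.
const fromUtf8 = makeStatusTextPair(new Uint8Array([0xc3, 0xa9]));
const fromLatin1 = makeStatusTextPair(new Uint8Array([0xe9]));
```

With both members available, code that trusts the guess reads
`statusTextGuess`, and code that knows better decodes `statusBytes` itself.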

Overall, I could accept option #1 if the spec makes clear that "hyde mode"
applies.

G.

Received on Wednesday, 28 March 2012 18:25:25 UTC