- From: Glenn Adams <glenn@skynav.com>
- Date: Wed, 28 Mar 2012 12:24:33 -0600
- To: Boris Zbarsky <bzbarsky@mit.edu>
- Cc: public-webapps@w3.org
- Message-ID: <CACQ=j+cw970-23B1FKeNa3kfT9mrWRDFhf_Xq9pAEg-UZ5mvjg@mail.gmail.com>
On Tue, Mar 27, 2012 at 3:23 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:

> The spec says:
>
>   Return the HTTP status text.
>
> But the HTTP status text is a sequence of bytes, while the return value
> for statusText is a DOMString. The conversion from one to the other needs
> to be defined.

If I may summarize:

(1) although RFC 2616 prescribes the use of 8859-1 for the on-the-wire representation of status text, this has not been followed in practice, and indeed, arbitrary character encodings are being used when serializing the reason phrase;

(2) XHR client implementations have two options for exposing status text:

   - do not interpret status text in terms of character encoding; rather, simply expose the byte string to the user-defined code and leave encoding determination up to the user-defined code;
   - do interpret status text encoding, and convert to a semantically well-defined character string, possibly requiring sniffing the serialized byte sequence;

(3) in both of these options, it is possible to use DOMString to return the results:

   - in the first case, using what I have called "hyde mode", the DOMString merely serves as an unsigned short[] in which each byte of the originally serialized status text is stuffed into the low-order byte of one element (having no necessary relationship to a Unicode coded character sequence);
   - in the second case, using what I have called "jekyll mode", the DOMString is interpreted (as normal) as a UTF-16 encoded Unicode string (corresponding to a well-defined Unicode coded character sequence).

Is this an accurate summary?

I agree that if the first option above is chosen, then the inflate algorithm is adequate. However, the specification text should make it abundantly clear that the "hyde mode" flavor of DOMString is being employed, and that the user-defined code has the burden of decoding.
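To make the difference between the two modes concrete, here is a minimal sketch in Python. It is only a model, not spec text: the function names `inflate`, `deflate`, and `decode_status_text` are hypothetical, a Python `str` stands in for a DOMString, and the "try UTF-8, else fall back to ISO-8859-1" sniffing rule is an assumption chosen for illustration, not something any client is known to implement.

```python
def inflate(raw: bytes) -> str:
    """Option 1 ("hyde mode"): copy each byte into one code unit, no decoding."""
    return "".join(chr(b) for b in raw)


def deflate(inflated: str) -> bytes:
    """User-side step for option 1: recover the original bytes.

    Raises ValueError if any code unit is above 0xFF, i.e. the string
    was not produced by inflate().
    """
    return bytes(ord(ch) for ch in inflated)


def decode_status_text(raw: bytes) -> str:
    """Option 2 ("jekyll mode"): the client guesses the encoding.

    Assumed heuristic for this sketch only: strict UTF-8 first,
    ISO-8859-1 as the fallback (which never fails).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Hypothetical reason phrase that a server serialized as UTF-8.
wire = "N\u00e3o Encontrado".encode("utf-8")

hyde = inflate(wire)                # 15 code units, one per byte on the wire
jekyll = decode_status_text(wire)   # 14 characters, already decoded

# With option 1 the author's own code must undo the inflation and decode.
recovered = deflate(hyde).decode("utf-8")
```

With option 1 the value handed to the author is mojibake until `deflate` and an explicit decode are applied; with option 2 the client has already made the encoding decision, for better or worse.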
As a web-content author and user, I would prefer that option #2 be adopted; or, if I were very particular, I would prefer that two accessors be provided: one for obtaining the raw input bytes (e.g., as a BLOB) and another for obtaining the client's best guess at a decoded Unicode string. In this latter case, I could decide for myself which to use.

Overall, I could accept option #1 if the spec makes clear that "hyde mode" applies.

G.
Received on Wednesday, 28 March 2012 18:25:25 UTC