Re: [xhr] statusText is underdefined from Glenn Adams on 2012-03-28 (public-webapps@w3.org from January to March 2012)

From: Glenn Adams <glenn@skynav.com>
Date: Wed, 28 Mar 2012 11:42:09 -0600
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Boris Zbarsky <bzbarsky@mit.edu>, public-webapps@w3.org
Message-ID: <CACQ=j+ffCzMWSLrv081RGjvrYX2Ua=oQV5D15Mtx9tvxuPS_OQ@mail.gmail.com>

On Wed, Mar 28, 2012 at 4:48 AM, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2012-03-28 09:48, Glenn Adams wrote:
>
>> I'm not sure what you mean by citing ISO-8859-1 and UTF-8 in the same
>> context. Please elaborate.
>
> If you have UTF-8 on the wire and the client handles it as ISO-8859-1, the
> API user can extract the original octets from the string and re-decode from
> UTF-8. Of course that requires either heuristics or out-of-band information
> that this actually was UTF-8 in the first place.

The problem I have with this is that DOMString now serves as a container
for an arbitrary byte string, i.e., it no longer has any relation to a
UTF-16 code unit sequence. Naive uses of DOMString should be able to
assume it denotes a UTF-16 encoded string.

Any use of DOMString as a holder for arbitrary binary data (including
inflating UTF-8 bytes into 16-bit code units) should be specifically
marked as such, since user-authored content will need to know that it is
in fact not UTF-16 data.

Let's call these two modes jekyll (a real UTF-16 character string) and
hyde (a container for arbitrary data). When the inflate algorithm's input
coding is not specified or known, the output is a hyde mode DOMString,
which is in fact not a character string but merely an unsigned short[]
array with no other semantics.
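
As an illustration only (not taken from the XHR spec), the following Python
sketch shows what an inflate step with no known input coding produces. The
names inflate, wire, hyde and jekyll are hypothetical, and a Python str
stands in for a DOMString here.

```python
def inflate(octets: bytes) -> str:
    """Zero-extend each octet to one code unit, with no decoding step."""
    return "".join(chr(b) for b in octets)


# UTF-8 octets as they might appear on the wire.
wire = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9', five octets

# hyde: one code unit per octet; the units are just the byte values.
hyde = inflate(wire)

# jekyll: the octets decoded with the coding they were actually written in.
jekyll = wire.decode("utf-8")

# The hyde string carries the bytes, not the characters.
code_units = [ord(c) for c in hyde]
```

Under these assumptions hyde has five code units and jekyll has four, and
hyde is identical to what an ISO-8859-1 decode of the same octets would
give.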

It is certainly possible to define statusText in this fashion, but if
done this way, it should be made abundantly clear in the spec that this
usage of DOMString is of the hyde variety, which has the effect of placing
the burden of charset sniffing on user-defined code. This is certainly
a possible strategy for XHR client implementations to use in order to deal
with the mess of actual usage on the web (wherein the ISO-8859-1 dictum
was ignored).
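
A rough sketch of what that burden could look like for user code, again in
Python with a str standing in for a DOMString. The helper names deflate and
sniff_and_decode are hypothetical, and the strict UTF-8 attempt is just one
possible heuristic, not something any spec prescribes.

```python
def deflate(hyde: str) -> bytes:
    """Recover the original octets from a string of code units <= 0xFF."""
    # Raises UnicodeEncodeError if any code unit is above 0xFF.
    return hyde.encode("latin-1")


def sniff_and_decode(hyde: str) -> str:
    """Guess whether a hyde string really held UTF-8 and re-decode if so."""
    try:
        octets = deflate(hyde)
    except UnicodeEncodeError:
        # Not a pure byte container; leave it untouched.
        return hyde
    try:
        # Strict decode: malformed UTF-8 raises instead of being replaced.
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so keep the ISO-8859-1 reading.
        return hyde


recovered = sniff_and_decode("caf\u00c3\u00a9")  # UTF-8 read as ISO-8859-1
untouched = sniff_and_decode("caf\u00e9")        # genuine ISO-8859-1 text
```

The heuristic can misfire: genuine ISO-8859-1 text that happens to form a
valid UTF-8 sequence would be re-decoded too, which is why out-of-band
information is preferable when it exists.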

Received on Wednesday, 28 March 2012 17:43:01 UTC