- From: Glenn Adams <glenn@skynav.com>
- Date: Wed, 28 Mar 2012 11:42:09 -0600
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Boris Zbarsky <bzbarsky@mit.edu>, public-webapps@w3.org
- Message-ID: <CACQ=j+ffCzMWSLrv081RGjvrYX2Ua=oQV5D15Mtx9tvxuPS_OQ@mail.gmail.com>
On Wed, Mar 28, 2012 at 4:48 AM, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2012-03-28 09:48, Glenn Adams wrote:
>
>> I'm not sure what you mean by citing ISO-8859-1 and UTF-8 in the same
>> context. Please elaborate.
>
> If you have UTF-8 on the wire and the client handles it as ISO-8859-1, the
> API user can extract the original octets from the string and re-decode from
> UTF-8. Of course that requires either heuristics or out-of-band information
> that this actually was UTF-8 in the first place.

The problem I have with this is that now you have DOMString serving as a container for an arbitrary byte string, i.e., no longer having any relation to a UTF-16 code unit sequence.

Naive uses of DOMString should be able to assume it denotes UTF-16 encoded strings. Any use of DOMString as a holder for arbitrary binary data (including inflating UTF-8 bytes into 16-bit code units) should be specifically marked as such, since user-authored content will need to know that it is in fact not UTF-16 data.

Let's call these two modes jekyll and hyde. When the inflate algorithm's input coding is not specified or known, the output is a hyde-mode DOMString, which is in fact not a character string but merely an unsigned short[] array with no other semantics.

It is certainly possible to define reasonStatus in this fashion, but if done this way, the spec should make it abundantly clear that this usage of DOMString is of the hyde variety, which has the effect of placing the burden of charset sniffing on user-defined code.

This is certainly a possible strategy for XHR client implementations to use in order to deal with the mess of actual usage on the web (wherein the 8859 dictum was ignored).
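
To make the hyde case concrete, here is a minimal TypeScript sketch of the round trip Julian describes: octets inflated one-to-one into 16-bit code units, then recovered and re-decoded as UTF-8 by user code. The function names (inflate, deflate, redecodeAsUtf8) and the sample string are hypothetical, chosen only for illustration; they come from no spec or implementation. The sketch assumes an environment that provides the standard TextEncoder and TextDecoder globals.

```typescript
// Hypothetical helper: inflate each octet into one 16-bit code unit.
// The result is a "hyde" string: a container of byte values, not text.
function inflate(octets: Uint8Array): string {
  let s = "";
  for (let i = 0; i < octets.length; i++) {
    s += String.fromCharCode(octets[i]);
  }
  return s;
}

// Hypothetical helper: recover the original octets from a hyde string.
// Throws if a code unit exceeds 0xFF, i.e. the string was not produced by inflate.
function deflate(hyde: string): Uint8Array {
  const out = new Uint8Array(hyde.length);
  for (let i = 0; i < hyde.length; i++) {
    const codeUnit = hyde.charCodeAt(i);
    if (codeUnit > 0xff) {
      throw new RangeError("code unit above 0xFF at index " + i);
    }
    out[i] = codeUnit;
  }
  return out;
}

// User code must know out of band (or guess) that the octets were UTF-8.
// fatal: true makes the decoder throw on malformed input instead of
// silently substituting U+FFFD.
function redecodeAsUtf8(hyde: string): string {
  return new TextDecoder("utf-8", { fatal: true }).decode(deflate(hyde));
}

// UTF-8 on the wire for "Café": 43 61 66 C3 A9
const wire = new TextEncoder().encode("Caf\u00e9");
const hyde = inflate(wire);          // five code units: C, a, f, U+00C3, U+00A9
const jekyll = redecodeAsUtf8(hyde); // four code units: C, a, f, U+00E9
```

Nothing in the hyde string itself says which coding the octets were in, so the choice of "utf-8" in redecodeAsUtf8 is exactly the sniffing burden that falls on user-defined code.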
Received on Wednesday, 28 March 2012 17:43:01 UTC