Re: ByteString in Web IDL from Boris Zbarsky on 2013-07-10 (public-script-coord@w3.org from July to September 2013)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Wed, 10 Jul 2013 00:07:10 -0400
To: public-script-coord@w3.org
Message-ID: <51DCDDEE.4030701@mit.edu>

On 7/9/13 11:58 PM, Norbert Lindenberg wrote:
> Why do Web IDL and XMLHttpRequest need ByteString [1, 2]?

Because of legacy APIs, basically.  New APIs should not be using ByteString.

> And why does ByteString have a conversion to/from ECMAScript strings that assumes ISO 8859-1 [3]?

Because of legacy APIs, again.

> If I understand the previous discussion [4] correctly, XMLHttpRequest needs a way to communicate byte sequences that occur in HTTP status messages or headers for which HTTPbis doesn't specify a character encoding anymore, and for which XMLHttpRequest doesn't determine the character encoding either.

Yes. Furthermore, this stuff has been exposed as an ES string for the 
entire existence of XMLHttpRequest in all the browsers that implement 
it.  So all Web IDL and the XMLHttpRequest specification are doing is 
documenting existing behavior.

> For such byte sequences, ArrayBuffer or Uint8Array seem well suited, in particular since the proposed Encoding API [5, 6] uses them.

If we were designing XMLHttpRequest now, that's exactly what we would 
use.  But we're not dealing with green-field API design here: we have 
existing implementations, existing websites that depend on those 
implementations, and we just need to document what those implementations 
do.
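For comparison, here is a minimal TypeScript sketch of the same raw header bytes seen both ways: the legacy view, where each byte becomes one ES string code unit, and the green-field view, where the caller gets a Uint8Array and picks the encoding with the Encoding API's TextDecoder.  The helper names and the sample header bytes are hypothetical, and it assumes a runtime that provides TextDecoder (current browsers, Node 11+, Deno).

```typescript
// Hypothetical helpers; only TextDecoder is a real platform API here.

// Legacy view: one ES string code unit per byte, same numeric value.
function bytesAsLegacyString(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

// Green-field view: the caller chooses the encoding (UTF-8 here).
// With the default fatal: false, malformed input becomes U+FFFD.
function decodeHeaderUtf8(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

// "cafe" with an acute e, encoded as UTF-8: the e-acute is 0xC3 0xA9.
const raw = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);

const legacy = bytesAsLegacyString(raw);  // "caf\u00c3\u00a9", 5 code units
const decoded = decodeHeaderUtf8(raw);    // "caf\u00e9", 4 code units

// Code holding the legacy string can still recover the original bytes,
// because every code unit in it is <= 0xFF.
const recovered = new Uint8Array(legacy.length);
for (let i = 0; i < legacy.length; i++) {
  recovered[i] = legacy.charCodeAt(i);
}

console.log(legacy.length, decoded.length);          // 5 4
console.log(decodeHeaderUtf8(recovered) === decoded); // true
```

The byte-array form is clearly nicer for new code, but existing pages call string methods on these values, so switching the return type would break them.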

> A default conversion using ISO 8859-1 seems misguided - in general, today's web standards are not shy about recommending UTF-8.

I think it is somewhat wrong to think of this as a "conversion using 
ISO-8859-1".  This is a case where an ES string is not being used as an 
actual Unicode string but as something more like a Uint16Array.
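To make the "code units as bytes" reading concrete, here is a minimal TypeScript sketch of the two ByteString conversions as I read the Web IDL definition: going from ES to IDL, any code unit above 0xFF throws a TypeError; going back, each byte becomes a code unit with the same numeric value.  The function names are hypothetical, a Uint8Array stands in for the IDL ByteString, and the ToString step is skipped by taking a string argument.

```typescript
// ES string -> ByteString: reject any code unit above 0xFF, otherwise
// copy each code unit's numeric value into a byte. No character table
// is consulted at any point.
function toByteString(s: string): Uint8Array {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit > 0xff) {
      throw new TypeError(`code unit 0x${unit.toString(16)} at index ${i} is not a byte`);
    }
    bytes[i] = unit;
  }
  return bytes;
}

// ByteString -> ES string: one code unit per byte, same numeric value.
function fromByteString(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

// All 256 byte values survive the round trip unchanged; 0x80-0x9F simply
// become the code units 0x0080-0x009F.
const all = new Uint8Array(256);
for (let b = 0; b < 256; b++) all[b] = b;

const str = fromByteString(all);
const back = toByteString(str);
console.log(back.every((b, i) => b === i)); // true

try {
  toByteString("\u20ac"); // EURO SIGN: code unit 0x20AC does not fit in a byte
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```

Seen this way, the "ISO-8859-1" label only describes which code points the byte values happen to land on; the operation itself is a numeric copy.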

> Such a default conversion might have made sense in the past when ECMAScript strings were the only containers available for compact binary data, and all kinds of binary data processing used them, including character encoding conversion, but such hacks should no longer be necessary.

The hack here remains necessary because the API exposed by ES strings 
does not match that of ArrayBuffer or Uint8Array, and websites expect 
these values to be ES strings, for exactly the historical reasons you 
describe.

> HTTP method and header names, BTW, are clearly specified as containing only ASCII characters [7, 8], and so can be represented as DOMString, with exceptions if the strings contain any non-ASCII characters.

To the extent that we're sure no server-side stuff is violating the HTTP 
specification here, yes.  I, personally, am sure that there _are_ 
violations out there.  Whether they're used with XHR is an open question.
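The difference between the two rules shows up only on non-conforming input.  Here is a minimal TypeScript sketch contrasting the proposed check (header names as DOMString, throwing on non-ASCII) with the ByteString rule (any code unit up to 0xFF passes).  Both function names and the sample header name are hypothetical, and the ASCII check deliberately tests only the range that was proposed, not the full HTTP token grammar.

```typescript
// Proposed rule: header names must be ASCII, otherwise throw.
// (A complete check would also enforce the HTTP token grammar.)
function assertAsciiHeaderName(name: string): string {
  for (let i = 0; i < name.length; i++) {
    if (name.charCodeAt(i) > 0x7f) {
      throw new TypeError(`non-ASCII code unit at index ${i} in header name`);
    }
  }
  return name;
}

// ByteString rule: every code unit must fit in a byte (0x00-0xFF).
function isByteString(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

// Hypothetical spec-violating header name from a server, containing byte 0xE9.
const badName = "X-Caf\u00e9";

console.log(isByteString(badName)); // true: the lenient rule lets it through

try {
  assertAsciiHeaderName(badName);
} catch (e) {
  console.log(e instanceof TypeError); // true: the strict rule rejects it
}
```

If such headers do reach XHR in practice, the strict rule would turn responses that currently work into exceptions, which is the compatibility risk in question.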

-Boris

Received on Wednesday, 10 July 2013 04:07:40 UTC