- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Wed, 10 Jul 2013 00:07:10 -0400
- To: public-script-coord@w3.org
On 7/9/13 11:58 PM, Norbert Lindenberg wrote:
> Why do Web IDL and XMLHttpRequest need ByteString [1, 2]?

Because of legacy API, basically. New APIs should not be using ByteString.

> And why does ByteString have a conversion to/from ECMAScript strings that assumes ISO 8859-1 [3]?

Because of legacy API...

> If I understand the previous discussion [4] correctly, XMLHttpRequest needs a way to communicate byte sequences that occur in HTTP status messages or headers for which HTTPbis doesn't specify a character encoding anymore, and for which XMLHttpRequest doesn't determine the character encoding either.

Yes. Furthermore, this stuff has been exposed as an ES string for the entire existence of XMLHttpRequest in all the browsers that implement it. So all WebIDL and the XMLHttpRequest specification are doing is documenting existing behavior.

> For such byte sequences, ArrayBuffer or UInt8Array seem well suited, in particular since the proposed Encoding API [5, 6] uses them.

If we were designing XMLHttpRequest now, that's exactly what would get used. But we're not dealing with green-field API design here: we have existing implementations, existing websites that depend on those implementations, and we just need to document what those implementations do...

> A default conversion using ISO 8859-1 seems misguided - in general, today's web standards are not shy about recommending UTF-8.

I think thinking of this as a "conversion using ISO-8859-1" is somewhat wrong. This is a case where an ES string is not being used as an actual Unicode string but as something more like Uint16Array.

> Such a default conversion might have made sense in the past when ECMAScript strings were the only containers available for compact binary data, and all kinds of binary data processing used them, including character encoding conversion, but such hacks should no longer be necessary.

The hack here is necessary to the extent that the API exposed by ES strings does not match that of ArrayBuffer or Uint8Array, because websites expect these objects to be ES strings, for exactly the historical reasons you describe.

> HTTP method and header names, BTW, are clearly specified as containing only ASCII characters [7, 8], and so can be represented as DOMString, with exceptions if the strings contain any non-ASCII characters.

To the extent that we're sure no server-side stuff is violating the HTTP specification here, yes. I, personally, am sure that there _are_ violations out there. Whether they're used with XHR is an open question.

-Boris
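
For illustration, here is a rough sketch of the byte-for-code-unit mapping discussed above, in which an ES string is treated as a container of byte values rather than as Unicode text. The helper names `byteStringToJS` and `jsToByteString` are made up for this example and are not part of any specification or browser API; the sketch assumes the conversion behaves as described in this thread (each byte becomes one code unit of the same value, and code units above 255 are rejected).

```javascript
// Hypothetical helpers, not a real API: a sketch of how a ByteString
// maps to and from an ES string, one byte per 16-bit code unit.

// Bytes -> ES string: each byte becomes a code unit with the same value.
function byteStringToJS(bytes) {
  let s = "";
  for (const b of bytes) {
    s += String.fromCharCode(b);
  }
  return s;
}

// ES string -> bytes: each code unit must fit in one byte.
// Assumption: values above 0xFF are an error (TypeError here).
function jsToByteString(s) {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit > 0xFF) {
      throw new TypeError("code unit at index " + i + " does not fit in a byte");
    }
    bytes[i] = unit;
  }
  return bytes;
}

// The two bytes 0xC3 0xA9 (which happen to be UTF-8 for U+00E9) come out
// as two code units, U+00C3 and U+00A9; no decoding of any kind happens.
const header = byteStringToJS(new Uint8Array([0xC3, 0xA9]));
console.log(header.length);
console.log(Array.from(jsToByteString(header)));
```

Nothing in this sketch interprets the bytes as characters, which is why it is closer to copying between typed arrays than to a character-encoding conversion.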
Received on Wednesday, 10 July 2013 04:07:40 UTC