
Re: ByteString in Web IDL

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 10 Jul 2013 00:30:20 -0400
Message-ID: <CA+c2ei8pgJwPTrCJY1z86D-rRpFjKkxFfFSr5AidPEkEAhgmow@mail.gmail.com>
To: Norbert Lindenberg <ecmascript@lindenbergsoftware.com>
Cc: Anne van Kesteren <annevk@annevk.nl>, public-script-coord <public-script-coord@w3.org>

On Tue, Jul 9, 2013 at 11:58 PM, Norbert Lindenberg
<ecmascript@lindenbergsoftware.com> wrote:
> Why do Web IDL and XMLHttpRequest need ByteString [1, 2]? And why does ByteString have a conversion to/from ECMAScript strings that assumes ISO 8859-1 [3]?
>
> If I understand the previous discussion [4] correctly, XMLHttpRequest needs a way to communicate byte sequences that occur in HTTP status messages or headers for which HTTPbis doesn't specify a character encoding anymore, and for which XMLHttpRequest doesn't determine the character encoding either.
>
> For such byte sequences, ArrayBuffer or Uint8Array seem well suited, in particular since the proposed Encoding API [5, 6] uses them.

It would be *very* annoying if you couldn't do

xhr.open("GET", "/foo");
xhr.setRequestHeader("some-header", "value");
xhr.send();

but instead had to do:

xhr.open(new Uint8Array([71, 69, 84]), "/foo");
xhr.setRequestHeader(new Uint8Array([115, 111, 109, 101, 45, 104...]),
                     new Uint8Array([...]));
xhr.send();

The latter might technically be more correct, but it's much more
error-prone and harder for authors to use.

> A default conversion using ISO 8859-1 seems misguided - in general, today's web standards are not shy about recommending UTF-8. Such a default conversion might have made sense in the past when ECMAScript strings were the only containers available for compact binary data, and all kinds of binary data processing used them, including character encoding conversion, but such hacks should no longer be necessary.

Given that HTTP doesn't use UTF-8 for these values, encoding to UTF-8
seems like a mistake.

So the answer here is basically "legacy".
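
To make the ISO 8859-1 mapping concrete, here is a rough sketch of the
conversion as I understand it: each code unit of the string becomes one
byte, and any code unit above 0xFF is rejected. The helper names
toByteString and fromByteString are made up for illustration; they are
not part of XMLHttpRequest or any other API.

```javascript
// Hypothetical helper: turn a JS string into bytes, one code unit per byte.
// Code units above 0xFF cannot be represented and cause a TypeError.
function toByteString(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit > 0xFF) {
      throw new TypeError("code unit above 0xFF at index " + i);
    }
    bytes[i] = unit;
  }
  return bytes;
}

// Hypothetical inverse: each byte becomes the code unit with the same value.
function fromByteString(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

toByteString("GET");     // Uint8Array [71, 69, 84]
toByteString("\u00E9");  // Uint8Array [233], a single byte
```

With this mapping "\u00E9" goes on the wire as the single byte 0xE9,
whereas UTF-8 would send the two bytes 0xC3 0xA9, and the round trip
back to a string never needs to know which encoding the server had in
mind.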

> HTTP method and header names, BTW, are clearly specified as containing only ASCII characters [7, 8], and so can be represented as DOMString, with exceptions if the strings contain any non-ASCII characters.

That's essentially what we are doing, except that we support 8 bits
per character instead of 7. Yes, HTTP says that headers are supposed
to be ASCII, but I doubt that all servers follow that.
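
The difference between the two approaches comes down to the upper bound
of the range check. A small sketch, again with made-up helper names
(isAscii and fitsInBytes are only for illustration):

```javascript
// Hypothetical: true if every code unit in str is at most maxUnit.
function allUnitsAtMost(str, maxUnit) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > maxUnit) {
      return false;
    }
  }
  return true;
}

// 7-bit check: what a DOMString restricted to ASCII would accept.
function isAscii(str) {
  return allUnitsAtMost(str, 0x7F);
}

// 8-bit check: what the ByteString approach accepts.
function fitsInBytes(str) {
  return allUnitsAtMost(str, 0xFF);
}

isAscii("some-header");    // true
isAscii("caf\u00E9");      // false
fitsInBytes("caf\u00E9");  // true
```

A header value containing the byte 0xE9 would be rejected by the 7-bit
check but passes the 8-bit one, which is the case I'd expect to see from
servers that don't stick to ASCII.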

/ Jonas
Received on Wednesday, 10 July 2013 04:31:19 UTC
