Re: ByteString in Web IDL

On Jul 10, 2013, at 7:04, Anne van Kesteren <annevk@annevk.nl> wrote:

> On Wed, Jul 10, 2013 at 9:39 AM, Robin Berjon <robin@w3.org> wrote:
>> It's just a name, nothing keeps you from using it if you know what you're
>> doing — it's just about scaring off people who don't. Besides, you might
>> need ByteStrings in a new API, but only if you're interfacing with some form
>> of legacy content.
> 
> I don't understand. HTTP is and will remain bytes, but in the
> confusing way that the bytes look like strings. Any new API around
> that would not be legacy and would have to deal with this somehow.


HTTP transports bytes, but where those bytes represent text, the HTTPbis spec specifies what they mean in several different ways:
- Restrict to ASCII: methods, header field names
- Use charset parameter on Content-Type: message body
- Ignore them entirely: status reason phrases
- Leave it unspecified: header field values

The last one can be confusing, although other specs sometimes fill the gap, such as RFC 6266 for Content-Disposition header field values. In the remaining cases, a new API could impose a character encoding (as XMLHttpRequest does for JSON), try to guess the character encoding from the byte sequence and contextual information, or let the API client decide which character encoding to use.
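
To make the three options concrete, here is a rough sketch in Python. The function names are hypothetical, the choice of UTF-8 as the imposed encoding is an assumption, and the "guess" shown is only the simplest possible heuristic (strict UTF-8 first, ISO 8859-1 as fallback); a real API would likely also use contextual information.

```python
def decode_imposed(raw: bytes) -> str:
    """Option 1: the API fixes the encoding (UTF-8 here, as an assumption)."""
    # Undecodable bytes become U+FFFD rather than raising an error.
    return raw.decode("utf-8", errors="replace")


def decode_guessed(raw: bytes) -> str:
    """Option 2: guess from the byte sequence itself."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO 8859-1 assigns a character to every byte, so this cannot fail.
        return raw.decode("iso-8859-1")


def decode_client_choice(raw: bytes, encoding: str) -> str:
    """Option 3: the API client names the encoding."""
    return raw.decode(encoding)
```

For example, decode_guessed(b"caf\xe9") falls back to ISO 8859-1, while decode_guessed(b"caf\xc3\xa9") succeeds as UTF-8; both yield "café".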

Using ByteString effectively imposes ISO 8859-1. That seems to be required for legacy reasons in XMLHttpRequest, but it would generally be the wrong choice for any new API.
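
As an illustration of why, the following Python sketch mimics the ByteString mapping, in which each byte becomes the code point with the same numeric value. The helper name is made up for this example; it is not part of Web IDL or any library.

```python
def bytestring_to_str(raw: bytes) -> str:
    """Map each byte 0..255 to the code point with the same value.

    This is the same result as decoding the bytes as ISO 8859-1.
    """
    return "".join(chr(b) for b in raw)


# A header value that a server sent as UTF-8.
utf8_bytes = "é".encode("utf-8")  # two bytes: 0xC3 0xA9

# Seen through the ByteString mapping, it becomes two characters.
as_bytestring = bytestring_to_str(utf8_bytes)

# The client can recover the text only if it knows the real encoding
# and reverses the byte-to-code-point mapping first.
recovered = as_bytestring.encode("iso-8859-1").decode("utf-8")
```

Here as_bytestring is the two-character string "Ã©" rather than "é", and the original text comes back only after the extra round trip in the last line.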

Norbert

Received on Wednesday, 10 July 2013 22:21:35 UTC