Re: XHR LC comment: header encoding

Hi Boris,

Thanks for the feedback! Comments inline.

Boris Zbarsky wrote:
>...
> More precisely, what Gecko does here is to take the raw byte string and 
> byte-inflate it (by setting the high byte of each 16-bit code unit to 0 
> and the low byte to the corresponding byte of the given byte string) 
> before returning it to JS.
> 
> This happens to more or less match "decoding as ISO-8859-1", but not quite.
> ...

Not quite? In what way does it differ?

> ...
>>  From HTTP's point of view, the header field value really is opaque. So
>> you can put there anything, as long as it fits into the header field 
>> ABNF.
> 
> True; what does that mean for converting header values to 16-bit code 
> units in practice?  Seems like byte-inflation might be the only 
> reasonable thing to do...
> ...

It at least preserves all the information that was there and would allow 
a caller to re-decode as UTF-8 as a separate step.
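
To illustrate the point, here is a minimal sketch in Python (chosen only for illustration; the function names are hypothetical and not part of any XHR or Gecko API). It assumes byte inflation means mapping each octet to the code point with the same value, as described above, and shows that the original octets can be recovered and then decoded as UTF-8 in a separate step.

```python
def byte_inflate(raw: bytes) -> str:
    """Map each octet to the code point with the same value (0x00-0xFF)."""
    return "".join(chr(b) for b in raw)


def byte_deflate(inflated: str) -> bytes:
    """Inverse of byte_inflate; raises ValueError for code points above 0xFF."""
    return bytes(ord(c) for c in inflated)


def redecode_as_utf8(inflated: str) -> str:
    """Recover the octets, then try UTF-8; raises UnicodeDecodeError if invalid."""
    return byte_deflate(inflated).decode("utf-8")


# Hypothetical header field value whose sender used UTF-8.
raw_value = "caf\u00e9".encode("utf-8")   # octets: 63 61 66 C3 A9

inflated = byte_inflate(raw_value)        # what a character-based API would return
recovered = redecode_as_utf8(inflated)    # caller's optional second step
```

Nothing is lost in the first step, so a caller who does not know the sender's encoding can still get at the raw octets and try something other than UTF-8.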

>> Of course that only helps if senders and receivers agree on the
>> encoding.
> 
> True, but "encoding" here needs to mean more than just "encoding of 
> Unicode", since one can just stick random byte arrays, within the ABNF 
> restrictions, in the header, right?

Yes.

Right now there is no interoperable encoding, so the best thing to do in 
APIs that use character sequences instead of octets is to preserve as 
much information as possible.

It would be nice if we could find out whether anybody relies on the 
current implementation. Maybe switch it back to byte inflation in 
Mozilla trunk?

Best regards, Julian

Received on Monday, 4 January 2010 16:44:40 UTC