
Re: Unicode escape sequence | Re: draft-ietf-httpbis-header-structure-00, unicode range

From: Matthew Kerwin <matthew@kerwin.net.au>
Date: Wed, 14 Dec 2016 22:04:03 +1000
Message-ID: <CACweHNAtn=Qruf7aWW9eReGCY0kv2ujMYmH9HdRUjgpvKEWVUQ@mail.gmail.com>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Martin Thomson <martin.thomson@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, Alexey Melnikov <alexey.melnikov@isode.com>, Kari Hurtta <hurtta-ietf@elmme-mailer.org>, Ilari Liusvaara <ilariliusvaara@welho.com>, HTTP working group mailing list <ietf-http-wg@w3.org>, Poul-Henning Kamp <phk@varnish-cache.org>
On 14 December 2016 at 21:53, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2016-12-14 12:37, Martin Thomson wrote:
>
>> On 14 December 2016 at 21:51, Poul-Henning Kamp <phk@phk.freebsd.dk>
>> wrote:
>>
>>> Well, UTF-8 would also go through HPACK, but by eye-ball it seems
>>> that it would be more efficient.
>>>
>>
>> If you have lots of ASCII still, you can probably Huffman encode,
>> though if you have lots of non-ASCII, you need to watch out: a three
>> octet UTF-8 encoded codepoint turns into (worst case) 82 bits.  Best
>> case is 58 bits (both of which are invalid, so maybe not).
>>
>> I can't remember, is there actually a good reason why we can't just
>> start shoving UTF-8 in header fields?  I mean, h2 is probably OK with
>> this.
>>
>
> Some APIs assume ISO-8859-1, so unexpected things might happen (of course
> that's independent of the actual transport).
>
> Best regards, Julian
>
>
Particularly since HTML4 taught us that "ISO-8859-1" means "Windows-1252",
which assigns printable characters to most of \x80-\x9F, a range that holds
only control codes in ISO-8859-1.
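A minimal Python sketch of both problems. It assumes a header value is sent as raw UTF-8 and read back by code that assumes ISO-8859-1. The helper names `decode_both` and `cp1252_assigned_in_c1_range` are hypothetical. The same octets decode differently depending on whether the receiver takes ISO-8859-1 literally or treats it as Windows-1252.

```python
from typing import Optional, Tuple


def decode_both(octets: bytes) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: decode octets as ISO-8859-1 and as Windows-1252.

    Returns (latin1_text, cp1252_text). The second item is None when some
    octet has no Windows-1252 assignment.
    """
    latin1_text = octets.decode("latin-1")  # never fails: 1 octet -> 1 code point
    try:
        cp1252_text: Optional[str] = octets.decode("cp1252")
    except UnicodeDecodeError:
        cp1252_text = None  # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned
    return latin1_text, cp1252_text


def cp1252_assigned_in_c1_range() -> int:
    """Hypothetical helper: count octets in 0x80-0x9F that Windows-1252 maps."""
    return sum(
        1 for octet in range(0x80, 0xA0)
        if decode_both(bytes([octet]))[1] is not None
    )


if __name__ == "__main__":
    # A header value carrying non-ASCII text as raw UTF-8 octets.
    wire = "naïve €5".encode("utf-8")  # b'na\xc3\xafve \xe2\x82\xac5'
    latin1, cp1252 = decode_both(wire)
    print(repr(latin1))  # 'naÃ¯ve â\x82¬5'  (0x82 becomes a C1 control)
    print(repr(cp1252))  # 'naÃ¯ve â‚¬5'  (0x82 becomes U+201A)
    print(cp1252_assigned_in_c1_range(), "of 32")  # 27 of 32
```

Neither decoding recovers "naïve €5". An ISO-8859-1 assumption mangles the UTF-8 octets, and the Windows-1252 reading mangles them in a slightly different way.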

Cheers
-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/
Received on Wednesday, 14 December 2016 12:04:36 UTC