Re: draft-ietf-httpbis-header-structure-00, unicode range

From: Poul-Henning Kamp <phk@phk.freebsd.dk>
Date: Tue, 13 Dec 2016 21:43:15 +0000
To: Ilari Liusvaara <ilariliusvaara@welho.com>
cc: Kari Hurtta <hurtta-ietf@elmme-mailer.org>, HTTP working group mailing list <ietf-http-wg@w3.org>, Poul-Henning Kamp <phk@varnish-cache.org>
Message-ID: <25434.1481665395@critter.freebsd.dk>
In message <20161213175419.GA7943@LK-Perkele-V2.elisa-laajakaista.fi>, Ilari Liusvaara writes:

>> 3.  HTTP/1 Serialization of HTTP Header Common Structure
>> https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-00#section-3

>Well, that production lists UTF8-4, which is presumably 4-byte UTF-8
>sequences, and all valid ones are astral plane codepoints.
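For reference, the point about 4-byte sequences can be checked with a few
lines of Python. This is only an illustration; the helper name utf8_len is
made up here and does not come from the draft.

```python
# Illustration only: how many bytes does UTF-8 need for a given code point?
# utf8_len is a hypothetical helper name, not something defined by the draft.
def utf8_len(cp: int) -> int:
    """Return the length in bytes of the UTF-8 encoding of code point cp."""
    return len(chr(cp).encode("utf-8"))


if __name__ == "__main__":
    # U+FFFF is the last code point in the Basic Multilingual Plane;
    # U+10000 is the first one outside it (the "astral planes").
    for cp in (0x7F, 0x7FF, 0xFFFF, 0x10000, 0x10FFFF):
        print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
```

Everything from U+10000 up to U+10FFFF takes four bytes, and nothing below
U+10000 does.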

My impression was that UTF8 and 8-bit clean HTTP/1 got shot down
in previous discussions, but I left UTF8 in for now, pending a
more structured decision on this.

I see us having four options, in my order of preference:

1) Forbid Unicode in headers.

2) Take UTF8 out and leave all (non-ASCII) Unicode to the \uxxxx
   escape mechanism.

3) Leave UTF8 in, and make it clear that it may or may not work, so
   that people can use it in controlled environments.

4) Leave UTF8 in, and specify how to indicate/negotiate if it can be used.
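
To make option 2 concrete, here is a rough sketch of what a sender might do
if only ASCII were allowed on the wire. It assumes JSON-style behaviour for
code points above U+FFFF (a UTF-16 surrogate pair written as two \uxxxx
escapes), which is exactly the part that is still open for discussion. The
function name escape_non_ascii is made up for this example.

```python
# Sketch only: ASCII passes through, everything else becomes \uxxxx.
# ASSUMPTION: code points above U+FFFF are written as a UTF-16 surrogate
# pair, the way JSON does it. The draft may well pick something else.
# Escaping of a literal backslash in the input is deliberately left out.
def escape_non_ascii(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                    # plain ASCII, unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")        # fits in one 4-hex-digit escape
        else:
            v = cp - 0x10000                  # 20 bits left to split
            hi = 0xD800 + (v >> 10)           # high surrogate: top 10 bits
            lo = 0xDC00 + (v & 0x3FF)         # low surrogate: bottom 10 bits
            out.append(f"\\u{hi:04x}\\u{lo:04x}")
    return "".join(out)


if __name__ == "__main__":
    print(escape_non_ascii("caf\u00e9"))      # one BMP character escaped
    print(escape_non_ascii("\U0001F600"))     # one astral-plane character
```

The astral-plane case needs two escapes and surrogate arithmetic on both
ends, which is the main cost of a fixed four-digit \uxxxx form.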

>astral planes (and I hope the escape system there would be more sane
>than the one JSON has...)

Any suggestions?

Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
