- From: Ilari Liusvaara <ilariliusvaara@welho.com>
- Date: Tue, 13 Dec 2016 23:42:23 +0200
- To: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Cc: Kari Hurtta <hurtta-ietf@elmme-mailer.org>, HTTP working group mailing list <ietf-http-wg@w3.org>, Poul-Henning Kamp <phk@varnish-cache.org>
On Tue, Dec 13, 2016 at 09:28:47PM +0000, Poul-Henning Kamp wrote:
> --------
> In message <20161213173327.C1F7D1714B@welho-filter2.welho.com>, Kari Hurtta writes:
>
> >2. Definition of HTTP Header Common Structure
> >https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-00#section-2
> >
> >| unicode_string = * unicode_codepoint
> >| # XXX: Is there a place to import this from ?
> >| # Unrestricted unicode, because there is no sane
> >| # way to restrict or otherwise make unicode "safe".
> >
> >What is range of unicode_codepoint ?
>
> As far as I know, UNICODE does not have a firm upper end, but
> everybody _expects_ 32 bits to be enough for everybody.

Actually, it does: 10FFFF is the last codepoint in Unicode. 10FFFE and
10FFFF are noncharacters, so 10FFFD is the last usable one (it is
actually allocated as part of the PUA). IIRC, Unicode has exactly
1,111,998 codepoints in total once surrogates and noncharacters are
excluded (most of those are unallocated).

> Since section two is the abstract datamodel, that's the best we can
> do there.
>
> >3. HTTP/1 Serialization of HTTP Header Common Structure
> >https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-00#section-3
> >[...]
> >Or is unicode values > 0xFFFF
> >encoded with surrogates (values 0xd800 - 0xdfff) ?
> >( UCS-2 or UTF-16 is used )
>
> That was the plan.
>
> Not a particularly good plan, as evidenced by the fact that I forgot
> to write that, and that JSON has seen interop issues with parsers
> missing that detail.

Also, note that the surrogate mechanism can only encode up to plane 16
(that's the reason why Unicode only has 17 planes!).

And I suppose that the surrogates MUST be paired properly (JSON
actually does not require this).

-Ilari
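
For reference, the codepoint arithmetic and the pairing rule discussed
above can be sketched in a few lines of Python. This is only an
illustration: the function names are hypothetical and nothing here is
taken from the draft. It assumes the usual Unicode definitions (17
planes of 65536 codepoints, surrogates D800-DFFF, 66 noncharacters).

```python
# Illustrative sketch; all names below are hypothetical, not from the draft.
MAX_CODE_POINT = 0x10FFFF            # last codepoint of plane 16
SURROGATES = range(0xD800, 0xE000)   # 2048 codepoints reserved for UTF-16


def count_code_points():
    """Return (all codepoints, non-surrogates, non-surrogate non-noncharacters)."""
    total = MAX_CODE_POINT + 1           # 17 planes * 65536
    scalars = total - len(SURROGATES)    # surrogates are not characters
    nonchars = 32 + 2 * 17               # FDD0..FDEF plus xxFFFE/xxFFFF per plane
    return total, scalars, scalars - nonchars


def to_surrogates(cp):
    """Split a codepoint above 0xFFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError("codepoint not encodable as a surrogate pair")
    v = cp - 0x10000                     # 20 bits, 10 per surrogate
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_code_units(units):
    """Decode 16-bit code units to codepoints, rejecting unpaired surrogates."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:        # high surrogate: must be followed by a low one
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:      # low surrogate with no preceding high one
            raise ValueError("unpaired low surrogate")
        else:
            out.append(u)
            i += 1
    return out
```

The largest pair, (DBFF, DFFF), decodes to 10FFFF, which is why the
surrogate mechanism cannot reach beyond plane 16. A strict decoder like
from_code_units() rejects lone surrogates instead of passing them
through.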
Received on Tuesday, 13 December 2016 21:43:05 UTC