Re: Unicode escape sequence | Re: draft-ietf-httpbis-header-structure-00, unicode range

2016-12-14 15:28 GMT+09:00 Kari Hurtta <hurtta-ietf@elmme-mailer.org>:
> Poul-Henning Kamp <phk@phk.freebsd.dk>: (Tue Dec 13 23:43:15 2016)
>> --------
>> In message <20161213175419.GA7943@LK-Perkele-V2.elisa-laajakaista.fi>,
>> Ilari Liusvaara writes:
>>
>> >> 3.  HTTP/1 Serialization of HTTP Header Common Structure
>> >> https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-00#section-3
>
>> >astral planes (and I hope the escape system there would be more sane
>> >than the one JSON has...)
>
> I think that one escape sequence is more sane than something like
> \uD834\uDD1E  for one unicode codepoint.

Agreed.

Surrogate pairs are a remnant of old systems that assumed every Unicode
character would fit into 16 bits, which turned out to be false. They
make sense only when the underlying system uses UTF-16.
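
To make the cost concrete, here is roughly the arithmetic a decoder
needs for the JSON-style form. This is a minimal Python sketch; the
function names are my own and do not come from any draft.

```python
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a pair")
    v = cp - 0x10000                      # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E is the code point behind the \uD834\uDD1E example quoted above.
high, low = to_surrogate_pair(0x1D11E)
print("%04X %04X" % (high, low))
```

A parser that accepts this form also has to decide what to do with a
lone or reversed surrogate, which a single-escape syntax avoids.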

Considering that most programming languages and operating systems are
moving away from UTF-16, I think a new standard that permits surrogate
pairs would be a bad move in the long run.

>> Any suggestions ?
>
> Ilari Liusvaara said that 10FFFD is the last codepoint. So 6
> hex digits are sufficient.
>
> Either
>         ( "\" "X" 6*HEXDIG )
>
> or
>
>          ( "\" "X" 1*6HEXDIG "#" )
> or
>
>          ( "\" "#" 1*6HEXDIG "#" )
>
> or some other escape character may be used instead.
>      This was my first suggestion.
>
> I did not suggest \u  or \U  because these
> two are used with different lengths.
>
>> --
>> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
>> phk@FreeBSD.ORG         | TCP/IP since RFC 956
>> FreeBSD committer       | BSD since 4.3-tahoe
>> Never attribute to malice what can adequately be explained by incompetence.
>
> / Kari Hurtta
>
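
As a rough illustration of how simple a single-escape decoder can be,
here is a minimal Python sketch of the first two forms quoted above.
It rests on several assumptions of mine: the fixed form is read as
exactly six hex digits (taken literally, the ABNF "6*HEXDIG" means six
or more), values in the surrogate range or above 10FFFF are rejected,
escaping of the backslash itself is ignored, and all names are
hypothetical.

```python
import re

# "\X" followed by exactly six hex digits (assumed reading of the first form).
FIXED = re.compile(r"\\X([0-9A-Fa-f]{6})")
# "\X" followed by one to six hex digits and a closing "#" (second form).
TERMINATED = re.compile(r"\\X([0-9A-Fa-f]{1,6})#")


def _decode(match):
    """Turn one matched escape into its character, rejecting bad values."""
    cp = int(match.group(1), 16)
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("invalid code point: %X" % cp)
    return chr(cp)


def unescape_fixed(text):
    """Decode every fixed-width \\XHHHHHH escape in text."""
    return FIXED.sub(_decode, text)


def unescape_terminated(text):
    """Decode every \\XH...# escape (one to six hex digits) in text."""
    return TERMINATED.sub(_decode, text)


# Both forms yield the same single character, U+1D11E.
print(unescape_fixed(r"\X01D11E") == unescape_terminated(r"\X1D11E#"))
```

The fixed-width form needs no terminator but always costs eight bytes;
the terminated form is shorter for small code points at the price of
one more delimiter to validate.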



-- 
Kazuho Oku

Received on Thursday, 15 December 2016 22:14:46 UTC