Re: Unicode escape sequence | Re: draft-ietf-httpbis-header-structure-00, unicode range from Martin J. Dürst on 2017-01-04 (ietf-http-wg@w3.org from January to March 2017)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Wed, 4 Jan 2017 11:52:20 +0900
To: Matthew Kerwin <matthew@kerwin.net.au>
CC: Julian Reschke <julian.reschke@gmx.de>, Alexey Melnikov <alexey.melnikov@isode.com>, HTTP working group mailing list <ietf-http-wg@w3.org>
Message-ID: <6071d7e0-f1b9-3bb7-2a7d-a8ed60500e20@it.aoyama.ac.jp>

Sorry to be late, cleanup during the holidays.

On 2016/12/15 10:57, Matthew Kerwin wrote:

> I should have noted here that Ruby uses this \u{N...} syntax, including
> the lower limit of one hexadecimal digit.  This is a valid string literal
> in Ruby:
>
> "\u{df}\u{9}\u{1f602}"

Not only that, but Ruby allows \uABCD in case there are exactly 4 hex 
digits. Also, you can write the above as \u{df 9 1f602}, too. Ruby puts 
writers' and readers' convenience above other concerns, but this doesn't 
mean that we can't use it.
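
For illustration, a minimal Ruby sketch of the three spellings mentioned above. The method names are hypothetical and exist only for this example; the escapes themselves are standard Ruby string-literal syntax.

```ruby
# Three spellings of the same code points: U+00DF, U+0009, U+1F602.
# Method names are illustrative only.

def braced_each
  "\u{df}\u{9}\u{1f602}"    # one \u{...} per code point, 1 or more hex digits
end

def braced_group
  "\u{df 9 1f602}"          # several space-separated code points in one \u{...}
end

def four_digit
  "\u00DF\u0009\u{1f602}"   # \uXXXX takes exactly four hex digits
end

def as_code_points(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

puts braced_each == braced_group && braced_group == four_digit
puts as_code_points(braced_group).inspect
```

Note that the four-digit form cannot express U+1F602 on its own, which is why the braced form is still needed for the last character.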

> There is precedent, although I'm not sure if it's a good precedent: the
> "content" attribute in CSS uses:
>
>     %5c 1*6HEXDIGIT
>
> ...which is both undelimited (which I oppose) and without an explicit
> hexadecimal indicator (about which I'm mostly ambivalent.)

Yes. That led to some of the stuff in 
https://www.w3.org/TR/charmod/#sec-Escaping, in particular 
https://www.w3.org/TR/charmod/#C044.
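
To make the problem with undelimited escapes concrete, here is a rough Ruby sketch of a CSS-style decoder. It assumes the rule quoted above (backslash plus one to six hex digits) together with the CSS convention that a single whitespace character after the digits is swallowed as a terminator. The constant and method names are hypothetical, and code points are not validated.

```ruby
# Hypothetical decoder for CSS-style escapes: "\" followed by 1-6 hex digits.
# One whitespace character directly after the digits is consumed as a
# terminator. No range or surrogate validation is attempted.
CSS_ESCAPE = /\\(\h{1,6})[ \t\n\r\f]?/

def decode_css_escapes(text)
  text.gsub(CSS_ESCAPE) { [$1.to_i(16)].pack("U") }
end

# Terminated by a space: U+00E9 followed by "dition".
puts decode_css_escapes('\e9 dition')

# No terminator: the "d" is a hex digit too, so this reads as U+0E9D + "ition".
puts decode_css_escapes('\e9dition').codepoints.first.to_s(16)

# Padding to six digits is the other way to stop the hex run.
puts decode_css_escapes('\0000e9dition')
```

The second call shows the hazard: whether a following letter belongs to the escape depends on whether it happens to be a hex digit.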

As for the \u'ABCD' recommendation in 
https://tools.ietf.org/html/rfc5137#section-5.1:

On 2016/12/14 19:38, Alexey Melnikov wrote:
 > On 14/12/2016 10:21, Julian Reschke wrote:

 >> Has this ever been used in a protocol?

I think this is a very good question. RFC 5137 doesn't even give a full 
example of its very own notation. Also, I don't think \u'ABCD' existed 
before RFC 5137. It smells quite a bit of https://xkcd.com/927/ (but I 
may be wrong, and of course, this area is prone to such phenomena).
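
For readers who have not seen the notation, a small Ruby sketch of what decoding it might look like. It assumes the form is backslash, "u", an apostrophe, four to six hex digits, and a closing apostrophe; check RFC 5137 section 5.1 before relying on that. The constant and method names are hypothetical, and code points are not validated.

```ruby
# Assumed shape of the RFC 5137 notation: \u'XXXX' with 4 to 6 hex digits.
# Names are illustrative; no range or surrogate validation is attempted.
RFC5137_ESCAPE = /\\u'(\h{4,6})'/

def decode_rfc5137_escapes(text)
  text.gsub(RFC5137_ESCAPE) { [$1.to_i(16)].pack("U") }
end

puts decode_rfc5137_escapes("D\\u'00FC'rst")
puts decode_rfc5137_escapes("\\u'1F602'").codepoints.first.to_s(16)
```

Unlike the CSS form, the closing apostrophe delimits the escape, so a following hex-like letter cannot be absorbed into it.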

 > Some:
 > https://datatracker.ietf.org/doc/rfc5137/referencedby/

That record is very sparse.

 > This was also extensively used in other RFCs without referencing the BCP.

Pointers, please.

Regards,   Martin.

Received on Wednesday, 4 January 2017 02:53:14 UTC