- From: Kari Hurtta <hurtta-ietf@elmme-mailer.org>
- Date: Wed, 14 Dec 2016 19:39:58 +0200 (EET)
- To: Matthew Kerwin <matthew@kerwin.net.au>
- CC: Julian Reschke <julian.reschke@gmx.de>, Alexey Melnikov <alexey.melnikov@isode.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, Kari Hurtta <hurtta-ietf@elmme-mailer.org>, Ilari Liusvaara <ilariliusvaara@welho.com>, HTTP working group mailing list <ietf-http-wg@w3.org>, Poul-Henning Kamp <phk@varnish-cache.org>
Matthew Kerwin <matthew@kerwin.net.au>: (Wed Dec 14 13:53:45 2016)
> It says that "forms that use explicit string delimiters are generally
> preferred over other alternatives. In many contexts, symmetric paired
> delimiters are easier to recognize and understand than visually unrelated
> ones." So brackets are good.
>
> And while it advises against using Perl's \x{NNNN...} syntax (because of
> potential ambiguities with two-digit hex codes), it doesn't say anything at
> all about \u{N...}
>
> Curly braces cost 14+15 bits in HPACK, parentheses 10+10 (incidentally
> cheaper than single quotes, which are 11+11). It's also convenient that
> little 'u' is one bit cheaper than little 'x'.
>
> I don't think parentheses are at too much risk of needing escaping, so it
> seems like the solution that goes with BCP 137, and compresses alright with
> HPACK, is:
>
>     %x5c.75.28 1*6HEXDIGIT %x29
>
> It's still a little bit clunky for things like "Stra\u(df)e", but not so
> bad for emoji "\u(1f602)" and somewhere in between for Hiragana
> "\u(3053)\u(3093)\u(306b)\u(3064)".

I think that this is the best suggestion so far.

But can this also be shorter?

    %x5c.28 1*6HEXDIGIT %x29

That makes \(3064)

{ Yes, then it is not visible that this is hexadecimal. }

Although

    EmbeddedUnicodeChar = %x5C.75.27 4*6HEXDIG %x27

works for me.

> Cheers
>
> > Best regards, Julian
> >
> > PS: and, as a nit, it's strange that the syntax uses delimiters but
> > doesn't allow sequences of 1 to 3 HEXDIGs...
>
> Having just written "\u(df)" I kind of understand; it really feels like
> I'm describing an octet rather than a codepoint. I don't think there's a
> *technical* reason, though.

Yes.

> Is it alright to see "\u(9)" or an equivalent
> in text?

Or is "\(9)" alright, if 'u' is also dropped?

If that is to be avoided, that means

    %x5c.75.28 3*6HEXDIGIT %x29

or, with my newest suggestion,

    %x5c.28 3*6HEXDIGIT %x29

> --
> Matthew Kerwin
> http://matthew.kerwin.net.au/

/ Kari Hurtta
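A minimal Python sketch of the escape forms discussed above, to make the trade-offs concrete. The thread only defines ABNF, so the function names (`encode`, `decode`, `_escape_re`) are hypothetical. `with_u=False` selects the shorter `\(NNNN)` form, and `min_digits=3` selects the `3*6HEXDIGIT` variant that rules out "\u(9)". Two behaviours are assumptions not stated in the ABNF: the encoder escapes the backslash itself and everything outside printable ASCII, and the decoder rejects values that are not Unicode scalar values (six hex digits can exceed U+10FFFF).

```python
import re

# Hypothetical helper names; the thread defines only the ABNF, not an API.
# with_u=True  -> %x5c.75.28 <min>*6HEXDIGIT %x29   i.e. \u(NNNN)
# with_u=False -> %x5c.28    <min>*6HEXDIGIT %x29   i.e. \(NNNN)


def _escape_re(with_u: bool, min_digits: int) -> re.Pattern:
    """Compile a regex for one escape; min_digits is 1 or 3 in the thread."""
    if not 1 <= min_digits <= 6:
        raise ValueError("min_digits must be between 1 and 6")
    opener = r"\\u\(" if with_u else r"\\\("
    return re.compile(opener + "([0-9A-Fa-f]{" + str(min_digits) + r",6})\)")


def encode(text: str, with_u: bool = True, min_digits: int = 1) -> str:
    """Escape the backslash and everything outside printable ASCII (assumption)."""
    opener = "\\u(" if with_u else "\\("
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E and ch != "\\":
            out.append(ch)
        else:
            # Lowercase hex, zero-padded to at least min_digits digits.
            out.append(f"{opener}{cp:0{min_digits}x})")
    return "".join(out)


def decode(text: str, with_u: bool = True, min_digits: int = 1) -> str:
    """Replace each escape with its code point; other text is left untouched."""

    def to_char(m: re.Match) -> str:
        cp = int(m.group(1), 16)
        # Six hex digits can exceed U+10FFFF; reject non-scalar values (assumption).
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {m.group(0)}")
        return chr(cp)

    return _escape_re(with_u, min_digits).sub(to_char, text)


if __name__ == "__main__":
    print(encode("Straße"))                    # Stra\u(df)e
    print(encode("Straße", min_digits=3))      # Stra\u(0df)e
    print(encode("\U0001F602", with_u=False))  # \(1f602)
```

With `min_digits=3`, a short escape such as "\u(9)" simply does not match and is left as literal text, which is one way to read the 3*6HEXDIGIT proposal. Because the substitution is a single pass, an escaped backslash ("\u(5c)") followed by text that looks like another escape decodes back to the literal characters rather than being expanded twice.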
Received on Wednesday, 14 December 2016 17:45:01 UTC