- From: Matthew Kerwin <matthew@kerwin.net.au>
- Date: Thu, 15 Dec 2016 11:57:40 +1000
- To: Kari Hurtta <hurtta-ietf@elmme-mailer.org>
- Cc: Julian Reschke <julian.reschke@gmx.de>, Alexey Melnikov <alexey.melnikov@isode.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, Ilari Liusvaara <ilariliusvaara@welho.com>, HTTP working group mailing list <ietf-http-wg@w3.org>, Poul-Henning Kamp <phk@varnish-cache.org>
- Message-ID: <CACweHNDbv9dDXqjpU61HvfpgZ6Dt4S-CG=GjwOZcwaZh6LEirQ@mail.gmail.com>
On 15 December 2016 at 03:39, Kari Hurtta <hurtta-ietf@elmme-mailer.org> wrote:

> Matthew Kerwin <matthew@kerwin.net.au>: (Wed Dec 14 13:53:45 2016)
> > It says that "forms that use explicit string delimiters are generally
> > preferred over other alternatives. In many contexts, symmetric paired
> > delimiters are easier to recognize and understand than visually unrelated
> > ones." So brackets are good.
> >
> > And while it advises against using Perl's \x{NNNN...} syntax (because of
> > potential ambiguities with two-digit hex codes), it doesn't say anything at
> > all about \u{N...}

I should have noted here that Ruby uses this \u{N...} syntax, including the
lower limit of one hexadecimal digit. This is a valid string literal in Ruby:

    "\u{df}\u{9}\u{1f602}"

> > Curly braces cost 14+15 bits in HPACK, parentheses 10+10 (incidentally
> > cheaper than single quotes, which are 11+11). It's also convenient that
> > little 'u' is one bit cheaper than little 'x'.
> >
> > I don't think parentheses are at too much risk of needing escaping, so it
> > seems like the solution that goes with BCP 137, and compresses alright with
> > HPACK, is:
> >
> > %x5c.75.28 1*6HEXDIGIT %x29
> >
> > It's still a little bit clunky for things like "Stra\u(df)e", but not so
> > bad for emoji "\u(1f602)" and somewhere in between for Hiragana
> > "\u(3053)\u(3093)\u(306b)\u(3064)".
>
> I think that this is best suggestion so far.
>
> But can this also be shorter ?
>
> %x5c.28 1*6HEXDIGIT %x29
>
> Makes
>
> \(3064)
>
> { Yes, it is not visible that this is hexadecimal. }

There is precedent, although I'm not sure if it's a good precedent: the
"content" property in CSS uses:

    %x5c 1*6HEXDIGIT

...which is both undelimited (which I oppose) and without an explicit
hexadecimal indicator (about which I'm mostly ambivalent).

> Although
>
> EmbeddedUnicodeChar = %x5C.75.27 4*6HEXDIG %x27
>
> works for me.
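To make the bit counts quoted above concrete, here is a rough Ruby sketch that adds up the HPACK Huffman cost of the fixed framing of each candidate form, that is, everything except the hex digits. The code lengths are transcribed from the static Huffman table in RFC 7541 Appendix B; the 19-bit backslash and the 6- and 7-bit 'u' and 'x' are not stated in this thread and should be re-checked against the RFC. The names HPACK_BITS, CANDIDATES and framing_bits are made up for illustration.

```ruby
# Huffman code lengths in bits for a few octets, transcribed from the
# static table in RFC 7541 Appendix B (an assumption to re-check there).
HPACK_BITS = {
  "\\" => 19, "u" => 6, "x" => 7,
  "(" => 10, ")" => 10, "'" => 11,
  "{" => 15, "}" => 14
}.freeze

# Bits spent on the escape's fixed framing, excluding the hex digits.
def framing_bits(prefix, suffix)
  (prefix + suffix).each_char.sum { |ch| HPACK_BITS.fetch(ch) }
end

# Candidate forms discussed in the thread: label => [prefix, suffix].
CANDIDATES = {
  "\\u(...)" => ["\\u(", ")"],
  "\\(...)"  => ["\\(", ")"],
  "\\u'...'" => ["\\u'", "'"],
  "\\u{...}" => ["\\u{", "}"]
}.freeze

CANDIDATES.each do |label, (prefix, suffix)|
  puts format("%-10s %2d bits", label, framing_bits(prefix, suffix))
end
```

On these figures the parenthesised forms are the cheapest of the four, and dropping the 'u' saves a further 6 bits per escape. The hex digits themselves cost 5 or 6 bits each in lower case, so for short codepoints the framing dominates.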
I suppose it comes down to a question of which data we want to target for
optimisation, and then taking measurements and evaluating them. It sounds
like Julian thinks «%x5c.75 DELIM 1*6HEXDIGIT DELIM» "\u(abc)" is verbose,
and we don't have many opinions yet on «%x5c.28 1*6HEXDIGIT %x29» "\(abc)".

I'm not sure at what point this decision becomes so minor that it's just
paint on a bike shed. :)

> > Cheers
> >
> > > Best regards, Julian
> > >
> > > PS: and, as a nit, it's strange that the syntax uses delimiters but
> > > doesn't allow sequences of 1 to 3 HEXDIGs...
> >
> > Having just written "\u(df)" I kind of understand; it really feels like
> > I'm describing an octet rather than a codepoint. I don't think there's a
> > *technical* reason, though.
>
> Yes.
>
> > Is it alright to see "\u(9)" or an equivalent
> > in text?
>
> Or is that "\(9)" alright if 'u' is also dropped.
>
> If that wanted to be avoid, that means
>
> %x5c.75.28 3*6HEXDIGIT %x29
>
> or
>
> %x5c.28 3*6HEXDIGIT %x29
>
> on my newest suggestion.

Left-padding with zeroes to make three digits screams "octal" at me, even
when they're not all octal digits, which elicits an even stronger Pavlovian
response.

I think it has to be either 1*6 or 4*6, and I lean towards 1*6.

Cheers
--
Matthew Kerwin
http://matthew.kerwin.net.au/
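As an illustration of what the digit-count choice means in practice, here is a minimal Ruby sketch of a decoder for the parenthesised form with a configurable lower bound. The method name decode_escapes, the min_digits keyword, and the decision to leave surrogate or out-of-range values untouched are assumptions for the example, not anything proposed in this thread.

```ruby
# Hypothetical helper: decode "\u(HEX)" escapes, accepting between
# min_digits and 6 hex digits (1 for the 1*6 form, 4 for the 4*6 form).
def decode_escapes(text, min_digits: 1)
  pattern = /\\u\(([0-9A-Fa-f]{#{min_digits},6})\)/
  text.gsub(pattern) do
    codepoint = Regexp.last_match(1).to_i(16)
    # Assumption: anything that is not a Unicode scalar value is left as-is.
    if codepoint > 0x10FFFF || (0xD800..0xDFFF).cover?(codepoint)
      Regexp.last_match(0)
    else
      [codepoint].pack("U")
    end
  end
end

puts decode_escapes("Stra\\u(df)e")                 # 1*6: two digits accepted
puts decode_escapes("Stra\\u(df)e", min_digits: 4)  # 4*6: escape left untouched
puts decode_escapes("Stra\\u(00df)e", min_digits: 4)
```

With the default lower bound of one digit, "\u(df)" and "\u(9)" both decode. With min_digits: 4 the same input passes through unchanged and has to be written as "\u(00df)" or "\u(0009)" instead.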
Received on Thursday, 15 December 2016 01:58:14 UTC