- From: Kari Hurtta <hurtta-ietf@elmme-mailer.org>
- Date: Wed, 14 Dec 2016 19:39:58 +0200 (EET)
- To: Matthew Kerwin <matthew@kerwin.net.au>
- CC: Julian Reschke <julian.reschke@gmx.de>, Alexey Melnikov <alexey.melnikov@isode.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, Kari Hurtta <hurtta-ietf@elmme-mailer.org>, Ilari Liusvaara <ilariliusvaara@welho.com>, HTTP working group mailing list <ietf-http-wg@w3.org>, Poul-Henning Kamp <phk@varnish-cache.org>
Matthew Kerwin <matthew@kerwin.net.au>: (Wed Dec 14 13:53:45 2016)
> It says that "forms that use explicit string delimiters are generally
> preferred over other alternatives. In many contexts, symmetric paired
> delimiters are easier to recognize and understand than visually unrelated
> ones." So brackets are good.
>
> And while it advises against using Perl's \x{NNNN...} syntax (because of
> potential ambiguities with two-digit hex codes), it doesn't say anything at
> all about \u{N...}
>
> Curly braces cost 14+15 bits in HPACK, parentheses 10+10 (incidentally
> cheaper than single quotes, which are 11+11). It's also convenient that
> little 'u' is one bit cheaper than little 'x'.
>
> I don't think parentheses are at too much risk of needing escaping, so it
> seems like the solution that goes with BCP 137, and compresses alright with
> HPACK, is:
>
> %x5c.75.28 1*6HEXDIG %x29
>
> It's still a little bit clunky for things like "Stra\u(df)e", but not so
> bad for emoji "\u(1f602)" and somewhere in between for Hiragana
> "\u(3053)\u(3093)\u(306b)\u(3064)".
I think this is the best suggestion so far.
But could it also be shorter?
%x5c.28 1*6HEXDIG %x29
This gives
\(3064)
{ Yes, it is then no longer visible that this is hexadecimal. }
Although
EmbeddedUnicodeChar = %x5C.75.27 4*6HEXDIG %x27
works for me.
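
As a rough illustration of what each candidate costs, here is a small
Python sketch that adds up the HPACK Huffman bits spent on everything
except the hex digits. The delimiter costs are the ones quoted above; the
values for the backslash, 'u' and 'x' are my reading of RFC 7541
Appendix B and should be checked against it. The names HUFFMAN_BITS,
CANDIDATES and overhead_bits are made up for this example.

```python
# Huffman code lengths in bits. Delimiter costs are those quoted in this
# thread; backslash, 'u' and 'x' are assumed from RFC 7541 Appendix B.
HUFFMAN_BITS = {
    "\\": 19,
    "u": 6,
    "x": 7,
    "(": 10,
    ")": 10,
    "'": 11,
    "{": 15,
    "}": 14,
}


def overhead_bits(prefix: str, suffix: str) -> int:
    """Bits spent on one escape, not counting its hex digits."""
    return sum(HUFFMAN_BITS[ch] for ch in prefix + suffix)


# Hypothetical labels mapped to (prefix, suffix) of each candidate syntax.
CANDIDATES = {
    "\\u(...)": ("\\u(", ")"),
    "\\(...)": ("\\(", ")"),
    "\\u'...'": ("\\u'", "'"),
    "\\u{...}": ("\\u{", "}"),
}

for name, (prefix, suffix) in CANDIDATES.items():
    print(f"{name:10} {overhead_bits(prefix, suffix)} bits")
```

With these numbers, dropping the 'u' saves 6 bits per escaped character,
and parentheses save 2 bits per escape compared with single quotes.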
> Cheers
>
>
>
> > Best regards, Julian
> >
> > PS: and, as a nit, it's strange that the syntax uses delimiters but
> > doesn't allow sequences of 1 to 3 HEXDIGs...
> >
> >
> Having just written "\u(df)" I kind of understand; it really feels like
> I'm describing an octet rather than a codepoint. I don't think there's a
> *technical* reason, though.
Yes.
> Is it alright to see "\u(9)" or an equivalent
> in text?
Or is "\(9)" all right, if the 'u' is also dropped?
If that is to be avoided, the rule becomes
%x5c.75.28 3*6HEXDIG %x29
or, with my newest suggestion,
%x5c.28 3*6HEXDIG %x29
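
To make the effect of the minimum digit count concrete, here is a minimal
Python sketch of a decoder for the parenthesised forms. It is only an
illustration, not text from any draft: make_decoder, min_digits and with_u
are hypothetical names, and leaving surrogates and out-of-range values
untouched is my own assumption about error handling.

```python
import re


def make_decoder(min_digits: int = 1, with_u: bool = True):
    """Build a decoder for the parenthesised escapes discussed above.

    min_digits and with_u are illustrative knobs, not names from a draft.
    """
    # Literal backslash, optional 'u', then an opening parenthesis.
    lead = r"\\u\(" if with_u else r"\\\("
    pattern = re.compile(lead + r"([0-9A-Fa-f]{%d,6})\)" % min_digits)

    def replace(match):
        code_point = int(match.group(1), 16)
        # Assumption: leave surrogates and out-of-range values as written.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    def decode(text: str) -> str:
        return pattern.sub(replace, text)

    return decode


loose = make_decoder(min_digits=1)
strict = make_decoder(min_digits=3)
print(loose(r"Stra\u(df)e"))     # two digits are accepted with 1*6
print(strict(r"Stra\u(df)e"))    # left undecoded with 3*6
print(strict(r"Stra\u(0df)e"))   # needs a leading zero with 3*6
```

With 3*6, "\u(9)" is no longer decoded, but "\u(df)" has to be written as
"\u(0df)".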
> --
> Matthew Kerwin
> http://matthew.kerwin.net.au/
/ Kari Hurtta
Received on Wednesday, 14 December 2016 17:45:01 UTC