Re: FYI... Binary Optimized Header Encoding for SPDY

On Mon, Aug 6, 2012 at 9:28 AM, Roberto Peon <grmocg@gmail.com> wrote:

>
> On Aug 6, 2012 12:21 AM, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
> >
> > On 2012/08/04 2:33, Roberto Peon wrote:
> >>
> >> I'm biased against utf-8, because it is trivial at the application
> >> layer to
> >> write a function which encodes and/or decodes it.
> >
> >
> > It's maybe trivial to write such functions once, but it's a total waste
> > of time to write them over and over.
>
> But they don't... it is almost always a single function call where the
> function is provided to them.
>
>
Except when it's not... which happens much more frequently than you may
imagine. There is actually quite a bit of inconsistency among the various,
more complex headers such as Authorization, Content-Disposition, Link, etc.
Developers don't always know when they need to use Base64, RFC 5987,
B-encoding, Q-encoding, %-encoding, or no encoding at all, and the existing
specifications provide no help at all.
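
To make that concrete, here's a rough sketch (Python, purely illustrative;
the values and URL are made up) of how the same kind of non-ASCII value has
to be prepared differently depending on which header it lands in today:

    # Purely illustrative: the "right" encoding today depends entirely on
    # which header the value lands in (the values below are made up).
    import base64
    import urllib.parse

    value = "résumé.pdf"

    # Authorization: Basic -- credentials are Base64-encoded, and the
    # charset used *before* the Base64 step is itself historically
    # underspecified.
    authorization = "Basic " + base64.b64encode(
        "user:résumé".encode("utf-8")).decode("ascii")

    # Content-Disposition -- a non-ASCII filename needs the filename*
    # parameter in the RFC 5987 charset''percent-encoded form.
    filename_star = "UTF-8''" + urllib.parse.quote(value, safe="")
    content_disposition = "attachment; filename*=" + filename_star

    # Link -- the title parameter is effectively ASCII-only; a non-ASCII
    # title has to go into title* using the same RFC 5987 form.
    link = '<http://example.com/r>; rel="alternate"; title*=' + filename_star

    print(authorization)
    print(content_disposition)
    print(link)

Three headers, three different answers, and nothing in HTTP itself that
tells a developer which one applies where.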

> >
> >
> >> I see that handling utf-8 adds complexity
> >
> >
> > What complexity?
>
> Reencoding to ASCII for http/1.1, checking that all the characters are
> actually displayable, parsing the dang strings in the cases where it does
> wish to encode a multi byte character.
>
> I don't see why proxies should have to do this. I don't care, however, so
> long as a distinction is made for opaque (user set) headers, at which point
> you could use an xor encoding for all I care.
>
>
But I thought writing encoders was a trivial exercise? And the proxy
really shouldn't have to care whether the characters are displayable... in
fact, in the overwhelming majority of cases, proxies will simply treat the
headers as opaque and pass them along. It would be excellent if HTTP/2.0
could make things a bit easier for application developers too.
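
(Just to put a point on "trivial": the sketch below is my own, not anything
from the drafts, but checking that a value is well-formed UTF-8 is
essentially a one-liner in most languages.)

    # My own sketch: if an intermediary really did want to verify that a
    # header value is well-formed UTF-8, the check is about one line.
    def is_valid_utf8(raw: bytes) -> bool:
        try:
            raw.decode("utf-8")
            return True
        except UnicodeDecodeError:
            return False

    print(is_valid_utf8(b"r\xc3\xa9sum\xc3\xa9.pdf"))  # True
    print(is_valid_utf8(b"\xff\xfe"))                  # False, not UTF-8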


> >
> >
> >> to the protocol but buys the
> >> protocol nothing.
> >
> >
> > It doesn't buy the protocol itself much. But it buys the users of the
> > protocol a lot.
>
> Which users? I'm having a hard time imagining why metadata has to be utf-8.
>

No one said it *HAS* to be. Again, simply saying that a particular value
could contain UTF-8 characters doesn't mean we change the specific value
definitions of individual headers. Allowing for UTF-8, however, would
benefit non-English-speaking users who make use of things like IDNs,
Content-Disposition, Link header titles, etc. The typical everyday browser
user likely would not see a difference, but developers and users of HTTP
APIs would likely see a significant advantage.
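
To illustrate (my own sketch, not a proposal from any draft): compare what
a non-English Link title has to look like on the wire today with what
simply allowing UTF-8 in the value would permit:

    # My own sketch: the same Link title parameter today (RFC 5987
    # escaped) versus what raw UTF-8 header values would permit.
    import urllib.parse

    title = "履歴書"  # made-up example title

    today = "title*=UTF-8''" + urllib.parse.quote(title, safe="")
    with_utf8 = 'title="' + title + '"'

    print(today)      # title*=UTF-8''%E5%B1%A5%E6%AD%B4%E6%9B%B8
    print(with_utf8)  # title="履歴書"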


> >
> >
> >> It adds minimal advantage for the entities using the
> >> protocol, and makes intermediaries' lives more difficult since they'll
> >> have to do more verification.
> >>
> >> Saying that the protocol handles sending a length-delimited string or a
> >> string guaranteed not to include '\n' would be fine, however, as at that
> >> point whatever encoding is used in any particular header value is a
> >> matter of the client-app and server, as it should be for things that
> >> the protocol
> >> doesn't need to know about.
> >
> >
> > No, it is not fine. First, for most headers, interoperability should be
> > between all clients and all servers.
>
> The person who wrote the application also controls the server. They will
> interpret the byte stream how they see fit.
> It is the other parties to the exchange that won't-- forward and reverse
> proxies, for instance.
>
It is certainly not always true that the person writing the application
controls the server. In fact, quite the opposite is true in countless
cases.

> > Second, it is absolutely no fun for client app developers to solve the
> > same character encoding problem again and again. It's just useless work,
> > prone to errors.
>
> No disagreement there. Am I wrong about such functions already being
> provided to such client app writers?
>
Spotty, at best. Oh sure, Base64 encoders and decoders are a dime a dozen,
and there are implementations of the other encoders out there. *That* is
the easy part. The hard part is knowing when encoding is required, which
particular scheme to use in which particular case, dealing with
inconsistent implementations across servers and clients, and dealing with
intermediaries that can potentially screw things up further.

- James

> > If you got told today that the host header can be in ASCII or EBCDIC,
> > it's just between your client and your server, what would you say?
>
> I'd say to ignore EBCDIC in more colorful words :)
> -=R
>
> >
> > Regards,    Martin.
>

Received on Monday, 6 August 2012 17:01:30 UTC