
Re: FYI... Binary Optimized Header Encoding for SPDY

From: Jonathan Ballard <dzonatas@gmail.com>
Date: Mon, 6 Aug 2012 10:50:58 -0700
Message-ID: <CAAPAK-7rG_OH8-jinvYFbi1uVR77jL=eO53M2Q2SwN+4xSZn3w@mail.gmail.com>
To: James M Snell <jasnell@gmail.com>
Cc: Roberto Peon <grmocg@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Poul-Henning Kamp <phk@phk.freebsd.dk>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Mike Belshe <mike@belshe.com>
The %-encoding is literal, as compared to entities. If there are no such
literals in the URI, then we can assume the message is ASCII. That holds to
the extent that we don't see any registered UTF-8 headers here, unless I
skipped something in HTTP/2.0.
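
The distinction being drawn can be sketched as follows; a minimal illustration, with helper names invented for the example, not taken from any spec:

```python
import re

def is_ascii(value: str) -> bool:
    """True if every character of the value is in the ASCII range."""
    return all(ord(ch) < 128 for ch in value)

def has_percent_literals(value: str) -> bool:
    """True if the value carries %XX escapes (non-ASCII smuggled as ASCII)."""
    return re.search(r"%[0-9A-Fa-f]{2}", value) is not None

# UTF-8 bytes carried as %-escapes: the wire form stays pure ASCII.
assert is_ascii("/caf%C3%A9") and has_percent_literals("/caf%C3%A9")
# Raw non-ASCII in the URI: no longer an ASCII message.
assert not is_ascii("/café")
```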

On Monday, August 6, 2012, James M Snell wrote:

>
> On Mon, Aug 6, 2012 at 9:28 AM, Roberto Peon <grmocg@gmail.com> wrote:
>
>>
>> On Aug 6, 2012 12:21 AM, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
>> >
>> > On 2012/08/04 2:33, Roberto Peon wrote:
>> >>
>> >> I'm biased against utf-8, because it is trivial at the application
>> layer to
>> >> write a function which encodes and/or decodes it.
>> >
>> >
>> > It's maybe trivial to write such functions once, but it's a total waste
>> of time to write them over and over.
>>
>> But they don't... it is almost always a single function call where the
>> function is provided to them.
>>
>>
> Except when it's not... which happens much more frequently than you may
> imagine. There is actually quite a bit of inconsistency in the various
> more complex headers such as Authorization, Content-Disposition, Link, etc.
> Developers don't always know when they need to be using Base64, RFC 5987,
> B-encoding, Q-encoding, %-encoding, or no encoding at all, and the
> existing specification provides no help.
>
>> >
>> >
>> >> I see that handling utf-8 adds complexity
>> >
>> >
>> > What complexity?
>>
>> Reencoding to ASCII for HTTP/1.1, checking that all the characters are
>> actually displayable, parsing the dang strings in the cases where one does
>> wish to encode a multibyte character.
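
That reencoding step, downgrading a UTF-8 value to an ASCII-safe form for an HTTP/1.1 hop and recovering it on the far side, might look like this sketch (percent-encoding chosen purely for illustration; the function names are made up):

```python
import urllib.parse

def downgrade(value: str) -> str:
    """Percent-encode so the value survives an ASCII-only HTTP/1.1 hop."""
    return urllib.parse.quote(value, safe="")

def restore(wire: str) -> str:
    """Undo the downgrade on the far side of the hop."""
    return urllib.parse.unquote(wire, encoding="utf-8")

wire = downgrade("résumé.pdf")   # 'r%C3%A9sum%C3%A9.pdf'
assert wire.isascii()            # safe for an HTTP/1.1 header value
assert restore(wire) == "résumé.pdf"
```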
>>
>> I don't see why proxies should have to do this. I don't care, however, so
>> long as a distinction is made for opaque (user set) headers, at which point
>> you could use an xor encoding for all I care.
>>
>>
> But, I thought writing encoders was a trivial exercise? And the proxy
> really shouldn't have to care whether the characters are displayable... in
> fact, for the vast overwhelming majority of cases, proxies will simply
> treat the headers as opaque and pass them along. It would be excellent if
> HTTP/2.0 could make things a bit easier for application developers too.
>
>
>> >
>> >
>> >> to the protocol but buys the
>> >> protocol nothing.
>> >
>> >
>> > It doesn't buy the protocol itself much. But it buys the users of the
>> protocol a lot.
>>
>> Which users? I'm having a hard time imagining why metadata has to be
>> utf-8.
>>
>
> No one said it *HAS* to be. Again, simply saying that a particular value
> could contain UTF-8 characters doesn't mean we change the specific value
> definitions of individual headers. Allowing for UTF-8, however, would
> provide benefits to non-English-speaking users who are making use of things
> like IDNs, Content-Disposition, Link header titles, etc. The typical
> everyday browser user likely would not see a difference, but developers and
> users of HTTP APIs would likely see a significant advantage.
>
>
>> >
>> >
>> >> It adds minimal advantage for the entities using the
>> >> protocol, and makes intermediaries lives more difficult since they'll
>> have
>> >> to do more verification.
>> >>
>> >> Saying that the protocol handles sending a length-delimited string or a
>> >> string guaranteed not to include '\n' would be fine, however, as at
>> that
>> >> point whatever encoding is used in any particular header value is
>> matter of
>> >> the client-app and server, as it should be for things that the protocol
>> >> doesn't need to know about.
>> >
>> >
>> > No, it is not fine. First, for most headers, interoperability should be
>> between all clients and all servers.
>>
>> The person who wrote the application also controls the server. They will
>> interpret the byte stream how they see fit.
>> It is the other parties to the exchange that won't-- forward and reverse
>> proxies, for instance.
>>
> It is certainly not always true that the person writing the application
> controls the server. In fact, I would say quite the opposite in countless
> cases.
>
>> > Second, it is absolutely no fun for client apps developers to solve the
>> same character encoding problem again and again. It's just useless work,
>> prone to errors.
>>
>> No disagreement there. Am I wrong about such functions already being
>> provided to such client app writers?
>>
> Spotty, at best. Oh sure, Base64 encoders and decoders are a dime a dozen
> and there are implementations of other encoders out there. *That* is the
> easy part. It's knowing when encoding is required, which particular scheme
> to use in which particular case, dealing with inconsistent implementations
> across servers and clients, and dealing with intermediaries that
> potentially screw things up further.
>
> - James
>
>> > If you were told today that the Host header can be in ASCII or EBCDIC,
>> and that it's just between your client and your server, what would you say?
>>
>> I'd say to ignore EBCDIC in more colorful words :)
>> -=R
>>
>> >
>> > Regards,    Martin.
>>
>
>
Received on Monday, 6 August 2012 18:16:41 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 6 August 2012 18:16:48 GMT