Re: FYI... Binary Optimized Header Encoding for SPDY from Amos Jeffries on 2012-08-06 (ietf-http-wg@w3.org from July to September 2012)

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Tue, 07 Aug 2012 11:08:35 +1200
To: <ietf-http-wg@w3.org>
Message-ID: <ab966b02aa8c08a3c53bc55457b7f64d@treenet.co.nz>
On 07.08.2012 04:28, Roberto Peon wrote:
> On Aug 6, 2012 12:21 AM, Martin J. Dürst wrote:
>>
>> On 2012/08/04 2:33, Roberto Peon wrote:
>>>
>>> I'm biased against utf-8, because it is trivial at the application
>>> layer to write a function which encodes and/or decodes it.
>>
>>
>> It's maybe trivial to write such functions once, but it's a total
>> waste of time to write them over and over.
>
> But they don't... it is almost always a single function call where the
> function is provided to them.
>
>>
>>
>>> I see that handling utf-8 adds complexity
>>
>>
>> What complexity?
>
> Reencoding to ASCII for http/1.1, checking that all the characters are
> actually displayable, parsing the dang strings in the cases where it does
> wish to encode a multi byte character.
>
> I don't see why proxies should have to do this. I don't care, however, so
> long as a distinction is made for opaque (user set) headers, at which point
> you could use an xor encoding for all I care.
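The proxy-side work listed above (checking that the octets are valid UTF-8, rejecting non-displayable characters, re-encoding to ASCII for an HTTP/1.1 hop) can be sketched in Python roughly as follows. This is a minimal sketch under assumptions: the ASCII form used here is the RFC 5987 `ext-value` style (`UTF-8''` followed by percent-encoded octets), which is one possible convention and not anything SPDY or HTTP/2.0 specified, and `to_http11_value` is a hypothetical name.

```python
import unicodedata
from urllib.parse import quote


def to_http11_value(raw: bytes) -> str:
    """Re-encode a UTF-8 header value so it is safe on an ASCII-only hop.

    Hypothetical helper; the RFC 5987 ext-value form is an assumed
    convention, not something the thread or SPDY defines.
    """
    # 1. Verify the octets are valid UTF-8 (raises UnicodeDecodeError if not).
    text = raw.decode("utf-8")

    # 2. Reject non-displayable characters (Unicode category C*),
    #    allowing horizontal tab, which HTTP/1.1 field values permit.
    for ch in text:
        if ch != "\t" and unicodedata.category(ch).startswith("C"):
            raise ValueError(f"non-displayable character U+{ord(ch):04X}")

    # 3. Pure ASCII can be forwarded unchanged.
    if text.isascii():
        return text

    # 4. Otherwise percent-encode the UTF-8 octets; with safe="" only
    #    letters, digits and "_.-~" stay unescaped.
    return "UTF-8''" + quote(text, safe="", encoding="utf-8")
```

For example, `to_http11_value("Zürich".encode("utf-8"))` yields `UTF-8''Z%C3%BCrich`, while a pure-ASCII value passes through unchanged. Every step runs per header, per message, on the intermediary.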
>>
>>
>>> to the protocol but buys the
>>> protocol nothing.
>>
>>
>> It doesn't buy the protocol itself much. But it buys the users of the
>> protocol a lot.
>
> Which users? I'm having a hard time imagining why metadata has to be 
> utf-8.
>

Names: file names, user names, country names, domain names, protocol
names, User-Agent: names, Server: names, Accept-* names, type names...
Metadata is chock-full of names when you start looking at it closely.
And "for some strange reason" people around the world insist on being
able to send and receive them in different languages and non-American
spellings nowadays.
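Both sides of the disagreement show up in a few lines of Python: encoding and decoding such names is one library call each, yet none of the non-ASCII ones fits unchanged in an ASCII-only header. A small illustration; the sample names and the helper names `utf8_wire_sizes` and `ascii_safe` are made up for the example.

```python
def utf8_wire_sizes(names: list[str]) -> dict[str, int]:
    """Return the UTF-8 octet length of each name, checking it round-trips."""
    sizes = {}
    for name in names:
        wire = name.encode("utf-8")  # one library call to encode
        if wire.decode("utf-8") != name:  # one library call to decode
            raise AssertionError(f"round-trip failed for {name!r}")
        sizes[name] = len(wire)
    return sizes


def ascii_safe(name: str) -> bool:
    """True only if the name fits unchanged in an ASCII-only header value."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Illustrative file, user, country and domain-label names.
names = ["résumé.pdf", "Dürst", "Ελλάδα", "日本"]
report = {name: (size, ascii_safe(name)) for name, size in utf8_wire_sizes(names).items()}
```

Every entry in `report` round-trips through UTF-8, and every one reports `False` for `ascii_safe`; the two CJK characters in `"日本"`, for instance, take 6 octets.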


>>
>>> It adds minimal advantage for the entities using the
>>> protocol, and makes intermediaries lives more difficult since they'll
>>> have to do more verification.
>>>
>>> Saying that the protocol handles sending a length-delimited string or a
>>> string guaranteed not to include '\n' would be fine, however, as at that
>>> point whatever encoding is used in any particular header value is matter
>>> of the client-app and server, as it should be for things that the
>>> protocol doesn't need to know about.
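The two framings proposed here differ in what the protocol itself has to inspect. A minimal Python sketch, assuming a hypothetical 2-byte big-endian length prefix (not SPDY's actual wire format): the length-delimited value is carried as opaque octets, while the newline-free variant checks only for '\n' and leaves the character encoding to client and server.

```python
import struct


def pack_opaque(value: bytes) -> bytes:
    """Length-delimited: any octets allowed; the protocol never inspects them."""
    # Assumed framing: 2-byte big-endian length, then the raw octets.
    return struct.pack("!H", len(value)) + value


def unpack_opaque(buf: bytes) -> tuple[bytes, bytes]:
    """Return (value, remaining buffer) from a length-prefixed buffer."""
    (length,) = struct.unpack_from("!H", buf)
    end = 2 + length
    if len(buf) < end:
        raise ValueError("truncated header value")
    return buf[2:end], buf[end:]


def check_line_safe(value: bytes) -> bytes:
    """Newline-free alternative: the only octet the protocol rejects is '\\n'."""
    if b"\n" in value:
        raise ValueError("header value may not contain '\\n'")
    return value
```

With a 2-byte prefix a value is limited to 65,535 octets; `struct.pack` raises `struct.error` beyond that. Neither function ever decodes the value, which is the point of leaving the encoding to the endpoints.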
>>
>>
>> No, it is not fine. First, for most headers, interoperability should be
>> between all clients and all servers.
>
> The person who wrote the application also controls the server. They will
> interpret the byte stream how they see fit.
> It is the other parties to the exchange that won't-- forward and reverse
> proxies, for instance.
>

+1000.

>> Second, it is absolutely no fun for client apps developers to solve the
>> same character encoding problem again and again. It's just useless work,
>> prone to errors.
>
> No disagreement there. Am I wrong about such functions already being
> provided to such client app writers?
>

They are. But the problem I imagine Martin is speaking of is also
present at that same abstract level: app developers having to remember,
over and over, whether it was serialize, then X-encode, then Y-encode, or
X-encode, de-serialize, then Y-encode, on the sending handler or the
receiving handler, and for which header the data is being appended. It's
work to remember, it's work to look up, it's work to even remember where
to look it up.

In the popular dynamic content frameworks these decisions and orderings
have to be made by hand in almost every page generator script. It is
tiring and error-prone. I just spent most of yesterday debugging why one
graph out of a whole set would not display: serialized data was being
passed to url-decode first (resulting in an empty dataset), and those
encodings were only used because the header can't contain a raw
semicolon or comma.
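The rule that bites here is that decoding has to undo the encoding steps in exactly the reverse order. A minimal Python sketch, assuming JSON as the serialization and percent-encoding as the header-safe encoding (the message names neither; `encode_header_value` and `decode_header_value` are hypothetical names):

```python
import json
from urllib.parse import quote, unquote


def encode_header_value(data) -> str:
    # Step 1: serialize the structure to text.
    text = json.dumps(data, separators=(",", ":"))
    # Step 2: percent-encode so no raw ';' or ',' reaches the header.
    return quote(text, safe="")


def decode_header_value(value: str):
    # Undo the steps in reverse order: percent-decode first, then parse.
    return json.loads(unquote(value))
```

Calling `json.loads` on the still percent-encoded value fails immediately, which is the kind of ordering mistake described above. Keeping both halves behind one pair of functions, as a framework header API could, means page scripts never have to remember the order.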

Removing the need for developers to do the encoding at all would be a
great thing. This is partly a framework problem and partly a protocol
one. The frameworks allow developers to write custom headers directly
into the protocol, including customized versions of the standard
headers. I don't see that changing. However...

For 2.0 they will have to provide a translation layer in the app,
either mapping familiar HTTP/1.1 labels to 2.0 binary ones or providing
a better header API that lets the framework take on the encoding
workload much more easily. That opens the door to UTF-8 in a safe way.


>> If you got told today that the host header can be in ASCII or EBCDIC,
>> it's just between your client and your server, what would you say?
>
> I'd say to ignore EBCDIC in more colorful words :)

Then your coding partner decides to use *one* header which offers only
EBCDIC vs UTF-8... insert identical colourful words here.
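The EBCDIC question is easy to make concrete: the same Host value produces entirely different octets under the two encodings, so a peer assuming the other one cannot even read it. A small Python illustration, using the standard library's `cp037` codec as the EBCDIC variant (an arbitrary pick among several EBCDIC code pages); `encode_host` is a hypothetical helper.

```python
def encode_host(host: str, codec: str) -> bytes:
    """Encode a Host value with the named codec.

    "cp037" is Python's codec for EBCDIC (US/Canada), one of several
    EBCDIC code pages.
    """
    return host.encode(codec)


ascii_wire = encode_host("example.com", "ascii")
ebcdic_wire = encode_host("example.com", "cp037")

# A receiver that assumes ASCII cannot decode the EBCDIC octets at all:
# lowercase letters in cp037 sit above 0x7F.
try:
    ebcdic_wire.decode("ascii")
    readable = True
except UnicodeDecodeError:
    readable = False
```

Only a receiver that already knows the sender chose `cp037` gets `"example.com"` back, which is exactly the per-header private agreement being objected to.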

AYJ

> -=R
>
>>
>> Regards,    Martin.
Received on Monday, 6 August 2012 23:09:04 UTC