Re: draft-montenegro-httpbis-uri-encoding from Julian Reschke on 2014-03-21 (ietf-http-wg@w3.org from January to March 2014)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 21 Mar 2014 16:24:41 +0100
To: Nicolas Mailhot <nicolas.mailhot@laposte.net>
CC: Bjoern Hoehrmann <derhoermi@gmx.net>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, Gabriel Montenegro <gabriel.montenegro@microsoft.com>
Message-ID: <532C59B9.9090104@gmx.de>

On 2014-03-21 16:11, Nicolas Mailhot wrote:
>
> On Fri, 21 March 2014 15:54, Julian Reschke wrote:
>
>> I'll ask again: please present a *concrete* example where the
>> out-of-band metadata helps. This would include a description of where
>> the request comes from, what gets on the wire, what kind of checks your
>> code does, and what it would do differently when it gets the encoding
>> metadata.
>
> I've already given everything I can without exposing our internal
> architecture, which I won't do. There is nothing more complex than URL
> logging, URL regex matching, processing of results in apps (embedded,
> server or desktop side) and human checking that everything works well by
> reading logs or reports or whatever.

Well, please allow me not to believe in use cases until a concrete one 
is brought up and explained.

> And I've already stated I don't want out of band metadata to declare if
> URLs are in UTF-8, I want out of band metadata to declare when they are
> not, and the processing in this case will be to kill connections and avoid
> encoding guesswork down the stack.

That's not what Gabriel's draft proposes.

> There is no hidden mystery use case. There is only the basic need to be
> able to decode URLs.

An HTTP URI is a sequence of octets in the range of ASCII code points. 
It can contain percent-escapes, in which case these sequences can be 
decoded to raw octets. That's where the story ends from HTTP's point of 
view.
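
As a rough illustration of that last step, here is a minimal Python 
sketch using the standard library's urllib.parse.unquote_to_bytes. The 
request target "/caf%E9" is a made-up example, not something taken from 
the draft or from this thread.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical request target as it might appear on the wire (ASCII only).
target = "/caf%E9"

# Percent-decoding yields raw octets, not characters. Nothing in the
# result says which character encoding (if any) produced the octet 0xE9.
raw = unquote_to_bytes(target)

print(raw)
```

The result is a bytes object; turning it into text is a separate step 
that needs information HTTP itself does not carry.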

And yes, *most* of the time these octet sequences represent a character 
sequence that has been encoded using a character encoding scheme (such 
as ISO-8859-1, or UTF-8). I fully agree that it would be nice if you 
could rely on that always being the case, and always being UTF-8, but 
it's simply not true in practice.
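
To make the ambiguity concrete, here is a small Python sketch. The two 
paths and the helper try_decode are illustrative assumptions of mine; 
they only show that the same decoding step gives different answers 
depending on which encoding the recipient assumes.

```python
from urllib.parse import unquote_to_bytes


def try_decode(octets, encoding):
    """Return the decoded string, or None if the octets are invalid."""
    try:
        return octets.decode(encoding)
    except UnicodeDecodeError:
        return None


# Two hypothetical request targets for the "same" resource name, one
# produced by an ISO-8859-1 client and one by a UTF-8 client.
latin1_octets = unquote_to_bytes("/caf%E9")
utf8_octets = unquote_to_bytes("/caf%C3%A9")

# The ISO-8859-1 form is not valid UTF-8, so a strict recipient can
# at least detect it.
latin1_as_utf8 = try_decode(latin1_octets, "utf-8")

# The UTF-8 form decodes without error as ISO-8859-1, but yields
# different characters than the sender intended.
utf8_as_latin1 = try_decode(utf8_octets, "iso-8859-1")
utf8_as_utf8 = try_decode(utf8_octets, "utf-8")
```

A strict UTF-8 check catches the first case, but the second shows why 
guessing in the other direction silently produces the wrong text.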

I don't believe that adding out-of-band data is going to help. That 
being said, I'm interested in finding out *how* this is going to work in 
practice and how it's going to help, but so far I haven't seen any 
concrete example.

This header field adds a lot of noise to essentially every single HTTP 
request, thus I continue to be very very skeptical about this proposal.

Best regards, Julian

Received on Friday, 21 March 2014 15:31:19 UTC