Re: draft-montenegro-httpbis-uri-encoding from Nicolas Mailhot on 2014-03-21 (ietf-http-wg@w3.org from January to March 2014)

From: Nicolas Mailhot <nicolas.mailhot@laposte.net>
Date: Fri, 21 Mar 2014 14:47:30 +0100
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "Nicolas Mailhot" <nicolas.mailhot@laposte.net>, "Mark Nottingham" <mnot@mnot.net>, "HTTP Working Group" <ietf-http-wg@w3.org>, "Gabriel Montenegro" <gabriel.montenegro@microsoft.com>
Message-ID: <4a2282830d3bea0bd430f9679a2ef023.squirrel@arekh.dyndns.org>

On Fri, 21 March 2014 14:18, Julian Reschke wrote:
> On 2014-03-21 14:05, Nicolas Mailhot wrote:
>> ...
>>> Practically, how is a UA supposed to *know* the encoding that was used
>>> for the URI *unless* it constructed it itself? (Which is not what
>>> browsers do; they only construct the query part).
>>
>> If the browser constructed the URL it knows damn well what the encoding
>> of its address bar is and how to convert to UTF-8
>
> OK. But that is true only if the URI was constructed by parsing the
> address bar. It's not the case when following links in documents (when
> they are already percent-escaped).
>
>> If the browser got the URL in a web page or feed or whatever, all those
>> documents are supposed to declare an encoding so they can be interpreted
>> at all (and there is a default encoding in the spec if they don't), so it
>> can use that encoding and convert to UTF-8 before sending
>
> That only helps when the link wasn't percent-escaped in the first place.

I'll let you in on a big secret: nobody writes percent-escapes by hand if
they can avoid it, just like nobody writes HTML entities by hand.

The bulk of percent-escaped URLs have been produced by automatons
converting human-written plain text that used the document's main encoding,
so yes, I do expect the encoding behind the escapes to match the document
encoding if the automaton was coded properly.
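
To make that concrete, here is a minimal sketch in Python (my illustration
only, nothing from the draft). It assumes the escapes were produced from the
document encoding, takes ISO-8859-1 as an example of such an encoding, and
handles a single path segment. The helper name reencode_to_utf8 is
hypothetical.

```python
from urllib.parse import quote, unquote


def reencode_to_utf8(escaped_segment: str, document_encoding: str) -> str:
    """Re-escape one percent-escaped path segment as UTF-8.

    Assumption: the escapes were generated from text in document_encoding.
    If that assumption is wrong, the result is wrong too (or decoding fails).
    """
    # Decode the escapes using the encoding the automaton is assumed to have used.
    text = unquote(escaped_segment, encoding=document_encoding, errors="strict")
    # Escape again as UTF-8; safe="" keeps a decoded "/" escaped as %2F.
    return quote(text, safe="", encoding="utf-8")


# "é" is the single byte 0xE9 in ISO-8859-1 and the two bytes 0xC3 0xA9 in UTF-8.
legacy = quote("café", encoding="iso-8859-1")
print(legacy)                                  # caf%E9
print(reencode_to_utf8(legacy, "iso-8859-1"))  # caf%C3%A9
```

Without the document encoding, %E9 is just an opaque byte, which is why the
match between the two encodings matters.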

Regards,

-- 
Nicolas Mailhot

Received on Friday, 21 March 2014 13:48:16 UTC