Re: draft-montenegro-httpbis-uri-encoding from Nicolas Mailhot on 2014-03-21 (ietf-http-wg@w3.org from January to March 2014)

From: Nicolas Mailhot <nicolas.mailhot@laposte.net>
Date: Fri, 21 Mar 2014 14:05:21 +0100
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "Nicolas Mailhot" <nicolas.mailhot@laposte.net>, "Mark Nottingham" <mnot@mnot.net>, "HTTP Working Group" <ietf-http-wg@w3.org>, "Gabriel Montenegro" <gabriel.montenegro@microsoft.com>
Message-ID: <d6aa523bd2ebfa9233201a97c5f551ad.squirrel@arekh.dyndns.org>

On Fri, 21 March 2014 12:34, Julian Reschke wrote:
> On 2014-03-21 12:29, Nicolas Mailhot wrote:
>>
>> On Fri, 21 March 2014 12:01, Julian Reschke wrote:
>>> On 2014-03-21 11:55, Nicolas Mailhot wrote:
>>
>>> That seems to be the same use case as #1.
>>>
>>> Why don't you just try to UTF-8 decode, and if that works, assume that
>>> it indeed is UTF-8?
>>
>> Really, can't you read the abundant documentation that was written on
>> the massive FAIL duck typing is for encoding (for example, python-side)?
>> Code passing unit tests then failing right and left as soon as some new
>> encoding combo or text triggering encoding differences injected in the
>> system? Piles of piles of partial workarounds till there was complete
>> loss of understanding how they were all supposed to work in the first
>> place?
>
> I understand the problems caused by not knowing what encoding something
> is in. What I don't understand is how an out-of-band signal helps if you
> really can't rely on it being accurate.
>
> Practically, how is a UA supposed to *know* the encoding that was used
> for the URI *unless* it constructed it itself? (Which is not what
> browsers do; they only construct the query part).

If the browser constructed the URL, it knows damn well what the encoding
of its address bar is and how to convert it to UTF-8.

If the browser got the URL from a web page, a feed, or whatever, all those
documents are supposed to declare an encoding so they can be interpreted
at all (and there is a default encoding in the spec if they don't), so it
can use that encoding and convert to UTF-8 before sending.
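
A minimal Python sketch of that conversion, to make the idea concrete. The
function name, the sample path, and the choice of ISO-8859-1 as the declared
encoding are assumptions for illustration only; they are not taken from any
browser or from the draft. The decode is strict on purpose, so a wrong
declaration raises an error instead of being guessed around.

```python
from urllib.parse import quote


def to_utf8_request_path(raw_path: bytes, declared_encoding: str) -> str:
    """Hypothetical helper: re-encode a link path as percent-encoded UTF-8.

    raw_path is the path exactly as it appears in the document's bytes;
    declared_encoding is the encoding the document (or its HTTP headers)
    declared for itself.
    """
    # Strict decode: a wrong declaration fails here, it is not autocorrected.
    text = raw_path.decode(declared_encoding, errors="strict")
    # Percent-encode the UTF-8 bytes of the decoded text, keeping "/" as-is.
    return quote(text, safe="/", encoding="utf-8", errors="strict")


# A link written in a document declared as ISO-8859-1.
print(to_utf8_request_path(b"/caf\xe9", "iso-8859-1"))
```

If the same bytes were declared as UTF-8, the decode step would raise
UnicodeDecodeError, which is the failure mode argued for here.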

If the encoding declared in the document, or in the HTTP headers the web
site set, is wrong, things will fail, but no more than if the web page
author had made a typo in a link. And I want them to fail, not propagate
errors to innocent bystanders.

The whole concept of silently fixing problems with heuristics, until web
site authors assume they can write garbage and it will be autocorrected at
the cost of security and reliability, cannot work on a large scale. There
are too many people willing to exploit the holes that autocorrection
heuristics open right and left. People making mistakes is not an excuse
for writing fuzzy specs that avoid assigning responsibility and then
expecting things to work out anyway. That's PHB thinking.
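
To illustrate why such a heuristic is unreliable, here is a small Python
sketch of the "try UTF-8, else fall back" approach. The function name and
the ISO-8859-1 fallback are assumptions chosen for the example, not anyone's
actual implementation. The point is that a successful UTF-8 decode does not
prove the bytes were written as UTF-8.

```python
def guess_decode(raw: bytes) -> str:
    """Hypothetical heuristic: assume UTF-8 if it decodes, else ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Two characters as written by an author working in ISO-8859-1.
intended = "\u00c3\u00a9"            # "Ã" followed by "©"
raw = intended.encode("iso-8859-1")  # the bytes 0xC3 0xA9

# Those bytes are also valid UTF-8, so the heuristic silently picks
# a different text (a single "é") and reports no error at all.
guessed = guess_decode(raw)
print(guessed == intended)
```

With a declared encoding, the same bytes have exactly one reading, and a
wrong declaration is the declaring party's error to fix.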

Anyone in 2014 who defines a URL container and thinks they can avoid
specifying the encoding of this container is in for a world of grief, and
that won't change whether the http2 spec explicitly fixes this hole or
not. I'd rather have http2 implementors avoid this particular pitfall
because the spec is clear on the subject.

-- 
Nicolas Mailhot

Received on Friday, 21 March 2014 13:06:09 UTC