
Re: *** GMX Spamverdacht *** Re: UTF-8 in URIs

From: Nicolas Mailhot <nicolas.mailhot@laposte.net>
Date: Fri, 17 Jan 2014 11:55:15 +0100
Message-ID: <186c930168e0ef2f38aea640d9c37852.squirrel@arekh.dyndns.org>
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "Nicolas Mailhot" <nicolas.mailhot@laposte.net>, "Gabriel Montenegro" <gabriel.montenegro@microsoft.com>, "Zhong Yu" <zhong.j.yu@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, "Osama Mazahir" <osamam@microsoft.com>, "Dave Thaler" <dthaler@microsoft.com>, "Mike Bishop" <michael.bishop@microsoft.com>, "Matthew Cox" <macox@microsoft.com>

Le Ven 17 janvier 2014 11:28, Julian Reschke a écrit :
> On 2014-01-17 11:18, Nicolas Mailhot wrote:
>>
>> Le Jeu 16 janvier 2014 22:32, Julian Reschke a écrit :
>>
>>> A proxy does not need to normalize. Full stop. There is no issue here,
>>
>> A security proxy does need to normalize. Full stop. Otherwise malware
>> can trivially bypass security blocks by fuzzing the encoding enough that
>> the proxy no longer realizes the block needs to be applied.
>
> Are you talking about normalization beyond removing unneeded
> percent-escapes?

I'm talking about the very common case where a botnet or malware strain's
signature is a URL fragment it uses to communicate with random zombie
hosts on the Internet. It is very common to configure proxy gateways to
block any access to a URL that includes this fragment, as a first-level
defence while more accurate and complete cleanup measures are
investigated.

(malware is the worst case, sometimes it's just misbehaving browser
plugins or other web clients that need blocking to keep the network
operational)

Obviously that only works if the gateway can recognize the URL fragment
without being confused by encoding games. So the gateway does need a
reliable way to map byte strings to the text signature (and there is a
text signature, because the app writer used text strings and not random
constants in his code). Unspecified text encoding conventions in URLs make
that reliability go away.
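To make the failure mode concrete, here is a minimal sketch (the signature
and URLs are made up for illustration): a filter matching on the raw URL
bytes is defeated by a trivial percent-encoding variation, while one that
normalizes first is not.

```python
# Hypothetical illustration: a proxy filter that matches the raw URL
# bytes is bypassed by percent-encoding games; normalizing the URL
# before matching restores the block. Signature and URL are invented.
from urllib.parse import unquote

SIGNATURE = "/botnet/gate.php"  # made-up malware URL fragment

def naive_block(raw_url: str) -> bool:
    # Matches only the literal bytes sent on the wire.
    return SIGNATURE in raw_url

def normalized_block(raw_url: str) -> bool:
    # Decodes percent-escapes first, then matches the text signature.
    return SIGNATURE in unquote(raw_url)

evasion = "/botnet/g%61te.php"    # same resource, with 'a' written as %61
print(naive_block(evasion))       # False: raw bytes never match
print(normalized_block(evasion))  # True: normalization defeats the trick
```

This only covers percent-escape games; with no specified text encoding for
the URL bytes, even this normalization step has nothing reliable to map to.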

Again, I would like http/2 to specify that URLs are transported as UTF-8
text in http2 metadata (ideally not %-escaped), with the endpoints being
responsible for converting their local representation to this form before
emission, or barring that:
1. add encoding info somewhere
2. require the web client and server to fill in this info.

But I really would prefer that the wire representation be unambiguous,
with encoding conversions pushed to the endpoints. That's the model the
Python people settled on after years of failing to make the "push
everything as chains of bytes, whatever needs text will manage to convert
by itself" approach work. And http nodes are far less flexible than a
Python program.
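A small sketch of that model as it exists in Python 3 (the path used here
is invented): bytes live only at the wire boundary, the endpoint decodes
them once, and everything inside works on unambiguous text.

```python
# Sketch of the Python 3 "decode at the edge" model the author refers to.
# The URL path is a made-up example containing non-ASCII characters.
from urllib.parse import quote, unquote

# Endpoint A: local text representation, converted once before emission.
local_path = "/résumé"
wire_bytes = quote(local_path, encoding="utf-8").encode("ascii")
print(wire_bytes)  # b'/r%C3%A9sum%C3%A9' -- unambiguous on the wire

# Endpoint B: decodes once at the boundary, then works purely in text.
text_path = unquote(wire_bytes.decode("ascii"), encoding="utf-8")
print(text_path)   # '/résumé' -- both ends agree on the same text
```

With a single declared encoding (UTF-8) the round trip is lossless; with an
unspecified encoding, endpoint B can only guess what bytes like `%C3%A9`
were meant to be.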

Regards,

-- 
Nicolas Mailhot
Received on Friday, 17 January 2014 10:55:49 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 17:14:23 UTC