- From: Nicolas Mailhot <nicolas.mailhot@laposte.net>
- Date: Fri, 17 Jan 2014 11:55:15 +0100
- To: "Julian Reschke" <julian.reschke@gmx.de>
- Cc: "Nicolas Mailhot" <nicolas.mailhot@laposte.net>, "Gabriel Montenegro" <gabriel.montenegro@microsoft.com>, "Zhong Yu" <zhong.j.yu@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, "Osama Mazahir" <osamam@microsoft.com>, "Dave Thaler" <dthaler@microsoft.com>, "Mike Bishop" <michael.bishop@microsoft.com>, "Matthew Cox" <macox@microsoft.com>
On Fri, 17 January 2014 11:28, Julian Reschke wrote:
> On 2014-01-17 11:18, Nicolas Mailhot wrote:
>>
>> On Thu, 16 January 2014 22:32, Julian Reschke wrote:
>>
>>> A proxy does not need to normalize. Full stop. There is no issue here,
>>
>> A security proxy does need to normalize. Full stop. Otherwise malware can
>> trivially bypass security blocks by fuzzing the encoding until the proxy
>> no longer realizes the block needs to be applied.
>
> Are you talking about normalization beyond removing unneeded
> percent-escapes?

I'm talking about the very common case where the signature of a botnet or malware strain is a URL fragment that it tries to reach on random zombie hosts on the Internet. It is very common to configure proxy gateways to block any access to a URL that includes this fragment, as a first line of defence while more accurate and complete cleanup measures are investigated. (Malware is the worst case; sometimes it's just misbehaving browser plugins or other web clients that need blocking to keep the network operational.)

Obviously that only works if the gateway can recognize the URL fragment without being confused by encoding games. So the gateway does need a reliable way to map byte chains to the text signature (and there is a text signature, because the app writer used text strings and not random constants in his code). Unspecified text encoding conventions in URLs make that reliability go away.

Again, I would like HTTP/2 to specify that URLs are transported as UTF-8 text in HTTP/2 metadata (ideally not %-escaped), with the endpoints being responsible for converting their local representation to this form before emission. Barring that:

1. add encoding info somewhere;
2. require the web client and server to fill in this info.

But I would really prefer the wire representation to be unambiguous and encoding conversions pushed to the endpoints.
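A minimal sketch of the gateway-side check described above, assuming (as proposed here) that URLs carry UTF-8 text. The names `normalize_target`, `should_block` and the blocklist entry are hypothetical and for illustration only; they are not part of any proxy product or HTTP/2 draft. Percent-escapes are decoded exactly once, the octets are decoded as strict UTF-8, and the signature is matched against the resulting text. When the octets are not valid UTF-8 the text cannot be recovered without guessing the sender's charset, so the sketch fails closed.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes

# Hypothetical blocklist entry: a text fragment the malware embeds in its URLs.
# "\u00e9" is "é", chosen so the example exercises non-ASCII encoding games.
BLOCKED_FRAGMENTS = ["\u00e9vil-beacon"]


def normalize_target(raw_target: bytes) -> Optional[str]:
    """Map raw request-target bytes to text, assuming UTF-8.

    Percent-escapes are decoded exactly once (as the origin server would),
    then the octets are decoded as strict UTF-8. Returns None when they are
    not valid UTF-8, i.e. when the charset would have to be guessed.
    """
    octets = unquote_to_bytes(raw_target)
    try:
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        return None


def should_block(raw_target: bytes) -> bool:
    """Return True if the request target matches a blocked text fragment."""
    text = normalize_target(raw_target)
    if text is None:
        # Encoding is ambiguous: fail closed rather than let the request through.
        return True
    return any(fragment in text for fragment in BLOCKED_FRAGMENTS)


if __name__ == "__main__":
    for target in (b"/x/%C3%A9vil%2Dbeacon", b"/x/%E9vil-beacon", b"/index.html"):
        print(target, should_block(target))
```

Both `%C3%A9vil%2Dbeacon` and the raw UTF-8 bytes map to the same text and are caught. `%E9vil-beacon` (a Latin-1 "é") has no unambiguous text reading under this assumption. That is exactly the case that forces the gateway to either block blindly or let it pass when URL charsets are unspecified.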
Pushing conversions to the endpoints is the model the Python people settled on after years of failing to make the "push everything as a chain of bytes; whatever needs text will manage to convert it by itself" approach work. And HTTP nodes are far less flexible than a Python program.

Regards,

--
Nicolas Mailhot
Received on Friday, 17 January 2014 10:55:49 UTC