- From: Amos Jeffries <squid3@treenet.co.nz>
- Date: Mon, 06 Aug 2012 11:03:35 +1200
- To: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Cc: Phillip Hallam-Baker <hallam@gmail.com>, <ietf-http-wg@w3.org>
On 06.08.2012 04:39, Poul-Henning Kamp wrote:
> Phillip Hallam-Baker writes:
>> On Sun, Aug 5, 2012 at 8:31 AM, Poul-Henning Kamp wrote:
>>
>>> But opens you up to DoS attacks along the lines of:
>>>
>>> GET /ABCDEF.html
>>> GET /%41BCDEF.html
>>> GET /A%42CDEF.html
>>> ...
>>
>> Those are actually the same URL. Just different encodings.
>
> That's exactly the point.
>
> Intermediaries need to decode URI and therefore the question of ASCII
> vs. UTF8 performance is relevant.
>
> But as I said earlier: I'm not sure if the advantage goes to ASCII
> with the need for further encoding, or to UTF8 with no further
> encoding needed.

BUT, they don't need to know that A or B expanded to certain special
directory path macros by the involved apps.

Encoding != Interpretation != Character set.

* We already have to decode malicious (and just plain stupid)
over-encodings of ASCII characters.
* We already have to encode UTF-8 characters passed by some
clients/servers using UTF-8 internally.

=> There is no new problem created here. The only relevance this has is
that 2.0 native middleware will no longer have to *encode* (memory
allocate + copy) the raw UTF emitted by dumb sources until it takes a
1.1 hop. Decoding is only a copy with no memory allocation - so not
optimizing that is sad, but it does not make anything worse.

In short, we gain a needed optimization for free, and its benefit
improves the further 2.0 rolls out.

AYJ
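[Editor's note: the equivalence PHK and Hallam-Baker are debating can be seen with a few lines of Python. This is an illustrative sketch, not part of the original thread; the cache-key framing is an assumption about why an intermediary would decode.]

```python
from urllib.parse import unquote

# The three request paths from the thread: they differ byte-for-byte on
# the wire, but percent-decode to the same resource. An intermediary
# that keys its cache on the raw path would treat them as three distinct
# entries -- the DoS vector being described.
raw_paths = ["/ABCDEF.html", "/%41BCDEF.html", "/A%42CDEF.html"]

# Decoding collapses all of them to a single canonical key.
decoded = {unquote(p) for p in raw_paths}
print(decoded)  # prints {'/ABCDEF.html'}
```

Hence the point that intermediaries must decode regardless of whether the wire format is percent-encoded ASCII or raw UTF-8; only the encode step (which allocates and copies) differs between the two.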
Received on Sunday, 5 August 2012 23:04:04 UTC