- From: Nicolas Mailhot <nicolas.mailhot@laposte.net>
- Date: Fri, 21 Mar 2014 14:24:02 +0100
- To: "Bjoern Hoehrmann" <derhoermi@gmx.net>
- Cc: "Nicolas Mailhot" <nicolas.mailhot@laposte.net>, "Julian Reschke" <julian.reschke@gmx.de>, "Mark Nottingham" <mnot@mnot.net>, "HTTP Working Group" <ietf-http-wg@w3.org>, "Gabriel Montenegro" <gabriel.montenegro@microsoft.com>
On Fri 21 March 2014 13:54, Bjoern Hoehrmann wrote:
> * Nicolas Mailhot wrote:
>>Really, can't you read the abundant documentation that was written on the
>>massive FAIL duck typing is for encoding (for example, python-side)? Code
>>passing unit tests then failing right and left as soon as some new
>>encoding combo or text triggering encoding differences was injected into
>>the system? Piles upon piles of partial workarounds until there was complete
>>loss of understanding of how they were all supposed to work in the first place?
>>
>>That's the last thing you want to reinvent on security equipment (and
>>you'll reinvent it because the amount of non-ASCII URLs is small now but
>>will only grow with time).
>
> Julian asked for a concrete example use case. So far you have not given
> one. It might help to assume the rest of us understands the subject at
> hand at least as well as you do.

As I wrote last time he asked the same question, on some of our networks access is controlled by regex-like checks on URLs, and not knowing the encoding of processed URLs means this processing (and the processing of security logs) is unreliable. We have already had several security incidents where carefully crafted URLs triggered bugs in security equipment (so far not using encoding tricks, just plain ASCII, but the writing is on the wall). The first in-the-wild uses of Punycode have already triggered bugs in code that assumed everything is ASCII (and that's OK: we can fix this case because it is clearly defined; we cannot fix every random encoding permutation people can invent).

Not knowing the encoding propagates encoding heuristics to every layer of the software stack: from security appliances, to log handlers, to the apps that process their output to inform users about their web usage, to the stupid spreadsheet macros people use to simplify reporting, billing, and quick analysis tasks. The only sane way to limit the scope of the problem is to convert everything to a single, universal, well-managed encoding at the entry point.
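The "convert everything at the entry point" approach can be sketched in Python. This is a minimal, hypothetical sketch: the names `normalize_url`, `is_blocked` and `BLOCK_ADMIN`, the `/admin` rule, and the choice to fail closed on undecodable URLs are illustrative assumptions, as is the premise that percent-encoded bytes must be UTF-8 and hosts IDNA (Python's built-in `idna` codec implements IDNA 2003).

```python
import re
from urllib.parse import urlsplit, unquote


def normalize_url(raw: str) -> str:
    """Decode a URL once, at the entry point, into one canonical Unicode form.

    Raises ValueError (UnicodeError is a subclass) if the port, the host or the
    percent-encoded bytes are not valid, instead of guessing an encoding.
    Userinfo and fragment are dropped; IPv6 literals are not handled.
    """
    parts = urlsplit(raw)
    # Host: turn Punycode ("xn--") labels into Unicode via the stdlib "idna"
    # codec; urlsplit's .hostname is already lowercased.
    host = (parts.hostname or "").encode("idna").decode("idna")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Path and query: percent-decode as strict UTF-8. This also turns %2F
    # into "/", a simplification a real filter may want to avoid.
    path = unquote(parts.path, encoding="utf-8", errors="strict")
    query = unquote(parts.query, encoding="utf-8", errors="strict")
    result = f"{parts.scheme.lower()}://{host}{path}"
    return f"{result}?{query}" if query else result


# Hypothetical access rule: block anything under /admin on any host.
BLOCK_ADMIN = re.compile(r"^[a-z]+://[^/]+/admin(?:[/?]|$)")


def is_blocked(raw: str) -> bool:
    """Run the regex on the normalized form only."""
    try:
        url = normalize_url(raw)
    except ValueError:
        return True  # undecodable: fail closed rather than guess an encoding
    return BLOCK_ADMIN.search(url) is not None
```

Applied to the raw string, the same regex lets `http://example.com/%61dmin` through, while `is_blocked` catches it, and a Punycode host such as `xn--bcher-kva.example` reaches the check as `bücher.example`. Decoding once, strictly, before any check means every later layer (filters, logs, reports) sees the same text.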
Converting at the entry point is no different from what the Python people did, and no different from database schema handling (if you don't use Unicode and UTC in your databases by default today, you deserve the problems you get). Text is an early-decoding problem space, not a lazy, last-mile, decode-as-needed one.

-- 
Nicolas Mailhot
Received on Friday, 21 March 2014 13:24:47 UTC