- From: Anne van Kesteren <annevk@opera.com>
- Date: Fri, 20 Apr 2012 14:52:53 +0200
On Fri, 20 Apr 2012 14:37:10 +0200, And Clover <and-py at doxdesk.com> wrote:

> On 2012-04-20 09:15, Anne van Kesteren wrote:
>> Currently browsers differ for what happens when the code point cannot
>> be encoded.
>> What Gecko does [?%C2%A3] makes the resulting data impossible to
>> interpret.
>> What WebKit does [?%26%23163%3B] is consistent with form submission. I
>> like it.
>
> I do not! It makes the data impossible to recover just as Gecko does...
> in fact worse, because at least Gecko preserves ASCII. With the WebKit
> behaviour it becomes impossible to determine from a pure ASCII string
> '&#163;' whether the user really typed '&#163;' or '£' into the input
> field.

You have the same problem with Gecko's behavior and multi-byte encodings. That's actually worse, since an erroneous three-byte sequence will throw the multi-byte decoders off.

> It has the advantage of consistency with the POST behaviour, but that
> behaviour is an unpleasant legacy hack which encourages a
> misunderstanding of HTML-escaping that promotes XSS vulns. I would not
> like to see it spread any further than it already has.

It's both GET and POST. So really the only difference here is manually constructed URLs. Also, I think we should flag all non-UTF-8 usage. This is mostly about deciding behavior for legacy content, which will already be broken if it runs into this minor edge case.

-- 
Anne van Kesteren
http://annevankesteren.nl/
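[Editor's note: the two fallback behaviors under discussion can be sketched as follows. This is an illustrative approximation, not browser source; the function names `gecko_style` and `webkit_style` are invented labels for the two strategies, and the assumed scenario is a form whose target encoding (here ASCII) cannot represent U+00A3 '£'.]

```python
from urllib.parse import quote

def gecko_style(text: str) -> str:
    # Gecko-like fallback: percent-encode the UTF-8 bytes of the
    # code point, regardless of the document's target encoding.
    return quote(text.encode("utf-8"))

def webkit_style(text: str, encoding: str = "ascii") -> str:
    # WebKit-like fallback (matching form submission): replace the
    # unencodable code point with an HTML numeric character reference,
    # then percent-encode the resulting bytes.
    return quote(text.encode(encoding, errors="xmlcharrefreplace"))

print(gecko_style("£"))   # %C2%A3
print(webkit_style("£"))  # %26%23163%3B
```

Note that `webkit_style("£")` and `webkit_style("&#163;")` produce the same output, which is exactly the ambiguity And Clover objects to: the receiver cannot tell whether the user typed the pound sign or the literal entity text.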
Received on Friday, 20 April 2012 05:52:53 UTC