[whatwg] URL query component

From: Anne van Kesteren <annevk@opera.com>
Date: Fri, 20 Apr 2012 14:52:53 +0200
Message-ID: <op.wc2d2fn464w2qv@annevk-macbookpro.local>
On Fri, 20 Apr 2012 14:37:10 +0200, And Clover <and-py at doxdesk.com> wrote:
> On 2012-04-20 09:15, Anne van Kesteren wrote:
>> Currently browsers differ for what happens when the code point cannot  
>> be encoded.
>> What Gecko does [?%C2%A3] makes the resulting data impossible to  
>> interpret.
>> What WebKit does [?%26%23163%3B] is consistent with form submission. I  
>> like it.
>
> I do not! It makes the data impossible to recover just as Gecko does...  
> in fact worse, because at least Gecko preserves ASCII. With the WebKit  
> behaviour it becomes impossible to determine from a pure ASCII string  
> '&#163;' whether the user really typed '£' or '&#163;' into the input  
> field.

You have the same problem with Gecko's behavior and multi-byte encodings.  
That's actually worse, since an erroneous three-byte sequence will throw  
multi-byte decoders off.
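
To make the two behaviours concrete, here is a rough Python sketch. The function names gecko_style and webkit_style are hypothetical, and the code only models the outputs quoted above ([?%C2%A3] and [?%26%23163%3B]); it is not how either engine is actually implemented. ISO-8859-2 is an assumed page encoding, picked because it has no '£'.

```python
from urllib.parse import quote, unquote


def gecko_style(text, encoding):
    """Assumed model: characters the page encoding lacks fall back to UTF-8 bytes."""
    out = []
    for ch in text:
        try:
            raw = ch.encode(encoding)
        except UnicodeEncodeError:
            raw = ch.encode("utf-8")  # fallback bytes are in a different encoding
        out.append(quote(raw, safe=""))
    return "".join(out)


def webkit_style(text, encoding):
    """Assumed model: unencodable characters become decimal character references."""
    raw = text.encode(encoding, errors="xmlcharrefreplace")
    return quote(raw, safe="")


if __name__ == "__main__":
    page_encoding = "iso-8859-2"  # assumption: an encoding without U+00A3

    print(gecko_style("£", page_encoding))        # %C2%A3
    print(webkit_style("£", page_encoding))       # %26%23163%3B
    print(webkit_style("&#163;", page_encoding))  # %26%23163%3B (same as above)

    # A server that decodes with the page encoding cannot get '£' back
    # from the Gecko-style bytes.
    print(unquote(gecko_style("£", page_encoding), encoding=page_encoding))
```

In this model the WebKit-style output is ambiguous between a typed '£' and a typed '&#163;', while the Gecko-style output mixes two encodings in one query string, so a receiver decoding with the page encoding gets the wrong characters.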


> It has the advantage of consistency with the POST behaviour, but that  
> behaviour is an unpleasant legacy hack which encourages a  
> misunderstanding of HTML-escaping that promotes XSS vulns. I would not  
> like to see it spread any further than it already has.

It's both GET and POST. So really the only difference here is manually  
constructed URLs.

Also, I think we should flag all non-utf-8 usage. This is mostly about  
deciding behavior for legacy content, which will already be broken if it  
runs into this minor edge case.


-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Friday, 20 April 2012 05:52:53 GMT
