
[whatwg] URL query component

From: Anne van Kesteren <annevk@opera.com>
Date: Fri, 20 Apr 2012 14:52:53 +0200
Message-ID: <op.wc2d2fn464w2qv@annevk-macbookpro.local>
On Fri, 20 Apr 2012 14:37:10 +0200, And Clover <and-py at doxdesk.com> wrote:
> On 2012-04-20 09:15, Anne van Kesteren wrote:
>> Currently browsers differ for what happens when the code point cannot  
>> be encoded.
>> What Gecko does [?%C2%A3] makes the resulting data impossible to  
>> interpret.
>> What WebKit does [?%26%23163%3B] is consistent with form submission. I  
>> like it.
> I do not! It makes the data impossible to recover just as Gecko does...  
> in fact worse, because at least Gecko preserves ASCII. With the WebKit  
> behaviour it becomes impossible to determine from a pure ASCII string  
> '&#163;' whether the user really typed '£' or '&#163;' into the input  
> field.

You have the same problem with Gecko's behavior and multi-byte encodings.  
That's actually worse, since an erroneous three-byte sequence will throw the  
multi-byte decoders off.
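
To make the two behaviors concrete, here is a rough Python sketch of both fallbacks. The names gecko_style and webkit_style are made up for illustration, "ascii" and "shift_jis" merely stand in for a legacy page encoding, and the per-code-point UTF-8 fallback is my reading of the [?%C2%A3] result rather than a description of actual browser code.

```python
from urllib.parse import quote, unquote_to_bytes


def gecko_style(text: str, encoding: str) -> str:
    """Percent-encode text, using UTF-8 bytes for unencodable code points."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # Assumed fallback: emit the UTF-8 bytes of this code point.
            out += ch.encode("utf-8")
    return quote(bytes(out), safe="")


def webkit_style(text: str, encoding: str) -> str:
    """Percent-encode text, using &#N; references for unencodable code points."""
    return quote(text.encode(encoding, errors="xmlcharrefreplace"), safe="")


# The two results quoted above, with ASCII standing in for the page encoding.
print(gecko_style("£", "ascii"))   # %C2%A3
print(webkit_style("£", "ascii"))  # %26%23163%3B

# The ambiguity And describes: a literal "&#163;" yields the same query.
print(webkit_style("&#163;", "ascii") == webkit_style("£", "ascii"))  # True

# The multi-byte problem: the UTF-8 fallback for "€" is three bytes, and a
# Shift_JIS decoder on the receiving end does not recover the original.
raw = unquote_to_bytes(gecko_style("€", "shift_jis"))
print(raw.decode("shift_jis", errors="replace") == "€")  # False
```

Neither output can be reversed reliably by a server that only knows the page encoding, which is the point both sides are making.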

> It has the advantage of consistency with the POST behaviour, but that  
> behaviour is an unpleasant legacy hack which encourages a  
> misunderstanding of HTML-escaping that promotes XSS vulns. I would not  
> like to see it spread any further than it already has.

It's both GET and POST. So really the only difference here is manually  
constructed URLs.

Also, I think we should flag all non-UTF-8 usage. This is mostly about  
deciding behavior for legacy content, which will already be broken if it  
runs into this minor edge case.

Anne van Kesteren
Received on Friday, 20 April 2012 05:52:53 UTC
