
Re: expected results for URI encoding tests?

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 27 Jun 2008 16:07:24 +0200
Message-ID: <4864F41C.2090703@gmx.de>
To: Philip Taylor <pjt47@cam.ac.uk>
CC: "public-html@w3.org WG" <public-html@w3.org>

Philip Taylor wrote:
> Julian Reschke wrote:
>> [...]
>>
>> But even when the document encoding is used for percent-escaping, 
>> there's still an issue when a character in the input "URL" cannot be 
>> mapped to the document encoding; it would be nice to have a test case 
>> for that (or do we?).
> 
> I'm not sure if one of Hixie's tests covers this already, so I just 
> tried the same 002.html test case as before but with 
> 'results.cgi/\u2639?\u2639':
> 
> IE6, Opera 9.5, Safari 3.0 go to "results.cgi/%E2%98%B9??" (i.e. replace 
> unmappable characters with an ASCII "?").
> 
> FF2, FF3 go to "results.cgi/%E2%98%B9?%E2%98%B9".
> 
> In particular, FF2/FF3 appear to switch to encoding a component as UTF-8 
> if it contains a character that can't be mapped into the normal 
> character set. So in FF3:
> 
> '/\u017d?\u017d' => '/%C5%BD?%DE'
> '/\u017d?\u017d\u2639' => '/%C5%BD?%C5%BD%E2%98%B9'
> '/\u017d\u2639?\u017d' => '/%C5%BD%E2%98%B9?%DE'
> 
> i.e. the encoding of the query depends on the characters in it.
> 
> (I haven't uploaded test cases for this anywhere, since I don't have a 
> trivial way to make the results easy to interpret.)
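The two behaviours Philip reports can be modelled in a few lines. This is a minimal sketch that reproduces the reported outputs; it is not taken from any browser's source. It assumes the test page was served as windows-1257, which is only inferred from U+017D showing up as %DE in FF3 (ISO-8859-13 maps it to 0xDE too). The function names are hypothetical.

```python
from urllib.parse import quote

# ASSUMPTION: the test page is windows-1257; inferred from U+017D -> %DE,
# not stated in the thread. ISO-8859-13 maps U+017D to 0xDE as well.
DOC_ENCODING = "windows-1257"


def encode_replace(url: str, doc_encoding: str = DOC_ENCODING) -> str:
    """Model of the IE6 / Opera 9.5 / Safari 3.0 results reported above:
    path as UTF-8, query in the document encoding with unmappable
    characters replaced by a literal '?'."""
    path, sep, query = url.partition("?")
    out = quote(path, safe="/", encoding="utf-8")
    if sep:
        # errors="replace" turns unmappable characters into b"?",
        # and safe="?" leaves that byte unescaped.
        out += "?" + quote(query, safe="?", encoding=doc_encoding, errors="replace")
    return out


def encode_fallback(url: str, doc_encoding: str = DOC_ENCODING) -> str:
    """Model of the FF2/FF3 results reported above: path as UTF-8, query in
    the document encoding unless any character is unmappable, in which case
    the whole query is encoded as UTF-8."""
    path, sep, query = url.partition("?")
    out = quote(path, safe="/", encoding="utf-8")
    if sep:
        try:
            q = quote(query, safe="?", encoding=doc_encoding, errors="strict")
        except UnicodeEncodeError:
            # At least one character has no mapping: switch the component to UTF-8.
            q = quote(query, safe="?", encoding="utf-8")
        out += "?" + q
    return out


if __name__ == "__main__":
    for url in ["results.cgi/\u2639?\u2639", "/\u017d?\u017d",
                "/\u017d?\u017d\u2639", "/\u017d\u2639?\u017d"]:
        print(ascii(url), encode_replace(url), encode_fallback(url))
```

With these assumptions encode_fallback gives the three FF3 results listed above, which is what makes the "encoding depends on the characters in it" point concrete: the same U+017D comes out as %DE or %C5%BD depending on its neighbours.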

Wow, thanks for testing.

This shows that encoding the query part using the document encoding is 
fragile, and can easily lead to data loss.
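A small sketch of where the data loss comes from, again assuming a windows-1257 page (an inference, not something stated in the thread). With "?" substitution, distinct characters reach the server as identical bytes. With the FF-style fallback, a server that decodes with the document encoding misreads the UTF-8 bytes. The helper name is hypothetical.

```python
from urllib.parse import quote, unquote

# ASSUMPTION: document encoding windows-1257, as inferred from the FF3
# results (U+017D -> %DE); any single-byte encoding lacking U+2639 behaves alike.
DOC_ENCODING = "windows-1257"


def query_doc_replace(q: str) -> str:
    # Document-encoding query, unmappable characters replaced by '?'
    # (the behaviour reported for IE6, Opera 9.5 and Safari 3.0).
    return quote(q, safe="?", encoding=DOC_ENCODING, errors="replace")


# Two different inputs collapse to the same bytes on the wire.
frown = query_doc_replace("\u2639")  # WHITE FROWNING FACE
smile = query_doc_replace("\u263a")  # WHITE SMILING FACE

# With the FF-style fallback the bytes are UTF-8, but a server that decodes
# the query with the document encoding gets mojibake instead of U+017D.
misread = unquote("%C5%BD", encoding=DOC_ENCODING)

print(frown, smile, ascii(misread))
```

Neither case can be repaired on the server side without guessing: the original characters are gone in the first, and the encoding is ambiguous in the second.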

We really should try to define a way that yields UTF-8-based 
percent-encoding independently of the document's encoding.
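For comparison, here is a minimal sketch of encoding that ignores the document encoding and always uses UTF-8, similar to the UTF-8 mapping RFC 3987 applies when converting IRIs to URIs. The helper name and the set of query characters left unescaped are assumptions made for illustration only.

```python
from urllib.parse import quote, unquote


def encode_url_utf8(url: str) -> str:
    """Hypothetical helper: percent-encode both path and query as UTF-8,
    whatever the document encoding. Only '/' in the path and '=&?/' in the
    query are left unescaped (an assumption for this sketch)."""
    path, sep, query = url.partition("?")
    out = quote(path, safe="/", encoding="utf-8")
    if sep:
        out += "?" + quote(query, safe="=&?/", encoding="utf-8")
    return out


for url in ["/\u017d?\u017d", "/\u017d?\u017d\u2639", "/\u017d\u2639?\u017d"]:
    encoded = encode_url_utf8(url)
    # The server only has to assume UTF-8 to recover the original query.
    assert unquote(encoded.partition("?")[2], encoding="utf-8") == url.partition("?")[2]
    print(ascii(url), "=>", encoded)
```

Run on Philip's three FF3 inputs, U+017D becomes %C5%BD in every position, so the result no longer depends on which other characters happen to be in the same component.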

BR, Julian
Received on Friday, 27 June 2008 14:08:07 UTC

This archive was generated by hypermail 2.3.1 : Monday, 29 September 2014 09:38:55 UTC