Re: expected results for URI encoding tests? from Julian Reschke on 2008-06-27 (public-html@w3.org from June 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 27 Jun 2008 16:07:24 +0200
To: Philip Taylor <pjt47@cam.ac.uk>
CC: "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <4864F41C.2090703@gmx.de>

Philip Taylor wrote:
> Julian Reschke wrote:
>> [...]
>>
>> But even when the document encoding is percent-escaped, there's still 
>> an issue when a character in the input "URL" can not be mapped to the 
>> document encoding; it would be nice to have a test case for that (or 
>> do we?).
> 
> I'm not sure if one of Hixie's tests covers this already, so I just 
> tried the same 002.html test case as before but with 
> 'results.cgi/\u2639?\u2639':
> 
> IE6, Opera 9.5, Safari 3.0 go to "results.cgi/%E2%98%B9??" (i.e. replace 
> unmappable characters with an ASCII "?").
> 
> FF2, FF3 go to "results.cgi/%E2%98%B9?%E2%98%B9".
> 
> In particular, FF2/FF3 appear to switch to encoding a component as UTF-8 
> if it contains a character that can't be mapped into the normal 
> character set. So in FF3:
> 
> '/\u017d?\u017d' => '/%C5%BD?%DE'
> '/\u017d?\u017d\u2639' => '/%C5%BD?%C5%BD%E2%98%B9'
> '/\u017d\u2639?\u017d' => '/%C5%BD%E2%98%B9?%DE'
> 
> i.e. the encoding of the query depends on the characters in it.
> 
> (I haven't uploaded test cases for this anywhere, since I don't have a 
> trivial way to make the results easy to interpret.)

Wow, thanks for testing.

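This shows that encoding the query part using the document encoding is 
fragile and can easily lead to data loss: unmappable characters either 
collapse to "?" (IE6, Opera 9.5, Safari 3.0) or silently switch the whole 
query to UTF-8 (FF2, FF3), so the recipient cannot reliably tell which 
encoding it was sent.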
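
To make the two observed behaviours concrete, here is a minimal Python 
sketch that reproduces the quoted results. It is only an illustration, not 
how any browser is implemented. The function names are hypothetical, and 
the document encoding is an assumption: the thread does not name it, and 
windows-1257 ("cp1257") is used only because it maps U+017D to the byte 
0xDE, which matches the FF3 output quoted above.

```python
from urllib.parse import quote

# Assumption: the test document's encoding is not stated in the thread.
# cp1257 is chosen only because it encodes U+017D as 0xDE, as in '%DE'.
DOC_ENCODING = "cp1257"


def encode_replace(path, query, doc_encoding=DOC_ENCODING):
    """Path as UTF-8; query in the document encoding, with every
    unmappable character replaced by an ASCII '?'."""
    q = query.encode(doc_encoding, errors="replace")
    return quote(path.encode("utf-8"), safe="/") + "?" + quote(q, safe="?")


def encode_utf8_fallback(path, query, doc_encoding=DOC_ENCODING):
    """Path as UTF-8; query in the document encoding, unless some
    character cannot be mapped, in which case the whole query is UTF-8."""
    try:
        q = query.encode(doc_encoding)
    except UnicodeEncodeError:
        q = query.encode("utf-8")
    return quote(path.encode("utf-8"), safe="/") + "?" + quote(q, safe="")


if __name__ == "__main__":
    print(encode_replace("results.cgi/\u2639", "\u2639"))
    print(encode_utf8_fallback("/\u017d", "\u017d"))
    print(encode_utf8_fallback("/\u017d", "\u017d\u2639"))
```

With the replacement strategy, two different unmappable characters yield 
the same URL, which is the data loss in question. With the fallback 
strategy, the same character (U+017D) is sent as %DE or as %C5%BD depending 
on what else is in the query.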

We really should try to define a way that yields UTF-8 based encoding 
independently of the document's encoding.
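
As a rough sketch of what that could look like (an assumption about the 
desired outcome, not a worked-out proposal), both the path and the query 
would always be percent-encoded from their UTF-8 bytes, and the document 
encoding would never be consulted. The function name is hypothetical.

```python
from urllib.parse import quote, unquote


def encode_url_utf8(path, query):
    """Percent-encode path and query from UTF-8, regardless of the
    encoding of the document the URL appeared in."""
    # quote() encodes str input as UTF-8 by default.
    # '=' and '&' are kept so the query's key/value structure survives.
    return quote(path, safe="/") + "?" + quote(query, safe="=&")


if __name__ == "__main__":
    url = encode_url_utf8("/\u017d", "\u017d\u2639")
    print(url)
    # The query decodes back to the original characters without loss.
    print(unquote(url.split("?", 1)[1]) == "\u017d\u2639")
```

Here the encoding of U+017D no longer depends on the other characters in 
the query, and no character is replaced by "?".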

BR, Julian

Received on Friday, 27 June 2008 14:08:07 UTC