- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Tue, 02 Nov 2010 19:23:27 +0100
- To: Adam Barth <ietf@adambarth.com>
- CC: Mark Nottingham <mnot@mnot.net>, httpbis <ietf-http-wg@w3.org>
On 02.11.2010 19:09, Adam Barth wrote:
> ...
> We can debate who's got the burden of proof, but it doesn't really
> matter. An easier path is probably to just ask jungshik.
>
>> If the data is obtained through Chrome users we will also have to get access
>> to the actual site names that do this, checking whether they already have
>> separate code paths for different browsers.
>
> Yes. Another issue is selection bias.
>
>> Also, related to issue 263, it would be great if you could find out whether
>> Chrome always uses UTF-8 when percent-unescaping, or tried to follow IE.
>>
>> I know that Asian IE installations *did* not use UTF-8 unless the browser
>> was configured for use of UTF-8 when *generating* URIs. It would be great if
>> we could confirm what the exact condition is though.
>
> Here's what the code says:
>
> // Non-ASCII string is passed through and treated as UTF-8 as long as
> // it's valid as UTF-8 and regardless of |referrer_charset|.
>
> // Non-ASCII/Non-UTF-8 string. Fall back to the referrer charset.
>
> // Non-ASCII/Non-UTF-8 string. Fall back to the native codepage.
> // TODO(jungshik): We need to set the OS default codepage
> // to a specific value before testing. On Windows, we can use
> // SetThreadLocale().

Thanks for looking this up.

"referrer charset" is the charset of the page from where the request
comes, right? Of course there's no guarantee that this will always be
the same.

Also, this seems to fall back to the local codepage only when the value
is not valid UTF-8 and the referrer charset is missing? That does not
match my experience with IE7.

> There are tests to that effect in
> <http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_util_unittest.cc>,
> but i haven't looked at the implementation.
>
> If you're willing to not spec what happens to invalid header field
> values, it sounds like you could spec that the filename parameter is
> first %-decoded and then UTF8 decoded. The nutty behavior appears to
> only rear its ugly head when your %-encoded value isn't valid UTF8
> (which you could decide what an "invalid" header field value).
> ...

That may be true for Chrome, but certainly *was* not true for IE when I
first encountered the problem (trust me, I *was* sending UTF-8).

> Now, of course, that would still leave us with UA-sniffing code on
> servers until everyone implements the spec, but that at least sounds
> implementable, unlike the current document, and puts us on a path to a
> better future.

Why do you keep saying the current document is not implementable? That's
not helpful; the RFC 2231 encoding has three independent implementations
(four if I'm allowed to count iCab).

Best regards, Julian
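For readers following the thread, the fallback order described in the quoted Chromium comments can be sketched roughly as below. This is only an illustration of the three steps as quoted (percent-decode, try UTF-8, then the referrer charset, then the native codepage); it is not Chromium's actual code. The function name `decode_filename`, its parameters, and the `cp1252` default are assumptions made for the example.

```python
from urllib.parse import unquote_to_bytes


def decode_filename(raw, referrer_charset=None, native_codepage="cp1252"):
    """Hypothetical sketch of the fallback order in the quoted comments."""
    # Step 0: undo the %-encoding to obtain the raw octets.
    data = unquote_to_bytes(raw)

    # Step 1: treat the octets as UTF-8 whenever they are valid UTF-8,
    # regardless of the referrer charset.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Step 2: not valid UTF-8, so fall back to the referrer charset if known.
    if referrer_charset:
        try:
            return data.decode(referrer_charset)
        except (UnicodeDecodeError, LookupError):
            pass

    # Step 3: last resort is the native codepage (assumed value here);
    # undecodable octets are replaced rather than raising.
    return data.decode(native_codepage, errors="replace")


print(decode_filename("%E2%82%AC.txt"))                       # valid UTF-8
print(decode_filename("%E4.txt", referrer_charset="iso-8859-1"))
print(decode_filename("%E4.txt", native_codepage="cp1251"))
```

The last two calls show the ambiguity under discussion: the same octet `%E4` yields a different character depending on which fallback charset happens to apply on the client.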
Received on Tuesday, 2 November 2010 18:24:08 UTC