Re: Issue 261: Check for requirements backing test cases, was: Comments on draft-ietf-httpbis-content-disp

On Tue, Nov 2, 2010 at 10:25 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 02.11.2010 18:15, Adam Barth wrote:
>>> Maybe those sites just gave up on IE and Chrome, and have been sending
>>> filename* to everybody else for a long time?
>>>
>>> It would *really* be useful to get beyond the hearsay and actually get
>>> data.
>>
>> I don't have that data off hand.  We can gather it.  It just takes
>> time and effort.
>
> Understood. But I think you really need to support that claim with data.

We can debate who's got the burden of proof, but it doesn't really
matter.  An easier path is probably to just ask jungshik.

> If the data is obtained through Chrome users we will also have to get access
> to the actual site names that do this, checking whether they already have
> separate code paths for different browsers.

Yes.  Another issue is selection bias.

> Also, related to issue 263, it would be great if you could find out whether
> Chrome always uses UTF-8 when percent-unescaping, or tries to follow IE.
>
> I know that Asian IE installations *did* not use UTF-8 unless the browser
> was configured for use of UTF-8 when *generating* URIs. It would be great if
> we could confirm what the exact condition is though.

Here's what the code says:

    // Non-ASCII string is passed through and treated as UTF-8 as long as
    // it's valid as UTF-8 and regardless of |referrer_charset|.

    // Non-ASCII/Non-UTF-8 string. Fall back to the referrer charset.

    // Non-ASCII/Non-UTF-8 string. Fall back to the native codepage.
    // TODO(jungshik): We need to set the OS default codepage
    // to a specific value before testing. On Windows, we can use
    // SetThreadLocale().

There are tests to that effect in
<http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_util_unittest.cc>,
but I haven't looked at the implementation.
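
As a rough illustration of the fallback order those comments describe,
here is a minimal Python sketch.  It is not Chromium's code: the
function name decode_filename and its parameters are hypothetical, and
it assumes the fallback order applies to the bytes you get after
%-unescaping.

```python
import locale
from urllib.parse import unquote_to_bytes


def decode_filename(raw, referrer_charset=None, native_codepage=None):
    """Hypothetical helper mirroring the three cases in the comments above."""
    data = unquote_to_bytes(raw)  # undo %XX escapes, yielding raw bytes

    # Case 1: bytes that are valid UTF-8 are treated as UTF-8,
    # regardless of the referrer charset.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Case 2: not valid UTF-8, so fall back to the referrer charset.
    if referrer_charset:
        try:
            return data.decode(referrer_charset)
        except (UnicodeDecodeError, LookupError):
            pass

    # Case 3: fall back to the native codepage (assumed here to be the
    # locale's preferred encoding unless the caller overrides it).
    codepage = native_codepage or locale.getpreferredencoding(False)
    return data.decode(codepage, errors="replace")
```

Note that the last step depends on the machine's locale, which is the
same testing problem the TODO in the comments points at.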

If you're willing to not spec what happens to invalid header field
values, it sounds like you could spec that the filename parameter is
first %-decoded and then UTF-8 decoded.  The nutty behavior appears to
only rear its ugly head when your %-encoded value isn't valid UTF-8
(which you could define as an "invalid" header field value).
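
To make that concrete, a minimal Python sketch of the spec'd behavior
might look like the following.  The name decode_filename_strict is
hypothetical, and returning None for the invalid case is just one way
to model "not specified"; a real UA would do whatever it does today.

```python
from urllib.parse import unquote_to_bytes


def decode_filename_strict(raw):
    """Hypothetical helper: %-decode first, then UTF-8 decode."""
    data = unquote_to_bytes(raw)  # step 1: undo %XX escapes
    try:
        return data.decode("utf-8")  # step 2: interpret the bytes as UTF-8
    except UnicodeDecodeError:
        # The value is "invalid" in the sense above; the spec would
        # say nothing about what happens here.
        return None
```

With this, "%E2%82%AC%20rates.txt" decodes to a name starting with the
euro sign, while "%E9.txt" (a lone Latin-1 byte) falls into the
unspecified case.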

Now, of course, that would still leave us with UA-sniffing code on
servers until everyone implements the spec, but that at least sounds
implementable, unlike the current document, and puts us on a path to a
better future.

Adam

Received on Tuesday, 2 November 2010 18:10:22 UTC