Re: Issue 261: Check for requirements backing test cases, was: Comments on draft-ietf-httpbis-content-disp

On Tue, Nov 2, 2010 at 11:23 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 02.11.2010 19:09, Adam Barth wrote:
>> We can debate who's got the burden of proof, but it doesn't really
>> matter.  An easier path is probably to just ask jungshik.
>>
>>> If the data is obtained through Chrome users we will also have to get
>>> access to the actual site names that do this, checking whether they
>>> already have separate code paths for different browsers.
>>
>> Yes.  Another issue is selection bias.
>>
>>> Also, related to issue 263, it would be great if you could find out
>>> whether Chrome always uses UTF-8 when percent-unescaping, or tries to
>>> follow IE.
>>>
>>> I know that Asian IE installations *did* not use UTF-8 unless the browser
>>> was configured for use of UTF-8 when *generating* URIs. It would be great
>>> if we could confirm what the exact condition is though.
>>
>> Here's what the code says:
>>
>>     // Non-ASCII string is passed through and treated as UTF-8 as long as
>>     // it's valid as UTF-8 and regardless of |referrer_charset|.
>>
>>     // Non-ASCII/Non-UTF-8 string. Fall back to the referrer charset.
>>
>>     // Non-ASCII/Non-UTF-8 string. Fall back to the native codepage.
>>     // TODO(jungshik): We need to set the OS default codepage
>>     // to a specific value before testing. On Windows, we can use
>>     // SetThreadLocale().
>
> Thanks for looking this up.
>
> "referrer charset" is the page from where the request comes, right? Of
> course there's no guarantee that this will always be the same.

Charset calculations are somewhat complicated.  Roughly speaking, the
referrer charset is the charset (if any) used by the resource that
caused this request to be generated.  How that gets computed exactly is
kind of hairy, but that's the idea.

> Also, this seems to fall back to the local codepage only for non-UTF-8 and
> missing referrer? That does not match my experience with IE7.

I suspect jungshik didn't match IE exactly but made an informed
decision about what might be acceptable both w.r.t. compatibility and
w.r.t. sanity.
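
For reference, the fallback order described by the quoted comments can
be sketched roughly as below.  This is only an illustration of the
order, not Chrome's actual code: the function name, the parameter
names, and the "cp1252" default for the native codepage are all
assumptions made for the example.

```python
from typing import Optional


def decode_header_bytes(raw: bytes,
                        referrer_charset: Optional[str] = None,
                        native_codepage: str = "cp1252") -> str:
    """Hypothetical sketch of the three-step fallback in the quoted comments."""
    # Step 1: anything that is valid UTF-8 (pure ASCII included) is treated
    # as UTF-8, regardless of the referrer charset.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Step 2: not valid UTF-8, so fall back to the referrer charset, if any.
    if referrer_charset:
        try:
            return raw.decode(referrer_charset)
        except (UnicodeDecodeError, LookupError):
            pass  # unknown charset name, or bytes invalid in that charset

    # Step 3: last resort is the native codepage (assumed value here);
    # undecodable bytes are replaced rather than raising.
    return raw.decode(native_codepage, errors="replace")
```

With this order, the referrer charset only matters for input that is
not valid UTF-8, which is the case Julian is asking about.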

>> There are tests to that effect in
>>
>> <http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_util_unittest.cc>,
>> but I haven't looked at the implementation.
>>
>> If you're willing to not spec what happens to invalid header field
>> values, it sounds like you could spec that the filename parameter is
>> first %-decoded and then UTF-8 decoded.  The nutty behavior appears to
>> only rear its ugly head when your %-encoded value isn't valid UTF-8
>> (which you could decide is an "invalid" header field value).
>> ...
>
> That may be true for Chrome, but certainly *was* not true for IE when I
> first encountered the problem (trust me, I *was* sending UTF-8).

Indeed.  We don't need to slavishly copy IE.  We just need to stay
within some compatibility bound.  Usually we aim for somewhere between
99.99 and 99.999% compatibility.  That usually isn't the same as the
unanimous intersection of all browser behavior.

>> Now, of course, that would still leave us with UA-sniffing code on
>> servers until everyone implements the spec, but that at least sounds
>> implementable, unlike the current document, and puts us on a path to a
>> better future.
>
> Why do you keep saying the current document is not implementable? That's not
> helpful; the RFC 2231 encoding has three independent implementations (four
> if I'm allowed to count iCab).

The document has two pieces:

1) filename
2) filename*

What the document says about (1) isn't implementable by some number of
user agents.  What the document says about (2) is suboptimal, but
might be implementable if we can get jungshik on board (and whoever
the relevant decision makers are on the IE team).

From this discussion, it seems you care much more about (2) than about
(1).  One alternative is to remove the parts of the document that
discuss (1) since those appear to be more controversial than the parts
that discuss (2).
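
To make the difference between the two pieces concrete, here is a
rough sketch.  For (1) it follows the idea floated earlier in this
thread (%-decode, then UTF-8 decode, and treat anything else as
invalid); for (2) it follows the charset'language'value layout of the
RFC 5987 ext-value.  The function names are made up for the example,
and returning None for invalid input is just one way to model "not
specified".

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def decode_plain_filename(value: str) -> Optional[str]:
    """(1) filename: %-decode, then strict UTF-8 decode (hypothetical rule)."""
    try:
        return unquote_to_bytes(value).decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 after %-decoding: treated as invalid here, so the
        # sketch deliberately gives no answer.
        return None


def decode_ext_filename(ext_value: str) -> str:
    """(2) filename*: the value itself names its charset."""
    # Layout is charset'language'pct-encoded-value; language may be empty.
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote_to_bytes(encoded).decode(charset)


print(decode_plain_filename("%E2%82%AC%20rates.txt"))
print(decode_ext_filename("UTF-8''%e2%82%ac%20rates.txt"))
```

The point of the contrast is that (2) never has to guess the charset,
while (1) needs either a fixed rule like the one above or some
fallback heuristic.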

Adam

Received on Tuesday, 2 November 2010 18:37:36 UTC