Re: Content-Disposition next steps

On 04.12.2010 00:56, Adam Barth wrote:
> On Thu, Dec 2, 2010 at 4:26 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>> I added
>>
>> <http://greenbytes.de/tech/tc2231/#attmissingdisposition2>
>>
>> which fails for FF3/Chrome/Chrome9 (I see shared bugs :-),
>
> Hum...  This one sounds a bit tricky.  It's not clear to me which
> option is better.
> ...

The specs are clear on it, and all UAs except FF and Chrome get this 
right. This seems to be a case where it's clear what's "better" (even if 
we may disagree on the metrics for "better").

>> and
>>
>> <http://greenbytes.de/tech/tc2231/#emptydisposition>
>>
>> which fails just for FF3.
>
> Thanks.

Let's hope that this can get resolved in FF when the C-D related changes 
land after release of FF4.

>> I'm not totally sure what exactly to test; please elaborate.
>
> Content-Disposition: xfilename=foo.txt

I have added a test for

   Content-Disposition: attachment; xfilename=foo.txt

(the one you proposed would have been invalid anyway). This is

   <http://greenbytes.de/tech/tc2231/#attconfusedparam>

and it fails for Chrome only (reported as 
<http://code.google.com/p/chromium/issues/detail?id=65423>).
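
To illustrate why a parameter such as "xfilename" can be confused with
"filename", here is a minimal Python sketch contrasting substring matching
with comparison of whole parameter names. It shows one way such a failure
can arise; it is not a description of what Chrome actually does. The
function names are made up for this example, and the parser is simplified
(no quoted-string handling).

```python
def naive_filename(header_value):
    # Naive approach: look for the substring "filename=" anywhere in the value.
    marker = "filename="
    idx = header_value.lower().find(marker)
    if idx < 0:
        return None
    rest = header_value[idx + len(marker):]
    return rest.split(";", 1)[0].strip()


def parsed_filename(header_value):
    # Split into disposition type and parameters, then compare whole
    # parameter names. Simplified: quoted-strings are not handled here.
    parts = header_value.split(";")
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if sep and name.strip().lower() == "filename":
            return value.strip()
    return None


header = "attachment; xfilename=foo.txt"
print(naive_filename(header))   # foo.txt
print(parsed_filename(header))  # None
```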

>> <http://greenbytes.de/tech/tc2231/#dispextbadfn>
>>
>> failing in Chrome only.
>
> Oh good.  I'll update the wiki.  With a more elaborate grammar.

See also <http://code.google.com/p/chromium/issues/detail?id=65276>.

> From my perspective, I'd like there to be a specification of how a
> user agent should consume the Content-Disposition header.  I started
> with Chrome's behavior because I'm most familiar with it and because
> there's evidence that at least one implementor is willing to ship that
> behavior.

The specification says how to consume *valid* header instances (I know 
you know that, but I'm repeating it for people who might not have read 
all of this thread).

What's up for discussion is whether we want to talk about handling 
invalid headers, how to do that, and what kind of conformance comes with 
that.

> Ideally, we'd get feedback from other user agent implementors about
> what they'd like the specification to say.  We'd then have an easier
> time polishing away the more exotic behaviors.  Instead, we're relying
> on our collective judgement.

As I said before, I'm not very interested in getting more interop for 
broken headers. What concerns me much more is interop for valid headers, 
and delaying the spec again and again doesn't help here.

So, is there any *hope* that we'll see that feedback from the other 
browser vendors? Maciej? Anne? Eric? Robert?

>> Do you want UAs to converge on that behavior?
>
> Yes.

Then I'd propose that you add an introduction to the Wiki clearly saying 
how you want it to appear in the spec.

Personally, I don't think it's a useful exercise for this Working Group, 
as the observed behavior for broken headers differs a lot (from proper 
parsing as in Konqueror to naive substring matching in some other UAs :-).

This is different from, for instance, handling broken HTML (where we 
actually have evidence that UAs need to do this to stay in business).

>> Even those who currently reject invalid header fields?
>
> If a UA wants to reject invalid header fields, that sounds fine to me.
>   What I'd like to avoid is there being N different ways of consuming
> Content-Disposition, where N is the number of user agent
> implementations.

There should be one way to handle valid headers. I'm less interested 
in consistent behavior for invalid headers *unless* there's a related 
security risk.

As this whole discussion also applies to the actual set of HTTP specs we 
want to finish, I'd appreciate feedback from other WG members on 
whether they are interested in adding this type of information. And 
those who are, please say whether you're willing to contribute test cases 
and report back from your code bases.

>> Note that we're targeting "Proposed Standard" here. It would be great to get
>> this published, see how implementations improve (see both Chrome 9 and
>> Firefox post version 4), and *then* work on an implementation report for
>> Draft Standard.
>
> In this discussion, you keep saying things that imply that we can't
> write specs for user agents until all the user agents already have the
> same behavior.  We're not mind readers.  It's quite helpful to have a
> document that explains how you're supposed to consume these headers.

On the contrary. What I was trying to say is that this is for 
"Proposed". There's a next stage which, in the IETF process, is actually 
the first stage where we're looking at implementations. If this WG can't 
come up with a consensus on this issue, there will be another 
opportunity when we go to Draft Standard.

>> The point is that we already have spec text, which is a warning. Do you want
>> it to change?
>
> Personally, I don't feel that strongly about it.  However, I do feel
> strongly about keeping the %-decoding in the UA Appendix.  If you're
> fine with having both the warning and the %-decoding in the appendix,
> that's a workable solution.  If you feel these are in conflict, then
> I'd rather change the warning to an error and keep the %-decoding in
> the appendix than remove the %-decoding from the appendix.

I think they are in conflict, and wouldn't want to see any 
recommendation to do %-unescaping. It would make the spec incompatible 
with the previous spec (a conformance change), and it's only implemented 
in two out of six UAs I'm testing.

>> See above, I'm struggling to understand what the proposal actually is. (such
>> as: placement, introduction, implication on conformance, ...).
>
> We've been talking about putting it in the appendix.  I'm not sure
> whether you need to reference it from the introduction.  It doesn't
> affect conformance for any conformance class.

By all means please provide the complete text for the introduction of 
the Appendix. That's essential to understand what the expectations on 
implementations are.

If there aren't any, such as "we just think this is a good idea", I'd 
propose to have it in a separate document which may or may not be a WG 
work item.

>> I'm less concerned about processing invalid messages, but I'll say again
>> that there's little interoperability for those messages, so I just don't see
>> why we care.
>
> We care because we want there to be more interoperability in the
> future.  The goal of writing standards is to improve interoperability.

Yes, but we usually draw a line between things we care about and things 
we don't. We happen to disagree on where to draw that line.

>>> You write:
>>>
>>> fail (saves "oo.html" (what's going on here?, see Chrome Issue 52577))
>>>
>>> what's going on is that the "\" is being treated as a directory
>>> separator and Chrome is giving you the "leaf" name of the path.
>>
>> OK, so it fails to do the unescaping on quoted-string. It would be great if
>> this could be fixed.
>
> I'm not sure what you mean by "fixed."  It's unclear whether user
> agents want to do \-decoding on the file name, especially because \ is
> a common directory separator on some operating systems.

"Fixing" means "changing things to work as specified".

So the question here is whether it would break things because there are 
servers sending unescaped backslashes. As far as I can tell, sending 
path separators in the filename indicates a bug in the sender, or an 
attempt to trick the user agent into doing something it's not supposed to do.

So the "harm" of actually doing the unescaping would be that for a 
filename that needs to be postprocessed anyway, the problematic 
character would be filtered in a different way.

Starting with

   filename="a\bc"

the broken implementation sees "a" and "bc" separated by a path 
separator, and will post-process this to "abc", "a_bc" or "bc" (where _ 
could be a different replacement character).

A correct implementation sees "abc".

I don't think there's a problem here.
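
As a minimal sketch of the two readings of filename="a\bc" (Python, with
function names invented for this example): the first function performs the
quoted-pair unescaping described for the correct case, the second keeps
only the part after the last backslash, which is the "leaf name" behaviour
described for the broken case.

```python
def unescape_quoted_string(raw):
    # raw is the text between the double quotes of a quoted-string.
    # A backslash makes the following character literal (quoted-pair).
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def leaf_after_backslash(raw):
    # Treat "\" as a path separator and keep only the last segment.
    return raw.rsplit("\\", 1)[-1]


raw = "a\\bc"  # the four characters: a \ b c
print(unescape_quoted_string(raw))  # abc
print(leaf_after_backslash(raw))    # bc
```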

>> Looking at
>> <http://trac.tools.ietf.org/wg/httpbis/trac/wiki/ContentDispositionErrorHandling?version=7>:
>>
>>> Determining the Disposition
>>>
>>> To determine the disposition-type, parse the Content-Disposition header
>>> field using the following grammar:
>>>
>>> unparsed-string = *LWS nominal-type *OCTET
>>> nominal-type    = "inline" / "filename" / "name" / ";"
>>>
>>> If the Content-Disposition header field is non-empty and fails to parse,
>>> then the disposition type is "attachment". Otherwise, the disposition-type
>>> is "inline".
>>
>> Neither "filename" nor "name" are disposition types.
>
> Indeed.

It's confusing and makes reviewing the text harder than it needs to be.

>> It suggests that you
>> can leave out the disposition type and get it treated as attachment;
>> <http://greenbytes.de/tech/tc2231/#attmissingdisposition>  indicates
>> otherwise.
>
> I'm not sure I understand what you're saying.  The wiki text matches
> UA behavior for
> http://greenbytes.de/tech/tc2231/#attmissingdisposition.  Is there
> another test case you're worried about?

No, sorry, I got this wrong.
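
For reference, the disposition rule quoted from the wiki above can be
sketched in Python roughly as follows. This is one reading of the quoted
grammar, not code from any UA. Assumptions: LWS is approximated by space,
tab, CR and LF, and the literals are matched case-insensitively, as ABNF
string literals are. The function name is invented for this example.

```python
import re

# *LWS followed by one of the nominal-type literals; any octets may follow.
_NOMINAL = re.compile(r"[ \t\r\n]*(?:inline|filename|name|;)", re.IGNORECASE)


def wiki_disposition_type(header_value):
    # Non-empty and fails to parse -> "attachment"; otherwise "inline".
    if header_value != "" and _NOMINAL.match(header_value) is None:
        return "attachment"
    return "inline"


print(wiki_disposition_type("attachment; filename=foo.html"))  # attachment
print(wiki_disposition_type("filename=foo.html"))              # inline
print(wiki_disposition_type(""))                               # inline
```

Under this reading, a value that starts with a bare "filename" parameter
is treated as "inline" because "filename" is listed in nominal-type.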

>>> Extracting Parameter Values From Header Fields
>>>
>>> To extract the value for a given parameter-name from an unparsed-string,
>>> parse the unparsed-string using the following grammar:
>>>
>>> unparsed-string = *OCTET name *LWS "=" value [ ";" *OCTET ]
>>> value           = <OCTET, except ";">
>>>
>>> where the name production is a grammatical production that is a
>>> case-insensitive match for the given parameter-name. If the unparsed-string
>>> can be parsed by the grammar in multiple ways, choose the one in which name
>>> appears as close to the beginning of the string as possible. If the
>>> unparsed-string cannot be parsed by the grammar above, return the empty
>>> string.
>>
>> This doesn't handle quoted strings.
>
> How would you like quoted strings to be handled?  According to your
> tests, what we should do is strip off matching leading and trailing "
> characters and be careful to capture ; inside of ".  However, your
> tests show that we should not \-decode the value.  I'm happy to make
> that change.

Quoted strings should be handled as specified, removing the quotes and 
performing \-unescaping. The tests show that indeed a majority of UAs 
get this wrong but that doesn't make it magically right.

I'd prefer that we invest our time to reduce the bugs in the UAs, 
instead of documenting them.

Note that there is already a proposed patch in Mozilla that fixes the 
quoted-string handling.
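
A minimal Python sketch of reading a parameter value "as specified", that
is, either a token or a quoted-string with the quotes removed and
\-unescaping applied. The function name and its (text, pos) interface are
invented for this example, and error handling for unterminated strings is
deliberately left out.

```python
def read_param_value(text, pos):
    # Read a parameter value starting at text[pos].
    # Returns (value, position just after the value).
    if pos < len(text) and text[pos] == '"':
        out = []
        i = pos + 1
        while i < len(text) and text[i] != '"':
            if text[i] == "\\" and i + 1 < len(text):
                i += 1  # quoted-pair: take the next character literally
            out.append(text[i])
            i += 1
        return "".join(out), i + 1  # skip the closing quote
    # Unquoted value: runs up to the next ";" or the end of the text.
    end = text.find(";", pos)
    if end < 0:
        end = len(text)
    return text[pos:end].strip(), end


text = '"foo;\\"bar\\".html"; size=1'
value, next_pos = read_param_value(text, 0)
print(value)  # foo;"bar".html
```

A ";" inside the quotes stays part of the value, and an escaped quote does
not end the string.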

>>> Decoding the File Name
>>>
>>> To filename-decode an encoded-string, use the following algorithm:
>>>
>>>    1. If the encoded-string contains non-ASCII characters, emit the
>>> encoded-string (decoded as ISO-8859-1) and abort these steps.
>>
>> So by adding a non-ASCII character I can prevent percent-unescaping? Is this
>> implemented anywhere?
>
> I'd encourage you to write a test and find out.  :)

Writing tests doesn't come for free, and every test I add needs to be 
maintained and re-run. So I'd prefer to understand whether it's worth the 
time first.

>>>    2. Let the url-unescaped-string be the encoded-string %-unescaped.
>>>    3. Emit the url-unescaped-string (decoded as UTF-8). (There's actually
>>> more sadness here if the url-unescaped-string isn't valid UTF-8.)
>>>
>>> The emitted characters are the decoded file name.
>>
>> <permathread>Why would we recommend something that only Chrome and IE do
>> (and IE only does for some locales)</permathread>
>
> As you indicate, we've discussed this issue at length.  If you can
> convince IE to remove this behavior, then we might be able to remove
> it from this document.  Otherwise, we'd like to compete with IE in
> this respect.

So can you convince Safari, Opera, Firefox, and Konqueror to adopt this 
handling as well? Otherwise I don't think we'll make progress. Two UAs 
do something funny, the other four do not. I don't want the 
specification to reflect those implementation bugs -- even if they can't 
be realistically removed from these UAs anytime soon.
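
For readers following along, the three filename-decoding steps quoted from
the wiki above amount to roughly the following Python sketch. It is a
reading of the quoted steps only, not code from any UA. Assumptions: the
input is the raw value as bytes, and invalid UTF-8 in step 3 is replaced
with U+FFFD (the wiki text leaves that case open). The function name is
invented for this example.

```python
from urllib.parse import unquote_to_bytes


def wiki_filename_decode(encoded):
    # encoded: the raw octets of the file name value, as bytes.
    # Step 1: any non-ASCII octet -> decode as ISO-8859-1 and stop,
    # without %-unescaping.
    if any(b > 0x7F for b in encoded):
        return encoded.decode("iso-8859-1")
    # Step 2: %-unescape.
    unescaped = unquote_to_bytes(encoded)
    # Step 3: decode as UTF-8. errors="replace" is an assumption made
    # for this sketch.
    return unescaped.decode("utf-8", errors="replace")


print(wiki_filename_decode(b"foo-%c3%a4.html"))    # foo-ä.html
print(wiki_filename_decode(b"foo-\xe4-%41.html"))  # foo-ä-%41.html
```

With the steps as written, a single non-ASCII octet does indeed switch off
the %-unescaping for the whole value, as the second call shows.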

Best regards, Julian

Received on Saturday, 4 December 2010 11:18:48 UTC