Re: Content-Disposition next steps from Adam Barth on 2010-12-03 (ietf-http-wg@w3.org from October to December 2010)

From: Adam Barth <ietf@adambarth.com>
Date: Fri, 3 Dec 2010 15:56:57 -0800
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <AANLkTik8N98eu0ppHmKu6HHySd1nFrDYBJxyB5CEOpM7@mail.gmail.com>
On Thu, Dec 2, 2010 at 4:26 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 01.12.2010 23:53, Adam Barth wrote:
>>> Not to mention that it's silly to treat `x=y; filename=example.txt` as if
>>> it had an unrecognized disposition type and should thus be handled as
>>> "attachment", which, say, Internet Explorer and Opera don't do, when
>>> you treat plain `filename=example.txt` as having no disposition type.
>>
>> Perhaps Julian would be willing to add this case to his test suite?
>> Silliness isn't one of the criteria I've applied.
>
> I added
>
> <http://greenbytes.de/tech/tc2231/#attmissingdisposition2>
>
> which fails for FF3/Chrome/Chrome9 (I see shared bugs :-),

Hum...  This one sounds a bit tricky.  It's not clear to me which
option is better.

> and
>
> <http://greenbytes.de/tech/tc2231/#emptydisposition>
>
> which fails just for FF3.

Thanks.

>>> Not to mention that this is utterly silly, if you have "x<name>=..."
>>> this would be handled as if the value had a `name` parameter with the
>>> empty string as value, as opposed to the semantically correct result,
>>> which would be "there is no `name` parameter".
>>
>> Perhaps this is another good test case to add to the suite.  I'm
>> willing to believe this behavior isn't necessary, but I'd like to look
>> at some more evidence before changing it.
>> ...
>
> I'm not totally sure what exactly to test; please elaborate.

Content-Disposition: xfilename=foo.txt

>>> And as you lack higher
>>> level control logic to actually separate parameters, this results in
>>> `example="filename=example.txt"` having the filename `example.txt"`,
>>> as opposed to the correct result, namely that there is no filename.
>>
>> Sounds like another good test case.
>
> <http://greenbytes.de/tech/tc2231/#dispextbadfn>
>
> failing in Chrome only.

Oh good.  I'll update the wiki with a more elaborate grammar.

On Thu, Dec 2, 2010 at 4:59 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> That's good, but now I'm asking myself what the precise goal of this
> exercise is.
>
> Writing down what Chrome does was a good idea in that it helped adding a few
> more test cases, of which many fail *only* with Chrome, and some fail for
> FF3 as well (not entirely surprising as the original code was written by the
> same author).
>
> It would be cool to get the same information for closed-source implementations
> like IE and Safari. (I exclude Opera here because it already works well in
> comparison).
>
> Now we're moving away from documenting "what Chrome does" to something else.
> What exactly? What's the purpose?

From my perspective, I'd like there to be a specification of how a
user agent should consume the Content-Disposition header.  I started
with Chrome's behavior because I'm most familiar with it and because
there's evidence that at least one implementor is willing to ship that
behavior.

Ideally, we'd get feedback from other user agent implementors about
what they'd like the specification to say.  We'd then have an easier
time polishing away the more exotic behaviors.  Instead, we're relying
on our collective judgement.

> Do you want UAs to converge on that behavior?

Yes.

> Even those who currently reject invalid header fields?

If a UA wants to reject invalid header fields, that sounds fine to me.
What I'd like to avoid is there being N different ways of consuming
Content-Disposition, where N is the number of user agent
implementations.

> Note that we're targeting "Proposed Standard" here. It would be great to get
> this published, see how implementations improve (see both Chrome 9 and
> Firefox post version 4), and *then* work on an implementation report for
> Draft Standard.

In this discussion, you keep saying things that imply that we can't
write specs for user agents until all the user agents already have the
same behavior.  We're not mind readers.  It's quite helpful to have a
document that explains how you're supposed to consume these headers.

>> I'm not sure what point you're trying to make by quoting this part of
>> the document.  That is indeed a warning.
>
> The point is that we already have spec text, which is a warning. Do you want
> it to change?

Personally, I don't feel that strongly about it.  However, I do feel
strongly about keeping the %-decoding in the UA Appendix.  If you're
fine with having both the warning and the %-decoding in the appendix,
that's a workable solution.  If you feel these are in conflict, then
I'd rather change the warning to an error and keep the %-decoding in
the appendix than remove the %-decoding from the appendix.

>>> There's no simple answer to that, unless you're willing to be fully
>>> spec-compliant in which case the spec already tells you what to do.
>>
>> Hence my proposal.
>
> See above, I'm struggling to understand what the proposal actually is. (such
> as: placement, introduction, implication on conformance, ...).

We've been talking about putting it in the appendix.  I'm not sure
whether you need to reference it from the introduction.  It doesn't
affect conformance for any conformance class.

>>> If *all* of this is optional (you can pick whatever you like), then this
>>> is just another way to document the current state of implementations, in
>>> which case I'd recommend we just augment what's already in Appendix C.
>>
>> I don't really understand what you mean by this paragraph.  The
>> framing we've been discussing is "if you'd like more information about
>> consuming the Content-Disposition header, here's a recommendation for
>> how you might like to do that."
>
> I wouldn't want to "recommend" a behavior that makes existing
> implementations less compliant for valid messages.
>
> I'm less concerned about processing invalid messages, but I'll say again
> that there's little interoperability for those messages, so I just don't see
> why we care.

We care because we want there to be more interoperability in the
future.  The goal of writing standards is to improve interoperability.

>> You write:
>>
>> fail (saves "oo.html" (what's going on here?, see Chrome Issue 52577))
>>
>> what's going on is that the "\" is being treated as a directory
>> separator and Chrome is giving you the "leaf" name of the path.
>
> OK, so it fails to do the unescaping on quoted-string. It would be great if
> this could be fixed.

I'm not sure what you mean by "fixed."  It's unclear whether user
agents want to do \-decoding on the file name, especially because \ is
a common directory separator on some operating systems.
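
To make the two readings concrete, here is a minimal Python sketch. It assumes
the quoted-string contents in the failing test are `f\oo.html` (inferred from
the saved name "oo.html", not stated in this message). The function names are
hypothetical, and `ntpath.basename` merely stands in for "take the leaf name
of a path that uses \ as a separator"; it is not a claim about Chrome's code.

```python
import ntpath
import re


def unescape_quoted_pairs(value: str) -> str:
    # Quoted-pair reading: a backslash followed by any character
    # yields just that character.
    return re.sub(r"\\(.)", r"\1", value, flags=re.DOTALL)


def leaf_name(value: str) -> str:
    # Directory-separator reading: ntpath.basename splits on both
    # "\" and "/" regardless of the platform the code runs on.
    return ntpath.basename(value)


raw = r"f\oo.html"  # assumed contents of the quoted-string
print(unescape_quoted_pairs(raw))  # foo.html
print(leaf_name(raw))              # oo.html
```

The same input gives "foo.html" under one reading and "oo.html" under the
other, which is the disagreement described above.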

> Looking at
> <http://trac.tools.ietf.org/wg/httpbis/trac/wiki/ContentDispositionErrorHandling?version=7>:
>
>> Determining the Disposition
>>
>> To determine the disposition-type, parse the Content-Disposition header
>> field using the following grammar:
>>
>> unparsed-string = *LWS nominal-type *OCTET
>> nominal-type    = "inline" / "filename" / "name" / ";"
>>
>> If the Content-Disposition header field is non-empty and fails to parse,
>> then the disposition type is "attachment". Otherwise, the disposition-type
>> is "inline".
>
> Neither "filename" nor "name" are disposition types.

Indeed.

> It suggests that you
> can leave out the disposition type and get it treated as attachment;
> <http://greenbytes.de/tech/tc2231/#attmissingdisposition> indicates
> otherwise.

I'm not sure I understand what you're saying.  The wiki text matches
UA behavior for
http://greenbytes.de/tech/tc2231/#attmissingdisposition.  Is there
another test case you're worried about?
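
For reference, a rough Python sketch of the "Determining the Disposition"
rule as quoted from version 7 of the wiki page. The function name is
hypothetical, LWS is simplified to spaces and tabs, and the literals are
matched case-insensitively (the usual ABNF reading); none of this is taken
from any browser's source.

```python
NOMINAL_TYPES = ("inline", "filename", "name", ";")


def disposition_type(header_value: str) -> str:
    # An empty header field never reaches the "fails to parse" branch.
    if header_value == "":
        return "inline"
    # *LWS nominal-type *OCTET: skip leading whitespace, then require
    # one of the nominal types as a prefix.
    rest = header_value.lstrip(" \t").lower()
    if rest.startswith(NOMINAL_TYPES):
        return "inline"
    # Non-empty and failed to parse.
    return "attachment"


print(disposition_type("filename=example.txt"))       # inline
print(disposition_type("x=y; filename=example.txt"))  # attachment
```

Under this reading, a header that begins with `filename=` parses and is
treated as "inline", while one that begins with an unknown parameter such as
`x=y` does not parse and falls through to "attachment".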

>> Extracting Parameter Values From Header Fields
>>
>> To extract the value for a given parameter-name from an unparsed-string,
>> parse the unparsed-string using the following grammar:
>>
>> unparsed-string = *OCTET name *LWS "=" value [ ";" *OCTET ]
>> value           = <OCTET, except ";">
>>
>> where the name production is a grammatical production that is a
>> case-insensitive match for the given parameter-name. If the unparsed-string
>> can be parsed by the grammar in multiple ways, choose the one in which name
>> appears as close to the beginning of the string as possible. If the
>> unparsed-string cannot be parsed by the grammar above, return the empty
>> string.
>
> This doesn't handle quoted strings.

How would you like quoted strings to be handled?  According to your
tests, what we should do is strip off matching leading and trailing "
characters and be careful to capture ; inside of ".  However, your
tests show that we should not \-decode the value.  I'm happy to make
that change.
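
A minimal Python sketch of the extraction rule quoted above, with the
quoted-string handling as an optional variant. Assumptions: `value` is read
as zero or more octets other than ";", LWS is simplified to spaces and tabs,
and the `handle_quotes` flag and function name are hypothetical. The variant
strips one pair of matching double quotes, keeps ";" inside them, and does
no \-decoding.

```python
import re


def extract_parameter(unparsed: str, name: str, handle_quotes: bool = False) -> str:
    if handle_quotes:
        # Hypothetical variant: prefer a value wrapped in matching quotes.
        value = r'(?:"(?P<quoted>[^"]*)"|(?P<bare>[^;]*))'
    else:
        # Wiki grammar as quoted: everything up to the next ";".
        value = r"(?P<bare>[^;]*)"
    # re.search returns the leftmost match, i.e. the parse in which
    # name appears as close to the beginning as possible.
    pattern = re.escape(name) + r"[ \t]*=" + value
    match = re.search(pattern, unparsed, re.IGNORECASE)
    if match is None:
        return ""  # cannot be parsed: return the empty string
    groups = match.groupdict()
    quoted = groups.get("quoted")
    return quoted if quoted is not None else groups["bare"]


print(extract_parameter("xfilename=foo.txt", "filename"))
# foo.txt
print(extract_parameter('attachment; example="filename=example.txt"', "filename"))
# example.txt"
print(extract_parameter('attachment; filename="a;b.txt"', "filename", handle_quotes=True))
# a;b.txt
```

The second call reproduces the `example.txt"` result mentioned earlier in the
thread: there is no higher-level step that separates parameters before the
search for the name, and the quote-handling variant does not change that.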

>> Decoding the File Name
>>
>> To filename-decode an encoded-string, use the following algorithm:
>>
>>   1. If the encoded-string contains non-ASCII characters, emit the
>> encoded-string (decoded as ISO-8859-1) and abort these steps.
>
> So by adding a non-ASCII character I can prevent percent-unescaping? Is this
> implemented anywhere?

I'd encourage you to write a test and find out.  :)

>>   2. Let the url-unescaped-string be the encoded-string %-unescaped.
>>   3. Emit the url-unescaped-string (decoded as UTF-8). (There's actually
>> more sadness here if the url-unescaped-string isn't valid UTF-8.)
>>
>> The emitted characters are the decoded file name.
>
> <permathread>Why would we recommend something that only Chrome and IE do
> (and IE only does for some locales)</permathread>

As you indicate, we've discussed this issue at length.  If you can
convince IE to remove this behavior, then we might be able to remove
it from this document.  Otherwise, we'd like to compete with IE in
this respect.
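
The three filename-decoding steps quoted above can be sketched in Python as
follows. The function name is hypothetical, the input is taken to be the raw
bytes of the parameter value, and replacing invalid UTF-8 with U+FFFD is an
assumption: the wiki text only notes that there is "more sadness" in that
case.

```python
from urllib.parse import unquote_to_bytes


def filename_decode(encoded: bytes) -> str:
    # Step 1: any non-ASCII byte means the value is emitted as
    # ISO-8859-1 without %-unescaping.
    if any(byte > 0x7F for byte in encoded):
        return encoded.decode("iso-8859-1")
    # Step 2: %-unescape the ASCII-only value.
    unescaped = unquote_to_bytes(encoded)
    # Step 3: decode as UTF-8 (invalid sequences replaced; an assumption).
    return unescaped.decode("utf-8", errors="replace")


print(filename_decode(b"foo%20bar.txt"))  # foo bar.txt
print(filename_decode(b"f\xfc%20.txt"))   # fü%20.txt
```

The second call shows the effect Julian asks about: a single non-ASCII byte
in the value suppresses percent-unescaping entirely.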

Adam
Received on Friday, 3 December 2010 23:58:10 UTC