Re: Content-Disposition next steps from Julian Reschke on 2010-12-02 (ietf-http-wg@w3.org from October to December 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 02 Dec 2010 13:59:05 +0100
To: Adam Barth <ietf@adambarth.com>
CC: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <4CF79819.603@gmx.de>
On 02.12.2010 01:25, Adam Barth wrote:
> ...
>> Ok, here's a +1 to exclude it.
>
> Done.
> ...

That's good, but now I'm asking myself what the precise goal of this 
exercise is.

Writing down what Chrome does was a good idea in that it helped add a
few more test cases, many of which fail *only* in Chrome, while some
fail in FF3 as well (not entirely surprising, as the original code was
written by the same author).

It would be cool to get the same information for closed-source
implementations like IE and Safari. (I exclude Opera here because it
already works well in comparison.)

Now we're moving away from documenting "what Chrome does" to something 
else. What exactly? What's the purpose? Do you want UAs to converge on 
that behavior? Even those who currently reject invalid header fields?

Note that we're targeting "Proposed Standard" here. It would be great to 
get this published, see how implementations improve (see both Chrome 9 
and Firefox post version 4), and *then* work on an implementation report 
for Draft Standard.

>>>> That would mean disallowing to send "%" or "%hh" (where h is HEXDIG) in a
>>>> filename.
>>>
>>> Yes.
>>>
>>>> We can *warn* that some UAs misinterpret this, but unless we can
>>>> recommend something else, that's not really helpful ("something else"
>>>> would
>>>> be RFC 5987 encoding, but Safari and IE do not understand that).
>>>
>>> Well, we can warn or we can forbid.  The choice is ours.
>>
>> <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-04.html#rfc.section.3.3>:
>>
>> "Note: Many user agents do not properly handle escape characters when using
>> the quoted-string form. Furthermore, some user agents erroneously try to
>> perform unescaping of "percent" escapes (see Appendix C.2), and thus might
>> misinterpret filenames containing the percent character followed by two hex
>> digits."
>
> I'm not sure what point you're trying to make by quoting this part of
> the document.  That is indeed a warning.

The point is that we already have spec text, which is a warning. Do you 
want it to change?

>>>> We really need to separate the discussion about recovering from invalid
>>>> headers from the other thing (applying un-escapings or encodings not
>>>> supported by the specs).
>>>
>>> They seem related.  The question at hand is how to consume the
>>> Content-Disposition header.
>>
>> There's no simple answer to that, unless you're willing to be fully
>> spec-compliant in which case the spec already tells you what to do.
>
> Hence my proposal.

See above; I'm struggling to understand what the proposal actually is
(for example: its placement, introduction, implications for conformance, ...).

> ...
>> Sorry? The spec (2616) says it's ISO-8859-1. IE takes it as ISO-8859-1.
>> Opera and Konqueror agree. I don't think there's room for choice here.
>
> I've changed this to ISO-8859-1.  One advantage of ISO-8859-1 is that
> it can't fail to decode, so we can remove some of the sadness that
> happens when UTF-8 decoding fails.
> ...

Thanks.
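
To illustrate the point about decoding failures, here is a minimal Python sketch. The helper names are hypothetical and not taken from any browser: every byte sequence decodes as ISO-8859-1, whereas a lone non-ASCII octet is not valid UTF-8.

```python
def decode_latin1(raw: bytes) -> str:
    # ISO-8859-1 maps each of the 256 byte values to one code point,
    # so this cannot raise for any input.
    return raw.decode("iso-8859-1")


def try_decode_utf8(raw: bytes):
    # UTF-8 decoding can fail; return None instead of raising.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Hypothetical header octets: "foo-" + 0xE4 + ".html"
raw_name = b"foo-\xe4.html"
latin1_name = decode_latin1(raw_name)
utf8_name = try_decode_utf8(raw_name)
```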

>> It would be helpful if you could provide text that is actually
>> self-contained and ready to appear in the spec; in particular that means
>> that this needs an explanation about who this is for, and what the
>> conformance requirements are. If some parts are optional, these need to be
>> marked.
>
> I'm happy to do that once we've agreed on the substance.  Bjoern had
> some test cases we should be sure we're happy with.  If you're willing
> to update your page with the rest of the test cases, I can look at
> them and adjust the text accordingly.
> ...

I can't "agree" on anything unless I know what it is for, sorry.

>> If *all* of this is optional (you can pick whatever you like), then this is
>> just another way to document the current state of implementations, in which
>> case I'd recommend we just augment what's already in Appendix C.
>
> I don't really understand what you mean by this paragraph.  The
> framing we've been discussing is "if you'd like more information about
> consuming the Content-Disposition header, here's a recommendation for
> how you might like to do that."

I wouldn't want to "recommend" a behavior that makes existing 
implementations less compliant for valid messages.

I'm less concerned about processing invalid messages, but I'll say again 
that there's little interoperability for those messages, so I just don't 
see why we care.

> ...
> On Wed, Dec 1, 2010 at 3:19 PM, Julian Reschke<julian.reschke@gmx.de>  wrote:
>> On 01.12.2010 23:53, Adam Barth wrote:
>>>> Not to mention that it's silly to treat `x=y; filename=example.txt` as if
>>>> it had an unrecognized disposition type and should thus be handled as
>>>> "attachment", which, say, Internet Explorer and Opera don't do, when
>>>> you treat plain `filename=example.txt` as having no disposition type.
>>>
>>> Perhaps Julian would be willing to add this case to his test suite?
>>> Silliness isn't one of the criteria I've applied.
>>> ...
>>
>> Will do tomorrow.
>
> Great.  Let me know when it's posted and I'll take a look.  I really
> appreciate your cross-testing a number of browsers.  That's
> tremendously helpful in figuring out what to do in these cases.

See preceding mail.

> ...
>> That's probably because Chrome doesn't handle quoted-string properly:
>> <http://greenbytes.de/tech/tc2231/#attwithasciifnescapedchar>
>
> You write:
>
> fail (saves "oo.html" (what's going on here?, see Chrome Issue 52577))
>
> what's going on is that the "\" is being treated as a directory
> separator and Chrome is giving you the "leaf" name of the path.

OK, so it fails to do the unescaping on quoted-string. It would be great 
if this could be fixed.
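
For reference, a rough Python sketch of the two behaviors, assuming the test case sends filename="f\oo.html" (that header value and both function names are assumptions for illustration, not code from any implementation). The first function removes quoted-pair escapes as the quoted-string grammar requires; the second mimics the "leaf name" behavior described above.

```python
def unquote_quoted_string(value: str) -> str:
    # Return the value unchanged unless it is wrapped in double quotes.
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        return value
    out = []
    chars = iter(value[1:-1])
    for ch in chars:
        if ch == "\\":
            # quoted-pair: the backslash escapes the next character.
            # A trailing lone backslash is dropped (an assumption).
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)


def leaf_name_after_backslash(value: str) -> str:
    # Illustrates treating "\" as a directory separator and keeping
    # only the last path component.
    return value.strip('"').rsplit("\\", 1)[-1]


header_value = r'"f\oo.html"'
expected = unquote_quoted_string(header_value)     # "foo.html"
observed = leaf_name_after_backslash(header_value)  # "oo.html"
```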

> ...

Looking at 
<http://trac.tools.ietf.org/wg/httpbis/trac/wiki/ContentDispositionErrorHandling?version=7>:

> Determining the Disposition
>
> To determine the disposition-type, parse the Content-Disposition header field using the following grammar:
>
> unparsed-string = *LWS nominal-type *OCTET
> nominal-type    = "inline" / "filename" / "name" / ";"
>
> If the Content-Disposition header field is non-empty and fails to parse, then the disposition type is "attachment". Otherwise, the disposition-type is "inline".

Neither "filename" nor "name" is a disposition type. It suggests that
you can leave out the disposition type and get it treated as attachment;
<http://greenbytes.de/tech/tc2231/#attmissingdisposition> indicates
otherwise.
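
To make the quoted rule concrete, here is a minimal Python sketch that follows the wiki text literally. Two things are assumptions: the function name, and the case-insensitive matching (taken from the usual ABNF reading of quoted literals).

```python
# The four alternatives of the quoted nominal-type production.
NOMINAL_TYPES = ("inline", "filename", "name", ";")


def disposition_type(header_value: str) -> str:
    # *LWS: skip leading linear whitespace.
    stripped = header_value.lstrip(" \t\r\n")
    parses = stripped.lower().startswith(NOMINAL_TYPES)
    # "Non-empty and fails to parse" yields "attachment";
    # everything else, including an empty value, yields "inline".
    if header_value and not parses:
        return "attachment"
    return "inline"


results = {
    value: disposition_type(value)
    for value in (
        "inline",
        "attachment; filename=example.txt",
        "filename=example.txt",
        "x=y; filename=example.txt",
    )
}
```

Read this way, the rule classifies `filename=example.txt` as "inline" and `x=y; filename=example.txt` as "attachment", which is the asymmetry raised earlier in the thread.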

> Extracting Parameter Values From Header Fields
>
> To extract the value for a given parameter-name from an unparsed-string, parse the unparsed-string using the following grammar:
>
> unparsed-string = *OCTET name *LWS "=" value [ ";" *OCTET ]
> value           = <OCTET, except ";">
>
> where the name production is a grammatical production that is a case-insensitive match for the given parameter-name. If the unparsed-string can be parsed by the grammar in multiple ways, choose the one in which name appears as close to the beginning of the string as possible. If the unparsed-string cannot be parsed by the grammar above, return the empty string.

This doesn't handle quoted strings.
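
A small Python sketch of the quoted extraction rule shows the problem. It assumes that "value" means a run of octets other than ";", and the function name is hypothetical.

```python
import re


def extract_parameter(unparsed: str, parameter_name: str) -> str:
    # name *LWS "=" value, where value stops at the first ";".
    # re.search returns the leftmost match, i.e. the occurrence of
    # the name closest to the beginning of the string.
    pattern = re.compile(
        re.escape(parameter_name) + r"[ \t]*=([^;]*)", re.IGNORECASE
    )
    match = pattern.search(unparsed)
    return match.group(1) if match else ""


plain = extract_parameter("attachment; filename=example.txt", "filename")
quoted = extract_parameter('attachment; filename="foo.html"', "filename")
split = extract_parameter('attachment; filename="a;b.txt"', "filename")
```

With this reading, `quoted` keeps its surrounding double quotes and `split` is cut off at the ";" inside the quoted-string.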

> Decoding the File Name
>
> To filename-decode an encoded-string, use the following algorithm:
>
>    1. If the encoded-string contains non-ASCII characters, emit the encoded-string (decoded as ISO-8859-1) and abort these steps.

So by adding a non-ASCII character I can prevent percent-unescaping? Is 
this implemented anywhere?

>    2. Let the url-unescaped-string be the encoded-string %-unescaped.
>    3. Emit the url-unescaped-string (decoded as UTF-8). (There's actually more sadness here if the url-unescaped-string isn't valid UTF-8.)
>
> The emitted characters are the decoded file name.

<permathread>Why would we recommend something that only Chrome and IE do
(and IE only does for some locales)?</permathread>
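
For concreteness, the three quoted steps can be sketched in Python as below. The function name is hypothetical, and the handling of invalid UTF-8 (replacement characters) is an assumption, since the quoted text leaves that case open.

```python
from urllib.parse import unquote_to_bytes


def filename_decode(encoded: bytes) -> str:
    # Step 1: any non-ASCII octet means decode as ISO-8859-1 and stop.
    if any(octet > 0x7F for octet in encoded):
        return encoded.decode("iso-8859-1")
    # Step 2: %-unescape the ASCII-only string.
    unescaped = unquote_to_bytes(encoded)
    # Step 3: decode as UTF-8; invalid sequences become U+FFFD here,
    # which is this sketch's choice, not something the text specifies.
    return unescaped.decode("utf-8", errors="replace")


unescaped_name = filename_decode(b"foo-%c3%a4.html")
untouched_name = filename_decode(b"foo-%c3%a4-\xe4.html")
literal_percent = filename_decode(b"foo-%41.html")
```

The second call shows the effect asked about above: one non-ASCII octet leaves "%c3%a4" unescaped. The third shows a filename that legitimately contains "%41" being turned into "foo-A.html".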

Best regards, Julian
Received on Thursday, 2 December 2010 12:59:48 UTC