Re: Content-Disposition next steps from Julian Reschke on 2010-12-01 (ietf-http-wg@w3.org from October to December 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Wed, 01 Dec 2010 22:44:36 +0100
To: Adam Barth <ietf@adambarth.com>
CC: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <4CF6C1C4.5030409@gmx.de>
On 01.12.2010 22:13, Adam Barth wrote:
> Oh, I thought we discussed this before.  From my perspective, the
> current spec is optimized for server operators.  In particular, it's
> helpful if you'd like to generate the Content-Disposition header.  I
> raised the issue that user agent implementors might like information
> about how to consume the Content-Disposition header.  We then had a
> long discussion about why the optimal generation instructions and
> consumption instructions might not be identical.

I'm aware of that.

> I think, then, Mark suggested that we could include that information
> in an appendix that user agents could optionally implement, if they
> were so inclined.  One important constraint is that the consumption
> requirements agree with the rest of the spec on well-formed headers
> (i.e., on those that meet the generation requirements).

Yes.

But your proposal is already in the grey area because it conflicts with 
the literal reading of the grammar: it applies certain decoding 
operations to something that is not *supposed* to be encoded, or it 
uses a charset default that isn't backed by the specs.

This *could* be justified by claiming that the filename is advisory 
only, but that's not really satisfying, particularly when we have 
evidence that many UAs get away without these workarounds.

> If it's helpful for you to think about this information as recommended
> error recovery, that's fine with me.  When consuming the

Error recovery implies a detectable error. A big part of what you 
propose changes the interpretation of valid fields.

> Content-Disposition header, it's not especially important to know
> whether the header is well-formed (i.e., generated in accordance with
> the generation requirements).

Unless you want to ignore the header when it is not well-formed.

>> I note that you have handling of RFC2047-style encoding in there. That's
>> something only Chrome and Firefox are doing, so I'd like to understand why
>> you think it's needed, and whether you think Opera/Safari/Konqueror/IE
>> should implement that (given the fact that changes the semantics of values
>> that are valid).
>
> Yeah, I wasn't sure whether to include the RFC2047 encoding.  I
> certainly wouldn't recommend that servers generate Content-Disposition
> headers using that encoding.  However, if I were writing a new user
> agent today, I might well include RFC2047 support.  It boils down to a
> cost/benefit analysis.  Some comments in the Chrome code indicate that
> there are servers that do generate RFC2047-encoded Content-Disposition
> headers, so there's at least some benefit.

But there's also damage, because there's a small risk of misinterpreting 
a value that just happens to look RFC2047-encoded.
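To make that risk concrete: the quoted-string "=?ISO-8859-1?Q?foo-=E4.html?=" is a perfectly valid filename value under the RFC 2616 grammar, yet a UA that applies RFC2047 decoding turns it into a different name. A minimal Python sketch of such a heuristic, assuming the entire parameter value must be a single encoded-word (actual UA code may also accept several words or surrounding text); the function name is hypothetical:

```python
import base64
import binascii
import quopri
import re

# One RFC 2047 encoded-word: =?charset?Q-or-B?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_if_rfc2047(value: str) -> str:
    """Hypothetical UA heuristic: if the entire parameter value is a single
    RFC 2047 encoded-word, decode it; otherwise return the value unchanged."""
    match = ENCODED_WORD.fullmatch(value)
    if match is None:
        return value
    charset, encoding, text = match.groups()
    try:
        data = text.encode("ascii")
        if encoding in "Qq":
            # header=True decodes "_" as a space, as the Q encoding requires
            raw = quopri.decodestring(data, header=True)
        else:
            raw = base64.b64decode(data)
        return raw.decode(charset)
    except (UnicodeError, LookupError, binascii.Error):
        return value  # undecodable: keep the literal value


# A valid quoted-string value under RFC 2616 that merely looks encoded:
print(decode_if_rfc2047("=?ISO-8859-1?Q?foo-=E4.html?="))  # foo-ä.html
print(decode_if_rfc2047("foo.html"))                       # foo.html
```

A UA without this heuristic offers the first value literally; a UA with it offers "foo-ä.html". Both are "correct" readings of the same valid header, which is exactly the interoperability problem.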

The same is true for the other recoveries you propose:

>             i)  If the word is a well-formed UTF-8 string, emit the word
>                 (decoded as UTF-8) and proceed to the next grammatical element.

According to RFC2616, the default is ISO-8859-1, and IE/Opera/Konqueror 
do exactly that (at least in my locale): 
<http://greenbytes.de/tech/tc2231/#attwithutf8fnplain>
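The disagreement is about what the same bytes mean. A short Python sketch (the function name is hypothetical) decoding raw UTF-8 bytes in a plain quoted-string filename both ways, the RFC 2616 ISO-8859-1 default versus the proposed "try UTF-8 first" step:

```python
def decode_filename(raw: bytes, utf8_first: bool) -> str:
    """Decode the raw bytes of a quoted-string filename parameter.

    utf8_first=False follows the RFC 2616 default (ISO-8859-1);
    utf8_first=True applies the proposed heuristic of trying UTF-8 first.
    """
    if utf8_first:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass  # not well-formed UTF-8: fall back to the RFC 2616 default
    return raw.decode("iso-8859-1")


raw = "foo-ä.html".encode("utf-8")  # bytes b"foo-\xc3\xa4.html" on the wire
print(decode_filename(raw, utf8_first=False))  # RFC 2616 reading: foo-Ã¤.html
print(decode_filename(raw, utf8_first=True))   # UTF-8 heuristic:  foo-ä.html
```

Note that the header is valid in both cases; no error is detected, so the heuristic is a change of interpretation rather than error recovery.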

>         c) Let the url-unescaped-word be the word %-unescaped.
>
>         d) Emit the url-unescaped-word (decoded as UTF-8) and proceed to the
>            next grammatical element.  (There's actually more sadness here if
>            the url-unescaped-word isn't valid UTF-8.)

That overloads the syntax of the parameter, and it is not done in 
FF/Opera/Safari/Konqueror: 
<http://greenbytes.de/tech/tc2231/#attwithfnrawpctenca>
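The overloading is easy to see with a value such as "foo-%41.html". A Python sketch of steps c) and d) as quoted above; the function name is hypothetical, and replacing invalid UTF-8 with U+FFFD is an assumption, since the proposal leaves that case open:

```python
from urllib.parse import unquote_to_bytes


def pct_unescape_filename(value: str) -> str:
    """Steps c) and d) as quoted: %-unescape the word, then decode as UTF-8.
    errors="replace" is an assumption; the proposal leaves invalid UTF-8 open."""
    return unquote_to_bytes(value).decode("utf-8", errors="replace")


# "%" has no special meaning inside an RFC 2616 quoted-string, so this value
# literally names the file foo-%41.html.
value = "foo-%41.html"
print(value)                         # strict reading: foo-%41.html
print(pct_unescape_filename(value))  # with steps c/d: foo-A.html
```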

Yes, the current landscape is a mess, but it's a *different* mess in 
each UA. Your recommendation appears to merge all the bad workarounds. 
My recommendation would be to try to slowly get rid of them.

Statistics on which of these workarounds are *really* used would be useful.
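One way such statistics could be gathered, sketched in Python under the assumption that one has a corpus of raw filename parameter values (already unquoted) captured from real traffic: for each value, report which of the three workarounds would change its RFC 2616 (ISO-8859-1) reading. The labels, function names, and the encoded-word detector are illustrative, not taken from any UA:

```python
import re
from collections import Counter
from urllib.parse import unquote_to_bytes

# Rough detector for an RFC 2047 encoded-word anywhere in a value.
ENCODED_WORD = re.compile(r"=\?[^?]+\?[QqBb]\?[^?]*\?=")


def affected_by(raw: bytes) -> list[str]:
    """Return which workarounds would change the ISO-8859-1 reading of raw."""
    strict = raw.decode("iso-8859-1")
    hits = []
    if ENCODED_WORD.search(strict):
        hits.append("rfc2047")
    # Pure ASCII decodes identically either way, so only non-ASCII matters.
    if any(b >= 0x80 for b in raw):
        try:
            raw.decode("utf-8")
            hits.append("utf8-default")
        except UnicodeDecodeError:
            pass
    if unquote_to_bytes(raw) != raw:
        hits.append("pct-unescape")
    return hits


def tally(values: list[bytes]) -> Counter:
    """Count, over a corpus, how often each workaround would matter."""
    counts = Counter()
    for raw in values:
        counts.update(affected_by(raw) or ["none"])
    return counts


corpus = [b"foo.html", b"foo-%41.html", "foo-ä.html".encode("utf-8"),
          b"=?ISO-8859-1?Q?foo-=E4.html?="]
print(tally(corpus))
```

A value counted under a workaround is one whose meaning that workaround changes; whether the sender *intended* the decoded meaning still has to be judged separately.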

> One thing that might make sense is to demarcate those instructions as
> again optional, that is an optional piece of the optional error
> recovery, if you like.

That would apply to all of RFC2047, UTF-8 defaulting, and 
percent-unescaping.
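If those three steps were demarcated as independently optional, an implementation could expose each one as its own switch, with the strict RFC 2616 reading as the default. A Python sketch under stated assumptions: the Workarounds type and filename_value function are hypothetical, the order in which the steps run is an assumption (the proposal text quoted here does not fix it), and raw is the already-unquoted parameter value:

```python
from dataclasses import dataclass
from email.header import decode_header, make_header
from urllib.parse import unquote_to_bytes


@dataclass(frozen=True)
class Workarounds:
    """Each optional recovery step is a separate switch, off by default."""
    rfc2047: bool = False
    utf8_default: bool = False
    pct_unescape: bool = False


def filename_value(raw: bytes, opts: Workarounds = Workarounds()) -> str:
    # Assumed order: percent-unescape, then charset choice, then RFC 2047.
    if opts.pct_unescape:
        raw = unquote_to_bytes(raw)
    text = None
    if opts.utf8_default:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            pass  # not UTF-8: fall through to the RFC 2616 default
    if text is None:
        text = raw.decode("iso-8859-1")  # RFC 2616 default charset
    if opts.rfc2047:
        # The stdlib email.header functions decode any encoded-words present.
        text = str(make_header(decode_header(text)))
    return text


raw = "foo-ä.html".encode("utf-8")
print(filename_value(raw))                                   # foo-Ã¤.html
print(filename_value(raw, Workarounds(utf8_default=True)))   # foo-ä.html
```

With all switches off, the function reduces to the ISO-8859-1 reading that IE/Opera/Konqueror are reported above to use, and each switch can be measured, kept, or dropped on its own.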

Are you willing to rephrase the proposal accordingly?

Best regards, Julian
Received on Wednesday, 1 December 2010 21:51:58 UTC