Re: Content-Disposition next steps from Julian Reschke on 2010-12-01 (ietf-http-wg@w3.org from October to December 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Wed, 01 Dec 2010 23:47:41 +0100
To: Adam Barth <ietf@adambarth.com>
CC: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <4CF6D08D.5000008@gmx.de>
On 01.12.2010 23:17, Adam Barth wrote:
> ....
>> This *could* be justified by claiming that the filename is advisory only,
>> but it's really not satisfying, in particular when we have evidence that
>> many UAs get away without it.
>
> Hum...  There are two parts to this issue:
>
> 1) How we expect user agents to behave.
> 2) What we write in the document.
>
> IMHO, the user agents who are interested in this information are not
> going to stop performing this decoding (to issue 1), and the document
> should tell the truth with respect to what these user agents are going
> to do (to issue 2).
> ...

I'm not convinced that all UAs are going to keep their workarounds; 
Mozilla, for instance, is considering dropping RFC2047 decoding, see 
<https://bugzilla.mozilla.org/show_bug.cgi?id=610054>.

But even if that were the case: the UAs support *different* kinds of 
decoding, so there's no single behavior we could document that all of 
them implement now.

What we *can* do is mention what's going on, and warn about what it 
means for interop. We already do that, but I'm open to adding more text 
to that part.

>> Error recovery implies a detectable error. A big part of what you propose
>> changes the interpretation of valid fields.
>
> Perhaps the term "error recovery" isn't helpful?  Another option, of
> course, is to adjust the definition of what is valid to exclude these
> pieces of syntax.

That would mean disallowing "%" or "%hh" (where h is HEXDIG) in a 
filename. We can *warn* that some UAs misinterpret this, but unless we 
can recommend something else, that's not really helpful ("something 
else" would be RFC 5987 encoding, but Safari and IE do not understand that).
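
To make the "%hh" problem concrete, here is a minimal Python sketch. It 
assumes a UA that percent-unescapes the plain filename parameter; the 
filename and the helper name rfc5987_filename are made up for 
illustration and come from neither the draft nor any UA.

```python
from urllib.parse import quote, unquote

# Hypothetical filename that legitimately contains "%hh" sequences.
literal_name = "100%25 done %41.txt"

# A UA that percent-unescapes the plain filename parameter sees another name.
unescaped_name = unquote(literal_name)


def rfc5987_filename(name: str) -> str:
    """Build a filename* parameter (UTF-8, empty language tag) per RFC 5987."""
    # attr-char symbols from RFC 5987 stay literal; letters and digits are
    # always left alone by quote(); everything else is percent-encoded.
    attr_char_symbols = "!#$&+-.^_`|~"
    encoded = quote(name, safe=attr_char_symbols, encoding="utf-8")
    return "filename*=UTF-8''" + encoded


print(repr(literal_name), "->", repr(unescaped_name))
print(rfc5987_filename(literal_name))
```

With filename*, the "%" is itself escaped as "%25", so the value is 
unambiguous, but only for UAs that implement RFC 5987.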

>> Unless you want to ignore it otherwise.
>
> That is certainly one option for consumers.  However, in this
> appendix, we're concerning ourselves with user agents who wish to
> process ill-formed headers.

We really need to separate the discussion about recovering from invalid 
headers from the separate question of applying un-escapings or encodings 
not supported by the specs.

>> But there's also damage, because there's a small risk to misinterpret a
>> value that just happens to look like 2047-encoded.
>
> I'd put that into the "cost" column in the cost/benefit analysis.

Looking forward to that analysis.
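
For reference, the misinterpretation risk can be sketched in Python with 
the standard email.header module. The filename below is hypothetical, 
and rfc2047_decode is an illustrative helper, not something a UA is 
known to implement this way.

```python
from email.header import decode_header


def rfc2047_decode(value: str) -> str:
    """Apply RFC 2047 decoding to a parameter value, as some UAs do."""
    parts = []
    for chunk, charset in decode_header(value):
        # decode_header yields bytes for decoded words and may yield str
        # when the value contains no encoded-word at all.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


# A valid quoted-string value that just happens to look 2047-encoded.
literal_name = "=?UTF-8?Q?caf=C3=A9.txt?="

print(repr(literal_name), "->", repr(rfc2047_decode(literal_name)))
print(repr("plain.txt"), "->", repr(rfc2047_decode("plain.txt")))
```

A UA that does not apply RFC 2047 decoding would save the first file 
under its literal name, so the two kinds of UA disagree on a valid field.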

>> The same is true for the other recoveries you propose:
>>
>>>             i)  If the word is a well-formed UTF-8 string, emit the word
>>>                 (decoded as UTF-8) and proceed to the next grammatical
>>>                 element.
>>
>> According to RFC2616, the default is ISO-8859-1, and IE/Opera/Konqueror do
>> exactly that (at least in my locale):
>> <http://greenbytes.de/tech/tc2231/#attwithutf8fnplain>
>
> According to your tests Firefox, Chrome, and Safari use UTF-8.  Given
> a free choice of UTF-8 or ISO-8859-1, I'd pick UTF-8, as I've done
> here.

Given a free choice, I'd pick it as well. But there is no free choice, 
as RFC 2616 says that it is ISO-8859-1, and a certain well-deployed 
browser actually does that.
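
To illustrate what is at stake, here is a small Python sketch of the two 
interpretations of the same octets. The function name and the 
prefer_utf8 switch are assumptions made for the example; they only model 
the two behaviors seen in the tests, not any UA's actual code.

```python
def decode_filename_octets(raw: bytes, prefer_utf8: bool) -> str:
    """Decode raw filename octets the way the two groups of UAs appear to."""
    if prefer_utf8:
        try:
            # Proposed recovery: use UTF-8 when the octets are well-formed.
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass
    # RFC 2616 default: every octet maps to one ISO-8859-1 character.
    return raw.decode("iso-8859-1")


# "foo-ä.html" as sent by a server that emits raw UTF-8 octets.
utf8_octets = "foo-\u00e4.html".encode("utf-8")

print(decode_filename_octets(utf8_octets, prefer_utf8=True))
print(decode_filename_octets(utf8_octets, prefer_utf8=False))
```

The same field yields "foo-ä.html" under one policy and "foo-Ã¤.html" 
under the other, so a sender cannot satisfy both groups with raw octets.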

>> That overloads the syntax of the parameter, and it is not done in
>> FF/Opera/Safari/Konqueror:
>> <http://greenbytes.de/tech/tc2231/#attwithfnrawpctenca>
>
> Indeed.  We've discussed this issue at length.  For senders, I suspect
> that the optimal way of generating Content-Disposition headers is to
> avoid using the % character because that character is interpreted
> differently by different user agents.  For receivers, I suspect that
> the optimal way of consuming Content-Disposition headers is to
> %-decode them, as described above.
>
> Perhaps the best outcome, then, is to forbid servers from generating
> the % character?  That way the syntax won't be "overloaded."

"% can't occur in filenames"?

Me not happy.

>>> One thing that might make sense is to demarcate those instructions as
>>> again optional, that is an optional piece of the optional error
>>> recovery, if you like.
>>
>> That would apply to all of RFC2047, UTF-8 defaulting, and
>> percent-unescaping.
>>
>> Are you willing to rephrase the proposal accordingly?
>
> If rephrasing the proposal would be helpful, I'm happy to do that.
> What specifically would you like rephrased?

I think it would be good to

- distinguish between the error-recovery parts and the filename decoding 
parts, and

- make every non-compliant part of the filename decoding (UTF-8 instead 
of ISO-8859-1, RFC2047 decoding, percent-unescaping) truly optional

Best regards, Julian
Received on Wednesday, 1 December 2010 22:48:24 UTC