Re: Content-Disposition next steps from Julian Reschke on 2010-12-01 (ietf-http-wg@w3.org from October to December 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 02 Dec 2010 00:15:26 +0100
To: Adam Barth <ietf@adambarth.com>
CC: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <4CF6D70E.90009@gmx.de>

On 01.12.2010 23:59, Adam Barth wrote:
> ...
>> I'm not convinced that all UAs are going to continue to keep workarounds;
>> Mozilla, for instance, is considering dropping RFC2047 decoding, see
>> <https://bugzilla.mozilla.org/show_bug.cgi?id=610054>.
>
> RFC2047 support is particularly tempting to remove.  As I wrote
> before, I debated whether to include it.
> ...

Ok, here's a +1 to exclude it.
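
For readers unfamiliar with the workaround under discussion, here is a minimal Python sketch of what RFC 2047 decoding of a filename parameter amounts to. It is an illustration only: `decode_rfc2047` is a hypothetical helper name, and the sample value is made up; no user agent's code is being reproduced here.

```python
from email.header import decode_header, make_header


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded-words in a parameter value.

    The Content-Disposition grammar does not define this for filename
    parameters; some user agents apply it anyway as a workaround.
    """
    return str(make_header(decode_header(value)))


# Hypothetical filename value sent as an RFC 2047 encoded-word.
print(decode_rfc2047("=?UTF-8?Q?foo-=C3=A4.html?="))
# A value without encoded-words passes through unchanged.
print(decode_rfc2047("plain.txt"))
```

A user agent that drops this decoding would present the first value literally, encoded-word syntax and all.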

>> But even if it was the case: the UAs support *different* kinds of decoding.
>> So there's no single way we could document that all of them do now.
>
> Indeed.  Fortunately we're not documenting all of them in this document.

Well, you're mentioning most.

>> What we *can* do is mention what's going on, and warn about what it means
>> for interop. We already do that, but I'm open to adding more text to that part.
>
> That's indeed useful information for servers but not particularly
> helpful to user agent implementors.

Helpful advice for UA implementers would be "get rid of spec violations 
if you can".

>> That would mean disallowing "%" or "%hh" (where h is HEXDIG) in a
>> filename.
>
> Yes.
>
>> We can *warn* that some UAs misinterpret this, but unless we can
>> recommend something else, that's not really helpful ("something else" would
>> be RFC 5987 encoding, but Safari and IE do not understand that).
>
> Well, we can warn or we can forbid.  The choice is ours.

<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-04.html#rfc.section.3.3>:

"Note: Many user agents do not properly handle escape characters when 
using the quoted-string form. Furthermore, some user agents erroneously 
try to perform unescaping of "percent" escapes (see Appendix C.2), and 
thus might misinterpret filenames containing the percent character 
followed by two hex digits."
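
To make the interop problem in that note concrete, here is a minimal Python sketch. It rests on assumptions: `naive_ua_filename` is a hypothetical stand-in for a user agent that percent-unescapes a quoted-string filename, and `rfc5987_filename_param` is a hypothetical helper that percent-encodes every octet outside Python's always-safe set, which is a valid (if conservative) RFC 5987 ext-value.

```python
from urllib.parse import quote, unquote


def naive_ua_filename(quoted_value: str) -> str:
    """Hypothetical UA behaviour: percent-unescape a quoted-string filename.

    The spec does not call for this; it models the erroneous unescaping
    described in the note.
    """
    return unquote(quoted_value, encoding="utf-8")


def rfc5987_filename_param(name: str) -> str:
    """Build a filename* parameter using the RFC 5987 ext-value form."""
    return "filename*=UTF-8''" + quote(name, safe="", encoding="utf-8")


# The server means the literal name "50%41.txt".
sent = "50%41.txt"
print(naive_ua_filename(sent))        # the unescaping UA sees a different name
print(rfc5987_filename_param(sent))   # unambiguous, but only where RFC 5987 is understood
```

The second form removes the ambiguity only for user agents that implement RFC 5987, which is why it cannot yet be recommended as the sole alternative.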

>>>> Unless you want to ignore it otherwise.
>>>
>>> That is certainly one option for consumers.  However, in this
>>> appendix, we're concerning ourselves with user agents who wish to
>>> process ill-formed headers.
>>
>> We really need to separate the discussion about recovering from invalid
>> headers from the other thing (applying un-escapings or encodings not
>> supported by the specs).
>
> They seem related.  The question at hand is how to consume the
> Content-Disposition header.

There's no simple answer to that, unless you're willing to be fully 
spec-compliant in which case the spec already tells you what to do.

>>>> The same is true for the other recoveries you propose:
>>>>
>>>>>             i)  If the word is a well-formed UTF-8 string, emit the word
>>>>>                 (decoded as UTF-8) and proceed to the next grammatical
>>>>>                 element.
>>>>
>>>> According to RFC2616, the default is ISO-8859-1, and IE/Opera/Konqueror
>>>> do exactly that (at least in my locale):
>>>> <http://greenbytes.de/tech/tc2231/#attwithutf8fnplain>
>>>
>>> According to your tests Firefox, Chrome, and Safari use UTF-8.  Given
>>> a free choice of UTF-8 or ISO-8859-1, I'd pick UTF-8, as I've done
>>> here.
>>
>> Given a free choice, I'd pick it as well. But there is no free choice, as
>> RFC 2616 says that it is ISO-8859-1, and a certain well-deployed browser
>> actually does that.
>
> Fortunately neither of those things prevents us from making this choice here.

Sorry? The spec (2616) says it's ISO-8859-1. IE takes it as ISO-8859-1. 
Opera and Konqueror agree. I don't think there's room for choice here.
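
The disagreement is easy to see with a small Python sketch. It is an illustration under assumptions: the function names are hypothetical, the sample filename is made up, and the ISO-8859-1 fallback in the second function is my assumption about what a "UTF-8 if well-formed" rule would do otherwise.

```python
def decode_per_rfc2616(octets: bytes) -> str:
    """Interpret the raw octets as ISO-8859-1, the RFC 2616 default."""
    return octets.decode("iso-8859-1")


def decode_utf8_if_well_formed(octets: bytes) -> str:
    """Proposed recovery: use UTF-8 when the octets are well-formed UTF-8.

    Falling back to ISO-8859-1 otherwise is an assumption of this sketch.
    """
    try:
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        return octets.decode("iso-8859-1")


# A server sends the UTF-8 octets of a non-ASCII filename, unencoded.
raw = "foo-ä.html".encode("utf-8")
print(decode_per_rfc2616(raw))          # two ISO-8859-1 characters in place of the umlaut
print(decode_utf8_if_well_formed(raw))  # the name the server probably meant
```

Both results are "correct" under their own rule, so the same header yields different filenames depending on the user agent.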

>>>>> One thing that might make sense is to demarcate those instructions as
>>>>> again optional, that is an optional piece of the optional error
>>>>> recovery, if you like.
>>>>
>>>> That would apply to all of RFC2047, UTF-8 defaulting, and
>>>> percent-unescaping.
>>>>
>>>> Are you willing to rephrase the proposal accordingly?
>>>
>>> If rephrasing the proposal would be helpful, I'm happy to do that.
>>> What specifically would you like rephrased?
>>
>> I think it would be good to
>>
>> - distinguish between the error-recovery parts and the filename decoding
>> parts, and
>
> Fortunately, this should be easy since they're textually contained in
> different sections.

Ack.

>> - make every non-compliant part of the filename decoding (UTF-8 instead of
>> ISO-8859-1, RFC2047 decoding, percent-unescaping) truly optional
>
> Everything about this appendix is optional, so we should be fine on that score.

It would be helpful if you could provide text that is actually 
self-contained and ready to appear in the spec; in particular that means 
that this needs an explanation about who this is for, and what the 
conformance requirements are. If some parts are optional, these need to 
be marked.

If *all* of this is optional (you can pick whatever you like), then this 
is just another way to document the current state of implementations, in 
which case I'd recommend we just augment what's already in Appendix C.

Best regards, Julian
Received on Wednesday, 1 December 2010 23:16:08 UTC