Re: Content-Disposition next steps

On Wed, Dec 1, 2010 at 2:47 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 01.12.2010 23:17, Adam Barth wrote:
>>> This *could* be justified by claiming that the filename is advisory only,
>>> but it's really not satisfying, in particular when we have evidence that
>>> many UAs get away without it.
>>
>> Hum...  There are two parts to this issue:
>>
>> 1) How we expect user agents to behave.
>> 2) What we write in the document.
>>
>> IMHO, the user agents who are interested in this information are not
>> going to stop performing this decoding (to issue 1), and the document
>> should tell the truth with respect to what these user agents are going
>> to do (to issue 2).
>
> I'm not convinced that all UAs are going to keep their workarounds;
> Mozilla, for instance, is considering dropping RFC 2047 decoding, see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=610054>.

RFC2047 support is particularly tempting to remove.  As I wrote
before, I debated whether to include it.
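
For reference, here is a minimal sketch of the RFC 2047 decoding under discussion. It assumes a Python consumer and uses the standard library's email.header module; the function name decode_rfc2047_filename is hypothetical, not part of any proposal text.

```python
from email.header import decode_header, make_header

def decode_rfc2047_filename(value: str) -> str:
    """Hypothetical helper: decode RFC 2047 encoded-words in a filename.

    decode_header() splits out "=?charset?encoding?text?=" words;
    make_header() reassembles them into a single Unicode string.
    Values without encoded-words come back unchanged.
    """
    return str(make_header(decode_header(value)))
```

With this sketch, decode_rfc2047_filename("=?ISO-8859-1?Q?foo-=E4.html?=") yields "foo-ä.html", while a plain "foo.html" passes through untouched. Dropping RFC 2047 support amounts to skipping this one call.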

> But even if that were the case: the UAs support *different* kinds of decoding.
> So there's no single behavior, common to all of them today, that we could document.

Indeed.  Fortunately we're not documenting all of them in this document.

> What we *can* do is mention what's going on, and warn about what it means
> for interop. We already do that, but I'm open to adding more text to that part.

That's indeed useful information for servers but not particularly
helpful to user agent implementors.

>>> Error recovery implies a detectable error. A big part of what you propose
>>> changes the interpretation of valid fields.
>>
>> Perhaps the term "error recovery" isn't helpful?  Another option, of
>> course, is to adjust the definition of what is valid to exclude these
>> pieces of syntax.
>
> That would mean disallowing "%" or "%hh" (where h is HEXDIG) in a
> filename.

Yes.
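
A sender-side check for such a restriction might look like the sketch below. The names PCT_ESCAPE and filename_allowed are hypothetical, and the forbid_any_percent switch is an assumption covering the stricter reading (no "%" at all) versus the narrower one (no "%hh").

```python
import re
from urllib.parse import unquote

# "%" followed by two HEXDIGs; ABNF literals are case-insensitive,
# so lowercase a-f are accepted as hex digits too.
PCT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def filename_allowed(filename: str, forbid_any_percent: bool = False) -> bool:
    """Hypothetical check: would this filename survive percent-unescaping UAs?"""
    if forbid_any_percent:
        # Strict variant: reject every "%" character.
        return "%" not in filename
    # Narrow variant: reject only "%hh" sequences a UA might unescape.
    return PCT_ESCAPE.search(filename) is None
```

For example, "foo-%41.html" is rejected because a percent-unescaping UA would turn it into unquote("foo-%41.html"), i.e. "foo-A.html"; "100% pure.txt" passes the narrow check but not the strict one.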

> We can *warn* that some UAs misinterpret this, but unless we can
> recommend something else, that's not really helpful ("something else" would
> be RFC 5987 encoding, but Safari and IE do not understand that).

Well, we can warn or we can forbid.  The choice is ours.
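
If senders are pointed at RFC 5987 instead, they would likely need to send both parameters, since (as noted above) Safari and IE don't understand the encoded form. A minimal sketch, assuming that recipients which understand filename* prefer it and that the fallback is plain ASCII with no quotes or backslashes; content_disposition and RFC5987_SAFE are hypothetical names:

```python
from urllib.parse import quote

# RFC 5987 attr-char punctuation that quote() would otherwise escape;
# letters, digits and "-._~" are already left alone by quote().
RFC5987_SAFE = "!#$&+^`|"

def content_disposition(filename: str, fallback: str) -> str:
    """Hypothetical: build a header value with an ASCII fallback plus filename*."""
    encoded = quote(filename, safe=RFC5987_SAFE, encoding="utf-8")
    # Assumes `fallback` is a valid quoted-string body (ASCII, no '"' or '\').
    return f'attachment; filename="{fallback}"; filename*=UTF-8\'\'{encoded}'
```

For "foo-ä.html" with fallback "foo-ae.html", this produces attachment; filename="foo-ae.html"; filename*=UTF-8''foo-%C3%A4.html.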

>>> Unless you want to ignore it otherwise.
>>
>> That is certainly one option for consumers.  However, in this
>> appendix, we're concerning ourselves with user agents who wish to
>> process ill-formed headers.
>
> We really need to separate the discussion about recovering from invalid
> headers from the other thing (applying un-escapings or encodings not
> supported by the specs).

They seem related.  The question at hand is how to consume the
Content-Disposition header.

>>> The same is true for the other recoveries you propose:
>>>
>>>>            i)  If the word is a well-formed UTF-8 string, emit the word
>>>>                (decoded as UTF-8) and proceed to the next grammatical
>>>> element.
>>>
>>> According to RFC2616, the default is ISO-8859-1, and IE/Opera/Konqueror
>>> do exactly that (at least in my locale):
>>> <http://greenbytes.de/tech/tc2231/#attwithutf8fnplain>
>>
>> According to your tests Firefox, Chrome, and Safari use UTF-8.  Given
>> a free choice of UTF-8 or ISO-8859-1, I'd pick UTF-8, as I've done
>> here.
>
> Given a free choice, I'd pick it as well. But there is no free choice, as
> RFC 2616 says that it is ISO-8859-1, and a certain well-deployed browser
> actually does that.

Fortunately neither of those things prevents us from making this choice here.
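
The difference between the two choices is easy to see on raw parameter octets. A sketch of the "well-formed UTF-8 first, else ISO-8859-1" rule, assuming a Python consumer; decode_filename_bytes and its prefer_utf8 flag are hypothetical, with prefer_utf8=False modelling the RFC 2616 behaviour:

```python
def decode_filename_bytes(raw: bytes, prefer_utf8: bool = True) -> str:
    """Hypothetical: decode filename octets, UTF-8 first if requested."""
    if prefer_utf8:
        try:
            # Emit the word as UTF-8 only if it is well-formed UTF-8.
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass
    # ISO-8859-1 maps every octet, so this fallback never fails.
    return raw.decode("iso-8859-1")
```

UTF-8 octets for "foo-ä.html" decode to "foo-ä.html" under the UTF-8 rule but to "foo-Ã¤.html" under ISO-8859-1, while genuine ISO-8859-1 octets (b"foo-\xe4.html") are not valid UTF-8 and still come out as "foo-ä.html" either way.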

>>>> One thing that might make sense is to demarcate those instructions as
>>>> optional again, that is, an optional piece of the optional error
>>>> recovery, if you like.
>>>
>>> That would apply to all of RFC2047, UTF-8 defaulting, and
>>> percent-unescaping.
>>>
>>> Are you willing to rephrase the proposal accordingly?
>>
>> If rephrasing the proposal would be helpful, I'm happy to do that.
>> What specifically would you like rephrased?
>
> I think it would be good to
>
> - distinguish between the error-recovery parts and the filename decoding
> parts, and

Fortunately, this should be easy since they're textually contained in
different sections.

> - make every non-compliant part of the filename decoding (UTF-8 instead of
> ISO-8859-1, RFC2047 decoding, percent-unescaping) truly optional

Everything about this appendix is optional, so we should be fine on that score.
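
One way to keep the non-compliant decodings visibly separate is to make each an explicit opt-in switch on top of the RFC 2616 baseline. A sketch under that assumption; FilenameDecodingOptions, decode_filename, and the order in which the steps run are all hypothetical choices, not text from the draft:

```python
from dataclasses import dataclass
from email.header import decode_header, make_header
from urllib.parse import unquote

@dataclass(frozen=True)
class FilenameDecodingOptions:
    """Hypothetical opt-in switches; all off means RFC 2616 behaviour."""
    utf8_default: bool = False      # try UTF-8 before ISO-8859-1
    percent_unescape: bool = False  # turn "%hh" sequences into characters
    rfc2047: bool = False           # decode "=?charset?enc?text?=" words

def decode_filename(raw: bytes,
                    opts: FilenameDecodingOptions = FilenameDecodingOptions()) -> str:
    # Baseline: parameter octets are ISO-8859-1.
    text = raw.decode("iso-8859-1")
    if opts.utf8_default:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            pass  # keep the ISO-8859-1 reading
    if opts.percent_unescape:
        text = unquote(text, encoding="utf-8", errors="replace")
    if opts.rfc2047:
        text = str(make_header(decode_header(text)))
    return text
```

With every switch off, b"foo-%41.html" stays "foo-%41.html"; a UA that turns on percent_unescape gets "foo-A.html" instead, which is exactly the interop difference under discussion.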

Adam

Received on Wednesday, 1 December 2010 23:00:53 UTC