Re: Content-Disposition next steps

On Wed, Dec 1, 2010 at 3:15 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 01.12.2010 23:59, Adam Barth wrote:
>>> I'm not convinced that all UAs are going to continue to keep workarounds;
>>> Mozilla, for instance, is considering dropping RFC2047 decoding, see
>>> <https://bugzilla.mozilla.org/show_bug.cgi?id=610054>.
>>
>> RFC2047 support is particularly tempting to remove.  As I wrote
>> before, I debated whether to include it.
>
> Ok, here's a +1 to exclude it.

Done.

>>> That would mean disallowing "%" or "%hh" (where h is HEXDIG) in a
>>> filename.
>>
>> Yes.
>>
>>> We can *warn* that some UAs misinterpret this, but unless we can
>>> recommend something else, that's not really helpful ("something else"
>>> would be RFC 5987 encoding, but Safari and IE do not understand that).
>>
>> Well, we can warn or we can forbid.  The choice is ours.
>
> <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-04.html#rfc.section.3.3>:
>
> "Note: Many user agents do not properly handle escape characters when using
> the quoted-string form. Furthermore, some user agents erroneously try to
> perform unescaping of "percent" escapes (see Appendix C.2), and thus might
> misinterpret filenames containing the percent character followed by two hex
> digits."

I'm not sure what point you're trying to make by quoting this part of
the document.  That is indeed a warning.
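To make the percent-escape problem concrete, here is a rough sketch in Python of the two readings of a filename value. The function names and sample filenames are made up for illustration; `urllib.parse.unquote` merely stands in for the unescaping that some user agents apply and is not taken from any browser's code.

```python
from urllib.parse import unquote


def literal_filename(value: str) -> str:
    # Reading per the spec: "%" has no special meaning inside the value.
    return value


def percent_unescaped_filename(value: str) -> str:
    # Reading of a UA that (erroneously) applies percent-unescaping.
    return unquote(value)


for name in ("a%41b.txt", "100%25.txt", "50%off.txt"):
    print(name, "->", literal_filename(name), "|", percent_unescaped_filename(name))
```

Only values containing "%" followed by two hex digits come out differently, which is exactly the case the note warns about.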

>>> We really need to separate the discussion about recovering from invalid
>>> headers from the other thing (applying un-escapings or encodings not
>>> supported by the specs).
>>
>> They seem related.  The question at hand is how to consume the
>> Content-Disposition header.
>
> There's no simple answer to that, unless you're willing to be fully
> spec-compliant in which case the spec already tells you what to do.

Hence my proposal.

>>>>> The same is true for the other recoveries you propose:
>>>>>
>>>>>>            i)  If the word is a well-formed UTF-8 string, emit the
>>>>>>                word (decoded as UTF-8) and proceed to the next
>>>>>>                grammatical element.
>>>>>
>>>>> According to RFC2616, the default is ISO-8859-1, and
>>>>> IE/Opera/Konqueror do exactly that (at least in my locale):
>>>>> <http://greenbytes.de/tech/tc2231/#attwithutf8fnplain>
>>>>
>>>> According to your tests Firefox, Chrome, and Safari use UTF-8.  Given
>>>> a free choice of UTF-8 or ISO-8859-1, I'd pick UTF-8, as I've done
>>>> here.
>>>
>>> Given a free choice, I'd pick it as well. But there is no free choice, as
>>> RFC 2616 says that it is ISO-8859-1, and a certain well-deployed browser
>>> actually does that.
>>
>> Fortunately neither of those things prevents us from making this choice
>> here.
>
> Sorry? The spec (2616) says it's ISO-8859-1. IE takes it as ISO-8859-1.
> Opera and Konqueror agree. I don't think there's room for choice here.

I've changed this to ISO-8859-1.  One advantage of ISO-8859-1 is that
it can't fail to decode, so we can remove some of the sadness that
happens when UTF-8 decoding fails.
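
The "can't fail to decode" point can be shown with a small Python sketch. The helper names and sample bytes are hypothetical; the only thing relied on is that ISO-8859-1 maps each of the 256 byte values to a code point, while UTF-8 rejects some byte sequences.

```python
def decode_latin1(raw: bytes) -> str:
    # Every byte 0x00-0xFF maps to U+0000-U+00FF, so this never raises.
    return raw.decode("iso-8859-1")


def decode_utf8_or_none(raw: bytes):
    # UTF-8 decoding needs a fallback path for malformed input.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


latin1_bytes = b"\xe9.txt"            # "é.txt" sent as ISO-8859-1
utf8_bytes = "é.txt".encode("utf-8")  # b"\xc3\xa9.txt"

print(decode_utf8_or_none(latin1_bytes))  # malformed as UTF-8
print(decode_latin1(latin1_bytes))
print(decode_latin1(utf8_bytes))          # decodes, but as two characters
```

The trade-off is that UTF-8 input still decodes without error under ISO-8859-1, just to different characters than the sender intended.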

> It would be helpful if you could provide text that is actually
> self-contained and ready to appear in the spec; in particular that means
> that this needs an explanation about who this is for, and what the
> conformance requirements are. If some parts are optional, these need to be
> marked.

I'm happy to do that once we've agreed on the substance.  Bjoern had
some test cases we should be sure we're happy with.  If you're willing
to update your page with the rest of the test cases, I can look at
them and adjust the text accordingly.

> If *all* of this is optional (you can pick whatever you like), then this is
> just another way to document the current state of implementations, in which
> case I'd recommend we just augment what's already in Appendix C.

I don't really understand what you mean by this paragraph.  The
framing we've been discussing is "if you'd like more information about
consuming the Content-Disposition header, here's a recommendation for
how you might like to do that."

On Wed, Dec 1, 2010 at 3:19 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 01.12.2010 23:53, Adam Barth wrote:
>>> Not to mention that it's silly to treat `x=y; filename=example.txt` as if
>>> it had an unrecognized disposition type and should thus be handled as
>>> "attachment", which, say, Internet Explorer and Opera don't do, when
>>> you treat plain `filename=example.txt` as having no disposition type.
>>
>> Perhaps Julian would be willing to add this case to his test suite?
>> Silliness isn't one of the criteria I've applied.
>> ...
>
> Will do tomorrow.

Great.  Let me know when it's posted and I'll take a look.  I really
appreciate your cross-testing a number of browsers.  That's
tremendously helpful in figuring out what to do in these cases.

>>> I can't really make heads or tails of the rest of your proposal, for
>>> instance, if you go by the processing rules already in the draft, you
>>> would not need to discuss quote marks, but you seem to have your own
>>> rules for processing parameters and parameter values, in which case
>>> you would need to discuss quote marks, but your proposal does not.
>>
>> It's possible I've screwed up handling quote marks.  Do you have a
>> specific test case you're worried about?  I was surprised as well that
>> I didn't need to mention quote marks.
>
> That's probably because Chrome doesn't handle quoted-string properly:
> <http://greenbytes.de/tech/tc2231/#attwithasciifnescapedchar>

You write:

fail (saves "oo.html" (what's going on here?, see Chrome Issue 52577))

What's going on is that the "\" is being treated as a directory
separator and Chrome is giving you the "leaf" name of the path.
There's another level of platform-specific transformations that get
applied to the suggested filename, but I didn't include those in this
document.  It's a bit fuzzy where to draw the line between the two
transformations, but I tried to stick to segmenting the file name out
of the header and (roughly) character-by-character decoding.
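
As an illustration of the difference, here is a rough Python sketch of the two treatments of the test value `f\oo.html`. Both function names are made up, and neither is Chrome's actual code; the second is only a guess at the shape of the "leaf name" step described above.

```python
import re


def unescape_quoted_string(value: str) -> str:
    # quoted-pair handling: drop the backslash, keep the escaped character.
    return re.sub(r"\\(.)", r"\1", value)


def leaf_name(value: str) -> str:
    # Treat both "/" and "\" as path separators and keep the last component.
    return re.split(r"[\\/]", value)[-1]


value = r"f\oo.html"
print(unescape_quoted_string(value))  # quoted-string reading
print(leaf_name(value))               # path-separator reading
```

The first reading yields "foo.html"; the second yields the "oo.html" reported on the test page.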

Adam

Received on Thursday, 2 December 2010 00:26:07 UTC