- From: Adam Barth <ietf@adambarth.com>
- Date: Wed, 1 Dec 2010 14:59:46 -0800
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
On Wed, Dec 1, 2010 at 2:47 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 01.12.2010 23:17, Adam Barth wrote:
>>> This *could* be justified by claiming that the filename is advisory only,
>>> but it's really not satisfying, in particular when we have evidence that
>>> many UAs get away without it.
>>
>> Hum... There are two parts to this issue:
>>
>> 1) How we expect user agents to behave.
>> 2) What we write in the document.
>>
>> IMHO, the user agents who are interested in this information are not
>> going to stop performing this decoding (to issue 1), and the document
>> should tell the truth with respect to what these user agents are going
>> to do (to issue 2).
>
> I'm not convinced that all UAs are going to continue to keep workarounds;
> Mozilla, for instance, is considering to drop RFC2047 decoding, see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=610054>.

RFC2047 support is particularly tempting to remove. As I wrote
before, I debated whether to include it.

> But even if it was the case: the UAs support *different* kinds of decoding.
> So there's no single way we could document that all of them do now.

Indeed. Fortunately we're not documenting all of them in this document.

> What we *can* do is mention what's going on, and warn about what it means
> for interop. We already do that, but I'm open to add more text to that part.

That's indeed useful information for servers but not particularly
helpful to user agent implementors.

>>> Error recovery implies a detectable error. A big part of what you propose
>>> changes the interpretation of valid fields.
>>
>> Perhaps the term "error recovery" isn't helpful? Another option, of
>> course, is to adjust the definition of what is valid to exclude these
>> pieces of syntax.
>
> That would mean disallowing to send "%" or "%hh" (where h is HEXDIG) in a
> filename.

Yes.

> We can *warn* that some UAs misinterpret this, but unless we can
> recommend something else, that's not really helpful ("something else" would
> be RFC 5987 encoding, but Safari and IE do not understand that).

Well, we can warn or we can forbid. The choice is ours.

>>> Unless you want to ignore it otherwise.
>>
>> That is certainly one option for consumers. However, in this
>> appendix, we're concerning ourselves with user agents who wish to
>> process ill-formed headers.
>
> We really need to separate the discussion about recovering from invalid
> headers from the other thing (applying un-escapings or encodings not
> supported by the specs).

They seem related. The question at hand is how to consume the
Content-Disposition header.

>>> The same is true for the other recoveries you propose:
>>>
>>>> i) If the word is a well-formed UTF-8 string, emit the word
>>>> (decoded as UTF-8) and proceed to the next grammatical
>>>> element.
>>>
>>> According to RFC2616, the default is ISO-8859-1, and
>>> IE/Opera/Konqueror do exactly that (at least in my locale):
>>> <http://greenbytes.de/tech/tc2231/#attwithutf8fnplain>
>>
>> According to your tests Firefox, Chrome, and Safari use UTF-8. Given
>> a free choice of UTF-8 or ISO-8859-1, I'd pick UTF-8, as I've done
>> here.
>
> Given a free choice, I'd pick it as well. But there is no free choice, as
> RFC 2616 says that it is ISO-8859-1, and a certain well-deployed browser
> actually does that.

Fortunately neither of those things prevents us from making this choice here.

>>>> One thing that might make sense is to demarcate those instructions as
>>>> again optional, that is an optional piece of the optional error
>>>> recovery, if you like.
>>>
>>> That would apply to all of RFC2047, UTF-8 defaulting, and
>>> percent-unescaping.
>>>
>>> Are you willing to rephrase the proposal accordingly?
>>
>> If rephrasing the proposal would be helpful, I'm happy to do that.
>> What specifically would you like rephrased?
>
> I think it would be good to
>
> - distinguish between the error-recovery parts and the filename decoding
> parts, and

Fortunately, this should be easy since they're textually contained in
different sections.

> - make every non-compliant part of the filename decoding (UTF-8 instead of
> ISO-8859-1, RFC2047 decoding, percent-unescaping) truly optional

Everything about this appendix is optional, so we should be fine on that score.

Adam
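
For illustration only, and not part of the original message: a minimal
Python sketch of two of the filename decodings debated in this thread,
namely the UTF-8-first choice with an ISO-8859-1 fallback, and
percent-unescaping. The function names decode_filename_bytes and
percent_unescape are hypothetical; neither is taken from the draft or
from any user agent, and real implementations may well differ.

```python
from urllib.parse import unquote


def decode_filename_bytes(raw: bytes) -> str:
    """Hypothetical helper: prefer UTF-8, fall back to ISO-8859-1."""
    try:
        # Well-formed UTF-8: emit the word decoded as UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise use the RFC 2616 default. ISO-8859-1 maps every
        # byte to a character, so this branch cannot fail.
        return raw.decode("iso-8859-1")


def percent_unescape(name: str) -> str:
    """Hypothetical helper: apply %hh unescaping to a filename."""
    # Note that this changes the reading of an otherwise valid
    # filename that happens to contain "%hh".
    return unquote(name)


if __name__ == "__main__":
    utf8_bytes = "\u00e4.txt".encode("utf-8")
    print(decode_filename_bytes(utf8_bytes))       # read as UTF-8
    print(utf8_bytes.decode("iso-8859-1"))         # same bytes, RFC 2616 default
    print(decode_filename_bytes(b"\xe4.txt"))      # not UTF-8, falls back
    print(percent_unescape("report%20final.txt"))  # %20 becomes a space
```

The same bytes yield different filenames depending on which default a
user agent picks, and percent-unescaping rewrites a name that was valid
as sent. Those are the two interoperability concerns raised in the
thread.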
Received on Wednesday, 1 December 2010 23:00:53 UTC