- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sat, 04 Dec 2010 12:18:11 +0100
- To: Adam Barth <ietf@adambarth.com>
- CC: Bjoern Hoehrmann <derhoermi@gmx.net>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
On 04.12.2010 00:56, Adam Barth wrote:
> On Thu, Dec 2, 2010 at 4:26 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>> I added
>>
>> <http://greenbytes.de/tech/tc2231/#attmissingdisposition2>
>>
>> which fails for FF3/Chrome/Chrome9 (I see shared bugs :-),
>
> Hum... This one sounds a bit tricky. It's not clear to me which
> option is better.
> ...

The specs are clear on it, and all UAs except FF and Chrome get this right. This seems to be a case where it's clear what's "better" (even if we may disagree on the metrics for "better").

>> and
>>
>> <http://greenbytes.de/tech/tc2231/#emptydisposition>
>>
>> which fails just for FF3.
>
> Thanks.

Let's hope that this can get resolved in FF when the C-D related changes land after the release of FF4.

>> I'm not totally sure what exactly to test; please elaborate.
>
> Content-Disposition: xfilename=foo.txt

I have added a test for

  Content-Disposition: attachment; xfilename=foo.txt

(the one you proposed would have been invalid anyway). This is <http://greenbytes.de/tech/tc2231/#attconfusedparam>, and it fails for Chrome only (reported as <http://code.google.com/p/chromium/issues/detail?id=65423>).

>> <http://greenbytes.de/tech/tc2231/#dispextbadfn>
>>
>> failing in Chrome only.
>
> Oh good. I'll update the wiki.

With a more elaborate grammar. See also <http://code.google.com/p/chromium/issues/detail?id=65276>.

> From my perspective, I'd like there to be a specification of how a
> user agent should consume the Content-Disposition header. I started
> with Chrome's behavior because I'm most familiar with it and because
> there's evidence that at least one implementor is willing to ship that
> behavior.

The specification says how to consume *valid* header instances (I know you know that, but I'm repeating it for people who might not have read all of this thread). What's up for discussion is whether we want to talk about handling invalid headers, how to do that, and what kind of conformance comes with that.

> Ideally, we'd get feedback from other user agent implementors about
> what they'd like the specification to say. We'd then have an easier
> time polishing away the more exotic behaviors. Instead, we're relying
> on our collective judgement.

As I said before, I'm not very interested in getting more interop for broken headers. What concerns me much more is interop for valid headers, and delaying the spec again and again doesn't help here.

So, is there any *hope* that we'll see that feedback from the other browser vendors? Maciej? Anne? Eric? Robert?

>> Do you want UAs to converge on that behavior?
>
> Yes.

Then I'd propose that you add an introduction to the wiki clearly saying how you want it to appear in the spec.

Personally, I don't think it's a useful exercise for this Working Group, as the observed behavior for broken headers differs a lot (from proper parsing as in Konqueror to naive substring matching in some other UAs :-). This is different from, for instance, handling broken HTML (where we actually have evidence that UAs need to do this to stay in business).

>> Even those who currently reject invalid header fields?
>
> If a UA wants to reject invalid header fields, that sounds fine to me.
> What I'd like to avoid is there being N different ways of consuming
> Content-Disposition, where N is the number of user agent
> implementations.

There should be one way to handle valid headers. I'm less interested in consistent behavior for invalid headers *unless* there's a related security risk.
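For illustration, that "one way to handle valid headers" amounts to something like the following rough sketch (a simplified Python rendering using a subset of the RFC 2616 token/quoted-string grammar; this is not the test-suite code nor any UA's implementation, and it does no error recovery at all):

```python
import re

# Minimal, illustrative parser for *valid* Content-Disposition values only,
# loosely following the token / quoted-string grammar of RFC 2616.
TOKEN  = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'
PARAM  = re.compile(r'\s*;\s*(' + TOKEN + r')\s*=\s*(' + QUOTED + '|' + TOKEN + ')')

def parse_content_disposition(value):
    m = re.match(r'\s*(' + TOKEN + ')', value)
    if not m:
        raise ValueError("no valid disposition type")
    params = {}
    for name, raw in PARAM.findall(value[m.end():]):
        if raw.startswith('"'):
            # quoted-string: drop the quotes and resolve \-escapes (quoted-pair)
            raw = re.sub(r'\\(.)', r'\1', raw[1:-1])
        params[name.lower()] = raw
    return m.group(1).lower(), params

print(parse_content_disposition('attachment; filename="foo.html"'))
# ('attachment', {'filename': 'foo.html'})
print(parse_content_disposition('attachment; xfilename=foo.txt'))
# ('attachment', {'xfilename': 'foo.txt'})  -- unknown parameter, so no filename
```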
As this whole discussion also applies to the actual set of HTTP specs we want to finish, I'd appreciate feedback from other WG members on whether they are interested in adding this type of information. And those who are, please say whether you're willing to contribute test cases and report back from your code bases.

>> Note that we're targeting "Proposed Standard" here. It would be great to get
>> this published, see how implementations improve (see both Chrome 9 and
>> Firefox post version 4), and *then* work on an implementation report for
>> Draft Standard.
>
> In this discussion, you keep saying things that imply that we can't
> write specs for user agents until all the user agents already have the
> same behavior. We're not mind readers. It's quite helpful to have a
> document that explains how you're supposed to consume these headers.

On the contrary. What I was trying to say is that this is for "Proposed". There's a next stage which, in the IETF process, is actually the first stage where we're looking at implementations. If this WG can't come up with a consensus on this issue, there will be another opportunity when we go to Draft Standard.

>> The point is that we already have spec text, which is a warning. Do you want
>> it to change?
>
> Personally, I don't feel that strongly about it. However, I do feel
> strongly about keeping the %-decoding in the UA appendix. If you're
> fine with having both the warning and the %-decoding in the appendix,
> that's a workable solution. If you feel these are in conflict, then
> I'd rather change the warning to an error and keep the %-decoding in
> the appendix than remove the %-decoding from the appendix.

I think they are in conflict, and I wouldn't want to see any recommendation to do %-unescaping. It would make the spec incompatible with the previous spec (a conformance change), and it's only implemented in two out of the six UAs I'm testing.

>> See above, I'm struggling to understand what the proposal actually is (such
>> as: placement, introduction, implication on conformance, ...).
>
> We've been talking about putting it in the appendix. I'm not sure
> whether you need to reference it from the introduction. It doesn't
> affect conformance for any conformance class.

By all means please provide the complete text for the introduction of the appendix. That's essential to understand what the expectations on implementations are. If there aren't any, such as "we just think this is a good idea", I'd propose to have it in a separate document, which may or may not be a WG work item.

>> I'm less concerned about processing invalid messages, but I'll say again
>> that there's little interoperability for those messages, so I just don't see
>> why we care.
>
> We care because we want there to be more interoperability in the
> future. The goal of writing standards is to improve interoperability.

Yes, but we usually draw a line between things we care about and things we don't. We happen to disagree on where to draw that line.

>>> You write:
>>>
>>>   fail (saves "oo.html" (what's going on here?, see Chrome Issue 52577))
>>>
>>> what's going on is that the "\" is being treated as a directory
>>> separator and Chrome is giving you the "leaf" name of the path.
>>
>> OK, so it fails to do the unescaping on quoted-string. It would be great if
>> this could be fixed.
>
> I'm not sure what you mean by "fixed." It's unclear whether user
> agents want to do \-decoding on the file name, especially because \ is
> a common directory separator on some operating systems.
"Fixing" means "changing things to work as specified". So the question here is whether it would break things because there are servers sending unescaped backslashes. As far as I can tell, sending path separators in the filename indicates a bug in the sender, or an attempt to trick the user agent to do something it's not supposed to do. So the "harm" of actually doing the unescaping would be that for a filename that needs to be postprocessed anyway, the problematic character would be filtered in a different way. Starting with filename="a\bc" the broken implementation sees "a" and "bc" separated by a path separator, and will prost-process this to "abc", "a_bc" or "bc" (where _ could be a different replacement character). A correct implementation sees "abc". I don't think there's a problem here. >> Looking at >> <http://trac.tools.ietf.org/wg/httpbis/trac/wiki/ContentDispositionErrorHandling?version=7>: >> >>> Determining the Disposition >>> >>> To determine the disposition-type, parse the Content-Disposition header >>> field using the following grammar: >>> >>> unparsed-string = *LWS nominal-type *OCTET >>> nominal-type = "inline" / "filename" / "name" / ";" >>> >>> If the Content-Disposition header field is non-empty and fails to parse, >>> then the disposition type is "attachment". Otherwise, the disposition-type >>> is "inline". >> >> Neither "filename" nor "name" are disposition types. > > Indeed. It's confusing and makes reviewing the text harder than it needs to be. >> It suggests that you >> can leave out the disposition type and get it treated as attachment; >> <http://greenbytes.de/tech/tc2231/#attmissingdisposition> indicates >> otherwise. > > I'm not sure I understand what you're saying. The wiki text matches > UA behavior for > http://greenbytes.de/tech/tc2231/#attmissingdisposition. Is there > another test case you're worried about? No, sorry, I got this wrong. >>> Extracting Parameter Values From Header Fields >>> >>> To extract the value for a given parameter-name from an unparsed-string, >>> parse the unparsed-string using the following grammar: >>> >>> unparsed-string = *OCTET name *LWS "=" value [ ";" *OCTET ] >>> value =<OCTET, except ";"> >>> >>> where the name production is a gramatical production that is a >>> case-insensitive match for the given parameter-name. If the unparsed-string >>> can be parsed by the grammar in multple ways, choose the one in which name >>> appears as close to the beginning of the string as possible. If the >>> unparsed-string cannot be parsed by the grammar above, return the empty >>> string. >> >> This doesn't handle quoted strings. > > How would you like quoted strings to be handled. According to your > tests, what we should do is strip off matching leading and trailing " > characters and be careful to capture ; inside of ". However, your > tests show that we should not \-decode the value. I'm happy to make > that change. Quoted strings should be handled as specified, removing the quotes and performing \-unescaping. The tests show that indeed a majority of UAs get this wrong but that doesn't make it magically right. I'd prefer that we invest our time to reduce the bugs in the UAs, instead of documenting them. Note that fixing the quoted-string handling has already a proposed patch in Mozilla. >>> Decoding the File Name >>> >>> To filename-decode an encoded-string, use the following algorithm: >>> >>> 1. If the encoded-string contains non-ASCII characters, emit the >>> encoded-string (decoded as ISO-8859-1) and abort these steps. 
>>> Decoding the File Name
>>>
>>> To filename-decode an encoded-string, use the following algorithm:
>>>
>>> 1. If the encoded-string contains non-ASCII characters, emit the
>>> encoded-string (decoded as ISO-8859-1) and abort these steps.
>>
>> So by adding a non-ASCII character I can prevent percent-unescaping? Is this
>> implemented anywhere?
>
> I'd encourage you to write a test and find out. :)

Writing tests doesn't come for free, and every test I add needs to be maintained and re-run. So I'd prefer to understand whether it's worth the time beforehand.

>>> 2. Let the url-unescaped-string be the encoded-string %-unescaped.
>>> 3. Emit the url-unescaped-string (decoded as UTF-8). (There's actually
>>> more sadness here if the url-unescaped-string isn't valid UTF-8.)
>>>
>>> The emitted characters are the decoded file name.
>>
>> <permathread>Why would we recommend something that only Chrome and IE do
>> (and IE only does it for some locales)?</permathread>
>
> As you indicate, we've discussed this issue at length. If you can
> convince IE to remove this behavior, then we might be able to remove
> it from this document. Otherwise, we'd like to compete with IE in
> this respect.

So can you convince Safari, Opera, Firefox, and Konqueror to adopt this handling as well? Otherwise I don't think we'll make progress.

Two UAs do something funny, the other four do not. I don't want the specification to reflect those implementation bugs -- even if they can't realistically be removed from these UAs anytime soon.
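For concreteness, the three steps quoted above amount to something like the following rough sketch (a hypothetical Python rendering of the wiki's heuristic, not text from the wiki or the draft, and certainly not a recommendation; the function name and the treatment of invalid UTF-8 are assumptions):

```python
from urllib.parse import unquote_to_bytes

def filename_decode(encoded: bytes) -> str:
    # Step 1: any non-ASCII byte means no %-unescaping; decode as ISO-8859-1.
    if any(b > 0x7F for b in encoded):
        return encoded.decode("iso-8859-1")
    # Step 2: %-unescape the ASCII-only string.
    unescaped = unquote_to_bytes(encoded)
    # Step 3: interpret the result as UTF-8 ("more sadness" if it isn't valid UTF-8).
    return unescaped.decode("utf-8", errors="replace")

print(filename_decode(b"foo-\xe4.html"))     # foo-ä.html (step 1: no %-unescaping)
print(filename_decode(b"foo-%c3%a4.html"))   # foo-ä.html (steps 2-3: the %-unescaping at issue)
```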
Best regards, Julian

Received on Saturday, 4 December 2010 11:18:48 UTC