- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 03 Feb 2011 19:52:43 +0100
- To: Mark Nottingham <mnot@mnot.net>
- CC: Adam Barth <ietf@adambarth.com>, httpbis <ietf-http-wg@w3.org>
On 03.02.2011 04:49, Mark Nottingham wrote:
> Having had some time away from this, and re-reading the thread, I notice that HTTPbis already disallows the production of many \-encoded characters:
>
>    Producers SHOULD NOT escape characters that do not require escaping
>    (i.e., other than DQUOTE and the backslash character).
>
> (p1, 1.2.2)
>
> So, really, we're already talking about error handling when we talk about things like \b, etc.
>
> Furthermore, AFAICT neither 2616 nor bis really talk about the semantics and handling of "unusual" \-encoded characters. According to the current definition, one could plausibly decode "\n" to be a newline, since all that's really said about it is
>
>    The backslash character ("\") MAY be used as a single-character
>    quoting mechanism only within quoted-string and comment constructs.
>
> (although I doubt that happens IRL).

That has bugged me for some time, and I wasn't sure whether I was just being too pedantic and the answer was obvious. We should treat this as something we need to fix.

That being said: the spec defines quoted-pair in a way that allows other characters to be escaped. If we don't change that, we need to say what such an escape means.

The grammar in HTTP was inspired by RFC 822 and its successors, and RFC 5322 says:

   "Where any quoted-pair appears, it is to be interpreted as the character alone. That is to say, the "\" character that appears as part of a quoted-pair is semantically "invisible"."
   -- <http://greenbytes.de/tech/webdav/rfc5322.html#rfc.section.3.2.1>

This happens to match my intuitive understanding of what escape characters are for, so I'd propose that we adopt that.

> At the least, then, we should continue to discourage the use of \-escaping for things other than "\" and <">.

Recommending not to escape things that do not need escaping is fine. I'm not sure whether this needs to be a SHOULD.
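A minimal Python sketch of the RFC 5322 reading proposed above (the backslash in a quoted-pair is "invisible"), paired with a producer that escapes only DQUOTE and backslash. The function names are hypothetical, and rejecting an unescaped DQUOTE or a dangling backslash with an error is an assumption made for illustration, not something the drafts mandate.

```python
def unquote_quoted_string(value: str) -> str:
    """Decode a quoted-string (including its surrounding DQUOTEs) using
    RFC 5322 semantics: in a quoted-pair the backslash is dropped and the
    following character stands for itself."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("not a quoted-string")
    out = []
    i = 1
    end = len(value) - 1  # index of the closing DQUOTE
    while i < end:
        ch = value[i]
        if ch == "\\":
            if i + 1 >= end:
                # The backslash would escape the closing DQUOTE (assumed error).
                raise ValueError("dangling backslash")
            out.append(value[i + 1])  # e.g. "\n" decodes to "n", not a newline
            i += 2
        elif ch == '"':
            raise ValueError("unescaped DQUOTE inside quoted-string")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def quote_string(value: str) -> str:
    """Produce a quoted-string, escaping only backslash and DQUOTE."""
    # Escape backslashes first so the escapes added for DQUOTE are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

Under this reading a recipient decodes the wire value `"\n"` to the single character `n`, and the producer never emits any escape other than `\"` and `\\`.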
> If we were to write error-handling advice for it, it seems that we could give *weak* advice to replace with "_" or "-" (based upon <http://greenbytes.de/tech/tc2231/#attwithasciifnescapedchar>). We should probably also consider that question for BIS, but that can wait for now.

Disagreed. First of all, this is only error handling if we actually forbid those sequences, and we don't do that right now. It would be bad if we ended up with different rules for processing quoted-string depending on where it occurs, just so that some broken implementations can claim that they are not broken.

> To me the currently relevant question is whether implementers will eventually support escaping "\" and <">. Right now a few do (see <http://greenbytes.de/tech/tc2231/#attwithasciifnescapedquote>), but many don't. However, since this is a "soft" failure / interop problem (i.e., it affects how a file is named when saved on disk, but doesn't prevent it from being saved or named), I don't see that as a reason to not specify it.

Sending a filename with a literal backslash character in it is likely an attempt by the sender to trick the recipient into overwriting files in another directory. The spec already recommends:

   "When the value contains path separator characters, all but the last segment SHOULD be ignored. This prevents unintentional overwriting of well-known file system location (such as "/etc/passwd")."
   -- <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-04.html#rfc.section.3.3>

So it really doesn't matter much at which stage the \ disappears.

Escaped DQUOTEs (<http://greenbytes.de/tech/tc2231/#attwithasciifnescapedquote>) cover a use case that has no other solution (except for RFC 5987-encoding everything). So I believe the right thing to do is to keep this specified, and potentially warn servers about the UAs that can't handle it (similar to the way we already warn about the "%" problem).

> ...

Best regards,
Julian
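To illustrate the argument that, for escaped backslashes, it makes little difference at which stage the "\" disappears, here is a hypothetical Python sketch of two recipients: one that decodes quoted-pairs and one that only strips the surrounding DQUOTEs. Both then apply the "ignore all but the last segment" rule. Treating both "/" and "\" as path separators, and all function names, are assumptions for illustration.

```python
import re


def strip_quotes(quoted: str) -> str:
    # Naive recipient (hypothetical): drops the outer DQUOTEs and leaves
    # quoted-pairs undecoded.
    return quoted[1:-1]


def unescape(quoted: str) -> str:
    # Escape-aware recipient (hypothetical): "\x" decodes to "x",
    # following the RFC 5322 reading of quoted-pair.
    return re.sub(r'\\(.)', r'\1', quoted[1:-1], flags=re.DOTALL)


def safe_filename(name: str) -> str:
    # Ignore all but the last path segment; "/" and "\" are both treated
    # as separators here (an assumed recipient policy).
    return re.split(r'[/\\]', name)[-1]


# Wire value of a filename parameter with escaped backslashes.
wire = r'"..\\..\\etc\\passwd"'
print(safe_filename(strip_quotes(wire)))
print(safe_filename(unescape(wire)))
```

Both pipelines end up with `passwd`: the naive recipient merely sees extra empty segments between the doubled backslashes. Escaped DQUOTEs are different, since only the escape-aware recipient turns `"a\"b.txt"` into `a"b.txt`.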
Received on Thursday, 3 February 2011 18:53:27 UTC