Re: \-decoding filename parameters [was: TICKET 259: 'treat as invalid' not defined]

On 03.02.2011 04:49, Mark Nottingham wrote:
> Having had some time away from this, and re-reading the thread, I notice that HTTPbis already disallows the production of many \-encoded characters:
>
>     Producers SHOULD NOT escape characters that do not require escaping
>     (i.e., other than DQUOTE and the backslash character).
>
> (p1, 1.2.2)
>
> So, really, we're already talking about error handling when we talk about things like \b, etc.
>
> Furthermore, AFAICT neither 2616 nor bis really talk about the semantics and handling of "unusual" \-encoded characters. According to the current definition, one could plausibly decode "\n" to be a newline, since all that's really said about it is
>
>     The backslash character ("\") MAY be used as a single-character
>     quoting mechanism only within quoted-string and comment constructs.
>
> (although I doubt that happens IRL).

That has bugged me for some time, and I wasn't sure whether I was just 
too pedantic and the answer is obvious. We should treat this as 
something we need to fix.

That being said: the spec defines quoted-pair in a way that allows 
characters other than DQUOTE and backslash to be escaped. If we don't 
change that, we need to say what such an escape means.

The grammar in HTTP was inspired by RFC 822 and successors, and RFC 5322 
says:

"Where any quoted-pair appears, it is to be interpreted as the character 
alone. That is to say, the "\" character that appears as part of a 
quoted-pair is semantically "invisible"." -- 
<http://greenbytes.de/tech/webdav/rfc5322.html#rfc.section.3.2.1>

This happens to match my intuitive understanding of what escape 
characters are for, so I'd propose that we adopt that.
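
To illustrate what adopting the RFC 5322 rule would mean for a recipient, 
here is a minimal Python sketch. The function name and the error handling 
are my own assumptions, not anything the specs define; it only shows the 
backslash being treated as "invisible", so that "\b" decodes to "b" and 
not to a backspace.

```python
def unquote_quoted_string(value: str) -> str:
    """Decode a quoted-string, dropping the backslash of every quoted-pair.

    Hypothetical helper: the name and the ValueError cases are assumptions
    made for this sketch.
    """
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("not a quoted-string")
    inner = value[1:-1]
    out = []
    i = 0
    while i < len(inner):
        ch = inner[i]
        if ch == "\\":
            # quoted-pair: keep the following character as-is
            if i + 1 >= len(inner):
                raise ValueError("dangling backslash")
            out.append(inner[i + 1])
            i += 2
        elif ch == '"':
            raise ValueError("unescaped DQUOTE inside quoted-string")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this reading, the quoted-string "foo\bar.html" decodes to foobar.html, 
and the same code path handles \" and \\ without special cases.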

> At the least, then, we should continue to discourage the use of \-escaping for things other than "\" and <">.

Recommending not to escape things that do not need escaping is fine. I'm 
not sure whether this needs to be a SHOULD.

> If we were to write error-handling advice for it, it seems that we could give *weak* advice to replace with "_" or "-" (based upon <http://greenbytes.de/tech/tc2231/#attwithasciifnescapedchar>). We should probably also consider that question for BIS, but that can wait for now.

Disagreed.

First of all, this is only error handling if we actually forbid those 
sequences. We don't do that right now.

It would be bad if we ended up with different rules for processing 
quoted-string depending on where it occurs, just so that some broken 
implementations can claim that they are not broken.

> To me the currently relevant question is whether implementers will eventually support escaping "\" and <">. Right now a few do (see <http://greenbytes.de/tech/tc2231/#attwithasciifnescapedquote>), but many don't. However, since this is a "soft" failure / interop problem (i.e., it affects how a file is named when saved on disk, but doesn't prevent it from being saved or named), I don't see that as a reason to not specify it.

Sending a filename with a literal backslash character in it is likely an 
attempt by the sender to trick the recipient into overwriting files in 
another directory. The spec already recommends:

"When the value contains path separator characters, all but the last 
segment SHOULD be ignored. This prevents unintentional overwriting of 
well-known file system location (such as "/etc/passwd")." -- 
<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-content-disp-04.html#rfc.section.3.3>

So it really doesn't matter a lot at what stage the \ disappears.
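
As a rough Python sketch of the "all but the last segment" recommendation: 
the function name is hypothetical, and treating both "/" and "\" as path 
separators is my assumption, since the draft text quoted above does not 
enumerate them.

```python
import re


def last_path_segment(filename: str) -> str:
    """Return only the final path segment of a received filename.

    Hypothetical helper; the separator set ("/" and "\\") is an assumption.
    """
    return re.split(r"[/\\]", filename)[-1]
```

Under these assumptions, a decoded value such as ..\..\boot.ini is reduced 
to boot.ini before it gets anywhere near the file system.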

Escaped DQUOTEs 
(<http://greenbytes.de/tech/tc2231/#attwithasciifnescapedquote>) cover a 
use case that has no other solution (except for 5987-encoding 
everything). So I believe the right thing to do is to keep this 
specified, and potentially warn servers about the UAs that can't handle 
it (similar to the way we already warn about the "%" problem).
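
For the producer side, a minimal Python sketch of escaping only the 
characters that require it (DQUOTE and backslash). The function name is 
hypothetical, and the sketch assumes the name is already restricted to 
characters that are allowed in a quoted-string; anything else would need 
the 5987 encoding instead.

```python
def quote_filename(name: str) -> str:
    """Build a filename parameter, escaping only backslash and DQUOTE.

    Hypothetical helper; assumes `name` contains only characters that are
    valid inside a quoted-string.
    """
    # Escape backslashes first so the ones added for DQUOTE are not doubled.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return 'filename="' + escaped + '"'
```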

> ...

Best regards, Julian

Received on Thursday, 3 February 2011 18:53:27 UTC