- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Wed, 19 Jan 2011 13:09:16 +0100
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: "public-html@w3.org" <public-html@w3.org>
On 19.01.2011 12:35, Philip Taylor wrote:
> Julian Reschke wrote:
>> 6.
>> Process the next character as follows:
>>
>> If it is a U+0022 QUOTATION MARK ('"') and there is a later U+0022
>> QUOTATION MARK ('"') (NOT immediately following a U+005C REVERSE
>> SOLIDUS ("\") character) in s
>> If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027
>> APOSTROPHE ("'") in s
>> Return the encoding corresponding to the backslash-unescaped string
>> between this character and the next earliest occurrence of this
>> character.
>
> Say I have {charset="foo\\"bar"}. The {"} before {b} is preceded by a
> {\}, so it won't match this case - surely it should do, else there's no
> way for quoted-string to safely quote a string ending in {\} because the
> closing {"} will never match?

Good catch.

> In this case the final {"} will match but then the "next earliest
> occurrence of this character" is the {"} before {b}, so this will return
> {foo\\} - shouldn't it collect all characters up to the non-escaped {"}
> instead? Otherwise quoted-string can't safely quote any string
> containing {"}.
>
>> "backslash-unescaping" a string replaces each sequence of U+005C
>> REVERSE SOLIDUS ("\") and the following character by just that
>> character. If the last character of the string is a U+005C REVERSE
>> SOLIDUS ("\"), the algorithm returns nothing.
>
> The last character of the string before unescaping, or after? Either
> way, why shouldn't I be able to quote a string like {foo\}?

Before. (And yes, there's another error here.)

You can quote the string foo\ using "foo\\".

A single backslash is invalid, and as far as I recall, the algorithm already treated certain malformed sequences this way, so I thought it was OK to do so here as well.

> (By the way, is the RFC2616 grammar ambiguous? It says
>
> quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
> qdtext = <any TEXT except <">>
> quoted-pair = "\" CHAR
>
> so a string like {"\\"} could be parsed as
> <"> qdtext qdtext <">
> or as
> <"> quoted-pair <">
> and it's not clear whether the {\\} is meant to be interpreted as a
> quoted pair or as two separate characters. I'm assuming that it should
> be a pair but don't see that defined anywhere.)

Yes, that's a known issue in 2616 that we fixed a long time ago, see <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/31>.

Best regards, Julian
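
For illustration only, here is a minimal Python sketch of the behaviour suggested in the thread: scan forward from the opening quotation mark, treat each backslash as escaping the following character, and stop at the first non-escaped quotation mark. The function name `read_quoted_value` is hypothetical and is not taken from any spec text. The sketch assumes only the double-quote case and assumes that a dangling backslash or a missing closing quote yields nothing (`None`).

```python
from typing import Optional


def read_quoted_value(s: str, start: int) -> Optional[str]:
    """Return the backslash-unescaped content of the quoted string at s[start].

    Hypothetical helper: returns None if s[start] is not a double quote,
    if a backslash is the last character of s, or if no non-escaped
    closing quote is found.
    """
    if start >= len(s) or s[start] != '"':
        return None
    out = []
    i = start + 1
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            # A backslash escapes the next character; a dangling one is invalid.
            if i + 1 >= len(s):
                return None
            out.append(s[i + 1])
            i += 2
        elif ch == '"':
            # First non-escaped quote closes the string.
            return "".join(out)
        else:
            out.append(ch)
            i += 1
    return None
```

With this approach the input {"foo\\"bar"} yields {foo\}, because the escaped backslash is consumed as a pair before the closing quote is examined, so a value ending in a backslash can be quoted safely.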
Received on Wednesday, 19 January 2011 12:10:06 UTC