
Re: Change Proposal for ISSUE-126

From: Julian Reschke <julian.reschke@gmx.de>
Date: Wed, 19 Jan 2011 13:09:16 +0100
Message-ID: <4D36D46C.5080004@gmx.de>
To: Philip Taylor <pjt47@cam.ac.uk>
CC: "public-html@w3.org" <public-html@w3.org>
On 19.01.2011 12:35, Philip Taylor wrote:
> Julian Reschke wrote:
>> 6.
>> Process the next character as follows:
>> If it is a U+0022 QUOTATION MARK ('"') and there is a later U+0022
>> QUOTATION MARK ('"') (NOT immediately following a U+005C REVERSE
>> SOLIDUS ("\") character) in s
>> If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027
>> APOSTROPHE ("'") in s
>> Return the encoding corresponding to the backslash-unescaped string
>> between this character and the next earliest occurrence of this
>> character.
> Say I have {charset="foo\\"bar"}. The {"} before {b} is preceded by a
> {\}, so it won't match this case - surely it should do, else there's no
> way for quoted-string to safely quote a string ending in {\} because the
> closing {"} will never match?

Good catch.
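
To make the problem concrete, here is a minimal sketch of one way to locate the closing quote: walk the string and skip every backslash together with the character it escapes, rather than checking whether a quote is "immediately following" a backslash. The function name find_closing_quote is hypothetical, and the sketch assumes double-quoted values only; it is not the wording of the change proposal.

```python
def find_closing_quote(s, start):
    """Return the index of the '"' that closes the one at s[start], or None.

    Assumption: s[start] is a U+0022 QUOTATION MARK and a backslash always
    escapes exactly the one character that follows it.
    """
    i = start + 1
    while i < len(s):
        if s[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif s[i] == '"':
            return i  # first quote that is not part of an escape pair
        else:
            i += 1
    return None  # no unescaped closing quote


value = r'"foo\\"bar"'
end = find_closing_quote(value, 0)
print(value[1:end])  # the still-escaped content between the quotes
```

With this scan, the {"} before {b} in {"foo\\"bar"} is treated as the closing quote, because the {\} in front of it is itself escaped.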

> In this case the final {"} will match but then the "next earliest
> occurrence of this character" is the {"} before {b}, so this will return
> {foo\\} - shouldn't it collect all characters up to the non-escaped {"}
> instead? Otherwise quoted-string can't safely quote any string
> containing {"}.
>> "backslash-unescaping" a string replaces each sequence of U+005C
>> REVERSE SOLIDUS ("\") and the following character by just that
>> character. If the last character of the string is a U+005C REVERSE
>> SOLIDUS ("\"), the algorithm returns nothing.
> The last character of the string before unescaping, or after? Either
> way, why shouldn't I be able to quote a string like {foo\}?

Before. (And yes, there's another error here).

You can quote the string




A single backslash is invalid, and as far as I recall, the algorithm
already treated certain malformed sequences this way, so I thought it
was OK to do so here as well.
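
A minimal sketch of the unescaping step, under the assumption that the intended rule is "a backslash with nothing after it is malformed" rather than the literal "last character is a backslash" wording quoted above (which would also reject a properly escaped trailing backslash). The name backslash_unescape is hypothetical, and returning None stands in for "the algorithm returns nothing".

```python
def backslash_unescape(s):
    """Replace each backslash-plus-character pair by just that character.

    Returns None when the string ends in an unpaired backslash
    (assumed here to be the malformed case).
    """
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\":
            if i + 1 == len(s):
                return None  # dangling backslash: nothing left to escape
            out.append(s[i + 1])  # keep only the escaped character
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


print(backslash_unescape(r"foo\\"))  # escaped backslash, accepted
print(backslash_unescape("foo\\"))   # lone trailing backslash, rejected
```

Under this reading, a value ending in a backslash can still be quoted by doubling that backslash; only the unpaired form is dropped.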

> (By the way, is the RFC2616 grammar ambiguous? It says
> quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
> qdtext = <any TEXT except <">>
> quoted-pair = "\" CHAR
> so a string like {"\\"} could be parsed as
> <"> qdtext qdtext <">
> or as
> <"> quoted-pair <">
> and it's not clear whether the {\\} is meant to be interpreted as a
> quoted pair or as two separate characters. I'm assuming that it should
> be a pair but don't see that defined anywhere.)
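
The ambiguity can be illustrated with a small tokenizer sketch. It assumes one possible disambiguation, namely that qdtext excludes the backslash as well as the quote, so a backslash can only ever begin a quoted-pair. That rule is an assumption made for this example and is not taken from the grammar quoted above; the names TOKEN and tokenize_quoted_string are hypothetical.

```python
import re

# Assumption: qdtext may contain neither '"' nor '\', so every backslash
# must be the first half of a quoted-pair.
TOKEN = re.compile(r'(?P<pair>\\.)|(?P<qdtext>[^"\\])', re.DOTALL)


def tokenize_quoted_string(s):
    """Split a quoted-string into (kind, text) tokens, or return None."""
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        return None
    body = s[1:-1]
    tokens = []
    pos = 0
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if m is None:
            return None  # bare quote or dangling backslash inside the body
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize_quoted_string(r'"\\"'))
```

With qdtext defined this way, {"\\"} has exactly one parse, a single quoted-pair, and the reading as two separate qdtext characters is no longer available.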

Yes, that's a known issue in 2616 that we fixed a long time ago, see 

Best regards, Julian
Received on Wednesday, 19 January 2011 12:10:06 GMT
