Re: Change Proposal for ISSUE-126 from Philip Taylor on 2011-01-19 (public-html@w3.org from January 2011)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Wed, 19 Jan 2011 11:35:56 +0000
To: Julian Reschke <julian.reschke@gmx.de>
CC: "public-html@w3.org" <public-html@w3.org>
Message-ID: <4D36CC9C.6030803@cam.ac.uk>

Julian Reschke wrote:
>    6.
>       Process the next character as follows:
> 
>       If it is a U+0022 QUOTATION MARK ('"') and there is a later U+0022 
> QUOTATION MARK ('"') (NOT immediately following an U+005C REVERSE 
> SOLIDUS ("\") character) in s
>       If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027 
> APOSTROPHE ("'") in s
>           Return the encoding corresponding to the backslash-unescaped 
> string between this characters and the next earliest occurrence of this 
> character.

Say I have {charset="foo\\"bar"}. The {"} before {b} is preceded by a 
{\}, so it won't match this case - surely it should do, else there's no 
way for quoted-string to safely quote a string ending in {\} because the 
closing {"} will never match?

In this case the final {"} will match but then the "next earliest 
occurrence of this character" is the {"} before {b}, so this will return 
{foo\\} - shouldn't it collect all characters up to the non-escaped {"} 
instead? Otherwise quoted-string can't safely quote any string 
containing {"}.
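
To make that concrete, here is a minimal sketch of what "collect up to
the first non-escaped quote" could look like. The function name is
hypothetical and none of this is text from the proposal. It handles
only the U+0022 case and assumes a backslash always escapes the
character that follows it, reading pairs left to right:

```python
def read_quoted_string(s, start=0):
    """Return the unescaped contents of the double-quoted string that
    begins at s[start], or None if there is no unescaped closing quote.

    Assumption: a backslash always escapes the next character, and
    pairs are consumed left to right.
    """
    if start >= len(s) or s[start] != '"':
        return None
    out = []
    i = start + 1
    while i < len(s):
        c = s[i]
        if c == "\\":
            if i + 1 >= len(s):
                return None  # dangling backslash, string never closes
            out.append(s[i + 1])  # keep the escaped character as-is
            i += 2
        elif c == '"':
            return "".join(out)  # first quote that is not escaped
        else:
            out.append(c)
            i += 1
    return None  # ran out of input without a closing quote
```

Under those assumptions {"foo\\"bar"} yields {foo\}, because the {"}
before {b} closes the string, and {"foo\"bar"} yields {foo"bar}.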

> "backslash-unescaping" a string replaces each sequence of U+005C REVERSE 
> SOLIDUS ("\") and the following character by just that character. If the 
> last  character of the string is a U+005C REVERSE SOLIDUS ("\"), the 
> algorithm returns nothing.

The last character of the string before unescaping, or after? Either 
way, why shouldn't I be able to quote a string like {foo\}?
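
For illustration, a sketch of an unescaping step that rejects only a
dangling (unpaired) trailing backslash, which is one possible reading
of the quoted rule. The function name is hypothetical and the
left-to-right pairing is my assumption, not something the proposal
states:

```python
def backslash_unescape(raw):
    """Replace each backslash + character pair with just the character.

    Returns None only if the string ends in a backslash that has no
    following character to pair with (an assumed reading of the rule).
    """
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\":
            if i + 1 == len(raw):
                return None  # unpaired trailing backslash
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)
```

With this reading {foo\\} unescapes to {foo\}, even though its last
character before unescaping is a {\}, while the raw string {foo\} is
rejected. A rule that only looks at the last raw character would
reject both.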

(By the way, is the RFC 2616 grammar ambiguous? It says

     quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
     qdtext         = <any TEXT except <">>
     quoted-pair    = "\" CHAR

so a string like {"\\"} could be parsed as
     <"> qdtext qdtext <">
or as
     <"> quoted-pair <">
and it's not clear whether the {\\} is meant to be interpreted as a 
quoted pair or as two separate characters. I'm assuming that it should 
be a pair but don't see that defined anywhere.)
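
The two readings can be enumerated mechanically. The sketch below is
only an illustration: the function name is hypothetical, and TEXT and
CHAR are simplified to "any single character", which is looser than
the RFC definitions. It lists every way to split the content between
the quotes into qdtext and quoted-pair tokens:

```python
def parses(content):
    """Enumerate every tokenization of content (the text between the
    quotes) into single-character qdtext and two-character quoted-pair.

    Simplification: qdtext is any one character except the double
    quote, and quoted-pair is a backslash plus any one character.
    """
    if content == "":
        return [[]]
    results = []
    if content[0] != '"':
        # Read the first character as qdtext.
        results += [["qdtext"] + rest for rest in parses(content[1:])]
    if content[0] == "\\" and len(content) >= 2:
        # Read the first two characters as a quoted-pair.
        results += [["quoted-pair"] + rest for rest in parses(content[2:])]
    return results
```

For the content {\\} this gives two parses, qdtext qdtext and
quoted-pair. For {\"} it gives only quoted-pair, since a bare {"}
cannot be qdtext.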

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Wednesday, 19 January 2011 11:36:29 UTC