- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Thu, 30 Jan 97 10:23:46 CST
- To: W3C SGML Working Group <w3c-sgml-wg@www10.w3.org>
On Wed, 29 Jan 1997 22:34:33 -0500 Liam Quin said:
>Quoth Gavin:
>> "<!--*"([^*]|("*"[^-])|("*-"[^-])|("*--"[^>]))*"*-->"
>
>But I think there is a flaw, as this will not match
><!--**-*-->
> ...
As Tim Bray said a few days ago in private mail,
Oooooooh! My brain hurts... make it stop.
Lee Quin's note made me look more carefully at the revised regular
expression that Gavin Nicol and I had both independently come up with,
and it too has problems. After a few more minutes (an hour, actually,
all told) of struggle, I now have the following two regular expressions.
Those with the Itch, please check these out! If it's so easy to get
this wrong, we need as many checkers as possible.
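For anyone checking along, the counterexample quoted above can be tried mechanically. This is only a sketch: it assumes a hand translation of Gavin's lex-style expression into Python `re` syntax, and the names GAVIN and gavin_accepts are mine, not from the thread.

```python
import re

# Hand translation (an assumption) of Gavin's lex-style pattern:
# each quoted literal becomes escaped literal text.
GAVIN = re.compile(r"<!--\*([^*]|(\*[^-])|(\*-[^-])|(\*--[^>]))*\*-->")

def gavin_accepts(text: str) -> bool:
    """True if the whole string is one comment under Gavin's pattern."""
    return GAVIN.fullmatch(text) is not None

# The comment from the quoted note that the pattern should accept but does not.
counterexample = "<!--**-*-->"
print(gavin_accepts(counterexample))
```

The failure comes from the third alternative: it consumes '*-*' as a unit, so the last star is no longer available to begin the closing '*-->'.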
1 an expression for XML as of now, forbidding '--' within comments
[21] Comment :=
"<!--*"([µ-*]|("-"("*""-"?)*[µ-*])|(("*""-"?)+[µ-*]))*("*"|"-*")+"-->"
2 an expression for the XML rule of the future, forbidding *-- but
not -- within comments:
[21] Comment-of-the-future :=
"<!--*"([µ*]|(("*""-"?)+[µ-*]))*("*"|"-*")+"-->"
Since I'm not sure what the EBCDIC/ASCII translation is going to do to
these, here's another version; I hope the circumflex is right in at
least one of them:
[21] Comment :=
"<!--*"([^-*]|("-"("*""-"?)*[^-*])|(("*""-"?)+[^-*]))*("*"|"-*")+"-->"
[21] Comment-of-the-future :=
"<!--*"([^*]|(("*""-"?)+[^-*]))*("*"|"-*")+"-->"
In case it's helpful, here is my derivation for the second rule,
following the same logic as my derivation of the first rule, posted
a day or so ago.
First, define Misc as either a single character other than a star, or a
run of stars (each optionally followed by one hyphen) followed by a
character other than a hyphen or a star; either way it contains no '*--':
Misc [^*]|(("*""-"?)+[^-*])
And Star as any string of hyphens and stars ending in a star and
containing no '*--'
Star ("*"|"-*")+
Then a comment is a sequence of:
- the start delimiter '<!--*'
- any number of Misc strings
- a Star string to start the final delimiter
- '-->'
or
[21] Comment := "<!--*"({Misc})*{Star}"-->"
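As a cross-check on the composition, the named pieces can be assembled by plain string concatenation. A minimal sketch, assuming a hand translation of Misc and Star into Python `re` syntax (the variable and function names are illustrative only):

```python
import re

# Named pieces from the derivation, hand-translated (an assumption) to Python.
MISC = r"(?:[^*]|(?:\*-?)+[^-*])"   # one non-star, or star-run then a non-hyphen, non-star
STAR = r"(?:\*|-\*)+"               # hyphens and stars, ending in a star, no '*--'

# Start delimiter, any number of Misc strings, a Star string, then '-->'.
COMMENT_FUTURE = re.compile(r"<!--\*" + MISC + "*" + STAR + "-->")

def is_future_comment(text: str) -> bool:
    """True if the whole string is one comment under the future rule."""
    return COMMENT_FUTURE.fullmatch(text) is not None
```

Keeping Misc and Star as separate strings makes it easier to test each piece on its own before trusting the expanded form.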
Expanding out, we get
[21] Comment :=
"<!--*"([^*]|(("*""-"?)+[^-*]))*("*"|"-*")+"-->"
Test cases: all these comments are legal in both rules.
<!--* this is a comment *-->
<!--* this comment - how odd - has two single hyphens *-->
<!--* the next has a whole series of hyphens and blanks *-->
<!--* - - - - - - - - - - - - - - - - - - - - - - - - - - - *-->
<!--* *- *- *- *- *- *- *- *- *- *- *- *- *- *- *- *- *- *- *-->
<!--* ********************************************************-->
<!--* -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-->
<!--* <p>This is a commented <q>quote</q> in a paragraph.</p> *-->
<!--* This is a comment with a PI <?MyApp bg:RED fg:black *-->
<!--* date > today *-->
<!--* This comment has ?> a pseudo-close in it. *-->
These are illegal under both rules:
<!--* Comments cannot nest, so this comment
does not succeed in commenting out the entire
GREETING element:
<greeting>
Hello, world!
<!--* Comments can contain single hyphens - like this *-->
</greeting>
*-->
<! --* This is a bad comment. *-->
<!--* This is bad XML, though legal SGML. *-- >
And this should be legal under the future rule, but illegal for now.
<!--* Comments cannot contain double hyphens -- this is illegal --
for now *-->
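The test cases above can be run against both rules at once. The following is a sketch under stated assumptions: both patterns are my hand translations of the lex-style expressions into Python `re` syntax, the repeat counts in the generated cases are approximate, and every name in it is illustrative.

```python
import re

# Hand translations (an assumption) of the two lex-style rules.
NOW = re.compile(r"<!--\*([^-*]|(-(\*-?)*[^-*])|((\*-?)+[^-*]))*(\*|-\*)+-->")
FUTURE = re.compile(r"<!--\*([^*]|((\*-?)+[^-*]))*(\*|-\*)+-->")

LEGAL_BOTH = [
    "<!--* this is a comment *-->",
    "<!--* this comment - how odd - has two single hyphens *-->",
    "<!--* " + "- " * 27 + "*-->",    # series of hyphens and blanks
    "<!--* " + "*- " * 18 + "*-->",   # series of star-hyphen pairs
    "<!--* " + "*" * 56 + "-->",      # long run of stars
    "<!--* " + "-*" * 28 + "-->",     # alternating hyphens and stars
    "<!--* <p>This is a commented <q>quote</q> in a paragraph.</p> *-->",
    "<!--* This is a comment with a PI <?MyApp bg:RED fg:black *-->",
    "<!--* date > today *-->",
    "<!--* This comment has ?> a pseudo-close in it. *-->",
]

ILLEGAL_BOTH = [
    "<!--* Comments cannot nest, so this comment\n"
    "does not succeed in commenting out the entire\n"
    "GREETING element:\n"
    "<greeting>\n"
    "Hello, world!\n"
    "<!--* Comments can contain single hyphens - like this *-->\n"
    "</greeting>\n"
    "*-->",
    "<! --* This is a bad comment. *-->",
    "<!--* This is bad XML, though legal SGML. *-- >",
]

FUTURE_ONLY = [
    "<!--* Comments cannot contain double hyphens -- this is illegal --\n"
    "for now *-->",
]

def verdict(text: str) -> tuple:
    """Return (legal now, legal in future), treating text as one whole comment."""
    return (bool(NOW.fullmatch(text)), bool(FUTURE.fullmatch(text)))

for case in LEGAL_BOTH + ILLEGAL_BOTH + FUTURE_ONLY:
    print(verdict(case), repr(case[:40]))
```

Note that `fullmatch` asks whether the entire string is a single comment. A scanner would instead close the nested example at its first '*-->' and then trip over the leftover text.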
-C. M. Sperberg-McQueen
Received on Thursday, 30 January 1997 12:17:59 UTC