- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Thu, 30 Jan 97 10:23:46 CST
- To: W3C SGML Working Group <w3c-sgml-wg@www10.w3.org>
On Wed, 29 Jan 1997 22:34:33 -0500 Liam Quin said:
>Quoth Gavin:
>> "<!--*"([^*]|("*"[^-])|("*-"[^-])|("*--"[^>]))*"*-->"
>
>But I think there is a flaw, as this will not match
><!--**-*-->
> ...

As Tim Bray said a few days ago in private mail, Oooooooh! My brain
hurts... make it stop.

Lee Quin's note made me look more carefully at the revised regular
expression that Gavin Nicol and I had both independently come up with,
and it too has problems. After a few more minutes (an hour, actually,
all told) of struggle, I now have the following two regular
expressions. Those with the Itch, please check these out! If it's so
easy to get this wrong, we need as many checkers as possible.

1 an expression for XML as of now, forbidding '--' within comments

[21] Comment := "<!--*"([µ-*]|("-"("*""-"?)*[µ-*])|(("*""-"?)+[µ-*]))*("*"|"-*")+"-->"

2 an expression for the XML rule of the future, forbidding *-- but not
-- within comments:

[21] Comment-of-the-future := "<!--*"([µ*]|(("*""-"?)+[µ-*]))*("*"|"-*")+"-->"

Since I'm not sure what the EBCDIC/ASCII translation is going to do to
these, here's another version; I hope the circumflex is right in at
least one of them:

[21] Comment := "<!--*"([^-*]|("-"("*""-"?)*[^-*])|(("*""-"?)+[^-*]))*("*"|"-*")+"-->"

[21] Comment-of-the-future := "<!--*"([^*]|(("*""-"?)+[^-*]))*("*"|"-*")+"-->"

In case it's helpful, here is my derivation for the second rule,
following the same logic as my derivation of the first rule, posted a
day or so ago.

First, define Misc as any string of characters ending in something
other than a hyphen or a star, and containing no '*--':

  Misc   [^*]|(("*""-"?)+[^-*])

And Star as any string of hyphens and stars ending in a star and
containing no '*--'

  Star   ("*"|"-*")+

Then a comment is a sequence of:

  - the start delimiter '<!--'
  - any number of Misc strings
  - a Star string to start the final delimiter
  - '-->'

or

[21] Comment := "<!--*"({Misc})*{Star}"-->"

Expanding out, we get

[21] Comment := "<!--*"([^*]|(("*""-"?)+[^-*]))*("*"|"-*")+"-->"

Test cases: all these comments are legal in both rules.

<!--* this is a comment *-->
<!--* this comment - how odd - has two single hyphens *-->
<!--* the next has a whole series of hyphens and blanks *-->
<!--* - - - - - - - - - - - - - - - - - - - - - - - - - - - *-->
<!--* *- *- *- *- *- *- *- *- *- *- *- *- *- *- *- *- *- *- *-->
<!--* ********************************************************-->
<!--* -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-->
<!--* <p>This is a commented <q>quote</q> in a paragraph.</p> *-->
<!--* This is a comment with a PI <?MyApp bg:RED fg:black *-->
<!--* date > today *-->
<!--* This comment has ?> a pseudo-close in in it. *-->

These are illegal under both rules:

<!--* Comments cannot nest, so this comment does not succeed in
commenting out the entire GREETING element:
<greeting>
Hello, world!
<!--* Comments can contain single hyphens - like this *-->
</greeting>
*-->

<! --* This is a bad comment. *-->

<!--* This is bad XML, though legal SGML. *-- >

And this should be legal under the future rule, but illegal for now.

<!--* Comments cannot contain double hyphens -- this is illegal -- for now *-->

-C. M. Sperberg-McQueen
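
For anyone who wants to check the two rules mechanically, the
following is a minimal sketch in Python, not part of the proposal
itself. It transcribes the lex-style expressions into Python's re
syntax (quoted literals such as "*" become \*), builds each rule from
Misc and Star as in the derivation, and assumes that each test string
is exactly one candidate comment, so it uses fullmatch. The names
START, END, STAR, MISC_NOW, MISC_FUTURE, COMMENT_NOW, COMMENT_FUTURE
and is_comment are illustrative only.

```python
import re

# Delimiters, transcribed from the lex-style literals "<!--*" and "-->".
START = r"<!--\*"
END = r"-->"

# Star: hyphens and stars, ending in a star, with no '*--' inside.
STAR = r"(?:\*|-\*)+"

# Misc for the current rule: no '--' anywhere inside the comment.
MISC_NOW = r"(?:[^-*]|-(?:\*-?)*[^-*]|(?:\*-?)+[^-*])"

# Misc for the future rule: only '*--' is forbidden inside the comment.
MISC_FUTURE = r"(?:[^*]|(?:\*-?)+[^-*])"

# Comment := START Misc* Star END, as in the derivation.
COMMENT_NOW = re.compile(START + MISC_NOW + "*" + STAR + END)
COMMENT_FUTURE = re.compile(START + MISC_FUTURE + "*" + STAR + END)


def is_comment(text, pattern):
    """Return True if the whole of text is one comment under pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # A few of the test cases from the message above.
    cases = [
        "<!--* this is a comment *-->",
        "<!--**-*-->",
        "<! --* This is a bad comment. *-->",
        "<!--* This is bad XML, though legal SGML. *-- >",
        "<!--* Comments cannot contain double hyphens -- this is "
        "illegal -- for now *-->",
    ]
    for case in cases:
        print(is_comment(case, COMMENT_NOW),
              is_comment(case, COMMENT_FUTURE),
              case)
```

Negated character classes in Python match newlines, so multi-line
comments need no extra flag under this transcription.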
Received on Thursday, 30 January 1997 12:17:59 UTC