- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Mon, 07 Feb 2022 13:26:33 +0000
- To: ixml <public-ixml@w3.org>
- Message-Id: <1644234515120.3313171841.1813621065@cwi.nl>
So my solution was:

    comments: (comment, s?)+.
    -s: -[" "; #a; #9].
    comment: "(*", content, ")".
    -content: (c*, "*"+)+~["*)"].
    -c: ~["*"].

I consider the interesting bit to be the last "*" in the rule for content, which is only there to force the earlier "*"+ to match the maximal number of asterisks.

So

    (c*, "*"+)+~["*)"]

finds zero or more non-asterisks, followed by one or more asterisks. If the next character is not a closing bracket, it does it again.

If I expand the contained rules, it looks like

    comment: "(*", (~["*"]*, "*"+)+~["*)"], ")".

Michael's solution is slightly longer:

    comment: '(*', (~['*'] | ('*'+, ~['*)']))*, '*'*, -'*)'.

but has the pleasant property of starting and ending with the comment delimiters, meaning you could write:

    comments: (pcomment, s?)+.
    -s: -[" "; #a; #9].
    -pcomment: -'(*', comment, -'*)'.
    comment: (~['*'] | ('*'+, ~['*)']))*, '*'*.

giving on my test set the output of:

    <comments>
       <comment/>
       <comment>*</comment>
       <comment>**</comment>
       <comment>***</comment>
       <comment>abc</comment>
       <comment>*abc</comment>
       <comment>abc*</comment>
       <comment>abc*abc</comment>
       <comment>*abc*abc</comment>
       <comment>abc**abc</comment>
       <comment>abc*abc*</comment>
       <comment>abc* )(*abc</comment>
       <comment>abc</comment>
       <comment>abc</comment>
    </comments>

Norm's solution

    comment: -'(*', body, -'*)' .
    -body: ~[")"]* ; ~['*'], [")"] .

is very nice, but fails on one test case:

    (*abc* )(*abc*)

(Also note that [")"] can be simplified to ")".)

But we can fix that with:

    comment: "(*", body, "*)".
    -body: (~[")"];~["*"],")")*.

which I think wins the prize.

Steven
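
One way to sanity-check the comment rules quoted above without an ixml processor is to translate each one by hand into a regular expression. The Python sketch below does that. The regex translations are my own assumption of equivalence, and the input strings are inferred from the XML output shown above rather than taken from the actual test file. A regex only tells you whether a string is accepted as a single comment; it says nothing about the XML an ixml parser would serialize.

```python
import re

# Hand translations of the four "comment" rules into regexes (assumed
# equivalents for acceptance only; these are not ixml grammars).
PATTERNS = {
    # comment: "(*", (~["*"]*, "*"+)+~["*)"], ")".
    "steven": re.compile(r"\(\*[^*]*\*+(?:[^*)][^*]*\*+)*\)"),
    # comment: '(*', (~['*'] | ('*'+, ~['*)']))*, '*'*, -'*)'.
    "michael": re.compile(r"\(\*(?:[^*]|\*+[^*)])*\**\*\)"),
    # comment: -'(*', body, -'*)'.  -body: ~[")"]* ; ~['*'], [")"].
    "norm_original": re.compile(r"\(\*(?:[^)]*|[^*]\))\*\)"),
    # comment: "(*", body, "*)".  -body: (~[")"];~["*"],")")*.
    "norm_fixed": re.compile(r"\(\*(?:[^)]|[^*]\))*\*\)"),
}

# Inputs inferred from the <comment> contents in the quoted output
# (hypothetical reconstruction of the test set).
INPUTS = [
    "(**)", "(***)", "(****)", "(*****)",
    "(*abc*)", "(**abc*)", "(*abc**)", "(*abc*abc*)",
    "(**abc*abc*)", "(*abc**abc*)", "(*abc*abc**)",
    "(*abc* )(*abc*)",
]


def accepts(name: str, text: str) -> bool:
    """Return True if the named pattern matches the whole of text."""
    return PATTERNS[name].fullmatch(text) is not None


if __name__ == "__main__":
    for text in INPUTS:
        verdicts = {name: accepts(name, text) for name in PATTERNS}
        print(text, verdicts)
```

Using fullmatch rather than search matters here: the question is whether the whole string is one comment, which is what distinguishes "(*abc* )(*abc*)" from two adjacent comments.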
Received on Monday, 7 February 2022 13:26:49 UTC