- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Mon, 07 Feb 2022 13:26:33 +0000
- To: ixml <public-ixml@w3.org>
- Message-Id: <1644234515120.3313171841.1813621065@cwi.nl>
So my solution was:
comments: (comment, s?)+.
-s: -[" "; #a; #9].
comment: "(*", content, ")".
-content: (c*, "*"+)+~["*)"].
-c: ~["*"].
I consider the interesting bit to be the last "*" in the rule for content
(the one in the separator ~["*)"]), which is only there to force the
earlier "*"+ to match the maximal number of asterisks.
So
(c*, "*"+)+~["*)"]
finds zero or more non-asterisks, followed by one or more asterisks. If the
next character is not a closing bracket, it consumes it as the separator
and does it again.
If I expand the contained rules inline, it looks like:
comment: "(*", (~["*"]*, "*"+)+~["*)"], ")".
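To check this reading of the expanded rule, here is a minimal Python sketch. It assumes that `x+sep` in this draft syntax means one or more `x` separated by `sep` (the published ixml 1.0 spells this `x++sep`), and that a regular expression is a fair stand-in because the rule is not recursive. The names `GROUP`, `SEP`, `STEVEN_COMMENT` and `is_comment` are hypothetical.

```python
import re

# Hypothetical hand translation of the expanded rule
#   comment: "(*", (~["*"]*, "*"+)+~["*)"], ")".
# assuming "x+sep" means x (sep x)*, i.e. one or more x separated by sep.
GROUP = r"[^*]*\*+"   # ~["*"]*, "*"+ : non-asterisks, then one or more asterisks
SEP = r"[^*)]"        # ~["*)"]       : exactly one character that is neither * nor )
STEVEN_COMMENT = re.compile(r"\(\*" + GROUP + "(?:" + SEP + GROUP + ")*" + r"\)")


def is_comment(text: str) -> bool:
    """Return True if the whole of text is a single comment under this rule."""
    return STEVEN_COMMENT.fullmatch(text) is not None


for sample in ["(**)", "(*abc**abc*)", "(*abc* )(*abc*)", "(*)", "(*abc*)x"]:
    print(sample, is_comment(sample))
```

This prints True for the first three samples and False for the last two. Because the separator can never be a `*`, the regex cannot split a run of asterisks between `\*+` and the separator, which is the maximal-match effect of the final "*".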
Michael's solution is slightly longer:
comment: '(*', (~['*'] | ('*'+, ~['*)']))*, '*'*, -'*)'.
but has the pleasant property of starting and ending with the comment
delimiters, meaning you could write:
comments: (pcomment, s?)+.
-s: -[" "; #a; #9].
-pcomment: -'(*', comment, -'*)'.
comment: (~['*'] | ('*'+, ~['*)']))*, '*'*.
which on my test set gives the output:
<comments>
<comment/>
<comment>*</comment>
<comment>**</comment>
<comment>***</comment>
<comment>abc</comment>
<comment>*abc</comment>
<comment>abc*</comment>
<comment>abc*abc</comment>
<comment>*abc*abc</comment>
<comment>abc**abc</comment>
<comment>abc*abc*</comment>
<comment>abc* )(*abc</comment>
<comment>abc</comment>
<comment>abc</comment>
</comments>
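Michael's factored-out version can be sketched the same way. In this minimal Python translation, again assuming a regular expression as a stand-in for the non-recursive rules, the capture group plays the part of the unsuppressed `comment` element and `[ \n\t]` stands for `s`. `PCOMMENT`, `COMMENTS`, `comment_body` and `comment_bodies` are hypothetical names, and the sample strings are illustrative inputs rather than the actual test set.

```python
import re
from typing import List, Optional

# Hypothetical hand translation of
#   -pcomment: -'(*', comment, -'*)'.
#   comment: (~['*'] | ('*'+, ~['*)']))*, '*'*.
# Group 1 captures what would appear inside <comment>.
BODY = r"(?:[^*]|\*+[^*)])*\**"
PCOMMENT = re.compile(r"\(\*(" + BODY + r")\*\)")
# comments: (pcomment, s?)+.   with   -s: -[" "; #a; #9].
COMMENTS = re.compile(r"(?:" + PCOMMENT.pattern + r"[ \n\t]?)+")


def comment_body(text: str) -> Optional[str]:
    """Body of a single comment with the delimiters dropped, or None."""
    m = PCOMMENT.fullmatch(text)
    return m.group(1) if m else None


def comment_bodies(text: str) -> Optional[List[str]]:
    """Bodies of a whole run of comments, or None if text does not parse."""
    if COMMENTS.fullmatch(text) is None:
        return None
    # A body can never contain "*)", so each comment ends at the first "*)"
    # after its opener, and finditer recovers the same split.
    return [m.group(1) for m in PCOMMENT.finditer(text)]


print(comment_body("(***)"))
print(comment_body("(*abc* )(*abc*)"))
print(comment_bodies("(*abc*)(*abc*)"))
```

The three calls give `*`, `abc* )(*abc` and `['abc', 'abc']`, matching the corresponding lines of the output above.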
Norm's solution
comment: -'(*', body, -'*)' .
-body: ~[")"]* ; ~['*'], [")"] .
is very nice, but fails on one test case:
(*abc* )(*abc*)
(also note that [")"] can be simplified to ")")
but we can fix that with:
comment: "(*", body, "*)".
-body: (~[")"];~["*"],")")*.
which I think wins the prize.
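The main change in the fix is that the alternation is repeated, one character (or non-asterisk plus ")" pair) at a time, instead of being chosen once for the whole body. A minimal Python sketch, again assuming a regular expression is a fair stand-in for these non-recursive rules (`NORM`, `FIXED` and `accepts` are hypothetical names), shows the failing case:

```python
import re

# Hypothetical hand translations of the two rules:
#   Norm: comment: -'(*', body, -'*)' .   -body: ~[")"]* ; ~['*'], [")"] .
NORM = re.compile(r"\(\*(?:[^)]*|[^*]\))\*\)")
#   Fix:  comment: "(*", body, "*)".      -body: (~[")"];~["*"],")")*.
FIXED = re.compile(r"\(\*(?:[^)]|[^*]\))*\*\)")


def accepts(rule: re.Pattern, text: str) -> bool:
    """Return True if the whole of text is a single comment under rule."""
    return rule.fullmatch(text) is not None


for sample in ["(*abc*)", "(*abc* )(*abc*)", "(*abc*)(*abc*)"]:
    print(f"{sample!r}: Norm={accepts(NORM, sample)}, fixed={accepts(FIXED, sample)}")
```

Norm's rule accepts `(*abc*)` but not `(*abc* )(*abc*)`: its first alternative cannot cross the ")", and its second only covers a two-character body. The fix accepts both, yet still rejects `(*abc*)(*abc*)` as a single comment, since a ")" inside the body must follow a character other than "*".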
Steven
Received on Monday, 7 February 2022 13:26:49 UTC