Re: Grammars are not Regexps

So my solution was:

 comments: (comment, s?)+.
 -s: -[" "; #a; #9].

 comment: "(*", content, ")".
 -content: (c*, "*"+)++~["*)"].
 -c: ~["*"].

I consider the interesting bit to be the last "*" in the rule for content
(the one inside the separator ~["*)"]). It is only there to force the
earlier "*"+ to match the maximal number of asterisks.

 (c*, "*"+)++~["*)"]

finds zero or more non-asterisks, followed by one or more asterisks. If the
next character is neither an asterisk nor a closing bracket, it takes that
character as the separator and does it again.
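To see what that last "*" buys, here is a rough Python sketch that counts how many ways a body can be derived from (c*, "*"+)++sep. The separator is any single character not in sep_excludes. count_content_parses and sep_excludes are my own hypothetical names and have nothing to do with any ixml processor. It is a simple dynamic programme over prefixes, and it relies on the fact that the unit c*, "*"+ can match a given substring in at most one way.

```python
import re

# One repetition unit of content: c*, "*"+  with  -c: ~["*"].
# A substring matches it in at most one way: c* must stop at the first "*".
UNIT = re.compile(r"[^*]*\*+")


def count_content_parses(body: str, sep_excludes: str) -> int:
    """Count derivations of body from (c*, "*"+)++sep, sep = ~[sep_excludes].

    Hypothetical helper: sep_excludes lists the characters the one-character
    separator may NOT be, e.g. "*)" for ~["*)"] or ")" for ~[")"].
    """
    n = len(body)

    def is_unit(i: int, j: int) -> bool:
        return UNIT.fullmatch(body, i, j) is not None

    # ways[j] = number of derivations of body[:j] as unit, (sep, unit)*
    ways = [0] * (n + 1)
    for j in range(1, n + 1):
        total = 1 if is_unit(0, j) else 0  # body[:j] is a single unit
        for k in range(1, j - 1):  # body[k] is the last separator
            if ways[k] and body[k] not in sep_excludes and is_unit(k + 1, j):
                total += ways[k]
        ways[j] = total
    return ways[n]


print(count_content_parses("a***", "*)"))  # 1: separator ~["*)"]
print(count_content_parses("a***", ")"))   # 2: separator ~[")"]
```

With the separator ~["*)"], "a***" has exactly one derivation. With ~[")"] alone, the asterisks can also be split as "a*", separator "*", "*", which gives two.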

If I expand the contained rules, it looks like:

 comment: "(*", (~["*"]*, "*"+)++~["*)"], ")".
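This rule uses only character sets, sequence and repetition, with no recursion. It therefore happens to describe a regular language, so it can be transliterated fairly mechanically into a Python regular expression. Here is a minimal sketch, assuming Python's re module. STEVEN_COMMENT and is_comment are illustrative names only.

```python
import re

# comment: "(*", (~["*"]*, "*"+)++~["*)"], ")".
#   ~["*"] -> [^*]    ~["*)"] -> [^*)]    x++sep -> x(?:sep x)*
STEVEN_COMMENT = re.compile(r"\(\*[^*]*\*+(?:[^*)][^*]*\*+)*\)")


def is_comment(text: str) -> bool:
    """True if the whole of text is one comment under the rule above."""
    return STEVEN_COMMENT.fullmatch(text) is not None


for t in ["(*abc*)", "(**)", "(*)", "(*abc* )(*abc*)", "(*a*)b*)"]:
    print(t, is_comment(t))
```

Note that the whole of (*abc* )(*abc*) counts as a single comment, because "* )" does not close it.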

Michael's solution is slightly longer:

 comment: '(*', (~['*'] | ('*'+, ~['*)']))*, '*'*, -'*)'.
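As a sanity check that the two rules accept the same set of comments, both can be transliterated into Python regular expressions and compared on every string up to some length. This is only a sketch. The regex translations are mine, and the alphabet "a*()" is an assumption chosen to cover the interesting characters. An exhaustive check over short strings is evidence, not a proof.

```python
import itertools
import re
from typing import Optional

# comment: "(*", (~["*"]*, "*"+)++~["*)"], ")".
STEVEN = re.compile(r"\(\*[^*]*\*+(?:[^*)][^*]*\*+)*\)")
# comment: '(*', (~['*'] | ('*'+, ~['*)']))*, '*'*, -'*)'.
MICHAEL = re.compile(r"\(\*(?:[^*]|\*+[^*)])*\**\*\)")


def first_disagreement(max_len: int, alphabet: str = "a*()") -> Optional[str]:
    """Return the first string (shortest first) accepted by exactly one rule."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(STEVEN.fullmatch(s)) != bool(MICHAEL.fullmatch(s)):
                return s
    return None


print(first_disagreement(8))  # None: no disagreement up to length 8
```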

but has the pleasant property of starting and ending with the comment 
delimiters, meaning you could write:

 comments: (pcomment, s?)+.
 -s: -[" "; #a; #9].
 -pcomment: -'(*', comment, -'*)'.
 comment: (~['*'] | ('*'+, ~['*)']))*, '*'*.

giving, on my test set, the output:

    <comment>abc* )(*abc</comment>
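For illustration, here is a rough Python equivalent of that pcomment version. It is hypothetical: comments_to_xml is my name, and it produces only the comment elements, not a full ixml serialization with the enclosing comments element. The hidden delimiters are matched but kept out of the captured group, and s? becomes an optional single space, newline or tab.

```python
import re
from typing import List, Optional
from xml.sax.saxutils import escape

# -pcomment: -'(*', comment, -'*)'.
#  comment: (~['*'] | ('*'+, ~['*)']))*, '*'*.
# followed by s?, where -s: -[" "; #a; #9].
# Group 1 captures only the comment body; the delimiters are hidden.
PCOMMENT_THEN_S = re.compile(r"\(\*((?:[^*]|\*+[^*)])*\**)\*\)[ \n\t]?")


def comments_to_xml(text: str) -> Optional[List[str]]:
    """Return one <comment> element per comment, or None if the whole
    of text is not (pcomment, s?)+."""
    pos, out = 0, []
    while pos < len(text):
        m = PCOMMENT_THEN_S.match(text, pos)
        if m is None:
            return None
        out.append("<comment>" + escape(m.group(1)) + "</comment>")
        pos = m.end()
    return out or None


print(comments_to_xml("(*abc* )(*abc*)"))
# ['<comment>abc* )(*abc</comment>']
```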

Norm's solution:

 comment: -'(*', body, -'*)' .
 -body: ~[")"]* ; ~['*'], [")"] .

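is very nice, but fails on one test case:

 (*abc* )(*abc*)

This is because the "*" applies only to ~[")"]. The body is therefore either
a run containing no ")" at all, or exactly one non-asterisk followed by ")".
(Also note that [")"] can be simplified to ")".)

But we can fix that with: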

 comment: "(*", body, "*)".
 -body: (~[")"];~["*"],")")*. 

which I think wins the prize.
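For the record, the difference between Norm's rule and the fix shows up when both are transliterated into Python regular expressions. The translation is mine and is for illustration only. This checks only the single test case above. It is not a general proof of either rule.

```python
import re

# Original:  -body: ~[")"]* ; ~['*'], [")"] .   ("*" covers only ~[")"])
NORM_ORIGINAL = re.compile(r"\(\*(?:[^)]*|[^*]\))\*\)")
# Fixed:     -body: (~[")"];~["*"],")")*.       ("*" covers both alternatives)
NORM_FIXED = re.compile(r"\(\*(?:[^)]|[^*]\))*\*\)")

test = "(*abc* )(*abc*)"
print(NORM_ORIGINAL.fullmatch(test) is not None)  # False
print(NORM_FIXED.fullmatch(test) is not None)     # True
```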


Received on Monday, 7 February 2022 13:26:49 UTC