- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 03 Feb 2022 18:22:07 -0700
- To: public-ixml@w3.org
- Message-ID: <87tudfo280.fsf@blackmesatech.com>
The challenge as I interpreted it: Write a grammar such that
(1) The following are all accepted as comments:
<test-input>(**)</test-input>
<test-input>(***)</test-input>
<test-input>(****)</test-input>
<test-input>(*****)</test-input>
<test-input>(*abc*)</test-input>
<test-input>(**abc*)</test-input>
<test-input>(*abc**)</test-input>
<test-input>(*abc*abc*)</test-input>
<test-input>(**abc*abc*)</test-input>
<test-input>(*abc**abc*)</test-input>
<test-input>(*abc*abc**)</test-input>
<test-input>(*abc* )(*abc*)</test-input>
<test-input>(***a*)</test-input>
<test-input>(**a*)</test-input>
<test-input>(**a*a*)</test-input>
<test-input>(**aa*)</test-input>
<test-input>(*a*)</test-input>
(2) The following are all rejected as comments:
<test-input>(*abc*)(*abc*)</test-input>
<test-input>(</test-input>
<test-input>(*</test-input>
<test-input>(**</test-input>
<test-input>(***</test-input>
<test-input>(**a</test-input>
<test-input>(**a*</test-input>
<test-input>(**aa</test-input>
<test-input>(*a</test-input>
<test-input>(*a*</test-input>
<test-input>(*aa</test-input>
<test-input>a</test-input>
<test-input>)</test-input>
<test-input>*)</test-input>
<test-input>**)</test-input>
The first item on this list is Steven's last example. I made it a
negative case because it is not one comment, and I did not want to
bother trying to write a grammar for a series of comments interspersed
with non-comment data.
If you wish to solve this challenge independently, stop reading now;
spoiler alert.
My first solution is fairly straightforward:
comment = '(*', comment-element, '*)'.
-comment-element = (nonstar | ssbnsc)*, star*.
-nonstar = ~['*'].
-star = '*'.
{ star-something, but not star-closeparen }
-ssbnsc = '*'+, ~['*)'].
Since it's not recursive, it can readily be reduced to a single
production:
comment = '(*', (~['*'] | ('*'+, ~['*)']))*, '*'*, '*)'.
(Warning: I have not tested the single-production version. It appears
to expose a bug in Aparecium. Thank you, Steven! Either that, or there
is something wrong with my manual simplification that I am not seeing.)
In order to write grammars for things like this, I find myself drawing a
finite-state automaton on paper and turning it into a regular expression
like the RHS of 'comment-element' above. A simpler approach is to draw
a finite-state automaton and instead of turning it into a regular
expression, just turn it into a regular grammar. An FSA for
Pascal-style comments is attached as a PNG image:
This can be turned into a regular grammar for comments:
{ Pascal-style comments.
This is a regular language and does not require context-free power,
so in fact a simple way to design it is as a finite state automaton,
which can be transcribed as a regular grammar.
}
comment = '(*', q1.
{ At q1, we have just read '(' }
-q1 = '*', q2
| ')', qf
| ~['*'], q3.
{ At q2, we have just read an asterisk. }
-q2 = '*', q2
| ')', qf
| ~['*)'], q3.
{ At q3, we have just read something other than an asterisk or '*)' }
-q3 = '*', q2
| ~['*'], q3.
{ At qf, we are done. }
-qf = {final}.
I find it interesting that Aparecium runs much faster on the first
grammar given than on the regular grammar; I suspect that reflects slow
algorithms in the tree-construction part of my code.
Michael
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Attachments
- image/png attachment: pascal-comments.dot.png
Received on Friday, 4 February 2022 01:22:26 UTC