- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 03 Feb 2022 18:22:07 -0700
- To: public-ixml@w3.org
- Message-ID: <87tudfo280.fsf@blackmesatech.com>
The challenge as I interpreted it: Write a grammar such that (1) The following are all accepted as comments: <test-input>(**)</test-input> <test-input>(***)</test-input> <test-input>(****)</test-input> <test-input>(*****)</test-input> <test-input>(*abc*)</test-input> <test-input>(**abc*)</test-input> <test-input>(*abc**)</test-input> <test-input>(*abc*abc*)</test-input> <test-input>(**abc*abc*)</test-input> <test-input>(*abc**abc*)</test-input> <test-input>(*abc*abc**)</test-input> <test-input>(*abc* )(*abc*)</test-input> <test-input>(***a*)</test-input> <test-input>(**a*)</test-input> <test-input>(**a*a*)</test-input> <test-input>(**aa*)</test-input> <test-input>(*a*)</test-input> (2) The following are all rejected as comments: <test-input>(*abc*)(*abc*)</test-input> <test-input>(</test-input> <test-input>(*</test-input> <test-input>(**</test-input> <test-input>(***</test-input> <test-input>(**a</test-input> <test-input>(**a*</test-input> <test-input>(**aa</test-input> <test-input>(*a</test-input> <test-input>(*a*</test-input> <test-input>(*aa</test-input> <test-input>a</test-input> <test-input>)</test-input> <test-input>*)</test-input> <test-input>**)</test-input> The first item on this list is Steven's last example. I made it a negative case because it is not one comment, and I did not want to bother trying to write a grammar for a series of comments interspersed with non-comment data. If you wish to solve this challenge independently, stop reading now; spoiler alert. My first solution is fairly straightforward: comment = '(*', comment-element, '*)'. -comment-element = (nonstar | ssbnsc)*, star*. -nonstar = ~['*']. -star = '*'. { star-something, but not star-closeparen } -ssbnsc = '*'+, ~['*)']. Since it's not recursive, it can readily be reduced to a single production: comment = '(*', (~['*'] | ('*'+, ~['*)']))*, '*'*, '*)'. (Warning: I have not tested the single-production version. It appears to expose a bug in Aparecium. Thank you, Steven! Either that, or there is something wrong with my manual simplification that I am not seeing.) In order to write grammars for things like this, I find myself drawing a finite-state automaton on paper and turning it into a regular expression like the RHS of 'comment-element' above. A simpler approach is to draw a finite-state automaton and instead of turning it into a regular expression, just turn it into a regular grammar. An FSA for Pascal-style comments is attached as a PNG image:
This can be turned into a regular grammar for comments: { Pascal-style comments. This is a regular language and does not require context-free power, so in fact a simple way to design it is as a finite state automaton, which can be transcribed as a regular grammar. } comment = '(*', q1. { At q1, we have just read '(' } -q1 = '*', q2 | ')', qf | ~['*'], q3. { At q2, we have just read an asterisk. } -q2 = '*', q2 | ')', qf | ~['*)'], q3. { At q3, we have just read something other than an asterisk or '*)' } -q3 = '*', q2 | ~['*'], q3. { At qf, we are done. } -qf = {final}. I find it interesting that Aparecium runs much faster on the first grammar given than on the regular grammar; I suspect that reflects slow algorithms in the tree-construction part of my code. Michael -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Attachments
- image/png attachment: pascal-comments.dot.png
Received on Friday, 4 February 2022 01:22:26 UTC