Re: Grammars are not Regexps

This week I fell into the same trap again of thinking in terms of greedy 
regexps while writing a grammar, and getting an immensely ambiguous result.

Since last time I posted a puzzle like this we got such an interesting set 
of solutions, I thought I'd reduce it to its simplest form and post it 
here, and see what comes up.

The input is a list of elements where each element is either a word, or a 
list.  A list is surrounded by brackets "(" and ")". Spaces allowed 
anywhere (except within a word). 

The tricky part (as far as I am concerned) is that all spaces are optional, 
except between words.

Some tests:

()
(one)
   (  one  )  
(())
   (  (  )  )  
(one two)
(()one two)
(one()two)
   (  one  (  )  two  )  
(one two())
(one two three)
((one two)three)
(one(two three))
(()(one two)three)
((()one two)three)
((one()two)three)
((one two())three)
((one two)()three)
((one two)three())
(()one(two three))
(one()(two three))
(one(()two three))
(one(two()three))
(one(two three()))
(one(two three)())
   (  Monday  (  tea  coffee  )  Tuesday  Wednesday  (  morning  (  coffee  
biscuits  )  afternoon  (  tea  cakes  )  )  Thursday  (  closed  )  )  
(Monday(tea coffee)Tuesday Wednesday(morning(coffee biscuits)afternoon(tea 
cakes))Thursday(closed))  

(All lines that begin with spaces also end with spaces)
(That line can also be used as a test)
(and that)

Steven

On Thursday 03 February 2022 22:06:10 (+01:00), Steven Pemberton wrote:

 > I was working on some grammars this week, and tripped up a couple of 
times, getting ambiguous grammars, I think because I was thinking in regexp 
terms, and grammars are not regexps.
 > The place where this was obvious was with repetition. Regexps are 
greedy, grammars aren't. So "x"* with a regexp will match a maximal number 
of x's, but not necessarily with a grammar.
 >
 > I reduced it to a small example in order to work on it more, and I 
thought maybe some of you would like to try it out too, before I give an 
example answer. I found it quite instructional.
 >
 > It's Pascal-style comments (XML comments would work similarly), comments 
surrounded by (* and *).
 >
 > Here is a test set:
 >
 > (**)
 > (***)
 > (****)
 > (*****)
 > (*abc*)
 > (**abc*)
 > (*abc**)
 > (*abc*abc*)
 > (**abc*abc*)
 > (*abc**abc*)
 > (*abc*abc**)
 > (*abc* )(*abc*)
 > (*abc*)(*abc*)
 >
 > (The last line contains two comments; the line before has a space before 
the first closing bracket).
 >
 > It was trickier than I expected at first, because as I said, although I 
was parsing the comments OK, the results were ambiguous.
 >
 > Steven

Received on Wednesday, 31 August 2022 13:27:32 UTC