- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Wed, 31 Aug 2022 13:27:11 +0000
- To: ixml <public-ixml@w3.org>
This week I fell into the same trap again of thinking in terms of greedy regexps while writing a grammar, and getting an immensely ambiguous result. Since last time I posted a puzzle like this we got such an interesting set of solutions, I thought I'd reduce it to its simplest form and post it here, and see what comes up. The input is a list of elements where each element is either a word, or a list. A list is surrounded by brackets "(" and ")". Spaces allowed anywhere (except within a word). The tricky part (as far as I am concerned) is that all spaces are optional, except between words. Some tests: () (one) ( one ) (()) ( ( ) ) (one two) (()one two) (one()two) ( one ( ) two ) (one two()) (one two three) ((one two)three) (one(two three)) (()(one two)three) ((()one two)three) ((one()two)three) ((one two())three) ((one two)()three) ((one two)three()) (()one(two three)) (one()(two three)) (one(()two three)) (one(two()three)) (one(two three())) (one(two three)()) ( Monday ( tea coffee ) Tuesday Wednesday ( morning ( coffee biscuits ) afternoon ( tea cakes ) ) Thursday ( closed ) ) (Monday(tea coffee)Tuesday Wednesday(morning(coffee biscuits)afternoon(tea cakes))Thursday(closed)) (All lines that begin with spaces also end with spaces) (That line can also be used as a test) (and that) Steven On Thursday 03 February 2022 22:06:10 (+01:00), Steven Pemberton wrote: > I was working on some grammars this week, and tripped up a couple of times, getting ambiguous grammars, I think because I was thinking in regexp terms, and grammars are not regexps. > The place where this was obvious was with repetition. Regexps are greedy, grammars aren't. So "x"* with a regexp will match a maximal number of x's, but not necessarily with a grammar. > > I reduced it to a small example in order to work on it more, and I thought maybe some of you would like to try it out too, before I give an example answer. I found it quite instructional. > > It's Pascal-style comments (XML comments would work similarly), comments surrounded by (* and *). > > Here is a test set: > > (**) > (***) > (****) > (*****) > (*abc*) > (**abc*) > (*abc**) > (*abc*abc*) > (**abc*abc*) > (*abc**abc*) > (*abc*abc**) > (*abc* )(*abc*) > (*abc*)(*abc*) > > (The last line contains two comments; the line before has a space before the first closing bracket). > > It was trickier than I expected at first, because as I said, although I was parsing the comments OK, the results were ambiguous. > > Steven
Received on Wednesday, 31 August 2022 13:27:32 UTC