Re: a surprising whitespace-related ambiguity

One suggestion: define

LineBreak =: // comment
          |  -- comment
          | newline character

Spaces =: space character+ | /* comment */

So your example would be parsed as

(block
  (newline)
  (expr (equal a b))
  (newline (comment "------ this is a bug!"))
  (expr (equal c d))
  (newline))

And if the "newline" level is removed, it becomes

(block
  (expr (equal a b))
  (comment "------ this is a bug!")
  (expr (equal c d)))
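
To make the idea concrete, here is a minimal Python sketch. It assumes a
small hand-written lexer in front of the parser, so it imitates the
dedicated-lexer behaviour rather than ixml itself. The names tokenize,
parse_equality and parse_block are made up for this illustration, and the
expression grammar is cut down to names, "=" and "-". Line comments are
folded into the line-break token, and because the comment alternatives
are tried before the operator, "--" can never be read as a "-" operator
followed by a comment.

```python
import re

# Alternatives are tried in order, so a line comment wins over the "-" operator.
TOKEN_RE = re.compile(r"""
    (?P<LINEBREAK> (?://|--)[^\n]*(?:\n|$) | \n )  # line comment or bare newline
  | (?P<SPACES>    [ \t]+ | /\*.*?\*/ )            # blanks or a block comment
  | (?P<NAME>      [A-Za-z_]\w* )
  | (?P<OP>        [=\-{}] )
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Return (kind, value) pairs; Spaces are dropped, LineBreaks are kept."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        if kind == "LINEBREAK":
            body = value.rstrip("\n")
            # A bare newline carries None; a comment carries the text after // or --.
            tokens.append(("newline", body[2:].strip() if body else None))
        elif kind != "SPACES":
            tokens.append((kind.lower(), value))
        pos = m.end()
    return tokens


def parse_equality(line):
    """equality := difference ('=' difference)*, both levels left-associative."""
    pos = 0

    def name():
        nonlocal pos
        if pos >= len(line) or line[pos][0] != "name":
            raise SyntaxError("expected a name")
        pos += 1
        return line[pos - 1][1]

    def chain(operand, op, label):
        nonlocal pos
        left = operand()
        while pos < len(line) and line[pos] == ("op", op):
            pos += 1
            left = (label, left, operand())
        return left

    tree = chain(lambda: chain(name, "-", "set-difference"), "=", "equal")
    if pos != len(line):
        raise SyntaxError("unexpected token in expression")
    return tree


def parse_block(text, keep_newlines=False):
    """Parse '{' ... '}' with one expression per line."""
    tokens = tokenize(text.strip())
    if len(tokens) < 2 or tokens[0] != ("op", "{") or tokens[-1] != ("op", "}"):
        raise SyntaxError("expected a block in braces")
    block, line = ["block"], []
    for kind, value in tokens[1:-1]:
        if kind != "newline":
            line.append((kind, value))
            continue
        if line:  # a LineBreak ends the current expression
            block.append(("expr", parse_equality(line)))
            line = []
        if keep_newlines:
            block.append(("newline",) if value is None
                         else ("newline", ("comment", value)))
        elif value is not None:
            block.append(("comment", value))
    if line:
        block.append(("expr", parse_equality(line)))
    return tuple(block)


SAMPLE = "{\n    a = b ----------- this is a bug!\n    c = d\n}\n"

if __name__ == "__main__":
    print(parse_block(SAMPLE, keep_newlines=True))
    print(parse_block(SAMPLE))
```

Calling parse_block with keep_newlines=True keeps the "newline" level of
the first tree above, and the default drops it as in the second tree.
This works only because the lexer commits to the comment reading before
the parser ever sees the hyphens.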

>>>>> In <87plvrlzqk.fsf@blackmesatech.com> 
>>>>>	"C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> wrote:
> I'm working on an ixml grammar for a language with the usual rules that
> all binary operators are left-associative and that whitespace is
> optional around operators, and which has three kinds of comments:

>   - from // to the end of the line
>   - from -- to the end of the line
>   - from /* to */

> A common idiom in the language is a block enclosed in braces containing
> a sequence of whitespace-delimited expressions.  For example:

>   {
>     a = b ----------- this is a bug!
>     c = d
>   }

> This turned out to be ambiguous.  One parse was what I expected:

>   block
>       expression
>           equality
>               a
>               b
>       comment
>       expression
>           equality
>               c
>               d

> The other parse was unexpected:

>   block
>       expression
>           equality
>               equality
>                   a
>                   set-difference
>                     b
>                     comment
>                     c
>           d

> In the second parse, the first hyphen in the comment string is taken as
> a set-difference operator, followed immediately by a double-hyphen
> comment.  And since "a = b - c = d" is legal (and equivalent to
> "(a = (b - c)) = d"), the ixml parser reported an ambiguity.

> I suspect that equality and other comparison operators are not really
> left-associative in the language (although they are in some), but
> changing that would still leave an ambiguity for input like:

>    {
>      a ----- why is this here?
>      b
>    }

> I sometimes find it challenging to write ixml grammars in such a way as
> to reproduce the behavior of two-level parsers with dedicated lexers.
> (And don't ask me about operator-precedence tables.)

> -- 
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com

Received on Tuesday, 19 March 2024 03:18:44 UTC