a surprising whitespace-related ambiguity

I'm working on an ixml grammar for a language with the usual rules that
all binary operators are left-associative and that whitespace is
optional around operators, and which has three kinds of comments:

  - from // to the end of the line
  - from -- to the end of the line
  - from /* to */

A common idiom in the language is a block enclosed in braces containing
a sequence of whitespace-delimited expressions.  For example:

  {
    a = b ----------- this is a bug!
    c = d
  }

This turned out to be ambiguous.  One parse was what I expected:

  block
      expression
          equality
              a
              b
      comment
      expression
          equality
              c
              d

The other parse was unexpected:

  block
      expression
          equality
              equality
                  a
                  set-difference
                      b
                      comment
                      c
              d

In the second parse, the first hyphen in the comment string is taken as
a set-difference operator, followed immediately by a double-hyphen
comment.  And since "a = b - c = d" is legal (and equivalent to
"(a = (b - c)) = d"), the ixml parser reported an ambiguity.

I suspect that equality and other comparison operators are not really
left-associative in this language (although they are in some languages),
but changing that would still leave an ambiguity for input like:

   {
     a ----- why is this here?
     b
   }
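
Both examples come down to the same thing: a run of three or more
hyphens can be split in two ways when there is no separate lexer.  A
rough Python sketch (not ixml) of the two readings follows.  The
function name and the labels are hypothetical, and it assumes that "-"
is only a binary operator, that "--" starts a comment running to the end
of the line, and that an operand follows on the next line.

```python
def hyphen_run_readings(line):
    """List the ways the first run of hyphens in `line` could be read.

    Assumptions (illustrative only): '-' is a binary operator with no
    unary form, '--' starts a comment that runs to end of line, and
    whitespace around operators is optional.
    """
    start = line.find("-")
    if start < 0:
        return []
    end = start
    while end < len(line) and line[end] == "-":
        end += 1
    run_length = end - start

    readings = []
    if run_length == 1:
        # A lone hyphen can only be the operator.
        readings.append(("minus", ""))
    if run_length >= 2:
        # Reading 1: the comment starts at the first hyphen.
        readings.append(("comment", line[start:]))
    if run_length >= 3:
        # Reading 2: the first hyphen is the operator and the comment
        # starts at the second hyphen.
        readings.append(("minus-then-comment", line[start + 1:]))
    return readings


for kind, comment in hyphen_run_readings("a = b ----------- this is a bug!"):
    print(kind, repr(comment))
```

With exactly two hyphens only the comment reading is available, which
is presumably why ordinary "--" comments do not show the problem.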

I sometimes find it challenging to write ixml grammars in such a way as
to reproduce the behavior of two-level parsers with dedicated lexers.
(And don't ask me about operator-precedence tables.)
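
For comparison, here is a minimal Python sketch of how a dedicated
lexer sidesteps the problem.  The token set is hypothetical and much
smaller than the real language's; the only point is that the comment
rule is tried before the operator rule, so a run of hyphens is consumed
whole as a comment and the parser never sees a "-" operator there.

```python
import re

# Hypothetical token set for illustration only.  The comment alternative
# comes first, so "--" wins over the single-character operator "-".
TOKEN = re.compile(r"""
    (?P<comment>--[^\n]*|//[^\n]*|/\*.*?\*/)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<op>[-=+*/{}])
  | (?P<ws>\s+)
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Split `text` into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for token in tokenize("{\n  a = b ----------- this is a bug!\n  c = d\n}"):
    print(token)
```

On the first example this yields one comment token between "b" and "c",
so only the expected parse remains.  In ixml the same decision has to be
expressed in the grammar itself.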

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Tuesday, 19 March 2024 02:30:55 UTC