Re: a surprising whitespace-related ambiguity from C. M. Sperberg-McQueen on 2024-03-19 (public-ixml@w3.org from March 2024)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 18 Mar 2024 22:09:11 -0600
To: LdBeth <andpuke@foxmail.com>
Cc: public-ixml@w3.org
Message-ID: <87le6en976.fsf@blackmesatech.com>
Thank you.  I'm not completely sure that this suggestion removes
the ambiguity.

Reduced a little further than in my earlier post, the problem, I guess, is
that when the string "--- this is a bug" ends a line, it can be parsed
either as a single comment or as a minus sign followed by a single
comment.  Unless we make it illegal to break a line immediately
following a minus sign, those two parses will always be feasible, at
least at the micro level.  If the context is such that either whitespace
or the minus operator is acceptable, the document will be ambiguous.

The solution I have tentatively adopted is to say that a minus sign can be
followed by optional 'cautious whitespace', which is defined as
whitespace beginning with a whitespace character, a slash-star comment,
or a double-slash comment, but never with a double-hyphen comment.
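To make the two readings concrete, here is a small simulation in Python. It is a hypothetical sketch, not ixml and not the grammar under discussion: it takes the text between two operands (the "gap") and lists its readings, either whitespace alone separating two expressions, or whitespace, a binary minus, and more whitespace. The regular expressions, the assumption that a line comment includes its terminating newline (so a comment at end of input without one is not recognized), and the names `is_whitespace`, `after_minus_ok`, and `readings` are all invented for the illustration. Passing `cautious=True` applies the cautious-whitespace rule: what follows the minus must be empty or begin with a whitespace character, `/*`, or `//`.

```python
import re

# Hypothetical model; not ixml and not the grammar from the thread.
# A line comment is assumed to include its terminating newline, which keeps
# the reading of a whitespace run unique.
LINE_COMMENT = re.compile(r"(//|--)[^\n]*\n")
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)  # non-nesting /* ... */


def is_whitespace(text: str, start: int, end: int) -> bool:
    """True if text[start:end] is a (possibly empty) run of whitespace
    characters and complete comments."""
    i = start
    while i < end:
        if text[i].isspace():
            i += 1
            continue
        # endpos=end stops a comment from extending past the span.
        m = LINE_COMMENT.match(text, i, end) or BLOCK_COMMENT.match(text, i, end)
        if m is None:
            return False
        i = m.end()
    return True


def after_minus_ok(tail: str, cautious: bool) -> bool:
    """Can `tail` follow a binary minus?  Under the cautious rule it must be
    empty or start with whitespace, '/*', or '//' (never a '--' comment)."""
    if not is_whitespace(tail, 0, len(tail)):
        return False
    if not cautious or not tail:
        return True
    return tail[0].isspace() or tail.startswith(("/*", "//"))


def readings(gap: str, cautious: bool = False) -> list[str]:
    """List the readings of `gap`, the text between two operands."""
    found = []
    # Reading 1: the whole gap is whitespace, separating two expressions.
    if gap and is_whitespace(gap, 0, len(gap)):
        found.append("separator")
    # Reading 2: whitespace, a binary minus at offset k, then more whitespace.
    for k, ch in enumerate(gap):
        if (ch == "-" and is_whitespace(gap, 0, k)
                and after_minus_ok(gap[k + 1:], cautious)):
            found.append(f"minus at {k}")
    return found


# Gap after "a = b" when the next line starts with "c = d":
print(readings(" --- this is a bug\n"))                 # ['separator', 'minus at 1']
print(readings(" --- this is a bug\n", cautious=True))  # ['separator']
print(readings(" - -- note\n ", cautious=True))         # ['minus at 1']
```

In this model the line-ending comment has two readings under the unrestricted rule and only one under the cautious rule, while `b - c`, `b-c`, and a minus followed by a space and then a double-hyphen comment still parse as subtraction. Because each position in the gap can start at most one whitespace item, the whitespace reading itself is unique; the ambiguity comes only from where a minus sign can be placed.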

Michael


LdBeth <andpuke@foxmail.com> writes:

> A suggestion: define
>
> LineBreak =: // comment
>           |  -- comment
>           | newline character
>
> Spaces =: space character+ | /* comment */
>
> So your example would be parsed as
>
> (block
>   (newline)
>   (expr (equal a b))
>   (newline (comment "------ this is a bug!"))
>   (expr (equal c d))
>   (newline))
>
> And if the "newline" level is removed
>
> (block
>   (expr (equal a b))
>   (comment "------ this is a bug!")
>   (expr (equal c d)))
>
>>>>>> In <87plvrlzqk.fsf@blackmesatech.com> 
>>>>>>	"C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> wrote:
>> I'm working on an ixml grammar for a language with the usual rules that
>> all binary operators are left-associative and that whitespace is
>> optional around operators, and which has three kinds of comments:
>
>>   - from // to the end of the line
>>   - from -- to the end of the line
>>   - from /* to */
>
>> A common idiom in the language is a block enclosed in braces containing
>> a sequence of whitespace-delimited expressions.  For example:
>
>>   {
>>     a = b ----------- this is a bug!
>>     c = d
>>   }
>
>> This turned out to be ambiguous.  One parse was what I expected:
>
>>   block
>>       expression
>>           equality
>>               a
>>               b
>>       comment
>>       expression
>>           equality
>>               c
>>               d
>
>> The other parse was unexpected:
>
>>   block
>>       expression
>>           equality
>>               equality
>>                   a
>>                   set-difference
>>                       b
>>                       comment
>>                       c
>>               d
>
>> In the second parse, the first hyphen in the comment string is taken as
>> a set-difference operator, followed immediately by a double-hyphen
>> comment.  And since "a = b - c = d" is legal (and equivalent to
>> "(a = (b - c)) = d"), the ixml parser reported an ambiguity.
>
>> I suspect that equality and other comparison operators are not really
>> left-associative in the language (although they are in some), but
>> changing that would still leave an ambiguity for input like:
>
>>    {
>>      a ----- why is this here?
>>      b
>>    }
>
>> I sometimes find it challenging to write ixml grammars in such a way as
>> to reproduce the behavior of two-level parsers with dedicated lexers.
>> (And don't ask me about operator-precedence tables.)
>
>> -- 
>> C. M. Sperberg-McQueen
>> Black Mesa Technologies LLC
>> http://blackmesatech.com


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Tuesday, 19 March 2024 04:21:13 UTC