Re: a surprising whitespace-related ambiguity

>>>>> In <87le6en976.fsf@blackmesatech.com> 
>>>>>	"C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> wrote:
> Thank you.  I'm not completely sure that this suggestion removes
> the ambiguity.

> Reduced a little further than in my earlier post, I guess the problem is
> that when the string "--- this is a bug" ends a line, it can be parsed
> either as a single comment, or as a minus sign followed by a single
> comment.  Unless we make it illegal to break a line immediately
> following a minus sign, those two parses will always be feasible, at
> least at the micro level.  If the context is such that either whitespace
> or the minus operator is acceptable, the result will be ambiguity in the
> document.

> The solution I have tentatively adopted is to say that minus sign can be
> followed by optional 'cautious whitespace', which is defined as
> optional whitespace beginning with at least one whitespace character,
> at least one slash-star comment, or at least one double-slash comment.

Or, as a generalization, '-' cannot be followed by anything that starts
with '-', unless there is whitespace in between.

To mimic the "longest match" behaviour of operators, say to parse
'+++++++i' as '(++)(++)(++)+i',

the rules would be

unary-plusplus: '++', (start-with-plus; not-start-with-plus) .
unary-plus: '+', not-start-with-plus .
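
As a rough sanity check of these two rules (not an ixml implementation),
here is a small Python sketch that enumerates every parse of a string of
pluses followed by an operand. The function name parses, the guarded
flag, and the assumption that the only thing that does not start with a
plus is a plain identifier are all made up for illustration. With
guarded=False the lookahead restriction on unary-plus is dropped, which
shows the ambiguity the rules are meant to remove.

```python
import re

# Assumption: the only "not-start-with-plus" operand is a plain identifier.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def parses(s, guarded=True):
    """Yield every parse of s as a list of tokens.

    guarded=True  mirrors the two rules: '+' must be followed by
                  something that does not start with '+'.
    guarded=False lets '+' be followed by any expression.
    """
    # not-start-with-plus: a bare identifier
    if IDENT.fullmatch(s):
        yield [s]
        return
    # unary-plusplus: '++', (start-with-plus; not-start-with-plus)
    if s.startswith("++"):
        for tail in parses(s[2:], guarded):
            yield ["++"] + tail
    # unary-plus: '+', not-start-with-plus
    if s.startswith("+"):
        rest = s[1:]
        if guarded and rest.startswith("+"):
            return  # the rule forbids '+' directly before another '+'
        for tail in parses(rest, guarded):
            yield ["+"] + tail

if __name__ == "__main__":
    print(list(parses("+++++++i")))              # guarded rules
    print(list(parses("+++i", guarded=False)))   # unguarded, for comparison
```

Under the guarded rules '+++++++i' has exactly one parse, the
longest-match one; without the guard even '+++i' already has three.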

Adding more operators would result in more complicated rules, but the idea
is to create syntactic constructs grouped by the character that starts them.
Hopefully, this would not require post-processing of the resulting XML.

Received on Tuesday, 19 March 2024 17:34:10 UTC