Re: Line/string triage? from Fredrik Öhrström on 2025-03-22 (public-ixml@w3.org from March 2025)

From: Fredrik Öhrström <oehrstroem@gmail.com>
Date: Sat, 22 Mar 2025 11:07:36 +0100
To: ixml <public-ixml@w3.org>
Message-ID: <CALZT+jAsPYACZAPP+eZhH1ePq7_AV33pjqP9qrdd7-zDS=QVUw@mail.gmail.com>

The reason that you cannot use:

 identifier: [L]+, -~[L].

is that the next character is consumed as part of the identifier match.
If that character is whitespace, fine. But in most programming languages
built on a lexer plus a parser you can write "floor+1" as well as
"floor   +  1", so the consumed character may be one the parser still
needs.
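
A minimal sketch in Python of what that rule does to "floor+1". This is a
hand-written model, not an ixml processor: the function name is made up for
illustration, and str.isalpha() is assumed to be a fair stand-in for the
[L] character class.

```python
def match_identifier_consuming(text, pos=0):
    """Model of `identifier: [L]+, -~[L].` starting at pos.

    Returns (name, next_pos), or None if the rule does not match.
    """
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
    if end == pos:
        return None  # [L]+ needs at least one letter
    if end == len(text):
        return None  # ~[L] must consume a character, so end of input fails
    # The non-letter at text[end] is consumed by the rule (and dropped from
    # the output by the "-" mark), so parsing resumes one position later.
    return text[pos:end], end + 1


name, next_pos = match_identifier_consuming("floor+1")
remaining = "floor+1"[next_pos:]
print(name, repr(remaining))  # floor '1'
```

The "+" is gone from the remaining input, so a rule for addition never gets
to see it. The same model also rejects a bare "floor" at end of input,
because ~[L] has nothing to consume there.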

If you try:

 identifier: [L]+, -" "*.

Then again, you are back to multiple interpretations of floor:
"fl" "oor", etc., since the run of spaces can be empty. That is, the
identifier rule is not greedy when consuming the letters.
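
To make the ambiguity concrete, here is a small Python sketch that counts
the ways an input can be read as a sequence of one or more such
identifiers. The enclosing "one or more identifiers" rule, the function
name and the lookahead flag are assumptions made for illustration;
str.isalpha() stands in for [L], and lookahead=True models the proposed
![L] check that matches without consuming anything.

```python
from functools import lru_cache


def count_parses(text, lookahead=False):
    """Count readings of text as one or more `identifier: [L]+, -" "*.`.

    With lookahead=True an identifier may only end where the next
    character is not a letter (end of input also qualifies).
    """
    n = len(text)

    @lru_cache(maxsize=None)
    def ways_from(pos):
        # Ways to read text[pos:] as zero or more identifiers.
        if pos == n:
            return 1
        total = 0
        end = pos
        while end < n and text[end].isalpha():
            end += 1  # the identifier's letters are text[pos:end]
            if lookahead and end < n and text[end].isalpha():
                continue  # a letter follows, so the identifier cannot stop here
            after = end
            total += ways_from(after)  # zero trailing spaces
            while after < n and text[after] == " ":
                after += 1
                total += ways_from(after)  # one more trailing space
        return total

    return ways_from(0) if text else 0


print(count_parses("floor"))                  # 16
print(count_parses("floor", lookahead=True))  # 1
```

Without the lookahead every gap between two letters is a possible
identifier boundary, which gives 2^4 = 16 readings for the five letters of
"floor". Requiring a non-letter after the letters leaves exactly one.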

//Fredrik


Den lör 22 mars 2025 kl 10:37 skrev Fredrik Öhrström <oehrstroem@gmail.com>:

>> That feels like it would be a significant departure from the current
>> semantics where I would expect ![L] to match and consume one non-letter
>> character.
>>
>
> The current semantics is for ~[L] which means match and consume one
> non-letter character. The not operator ! is merely a proposal. :-)
>
>> Anchoring (matching without consuming) does feel different, but don’t we
>> already have ~[L] to match and consume one non-letter character.
>
>
> It is different, but since we are talking about a new not operator (!), I
> was merely thinking that if we have a grammar:
>
> line : !"chapter", ~[#a]+.
>
> The not operator performs a lookahead and blocks further entry into the
> rule, but there is nothing that requires the rule to actually consume the
> lookahead after it has been checked, right?
>
> Therefore we could place the not operator at the end of a rule! The
> typical use case would be to greedily accumulate all letters into an
> identifier and then stop when there are no more letters. End of file
> also satisfies ![L].
>
> Usually the lexer does this, if you have one.
>
> //Fredrik
>
>

Received on Saturday, 22 March 2025 10:08:06 UTC