- From: Fredrik Öhrström <oehrstroem@gmail.com>
- Date: Sat, 22 Mar 2025 11:07:36 +0100
- To: ixml <public-ixml@w3.org>
- Message-ID: <CALZT+jAsPYACZAPP+eZhH1ePq7_AV33pjqP9qrdd7-zDS=QVUw@mail.gmail.com>
The reason you cannot use

    identifier: [L]+, -~[L].

is that the next character will be permanently eaten into the identifier. If the next character is whitespace, that is fine. But in most programming languages based on a lexer+parser you can write "floor+1" or "floor + 1", so the eaten character could be vital for parsing.

If you try

    identifier: [L]+, -" "*.

then, again, you are back to multiple interpretations of floor: "fl" "oor", etc., since the spaces can be empty. I.e. the identifier rule is not greedy when eating the letters.

//Fredrik

On Sat 22 Mar 2025 at 10:37, Fredrik Öhrström <oehrstroem@gmail.com> wrote:

>> That feels like it would be a significant departure from the current
>> semantics where I would expect ![L] to match and consume one non-letter
>> character.
>
> The current semantics is for ~[L], which means match and consume one
> non-letter character. The not operator ! is merely a proposal. :-)
>
>> Anchoring (matching without consuming) does feel different, but don't we
>> already have ~[L] to match and consume one non-letter character?
>
> It is different, but since we are talking about a new not operator (!), I
> was merely thinking that if we have a grammar:
>
>     line : !"chapter", ~[#a]+.
>
> The not operator performs a lookahead and blocks further entry into the
> rule, but there is nothing that requires the rule to actually consume the
> lookahead after it has been checked, right?
>
> Therefore we could place the not at the end of a rule! The typical use
> case would be to greedily accumulate all letters into an identifier and then
> stop when there are no more letters. End of file also satisfies ![L].
>
> Usually the lexer does this, if you have one.
>
> //Fredrik
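A small Python sketch can make the two failure modes concrete. It uses Python's `re` module only as a stand-in: `re` is a backtracking regex engine, not an ixml processor, so this illustrates nothing more than the difference between consuming a non-letter (`~[L]`) and merely checking for one (the proposed `![L]`, modelled here with the regex negative lookahead `(?!...)`). The class `[^\W\d_]` is an approximation of `[L]`, and `count_splits` assumes a hypothetical grammar in which identifiers may follow each other directly, such as `idents: identifier*.`; the helper names are illustrative, not part of any ixml tool.

```python
import re
from functools import lru_cache

# Python's re has no \p{L}; [^\W\d_] (word characters minus digits and
# underscore) is used here as an approximation of ixml's [L] class.
LETTER = r"[^\W\d_]"
NON_LETTER = r"[\W\d_]"

# Analogue of  identifier: [L]+, -~[L].
# The trailing class must match, and therefore consume, one non-letter.
CONSUMING = re.compile(rf"{LETTER}+{NON_LETTER}")

# Analogue of the proposed  identifier: [L]+, ![L].
# The negative lookahead checks the next character without consuming it,
# and it also succeeds at end of input.
LOOKAHEAD = re.compile(rf"{LETTER}+(?!{LETTER})")


def match_prefix(pattern, text):
    """Return the prefix of text that pattern consumes, or None if it fails."""
    m = pattern.match(text)
    return m.group(0) if m else None


def count_splits(word):
    """Count the ways word can be read as one or more adjacent identifiers
    when the separator -" "* is allowed to be empty (hypothetical grammar)."""

    @lru_cache(maxsize=None)
    def ways(i):
        if i == len(word):
            return 1
        # Choose where the next identifier ends; each needs at least one letter.
        return sum(ways(j) for j in range(i + 1, len(word) + 1))

    return ways(0)


if __name__ == "__main__":
    print(match_prefix(CONSUMING, "floor+1"))  # floor+  (the '+' is swallowed)
    print(match_prefix(CONSUMING, "floor"))    # None    (~[L] fails at end of input)
    print(match_prefix(LOOKAHEAD, "floor+1"))  # floor   ('+' is left for the parser)
    print(match_prefix(LOOKAHEAD, "floor"))    # floor
    print(count_splits("floor"))               # 16
```

With the consuming form the `+` in "floor+1" disappears into the identifier, and a bare "floor" at end of input does not match at all; with the lookahead the `+` remains available and end of input is accepted. The 16 readings of "floor" (2^4, one per choice of break points between its five letters) are the ambiguity that an optional separator reintroduces.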
Received on Saturday, 22 March 2025 10:08:06 UTC