Re: The NOT construct from John Lumley on 2025-01-20 (public-ixml@w3.org from January 2025)

From: John Lumley <john@saxonica.com>
Date: Mon, 20 Jan 2025 15:33:56 +0000
To: Steven Pemberton <steven.pemberton@cwi.nl>, ixml <public-ixml@w3.org>
Message-ID: <e3186257-d166-4c53-9896-00fa6a4e60e9@saxonica.com>

On 07/01/2025 12:46, Steven Pemberton wrote:
> I have informally convinced myself that the NOT proposal by 'alfsb' 
> (https://github.com/invisibleXML/ixml/issues/249#issuecomment-2549656848):
>
> * Is functionally equivalent to the subtraction proposal, but 
> computationally more efficient. (Because with subtraction, both parts 
> are always tried, but with 'not', the second is tried only if the 
> first fails.)
> * Can be implemented in all general parsing algorithms.
> * Solves the problem without the need of the complication of a lexer 
> stage (which would need a similar mechanism anyway).
> * Is pleasantly declarative.
>
> EXAMPLE
>
> identifier: !keyword, [L]+.
> keyword: "if"; "then"; "else"; "begin"; "end".
>
> I think we should study it further.
>
During the last meeting (https://www.w3.org/2025/01/07-ixml-minutes) I 
said the NOT operator was like the subtraction operator I've been using. On 
further reflection this is *NOT* the case.

For example in

functionCall: QName ¬ keyword, -"(", arguments, -")".
keyword: "if" | "return" | ....

would match i(...) or iff(...) but not if(...).

But using the NOT operator:

functionCall: !keyword, QName, -"(", arguments, -")".
keyword: "if" | "return" | ....

would succeed for i(...) and fail for if(...), as expected, but would 
also fail for iff(...). That is not the intent: the leading two 
characters of iff match keyword, and that causes the rest of the 
production to fail.

The key to the subtraction is that it fails only if the right-hand term 
matches over /exactly the same characters/ as the left-hand term, and 
succeeds in all other cases where the left-hand term matches.
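As a rough illustration, the two readings can be modelled in Python. This is a minimal sketch under stated assumptions, not an ixml implementation. Each matcher returns the set of all positions where a match starting at position i could end, which keeps every possible parse as a general parser would. QName is simplified to a run of letters, and the keyword list is a hypothetical stand-in for the elided alternatives. Only "QName followed by (" is checked; the arguments and closing bracket are not modelled. The helper names (qname, keyword, subtract, not_ahead, call) are illustrative only.

```python
# Minimal sketch: each matcher maps (text, start) to the set of all end
# positions, so every possible parse is kept, as in a general parser.

# Hypothetical stand-in for the elided keyword alternatives.
KEYWORDS = ["if", "return"]


def qname(text, i):
    """Ends j where text[i:j] is a non-empty run of letters (simplified QName)."""
    ends = set()
    j = i
    while j < len(text) and text[j].isalpha():
        j += 1
        ends.add(j)
    return ends


def keyword(text, i):
    """Ends j where text[i:j] is exactly one of KEYWORDS."""
    return {i + len(k) for k in KEYWORDS if text.startswith(k, i)}


def subtract(lhs, rhs):
    """lhs ¬ rhs: drop an lhs match only if rhs matches exactly the same span."""
    def matcher(text, i):
        return lhs(text, i) - rhs(text, i)
    return matcher


def not_ahead(guard, then):
    """!guard, then: fail if guard matches any span starting at i (negative lookahead)."""
    def matcher(text, i):
        return set() if guard(text, i) else then(text, i)
    return matcher


def call(name):
    """Accept text if some match of name from position 0 is followed by "(".
    The arguments and closing ")" are not modelled."""
    def accepts(text):
        return any(j < len(text) and text[j] == "(" for j in name(text, 0))
    return accepts


sub_call = call(subtract(qname, keyword))   # QName ¬ keyword, -"(" ...
not_call = call(not_ahead(keyword, qname))  # !keyword, QName, -"(" ...

for s in ["i(x)", "iff(x)", "if(x)"]:
    print(s, sub_call(s), not_call(s))
```

Running the loop prints `i(x) True True`, `iff(x) True False` and `if(x) False False`. Subtraction removes only the QName parse that spans exactly "if". The lookahead, by contrast, rejects every parse as soon as "if" matches a prefix of the input.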


-- 
*John Lumley* MA PhD CEng FIEE
john@saxonica.com

Received on Monday, 20 January 2025 15:34:10 UTC