Re: The semantics of the disambiguation constructs from Bethan Tovey-Walsh on 2026-03-02 (public-ixml@w3.org from March 2026)

From: Bethan Tovey-Walsh <bytheway@linguacelta.com>
Date: Mon, 2 Mar 2026 16:09:13 +0000
To: John Lumley <john@saxonica.com>
Cc: ixml <public-ixml@w3.org>
Message-Id: <BFE7896D-FC2E-47BA-8FF2-1292CC3A94C4@linguacelta.com>

John,

I'd like to make sure I understand exactly how you envisage this working, if that's okay?

Let's take this grammar fragment:

 A: ["a"-"z"]*.
 B: "cat" ; "bat" ; "rat".
 C: A ¬ B.

I think I understand your view of the semantics to be this:

 C is an A, unless the entirety of C also matches B, in which case it is a B

So if we had the input "caterpillar", we'd get:

 <C>
     <A>caterpillar</A>
 </C>

and if we had "cat", we'd get:

 <C>
     <B>cat</B>
 </C>

and if we had "", we'd get:

 <C>
     <A/>
 </C>
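
The three cases above can be sketched in Python. This is only an illustration of the reading I've described, not of any implementation: the function name parse_c and the string-built XML are my own assumptions, and it relies on the fact that in this fragment every B is also an A.

```python
import re

LOWER_RUN = re.compile(r"[a-z]*")      # A: ["a"-"z"]*.
RESERVED = {"cat", "bat", "rat"}       # B: "cat" ; "bat" ; "rat".

def parse_c(text):
    """Return the XML for C under the reading above, or None if C fails.

    Hypothetical helper: C is an A, unless the whole of C also matches B,
    in which case it is reported as a B.
    """
    if LOWER_RUN.fullmatch(text) is None:
        return None                     # not an A (and so not a B either)
    if text in RESERVED:
        return f"<C><B>{text}</B></C>"  # the entire input also matches B
    if text == "":
        return "<C><A/></C>"            # A matched the empty string
    return f"<C><A>{text}</A></C>"

print(parse_c("caterpillar"))
print(parse_c("cat"))
print(parse_c(""))
```

Under this reading the input never fails merely because it matches B; it is just labelled differently.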

So, in the rule

 C: A ¬ B.

we have something rather like

 C: A | B.

in that C, if it matches, can be either an A or a B. The ¬ operator is simply a way to indicate that it cannot be *both* A and B.

Is that correct?

BTW
___________________________________________________ 
Dr. Bethan Tovey-Walsh 

linguacelta.com

Golygydd | Editor geirfan.cymru

You are welcome to write to me in Welsh.

> On 2 Mar 2026, at 15:26, John Lumley <john@saxonica.com> wrote:
> 
> In response to my action:
> 2026-02-03-a: JWL to study SP’s subtraction operator proposal
> On 02/02/2026 17:35, Steven Pemberton wrote:
>> My question, what are the proposed semantics of A!B (or A - B if you like)? For instance, is there a requirement that A and B span the same character positions? Does B have to be a subset of A, or is the exclusion of the intersection of the two implied?
> 
> The 'subtraction' operator I implemented (A¬B) was perhaps more restricted and designed to solve a specific problem with 'reserved keywords' in the XPath grammars.
> It was defined to match a term A (that could be any tree of productions) unless B (which could also be any tree of productions) matched across the same input character range, e.g.
> FunctionCall: FunctionName, -"("...
> FunctionName: NCName ¬ ReservedNames.
> ReservedNames: "item";"type";"element"...
> In effect it was subtracting the second set from the first set of potential matches. Unless B is a subset of A (or A and B potentially overlap), it reduces to just the first term. As such it does require some mechanism to look down both paths, which I implemented in my Earley parser by some cross-linking between the two paths being explored simultaneously (as if they were alternatives), failing the first match (assuming it succeeds) if the second succeeds at the same end position. So in the example above:
> elem() & elementary()
> would succeed, but
> element()
> would fail.
> I agree that this is not as powerful as the proposed negation/lookahead failure being considered, but it is a very common use case. 
> -- 
> John Lumley MA PhD CEng FIEE
> john@saxonica.com
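
For comparison, the behaviour described in the quoted message (A matches unless B matches across the same character range) might be sketched as follows. This is a hedged illustration, not the Earley implementation mentioned above: the function names are hypothetical, the NCName pattern is a simplified ASCII stand-in, and the reserved-name set contains only the three names actually quoted, since the original list is elided.

```python
import re

# Only the three names quoted in the message; the rest of the list is elided.
RESERVED_NAMES = {"item", "type", "element"}
# Simplified ASCII stand-in for NCName (an assumption, not the XML definition).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def match_function_name(text, start=0):
    """End positions at which NCName ¬ ReservedNames matches from start.

    A candidate span is kept only if NCName matches it and ReservedNames
    does not match exactly the same span.
    """
    ends = []
    for end in range(start + 1, len(text) + 1):
        span = text[start:end]
        if NCNAME.fullmatch(span) and span not in RESERVED_NAMES:
            ends.append(end)
    return ends

def is_function_call(text):
    # FunctionCall: FunctionName, -"("...  (only the name and "(" are checked)
    return any(text[end:end + 1] == "(" for end in match_function_name(text))

for candidate in ("elem()", "elementary()", "element()"):
    print(candidate, is_function_call(candidate))
```

Here the span "element" is rejected as a FunctionName because a reserved name matches over the same range, while "elem" and "elementary" are unaffected.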

Received on Monday, 2 March 2026 16:09:32 UTC