- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Mon, 02 Mar 2026 15:32:20 +0000
- To: public-ixml@w3.org
- Message-Id: <1772465496699.4195557378.4162711108@cwi.nl>
Ah yes, that was one of my questions, whether it was required to match the exact same characters.
Thanks!
Steven
On Monday 02 March 2026 16:26:42 (+01:00), John Lumley wrote:
In response to my action:
2026-02-03-a: JWL to study SP’s subtraction operator proposal
On 02/02/2026 17:35, Steven Pemberton wrote:
My question: what are the proposed semantics of A!B (or A - B if you like)? For instance, is there a requirement that A and B span the same character positions? Does B have to be a subset of A, or is the exclusion of the intersection of the two implied?
The 'subtraction' operator I implemented (A¬B) was perhaps more restricted and designed to solve a specific problem in 'reserved keywords' in the XPath grammars.
It was defined to match a term A (that could be any tree of productions) unless B (which could also be any tree of productions) matched across the same input character range, e.g.
FunctionCall: FunctionName, -"("...
FunctionName: NCName ¬ ReservedNames.
ReservedNames: "item";"type";"element"...
In effect it was subtracting the second set of potential matches from the first. Unless B is a subset of A (or A and B potentially overlap), it reduces to just the first term. As such it does require some mechanism to look down both paths. I implemented this in my Earley parser by cross-linking the two paths, which are explored simultaneously (as if they were alternatives), and failing the first match (assuming it succeeds) if the second succeeds at the same end position. So in the example above:
elem() and elementary()
would succeed, but
element()
would fail.
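The "same character range" rule can be sketched outside a real parser. The following is only a minimal illustration, not the Earley implementation described above: each matcher returns the set of end positions it can reach from a given start, and subtraction is a set difference on those end positions. All names here (`literal`, `alt`, `seq`, `name`, `subtract`, `accepts`) are hypothetical, `name` is a simplified stand-in for NCName, the reserved list holds only the three names quoted above, and the call syntax is reduced to a bare "()".

```python
from typing import Callable, Set

# A matcher maps (text, start) to the set of end positions it can reach.
Matcher = Callable[[str, int], Set[int]]

def literal(s: str) -> Matcher:
    def match(text: str, start: int) -> Set[int]:
        return {start + len(s)} if text.startswith(s, start) else set()
    return match

def alt(*matchers: Matcher) -> Matcher:
    def match(text: str, start: int) -> Set[int]:
        ends: Set[int] = set()
        for m in matchers:
            ends |= m(text, start)
        return ends
    return match

def seq(*matchers: Matcher) -> Matcher:
    def match(text: str, start: int) -> Set[int]:
        positions = {start}
        for m in matchers:
            positions = {e for p in positions for e in m(text, p)}
        return positions
    return match

def name(text: str, start: int) -> Set[int]:
    # Assumption: a name is one or more letters; every prefix is a candidate.
    ends: Set[int] = set()
    i = start
    while i < len(text) and text[i].isalpha():
        i += 1
        ends.add(i)
    return ends

def subtract(a: Matcher, b: Matcher) -> Matcher:
    # A ¬ B: keep A's matches except those where B matches the same range.
    def match(text: str, start: int) -> Set[int]:
        return a(text, start) - b(text, start)
    return match

reserved = alt(literal("item"), literal("type"), literal("element"))
function_name = subtract(name, reserved)
function_call = seq(function_name, literal("("), literal(")"))

def accepts(m: Matcher, text: str) -> bool:
    # True if the matcher can consume the whole input from position 0.
    return len(text) in m(text, 0)
```

With these definitions, `accepts(function_call, "elem()")` and `accepts(function_call, "elementary()")` hold, while `accepts(function_call, "element()")` does not: the only end position from which "(" can follow is the one that the reserved name also reaches.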
I agree that this is not as powerful as the proposed negation/lookahead failure being considered, but it is a very common use case.
--
John Lumley MA PhD CEng FIEE
john@saxonica.com <mailto:john@saxonica.com>
Received on Monday, 2 March 2026 15:32:27 UTC