- From: Bethan Tovey-Walsh <bytheway@linguacelta.com>
- Date: Mon, 2 Mar 2026 17:11:36 +0000
- To: John Lumley <john@saxonica.com>
- Cc: ixml <public-ixml@w3.org>
In that case, I think you can also use your ¬ for lookahead.

Example: I have a long string made up of the characters a-z, and I want to parse out the names of plants or animals that start with "cat", including "cat" itself. That means I need to match "cat" if and only if it isn't part of one of the longer plant or animal names. Using a negative lookahead, which I'll notate with ! for now, we could do this:

things: (animal ; plant ; char)+.
animal: "cat"!lookahead ; catanimal.
plant: "catnip" ; "cattails" ; "catmint".
-lookahead: "nip" ; "tails" ; "mint" ; "fish" ; "erpillar".
-catanimal: "caterpillar" ; "catfish".
-char: -["ab"; "d"-"z"] ; "c"!catstring.
-catstring: "at" , lookahead?.

The string "cat" is only tagged as an animal if it is not followed by a string that would complete one of the longer animal or plant names. A "c" is only a char if it doesn't also begin any of the animal or plant names.

With your syntax, it seems to me that I could get the same result by doing this:

things: (animal ; plant ; chars)+.
animal: cat ¬ catname ; catanimal.
plant: catplant.
-cat: "cat", char_plus.
-catname: catplant ; catanimal.
-catplant: "catnip" ; "cattails" ; "catmint".
-catanimal: "catfish" ; "caterpillar".
-catstring: animal ; plant.
-chars: char_plus ¬ catstring.
-char_plus: -["a"-"z"]+.

Either way, parsing this input string:

diadshupcatasbiupfdacattailsasdhuopcatfishasdbhi

should give me:

<things>
  <animal>cat</animal>
  <plant>cattails</plant>
  <animal>catfish</animal>
</things>

I suspect there are some types of lookahead that would be harder (or impossible) to do this way, because of the constraint that, in C: A ¬ B, A must be able to match the same span as B. That meant that I couldn't define the char nonterminal as a single character, as I did in the lookahead example.

BTW

___________________________________________________
Dr. Bethan Tovey-Walsh
linguacelta.com

Editor
geirfan.cymru

You are welcome to write to me in Welsh.
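As a sanity check on the expected output, the negative-lookahead grammar above can be hand-translated into a small Python scanner. This is a minimal sketch, not an ixml processor: the function names (animal, plant, char, parse_things) are hypothetical and simply mirror the grammar's rules. It assumes that at any position at most one of the three alternatives in things can match, which holds for this grammar because each lookahead suffix completes exactly one longer name, so trying the rules in order loses no parses.

```python
from typing import List, Optional, Tuple

# Hypothetical Python rendering of the lookahead grammar; names mirror its rules.
LOOKAHEAD = ("nip", "tails", "mint", "fish", "erpillar")
PLANTS = ("catnip", "cattails", "catmint")
CAT_ANIMALS = ("caterpillar", "catfish")


def animal(text: str, i: int) -> Optional[int]:
    """animal: "cat"!lookahead ; catanimal.  Returns the match length or None."""
    # Negative lookahead: plain "cat" only if no completing suffix follows it.
    if text.startswith("cat", i) and not text.startswith(LOOKAHEAD, i + 3):
        return 3
    for name in CAT_ANIMALS:
        if text.startswith(name, i):
            return len(name)
    return None


def plant(text: str, i: int) -> Optional[int]:
    """plant: "catnip" ; "cattails" ; "catmint"."""
    for name in PLANTS:
        if text.startswith(name, i):
            return len(name)
    return None


def char(text: str, i: int) -> Optional[int]:
    """-char: -["ab"; "d"-"z"] ; "c"!catstring."""
    if i >= len(text):
        return None
    ch = text[i]
    if ch in "ab" or "d" <= ch <= "z":
        return 1
    # catstring is "at" plus an optional suffix, so the lookahead
    # rejects the "c" exactly when "at" follows it.
    if ch == "c" and not text.startswith("at", i + 1):
        return 1
    return None


def parse_things(text: str) -> List[Tuple[str, str]]:
    """things: (animal ; plant ; char)+.  Returns the visible (element, text) pairs."""
    if not text:
        raise ValueError("things needs at least one item")
    found = []
    i = 0
    while i < len(text):
        # At most one rule matches at any position in this grammar,
        # so taking the first match does not discard other parses.
        for name, rule in (("animal", animal), ("plant", plant), ("char", char)):
            length = rule(text, i)
            if length:
                if name != "char":  # char is hidden and its letters are deleted
                    found.append((name, text[i:i + length]))
                i += length
                break
        else:
            raise ValueError(f"no rule matches at position {i}")
    return found
```

On the sample input, parse_things("diadshupcatasbiupfdacattailsasdhuopcatfishasdbhi") returns [("animal", "cat"), ("plant", "cattails"), ("animal", "catfish")], matching the expected <things> output. The two `not text.startswith(...)` checks are where the ! lookaheads live; a grammar whose alternatives overlap would need real backtracking or an Earley-style chart instead of this first-match loop.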
> On 2 Mar 2026, at 16:16, John Lumley <john@saxonica.com> wrote:
>
> On 02/03/2026 16:09, Bethan Tovey-Walsh wrote:
>> Let's take this grammar fragment:
>>
>> A: ["a"-"z"]*.
>> B: "cat" ; "bat" ; "rat".
>> C: A ¬ B.
>>
>> I think I understand your view of the semantics to be this:
>>
>> C is an A, unless the entirety of C also matches B, in which case it is a B
> No - it's effectively:
> C is an A, unless the entirety of C also matches B, in which case not a C
>> So if we had the input "caterpillar", we'd get:
>>
>> <C>
>> <A>caterpillar</A>
>> </C>
>>
> Yes
>>
>> and if we had "cat", we'd get:
>>
>> <C>
>> <B>cat</B>
>> </C>
>>
> No - this would fail, as in the example where element() failed
>>
>> and if we had "", we'd get:
>>
>> <C>
>> <A/>
>> </C>
> Yes, as B would not (yet) have succeeded when A did at the end of input.
>>
>> So, in the rule
>>
>> C: A ¬ B.
>>
>> we have something rather like
>>
>> C: A | B.
>>
>> in that C, if it matches, can be either an A or a B. The ¬ operator is simply a way to indicate that it cannot be *both* A and B.
> No - it is more like a set-reduction (difference) operator.
> --
> John Lumley MA PhD CEng FIEE
> john@saxonica.com
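John's set-difference reading of C: A ¬ B can be checked against the three cases in the quoted exchange with a minimal Python sketch. It assumes a whole-span view (it asks whether A and B each match an entire given string), which sidesteps the incremental, end-of-input timing John mentions for the empty-string case; match_a, match_b and match_c are hypothetical names, and this is not how any particular ixml processor implements the proposed ¬ operator.

```python
import re
from typing import Optional


def match_a(span: str) -> bool:
    # A: ["a"-"z"]*.   Zero or more lowercase letters, so "" matches too.
    return re.fullmatch(r"[a-z]*", span) is not None


def match_b(span: str) -> bool:
    # B: "cat" ; "bat" ; "rat".
    return span in ("cat", "bat", "rat")


def match_c(span: str) -> Optional[str]:
    """C: A ¬ B, read as set difference over whole spans.

    Returns a serialization of C if A matches the span and B does not,
    or None if C fails. B never appears in the output.
    """
    if match_a(span) and not match_b(span):
        return f"<C><A>{span}</A></C>" if span else "<C><A/></C>"
    return None


for sample in ("caterpillar", "cat", ""):
    print(repr(sample), "->", match_c(sample))
```

This prints <C><A>caterpillar</A></C> for "caterpillar", None for "cat", and <C><A/></C> for the empty string, matching John's three answers. Under the alternation reading (C: A | B), "cat" would instead have produced <C><B>cat</B></C>; under set difference it simply fails.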
Received on Monday, 2 March 2026 17:11:56 UTC