Re: XPath support for using multiple ixml implementations from Bethan Tovey-Walsh on 2026-03-03 (public-ixml@w3.org from March 2026)

From: Bethan Tovey-Walsh <bytheway@linguacelta.com>
Date: Tue, 3 Mar 2026 16:18:17 +0000
To: Wendell Piez <wapiez@wendellpiez.com>
Cc: ixml <public-ixml@w3.org>
Message-Id: <D09E4B09-23DA-4752-8110-A518D339EE86@linguacelta.com>
At the moment, the spec requires that any implementation must return a single valid parse. I think that requirement should probably be kept (you seemed to imply that it should be removed - sorry if I've misunderstood you).

This email got long, so I'm going to insert a quick summary of my conclusions.

TL;DR:

- I think that all iXML implementations should, by default, continue to return an output if the parse is successful, even in cases where there is ambiguity.
- I don't think what we currently have in the spec gives enough clarity to users about why they get different results from different processors, or about the potential for such behaviour to change without warning.

I'm going to reiterate the problem as I understand it, to try and ensure that we're talking about the same things. Apologies if this is repetitive.

Given that a requirement to return a single valid parse exists, I also believe that users need greater clarity about *which* parse they're getting and why.

Most processors, as far as I can tell, are deterministic. (Michael S. McQ's parser is a notable exception - he deliberately selected the parse at random). Given an ambiguous grammar/input combination, most processors will return the same parse every time. 

However, processors don't necessarily all return the same parse. Some users now choose which implementation to use for a given input purely based on the parse that implementation happens to select in cases of ambiguity.

I think that's not a great situation, for a number of reasons:

a) users may be misled to expect that an implementation will reliably always return the same parse - but since this is not a conformance issue, any implementation can change its behaviour any time it likes, for any reason (or none);

b) if users are going to select an implementation purely because they like its algorithm for selecting parses, that's fine - but I would feel happier if users were aware that this behaviour isn't something special that only their favoured parser can or will deliver; in fact, there are ways of asking at least some implementations to deliver a different parse from their default option, rather than switching to a different implementation to get that parse.

That final point, I think, is important. The discussion so far leaves me thinking that some people have the impression that e.g. CoffeePot and MarkupBlitz return different results because they are somehow interpreting the grammar differently, or interpreting the rules of iXML differently, rather than just because the implementors chose different ways to select the "first" parse amongst many. 

There is a certain amount of fuzziness with some grammars, where some parsing algorithms will find ambiguity and others will not. And, of course, implementations may offer a variety of modes that don't conform strictly to the spec and that change whether all possible parses are found. But, in the majority of cases, and behaving as mandated in the spec, CoffeePot and MarkupBlitz and jωiXML and ixampl (etc. etc.) are expected to find the same set of output parses. 

So CoffeePot is not the only implementation that can or will identify the parse that it chooses to output. That parse should also be found by any conforming implementation (with the minor exceptions already noted) - the question is whether any other implementation gives the user a way to retrieve that particular parse rather than its own default.
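
To make the distinction concrete, here is a minimal sketch in Python. It is not taken from any real processor. The toy grammar, the tuple representation of trees, and every function name are assumptions made purely for illustration. It shows one shared set of parses and several interchangeable policies for choosing among them: "first found", "smallest tree" (only a rough stand-in for the fewest-pending-actions strategy Gunther describes in the quoted message below, not his actual code), "random", and a strict mode that stops on ambiguity, as in Wendell's suggestion.

```python
import random


class AmbiguityError(Exception):
    """Raised by the strict policy when more than one parse exists."""


def all_parses(text):
    """Enumerate every parse of `text` under the toy grammar
         S: A+.    A: "a"; "a", "a".
    A parse is a tuple ("S", child, ...) whose children are
    ("A", "a") or ("A", "a", "a")."""
    def segmentations(rest):
        if not rest:
            yield ()
            return
        for width in (1, 2):
            chunk = rest[:width]
            # Accept the chunk only if it is exactly `width` letters "a".
            if len(chunk) == width and set(chunk) == {"a"}:
                for tail in segmentations(rest[width:]):
                    yield (("A", *chunk),) + tail

    # A+ needs at least one A, so an empty child list is not a parse.
    return [("S", *kids) for kids in segmentations(text) if kids]


def node_count(tree):
    """Count nodes: terminals are strings, nonterminals are tuples."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(node_count(child) for child in tree[1:])


def pick_first(parses):
    """Policy: whatever the enumeration happens to produce first."""
    return parses[0]


def pick_smallest(parses):
    """Policy: the most compact tree (hypothetical approximation only)."""
    return min(parses, key=node_count)


def pick_random(parses, rng):
    """Policy: a randomly chosen parse, using the supplied generator."""
    return rng.choice(parses)


def pick_strict(parses):
    """Policy: refuse to choose; report ambiguity instead."""
    if not parses:
        raise ValueError("no parse")
    if len(parses) > 1:
        raise AmbiguityError(f"{len(parses)} parses found")
    return parses[0]


if __name__ == "__main__":
    parses = all_parses("aa")  # ambiguous: two A's, or one two-letter A
    print("first:   ", pick_first(parses))
    print("smallest:", pick_smallest(parses))
    print("random:  ", pick_random(parses, random.Random()))
```

In this sketch every policy starts from the same list returned by `all_parses`. The only thing that varies is the selection step, which is why a different "default" output need not mean a different reading of the grammar.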

I think that we might find a better approach, so that users are not left asking for ways to attach a given processor to a given grammar just in order to access a particular parse which happens to be the (current) default output of that processor.

BTW
___________________________________________________ 
Dr. Bethan Tovey-Walsh 

linguacelta.com

Golygydd | Editor geirfan.cymru

Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 3 Mar 2026, at 15:41, Wendell Piez <wapiez@wendellpiez.com> wrote:
> 
> Bethan,
> 
> My 'simply stop' option is intended only as a point of accommodation where users can test that their grammars are 'correct' to some definition, that is not subject (only) to the implementation.
> 
> Nor would one have to say that unambiguous grammars are some kind of ideal. Quite the opposite.
> 
> One probably wouldn't use the mode, or use the mode much. It is mainly so that implementors have freedom to add features without entanglements with one another. To agree on where to stop agreeing.
> 
> As for the risk of siloing, whose risk is that, exactly?
> 
> Cheers, Wendell
> 
> 
> 
> On Tue, Mar 3, 2026 at 10:37 AM Bethan Tovey-Walsh <bytheway@linguacelta.com> wrote:
> Thanks, Wendell - I hadn't thought of the possibility that one might simply stop at ambiguities. 
> 
> I'm not a fan of the idea for a couple of reasons. 
> 
> Ambiguity isn't (always) a bug, and I'm uncomfortable with implying that unambiguous grammar/input combos are the ideal for iXML.
> 
> I also fear that this approach would not mitigate the problems I'm worried about. Different implementations' "other" behaviours wrt ambiguity would, if anything, potentially become even more siloed, leading to a greater risk of lock-in. And I'm not sure that it would be any clearer to users that the behaviour they can currently expect from ProcessorX is not mandated by the spec, and thus cannot be relied on in perpetuity.
> 
> Very best,
> 
> BTW
> 
> > On 3 Mar 2026, at 15:13, Wendell Piez <wapiez@wendellpiez.com> wrote:
> > 
> > Hello,
> > 
> > Bethan raises an important and sensitive question, not about ambiguities as such (or not only), but about 'conformance' and how its boundaries are to be defined.
> > 
> > IMV, the spec should say only
> > 
> > - A conformant processor must offer a mode to stop and report ambiguities instead of handling them
> > - If such a mode is available, processors are free to offer any other modes and features, as features
> > 
> > Maybe the mandated 'no ambiguities' mode would only have to report, not stop. But making it stop might be better.
> > 
> > If the spec is murkier, I would foresee a need for external tools to test for ambiguity. Is that formally possible?
> > 
> > Interesting problem, thanks --
> > Wendell
> > 
> > 
> > 
> > 
> > On Tue, Mar 3, 2026 at 9:57 AM Bethan Tovey-Walsh <bytheway@linguacelta.com> wrote:
> > I am concerned that the spec's lack of rules about how the parse is chosen may be leading to confusion for users. It's particularly worrying if people believe that an implementation will always give them an expected result, since there's no requirement for that. The next iteration of MarkupBlitz or CoffeePot could, in theory, have changes to its algorithm which lead to a change in this behaviour, and this would still be fully conformant to the specification. If users rely on getting a particular parse from an ambiguous grammar, this may also lead to lock-in, which is a result we really should try to avoid if we can. 
> > 
> > I think we should discuss this problem and maybe throw around ideas about how to address it. It may be that there's no better approach than what we have now, but I do think it's at least worth a discussion. 
> > 
> > Potential approaches that come to mind immediately (I'm not saying any of these is necessarily good; I'm just trying to list things that would solve some part of the problem):
> > 
> > - require implementations to return a randomly chosen parse;
> > - require implementations to use a particular algorithm to choose the parse. 
> > 
> > Other possibilities?
> > 
> > BTW
> > 
> > 
> >> On 3 Mar 2026, at 13:12, Gunther Rademacher <grd@gmx.net> wrote:
> >> 
> >> On 3/3/2026 10:52 AM, Norm Tovey-Walsh wrote:
> >>>> From Sheila's presentation what I understood about the differences between the implementations is that, above all, Gunther Rademacher's Markup Blitz seems to have implemented some 'magic' disambiguation logic, which produces desirable (and in some way expected) results.
> >>> No, I don’t think that’s what’s happening. Given an ambiguous parse, the processor has to pick one. It just happens that Markup Blitz and NineML choose different parses and, for Sheila’s application, the arbitrary choice that Markup Blitz made worked better than the arbitrary choice NineML made.
> >> 
> >> No, the behavior in Markup Blitz is neither “magic” nor purely arbitrary.
> >> 
> >> When Markup Blitz encounters an ambiguity, it maintains, for each alternative, a queue of deferred actions. An action in this sense is the completion of a terminal or nonterminal that will eventually be delivered into the parse tree. Among competing alternatives, Markup Blitz selects one with the smallest number of pending actions.
> >> 
> >> Operationally, this corresponds to choosing an alternative that requires the fewest derivation steps. In practice, this typically results in the most compact parse tree. I have found that this strategy aligns well with my own expectations, and my understanding is that others share this assessment.
> >> 
> >> For reference, the relevant code is here:
> >> https://github.com/GuntherRademacher/markup-blitz/blob/9481b2f295110174978796dbcaf33603a46f20a5/src/main/java/de/bottlecaps/markup/blitz/Parser.java#L730-L731
> >> 
> >> Best regards
> >> Gunther
> >> 
> > 
> > 
> > -- 
> > ...Wendell Piez... ...wendellpiez.com...
> > ...pellucidliterature.org... ...pausepress.org... ...github.com/wendellpiez...
Received on Tuesday, 3 March 2026 16:18:36 UTC