- From: Wendell Piez <wapiez@wendellpiez.com>
- Date: Tue, 3 Mar 2026 13:16:12 -0500
- To: Bethan Tovey-Walsh <bytheway@linguacelta.com>
- Cc: Sheila Thomson <discuss@bluegumtree.com>, public-ixml@w3.org
- Message-ID: <CAAO_-xzO3t=75J8A28fOo3KxV6oooAShXOiKrcshhzrKnWam6g@mail.gmail.com>
Bethan, I think you are exactly right:

> I suspect that Michael chose randomly each time in order to remind users
> that there was no requirement that the processor should behave predictably
> when choosing a parse.

In other words, he thought (or would or might have thought) what Norm thought, namely that my first instinct -- cry Halt! -- was too rude, and delivering something was incumbent. But he also wanted to make it clear to his user that it was their problem to think about and resolve, not the implementation's. This would be appropriate especially for a tool designed for testing and probing, as opposed to a processor built for a production environment (say).

And just as you say, the finer point is that not only does any implementation not have to do any particular thing, it doesn't even always have to do the same thing as it did last time, and might specifically decide not to (perhaps to address a meta-requirement such as not promising what they do not plan to deliver).

Personally I am not bothered by this as long as I know when I'm in that territory, which is why I want 'noisy' settings.

Thanks again for reading,
Wendell

On Tue, Mar 3, 2026 at 12:53 PM Bethan Tovey-Walsh <bytheway@linguacelta.com> wrote:

> Indeed, I do expect that the same version of an implementation should
> always give me the same result, whether the grammar is ambiguous or not.
> Definitely if the grammar is unambiguous.
>
> If it's unambiguous, there's only one valid parse, so you should always
> get the same output. But in cases of ambiguity, the spec quite clearly
> allows a processor to return any of the valid parses, without any
> constraint. That includes not constraining an implementation to returning a
> predictable default parse each time.
>
> Norm could decide that, on April 1st, CoffeePot will return the second
> parse it discovers in cases of ambiguity. That would be completely
> conformant behaviour. It would make him an asshole, but it would be
> conformant.
>
> I suspect that Michael chose randomly each time in order to remind users
> that there was no requirement that the processor should behave predictably
> when choosing a parse.
>
> BTW
>
> ****************************************************
> Dr. Bethan Tovey-Walsh
> linguacelta.com
> Golygydd | Editor geirfan.cymru
> Croeso i chi ysgrifennu ataf yn y Gymraeg
>
> On 3 Mar 2026, at 16:59, Sheila Thomson <discuss@bluegumtree.com> wrote:
>
> >> It's particularly worrying if people believe that an implementation
> >> will always give them an expected result, since there's no requirement for
> >> that.
>
> Indeed, I do expect that the same version of an implementation should
> always give me the same result, whether the grammar is ambiguous or not.
> Definitely if the grammar is unambiguous.
>
> >> The next iteration of MarkupBlitz or CoffeePot could, in theory, have
> >> changes to its algorithm which leads to a change in this behaviour, and
> >> this would still be fully conformant to the specification.
>
> Yes, when upgrading I'd expect to need to verify that the expected
> result(s) hadn't changed or, if they had, that the changes were acceptable.
>
> >> If users rely on getting a particular parse from an ambiguous grammar,
> >> this may also lead to lock-in
>
> True. However, receiving a random parse from an ambiguous grammar would, I
> think, effectively make them impractical to use for most purposes. If the
> parser were to return all possible result versions then I can see a
> potential workaround to identify and select the desired version with a
> post-validation step, ie. RNG/C+Schematron. But I suspect that the volume
> of result versions might make that a long and memory-hungry exercise. If
> so, we're back to it being impractical for many purposes.
>
> There is already an element of lock-in, per the current Invisible XML
> Specification (2022-06-20), as it's not guaranteed that a grammar that is
> considered unambiguous by one iXML parser will be considered unambiguous by
> all other iXML parsers:
>
> >> Different processors may vary in whether input is detected as
> >> ambiguous or not."
>
> As far as I can tell, the spec also doesn't mandate that an implementation
> must use the same algorithm for each parse. Even if it sticks to one
> algorithm (always and forever), as has already been pointed out, there's no
> requirement that the same variant must be returned each time it applies a
> grammar it deems ambiguous.
>
> That aside, by allowing a grammar to be considered ambiguous per one
> algorithm but not another, doesn't this inherently promote "shopping
> around" for the algorithm that best suits the needs of each specific
> grammar?
>
> Sheila
>
> On 2026-03-03 14:56, Bethan Tovey-Walsh wrote:
>
> I am concerned that the spec's lack of rules about how the parse is chosen
> may be leading to confusion for users. It's particularly worrying if people
> believe that an implementation will always give them an expected result,
> since there's no requirement for that. The next iteration of MarkupBlitz or
> CoffeePot could, in theory, have changes to its algorithm which leads to a
> change in this behaviour, and this would still be fully conformant to the
> specification. If users rely on getting a particular parse from an
> ambiguous grammar, this may also lead to lock-in, which is a result we
> really should try to avoid if we can.
>
> I think we should discuss this problem and maybe throw around ideas about
> how to address it. It may be that there's no better approach than what we
> have now, but I do think it's at least worth a discussion.
>
> Potential approaches that come to mind immediately (I'm not saying any of
> these is necessarily good; I'm just trying to list things that would solve
> some part of the problem):
>
> - require implementations to return a randomly chosen parse;
> - require implementations to use a particular algorithm to choose the
>   parse.
>
> Other possibilities?
>
> BTW
>
> ****************************************************
> Dr. Bethan Tovey-Walsh
> linguacelta.com
> Golygydd | Editor geirfan.cymru
> Croeso i chi ysgrifennu ataf yn y Gymraeg
>
> On 3 Mar 2026, at 13:12, Gunther Rademacher <grd@gmx.net> wrote:
>
> On 3/3/2026 10:52 AM, Norm Tovey-Walsh wrote:
>
> From Sheila's presentation what I understood about the differences between
> the implementations is that, above all, Gunther Rademacher's Markup Blitz
> seems to have implemented some 'magic' disambiguation logic, which produces
> desirable (and in some way expected) results.
>
> No, I don’t think that’s what’s happening. Given an ambiguous parse, the
> processor has to pick one. It just happens that Markup Blitz and NineML
> choose different parses and, for Sheila’s application, the arbitrary choice
> that Markup Blitz made worked better than the arbitrary choice NineML made.
>
> No, the behavior in Markup Blitz is neither “magic” nor purely arbitrary.
>
> When Markup Blitz encounters an ambiguity, it maintains, for each
> alternative, a queue of deferred actions. An action in this sense is the
> completion of a terminal or nonterminal that will eventually be delivered
> into the parse tree. Among competing alternatives, Markup Blitz selects one
> with the smallest number of pending actions.
>
> Operationally, this corresponds to choosing an alternative that requires
> the fewest derivation steps. In practice, this typically results in the
> most compact parse tree.
> I have found that this strategy aligns well with
> my own expectations, and my understanding is that others share this
> assessment.
>
> For reference, the relevant code is here:
>
> https://github.com/GuntherRademacher/markup-blitz/blob/9481b2f295110174978796dbcaf33603a46f20a5/src/main/java/de/bottlecaps/markup/blitz/Parser.java#L730-L731
>
> Best regards
> Gunther

--
...Wendell Piez... ...wendellpiez.com...
...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez. ..
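
The selection rule Gunther describes in the quoted message (among competing alternatives, pick one with the fewest pending actions) can be illustrated with a minimal Python sketch. This is not the Markup Blitz code, which is Java and is linked in the thread. The `Alternative` class, the `choose_alternative` function, the action names, and the first-wins tie-break are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Alternative:
    """One competing parse alternative (hypothetical structure, for illustration only)."""
    label: str
    # Deferred actions: completions of terminals or nonterminals not yet
    # delivered into the parse tree.
    pending_actions: list = field(default_factory=list)


def choose_alternative(alternatives):
    """Return an alternative with the smallest number of pending actions.

    Assumption: ties go to the first alternative in the list. The thread
    does not say how ties are resolved.
    """
    if not alternatives:
        raise ValueError("no alternatives to choose from")
    # min() returns the first minimal element, which gives the tie-break above.
    return min(alternatives, key=lambda alt: len(alt.pending_actions))


# Two made-up derivations of the same input, one needing an extra step.
alternatives = [
    Alternative("nested", ["digit", "number", "expr", "expr"]),
    Alternative("flat", ["digit", "number", "expr"]),
]
chosen = choose_alternative(alternatives)
print(chosen.label)
```

A rule of this kind is deterministic for a given list of alternatives, so the same input gives the same parse each time, although nothing in the specification requires that.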
Received on Tuesday, 3 March 2026 18:16:28 UTC