- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Mon, 09 Feb 2026 17:29:19 +0000
- To: "Bethan Tovey-Walsh" <bytheway@linguacelta.com>
- Cc: public-ixml@w3.org
- Message-Id: <1770658027961.2101671133.3762852027@cwi.nl>
On Monday 09 February 2026 17:40:16 (+01:00), Bethan Tovey-Walsh wrote: I'm not sure what the use-case is here. "Because it's there." It's true that some parsing algorithms can't handle left-recursion, but ixml can As Norm noted, iXML is not a parsing algorithm; it's a syntax for notating grammars. No. It doesn't specify exactly which parser to use, but it requires a parser that can handle any grammar, and then suggests a number of them: "Known algorithms that accept and parse any context-free grammar include [Earley <https://invisiblexml.org/current/#earley> ], [Unger <https://invisiblexml.org/current/#unger> ], [CYK <https://invisiblexml.org/current/#cyk> ], [GLR <https://invisiblexml.org/current/#glr> ], and [GLL <https://invisiblexml.org/current/#gll> ]; see also [Grune <https://invisiblexml.org/current/#grune> ]." So left recursion is absolutely required. Steven An analogous case is the extended regular expression library in Python. You can absolutely write a left-recursive structure using its syntax: (?P<ant_string>(?&ant_string)?ant) But, at least on my machine, this pattern inevitably causes a memory error. The right-recursive equivalent doesn't: (?P<ant_string>ant(?&ant_string)?) So the fact that a syntax permits left-recursive structures doesn't automatically mean that such structures can be successfully parsed with any choice of algorithm. I consider avoiding left recursion as the defining reason why repetition is needed in grammars, but I take it this doesn't cover your use case. Replacing recursion with repetition doesn't preserve the parse tree from the original grammar, so it isn't a solution to the problem I raised. And, in any case, using repetition alone can't resolve the left recursion in a grammar like this: a: b, "ant" | a?, "bat". b: c, "bat". c: a?, "cat". As I said in my email, indirect recursion seems to be a greater problem than direct recursion. Are you trying to avoid left-recursion in the grammar, but want the left-parse, as if left-recursion had been used? I want to transform a grammar so that it contains no left-recursion, but, when used to parse an input, it nonetheless constructs an XML representation identical to the one that would be constructed using the original, left-recursive grammar. I'm trying to understand what you exactly want. Yeah, that's a perennial question. BTW **************************************************** Dr. Bethan Tovey-Walsh linguacelta.com <http://linguacelta.com/> Golygydd | Editor geirfan.cymru <http://geirfan.cymru/> Croeso i chi ysgrifennu ataf yn y Gymraeg On 9 Feb 2026, at 15:15, Steven Pemberton <steven.pemberton@cwi.nl> wrote: I'm trying to understand what you exactly want. Are you trying to avoid left-recursion in the grammar, but want the left-parse, as if left-recursion had been used? It's true that some parsing algorithms can't handle left-recursion, but ixml can, so again, I'm not sure what the use-case is here. The standard way to avoid left-recursion is to use repetition antstring: "ant"+. and I consider avoiding left recursion as the defining reason why repetition is needed in grammars, but I take it this doesn't cover your use case. Steven On Sunday 08 February 2026 17:15:48 (+01:00), Bethan Tovey-Walsh wrote: > Dear iXMLers, > > Has anybody worked out a way to rewrite an iXML grammar to eliminate hidden left recursion, but then use renaming, suppressions, etc. to retain the original grammar's XML output? > > I suspect that this is impossible in the general case, but I'd love to hear that I'm wrong! > > Consider this grammar: > > ant_string: ant_string?, "ant". > > This recognizes the string "ant" repeated any number of times. If we parse the input "antant" with this grammar using an iXML parser, here's our output: > > <ant_string> > <ant_string>ant</ant_string> > ant > </ant_string> > > There is a direct left recursion in this grammar. The rule ant_string can match simply the string "ant", but it can also match itself followed by the string "ant". (We could also express this same grammar in iXML as: > > ant_string: ant_string, "ant" ; "ant". > > which makes these two possibilities more explicit.) > > Left recursion poses a problem for some types of parsing algorithm. Luckily, there's a simple algorithm to rewrite a grammar to eliminate direct left recursion. This would give us a new grammar: > > ant_string: "ant", ant_string_1. > ant_string_1: "ant", ant_string_1 ; (). > > We've moved the recursion from the left to the right; the rule for ant_string can no longer begin with ant_string. > > However, since iXML uses the patterns in the grammar to structure the output XML, we can't just rewrite in this way. If we did, we'd get the wrong XML output from our input grammar. Using the replacement grammar to parse "antant": > > <ant_string> > ant > <ant_string_1> > ant > <ant_string_1/> > </ant_string_1> > </ant_string> > > In at least some cases, we can use renaming to solve this problem, although it requires rewriting to a slightly different grammar: > > ant_string: placeholder_1>ant_string, placeholder_2. > placeholder_1: "ant". > -placeholder_2: () ; -placeholder_1, placeholder_2. > > We have eliminated the left recursion, and used renaming and suppression to reconstruct the original parse structure. > > However, I can't find a way to reconstruct the original parse structure when dealing with a grammar like this one, which has indirect left recursion: > > ant_string: bat_string, "ant". > bat_string: "bat", cat_string ; ant_string, cat_string. > cat_string: "cat". > > We can parse the string "batcatantcatant" with this grammar to get: > > <ant_string> > <bat_string> > <ant_string> > <bat_string> > bat > <cat_string>cat</cat_string> > </bat_string> > ant > </ant_string> > <cat_string>cat</cat_string> > </bat_string> > ant > </ant_string> > > In this case, none of the individual rules is directly left recursive. However, the first element of ant_string is bat_string, and bat_string can begin with ant_string. Top-down parsing algorithms may have the same problem here as they have with direct left recursion: they go into an infinite loop trying to resolve ant_string, because they need to resolve bat_string in order to do so, but in order to resolve bat_string they must first resolve ant_string... > > The algorithm to resolve indirect left recursion requires first rewriting the grammar so that any left recursions are direct, and then applying the algorithm to eliminate direct left recursion. So we first need something like this: > > ant_string: ant_string, cat_string, "ant"; bat_string, "ant". > bat_string: "bat", cat_string. > cat_string: "cat". > > Now, the only left recursion is a direct left recursion in the rule for ant_string. We can now use the direct left recursion algorithm to get this: > > ant_string: bat_string, "ant", placeholder_1. > placeholder_1: cat_string, "ant", placeholder_1 ; (). > bat_string: "bat", cat_string. > cat_string: "cat". > > Using this to parse "batcatantcatant", we get: > > <ant_string> > <bat_string> > bat > <cat_string>cat</cat_string> > </bat_string> > ant > <placeholder_1> > <cat_string>cat</cat_string> > ant > <placeholder_1/> > </placeholder_1> > </ant_string> > > This is obviously quite different from the XML output achieved using the original grammar. > > I may be missing something really obvious, but I can't work out a way to rewrite this grammar to eliminate left recursion, whilst also reconstructing the original XML structure. > > If anyone has a solution, or even any questions or suggestions that might help me to work through this, I'd be immensely grateful. Alternatively, if you can prove that this is impossible for at least some grammars, I'd be grateful for that too, so that I can stop banging my head against this wall. > > Very best, > > Bethan > ___________________________________________________ > Dr. Bethan Tovey-Walsh > > linguacelta.com > > Golygydd | Editor geirfan.cymru > > Croeso i chi ysgrifennu ataf yn y Gymraeg. > > >
Received on Monday, 9 February 2026 17:29:30 UTC