Re: Indirect left recursion in iXML grammars from Steven Pemberton on 2026-02-09 (public-ixml@w3.org from February 2026)

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Mon, 09 Feb 2026 15:15:31 +0000
To: public-ixml@w3.org
Message-Id: <1770649459326.2875488912.407125901@cwi.nl>
I'm trying to understand what you exactly want. Are you trying to avoid 
left-recursion in the grammar, but want the left-parse, as if 
left-recursion had been used?

It's true that some parsing algorithms can't handle left-recursion, but 
ixml can, so again, I'm not sure what the use-case is here.

The standard way to avoid left-recursion is to use repetition

 antstring: "ant"+.

and I consider avoiding left recursion as the defining reason why 
repetition is needed in grammars, but I take it this doesn't cover your use 
case.

Steven 

On Sunday 08 February 2026 17:15:48 (+01:00), Bethan Tovey-Walsh wrote:

 > Dear iXMLers,
 > 
 > Has anybody worked out a way to rewrite an iXML grammar to eliminate 
hidden left recursion, but then use renaming, suppressions, etc. to retain 
the original grammar's XML output?
 > 
 > I suspect that this is impossible in the general case, but I'd love to 
hear that I'm wrong!
 > 
 > Consider this grammar:
 > 
 >  ant_string: ant_string?, "ant".
 > 
 > This recognizes the string "ant" repeated any number of times. If we 
parse the input "antant" with this grammar using an iXML parser, here's our 
output:
 > 
 >  <ant_string>
 >     <ant_string>ant</ant_string>
 >     ant
 >  </ant_string>
 > 
 > There is a direct left recursion in this grammar. The rule ant_string 
can match simply the string "ant", but it can also match itself followed by 
the string "ant". (We could also express this same grammar in iXML as: 
 > 
 >  ant_string: ant_string, "ant" ; "ant". 
 > 
 > which makes these two possibilities more explicit.)
 > 
 > Left recursion poses a problem for some types of parsing algorithm. 
Luckily, there's a simple algorithm to rewrite a grammar to eliminate 
direct left recursion. This would give us a new grammar:
 > 
 >  ant_string: "ant", ant_string_1.
 >  ant_string_1: "ant", ant_string_1 ; ().
 > 
 > We've moved the recursion from the left to the right; the rule for 
ant_string can no longer begin with ant_string.
 > 
 > However, since iXML uses the patterns in the grammar to structure the 
output XML, we can't just rewrite in this way. If we did, we'd get the 
wrong XML output from our input grammar. Using the replacement grammar to 
parse "antant":
 > 
 >  <ant_string>
 >     ant
 >     <ant_string_1>
 >        ant
 >        <ant_string_1/>
 >     </ant_string_1>
 >  </ant_string>
 > 
 > In at least some cases, we can use renaming to solve this problem, 
although it requires rewriting to a slightly different grammar:
 > 
 >  ant_string: placeholder_1>ant_string, placeholder_2.
 >  placeholder_1:  "ant".
 >  -placeholder_2: () ; -placeholder_1, placeholder_2.
 > 
 > We have eliminated the left recursion, and used renaming and suppression 
to reconstruct the original parse structure.
 > 
 > However, I can't find a way to reconstruct the original parse structure 
when dealing with a grammar like this one, which has indirect left 
recursion:
 > 
 >  ant_string: bat_string, "ant".
 >  bat_string: "bat", cat_string ; ant_string, cat_string.
 >  cat_string: "cat".
 > 
 > We can parse the string "batcatantcatant" with this grammar to get:
 > 
 >  <ant_string>
 >     <bat_string>
 >        <ant_string>
 >           <bat_string>
 >              bat
 >              <cat_string>cat</cat_string>
 >           </bat_string>
 >           ant
 >        </ant_string>
 >        <cat_string>cat</cat_string>
 >     </bat_string>
 >     ant
 >  </ant_string>
 > 
 > In this case, none of the individual rules is directly left recursive. 
However, the first element of ant_string is bat_string, and bat_string can 
begin with ant_string. Top-down parsing algorithms may have the same 
problem here as they have with direct left recursion: they go into an 
infinite loop trying to resolve ant_string, because they need to resolve 
bat_string in order to do so, but in order to resolve bat_string they must 
first resolve ant_string...
 > 
 > The algorithm to resolve indirect left recursion requires first 
rewriting the grammar so that any left recursions are direct, and then 
applying the algorithm to eliminate direct left recursion. So we first need 
something like this:
 > 
 >  ant_string: ant_string, cat_string, "ant"; bat_string, "ant".
 >  bat_string: "bat", cat_string.
 >  cat_string: "cat".
 > 
 > Now, the only left recursion is a direct left recursion in the rule for 
ant_string. We can now use the direct left recursion algorithm to get this:
 > 
 >  ant_string: bat_string, "ant", placeholder_1.
 >  placeholder_1: cat_string, "ant", placeholder_1 ; (). 
 >  bat_string: "bat", cat_string.
 >  cat_string: "cat".
 > 
 > Using this to parse "batcatantcatant", we get:
 > 
 >  <ant_string>
 >     <bat_string>
 >        bat
 >        <cat_string>cat</cat_string>
 >     </bat_string>
 >     ant
 >    <placeholder_1>
 >        <cat_string>cat</cat_string>
 >        ant
 >        <placeholder_1/>
 >     </placeholder_1>
 >  </ant_string>
 > 
 > This is obviously quite different from the XML output achieved using the 
original grammar.
 > 
 > I may be missing something really obvious, but I can't work out a way to 
rewrite this grammar to eliminate left recursion, whilst also 
reconstructing the original XML structure.
 > 
 > If anyone has a solution, or even any questions or suggestions that 
might help me to work through this, I'd be immensely grateful. 
Alternatively, if you can prove that this is impossible for at least some 
grammars, I'd be grateful for that too, so that I can stop banging my head 
against this wall.
 > 
 > Very best,
 > 
 > Bethan
 > ___________________________________________________ 
 > Dr. Bethan Tovey-Walsh 
 > 
 > linguacelta.com
 > 
 > Golygydd | Editor geirfan.cymru
 > 
 > Croeso i chi ysgrifennu ataf yn y Gymraeg.
 > 
 > 
 >
Received on Monday, 9 February 2026 15:15:38 UTC