Reversed iXML grammars

During some discussion yesterday about the possibility of running an 
Earley parser in a reverse direction across the input string, I 
suggested that by 'reversing' an iXML grammar and running across a 
reversal of the input text, you should get a parse tree which is a 
mirror image of the normal parse tree. Using the following steps on an 
iXML grammar:

 1. Children of only an *alt *are reversed in order
 2. the characters of *@string* of a *literal* are reversed

produces a new grammar. E.g. for the *address* sample:

address: person, lf, street, lf, postcode, city, lf, country, lf;
          person, lf, street, lf, city, postcode, lf, country, lf.
person: (title, S?)?, (initials; given, S), surname, S?.
title: "Mr."; "Mrs."; "Dr."; "Ms.".
initials: initial+.
initial: LETTER, ".", S?.
surname: name.
given: name.
-name: LETTER, letters.
street: no, S?, streetname; streetname, S?, no, S?.
streetname: name, S; name, S, name.
city: name, S; name, S, name, S.
country: name, S?; name, S, name, S?.
postcode: digits, S, LETTER, LETTER, S?;
           LETTER, LETTER, digits, S, digit, LETTER, LETTER, S?.
no: digits.
-LETTER: ["A"-"Z"].
-letters: ["a"-"z"]*.
-digit: ["0"-"9"].
-digits: ["0"-"9"]+.
-S: " "+.
-lf: -#a | -#d, -#a .

we get a new grammar:

    address: lf, country, lf, city, postcode, lf, street, lf, person |
             lf, country, lf, postcode, city, lf, street, lf, person.
     person: S?, surname, (initials; S, given), (S?, title)?.
      title: ".rM"; ".srM"; ".rD"; ".sM".
   initials: initial+.
    initial: S?, ".", LETTER.
    surname: name.
      given: name.
      -name: letters, LETTER.
     street: streetname, S?, no |
             S?, no, S?, streetname.
streetname: S, name |
             name, S, name.
       city: S, name |
             S, name, S, name.
    country: S?, name |
             S?, name, S, name.
   postcode: S?, LETTER, LETTER, S, digits |
             S?, LETTER, LETTER, digit, S, digits, LETTER, LETTER.
         no: digits.
    -LETTER: ["A"-"Z"].
   -letters: ["a"-"z"]*.
     -digit: ["0"-"9"].
    -digits: ["0"-"9"]+.
         -S: " "+.
        -lf: -#a |
             -#a, -#d.

Running /backwards /over the sample:

Steven Pemberton
21 Sandridge Road
St Albans AL1 4BY
United Kingdom

Produces a tree:

<address>
     <country>modgniK detinU</country>
     <postcode>YB4 1LA</postcode>
     <city> snablA tS</city>
     <street>
         <streetname>daoR egdirdnaS</streetname>
         <no>12</no>
     </street>
     <person>
         <surname>notrebmeP</surname>
         <given>nevetS</given>
     </person>
</address>

which is an exact mirror image (document order, modulo attribute order, 
reversed *text()|attribute()*) of the normal result:

<address>
    <person>
       <given>Steven</given>
       <surname>Pemberton</surname>
    </person>
    <street>
       <no>21</no>
       <streetname>Sandridge Road</streetname>
    </street>
    <city>St Albans </city>
    <postcode>AL1 4BY</postcode>
    <country>United Kingdom</country>
</address>

This suggests that programmatic exploration of a bidirectional Earley 
parser (working from both ends towards a join somewhere in the middle) 
could be possible.

-- 
*John Lumley* MA PhD CEng FIEE
john@saxonica.com

Received on Wednesday, 23 July 2025 14:07:02 UTC