- From: John Lumley <john@saxonica.com>
- Date: Wed, 23 Jul 2025 15:06:46 +0100
- To: ixml <public-ixml@w3.org>
- Message-ID: <3a8fc79f-5ab5-409d-95ef-267afbdb6210@saxonica.com>
During some discussion yesterday about the possibility of running an
Earley parser in a reverse direction across the input string, I
suggested that by 'reversing' an iXML grammar and running across a
reversal of the input text, you should get a parse tree which is a
mirror image of the normal parse tree. Using the following steps on an
iXML grammar:
1. Children of only an *alt *are reversed in order
2. the characters of *@string* of a *literal* are reversed
produces a new grammar. E.g. for the *address* sample:
address: person, lf, street, lf, postcode, city, lf, country, lf;
person, lf, street, lf, city, postcode, lf, country, lf.
person: (title, S?)?, (initials; given, S), surname, S?.
title: "Mr."; "Mrs."; "Dr."; "Ms.".
initials: initial+.
initial: LETTER, ".", S?.
surname: name.
given: name.
-name: LETTER, letters.
street: no, S?, streetname; streetname, S?, no, S?.
streetname: name, S; name, S, name.
city: name, S; name, S, name, S.
country: name, S?; name, S, name, S?.
postcode: digits, S, LETTER, LETTER, S?;
LETTER, LETTER, digits, S, digit, LETTER, LETTER, S?.
no: digits.
-LETTER: ["A"-"Z"].
-letters: ["a"-"z"]*.
-digit: ["0"-"9"].
-digits: ["0"-"9"]+.
-S: " "+.
-lf: -#a | -#d, -#a .
we get a new grammar:
address: lf, country, lf, city, postcode, lf, street, lf, person |
lf, country, lf, postcode, city, lf, street, lf, person.
person: S?, surname, (initials; S, given), (S?, title)?.
title: ".rM"; ".srM"; ".rD"; ".sM".
initials: initial+.
initial: S?, ".", LETTER.
surname: name.
given: name.
-name: letters, LETTER.
street: streetname, S?, no |
S?, no, S?, streetname.
streetname: S, name |
name, S, name.
city: S, name |
S, name, S, name.
country: S?, name |
S?, name, S, name.
postcode: S?, LETTER, LETTER, S, digits |
S?, LETTER, LETTER, digit, S, digits, LETTER, LETTER.
no: digits.
-LETTER: ["A"-"Z"].
-letters: ["a"-"z"]*.
-digit: ["0"-"9"].
-digits: ["0"-"9"]+.
-S: " "+.
-lf: -#a |
-#a, -#d.
Running /backwards /over the sample:
Steven Pemberton
21 Sandridge Road
St Albans AL1 4BY
United Kingdom
Produces a tree:
<address>
<country>modgniK detinU</country>
<postcode>YB4 1LA</postcode>
<city> snablA tS</city>
<street>
<streetname>daoR egdirdnaS</streetname>
<no>12</no>
</street>
<person>
<surname>notrebmeP</surname>
<given>nevetS</given>
</person>
</address>
which is an exact mirror image (document order, modulo attribute order,
reversed *text()|attribute()*) of the normal result:
<address>
<person>
<given>Steven</given>
<surname>Pemberton</surname>
</person>
<street>
<no>21</no>
<streetname>Sandridge Road</streetname>
</street>
<city>St Albans </city>
<postcode>AL1 4BY</postcode>
<country>United Kingdom</country>
</address>
This suggests that programmatic exploration of a bidirectional Earley
parser (working from both ends towards a join somewhere in the middle)
could be possible.
--
*John Lumley* MA PhD CEng FIEE
john@saxonica.com
Received on Wednesday, 23 July 2025 14:07:02 UTC