Backus Naur Form (BNF) to XML

We've been discussing how to transform languages into XML form so we can 
use XPointer in our evaluation descriptions of them.

I took an action item to propose a generic way to convert any language 
expressed in Backus Naur Form (BNF).  See e.g. [1] for a BNF tutorial.

I propose an XML application with just three tags:

<rule name="foo">   </rule>
<terminal name="bar"/>
<cdata value="baz" />

"rule" and "terminal" correspond to BNF rule and terminal symbols (what else?).
We could get by without cdata, but then all the terminals would be 
individual characters. This is a shortcut to avoid that level of detail.
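To make the cdata shortcut concrete, here is a small Python sketch using the standard xml.etree.ElementTree module. The helper names filename_with_cdata and filename_without_cdata are made up for illustration and are not part of the proposal; the sketch only compares the two ways of encoding the same filename.

```python
import xml.etree.ElementTree as ET

def filename_with_cdata(text):
    """Wrap the whole filename in a single <cdata> element (the shortcut)."""
    rule = ET.Element("rule", name="filename")
    ET.SubElement(rule, "cdata", value=text)
    return rule

def filename_without_cdata(text):
    """Spell the same filename out as one <terminal> per character."""
    rule = ET.Element("rule", name="filename")
    for ch in text:
        ET.SubElement(rule, "terminal", name=ch)
    return rule

short = filename_with_cdata("phonebook")
spelled_out = filename_without_cdata("phonebook")

print(ET.tostring(short, encoding="unicode"))
print(len(short), "child element versus", len(spelled_out))
```

With cdata the filename rule has one child; without it, one child per character.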

Let's jump to an example.  Here's a simple Unix-like command:

     sort +increasing -unique phonebook

BNF for this sort of command is as follows.

(I use the following BNF notation:
Unquoted strings are rules.
Quoted strings and characters are terminals.
* means 0 or more.
| means "or".
Brackets [] enclose prose definitions.)

command ::= command-name (argument)* filename
command-name ::= "sort" | "print" | "mail"
argument ::= prefix argument-name
argument-name ::= "increasing" | "unique"
prefix ::= '+' | '-'
filename ::= [any sequence of alpha characters]

Using the BNF to parse the example "sort" command, which I repeat here,

     sort +increasing -unique phonebook

we get:

<rule name="command">
    <rule name="command-name">
        <terminal name="sort"/>
    </rule>
    <rule name="argument">
        <rule name="prefix">
            <terminal name="+"/>
        </rule>
        <rule name="argument-name">
            <terminal name="increasing"/>
        </rule>
    </rule>
    <rule name="argument">
        <rule name="prefix">
            <terminal name="-"/>
        </rule>
        <rule name="argument-name">
            <terminal name="unique"/>
        </rule>
    </rule>
    <rule name="filename">
        <cdata value="phonebook"/>
    </rule>
</rule>
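As a rough illustration of how such a tree might be produced, here is a hand-written Python sketch for this one grammar only. It is not the generic BNF converter proposed here. It assumes tokens are separated by whitespace, which the grammar itself does not say, and the names parse_command and rule are hypothetical.

```python
import xml.etree.ElementTree as ET

COMMAND_NAMES = {"sort", "print", "mail"}
ARGUMENT_NAMES = {"increasing", "unique"}
PREFIXES = {"+", "-"}

def rule(parent, name):
    """Create <rule name="..."> as the root (parent is None) or as a child."""
    if parent is None:
        return ET.Element("rule", name=name)
    return ET.SubElement(parent, "rule", name=name)

def parse_command(line):
    """Parse one command line into a rule/terminal/cdata tree."""
    tokens = line.split()  # assumption: whitespace separates tokens
    if len(tokens) < 2:
        raise ValueError("expected: command-name (argument)* filename")
    root = rule(None, "command")

    # command-name ::= "sort" | "print" | "mail"
    if tokens[0] not in COMMAND_NAMES:
        raise ValueError(f"unknown command-name: {tokens[0]!r}")
    ET.SubElement(rule(root, "command-name"), "terminal", name=tokens[0])

    # (argument)* : every token between the command name and the filename
    for tok in tokens[1:-1]:
        prefix, arg_name = tok[:1], tok[1:]
        if prefix not in PREFIXES or arg_name not in ARGUMENT_NAMES:
            raise ValueError(f"bad argument: {tok!r}")
        arg = rule(root, "argument")
        ET.SubElement(rule(arg, "prefix"), "terminal", name=prefix)
        ET.SubElement(rule(arg, "argument-name"), "terminal", name=arg_name)

    # filename ::= [any sequence of alpha characters]
    if not tokens[-1].isalpha():
        raise ValueError(f"bad filename: {tokens[-1]!r}")
    ET.SubElement(rule(root, "filename"), "cdata", value=tokens[-1])
    return root

tree = parse_command("sort +increasing -unique phonebook")
if hasattr(ET, "indent"):  # pretty-printing helper, available in Python 3.9+
    ET.indent(tree)
print(ET.tostring(tree, encoding="unicode"))
```

A generic converter would presumably read the BNF itself and generate this kind of code, which is where a yacc-style tool would come in.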

Note that if you just scan down and read the terminal and cdata tags, you 
reconstruct the original unparsed data.  So it's simple to see where you 
are in the raw input stream.
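The scan described above can be sketched in Python with the standard xml.etree.ElementTree module. The function name leaf_text is hypothetical. Note one limitation: the tree as shown does not record whitespace, so what comes back is the sequence of tokens rather than the exact character stream.

```python
import xml.etree.ElementTree as ET

PARSE_TREE = """
<rule name="command">
  <rule name="command-name"><terminal name="sort"/></rule>
  <rule name="argument">
    <rule name="prefix"><terminal name="+"/></rule>
    <rule name="argument-name"><terminal name="increasing"/></rule>
  </rule>
  <rule name="argument">
    <rule name="prefix"><terminal name="-"/></rule>
    <rule name="argument-name"><terminal name="unique"/></rule>
  </rule>
  <rule name="filename"><cdata value="phonebook"/></rule>
</rule>
"""

def leaf_text(root):
    """Collect terminal names and cdata values in document order."""
    pieces = []
    for elem in root.iter():  # iter() visits elements depth-first, top to bottom
        if elem.tag == "terminal":
            pieces.append(elem.get("name"))
        elif elem.tag == "cdata":
            pieces.append(elem.get("value"))
    return pieces

print(leaf_text(ET.fromstring(PARSE_TREE)))
# prints ['sort', '+', 'increasing', '-', 'unique', 'phonebook']
```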

And, if you go up the XML tree, you unwrap the production rules that were 
involved in the parse.
In other words, this is just an XML version of the parse tree.
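Going up the tree can be sketched the same way. ElementTree elements carry no parent pointer, so this Python sketch first builds a child-to-parent map; the function name rule_chain is made up for illustration.

```python
import xml.etree.ElementTree as ET

PARSE_TREE = """
<rule name="command">
  <rule name="command-name"><terminal name="sort"/></rule>
  <rule name="argument">
    <rule name="prefix"><terminal name="+"/></rule>
    <rule name="argument-name"><terminal name="increasing"/></rule>
  </rule>
  <rule name="argument">
    <rule name="prefix"><terminal name="-"/></rule>
    <rule name="argument-name"><terminal name="unique"/></rule>
  </rule>
  <rule name="filename"><cdata value="phonebook"/></rule>
</rule>
"""

def rule_chain(root, leaf):
    """Return the names of the rules enclosing leaf, innermost first."""
    # Map every child element to its parent so we can walk upward.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    chain = []
    node = parent_of.get(leaf)
    while node is not None:
        chain.append(node.get("name"))
        node = parent_of.get(node)
    return chain

root = ET.fromstring(PARSE_TREE)
leaf = root.find(".//terminal[@name='unique']")
print(rule_chain(root, leaf))
# prints ['argument-name', 'argument', 'command']
```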

So in principle we've got a way to represent CSS, ECMAScript, C++, Java, 
COBOL, Ada, anything that can be specified by BNF.    Hmmm.  Is there an 

So... comments anyone?
Any yacc gurus out there who'd like to implement this?



I took a quick look around W3C and Google and didn't find anything like 
this, but I expect that other folks have been thinking along the same 
lines.  If anyone knows of something like this, please post it, especially 
if you already have the converter coded up!

Leonard R. Kasday, Ph.D.
Institute on Disabilities/UAP and Dept. of Electrical Engineering at Temple 
(215) 204-2247 (voice)                 (800) 750-7428 (TTY)

Chair, W3C Web Accessibility Initiative Evaluation and Repair Tools Group

The WAVE web page accessibility evaluation assistant:

Received on Friday, 15 December 2000 14:47:42 UTC