New RDF Grammar-Based N3 Parser and Test Suite

Spurred on by TimBL's development of a standard grammar for N3, written
in RDF [1], I've developed an independent Notation3 parser and test suite:

    http://inamidst.com/n3p/
    - n3proc, 2004-12

The toolset is divided into three parts:

1) n3mp.py - an RDF BNF metaparser
2) n3p.py - an N3 parser
3) n3proc.py - an N3 processor

If you want to test whether n3proc works on your system, you can simply
do something along the lines of:

    $ wget http://inamidst.com/n3p/n3p.tar.gz
    $ tar -zxvf n3p.tar.gz && cd n3p
    $ ./n3proc.py test/simple-03.n3

Running n3proc.py requires only Python 2.3 or later. It should convert
the Notation3 file at test/simple-03.n3 (or whichever file you choose)
to N-Triples.
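
For anyone unfamiliar with the output format: N-Triples puts one triple
per line, with URIs in angle brackets, blank nodes as _:name, literals
in double quotes, and a " ." terminator. Here's a minimal sketch of
formatting a triple that way. The URI, BNode and Literal classes are
hypothetical stand-ins (not n3proc's own), and language tags, datatypes
and \uXXXX escaping of non-ASCII text are left out for brevity.

```python
# Hypothetical term classes; n3proc defines its own.
class URI(str):
    pass

class BNode(str):
    pass

class Literal(str):
    pass

def escape(text):
    # Escape the characters N-Triples requires inside quoted literals.
    # (Non-ASCII characters would also need \uXXXX escapes; omitted here.)
    return (text.replace('\\', '\\\\')
                .replace('"', '\\"')
                .replace('\n', '\\n')
                .replace('\r', '\\r')
                .replace('\t', '\\t'))

def term(t):
    # Render a single term in N-Triples syntax.
    if isinstance(t, URI):
        return '<%s>' % t
    if isinstance(t, BNode):
        return '_:%s' % t
    if isinstance(t, Literal):
        return '"%s"' % escape(t)
    raise TypeError('unknown term type: %r' % (t,))

def ntriple(s, p, o):
    # One triple per line, terminated by " ."
    return '%s %s %s .' % (term(s), term(p), term(o))

print(ntriple(URI('http://example.org/#sbp'),
              URI('http://xmlns.com/foaf/0.1/name'),
              Literal('Sean B. Palmer')))
# <http://example.org/#sbp> <http://xmlns.com/foaf/0.1/name> "Sean B. Palmer" .
```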

If you'd like to help me out further, you can try running the test suite
on your system and report successes:

== Test Suite ==

Running the test suite requires Pyrple, which is used to do graph
isomorphism comparisons of the test output against the reference files.
Pyrple is available from http://infomesh.net/pyrple/

    $ wget http://infomesh.net/pyrple/pyrple-2004-06-06.tar.gz
    $ tar -zxvf pyrple-2004-06-06.tar.gz
    # mv pyrple-2004-06-06 [...]/site-packages/pyrple

Alternatively, if you don't want to move it into site-packages (which is
usually in /usr/lib/python2.3 or similar; run $ slocate site-packages to
find it), you can temporarily add pyrple-2004-06-06 to your PYTHONPATH
environment variable.
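
Isomorphism, rather than a plain text comparison, is needed because two
correct outputs may list triples in a different order and pick different
labels for blank nodes. The sketch below is not Pyrple's API; it's a
brute-force check, assuming small graphs given as sets of (s, p, o)
tuples of N-Triples term strings, where blank nodes start with "_:". It
tries every bijection between the two graphs' blank nodes.

```python
from itertools import permutations

def is_bnode(term):
    # Assumes terms are N-Triples strings, so only blank nodes start "_:".
    return term.startswith('_:')

def bnodes(graph):
    # Every blank-node label used anywhere in the graph.
    return set(t for triple in graph for t in triple if is_bnode(t))

def relabel(graph, mapping):
    # Rename blank nodes according to mapping; other terms are unchanged.
    return set(tuple(mapping.get(t, t) for t in triple) for triple in graph)

def isomorphic(g1, g2):
    g1, g2 = set(g1), set(g2)
    if len(g1) != len(g2):
        return False
    b1, b2 = sorted(bnodes(g1)), sorted(bnodes(g2))
    if len(b1) != len(b2):
        return False
    # Try every bijection b1 -> b2. The cost is factorial in the number
    # of blank nodes, so this only suits small test files.
    for perm in permutations(b2):
        if relabel(g1, dict(zip(b1, perm))) == g2:
            return True
    return False
```

Real tools avoid the factorial search by first grouping blank nodes by
the shape of the triples they appear in, but for a handful of blank
nodes the naive version is easy to verify by eye.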

To run the test suite:

    $ cd n3p/test
    $ ./runtest ../n3proc.py input

It should give output along the following lines:

    $ ./runtest ../n3proc.py input
    anons-01 pass
    anons-02 pass
    anons-03 pass
    dtlang-01 pass
    dtlang-02 pass
    [...]

If any test fails, it prints a diff of the outputs, or the Python
traceback if there's been a more fundamental error. So far, I know that
it works on Cygwin with Python 2.4, Linux with Python 2.3.4, and OS X
with Python 2.3.
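
The pass/FAIL reporting above can be sketched roughly as follows. This
is a hypothetical illustration, not the actual runtest script: it
compares outputs as sorted sets of N-Triples lines, which ignores triple
order but is only adequate when no blank nodes are involved (the real
suite relies on Pyrple's isomorphism checking for that), and it uses
Python's difflib to produce the diff shown on failure.

```python
import difflib

def nonblank_lines(text):
    # Strip whitespace, drop blank lines, and ignore duplicates and order.
    return sorted(set(line.strip() for line in text.splitlines()
                      if line.strip()))

def check(name, expected, actual):
    """Return 'name pass', or 'name FAIL' followed by a unified diff.

    expected and actual are N-Triples documents as strings. Comparing
    sorted lines ignores triple order but does NOT handle blank-node
    relabelling, so this is a simplification of a real test harness.
    """
    exp, act = nonblank_lines(expected), nonblank_lines(actual)
    if exp == act:
        return '%s pass' % name
    diff = difflib.unified_diff(exp, act, 'expected', 'actual',
                                lineterm='')
    return '\n'.join(['%s FAIL' % name] + list(diff))

print(check('simple-01', '<a> <b> <c> .\n', '<a> <b> <c> .\n'))
# simple-01 pass
```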

As for the code itself, here are some explanations, requirements, and
things you can do to test each part:

== n3mp.py ==

This converts the RDF BNF that TimBL developed into a pickled Python
object that the parser can later load. It's based on TimBL's
predictiveParser.py code, only smoothed out, and it uses rdflib instead
of the SWAP tools as a backend. To run it, you'll need rdflib, which is
available from http://rdflib.net/ and can be installed as follows:

    $ wget http://rdflib.net/2004/10/14/rdflib-2.0.4.tgz
    $ tar -zxvf rdflib-2.0.4.tgz && cd rdflib-2.0.4/
    $ python setup.py install

To test n3mp.py out for me, you can do the following:

    $ cd n3p && rm n3meta.pkl
    $ ./n3mp.py

That's it. It should display a lot of warnings, but then create an
n3meta.pkl file; if it does, it has probably succeeded. To be sure, run
the Test Suite per the instructions above. If all of the tests still
pass, then you're able to regenerate the data file from the RDF BNF on
your system.
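
Pickling is what lets the parser start up without reading the RDF
grammar (and so without rdflib): n3mp.py does that work once, and the
result is stored with Python's pickle module. The layout of n3meta.pkl
isn't described here, so the sketch below assumes a hypothetical
structure, a dict mapping each production to a dict from lookahead token
to the symbols it expands to, and just shows the save/load round trip.

```python
import pickle

# Hypothetical grammar table: production -> {lookahead token: expansion}.
# The real layout of n3meta.pkl is not documented in this post.
table = {
    'statements_optional': {
        '@prefix': ['statement', '.', 'statements_optional'],
        'eof': [],
    },
    'statement': {'@prefix': ['declaration']},
    'declaration': {'@prefix': ['@prefix', 'qname', 'explicituri']},
}

def save_table(table, filename):
    # Pickles are binary data, so the file must be opened in 'wb' mode.
    with open(filename, 'wb') as f:
        pickle.dump(table, f, pickle.HIGHEST_PROTOCOL)

def load_table(filename):
    # Done once at parser start-up instead of re-reading the RDF grammar.
    with open(filename, 'rb') as f:
        return pickle.load(f)
```

The trade-off is the usual one for a generated file: the .pkl has to be
regenerated whenever the grammar changes, which is exactly what deleting
n3meta.pkl and re-running n3mp.py exercises.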

== n3p.py ==

This is an N3 parser: in its default mode it just prints an event tree
of the input it's given. This is also based on TimBL's
predictiveParser.py code, but unlike that, n3p.py does *not* require an
RDF API to run, and should be more efficient. It requires a working
n3meta.pkl file and Python 2.3 or later. Example use:

    $ ./n3p.py test/simple-02.n3

That should give the following parse tree:

    $ ./n3p.py test/simple-02.n3
    document
     statements_optional
      statement
       declaration
        @prefix @prefix
        qname :
        explicituri <#>
       /declaration
      /statement
      . .
      statements_optional
       statement
        declaration
         @prefix @prefix
         qname foaf:
         explicituri <http://xmlns.com/foaf/0.1/>
        /declaration
       /statement
       . .
       statements_optional
        statement
         declaration
          @keywords @keywords
          _:lWkznRSD44
           barename a
           _:lWkznRSD9
           /_:lWkznRSD9
          /_:lWkznRSD44
         /declaration
        /statement
        . .
        statements_optional
        /statements_optional
       /statements_optional
      /statements_optional
      eof
      /eof

For larger files it can be extremely verbose, so use with caution! The
main class in this file is used as the basis of n3proc.py.
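
To show where an event tree like the one above comes from, here's a
sketch of the general technique, a table-driven predictive (LL(1))
parser: the current token's type picks exactly one expansion for the
current production, and start/end events are emitted around it. The
table layout, the (type, text) token pairs, the event tuples and the toy
grammar are all assumptions for illustration, not n3p.py's internals;
tokenizing is skipped, and eof is treated as an ordinary token rather
than a production.

```python
class ParseError(Exception):
    pass

# Toy grammar covering only "@prefix qname <uri> ." statements.
TABLE = {
    'document': {'@prefix': ['statements_optional', 'eof'],
                 'eof': ['statements_optional', 'eof']},
    'statements_optional': {
        '@prefix': ['statement', '.', 'statements_optional'],
        'eof': []},
    'statement': {'@prefix': ['declaration']},
    'declaration': {'@prefix': ['@prefix', 'qname', 'explicituri']},
}

def parse(table, start, tokens):
    """Parse a list of (type, text) tokens; return a flat list of events.

    Any symbol not in table is a terminal matched against a token type.
    """
    tokens = list(tokens) + [('eof', '')]
    pos, events, stack = 0, [], [start]
    while stack:
        symbol = stack.pop()
        if isinstance(symbol, tuple):
            # An ('end', production) marker: that production is complete.
            events.append(symbol)
            continue
        if pos >= len(tokens):
            raise ParseError('unexpected end of input, expected %s' % symbol)
        kind, text = tokens[pos]
        if symbol in table:
            # Predict: the lookahead type selects a single expansion.
            expansion = table[symbol].get(kind)
            if expansion is None:
                raise ParseError('%s cannot start with %s %r'
                                 % (symbol, kind, text))
            events.append(('start', symbol))
            stack.append(('end', symbol))
            # Push in reverse so the first symbol is handled next.
            stack.extend(reversed(expansion))
        elif symbol == kind:
            events.append(('token', kind, text))
            pos += 1
        else:
            raise ParseError('expected %s, found %s %r' % (symbol, kind, text))
    if pos != len(tokens):
        raise ParseError('trailing input after %s' % start)
    return events

def render(events):
    # Indent one space per open production, closing with "/name".
    lines, depth = [], 0
    for event in events:
        if event[0] == 'start':
            lines.append(' ' * depth + event[1])
            depth += 1
        elif event[0] == 'end':
            depth -= 1
            lines.append(' ' * depth + '/' + event[1])
        else:
            lines.append(' ' * depth + ' '.join(p for p in event[1:] if p))
    return '\n'.join(lines)

tokens = [('@prefix', '@prefix'), ('qname', ':'),
          ('explicituri', '<#>'), ('.', '.')]
print(render(parse(TABLE, 'document', tokens)))
```

Because every decision is a single table lookup on one token, there's
no backtracking, which is one reason this style of parser can be fast.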

== n3proc.py ==

An N3 processor. In its default mode it converts Notation3 files to
N-Triples. It's also designed to be easy to override. If you want to use
it in your own applications you can, for example, do:

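    import n3proc

    # Swap in your own classes for the terms n3proc creates
    n3proc.URI = MyURI
    n3proc.bNode = MybNode
    n3proc.Literal = MyLiteral
    n3proc.Var = MyVar

    # Parse the file fn, sending the output to your own sink object
    p = n3proc.N3Processor(fn, MySink())
    p.parse()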

Furnish your own values for the My* variables; n3proc.py already
contains an example sink. It's only beta code, and wholly dependent on
TimBL's n3.n3 grammar, so it may not yet parse all N3 files. Moreover,
@forAll and @forSome aren't implemented yet, since I'm still wondering
about the odd parareification format that CWM uses and just how much of
it I should implement.
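
The override style in the example above relies on a general Python
property: a function looks up its module's global names each time it
runs, so if a processor builds terms by calling URI(...) through a
module-level name, rebinding that name from outside changes what gets
built. The sketch below demonstrates the pattern with a throwaway module
object; the module name, make_term and MyURI are hypothetical, and this
is not n3proc's code.

```python
import types

# A throwaway module standing in for n3proc (hypothetical contents).
SOURCE = '''
class URI(str):
    pass

def make_term(value):
    # URI is a global name, looked up in this module each time
    # make_term runs, so rebinding it from outside takes effect.
    return URI(value)
'''

proc = types.ModuleType('proc_sketch')
exec(SOURCE, proc.__dict__)

class MyURI(str):
    """A caller-supplied replacement term class."""

before = proc.make_term('http://example.org/')
proc.URI = MyURI  # the same move as n3proc.URI = MyURI
after = proc.make_term('http://example.org/')

print(type(before).__name__)  # URI
print(type(after).__name__)   # MyURI
```

The catch with this design is that the rebinding is global to the
module, so two applications in one process can't use different term
classes at the same time.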

I'm not sure what, if anything, this'll be used for. I doubt it'll be
swapped into the SWAP code as it were, but it proves that it's possible
to create an N3 processor from the RDF BNF grammar, and it may well work
as a good independent add-on for rdflib, Pyrple, and any other Python
RDF APIs that need Notation3 capabilities.

Feedback on any of the topics herein would be greatly appreciated.

Thanks,

[1] http://www.w3.org/2000/10/swap/grammar/

-- 
Sean B. Palmer, http://inamidst.com/sbp/

Received on Friday, 10 December 2004 05:34:28 UTC