N-Triples Parser from Sean B. Palmer on 2004-10-19 (www-archive@w3.org from October 2004)

From: Sean B. Palmer <sean+wa@infomesh.net>
Date: Tue, 19 Oct 2004 21:11:44 +0100
To: eikeon@eikeon.com
CC: www-archive@w3.org
Message-ID: <41757500.406@infomesh.net>

Hi eikeon,

Noting that both of our N-Triples parsers are rather bug-ridden, I've
written a replacement for them both:

    http://inamidst.com/proj/rdf/ntriples.py
    - N-Triples Parser

It won't slot into either rdflib or pyrple as it stands, but making it
fit into rdflib should be a trivial matter of writing a five-or-so-line
wrapper. Making it slot into pyrple might take a bit of rewriting,
since, having seen rdflib, I've reconsidered some of the architecture.
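Such a wrapper might look roughly like the following. It is a hypothetical sketch, not code from either library. It assumes the parser reports each statement by calling a `triple(s, p, o)` method on a sink object, and that the graph accepts statements through an rdflib-style `add((s, p, o))`. The name `GraphSink` is invented for illustration.

```python
class GraphSink:
    """Hypothetical adapter between a standalone parser and an rdflib-style graph.

    Assumes the parser calls sink.triple(s, p, o) once per statement and
    that the graph stores statements via graph.add((s, p, o)).
    """

    def __init__(self, graph):
        self.graph = graph

    def triple(self, s, p, o):
        # Forward each parsed statement to the graph as a single tuple.
        self.graph.add((s, p, o))
```

Any object with an `add` method that takes a tuple works here. For quick testing, even a plain Python `set` will do.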

Generally, I think it's a good idea to have the parsers do all the
real parsing work instead of retaining any of the unparsed input. So
rather than pretending that unquoting can be independent of parsing,
ntriples.unquote now does all of that work (hopefully without any of
the bugs of our old versions!). It's updated to the latest version of
the specification, so the regular expression no longer accepts a
literal with both a language and a datatype.
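Both points can be sketched in Python. This is a minimal illustration under our own assumptions, not the code in ntriples.py. The names `r_literal`, `r_safe`, `r_escape` and this `unquote` are hypothetical. The grammar follows the 2004 N-Triples definition as we understand it: printable ASCII plus the `\t`, `\n`, `\r`, `\"`, `\\`, `\uXXXX` and `\UXXXXXXXX` escapes, with lowercase language tags.

The regular expression accepts either a language tag or a datatype, but never both. `unquote` does the real decoding work. It consumes runs of safe characters with a single regexp, decodes escapes, and rejects anything it doesn't recognise.

```python
import re

# A literal: a quoted (still-escaped) body, then optionally EITHER a
# language tag OR a ^^<datatype> URI reference, but never both.
r_literal = re.compile(
    r'"((?:[^"\\]|\\.)*)"'            # group 1: escaped string body
    r'(?:@([a-z]+(?:-[a-z0-9]+)*)'     # group 2: language tag, or...
    r'|\^\^<([^>]+)>)?'                # group 3: datatype URI
)

# Runs of printable ASCII other than '"' (0x22) and '\' (0x5C).
r_safe = re.compile(r'[\x20\x21\x23-\x5B\x5D-\x7E]+')
# One escape sequence: simple, 4-digit or 8-digit hex (uppercase only).
r_escape = re.compile(r'\\(?:([tnr"\\])|u([0-9A-F]{4})|U([0-9A-F]{8}))')
SIMPLE = {'t': '\t', 'n': '\n', 'r': '\r', '"': '"', '\\': '\\'}


def unquote(s):
    """Decode the body of an N-Triples string literal, rejecting bad input."""
    result = []
    pos = 0
    while pos < len(s):
        m = r_safe.match(s, pos)
        if m:                      # fast path: a whole run of safe characters
            result.append(m.group())
            pos = m.end()
            continue
        m = r_escape.match(s, pos)
        if not m:
            raise ValueError('illegal character or escape at offset %d' % pos)
        simple, u4, u8 = m.groups()
        if simple:
            result.append(SIMPLE[simple])
        else:
            codepoint = int(u4 or u8, 16)
            if codepoint > 0x10FFFF:
                raise ValueError('codepoint out of range: %X' % codepoint)
            result.append(chr(codepoint))
        pos = m.end()
    return ''.join(result)
```

For example, `r_literal.fullmatch(r'"caf\u00E9"@fr')` captures the body `caf\u00E9` and the language `fr`, and `unquote` turns that body into `café`. A literal such as `"x"@en^^<http://example.org/dt>` does not match at all.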

I've been careful to make this code as efficient as possible, even
going so far as to benchmark a couple of different approaches for
passing through safe literal characters (regexp won over sets). It
reads buffered input via a custom readline method, does a recursive
descent/regexp parse on each line, and even handles the fact that URIs
in N-Triples can use two different escaping mechanisms. It's not as
flexible as my old pyrple module (which optionally allowed universally
quantified variables and literals as subjects), but adding that would
only take a few changes to some of the highly modularised methods.
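The input handling can be sketched as follows. This is again a hypothetical illustration, not the module's code. `LineReader`, `uriref` and the 2048-character buffer size are our own choices. We also assume that the "two escaping mechanisms" are N-Triples' own `\uXXXX`/`\UXXXXXXXX` escapes on one side and URI %-escaping on the other.

The reader pulls fixed-size chunks and hands back one line at a time, accepting CR, LF or CR LF terminators. `uriref` shows one plausible way to reconcile the two escaping schemes. It decodes the N-Triples escapes, then %-encodes any non-ASCII character as its UTF-8 bytes.

```python
import io
import re

r_eoln = re.compile(r'\r\n|\r|\n')   # N-Triples: eoln ::= cr | lf | cr lf
r_uniesc = re.compile(r'\\u([0-9A-F]{4})|\\U([0-9A-F]{8})')


class LineReader:
    """Hypothetical buffered reader: reads fixed-size chunks, returns lines."""

    def __init__(self, stream, bufsize=2048):
        self.stream = stream
        self.bufsize = bufsize
        self.buffer = ''

    def readline(self):
        """Return the next line without its terminator, or None at end of input."""
        while True:
            # Note: a CR LF pair split across two chunks yields an extra empty
            # line; that is harmless because N-Triples ignores blank lines.
            m = r_eoln.search(self.buffer)
            if m:
                line = self.buffer[:m.start()]
                self.buffer = self.buffer[m.end():]
                return line
            chunk = self.stream.read(self.bufsize)
            if not chunk:
                # End of input: flush a final line that had no terminator.
                if self.buffer:
                    line, self.buffer = self.buffer, ''
                    return line
                return None
            self.buffer += chunk


def uriref(text):
    """Decode N-Triples \\u/\\U escapes, then %-encode non-ASCII as UTF-8 bytes."""
    uri = r_uniesc.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)
    return ''.join(
        c if ord(c) < 0x80 else ''.join('%%%02X' % b for b in c.encode('utf-8'))
        for c in uri
    )
```

With this approach, `uriref(r'http://example.org/caf\u00E9')` and a URI already written as `http://example.org/caf%C3%A9` end up as the same string.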

Of course, this is just the very start: the low-hanging fruit. It'd be
nice if we could start thinking about some of the fundamental design
issues. A few examples:

* Should there be separate stores for database/in-memory, or should
that be configurable as an option to Graph/TripleStore?
* What should we call Graph/TripleStore anyway? I named it Graph after
amk's sketch of an RDF API [1], where he says: "g = rdf.Graph() # Call
this model/dataset/something else?"
* Do we really need to have a separate Schema/Ontology class?

Have you had a chance to go through the pyrple code yet? I'd really
like to take the union of features from our APIs as far as possible,
and then have separate layers building on top of that. For example,
rdflib's subject_predicates is a convenience function (and I feel it
makes code less readable), as is my getRules method, etc.
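To make the layering concrete, here's a hypothetical minimal graph. A single pattern-matching method, `triples`, is the core API, and `subject_predicates` is a convenience built purely on top of it. The class is illustrative only and is not rdflib's implementation. It mimics the rdflib style of passing `(s, p, o)` tuples, with `None` as a wildcard in patterns.

```python
class Graph:
    """Hypothetical minimal graph: one core query method plus a convenience."""

    def __init__(self):
        self._triples = set()

    def add(self, triple):
        self._triples.add(triple)

    def triples(self, pattern):
        """Core API: yield stored triples matching pattern; None is a wildcard."""
        for triple in self._triples:
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                yield triple

    # Convenience layer: expressible entirely in terms of triples().
    def subject_predicates(self, obj=None):
        """Yield (subject, predicate) pairs for triples whose object is obj."""
        for s, p, _ in self.triples((None, None, obj)):
            yield s, p
```

Keeping conveniences like this in a separate layer means a new store only has to implement `add` and `triples`.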

Can't wait until we compare our query approaches :-)

[1] http://www.amk.ca/conceit/rdf-interface.html

-- 
Sean B. Palmer, http://inamidst.com/sbp/

Received on Tuesday, 19 October 2004 20:12:16 UTC