Re: N-Triples Parser from Daniel Krech on 2004-10-20 (www-archive@w3.org from October 2004)

From: Daniel Krech <eikeon@eikeon.com>
Date: Wed, 20 Oct 2004 16:55:37 -0400
To: "Sean B. Palmer" <sean+wa@infomesh.net>
Cc: www-archive@w3.org
Message-Id: <6507102E-22DA-11D9-B2DB-000A95C4E68C@eikeon.com>

On Oct 19, 2004, at 4:11 PM, Sean B. Palmer wrote:

>
> Hi eikeon,
>
> Noting that both of our N-Triples parsers are rather bug-ridden, I've
> written a replacement for them both:
>
>    http://inamidst.com/proj/rdf/ntriples.py
>    - N-Triples Parser

Looks much better. Nice job.

> It won't slot into either rdflib or pyrple as it is at the moment, but
> to make it fit into rdflib it should be a trivial matter of writing a
> five-or-so line wrapper. To make it slot into pyrple might take a bit
> of rewriting since having seen rdflib I've reconsidered some of the
> architecture.

If you like, I'll go ahead and hook it into rdflib. It'd be nice to
release it as part of rdflib. Is this something that interests you? If
so, we should talk about licensing and how to coordinate the two.

> Generally, I think it's a good idea to have the parsers do all the
> real parsing work instead of retaining any of the unparsed input, so
> instead of pretending that that can be independent, ntriples.unquote
> is now doing all of that work--and hopefully without any of the bugs
> of our old versions! It's updated to the latest version of the
> specification, so the regular expression now doesn't allow you to
> parse a literal with both a language and a datatype.
>
> I've been careful to make this code as efficient as possible, even
> going so far as to benchmark a couple of different approaches for
> passing through safe literal characters (regexp won over sets). It
> reads buffered input via a custom readline method, and then does a
> recursive descent/regexp parse on the lines, and even handles the fact
> that URIs in N-Triples can use two different escaping mechanisms. It's
> not as flexible as my old pyrple module (which allowed universally
> quantified variables and literals as subjects, optionally), but all it
> requires to modify it are a few changes to some of the highly
> modularised methods.
>
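For reference, a minimal sketch of the literal rule described above: a
quoted string followed by at most one of a language tag or a datatype,
with the escapes decoded up front. This is not taken from ntriples.py;
the names LITERAL, ESCAPE, SIMPLE, unquote and parse_literal here are
illustrative assumptions, and URI escaping is left out.

```python
import re

# Hypothetical pattern, not copied from ntriples.py: a quoted string
# followed by at most one of "@lang" or "^^<datatype>".
LITERAL = re.compile(
    r'"((?:[^"\\]|\\.)*)"'           # quoted body, backslash escapes allowed
    r'(?:@([a-z]+(?:-[a-z0-9]+)*)'   # optional language tag
    r'|\^\^<([^>]*)>)?$'             # or a datatype URI, never both
)

# One escape: \uXXXX, \UXXXXXXXX, or a backslash plus a single character.
ESCAPE = re.compile(r'\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)')
SIMPLE = {'t': '\t', 'n': '\n', 'r': '\r', '"': '"', '\\': '\\'}


def unquote(body):
    """Decode N-Triples string escapes so no unparsed input is retained."""
    def replace(match):
        token = match.group(1)
        if len(token) > 1:
            # \uXXXX or \UXXXXXXXX: the hex digits follow the u/U marker.
            return chr(int(token[1:], 16))
        if token in SIMPLE:
            return SIMPLE[token]
        raise ValueError('bad escape: \\' + token)
    return ESCAPE.sub(replace, body)


def parse_literal(text):
    """Return (value, language, datatype); reject language plus datatype."""
    match = LITERAL.match(text)
    if match is None:
        raise ValueError('not a valid literal: %r' % text)
    body, language, datatype = match.groups()
    return unquote(body), language, datatype
```

With this sketch, parse_literal('"chat"@fr') gives the value with a
language and no datatype, while a literal carrying both a language and
a datatype raises ValueError because the pattern cannot match it.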
> Of course, this is just the very start--the low hanging fruit. It'd be
> nice if we could start thinking about some of the fundamental design
> issues; a few examples:

As we go after some of this low-hanging fruit, we also need to figure
out the shape of our joint effort. But I think going after some of the
technical issues will be good input for that discussion.

> * Should there be separate stores for database/in-memory, or should
> that be configurable as an option to Graph/TripleStore?

rdflib has done both to date and has moved to having it be an option of 
the TripleStore (specified in the constructor).
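
To make the constructor-option idea concrete, here is a rough sketch.
It is not rdflib's actual code: the InMemoryBackend class, the backend
parameter and the method names are assumptions made up for
illustration. The point is only that the store delegates to whatever
backend it is handed, so a database-backed class with the same two
methods could be swapped in.

```python
class InMemoryBackend:
    """Hypothetical backend that keeps triples in a Python set."""

    def __init__(self):
        self._triples = set()

    def add(self, triple):
        self._triples.add(triple)

    def triples(self):
        return iter(self._triples)


class TripleStore:
    """Illustrative store; the backend is chosen in the constructor."""

    def __init__(self, backend=None):
        # Default to the in-memory backend when none is specified.
        self.backend = backend if backend is not None else InMemoryBackend()

    def add(self, triple):
        self.backend.add(triple)

    def __iter__(self):
        return self.backend.triples()

    def __len__(self):
        return sum(1 for _ in self.backend.triples())


# Usage: the calling code is the same whichever backend is passed in.
store = TripleStore(backend=InMemoryBackend())
store.add(('ex:a', 'ex:knows', 'ex:b'))
```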

> * What should we call Graph/TripleStore anyway? I named it Graph after
> amk's sketch of an RDF API [1], where he says: "g = rdf.Graph() # Call
> this model/dataset/something else?"

Some care was taken in aligning rdflib with the terminology used in the
RDF: Concepts and Abstract Syntax [1] document. One exception is the
use of TripleStore, which should have been RDFGraph instead. So, I
guess I like RDFGraph from that perspective.

> * Do we really need to have a separate Schema/Ontology class?

Not sure I understand the question. Separate from the Graph class, or
rdflib's Schema class separate from pyrple's Ontology class? The Schema
class is kept separate in rdflib for code organization purposes, but
is mixed into the Graph (TripleStore).
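
Roughly, the mix-in arrangement looks like the sketch below. This is
an illustration only, not the rdflib source: the subclasses method,
the RDFS_SUBCLASSOF constant and the tuple-based triples are
assumptions. The schema helpers live in their own class and rely on
the graph class to supply triples().

```python
class Schema:
    """Hypothetical mix-in; the host class must provide triples()."""

    RDFS_SUBCLASSOF = 'http://www.w3.org/2000/01/rdf-schema#subClassOf'

    def subclasses(self, cls):
        # Yield every subject declared as a direct subclass of cls.
        for s, p, o in self.triples():
            if p == self.RDFS_SUBCLASSOF and o == cls:
                yield s


class Graph(Schema):
    """Illustrative graph that picks up the schema helpers by inheritance."""

    def __init__(self):
        self._triples = set()

    def add(self, triple):
        self._triples.add(triple)

    def triples(self):
        return iter(self._triples)


g = Graph()
g.add(('ex:Dog', Schema.RDFS_SUBCLASSOF, 'ex:Animal'))
g.add(('ex:Rex', 'ex:name', 'Rex'))
dogs = list(g.subclasses('ex:Animal'))
```

The schema code stays in its own file for organization, but callers
only ever see one object.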

> Have you had a chance to go through the pyrple code yet? I'd really
> like to just take the union of features as much as possible from our
> APIs, then we can have separate stuff building on top of that. For
> example, rdflib's subject_predicates is a convenience function (and I
> feel it makes code less readable), as is my getRules method etc.

I have had a chance to look at pyrple and like the idea of taking the
union of the features. Any thoughts on how to proceed? We could start
by working in parallel and cross-pollinating -- this might work for the
very short term, but I'm not sure how appealing it is to either of us
beyond that. Another option that might work well is to have (keep)
rdflib as the place for the stable low-level bits to fall. Then we
could have one (or more) pieces on top of it for the less stable or
higher-level functionality. Perhaps we can chat about this topic online
some more soon?

> Can't wait until we compare our query approaches :-)

I'll probably start by looking closer at SPARQL and what Ivan's done in 
that direction already.

> [1] http://www.amk.ca/conceit/rdf-interface.html
>
> -- 
> Sean B. Palmer, http://inamidst.com/sbp/
>


[1] http://www.w3.org/TR/rdf-concepts/

--
Daniel Krech, http://eikeon.com/
Received on Wednesday, 20 October 2004 20:57:05 UTC