Re: [ANN] Nemo (i.e. why I wrote Nemo) from Damian Steer on 2004-12-01 (www-rdf-interest@w3.org from December 2004)

From: Damian Steer <damian.steer@hp.com>
Date: Wed, 1 Dec 2004 09:46:41 +0000
To: James Cerra <jfcst24_public@yahoo.com>
Cc: www-rdf-interest@w3.org, Laurian Gridinoc <laurian@gmail.com>
Message-Id: <E74707DE-437D-11D9-8C0A-000D932B9016@hp.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 30 Nov 2004, at 16:40, James Cerra wrote:

> IMHO, there are a few issues with the design of
> Treehugger and RDF Twig (and Triplestore, from an
> initial look):
>
>   * I don't think that XPath by itself should be used
> to query RDF.  It was designed for searching trees and
> not arbitrary graphs.  So RDF Twig, Treehugger, and
> others reserialize the graph into documents that are
> easier to use with XPath.  However, this makes it hard
> to construct queries if you don't know the details of
> the initial serialization.  (i.e. Should I query for
> ./@rdf:resource or ./rdf:resource?)

That's not really true of either RDFTwig or treehugger. Both work over 
jena models and are consequently unaware of the initial serialisation. 
RDFTwig allows you to issue a simple query and construct a tree (which 
xpath can work with). By contrast treehugger is a silly trick to expose 
a jena model as Saxon's internal tree model. It looks like rdf/xml, but 
that's by construction.

Should you query for @rdf:resource or rdf:resource? Well, that's a bad 
example (it's an attribute), but should you query for @foaf:name or 
foaf:name? In treehugger you can do either since either could happen in 
rdf/xml.
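
To make the attribute-versus-element point concrete, here is a minimal
sketch. It is not treehugger's code: it assumes Python's standard
ElementTree in place of Saxon, and the example.org URI and the names()
helper are made up for illustration. It only shows that rdf/xml lets
the same triple appear as a property element or a property attribute,
so a tree view that looks like rdf/xml can reasonably answer both.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
NS = {"rdf": RDF, "foaf": FOAF}

# The same triple written two ways (the example.org URI is hypothetical).
AS_ELEMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <rdf:Description rdf:about="http://example.org/damian">
    <foaf:name>Damian</foaf:name>
  </rdf:Description>
</rdf:RDF>"""

AS_ATTRIBUTE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <rdf:Description rdf:about="http://example.org/damian" foaf:name="Damian"/>
</rdf:RDF>"""


def names(xml_text):
    """Collect foaf:name values whether they are elements or attributes."""
    root = ET.fromstring(xml_text)
    found = []
    for desc in root.findall("rdf:Description", NS):
        # Property element form, i.e. ./foaf:name
        found += [el.text for el in desc.findall("foaf:name", NS)]
        # Property attribute form, i.e. ./@foaf:name
        # (ElementTree stores namespaced attributes in {uri}local form.)
        attr = desc.get(f"{{{FOAF}}}name")
        if attr is not None:
            found.append(attr)
    return found
```

Both names(AS_ELEMENT) and names(AS_ATTRIBUTE) give the same single
value, which is the sense in which "either could happen".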

>   * The results are returned as another XML document
> or XML fragment.  This means that another tree has to
> be built - isn't that inefficient?  If there are a lot
> of nodes queried, this could mean a lot of memory is
> used.

Hmm. Well, the idea is to go from an rdf graph to xml, so it's a feature 
:-) If you're asking about the tree constructed from the model which is 
subsequently queried: in treehugger this is lazily evaluated so it 
shouldn't be too bad. Since I wrote treehugger I can authoritatively 
tell you that it could be more inefficient :-) I've been using it on 
models with a few hundred thousand triples and it's pretty nippy.
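
As a rough illustration of what lazy evaluation buys here, the sketch
below wraps a list of triples in tree nodes whose children are only
computed when something asks for them. This is an assumption about the
general technique, not treehugger's implementation; the LazyNode class,
the expansions counter and the ex: names are all hypothetical.

```python
from functools import cached_property

# A tiny graph as (subject, predicate, object) triples; names are made up.
TRIPLES = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:a"),  # a cycle: harmless, nothing is expanded eagerly
    ("ex:a", "ex:name", '"A"'),
]


class LazyNode:
    """Tree view of one resource; children are built on first access only."""

    expansions = 0  # how many nodes have actually had their children built

    def __init__(self, graph, resource):
        self.graph = graph
        self.resource = resource

    @cached_property
    def children(self):
        # Runs once per node, and only if a query walks into this node.
        LazyNode.expansions += 1
        return [
            (p, LazyNode(self.graph, o))
            for s, p, o in self.graph
            if s == self.resource
        ]
```

Creating LazyNode(TRIPLES, "ex:a") does no work on the graph; only the
parts of the tree that a query actually visits get materialised, which
is also why the cycle between ex:a and ex:b does not blow up.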

>   * They seem to concentrate on queries stored in
> separate files.  For my purposes, it is more
> advantageous to query rdf data that is embedded in a
> file.

Well, the queries are over models which could be database backed and/or 
have inferencing. As for data embedded in files, well if ARP can parse 
it we can use it.

You might want to look at what DAWG [1] have been doing. There are two 
things of interest there:

1) If you want to make a query that returns rdf, CONSTRUCT (from Sesame 
originally) is very interesting.

2) Queries can return results in an XML format [2] that can 
subsequently be transformed with xslt or XQuery. Alberto Reggiori and 
Andy Seaborne have demonstrations of this (I'll try to find the 
references if you're interested).
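
For point 1, the idea behind a CONSTRUCT-style query can be sketched in
a few lines: match a graph pattern, then fill in a template with each
set of bindings to produce new triples. This is only an illustration of
the concept in plain Python, not DAWG's or Sesame's syntax or
semantics; match(), solve(), construct() and the ex: terms are
hypothetical.

```python
def match(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, else None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            # Constant term that does not match.
            return None
    return new


def solve(patterns, graph):
    """All variable bindings that satisfy every pattern against the graph."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            b2
            for b in bindings
            for t in graph
            if (b2 := match(pattern, t, b)) is not None
        ]
    return bindings


def construct(template, where, graph):
    """Build a new set of triples by instantiating template per solution."""
    out = set()
    for b in solve(where, graph):
        for tpl in template:
            out.add(tuple(b.get(term, term) for term in tpl))
    return out


GRAPH = [
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:b", "foaf:name", "Bob"),
]

RESULT = construct(
    template=[("?x", "ex:knowsName", "?n")],
    where=[("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")],
    graph=GRAPH,
)
# RESULT == {("ex:a", "ex:knowsName", "Bob")}
```

The point of interest is that the result is itself a graph, so it can
be fed straight back into anything that consumes rdf.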

Hope this has clarified some things, and good luck with Nemo

Damian

[1] http://www.w3.org/2001/sw/DataAccess/
[2] The actual format is currently being discussed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iD8DBQFBrZMHAyLCB+mTtykRAkdfAJ9uERRkH7nwn8T4p8+gcdMWY6EIEwCgsjW8
E4CaQgROiA9nYz6urotT34k=
=nJwd
-----END PGP SIGNATURE-----

Received on Wednesday, 1 December 2004 09:47:50 UTC