
Re: API for querying a set of RDF graphs?

From: Benja Fallenstein <b.fallenstein@gmx.de>
Date: Thu, 20 Mar 2003 23:34:11 +0100
Message-ID: <3E7A41E3.2060006@gmx.de>
To: Reto Bachmann-Gmuer <reto@gmuer.ch>
CC: www-rdf-interest@w3.org

Hi Reto,

Reto Bachmann-Gmuer wrote:
> I'm wondering if it's a good idea to have multiple graphs. Wouldn't it 
> be more flexible to reify the statements depending on their source, in 
> the style "a says (s p v)"? In this case the source is determined by a 
> resource within your graph instead of a separate graph. You can then add 
> properties to the source, such as "a hasTrust 0.7", and use inference to 
> query all statements said by someone with trust higher than 0.5.
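(The "a says (s p v)" idea plus trust filtering could be sketched roughly like this, using plain Python tuples in place of a real RDF store; the source names and trust values are invented for illustration:)

```python
# Each reified statement is tagged with its source:
# (source, subject, predicate, object) -- i.e. "a says (s p v)"
statements = [
    ("alice", "s1", "p1", "v1"),
    ("bob",   "s2", "p2", "v2"),
    ("carol", "s3", "p3", "v3"),
]

# Trust is just another property of the source resource,
# in the style of "a hasTrust 0.7"
trust = {"alice": 0.7, "bob": 0.3, "carol": 0.9}

def trusted_statements(min_trust=0.5):
    """All (s, p, o) said by a source whose trust exceeds the threshold."""
    return [(s, p, o) for src, s, p, o in statements
            if trust.get(src, 0.0) > min_trust]

print(trusted_statements())  # only alice's and carol's statements survive
```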

This would be quite nice indeed-- if it is achievable. The problem I 
see is that this allows one to formulate rather complex queries crossing 
graph boundaries, like "According to someone trusted, A is related to B 
through a property which someone who according to our own data is an 
employee of W3C has said is a sub-property of a property that is used in 
a graph I have written." We could execute such a query if we loaded all 
the graphs into memory at the same time, but that doesn't seem very scalable.

If we match against each graph separately, of course such a query 
wouldn't be possible...

In a p2p system, one idea that occurred to me was to use a distributed 
hashtable mapping (node occurring in graph) -> (location of graph). This 
would allow me to find graphs relevant to a query by asking the 
distributed hashtable for graphs containing the nodes from the query. 
Again, the problem seems more manageable if we only look at one graph at 
a time.
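(To make the idea concrete, here is a rough sketch of such a node -> graph-location index; an ordinary dict stands in for the distributed hashtable, and the peer/graph names are made up:)

```python
from collections import defaultdict

# Maps each RDF node to the locations of graphs that mention it
index = defaultdict(set)

def publish(graph_location, nodes):
    """A peer announces that its graph mentions these nodes."""
    for node in nodes:
        index[node].add(graph_location)

def candidate_graphs(query_nodes):
    """Graphs relevant to a query: those containing any node from it."""
    graphs = set()
    for node in query_nodes:
        graphs |= index.get(node, set())
    return graphs

publish("peer1:graphA", {"ex:Person", "ex:Benja"})
publish("peer2:graphB", {"ex:Person", "ex:Reto"})

print(candidate_graphs({"ex:Benja", "ex:Person"}))
```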

> PS: Does your project have a URL? I'd like to find out more. To see 
> the p2p/semantic web project I'm involved in, check out http://wymiwyg.org/mies

Thank you for your interest, but unfortunately we currently don't have a 
homepage. :-( Sorry. We're working on it.

I'll try to give you a quick overview.

The project I'm talking about here is called Storm, and it's an 
outgrowth of a bigger project called Fenfire. Fenfire is targeted to be 
a desktop environment built around RDF, where different applications are 
implemented as views on top of a single RDF model. This will allow data 
from different applications to be interconnected at a fine-grained 
level; for example, while I'm looking at an email from some person, I 
could quickly look at appointments I've scheduled with that person, or 
at that person's FOAF data, since they're all properties of the same RDF 
node. (We're more of a hypertext project, so the connectivity is more 
important to us than the semantics, but it's obvious that the RDF data 
model used by the applications will also allow interesting semantic 
queries over the data on one's computer.)

Storm (for 'storage module') started out as a supporting technology for 
Fenfire, but will also be usable independently. (A Storm release 
independent of Fenfire should happen Real Soon Now :-) ). Storm can be 
thought of as a replacement for files, making versioning and 
distributing data among computers more convenient.

Storm stores data in *blocks*, byte sequences not unlike files, but 
identified by a cryptographic content hash (making them immutable, since 
if you change a block's content, you have a different hash and thus a 
different block). This allows you to refer to one specific version of 
something and authenticate it with the hash (this is potentially 
very useful for importing ontologies into an RDF graph). You can 
retrieve a block from any source-- some server or peer or your local 
computer or an email attachment etc.-- since you can always verify its 
authenticity. Because different blocks have different names, 
synchronizing the data on two computers is simply a matter of copying 
each block that exists on only one of them to the other (convenient 
e.g. for users with both a laptop and a desktop). When you create a new 
version of something, the old versions don't get overwritten, because 
creating a new block doesn't affect the old ones. The list goes on :-)
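(A minimal sketch of such a content-addressed block store, assuming SHA-1 as the hash and with all class/function names invented for illustration-- not Storm's actual API:)

```python
import hashlib

def block_id(data: bytes) -> str:
    """A block's identifier is simply the hash of its content."""
    return hashlib.sha1(data).hexdigest()

class BlockStore:
    def __init__(self):
        self.blocks = {}  # hash -> bytes

    def add(self, data: bytes) -> str:
        """Store a block; changing the content would yield a new id."""
        bid = block_id(data)
        self.blocks[bid] = data
        return bid

    def get(self, bid: str) -> bytes:
        """A block from *any* source can be verified by rehashing it."""
        data = self.blocks[bid]
        assert block_id(data) == bid, "corrupted or forged block"
        return data

def sync(a, b):
    """Synchronize two stores by copying over the missing blocks."""
    for bid, data in a.blocks.items():
        b.blocks.setdefault(bid, data)
    for bid, data in b.blocks.items():
        a.blocks.setdefault(bid, data)

laptop, desktop = BlockStore(), BlockStore()
bid = laptop.add(b"version 1 of my document")
desktop.add(b"some other block")
sync(laptop, desktop)           # now both stores hold both blocks
assert desktop.get(bid) == b"version 1 of my document"
```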

So anyway, there are a number of reasons why we need to do powerful 
queries over a set of Storm blocks. For example, since we use hashes as 
the identifiers for blocks, we don't have file names as hints to humans 
about their content; instead, we'll use RDF metadata, stored in *other* 
blocks. As a second example, on top of the unchangeable blocks, we need 
to create a notion of updateable, versioned resources. We do this by 
creating metadata blocks saying e.g., "Block X is the newest version of 
resource Y as of 2003-03-20T22:29:25Z" and searching for the newest such 
statement. (This design also provides a clean separation between 
changeable resources and individual versions of a resource, which are of 
course resources themselves.)
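(The "newest version" search could be sketched like this; the record layout and resource names are hypothetical, but the principle-- take the latest "X is the newest version of Y as of <time>" statement-- is the one described above:)

```python
# Metadata records: (resource, block, iso_timestamp), each one saying
# "this block is the newest version of this resource as of this time"
version_records = [
    ("urn:resource:Y", "block-v1", "2003-03-19T10:00:00Z"),
    ("urn:resource:Y", "block-v2", "2003-03-20T22:29:25Z"),
    ("urn:resource:Z", "block-a",  "2003-03-18T08:00:00Z"),
]

def current_version(resource):
    """Latest claimed version (ISO 8601 UTC timestamps sort lexically)."""
    records = [r for r in version_records if r[0] == resource]
    if not records:
        return None
    return max(records, key=lambda r: r[2])[1]

print(current_version("urn:resource:Y"))  # -> block-v2
```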

Of course this is just scratching the surface-- I hope to have a page 
about it up sometime soon. We have published a short paper about this 
technology, and a full article is currently under consideration for 
Hypertext'03; if you'd like, I can send you a copy by private mail.

Thanks for your interest,
- Benja
Received on Thursday, 20 March 2003 16:34:39 UTC
