P2P and RDF (was Re: API for querying a set of RDF graphs?)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Benja

>> I'm wondering if it's a good idea to have multiple graphs. Wouldn't
>> it be more flexible to reify the statements depending on their source,
>> in the style "a says (s p v)"? In this case the source is determined
>> by a resource within your graph instead of a separate graph. You can
>> then add properties to the source, such as "a hasTrust 0.7", and use
>> inference to query all statements said by someone with trust higher
>> than 0.5.
>
> This would be quite nice indeed-- if it is achievable. The problem I
> see is that this allows formulating rather complex queries crossing
> graph boundaries, like "According to someone trusted, A is related to
> B through a property which someone who, according to our own data, is
> an employee of W3C has said is a sub-property of a property that is
> used in a graph I have written." We could execute such a query if we
> loaded all the graphs into memory at the same time, but that doesn't
> seem very scalable.
>
> If we match against each graph separately, of course such a query 
> wouldn't be possible...
>
> On a p2p system, one idea that occurred to me was to use a distributed 
> hashtable mapping (node occurring in graph) -> (location of graph). 
> This would allow me to find graphs relevant to a query by asking the 
> distributed hashtable for graphs containing the nodes from the query. 
> Again, the problem seems more manageable if we only look at one graph 
> at a time.
I think this mechanism could be integrated into the Graph/Model
implementation to generally improve the speed of reification and of
frequent related queries.
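
For illustration, here is a rough sketch (in Python, with made-up
names) of such a (node occurring in graph) -> (location of graph)
index; in a real P2P system the dictionary would of course live in a
distributed hashtable rather than in local memory:

    from collections import defaultdict

    node_index = defaultdict(set)   # node URI -> locations of graphs mentioning it

    def publish_graph(location, triples):
        # register every node of the graph under the graph's location
        for s, p, o in triples:
            for node in (s, p, o):
                node_index[node].add(location)

    def candidate_graphs(query_nodes):
        # graphs worth fetching for a query: those mentioning any query node
        return set().union(*(node_index[n] for n in query_nodes if n in node_index))
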
As far as the application layer is concerned, I think the power of RDF
in a P2P environment gets lost when the application separates the
different "knowledge worlds" too much. I think a P2P application should
essentially be able to adapt its own knowledge based on the peers it is
exposed to; in practice this means that the application doesn't (just)
store an index of the peers, but focuses on the conflicts and
intersections between the models. Intersections reinforce the belief in
the statements, contradictions weaken it. By analyzing similarities
between the sets of conflicts and correspondences associated with each
peer, an application could determine which peers are most likely to
give relevant answers to a certain query (using collaborative filtering
techniques, e.g. http://www.movielens.umn.edu/).
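
A toy sketch of the reification-plus-trust filtering you quoted above,
and of the intersection/contradiction scoring I have in mind (Python,
invented names; nothing here is meant as a definitive design):

    # "a says (s p v)" reification: statement -> source, plus trust per source
    def trusted_statements(said_by, trust, threshold=0.5):
        return {stmt for stmt, source in said_by.items()
                if trust.get(source, 0.0) > threshold}

    # belief in a statement: reinforced by peers that also assert it,
    # weakened by peers whose model contradicts it
    def belief(statement, peer_models, contradicts):
        support  = sum(1 for model in peer_models if statement in model)
        conflict = sum(1 for model in peer_models
                       if any(contradicts(statement, other) for other in model))
        return support - conflict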

> (snip)
>
> Storm stores data in *blocks*, byte sequences not unlike files, but 
> identified by a cryptographic content hash (making them immutable, 
> since if you change a block's content, you have a different hash and 
> thus a different block). This allows you to make a reference to one 
> specific version of something, authenticable with the hash (this is 
> potentially very useful for importing ontologies into an RDF graph). 
> You can retrieve a block from any source-- some server or peer or your 
> local computer or an email attachment etc.-- since you can always 
> check the authenticity. Because different blocks have different names, 
> to synchronize the data on two computers, you can simply copy all 
> blocks that exist on only one of the two to the other of the two 
> (convenient e.g. for users with both a laptop and a desktop). When you 
> create a new version of something, the old versions don't get 
> overwritten, because you create a new block not affecting the old 
> ones. The list goes on :-)
I think this is a very good approach; you could use Freenet
content-hash URIs to identify the blocks. But am I right that this
makes RDF literals obsolete for everything but small decimals? And how
do you split the metadata into blocks?
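
Something along these lines is what I imagine for the content-hash
identifiers (a minimal Python sketch; the "urn:sha1:" prefix and the
in-memory store are just placeholders, not Storm's actual scheme):

    import hashlib

    def block_id(content: bytes) -> str:
        # any change to the bytes yields a different id, so blocks are immutable
        return "urn:sha1:" + hashlib.sha1(content).hexdigest()

    store = {}   # id -> content; could live on any server, peer or attachment

    def put_block(content: bytes) -> str:
        bid = block_id(content)
        store[bid] = content
        return bid

    def get_block(bid: str) -> bytes:
        content = store[bid]
        assert block_id(content) == bid   # authenticity check against the hash
        return content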

> So anyway, there are a number of reasons why we need to do powerful 
> queries over a set of Storm blocks. For example, since we use hashes 
> as the identifiers for blocks, we don't have file names as hints to 
> humans about their content; instead, we'll use RDF metadata, stored in 
> *other* blocks. As a second example, on top of the unchangeable 
> blocks, we need to create a notion of updateable, versioned resources. 
> We do this by creating metadata blocks saying e.g., "Block X is the 
> newest version of resource Y as of 2003-03-20T22:29:25Z" and searching 
> for the newest such statement.
I don't quite understand: isn't there an infinite-regress problem if
the metadata is itself contained in blocks? Or is at least the
timestamp of a block something external to the blocks?
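
Just to check my understanding, this is roughly how I picture the
"newest version" lookup (a Python sketch with invented URIs; ISO
timestamps sort lexicographically, so the latest assertion wins):

    version_statements = [
        # (resource, block asserted as newest version, as-of timestamp)
        ("urn:example:resource-Y", "urn:example:block-old", "2003-03-19T10:00:00Z"),
        ("urn:example:resource-Y", "urn:example:block-new", "2003-03-20T22:29:25Z"),
    ]

    def current_version(resource):
        candidates = [(t, block) for r, block, t in version_statements if r == resource]
        return max(candidates)[1] if candidates else None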

> (This design also provides a clean separation between changeable 
> resources and individual versions of a resource, which are of course 
> resources themselves.)
This would also be useful for mies: if an annotation relates not to a
URL but to its current content, that content should be cached and the
annotation attached to the content-hash URL. But this is not for
version 0.1.
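
Roughly what I have in mind for mies (a Python sketch; the function and
store names are invented):

    import hashlib, urllib.request

    def annotate(url, annotation, cache, annotations):
        # attach the annotation to the content actually seen, not to the mutable URL
        content = urllib.request.urlopen(url).read()
        chk = "urn:sha1:" + hashlib.sha1(content).hexdigest()
        cache[chk] = content                  # keep the exact version annotated
        annotations.setdefault(chk, []).append(annotation)
        return chk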

> Of course this is just scratching the surface-- hope to have a page 
> about it up sometime soon. We have published a short paper about this 
> technology and a full article is currently under consideration for 
> Hypertext'03; if you'd like, I can send you a copy by private mail.

that would be great!

cheers,
reto
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (Darwin)

iD8DBQE+fMkyD1pReGFYfq4RAhV9AKCLIdVk/D1teT/KvgrO9TDjr7hEVwCdHgfZ
BMkoLqSBy7DaXgR0xDv4XOI=
=MDvb
-----END PGP SIGNATURE-----

Received on Saturday, 22 March 2003 15:36:19 UTC