Re: RDF graph merging question from Thomas B. Passin on 2004-08-18 (www-rdf-interest@w3.org from August 2004)

From: Thomas B. Passin <tpassin@comcast.net>
Date: Wed, 18 Aug 2004 00:04:04 -0400
To: "www-rdf-interest@w3.org" <www-rdf-interest@w3.org>
Message-ID: <4122D534.9060806@comcast.net>

Jan Algermissen wrote:

> Just to confirm: this means that gathering separate nodes that actually
> represent the same 'thing' remains the burden of the user of RDF (e.g.
> at the query-formulation level), right?
> 

Well, it depends.  Suppose you are using an OWL-aware processor, and 
suppose that you have declared, in an OWLish way, that certain 
properties are inverse-functional, that is to say, identifying.  Then 
your processor should discover that two such nodes do in fact represent 
the same individual.
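
As a rough illustration of what such a processor does, here is a minimal 
Python sketch.  It is not how any particular OWL engine works; it just 
merges subjects that share a value for a property declared 
inverse-functional.  Triples are plain tuples, and the node names, the 
baz:installedBy property, and the idea that both graphs record 
baz:cableNumber are all assumptions made up for the example.

```python
def smush(triples, inverse_functional):
    """Merge subjects that share a value for an inverse-functional property."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller name so the result is deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    owners = {}  # (property, value) -> first subject seen with that pair
    for s, p, o in triples:
        if p in inverse_functional:
            key = (p, o)
            if key in owners:
                union(owners[key], s)
            else:
                owners[key] = s

    # Rewrite every triple in terms of the representative node names.
    return {(find(s), p, find(o)) for s, p, o in triples}


# Hypothetical data: two graphs that both happen to record the cable number.
triples = [
    ("a:conn1", "baz:cableNumber", "XY-T-5665"),
    ("a:conn1", "baz:connectionType", "foo:ethernet"),
    ("b:conn9", "baz:cableNumber", "XY-T-5665"),
    ("b:conn9", "baz:installedBy", "foo:alice"),
]
merged = smush(triples, {"baz:cableNumber"})
```

After the call, everything said about b:conn9 is attached to a:conn1, 
because the shared cable number identifies them as one individual.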

> Example: Suppose two computers are connected and we represent this as
> 
>   foo:host1 bar:connectedTo foo:host2
> 
> Now, in RDF graph A the triple is reified to attach the information
> 
>   foo:conn-host1-host2 baz:connectionType foo:ethernet to it
> 
> (the connection is an ethernet connection)
> 
> Now in some store B, the same triple exists and is reified to attach
> a cable number:
> 
>   foo:connection-123 baz:cableNumber "XY-T-5665"
> 
> 
> An RDF store (at least one that does not provide some non-standard
> extension) cannot by itself provide me with the information that
> the connection is of type ethernet and has cable number "XY-T-5665",
> right?
> 

I don't see why you continue to want to reify these statements.  It does 
not do what you apparently want it to do, because the statement that is 
reified is not thereby asserted.  You would do better to follow Damian 
Steer's advice here.  Create a resource - it could be a bnode - to 
represent the connection.  Then you can hang as many actual (not 
reified) triples as you want off it, thus representing the information 
you seem to have in mind.
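
To make that concrete, here is a small Python sketch of the modelling, 
with triples as plain tuples.  The baz:endpoint property and the helper 
names are hypothetical; you would pick whatever vocabulary suits you.

```python
import itertools

_counter = itertools.count(1)


def new_bnode():
    """Return a fresh blank-node label (hypothetical helper)."""
    return f"_:b{next(_counter)}"


conn = new_bnode()  # the connection itself is a resource

graph = {
    # The plain statement is asserted as an ordinary triple.
    ("foo:host1", "bar:connectedTo", "foo:host2"),
    # baz:endpoint is an assumed property tying the connection to its hosts.
    (conn, "baz:endpoint", "foo:host1"),
    (conn, "baz:endpoint", "foo:host2"),
    # Ordinary, asserted triples hung off the connection node.
    (conn, "baz:connectionType", "foo:ethernet"),
    (conn, "baz:cableNumber", "XY-T-5665"),
}


def describe(graph, node):
    """Collect every property/value pair asserted about a node."""
    props = {}
    for s, p, o in graph:
        if s == node:
            props.setdefault(p, set()).add(o)
    return props


info = describe(graph, conn)
```

Here info holds both the connection type and the cable number, and no 
reification vocabulary is involved at all.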

The main problem you have in any event is figuring out how to identify 
the connection nodes in the two graphs so that they represent the same 
resource.  You won't be able to do that by reifying.  RDF reification is 
not like topic map reification.

> It would be the burden of the one formulating the query to traverse all
> the rdf:subject, rdf:predicate and rdf:object arcs to find the two
> separate nodes for what is actually a single 'thing'.
> 
> Is that true? Or am I approaching this in a completely wrong way?
> 

It's not that different from merging topics in a topic map (which I 
mention because of your work on Goose, etc.).  You have to decide what 
the rules are that let you know that the topics represent the same 
thing, then you can merge them.  And it's essentially the same problem 
as merging data in two different relational databases.  In the 
relational case, keys are used for identification, and it may be hard to 
know that two keys represent the same thing.

Again, it may be that the processor would know before the query about 
the status of two nodes that may or may not represent the same resource, 
depending on what other information it has and how it has been designed.  
But if you knew that the RDF processor wouldn't be figuring that out, 
then you might have to write your query to determine the equivalence 
yourself.
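
A do-it-yourself query of that kind might look roughly like the Python 
sketch below.  It assumes each graph models the connection as a node 
with a hypothetical baz:endpoint property pointing at its hosts, and it 
assumes the rule "same set of endpoints means same connection".  That 
rule is only an example; two hosts could share more than one cable, so 
you would have to choose a rule that really is identifying for your 
data.

```python
ENDPOINT = "baz:endpoint"  # hypothetical property naming a connection's hosts


def connection_nodes(graph):
    """Every node that has at least one endpoint."""
    return {s for s, p, o in graph if p == ENDPOINT}


def endpoints(graph, node):
    """The set of hosts a connection node links."""
    return frozenset(o for s, p, o in graph if s == node and p == ENDPOINT)


def other_properties(graph, node):
    """Property/value pairs on a node, excluding the endpoints themselves."""
    return {(p, o) for s, p, o in graph if s == node and p != ENDPOINT}


def joined_properties(graph_a, graph_b):
    """Pair connection nodes that link the same hosts and pool their properties."""
    result = {}
    for a in connection_nodes(graph_a):
        key = endpoints(graph_a, a)
        for b in connection_nodes(graph_b):
            if key == endpoints(graph_b, b):
                pooled = result.setdefault(key, set())
                pooled |= other_properties(graph_a, a)
                pooled |= other_properties(graph_b, b)
    return result


# Hypothetical graphs A and B, each knowing one fact about the connection.
graph_a = {
    ("a:c1", ENDPOINT, "foo:host1"),
    ("a:c1", ENDPOINT, "foo:host2"),
    ("a:c1", "baz:connectionType", "foo:ethernet"),
}
graph_b = {
    ("b:c7", ENDPOINT, "foo:host1"),
    ("b:c7", ENDPOINT, "foo:host2"),
    ("b:c7", "baz:cableNumber", "XY-T-5665"),
}
pooled = joined_properties(graph_a, graph_b)
```

The result maps the pair of hosts to both the connection type and the 
cable number, which is the combined answer Jan was after, but only 
because the query itself encodes the identity rule.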

And that brings us back to identifying properties and possibly using 
OWL-awareness, which would let a processor relate the two nodes.

Cheers,

Tom P

-- 
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)
http://www.manning.com/catalog/view.php?book=passin
Received on Wednesday, 18 August 2004 04:02:52 UTC