Re: Interpretation of RDF reification from Dan Brickley on 2006-03-23 (semantic-web@w3.org from March 2006)

From: Dan Brickley <danbri@danbri.org>
Date: Thu, 23 Mar 2006 14:40:43 -0500
To: Lars Marius Garshol <larsga@ontopia.net>
Cc: semantic-web@w3.org
Message-ID: <20060323194043.GA20072@postdiluvian.org>
* Lars Marius Garshol <larsga@ontopia.net> [2006-03-22 21:37+0100]
> 
> 
> I've been trying to read the answer to this answer out of the RDF  
> specs, and I think I've got it, but would like to make 100% certain.
> 
> If I create an RDF node that reifies the statement
> 
>   (winston, married-to, clementine)
> 
> what does that node represent? Specifically, does it represent the  
> *statement* that these two are married, or does it represent the  
> *marriage* relationship between them? That is, if the reifying RDF  

OK, going back to the start of this thread, and picking up a theme 
live in various blog posts on planetrdf.com lately, ... I tried 
working thru an example. Afraid it's not at the stage where I've tested
it with tools yet, but might be useful. Pasting it in here (below) for now, 
will blog when the examples are more machine-readable. --Dan



OK, let me try sketching some test cases around reification, which could
(sorry, not there yet - help welcomed) be plugged into OWL reasoners and
SPARQL query engines. Forget superman; our scenario is more worldly. We
are web detectives, on the trail of a would-be bigamist whose multiple
identifiers aren't all familiar to the registrars who have been busy
marrying him off. Imagine that the registrars publish their official
records in RDF, and that we're consuming those records, alongside some
other trusted evidence, with an OWL-aware software system. Further
imagine that we export some hopefully-useful RDF from the OWL system and
query it using SPARQL, with the intent of asking questions like "which
registrar said what?". A lot of folks try to use RDF reification in such
scenarios; I'm not convinced it works.

registrar-1.rdf:

 <tag:danbri.org:2006:people:bob> <http://example.org/family#wife>
     <tag:danbri.org:2006:people:alice> .

# the resource called <tag:danbri.org:2006:people:bob> has a 'wife' that
# is the resource called <tag:danbri.org:2006:people:alice>


registrar-2.rdf:

 <tag:danbri.org:2006:people:charlie> <http://example.org/family#wife>
     <tag:danbri.org:2006:people:mary> .

# the resource called <tag:danbri.org:2006:people:charlie> has a 'wife'
# that is the resource called <tag:danbri.org:2006:people:mary>


nndb-example-bio.rdf:

 <tag:danbri.org:2006:people:charlie>
     <http://www.w3.org/@something/.../owl#sameAs>
     <tag:danbri.org:2006:people:bob> .

# <tag:danbri.org:2006:people:charlie> and
# <tag:danbri.org:2006:people:bob> are URI names for the same resource



who-said-what.rdf:
# trying to keep track of these different claims using RDF reification
# vocab. 
# (this is the thing I don't think does what people hope it does...)

 _:s1 rdf:type rdf:Statement .
 _:s1 rdf:predicate <http://example.org/family#wife> . 
 _:s1 rdf:subject <tag:danbri.org:2006:people:bob> .
 _:s1 rdf:object <tag:danbri.org:2006:people:alice> .
 _:s1 <http://purl.org/dc/elements/1.1/source> <registrar-1.rdf> .

 _:s2 rdf:type rdf:Statement .
 _:s2 rdf:predicate <http://example.org/family#wife> . 
 _:s2 rdf:subject <tag:danbri.org:2006:people:charlie> .
 _:s2 rdf:object <tag:danbri.org:2006:people:mary> .
 _:s2 <http://purl.org/dc/elements/1.1/source> <registrar-2.rdf> .

So, at face value, who-said-what.rdf captures an RDF description 
of the claims in both registrar-1.rdf and registrar-2.rdf, and
associates them with simple provenance information - in this case, 
by identifying a "dc:source" document, associated with 
some described RDF statement.

However, what happens if we believe the (perfectly reasonable) 
document, nndb-example-bio.rdf, which tells us that two 
URIs denote the same resource? i.e. that the thing called
<tag:danbri.org:2006:people:charlie> is the same thing (owl:sameAs)
as that called <tag:danbri.org:2006:people:bob>.

My understanding (sorry I can't quote chapter-and-verse here) is that 

 _:s1 rdf:subject <tag:danbri.org:2006:people:bob> .
combined with 
 <tag:danbri.org:2006:people:charlie>
     <http://www.w3.org/@something/.../owl#sameAs>
     <tag:danbri.org:2006:people:bob> .
gives us an extra triple,
 _:s1 rdf:subject <tag:danbri.org:2006:people:charlie> .

...since the two URIs are names for the same thing, there is nothing
true of the thing called <tag:danbri.org:2006:people:charlie> that is
not also true of the thing called <tag:danbri.org:2006:people:bob>.
Similarly, we should get another extra triple, 
 _:s2 rdf:subject <tag:danbri.org:2006:people:bob> .


At this point, if who-said-what.rdf and nndb-example-bio.rdf are 
considered true descriptions, and we honour OWL's built-in semantics 
for owl:sameAs, we end up with an expanded bunch of triples
that use RDF reification vocabulary:

(please correct me if this is wrong - though I can't see how it could
be!)

 _:s1 rdf:type rdf:Statement .
 _:s1 rdf:predicate <http://example.org/family#wife> . 
 _:s1 rdf:subject <tag:danbri.org:2006:people:bob> .
 _:s1 rdf:subject <tag:danbri.org:2006:people:charlie> .
 _:s1 rdf:object <tag:danbri.org:2006:people:alice> .
 _:s1 <http://purl.org/dc/elements/1.1/source> <registrar-1.rdf> .

 _:s2 rdf:type rdf:Statement .
 _:s2 rdf:predicate <http://example.org/family#wife> . 
 _:s2 rdf:subject <tag:danbri.org:2006:people:charlie> .
 _:s2 rdf:subject <tag:danbri.org:2006:people:bob> .
 _:s2 rdf:object <tag:danbri.org:2006:people:mary> .
 _:s2 <http://purl.org/dc/elements/1.1/source> <registrar-2.rdf> .
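
Since I haven't run this through real tools, here is a rough Python
sketch of the expansion step, using only the standard library. It is not
an OWL reasoner: the function names (aliases, expand_same_as) are made
up for this example, owl:sameAs is applied only to subjects and objects,
and the pair list stands in for nndb-example-bio.rdf.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
WIFE = "http://example.org/family#wife"
DC_SOURCE = "http://purl.org/dc/elements/1.1/source"
BOB = "tag:danbri.org:2006:people:bob"
CHARLIE = "tag:danbri.org:2006:people:charlie"
ALICE = "tag:danbri.org:2006:people:alice"
MARY = "tag:danbri.org:2006:people:mary"

# who-said-what.rdf as a set of (subject, predicate, object) tuples.
who_said_what = {
    ("_:s1", RDF + "type", RDF + "Statement"),
    ("_:s1", RDF + "predicate", WIFE),
    ("_:s1", RDF + "subject", BOB),
    ("_:s1", RDF + "object", ALICE),
    ("_:s1", DC_SOURCE, "registrar-1.rdf"),
    ("_:s2", RDF + "type", RDF + "Statement"),
    ("_:s2", RDF + "predicate", WIFE),
    ("_:s2", RDF + "subject", CHARLIE),
    ("_:s2", RDF + "object", MARY),
    ("_:s2", DC_SOURCE, "registrar-2.rdf"),
}

# The owl:sameAs claim from nndb-example-bio.rdf, as a plain pair.
same_as_pairs = [(CHARLIE, BOB)]


def aliases(pairs):
    """Map each name to the set of names said to denote the same resource."""
    groups = {}
    for a, b in pairs:
        merged = groups.get(a, {a}) | groups.get(b, {b})
        for name in merged:
            groups[name] = merged
    return groups


def expand_same_as(triples, pairs):
    """Add every triple obtained by swapping a subject or object for an alias.

    Simplification (an assumption of this sketch): predicates are left alone.
    """
    groups = aliases(pairs)
    expanded = set()
    for s, p, o in triples:
        for s2 in groups.get(s, {s}):
            for o2 in groups.get(o, {o}):
                expanded.add((s2, p, o2))
    return expanded


expanded = expand_same_as(who_said_what, same_as_pairs)
new_triples = expanded - who_said_what
for triple in sorted(new_triples):
    print(triple)
```

Under these assumptions the only new triples are the two extra
rdf:subject triples, one each for _:s1 and _:s2.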


So, loading up who-said-what.rdf (ostensibly, a useful file giving a
skeptical account of which RDF documents made which claims), alongside
nndb-example-bio.rdf (another useful file, documenting some cases in
which there are multiple URI names for the same thing), we get a
description that can be queried with SPARQL, including:

 _:s1 rdf:type rdf:Statement .
 _:s1 rdf:predicate <http://example.org/family#wife> . 
 _:s1 rdf:subject <tag:danbri.org:2006:people:bob> .
 _:s1 rdf:subject <tag:danbri.org:2006:people:charlie> .
 _:s1 rdf:object <tag:danbri.org:2006:people:alice> .
 _:s1 <http://purl.org/dc/elements/1.1/source> <registrar-1.rdf> .

Let's ask it if the resource <registrar-1.rdf> is the dc:source of an
rdf:Statement that has a predicate 'wife', subject
<tag:danbri.org:2006:people:charlie> and object
<tag:danbri.org:2006:people:alice>:

(see http://www.w3.org/TR/2006/WD-rdf-sparql-query-20060220/#queryReification
btw)


query1.rq:

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dc:   <http://purl.org/dc/elements/1.1/>

    ASK
    { ?s rdf:subject    <tag:danbri.org:2006:people:charlie> .
      ?s rdf:predicate  <http://example.org/family#wife> .
      ?s rdf:object     <tag:danbri.org:2006:people:alice> .
      ?s dc:source      <registrar-1.rdf> .
    }


My understanding is that we'd get a 'yes' back from this query, but that
lots of folk would expect to get a 'no', since they read this 
as "Does <registrar-1.rdf> contain the charlie/wife/alice triple?".
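
To make the yes/no difference concrete, here is a small Python sketch
that evaluates the same four triple patterns against two graphs: the
_:s1 description as written, and the same description plus the extra
rdf:subject triple that (on my understanding) an OWL-aware system would
add. The ask() helper is a toy pattern matcher written for this example,
not a SPARQL engine, and <registrar-1.rdf> is kept as a bare string
rather than resolved against a base URI.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
WIFE = "http://example.org/family#wife"
DC_SOURCE = "http://purl.org/dc/elements/1.1/source"
BOB = "tag:danbri.org:2006:people:bob"
CHARLIE = "tag:danbri.org:2006:people:charlie"
ALICE = "tag:danbri.org:2006:people:alice"

# The _:s1 description exactly as written in who-said-what.rdf.
as_published = {
    ("_:s1", RDF + "type", RDF + "Statement"),
    ("_:s1", RDF + "predicate", WIFE),
    ("_:s1", RDF + "subject", BOB),
    ("_:s1", RDF + "object", ALICE),
    ("_:s1", DC_SOURCE, "registrar-1.rdf"),
}

# The same description after owl:sameAs(charlie, bob) has been applied.
after_same_as = as_published | {("_:s1", RDF + "subject", CHARLIE)}

# query1.rq as a list of triple patterns; names starting with '?' are variables.
query1 = [
    ("?s", RDF + "subject", CHARLIE),
    ("?s", RDF + "predicate", WIFE),
    ("?s", RDF + "object", ALICE),
    ("?s", DC_SOURCE, "registrar-1.rdf"),
]


def ask(triples, patterns, binding=None):
    """Return True if every pattern matches under one consistent binding."""
    binding = binding or {}
    if not patterns:
        return True
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched and ask(triples, rest, candidate):
            return True
    return False


print(ask(as_published, query1))   # False: no charlie triple in the plain graph
print(ask(after_same_as, query1))  # True: the inferred rdf:subject triple matches
```

So the answer depends entirely on whether the store applied owl:sameAs
before the query ran.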

The RDF Semantics spec does contain some warning of this,
http://www.w3.org/TR/rdf-mt/#Reif 
[[
Note that this way of understanding the reification vocabulary does not 
interpret reification as a form of quotation. Rather, the reification 
describes the relationship between a token of a triple and the resources 
that triple refers to. The reification can be read intuitively as saying 
"'this piece of RDF talks about these things" rather than "this piece 
of RDF has this form".
]]

...ie., in our scenario, it is true that registrar-1.rdf *does* talk
about the thing that has a URI name <tag:danbri.org:2006:people:charlie>,
even though that URI doesn't itself appear anywhere in the
registrar-1.rdf graph.

Given the RDFCore decision on statings vs statements, which allows
distinct statements to share the same predicate, subject and object,
RDF developers may be tempted to use RDF's reification vocabulary to
keep track of "who said what". However, such descriptions interact in
unfortunate ways with core RDF and OWL facilities, and can give 
counter-intuitive results. 

(Note that doing all this in pure RDF, we have no problem; owl:sameAs is
just another triple to an RDF triplestore. It's only when the OWL
meaning of owl:sameAs kicks in that we get these issues. But RDF and
OWL systems live in the same Web; documents published from an RDF-only
shop may be consumed, interpreted, queried etc. by OWL systems and the
results re-published on the Web as plain RDF...)

My preference is simply to never use the W3C RDF reification vocab, and
to use other mechanisms for keeping track of 'who said what'....
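
I haven't said which other mechanisms. Purely as an illustration (an
assumption, not a recommendation of any particular tool), one option is
to record, outside the graph being reasoned over, which triples each
document literally contained. In the Python sketch below the asserted_in
table and the sources_of() helper are hypothetical names, and owl:sameAs
is written as an abbreviated string.

```python
WIFE = "http://example.org/family#wife"
BOB = "tag:danbri.org:2006:people:bob"
CHARLIE = "tag:danbri.org:2006:people:charlie"
ALICE = "tag:danbri.org:2006:people:alice"
MARY = "tag:danbri.org:2006:people:mary"

# Provenance kept per source document, filled in at parse time.
# Nothing here is subject to owl:sameAs inference.
asserted_in = {
    "registrar-1.rdf": {(BOB, WIFE, ALICE)},
    "registrar-2.rdf": {(CHARLIE, WIFE, MARY)},
    "nndb-example-bio.rdf": {(CHARLIE, "owl:sameAs", BOB)},
}


def sources_of(triple):
    """List the documents that literally contained the given triple."""
    return sorted(src for src, triples in asserted_in.items() if triple in triples)


print(sources_of((BOB, WIFE, ALICE)))      # ['registrar-1.rdf']
print(sources_of((CHARLIE, WIFE, ALICE)))  # []
```

Here "did registrar-1.rdf contain the charlie/wife/alice triple?" gets
the 'no' that most people expect, because the lookup is about what was
written, not about what the triples refer to.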

Aside: note also that in the open world, nobody has assured us that 
<tag:danbri.org:2006:people:alice> and <tag:danbri.org:2006:people:mary>
are different individuals. Also that this would be a lot more
complicated to think about if we were using bnodes and
reference-by-description instead of simple URI identifiers for people.
Received on Thursday, 23 March 2006 19:41:55 UTC