Re: In RDF what is the best practice to represent data provenance (source)?

Hi, Richard!

On 18 Jan 2007, Richard Cyganiak wrote:

> On 18 Jan 2007, at 00:32, Michael Schneider wrote:
>>> RDF reification doesn't work for practical reasons
>>
>> Why is this so? I always had some vague feeling that reification
>> does not have many friends within the community, but I never found
>> a real reason for this: neither a technical reason nor a modeling
>> reason.
> 
> Here's a real reason. Scenario: provenance tracking.
> 
> Michael wants to publish the claim "Reification is great."
> 
> Chris wants to publish the claim "That's nonsense."
> 
> Richard wants to have both claims in an RDF store and keep track of  
> who said what.
> 
> First with reification. Michael:
> 
>      :reification :is :great .
> 
> Chris disagrees. So he has to reify the statement and attach a claim  
> that the statement is nonsense.
> 
>      :michael_statement
>          rdf:subject :reification;
>          rdf:predicate :is;
>          rdf:object :great;
>          rdf:type :NonsensicalStatement .

Little break here!

Let's see what this expression means from my point of view: the
statement that reification is great belongs to the set of nonsensical
statements. My understanding is that this is not a claim about the
specific syntactic triple ":reification :is :great", because one could,
for example, replace the URI ":reification" with the URI
":rdfReification" without changing the intended meaning of the
expression (assuming that :reification and :rdfReification are related
by owl:sameAs). So in this case I would regard reification as the right
choice.
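
To make that reading concrete, here is a small sketch in the same
notation. The prefix URIs and the name :michael_statement_alt are made
up for illustration. On my reading, both reified resources describe one
and the same statement, although the triples they mention differ
syntactically. This only illustrates my interpretation.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix :    <http://example.org/> .   # made-up namespace

# Assumption: both URIs name the same thing.
:reification owl:sameAs :rdfReification .

# Chris's reified statement, as quoted above.
:michael_statement
    rdf:subject :reification ;
    rdf:predicate :is ;
    rdf:object :great ;
    rdf:type :NonsensicalStatement .

# Hypothetical variant that mentions the other URI instead.
:michael_statement_alt
    rdf:subject :rdfReification ;
    rdf:predicate :is ;
    rdf:object :great ;
    rdf:type :NonsensicalStatement .
```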

> Now Richard wants to load all these statements into an RDF store. No  
> problem. He also wants to keep track of provenance; thus he has to  
> reify each statement and attach provenance. I will use your proposed  
> shorthand notation from earlier in the thread because otherwise it  
> would be too tedious:
> 
>      :reification :is :great .
>      :michael_statement
>          rdf:subject :reification;
>          rdf:predicate :is;
>          rdf:object :great;
>          rdf:type :NonsensicalStatement .
> 
>      `:reification :is :great .`
>          :asserted_by :michael .
>      `:michael_statement rdf:subject :reification .`
>          :asserted_by :chris .
>      `:michael_statement rdf:predicate :is .`
>          :asserted_by :chris .
>      `:michael_statement rdf:object :great .`
>          :asserted_by :chris .
>      `:michael_statement rdf:type :NonsensicalStatement .`
>          :asserted_by :chris .
> 
> Richard is bothered a little bit by having to reify the reified  
> statement.

He could also assign URIs or blank node IDs to each reified statement
and then refer to them by name, much as he would refer to named
singleton graphs by name.

And, as with named graphs, he does not have to add an ":asserted_by"
property to every single reified statement if he really wants to express
that the complete statement (consisting of four sub-statements here) is
asserted by Chris. Instead, he could build an rdf:Bag of the last four
reified statements above and then assign the ":asserted_by :chris"
property to that bag.
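
A rough sketch of what I mean, written out in full. The names :s1 to
:s4 and :chris_claim, as well as the example namespace, are invented
here; everything else is taken from the triples quoted above.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix :    <http://example.org/> .   # made-up namespace

# Each :sN (hypothetical name) reifies one of the four triples Chris published.
:s1 rdf:type rdf:Statement ;
    rdf:subject :michael_statement ;
    rdf:predicate rdf:subject ;
    rdf:object :reification .
:s2 rdf:type rdf:Statement ;
    rdf:subject :michael_statement ;
    rdf:predicate rdf:predicate ;
    rdf:object :is .
:s3 rdf:type rdf:Statement ;
    rdf:subject :michael_statement ;
    rdf:predicate rdf:object ;
    rdf:object :great .
:s4 rdf:type rdf:Statement ;
    rdf:subject :michael_statement ;
    rdf:predicate rdf:type ;
    rdf:object :NonsensicalStatement .

# One bag collects the four reified statements; provenance is attached once.
:chris_claim rdf:type rdf:Bag ;
    rdf:_1 :s1 ;
    rdf:_2 :s2 ;
    rdf:_3 :s3 ;
    rdf:_4 :s4 ;
    :asserted_by :chris .
```

This is wordier than the backtick shorthand, but the ":asserted_by"
property appears only once instead of four times.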

The real question here is whether all those ":asserted_by" statements
above talk about the statements denoted by the triples in Richard's
triple store, or about the stored triples themselves. That depends on
what you want to express. In the first case I would prefer reification;
in the second case I would use named graphs.

> Now imagine if Bob gets into the discussion disagreeing with the  
> claim made by Chris; now Richard has to store reified reified reified  
> statements. Then Charlie and Dora get into the discussion ... I'm  
> tempted to write down the triples just to show that it gets quite  
> cumbersome.

If you really want to express such statements about statements about...,
then such a "statement cascade" is the correct modeling approach. Note
that you effectively do this with named graphs, too! Below, you first
have a named graph ":michael_graph". Then you have another named graph,
":chris_graph", which contains a statement about ":michael_graph": the
first cascade. Then you might have another graph, ":bob_graph", which
contains a statement about ":chris_graph": the second cascade, and so
on. In this regard there is no conceptual difference from reification.
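
For comparison, here is one further cascade step written both ways. This
is only a sketch: the name :chris_statement and the example namespace
are my own inventions, and I simply assume that Bob would classify
Chris's claim as nonsense in turn.

```trig
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix :    <http://example.org/> .   # made-up namespace

# Reification: :chris_statement (hypothetical name) reifies Chris's claim
# ":michael_statement rdf:type :NonsensicalStatement", and Bob classifies it.
:chris_statement
    rdf:subject :michael_statement ;
    rdf:predicate rdf:type ;
    rdf:object :NonsensicalStatement ;
    rdf:type :NonsensicalStatement .

# Named graphs: Bob's graph contains a statement about Chris's graph.
:bob_graph {
    :chris_graph rdf:type :Nonsense .
}
```

In both cases each new participant adds one more level that refers to
the previous level by name.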

And, as with named graphs, you can use names (URIs or blank node IDs)
for (bags of) reified statements, too. So I cannot see any large
advantage of named graphs in the sense of being much more convenient
than reification. I can see, however, a clear difference in what can be
expressed by the two constructs. See next:

> Enter named graphs.
> 
>      :michael_graph {
>          :reification :is_great_for :provenance_tracking .
>      }
> 
> Chris disagrees:
> 
>      :chris_graph {
>          :michael_graph rdf:type :Nonsense .
>      }
>
> Richard's store:
>
>      :michael_graph {
>          :reification :is_great_for :provenance_tracking .
>      }
>      :chris_graph {
>          :michael_graph rdf:type :Nonsense .
>      }

Now, what does the last named graph here express? According to the
introduction of chapter 3 ("Formal Semantics") of [1] (the paper cited
by Chris), the name (URI) of a named graph just denotes that named
graph. So, the statement within :chris_graph above says that the named
graph, which is denoted by the URI ":michael_graph", is in some class
called ":Nonsense". We further use the convention that a named
/singleton/ graph is regarded as the single syntactic triple it
contains. So we finally get the following meaning for the statement
within :chris_graph: the syntactic triple

   ":reification :is_great_for :provenance_tracking"

is a member of the ":Nonsense" class.

But what is meant by a "nonsensical" syntactic triple? Perhaps one for
which there is no satisfying interpretation, or something like that?
Whatever the meaning actually is, it is quite different from the meaning
of the reified statement ":michael_statement" above.

>      :provenance_graph {
>          :michael_graph :asserted_by :michael .
>          :chris_graph :asserted_by :chris .
>      }
> 
> And accommodating the contributions of Bob, Charlie and Dora is  
> straightforward.
> 
> See why reification does not have many friends?
> 
> Surprisingly, some people choose to use reification nonetheless. Why  
> is this so? Is it just because its unfinished empty concrete shell  
> was left in the RDF spec? I never found a real reason, neither  
> technical nor modelling.

I want to use RDF reification for referencing a relationship between
resources, in order to describe it by assigning properties. According to
my understanding of [1], I cannot do this with Named Graphs, because
their formal semantics do not allow this.

> Yours,
> Richard

Cheers,
Michael

[1] http://www.websemanticsjournal.org/ps/pub/2005-23

Received on Thursday, 18 January 2007 20:20:12 UTC