Re: In RDF what is the best practice to represent data provenance (source)? from Richard Cyganiak on 2007-01-18 (semantic-web@w3.org from January 2007)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 18 Jan 2007 11:22:44 +0100
To: Michael Schneider <m_schnei@gmx.de>
Cc: chris@bizer.de, semantic-web@w3.org, semantic_web@googlegroups.com
Message-Id: <B1F5F645-E43C-4799-8466-49D6AAD34B9F@cyganiak.de>

On 18 Jan 2007, at 00:32, Michael Schneider wrote:
>> RDF reification doesn't work for practical reasons
>
> Why is this so? I always had some vague feeling that reification
> does not have many friends within the community, but I never found
> a real reason for this: neither a technical reason nor a modeling
> reason.

Here's a real reason. Scenario: provenance tracking.

Michael wants to publish the claim "Reification is great."

Chris wants to publish the claim "That's nonsense."

Richard wants to have both claims in an RDF store and keep track of  
who said what.

First with reification. Michael:

     :reification :is :great .

Chris disagrees. So he has to reify the statement and attach a claim  
that the statement is nonsense.

     :michael_statement
         rdf:subject :reification;
         rdf:predicate :is;
         rdf:object :great;
         rdf:type :NonsensicalStatement .

Now Richard wants to load all these statements into an RDF store. No
problem. He also wants to keep track of provenance; thus he has to  
reify each statement and attach provenance. I will use your proposed  
shorthand notation from earlier in the thread because otherwise it  
would be too tedious:

     :reification :is :great .
     :michael_statement
         rdf:subject :reification;
         rdf:predicate :is;
         rdf:object :great;
         rdf:type :NonsensicalStatement .

     `:reification :is :great .`
         :asserted_by :michael .
     `:michael_statement rdf:subject :reification .`
         :asserted_by :chris .
     `:michael_statement rdf:predicate :is .`
         :asserted_by :chris .
     `:michael_statement rdf:object :great .`
         :asserted_by :chris .
     `:michael_statement rdf:type :NonsensicalStatement .`
         :asserted_by :chris .

Richard is bothered a little bit by having to reify the reified  
statement.

Now imagine if Bob gets into the discussion disagreeing with the  
claim made by Chris; now Richard has to store reified reified reified  
statements. Then Charlie and Dora get into the discussion ... I'm  
tempted to write down the triples just to show that it gets quite  
cumbersome.
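
To put a rough number on "cumbersome", here is a small Python sketch that
counts the triples in Richard's store under the reification approach. It
assumes that each backtick shorthand expands to the three rdf:subject /
rdf:predicate / rdf:object triples plus one :asserted_by triple (no
rdf:type rdf:Statement triple is added). The helper names, the blank node
labels and the :chris_statement / :bob names are made up for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

def reify(node: str, triple: Triple) -> List[Triple]:
    """Describe `triple` with the three RDF reification properties."""
    s, p, o = triple
    return [
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]

def disagree(node: str, target: Triple) -> List[Triple]:
    """Reify `target` and label it as nonsense, the way Chris does."""
    return reify(node, target) + [(node, "rdf:type", ":NonsensicalStatement")]

def with_provenance(triples: List[Triple], author: str, prefix: str) -> List[Triple]:
    """Keep each triple, then reify it and attach :asserted_by (assumed expansion)."""
    out: List[Triple] = []
    for i, triple in enumerate(triples):
        node = f"_:{prefix}{i}"  # hypothetical blank node label
        out.append(triple)
        out.extend(reify(node, triple))
        out.append((node, ":asserted_by", author))
    return out

michael = [(":reification", ":is", ":great")]
chris = disagree(":michael_statement", michael[0])
# Bob (hypothetical) disagrees with Chris's rdf:type claim, the last triple above.
bob = disagree(":chris_statement", chris[-1])

store_two = with_provenance(michael, ":michael", "m") + with_provenance(chris, ":chris", "c")
store_three = store_two + with_provenance(bob, ":bob", "b")

print(len(store_two))    # 25 triples for Michael and Chris
print(len(store_three))  # 45 triples once Bob joins
```

Under these assumptions every new disagreement costs 20 more triples in
the store, and most of them are reifications of reification triples.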

Enter named graphs.

     :michael_graph {
         :reification :is_great_for :provenance_tracking .
     }

Chris disagrees:

     :chris_graph {
         :michael_graph rdf:type :Nonsense .
     }

Richard's store:

     :michael_graph {
         :reification :is_great_for :provenance_tracking .
     }
     :chris_graph {
         :michael_graph rdf:type :Nonsense .
     }
     :provenance_graph {
         :michael_graph :asserted_by :michael .
         :chris_graph :asserted_by :chris .
     }

And accommodating the contributions of Bob, Charlie and Dora is  
straightforward.
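
For comparison, a minimal Python sketch of the named graph store, modelled
as a plain list of (graph, subject, predicate, object) tuples. This is only
an illustration, not any particular store's API: the add_claim and
asserted_by helpers and the :bob_graph name are hypothetical, and Chris's
graph is taken to be asserted by Chris.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)

store: List[Quad] = [
    (":michael_graph", ":reification", ":is_great_for", ":provenance_tracking"),
    (":chris_graph", ":michael_graph", "rdf:type", ":Nonsense"),
    (":provenance_graph", ":michael_graph", ":asserted_by", ":michael"),
    (":provenance_graph", ":chris_graph", ":asserted_by", ":chris"),
]

def add_claim(store: List[Quad], graph: str,
              triple: Tuple[str, str, str], author: str) -> None:
    """Add one claim in its own graph plus one provenance statement about that graph."""
    s, p, o = triple
    store.append((graph, s, p, o))
    store.append((":provenance_graph", graph, ":asserted_by", author))

def asserted_by(store: List[Quad], graph: str) -> List[str]:
    """Look up who asserted a graph, according to :provenance_graph."""
    return [o for g, s, p, o in store
            if g == ":provenance_graph" and s == graph and p == ":asserted_by"]

# Bob (hypothetical) disagrees with Chris by talking about Chris's graph.
add_claim(store, ":bob_graph", (":chris_graph", "rdf:type", ":Nonsense"), ":bob")

print(len(store))                         # 6
print(asserted_by(store, ":chris_graph"))  # [':chris']
```

Each new participant adds two entries here: the claim itself and one
provenance statement about the graph that holds it.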

See why reification does not have many friends?

Surprisingly, some people choose to use reification nonetheless. Why  
is this so? Is it just because its unfinished empty concrete shell  
was left in the RDF spec? I never found a real reason, neither  
technical nor modelling.

Yours,
Richard

Received on Thursday, 18 January 2007 10:23:50 UTC