Re: Provenance as a first-class citizen from Giovanni Tummarello on 2006-03-21 (semantic-web@w3.org from March 2006)

From: Giovanni Tummarello <g.tummarello@gmail.com>
Date: Tue, 21 Mar 2006 11:04:43 +0100
To: Graham Klyne <GK@ninebynine.org>
CC: semantic-web@w3.org
Message-ID: <441FCFBB.8020402@gmail.com>
IMHO, the fact that the same statement can be expressed multiple times
with reification (which is what the document you point at is about) is a
non-issue, since it's merely a syntactic way to point at the very same
triple. Even the same node can exist multiple times in the same model
nowadays, now that IFPs (inverse functional properties) are popular.
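
To make the point concrete, here is a minimal sketch in plain Python (no RDF library), assuming triples are modelled as simple tuples. The `ex:` names, the blank node labels and the helper functions are all hypothetical, chosen only for illustration. It shows two separate reifications that both describe the very same triple, and that describing a triple does not by itself put that triple in the graph.

```python
# Triples are plain (subject, predicate, object) tuples.
# All ex: names and helper functions are hypothetical, for illustration only.

def reify(node, triple):
    """Return the four reification triples that describe `triple` under `node`."""
    s, p, o = triple
    return {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    }

def described_triple(graph, node):
    """Recover the triple that the reification node `node` points at."""
    parts = {p: o for s, p, o in graph if s == node}
    return (parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"])

triple = ("ex:sky", "ex:color", "ex:blue")

# The same statement reified twice, under two different nodes.
graph = reify("_:r1", triple) | reify("_:r2", triple)

# Both reifications point at the very same triple...
assert described_triple(graph, "_:r1") == described_triple(graph, "_:r2") == triple
# ...and the triple itself is only described, not asserted.
assert triple not in graph
```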

I am seriously convinced that there is nothing wrong with reification. 

As for giving reification a semantics, why not just subclass it? Plain
vanilla reification doesn't entail the existence of the triple, and that's
good. In DBin (www.dbin.org) we use a subclass of reification which
basically means "if you trust this author, then these are stated triples
in your DBin installation". So not only does reification work for us, it
is also space-efficient when you combine it with the Minimum Self-Contained
Graph (MSG) theory, by which you can speak about many triples at the
same time with a single reification. Real-world use cases show a triple
overhead of only 25% when digitally signing a piece of an RDF graph using
reification and MSG theory. For examples and references, see
http://semedia.deit.univpm.it/tiki-index.php?page=RdfContextTools
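
The "subclass of reification" idea can be sketched roughly as follows, again in plain Python with tuples as triples. This is only an illustration under stated assumptions: `ex:AuthoredStatement`, `ex:author` and `stated_triples` are hypothetical names, not DBin's actual vocabulary or API, and the MSG grouping is not modelled here (each reification covers a single triple).

```python
# Hypothetical vocabulary: ex:AuthoredStatement plays the role of a subclass of
# rdf:Statement, and ex:author links a reification to whoever stated it.

def reify(node, triple, cls="rdf:Statement"):
    """Return the four reification triples describing `triple`, typed as `cls`."""
    s, p, o = triple
    return {
        (node, "rdf:type", cls),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    }

def stated_triples(graph, trusted_authors):
    """Triples to treat as stated: those reified as ex:AuthoredStatement
    whose ex:author is in `trusted_authors`."""
    nodes = {s for s, p, o in graph
             if p == "rdf:type" and o == "ex:AuthoredStatement"}
    stated = set()
    for node in nodes:
        props = {p: o for s, p, o in graph if s == node}
        if props.get("ex:author") in trusted_authors:
            stated.add((props["rdf:subject"],
                        props["rdf:predicate"],
                        props["rdf:object"]))
    return stated

t1 = ("ex:sky", "ex:color", "ex:blue")
t2 = ("ex:moon", "ex:madeOf", "ex:cheese")
t3 = ("ex:grass", "ex:color", "ex:green")

graph = (
    reify("_:a", t1, cls="ex:AuthoredStatement") | {("_:a", "ex:author", "ex:alice")}
    | reify("_:b", t2, cls="ex:AuthoredStatement") | {("_:b", "ex:author", "ex:mallory")}
    | reify("_:c", t3)  # plain vanilla reification: never entails the triple
)

# Only the triple from the trusted author is treated as stated.
assert stated_triples(graph, {"ex:alice"}) == {t1}
```

Plain `rdf:Statement` reifications are deliberately ignored by `stated_triples`, which mirrors the point that vanilla reification carries no commitment to the triple being true.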

Giovanni


Graham Klyne wrote:
> Giovanni Tummarello wrote:
>   
>> Except quads being just syntactic sugar for "reification", thus just
>> adding complexity .. :-)
>>
>> there is absolutely nothing wrong with reification.
>>     
>
> While I agree with most of what you say in this message, there is a problem with
> reification.  When we discussed this in RDFcore, there were two distinct
> possible semantics for reification, and it wasn't entirely obvious which one was
> most useful.  In the end, the group went for the semantics that most closely
> reflected the use of reification for capturing provenance information - but
> there were (at the time, IIRC) applications that applied a different semantics.
>
> I think the semantic divergence is related to this:
>   http://www.w3.org/2000/03/rdf-tracking/#rdfms-identity-of-statements
> but my memory of the details has become hazy.
>
> Certainly, the resolution expressed by the test case there favours the use of
> reification for provenance.
>
> As for syntactic support, then I think "named graphs" is the way to go, and I
> note that SPARQL has some support for a syntactic framework of named graphs
> (while being silent about the semantics).
>
> #g
> --
>
> PS:  When I tried to send that message, the spell-checker in my email client
> suggested that I replace "reification" with "deification".  Is it trying to tell
> me something? :)
>
>
>
>   
>> The fact that a reified statement doesn't imply the existence of the
>> statement itself in the graph is a feature, not a bug.
>> The fact that you should keep your reasoner well behaved when making
>> inferences over reified triples is simply.. normal (the superman thing).
>>
>> So, unless there are actual reasons (which I'd be happy to hear), W3C
>> please keep RDF nice and clean as it is.
>>
>> On the other hand .. syntactic support for reification *should* instead
>> be demanded at the access level, and in particular from the DAWG in SPARQL,
>> since queries are usually written by humans and that would just make a
>> lot of sense. But last time I asked... :-)
>> my2c.
>>
>> Giovanni
>>
>>
>> Harry Halpin wrote:
>>     
>>> Is it just me or does it seem like the sensible thing to get a W3C
>>> Recommendation on using named graphs (quads) as an *optional* feature of
>>> RDF? Then people that want to publish data to URIs (Sandro) can do that
>>> without using quads, and people that are merging and aggregating
>>> information can do so in a standardized manner that's already
>>> implemented? Or have I just missed the W3C recommendation for quads?
>>> This would be a rather small move, but I think it would help deal with
>>> some of the issues around provenance.
>>>
>>> As for reification, people can just keep ignoring it :)
>>>
>>> It seems like this is precisely the sort of thing the W3C should do,
>>> which is standardize best practice as learned through experience. The
>>> real problem would be if there were really worthwhile cases of provenance
>>> that quads didn't catch...
>>>
>>>                                      -harry
>>>
>>>
>>>
>>> Dan Brickley wrote:
>>>
>>>  
>>>       
>>>> * Sandro Hawke <sandro@w3.org> [2006-03-17 15:56-0500]
>>>>  
>>>>
>>>>    
>>>>         
>>>>> Ben Syverson wrote:
>>>>>   
>>>>>      
>>>>>           
>>>>>> On Mar 17, 2006, at 11:04 AM, Garrett Wollman wrote:
>>>>>>     
>>>>>>        
>>>>>>             
>>>>>>> I'm certain that this has been said before by people better-informed
>>>>>>> than I, but the more I look at RDF the more certain I am that basing
>>>>>>> it on triples rather than 4-tuples was a serious mistake.
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>> I agree with everything you say here, except the bit about "rare", which
>>>> I'm agnostic on. Will there be more writers than readers on the Semantic
>>>> Web? Who knows :) Publishers, as you note, should just say stuff, and
>>>> not feel the need to reify at the triple level that they've said it.
>>>> Consumers should, at some level of their application, take account of
>>>> who said what. Especially when they're merging and aggregating
>>>> (something that the RDF approach directly encourages, by being so
>>>> merge-able). I've never found triple-based reification attractive;
>>>> it's too granular, amongst other things. Publishers probably should do
>>>> a few little things in their RDF that are at the document/graph level rather
>>>> than per-triple, eg. assert that they're the dc:creator of the RDF/XML
>>>> document, and publish some form of digital signature. Edd has a nice
>>>> writeup of a simple PGP/GPG-based approach that folk in the FOAF
>>>> community were experimenting with:
>>>> http://usefulinc.com/foaf/signingFoafFiles --- perhaps if some
>>>> techniques like that were more deployed, consumers of RDF would find
>>>> more value in quadstore techniques? Particularly as quads are now being
>>>> exposed in a standard way via SPARQL...
>>>>
>>>> Dan
>>>>
>>>>  
>>>>
>>>>    
>>>>         
>>>>>> I agree 1000%. Using triples means that by default statements are
>>>>>> trusted and not reified. It suggests a top-down approach, rather
>>>>>> than a bottom-up one. This is one reason that tags/keywords are
>>>>>> more appealing to people than the SW.
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> I disagree.
>>>>>
>>>>> RDF is based on triples because triples are an excellent single
>>>>> building block for making arbitrary statements.
>>>>>
>>>>> For making statements about statements -- which you're talking about --
>>>>> you need something more complex, like quads or reification, but that's
>>>>> relatively rare (even if it's very interesting).
>>>>>
>>>>> Publishing statements as triples makes sense.  Whatever you want your
>>>>> web page to say, just put those statements on the page.  You shouldn't
>>>>> have to put on the page a statement that those statements are on the
>>>>> page and are true.  Say "The sky is blue", not "I am now telling you
>>>>> that the sky is blue."
>>>>>
>>>>> For reasoning about statements, yes, of course use quads.  When I
>>>>> harvest RDF data, of course I keep track of what web pages said what.
>>>>> But I don't usually need to re-publish that harvester data; that's like
>>>>> my web browser publishing my browsing history along with the browser
>>>>> cache.  There are applications where that's useful, sure, but it's
>>>>> hardly the main way data moves around the web.
>>>>>
>>>>>    -- sandro
Received on Tuesday, 21 March 2006 10:05:12 UTC