Re: writing a simple example in prov-o, help from Paul Groth on 2011-10-21 (public-prov-wg@w3.org from October 2011)

From: Paul Groth <p.t.groth@vu.nl>
Date: Fri, 21 Oct 2011 23:24:00 +0200
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
CC: "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <4EA1E2F0.4040602@vu.nl>
Hi Luc, all:

That's good. I think this gives the basis for writing some simple examples.
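
As a rough sketch of what the simplest such example might look like (an illustration only, not agreed working-group syntax), one could take the derivation statement from the quoted thread below and spell out the typing it implies, with no further attributes, i.e. the "empty list of attributes" case. The namespace is the draft one already used in this thread; nothing else is assumed:

```turtle
@prefix prov: <http://www.w3.org/ns/prov-o/> .

# Both resources are typed explicitly as entities. No attributes are given,
# so the characterization is weak, but on this reading still acceptable.
<http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
    a prov:Entity ;
    prov:wasDerivedFrom
        <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> .

<http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>
    a prov:Entity .
```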

With regard to Section 8, I wanted to clarify a couple of things:

- It would be good to check the ramifications of the duality of 
identifiers in particular with respect to Semweb definitions. My thought 
is that this should be alright because of the open world assumption but 
does anybody see any problems?

- The duality of reusing the identifier heavily relies on accounts. But 
in most cases people won't assert an account. Is there some default 
account? What's the policy? I think one could assume that every 
expression was in its own account unless otherwise specified. Or is 
everything in one general account?

cheers,
Paul


Luc Moreau wrote:
> Your suggestion, Paul, is indeed supported by DM. Look at Section 8 and imagine an empty list of attributes.
>
> But, as you say, it's a weak characterisation.
>
> Professor Luc Moreau
> Electronics and Computer Science
> University of Southampton
> Southampton SO17 1BJ
> United Kingdom
>
> On 21 Oct 2011, at 18:16, "Paul Groth" <p.t.groth@vu.nl> wrote:
>
>> Hi Stian, All:
>>
>> This is exactly what I was afraid of. At a minimum, we need really simple ways of describing the provenance of web pages. You shouldn't have to understand accounts or even the notion of a characterized thing to use our vocabulary. It should just work and we should be able to interpret these statements with respect to the prov-dm world view.
>>
>> My perspective is that once you say something is of type provo:Entity then it should be "characterized" from that perspective (i.e. account). It may not be a "good" characterization but that shouldn't matter.
>>
>> It would be interesting if this suggested approach fits into the PROV-DM model. Luc, Paolo?
>>
>> cheers
>> Paul
>>
>>
>>
>> Stian Soiland-Reyes wrote:
>>> On Fri, Oct 21, 2011 at 15:41, Paul Groth <p.t.groth@vu.nl> wrote:
>>>
>>>> I want to say that the post was derived from the video.
>>>> Here's what I naturally wrote down:
>>>> @prefix prov:<http://www.w3.org/ns/prov-o/>.
>>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>>> prov:wasDerivedFrom
>>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>.
>>>> This implies that both the post and the youtube video are of type
>>>> prov:Entity.
>>>> But that seems wrong because they are not characterized things. They could
>>>> change. Or is the url enough of a characterization?
>>> If you think the resource behind the URIs might change (as most can),
>>> you should provide some attributes to help describe the entity. I
>>> believe it COULD be valid for you to use the "real" URIs here, as your
>>> simple account does not cover the earlier or later versions of the two
>>> resources.
>>>
>>> You should however then include some attributes to help merge with
>>> other accounts which might have a different view, as a minimum a
>>> timestamp or description of the content.
>>>
>>> We don't really have a generic timestamp feature in PROV, but you can
>>> say when an entity was generated:
>>>
>>>
>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>
>>>     prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:25:00Z" ] .
>>>
>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>>     prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:30:00Z" ] .
>>>
>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>>    prov:wasDerivedFrom
>>>       <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>   .
>>>
>>>
>>> (I'm not too comfortable with this approach either - because the
>>> asserter is in a way claiming that the TED talk HTML was created at
>>> 18:25, which is probably not something you as the asserter know.  By
>>> PROV-DM this should be kinda-OK, he is merely identifying an entity,
>>> which describes a thing in the world - which in this case is a web
>>> page. Different accounts don't need to agree on their entity
>>> descriptions or provenance assertions even if they are using the same
>>> identifiers (and somehow are talking about the same things).
>>>
>>> Of course, as pointed out by Satya "URIs have a global scope and are
>>> interpreted consistently regardless of context" - so I should not just
>>> make up a URI like
>>> <http://thinklinks.wordpress.com/stian-stole-your-namespace>   and claim
>>> that this URI shows the location of my slippers - we should both
>>> interpret this as identifying the resource
>>> "stian-stole-your-namespace" on the HTTP server reachable by the DNS
>>> name thinklinks.wordpress.com.
>>>
>>>
>>> Approaches like the PAV ontology
>>> (http://code.google.com/p/pav-ontology/) solve the timestamp issue by
>>> an intermediary:
>>>
>>> :doc a pav:Sourcedocument ;
>>>    pav:retrievedFrom
>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>   ;
>>>    pav:sourceAccessedOn "2011-10-17T18:25:00Z" .
>>>
>>> However here we have introduced an intermediary :doc (similar to our
>>> prov:Entity) which you still need to mint a URI for.
>>>
>>>
>>>
>>> A different account which includes several revisions of the resource,
>>> provided by Wordpress database, for instance, would need to identify
>>> each of these using other identifiers, such as local IDs in the RDF
>>> document:
>>>
>>> @prefix prov:<http://www.w3.org/ns/prov-o/>   .
>>> @prefix time:<http://www.w3.org/2006/time#>   .
>>>
>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>>    prov:wasGeneratedAt :creationTime .
>>>
>>> :creationTime a prov:Time ;
>>>    time:inXSDDateTime "2011-10-15T15:00:00Z" .
>>>
>>> :blog1 a prov:Entity;
>>>    prov:wasGeneratedAt :creationTime ;
>>>    # i.e. generated at same time as:
>>>    prov:wasComplementOf
>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>> .
>>>
>>>
>>> :tedTalk a prov:Entity ;
>>>   # So this is not the generation time of the talk HTML - but
>>>   # the generation time of the overlapping entity description
>>>   # (as the author saw it and embedded its video in :blog2)
>>>   prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:25:00Z" ] ;
>>>   prov:wasComplementOf
>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>   .
>>>
>>> :blog2 a prov:Entity ;
>>>    prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:30:00Z" ] ;
>>>    prov:wasComplementOf
>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>> ;
>>>    notYetInProv:wasRevisionOf :blog1 ;
>>>    prov:wasDerivedFrom :blog1 ;
>>>    # Embedded the video this time
>>>    prov:wasDerivedFrom :tedTalk .
>>>
>>>
>>> I much prefer this approach, but it does become more verbose. It still
>>> makes <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>
>>> a prov:Entity - but we don't say anything more about it because we
>>> simply don't know its provenance.
>>>
>>>
>>> (I still believe that we need something stronger than wasComplementOf
>>> above - we know for a fact that :blog2 is fully within the timespan of
>>>   <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>> but I can't see how to express this in PROV)
>>>
>>>
>> -- 
>> Dr. Paul Groth (p.t.groth@vu.nl)
>> http://www.few.vu.nl/~pgroth/
>> Assistant Professor
>> Knowledge Representation & Reasoning Group
>> Artificial Intelligence Section
>> Department of Computer Science
>> VU University Amsterdam
>>
>>

-- 
Dr. Paul Groth (p.t.groth@vu.nl)
http://www.few.vu.nl/~pgroth/
Assistant Professor
Knowledge Representation & Reasoning Group
Artificial Intelligence Section
Department of Computer Science
VU University Amsterdam
Received on Friday, 21 October 2011 21:24:40 UTC