Re: writing a simple example in prov-o, help from Graham Klyne on 2011-10-26 (public-prov-wg@w3.org from October 2011)

From: Graham Klyne <GK@ninebynine.org>
Date: Wed, 26 Oct 2011 08:36:27 +0100
To: Simon Miles <simon.miles@kcl.ac.uk>
CC: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <4EA7B87B.9090507@ninebynine.org>
Simon,

I'm not sure if you were seeking or adding clarification.

Your example illustrates nicely why I think that it is necessary to create new 
RDF nodes (blank or otherwise) for the Entities about which provenance is 
asserted, unless the original is immutable with respect to the provenance 
assertions made.
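
To make that concrete, here is a rough sketch of what a consumer would hold 
after merging the two assertions from your use case (the prov: namespace is 
the one Paul used earlier in this thread):

   @prefix prov: <http://www.w3.org/ns/prov-o/> .

   # Asserted at time T, before the video was edited
   <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
     prov:wasDerivedFrom
       <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> .

   # Asserted at time T+2, after the video was edited
   <http://inkings.org/2011/10/08/why-provenance-is-pointless/>
     prov:wasDerivedFrom
       <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> .

Both statements have the same RDF node as their object, so nothing in the 
merged graph records that they concern different states of the video.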

...

I *still* find myself wanting an (asymmetric) equivalent of IVPof property that 
allows us to say something like:

   <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>
     ^prov:isIVPof
       [ ex:atTime "T" ;
         ^prov:wasDerivedFrom 
<http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
       ] ;

     ^prov:isIVPof
       [ ex:atTime "T+2" ;
         ^prov:wasDerivedFrom 
<http://inkings.org/2011/10/08/why-provenance-is-pointless/>
       ] .

NOTE: I'm using the N3 "^" prefix on properties above to express the inverse of 
a property; thus:

      a prop b .
      b ^prop a .

are equivalent statements.  Cf. http://www.w3.org/TeamSubmission/n3/#path

I think the ability to relate (constrained) entities back to the original origin 
resource is a key expressive feature of this structure.

#g
--

The above example could be re-written in plain Turtle 
(http://www.w3.org/TR/turtle/) thus:

_:t0 prov:isIVPof
   <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> ;
   ex:atTime "T" .

_:t2 prov:isIVPof
   <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> ;
   ex:atTime "T+2" .

<http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
   prov:wasDerivedFrom _:t0 .

<http://inkings.org/2011/10/08/why-provenance-is-pointless/>
   prov:wasDerivedFrom _:t2 .
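
For this to go through a Turtle parser it would also need prefix declarations 
along the following lines.  The prov: namespace is the one used elsewhere in 
this thread; the ex: namespace is only a placeholder assumed for illustration:

   @prefix prov: <http://www.w3.org/ns/prov-o/> .
   @prefix ex:   <http://example.org/> .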


On 25/10/2011 16:54, Simon Miles wrote:
> Paul, all,
>
> Just to properly understand why what is being discussed is important,
> I wanted to expand your example to a larger use case.
>
> At time T, you say something about a video on your blog and assert:
> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
> prov:wasDerivedFrom
> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>.
>
> At time T+1, the video is edited to introduce a previously missing
> segment that undermines the message of your blog entry. The video URI
> stays the same.
>
> At time T+2, I say something about the (updated) video on my blog and assert:
> <http://inkings.org/2011/10/08/why-provenance-is-pointless/>
> prov:wasDerivedFrom
> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>.
>
> We could then observe:
>   - Even if the above use case doesn't happen to you, by using the
> simplest form of provenance you are opening the possibility of it
> happening and you would not even know about it.
>   - It doesn't help to say that the video owners shouldn't use the same
> URL, because it is not under the control of either those creating or
> consuming the provenance.
>   - There is nothing apparently wrong with either of our assertions
> (except the lack of characterisation), and I don't know anything about
> your blog so don't take it into account in my blog's provenance.
>   - It seems reasonable criteria for interoperability that if you read
> Prov-DM from two separate sources referring to the same entity, then
> either there is an error in (at least) one or they are mutually
> consistent. I couldn't see what this would correspond to in the
> interoperability discussion [1] though.
>
> Thanks,
> Simon
>
> [1] http://www.w3.org/2011/prov/wiki/Interoperability
>
>
> On 25 October 2011 10:02, Graham Klyne<GK@ninebynine.org>  wrote:
>> On 24/10/2011 13:43, Myers, Jim wrote:
>>> A couple thoughts:
>>>
>>> When we say B wasderivedfrom A where both A and B are changing, I think the meaning we want is that some complement of A with content fixed as of the time of the derivation was used to produce a complement of B with content fixed as at the time of derivation. If that's the case, do we just need a shorthand to define such entities, i.e. to define an entity as one that characterized a URI (perhaps at a time) without then creating an identifier for it (a blank node) or explicitly stating it as a complement of the URI it characterizes? I think this is consistent with the model in the sense that how entities characterize things is defined in terms of both attributes and their provenance (Luc in Boston who arrived by train today) - saying that I'm defining an 'entity characterizing Luc' that I can then use to assert that that entity flew on a plane out of Boston is really just an alternate way of defining Luc-in-Boston. Allowing an optional timestamp for when the entity characterized the living URI just fixes where in the provenance graph an entity must be (given timestamps on other processes, etc.), e.g. a shorthand that would allow integration with another account that said Luc-in-Boston arrived by train at time X.
>>
>> I'm not sure that "B wasderivedfrom A where both A and B are changing" is a
>> meaningful statement.  There's a tense mix-up there if nothing else :)
>>
>> But, more seriously, w.r.t. "to define an entity as one that characterized a URI
>> (perhaps at a time) without then creating an identifier for it (a blank node)" -
>> I feel fairly strongly that trying to avoid creating an RDF node doesn't really
>> help - if it's possible to do.  RDF statements have to be associated with a node
>> (actually 2 :), not counting the property) - whether that node is blank
>> (existential) or identified with a URI is a separate consideration.  And the
>> node, in RDF, *is* the identifier.
>>
>>> Regardless of how we identify/define such entities (whether as above or the other options in this thread), I think one can avoid having to document things like creation times that one does not know about - rather than affixing a timestamp to a generation process (prov:wasGeneratedAt examples in the thread), one could record when materials were viewed/accessed: I could say B wasderivedfrom A, A participatedIn an access process today, B participated in an access process today which would mean that whenever in the past A and B were created, they are the same entities (same content) as when I accessed them today, i.e. I'm asserting that the B as I saw it today was derived from A as I saw it today at some point in the past). Making that slightly more generic - an asserter could report whatever process they did to characterize the entity - we wouldn't be limited to talking about generation.
>>
>> I think I agree with this bit.
>>
>> #g
>> --
>>
>>>> -----Original Message-----
>>>> From: public-prov-wg-request@w3.org [mailto:public-prov-wg-
>>>> request@w3.org] On Behalf Of Paul Groth
>>>> Sent: Friday, October 21, 2011 5:24 PM
>>>> To: Luc Moreau
>>>> Cc: public-prov-wg@w3.org
>>>> Subject: Re: writing a simple example in prov-o, help
>>>>
>>>> Hi Luc, all:
>>>>
>>>> That's good. I think this gives the basis for writing some simple examples.
>>>>
>>>> With regards to Section 8, I wanted to clarify a couple things
>>>>
>>>> - It would be good to check the ramifications of the duality of identifiers in
>>>> particular with respect to Semweb definitions. My thought is that this should
>>>> be alright because of the open world assumption but does anybody see any
>>>> problems?
>>>>
>>>> - The duality of reusing the identifier heavily relies on accounts. But in most
>>>> cases people won't assert an account. Is there some default account? What's
>>>> the policy? I think one could assume that every expression was in its own
>>>> account unless otherwise specified. Or is everything in one general account?
>>>>
>>>> cheers,
>>>> Paul
>>>>
>>>>
>>>> Luc Moreau wrote:
>>>>> Your suggestion, Paul, is indeed supported by DM. Look at section 8 and
>>>> imagine an empty list of attributes.
>>>>>
>>>>> But, as you say, it's weak characterisation.
>>>>>
>>>>> Professor Luc Moreau
>>>>> Electronics and Computer Science
>>>>> University of Southampton
>>>>> Southampton SO17 1BJ
>>>>> United Kingdom
>>>>>
>>>>> On 21 Oct 2011, at 18:16, "Paul Groth"<p.t.groth@vu.nl>     wrote:
>>>>>
>>>>>> HI Stian, All:
>>>>>>
>>>>>> This is exactly what I was afraid of. At a minimum, we need really simple
>>>> ways of describing the provenance of web pages. You shouldn't have to
>>>> understand accounts or even the notion of characterized thing to use our
>>>> vocabulary. It should just work and we should be able to interpret these
>>>> statements with respect to the prov-dm world view.
>>>>>>
>>>>>> My perspective is that once you say something is of type provo:Entity
>>>> then it should be "characterized" from that perspective (i.e. account). It may
>>>> not be a "good" characterization but that shouldn't matter.
>>>>>>
>>>>>> It would be interesting if this suggested approach fits into the PROV-DM
>>>> model. Luc, Paolo?
>>>>>>
>>>>>> cheers
>>>>>> Paul
>>>>>>
>>>>>>
>>>>>>
>>>>>> Stian Soiland-Reyes wrote:
>>>>>>> On Fri, Oct 21, 2011 at 15:41, Paul Groth<p.t.groth@vu.nl>      wrote:
>>>>>>>
>>>>>>>> I want to say that the post was derived from the video.
>>>>>>>> Here's what I naturally wrote down:
>>>>>>>> @prefix prov: <http://www.w3.org/ns/prov-o/> .
>>>>>>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>>>>>>> prov:wasDerivedFrom
>>>>>>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> .
>>>>>>>> This implies that both the post and the youtube video are of type
>>>>>>>> prov:Entity.
>>>>>>>> But that seems wrong because they are not characterized things.
>>>>>>>> They could change. Or is the url enough of a characterization?
>>>>>>> If you think the resource behind the URIs might change (as most
>>>>>>> can), you should provide some attributes to help describe the
>>>>>>> entity. I believe it COULD be valid for you to use the "real" URIs
>>>>>>> here, as your simple account does not cover the earlier or later
>>>>>>> versions of the two resources.
>>>>>>>
>>>>>>> You should however then include some attributes to help merge with
>>>>>>> other accounts which might have a different view, as a minimum a
>>>>>>> timestamp or description of the content.
>>>>>>>
>>>>>>> We don't really have a generic timestamp feature in PROV, but you
>>>>>>> can say when an entity was generated:
>>>>>>>
>>>>>>>
>>>>>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>
>>>>>>>       prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:25:00Z" ] .
>>>>>>>
>>>>>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>>>>>>       prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:30:00Z" ] .
>>>>>>>
>>>>>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>>>>>>      prov:wasDerivedFrom
>>>>>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> .
>>>>>>>
>>>>>>>
>>>>>>> (I'm not too comfortable with this approach either - because the
>>>>>>> asserter is in a way claiming that the TED talk HTML was created at
>>>>>>> 18:25, which is probably not something you as the asserter know.  By
>>>>>>> PROV-DM this should be kinda-OK, he is merely identifying an entity,
>>>>>>> which describes a thing in the world - which in this case is a web
>>>>>>> page. Different accounts don't need to agree on their entity
>>>>>>> descriptions or provenance assertions even if they are using the
>>>>>>> same identifiers (and somehow are talking about the same things).
>>>>>>>
>>>>>>> Of course, as pointed out by Satya "URIs have a global scope and are
>>>>>>> interpreted consistently regardless of context" - so I should not
>>>>>>> just make up a URI like
>>>>>>> <http://thinklinks.wordpress.com/stian-stole-your-namespace> and claim
>>>>>>> that this URI shows the location of my slippers - we should both
>>>>>>> interpret this as identifying the resource
>>>>>>> "stian-stole-your-namespace" on the HTTP server reachable by the DNS
>>>>>>> name thinklinks.wordpress.com.
>>>>>>>
>>>>>>>
>>>>>>> Approaches like the PAV ontology
>>>>>>> (http://code.google.com/p/pav-ontology/) solve the timestamp issue
>>>>>>> by an intermediary:
>>>>>>>
>>>>>>> :doc a pav:Sourcedocument ;
>>>>>>>      pav:retrievedFrom
>>>>>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> ;
>>>>>>>      pav:sourceAccessedOn "2011-10-17T18:25:00Z" .
>>>>>>>
>>>>>>> However here we have introduced an intermediary :doc (similar to our
>>>>>>> prov:Entity) which you still need to mint a URI for.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> A different account which includes several revisions of the
>>>>>>> resource, provided by Wordpress database, for instance, would need
>>>>>>> to identify each of these using other identifiers, such as local IDs
>>>>>>> in the RDF
>>>>>>> document:
>>>>>>>
>>>>>>> @prefix prov: <http://www.w3.org/ns/prov-o/> .
>>>>>>> @prefix time: <http://www.w3.org/2006/time#> .
>>>>>>>
>>>>>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>>>>>>      prov:wasGeneratedAt :creationTime .
>>>>>>>
>>>>>>> :creationTime a prov:Time ;
>>>>>>>      time:inXSDDateTime "2011-10-15T15:00Z" .
>>>>>>>
>>>>>>> :blog1 a prov:Entity;
>>>>>>>      prov:wasGeneratedAt :creationTime ;
>>>>>>>      # i.e. generated at same time as:
>>>>>>>      prov:wasComplementOf
>>>>>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/> .
>>>>>>>
>>>>>>>
>>>>>>> :tedTalk a prov:Entity ;
>>>>>>>     # So this is not the generation time of the talk HTML - but
>>>>>>>     # the generation time of the overlapping entity description
>>>>>>>     # (as the author saw it and embedded its video in :blog2)
>>>>>>>     prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:25:00Z" ] ;
>>>>>>>     prov:wasComplementOf
>>>>>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> .
>>>>>>>
>>>>>>> :blog2 a prov:Entity ;
>>>>>>>      prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:30:00Z" ] ;
>>>>>>>      prov:wasComplementOf
>>>>>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/> ;
>>>>>>>      notYetInProv:wasRevisionOf :blog1 ;
>>>>>>>      prov:wasDerivedFrom :blog1 ;
>>>>>>>      # Embedded the video this time
>>>>>>>      prov:wasDerivedFrom :tedTalk .
>>>>>>>
>>>>>>>
>>>>>>> I much prefer this approach, but it does become more verbose. It
>>>>>>> still makes
>>>>>>> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>
>>>>>>> a prov:Entity - but we don't say anything more about it because
>>>>>>> we simply don't know its provenance.
>>>>>>>
>>>>>>>
>>>>>>> (I still believe that we need something stronger than
>>>>>>> wasComplementOf above - we know for a fact that :blog2 is fully within
>>>>>>> the timespan of
>>>>>>> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
>>>>>>> but I can't see how to express this in PROV)
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Dr. Paul Groth (p.t.groth@vu.nl)
>>>>>> http://www.few.vu.nl/~pgroth/
>>>>>> Assistant Professor
>>>>>> Knowledge Representation & Reasoning Group Artificial Intelligence
>>>>>> Section Department of Computer Science VU University Amsterdam
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Dr. Paul Groth (p.t.groth@vu.nl)
>>>> http://www.few.vu.nl/~pgroth/
>>>> Assistant Professor
>>>> Knowledge Representation & Reasoning Group Artificial Intelligence Section
>>>> Department of Computer Science VU University Amsterdam
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
Received on Wednesday, 26 October 2011 11:08:54 UTC