Re: writing a simple example in prov-o, help

On Fri, Oct 21, 2011 at 15:41, Paul Groth <p.t.groth@vu.nl> wrote:

> I want to say that the post was derived from the video.
> Here's what I naturally wrote down:

> @prefix prov: <http://www.w3.org/ns/prov-o/>.
> <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
> prov:wasDerivedFrom
> <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>.

> This implies that both the post and the youtube video are of type
> prov:Entity.
> But that seems wrong because they are not characterized things. They could
> change. Or is the url enough of a characterization?


If you think the resource behind the URIs might change (as most can),
you should provide some attributes to help describe the entity. I
believe it COULD be valid for you to use the "real" URIs here, as your
simple account does not cover the earlier or later versions of the two
resources.

You should however then include some attributes to help merge with
other accounts which might have a different view, as a minimum a
timestamp or description of the content.

We don't really have a generic timestamp feature in PROV, but you can
say when an entity was generated:


@prefix prov: <http://www.w3.org/ns/prov-o/> .
@prefix time: <http://www.w3.org/2006/time#> .

<http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>
   prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:25:00Z" ] .

<http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
   prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:30:00Z" ] .

<http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
  prov:wasDerivedFrom
     <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> .


(I'm not too comfortable with this approach either - because the
asserter is in a way claiming that the TED talk HTML was created at
18:25, which is probably not something you as the asserter know.)

By PROV-DM this should be kinda-OK: the asserter is merely identifying
an entity, which describes a thing in the world - which in this case
is a web page. Different accounts don't need to agree on their entity
descriptions or provenance assertions even if they are using the same
identifiers (and somehow are talking about the same things).

Of course, as pointed out by Satya, "URIs have a global scope and are
interpreted consistently regardless of context" - so I should not just
make up a URI like
<http://thinklinks.wordpress.com/stian-stole-your-namespace> and claim
that this URI shows the location of my slippers - we should both
interpret this as identifying the resource
"stian-stole-your-namespace" on the HTTP server reachable by the DNS
name thinklinks.wordpress.com.


Approaches like the PAV ontology
(http://code.google.com/p/pav-ontology/) solve the timestamp issue by
using an intermediary:

:doc a pav:Sourcedocument ;
  pav:retrievedFrom
<http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> ;
  pav:sourceAccessedOn "2011-10-17T18:25:00Z" .

However, here we have introduced an intermediary :doc (similar to our
prov:Entity) which you still need to mint a URI for.



A different account which includes several revisions of the resource,
provided by the WordPress database, for instance, would need to
identify each of these using other identifiers, such as local IDs in
the RDF document:

@prefix prov: <http://www.w3.org/ns/prov-o/> .
@prefix time: <http://www.w3.org/2006/time#> .

<http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
  prov:wasGeneratedAt :creationTime .

:creationTime a prov:Time ;
  time:inXSDDateTime "2011-10-15T15:00:00Z" .

:blog1 a prov:Entity;
  prov:wasGeneratedAt :creationTime ;
  # i.e. generated at same time as:
  prov:wasComplementOf
<http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
.


:tedTalk a prov:Entity ;
 # So this is not the generation time of the talk HTML - but
 # the generation time of the overlapping entity description
 # (as the author saw it and embedded its video in :blog2)
 prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:25:00Z" ] ;
 prov:wasComplementOf
<http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html> .

:blog2 a prov:Entity ;
  prov:wasGeneratedAt [ time:inXSDDateTime "2011-10-17T18:30:00Z" ] ;
  prov:wasComplementOf
<http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
;
  notYetInProv:wasRevisionOf :blog1 ;
  prov:wasDerivedFrom :blog1 ;
  # Embedded the video this time
  prov:wasDerivedFrom :tedTalk .


I much prefer this approach, but it does become more verbose. It still
makes <http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html>
a prov:Entity - but we don't say anything more about it because we
simply don't know its provenance.
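
To make the linking concrete, here is a minimal Python sketch (standard
library only) that holds the derivation and complement statements of
the account above as plain triples and looks up which web resources
:blog2 was derived from. The "ex:" names stand in for the local ":"
identifiers, and the helper functions are hypothetical names made up
for this illustration; neither is part of PROV.

```python
# Hypothetical in-memory copy of part of the account above. The "ex:" names
# stand in for the local ":" identifiers and are assumptions of this sketch.
BLOG_URI = ("http://thinklinks.wordpress.com/2011/07/31/"
            "why-provenance-is-fundamental-for-people/")
TED_URI = "http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html"

TRIPLES = {
    ("ex:blog1", "prov:wasComplementOf", BLOG_URI),
    ("ex:tedTalk", "prov:wasComplementOf", TED_URI),
    ("ex:blog2", "prov:wasComplementOf", BLOG_URI),
    ("ex:blog2", "prov:wasDerivedFrom", "ex:blog1"),
    ("ex:blog2", "prov:wasDerivedFrom", "ex:tedTalk"),
}


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def derived_from_resources(entity):
    """Map each entity that `entity` was derived from to the resources it complements."""
    return {
        source: objects(source, "prov:wasComplementOf")
        for source in objects(entity, "prov:wasDerivedFrom")
    }


if __name__ == "__main__":
    for source, resources in sorted(derived_from_resources("ex:blog2").items()):
        print(source, "->", sorted(resources))
```

The "real" URIs only ever appear as objects of prov:wasComplementOf, so
nothing is claimed about their own generation time.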


(I still believe that we need something stronger than wasComplementOf
above - we know for a fact that :blog2 is fully within the timespan of
 <http://thinklinks.wordpress.com/2011/07/31/why-provenance-is-fundamental-for-people/>
but I can't see how to express this in PROV)
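
The difference between "overlaps" and "fully within" can be sketched in
Python, assuming each entity is characterised over a time interval
with a known start and an optional end. The Interval type and both
function names are hypothetical and made up for this illustration, and
the open-ended intervals are an assumption; the two timestamps are the
ones used in the account above.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Time span of an entity; `end` is None while it is still open."""
    start: datetime
    end: Optional[datetime] = None


def parse(timestamp):
    """Parse a UTC timestamp written like 2011-10-17T18:30:00Z."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def overlaps(a, b):
    """True if the two intervals share at least one instant."""
    a_starts_before_b_ends = b.end is None or a.start <= b.end
    b_starts_before_a_ends = a.end is None or b.start <= a.end
    return a_starts_before_b_ends and b_starts_before_a_ends


def is_within(inner, outer):
    """True if `inner` lies entirely inside `outer`."""
    if inner.start < outer.start:
        return False
    if outer.end is None:
        return True
    return inner.end is not None and inner.end <= outer.end


# Assumption: both the blog resource and :blog2 are still current (no end).
blog_resource = Interval(parse("2011-10-15T15:00:00Z"))
blog2 = Interval(parse("2011-10-17T18:30:00Z"))

if __name__ == "__main__":
    print("overlaps: ", overlaps(blog2, blog_resource))
    print("is within:", is_within(blog2, blog_resource))
```

Overlap is symmetric, while containment only holds in one direction,
which is the extra information the stronger relation would carry.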


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Friday, 21 October 2011 16:09:47 UTC