Re: Bundles explained from Stian Soiland-Reyes on 2013-10-29 (public-prov-comments@w3.org from October 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Tue, 29 Oct 2013 15:22:19 +0000
To: Mike Loll <mike.loll@gmail.com>, "public-prov-comments@w3.org" <public-prov-comments@w3.org>
Message-ID: <CAPRnXtnZmqUAHX=oC9Qv8f-ZcKGJpCzuDCy-pVHCw4E9GfkRdQ@mail.gmail.com>
See http://practicalprovenance.wordpress.com/2013/10/29/resources-that-change-state/

On 29 October 2013 12:19, Stian Soiland-Reyes
<soiland-reyes@cs.manchester.ac.uk> wrote:
> Cheers. A Clojure implementation sounds very promising and something I
> would personally be interested in!
>
> You might want to also look at the PROV Toolbox for Java - so you
> don't need to focus on the different serializations - see
> https://github.com/lucmoreau/ProvToolbox/
>
>
> Can I refer to your questions and your name in a blog post? I am
> thinking of writing this up for my blog at
> https://practicalprovenance.wordpress.com
>
>
>
> On 29 October 2013 12:15, Mike Loll <mike.loll@gmail.com> wrote:
>> Thank you very much.  The answers I've received are great.  I hope I can
>> contribute back somehow!
>>
>> --
>> Mike Loll
>>
>>
>> On Tue, Oct 29, 2013 at 8:10 AM, Stian Soiland-Reyes
>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>>>
>>> This is a very good question. I am not sure how this relates to
>>> bundles - except that perhaps you want to describe the longer-living
>>> entities in a different bundle (e.g. "the alarm database") from the
>>> more short-lived entities (e.g. "alarm events this week").
>>>
>>>
>>> In PROV we describe entities as in one way or another being 'static'.
>>> In your case, there are two abstraction levels of how 'static' an
>>> alarm is.
>>>
>>> <alarm/1> a prov:Entity, ex:Alarm ;
>>>    prov:atLocation <customer/5> .
>>>
>>> We here consider the alarm over its lifetime at a given customer, no
>>> matter its current state. So we can describe its installation date as
>>> its provenance:
>>>
>>> <alarm/1> prov:generatedAtTime "1984-05-15T17:19:41.146" .
>>>
>>> We can also of course list properties that are more fluctuating and
>>> might change during its lifetime:
>>>
>>> <alarm/1> ex:currentStatus "active" ;
>>>       ex:brightness 0.80 ;
>>>       ex:noiseLevel 0.50 .
>>>
>>> If I retrieve the same resource later today, this might instead be:
>>>
>>> <alarm/1> ex:currentStatus "disabled" ;
>>>       ex:brightness 0.20 ;
>>>       ex:noiseLevel 0.89 .
>>>
>>> Now what if we wanted to know how it changed from active to disabled,
>>> but don't really care about all the possible levels of brightness and
>>> noise it had in between? Then it might make sense to specialize the
>>> alarm entity to what we would in common language probably just call
>>> "alarm state". It is still describing the alarm, but at a finer
>>> granularity:
>>>
>>> <alarm/1/state/123> a prov:Entity, ex:AlarmState ;
>>>   prov:specializationOf <alarm/1> ;
>>>   ex:currentStatus "active" ;
>>>   prov:generatedAtTime "2013-10-28T18:00:00Z" ;
>>>   prov:invalidatedAtTime "2013-10-28T23:50:00Z" .
>>>
>>> <alarm/1/state/124> a prov:Entity, ex:AlarmState ;
>>>   prov:specializationOf <alarm/1> ;
>>>   ex:currentStatus "disabled" ;
>>>   prov:generatedAtTime "2013-10-28T23:50:00Z" .
>>>
>>> We might specify a new subclass "ex:AlarmState" that we know 'locks
>>> down' the state - this would allow different kinds of specialization,
>>> in case you also needed a specialization like ex:BrightnessLog.
>>>
>>> Each state has a different generation and invalidation time,
>>> indicating the life span of the state. This is a continuous span, so
>>> the alarm state that was disabled last week is different from the
>>> disabled alarm state today, because the alarm was active in the
>>> meantime.
>>>
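The life-span idea above can be sketched in a few lines of Python. This is only an illustration, not part of PROV: the STATES list, its dictionary keys and the state_at function are made up, and treating each span as half-open (generated <= t < invalidated) is my own assumption so that the two states never overlap at 23:50.

```python
from datetime import datetime, timezone

# Hypothetical in-memory copies of the two alarm states quoted above.
# "invalidated" set to None means the state is still current.
STATES = [
    {"id": "alarm/1/state/123", "status": "active",
     "generated": datetime(2013, 10, 28, 18, 0, tzinfo=timezone.utc),
     "invalidated": datetime(2013, 10, 28, 23, 50, tzinfo=timezone.utc)},
    {"id": "alarm/1/state/124", "status": "disabled",
     "generated": datetime(2013, 10, 28, 23, 50, tzinfo=timezone.utc),
     "invalidated": None},
]


def state_at(states, when):
    """Return the state whose life span contains `when`, or None."""
    for state in states:
        ended = state["invalidated"]
        # Assumed half-open span: generated <= when < invalidated.
        if state["generated"] <= when and (ended is None or when < ended):
            return state
    return None
```

Asking for the state at 20:00 UTC on 2013-10-28 would give the "active" state, and any time before 18:00 gives None because no state has been described for it.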
>>>
>>> You might want to organize these states in an order so you don't need
>>> to compare the start/end timestamps.
>>>
>>> <alarm/1/state/124> prov:wasRevisionOf <alarm/1/state/123> .
>>>
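Following those revision links instead of comparing timestamps might look roughly like this in Python. The WAS_REVISION_OF mapping and the history function are hypothetical names for illustration, and the sketch assumes each state is a revision of at most one earlier state.

```python
# Hypothetical mapping: state -> the state it prov:wasRevisionOf.
WAS_REVISION_OF = {
    "alarm/1/state/124": "alarm/1/state/123",
}


def history(latest, was_revision_of):
    """Walk the revision links back from `latest`; return oldest first."""
    chain = [latest]
    seen = {latest}
    while chain[-1] in was_revision_of:
        previous = was_revision_of[chain[-1]]
        if previous in seen:  # guard against accidental cycles in the data
            raise ValueError("cycle in wasRevisionOf links")
        chain.append(previous)
        seen.add(previous)
    chain.reverse()
    return chain
```

Calling history("alarm/1/state/124", WAS_REVISION_OF) walks back to state 123 and returns the two states in chronological order.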
>>>
>>> So what if we want to describe who disabled it? A simple solution is
>>> to now just provide prov:wasAttributedTo at each state:
>>>
>>> <alarm/1/state/124> prov:wasAttributedTo <customer/5>.
>>>
>>>
>>> So now we know that customer/5 caused the alarm to be disabled somehow
>>> (it was not a supervisor at the security company).
>>>
>>> If you want to detail this more, say to record how the customer did
>>> this (e.g. clicking the alarm panel) - then you can introduce an
>>> activity to describe the transition:
>>>
>>> <alarm/1/state/123> prov:wasInvalidatedBy <activities/987> .
>>> <alarm/1/state/124> prov:wasGeneratedBy <activities/987> .
>>>
>>> <activities/987> a prov:Activity, ex:AlarmPanelAction ;
>>>     prov:wasAssociatedWith <customer/5> .
>>>
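To answer "who caused this state?" from statements like these, one can hop from the state to its generating activity and on to the associated agent. A minimal Python sketch, assuming the statements are held as plain (subject, predicate, object) tuples; TRIPLES, objects and who_caused are illustrative names, not an RDF library API:

```python
# Hypothetical subject/predicate/object triples mirroring the statements above.
TRIPLES = [
    ("alarm/1/state/123", "prov:wasInvalidatedBy", "activities/987"),
    ("alarm/1/state/124", "prov:wasGeneratedBy", "activities/987"),
    ("activities/987", "rdf:type", "ex:AlarmPanelAction"),
    ("activities/987", "prov:wasAssociatedWith", "customer/5"),
]


def objects(triples, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def who_caused(triples, state):
    """Agents associated with the activities that generated `state`."""
    agents = []
    for activity in objects(triples, state, "prov:wasGeneratedBy"):
        agents.extend(objects(triples, activity, "prov:wasAssociatedWith"))
    return agents
```

For state 124 this finds customer/5 via activities/987. State 123 has no generating activity recorded in these statements, so the result is an empty list.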
>>>
>>> Now to get back to the bundles - if you have separate bundles of
>>> alarm activities, for instance one per week, then you could refer
>>> back to the original bundle in your specialization as we say in
>>> http://www.w3.org/TR/prov-links/ :
>>>
>>> <alarm/1/state/124> prov:mentionOf <alarm/1> ;
>>>   prov:asInBundle <http://example.com/alarms> .
>>>
>>>
>>> In a way this is just a more formal way of saying:
>>>
>>> <alarm/1/state/124> prov:specializationOf <alarm/1> .
>>> <alarm/1> prov:has_provenance <http://example.com/alarms> .
>>>
>>> (using has_provenance from http://www.w3.org/TR/prov-aq/ ).
>>>
>>> The difference is that mentionOf/asInBundle adds the additional
>>> promise that you will find <alarm/1> described as an entity inside
>>> that bundle.
>>>
>>>
>>>
>>> On 29 October 2013 11:26, Mike Loll <mike.loll@gmail.com> wrote:
>>> > Thanks, Stian.
>>> >
>>> > My understanding is that an entity referenced in a bundle (e.g. via
>>> > wasGeneratedBy) must be in the bundle...but I do not wish to duplicate
>>> > entity definitions throughout my bundles.  My entities are long lived
>>> > and will exist in multiple bundles.
>>> >
>>> > So let's say I have a resource for alarms which contains a list of all
>>> > alarms my company monitors.  If I turn off the alarm at /alarm/1, my
>>> > understanding is that in prov a new entity is created for the new state
>>> > of /alarm/1.  But in my actual data store, I don't create a new record,
>>> > I just toggle a flag.
>>> >
>>> > So there is a disconnect between how my prov looks and how my data
>>> > looks.  This is by design, is my understanding.  So I would have a new
>>> > entity in my prov for the /alarm/1 in the new state which is a
>>> > specialization of /alarm/1, yes?
>>> >
>>> > Ultimately, I want to display all of the provenance for /alarm/1 so I
>>> > can see its history from creation to invalidation.  Am I going about
>>> > this the wrong way?
>>> >
>>> >
>>> > --
>>> > Mike Loll
>>> >
>>> >
>>> > On Mon, Oct 28, 2013 at 9:54 AM, Stian Soiland-Reyes
>>> > <soiland-reyes@cs.manchester.ac.uk> wrote:
>>> >>
>>> >> Hi!
>>> >>
>>> >> I would say that any resource that contains provenance statements (in
>>> >> particular PROV statements) is a prov:Bundle. However that fact might
>>> >> not be recorded anywhere, and it would generally only be used as a
>>> >> term when you want to describe provenance of provenance records, or if
>>> >> you are cataloguing provenance traces.
>>> >>
>>> >>
>>> >> In my application I report the provenance of a scientific workflow run.
>>> >>
>>> >> When I save this provenance to a file, it includes its own
>>> >> meta-provenance so you can tell how and when this file was recorded,
>>> >> as it could have been saved from the internal database at an arbitrary
>>> >> time after the run.
>>> >>
>>> >> In RDF this is normally quite easy by simply describing the relative
>>> >> URI <> which would mean "this document" - wherever it ends up being
>>> >> located:
>>> >>
>>> >>
>>> >> <> a prov:Bundle ;
>>> >>     foaf:primaryTopic <http://ns.taverna.org.uk/2011/run/5e93cdba-27ec-4757-addf-fc91be12c7a4/> ;
>>> >>     prov:wasGeneratedBy <#taverna-prov-export> .
>>> >>
>>> >> <#taverna-prov-export> a prov:Activity ;
>>> >>     rdfs:label "taverna-prov export of workflow run provenance"@en ;
>>> >>     prov:startedAtTime "2013-09-02T15:22:25.961Z"^^xsd:dateTime ;
>>> >>     prov:endedAtTime "2013-09-02T15:22:30.89Z"^^xsd:dateTime ;
>>> >>     prov:wasInformedBy <http://ns.taverna.org.uk/2011/run/5e93cdba-27ec-4757-addf-fc91be12c7a4/> ;
>>> >>     prov:wasAssociatedWith <#taverna-engine> ;
>>> >>     prov:qualifiedAssociation [ a prov:Association ;
>>> >>         prov:hadPlan <http://ns.taverna.org.uk/2011/software/taverna-2.4.0> ;
>>> >>         prov:agent <#taverna-engine>
>>> >>     ] .
>>> >>
>>> >> <http://ns.taverna.org.uk/2011/run/5e93cdba-27ec-4757-addf-fc91be12c7a4/>
>>> >>     a prov:Activity, wfprov:WorkflowRun ;
>>> >>     rdfs:label "Workflow run of GWAS_to_biomedical_c"@en ;
>>> >>     prov:startedAtTime "2013-09-02T17:19:31.676+02:00"^^xsd:dateTime ;
>>> >>     prov:endedAtTime "2013-09-02T17:20:00.662+02:00"^^xsd:dateTime .
>>> >>
>>> >> # .. followed by the actual workflow run provenance with many more
>>> >> # activities and nested wfprov:WorkflowRuns
>>> >>
>>> >>
>>> >> I am not saying that everyone should include this meta-provenance as
>>> >> in many cases it would be self-evident or not relevant - but in my
>>> >> case it is important for three reasons.
>>> >>
>>> >> 1 - I can see the version of the software used to generate the
>>> >> provenance (as I am still developing that)
>>> >> 2 - I can see when provenance was exported compared to when it was
>>> >> run. In this case just 2 minutes after - and hence I can be fairly
>>> >> certain about statements the provenance trace makes about generated
>>> >> files etc.
>>> >> 3 - I use foaf:primaryTopic (my own convention - which makes <> also
>>> >> be a foaf:Document) to find the top-level "starting point" of the
>>> >> provenance. (this is also indicated with the slightly weaker relation
>>> >> prov:wasInformedBy)
>>> >>
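The "how long after the run was this exported" check in point 2 is plain timestamp arithmetic. A small Python sketch, using the run's prov:endedAtTime and the export's prov:startedAtTime from the quoted Turtle (the variable names are my own, and comparing exactly these two timestamps is an assumption about what "after" should mean here):

```python
from datetime import datetime

# Timestamps copied from the quoted Turtle. "Z" is rewritten as "+00:00"
# because older versions of datetime.fromisoformat() do not accept "Z".
run_ended = datetime.fromisoformat("2013-09-02T17:20:00.662+02:00")
export_started = datetime.fromisoformat(
    "2013-09-02T15:22:25.961Z".replace("Z", "+00:00"))

# Both values are timezone-aware, so the different UTC offsets
# are taken into account by the subtraction.
delay = export_started - run_ended
delay_seconds = delay.total_seconds()
```

Here the delay works out to roughly two and a half minutes, which matches the "just 2 minutes after" reading once the +02:00 offset of the run timestamps is accounted for.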
>>> >>
>>> >>
>>> >> On 28 October 2013 11:26, Mike Loll <mike.loll@gmail.com> wrote:
>>> >> > I'm having some difficulty wrapping my head around when bundles would
>>> >> > be used.  Is it so we can describe how a set of provenance records
>>> >> > came to be (the provenance of the provenance)?
>>> >> >
>>> >> > I'm having a little difficulty wrapping my head around the use cases.
>>> >> >
>>> >> > Example 40 from
>>> >> > http://www.w3.org/TR/2013/REC-prov-dm-20130430/#component4
>>> >> > shows two reports (r1, r2) being generated with r2 derived from r1.
>>> >> > It then describes a bundle describing that "Bob" witnessed r1 being
>>> >> > generated.  The example goes on to show a bundle for "Alice"
>>> >> > observing the generation of r2.
>>> >> >
>>> >> > How is this useful?  I think my real question is shouldn't all
>>> >> > provenance events be contained in a bundle?
>>> >> >
>>> >> > Any insight is appreciated.
>>> >> >
>>> >> > I'm working on a clojure implementation of the provenance model as an
>>> >> > exercise and I want to be sure I have my understanding set.
>>> >> >
>>> >> > Thanks.
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Mike Loll
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Stian Soiland-Reyes, myGrid team
>>> >> School of Computer Science
>>> >> The University of Manchester
>>> >> http://soiland-reyes.com/stian/work/
>>> >> http://orcid.org/0000-0001-9842-9718



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
Received on Tuesday, 29 October 2013 15:23:09 UTC