Re: Bundles explained from Stian Soiland-Reyes on 2013-10-29 (public-prov-comments@w3.org from October 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Tue, 29 Oct 2013 12:19:03 +0000
To: Mike Loll <mike.loll@gmail.com>, "public-prov-comments@w3.org" <public-prov-comments@w3.org>
Message-ID: <CAPRnXtnBy2Mp9EPnX2U0-T7hUJife3eV_Si9StWfUjqeedpTNw@mail.gmail.com>
Cheers. A Clojure implementation sounds very promising and something I
would personally be interested in!

You might want to also look at the PROV Toolbox for Java - so you
don't need to focus on the different serializations - see
https://github.com/lucmoreau/ProvToolbox/


Can I refer to your questions and your name in a blog post? I am
thinking of writing this up for my blog at
https://practicalprovenance.wordpress.com



On 29 October 2013 12:15, Mike Loll <mike.loll@gmail.com> wrote:
> Thank you very much.  The answers I've received are great.  I hope I can
> contribute back somehow!
>
> --
> Mike Loll
>
>
> On Tue, Oct 29, 2013 at 8:10 AM, Stian Soiland-Reyes
> <soiland-reyes@cs.manchester.ac.uk> wrote:
>>
>> This is a very good question. I am not sure how this relates to
>> bundles - except that perhaps you want to describe the longer-living
>> entities in a different bundle (e.g. "the alarm database") from the
>> more short-lived entities (e.g. "alarm events this week").
>>
>>
>> In PROV we describe entities as in one way or another being 'static'.
>> In your case, there are two abstraction levels of how 'static' an
>> alarm is.
>>
>> <alarm/1> a prov:Entity, ex:Alarm ;
>>    prov:atLocation <customer/5> .
>>
>> We here consider the alarm over its lifetime at a given customer, no
>> matter its current state. So we can describe its installation date as
>> its provenance:
>>
>> <alarm/1> prov:generatedAtTime "1984-05-15T17:19:41.146"^^xsd:dateTime .
>>
>> We can also of course list properties that are more fluctuating and
>> might change during its lifetime:
>>
>> <alarm/1> ex:currentStatus "active" ;
>>       ex:brightness 0.80 ;
>>       ex:noiseLevel 0.50 .
>>
>> If I retrieve the same resource later today, this might instead be:
>>
>> <alarm/1> ex:currentStatus "disabled" ;
>>       ex:brightness 0.20 ;
>>       ex:noiseLevel 0.89 .
>>
>> Now what if we wanted to know how it changed from active to disabled,
>> but don't really care about all the possible levels of brightness and
>> noise it had in between? Then it might make sense to specialize the
>> alarm entity to what we would in common language probably just call
>> "alarm state". It is still describing the alarm, but at a finer
>> granularity:
>>
>> <alarm/1/state/123> a prov:Entity, ex:AlarmState ;
>>   prov:specializationOf <alarm/1> ;
>>   ex:currentStatus "active" ;
>>   prov:generatedAtTime "2013-10-28T18:00:00Z"^^xsd:dateTime ;
>>   prov:invalidatedAtTime "2013-10-28T23:50:00Z"^^xsd:dateTime .
>>
>> <alarm/1/state/124> a prov:Entity, ex:AlarmState ;
>>   prov:specializationOf <alarm/1> ;
>>   ex:currentStatus "disabled" ;
>>   prov:generatedAtTime "2013-10-28T23:50:00Z"^^xsd:dateTime .
>>
>> We might specify a new subclass "ex:AlarmState" that we know 'locks
>> down' the state - this would allow different kinds of specialization,
>> in case you also needed a specialization like ex:BrightnessLog.
>>
>> Each state has a different generation and invalidation time,
>> indicating the life span of the state. This is a continuous span, so
>> the alarm state that was disabled last week is different from the
>> disabled alarm state today, because the alarm was active in the
>> meantime.
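>>
>> As a minimal sketch of that last point (the state number 101, its
>> timestamps, the ex: namespace and the @base are made up for
>> illustration), last week's disabled state and the current one could be
>> recorded as two separate entities, even though both carry the same
>> status:
>>
>> ```turtle
>> @base <http://example.com/> .
>> @prefix prov: <http://www.w3.org/ns/prov#> .
>> @prefix ex:   <http://example.com/ns#> .
>> @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
>>
>> # Hypothetical earlier state: the alarm was disabled for a while last week.
>> <alarm/1/state/101> a prov:Entity, ex:AlarmState ;
>>   prov:specializationOf <alarm/1> ;
>>   ex:currentStatus "disabled" ;
>>   prov:generatedAtTime "2013-10-21T09:00:00Z"^^xsd:dateTime ;
>>   prov:invalidatedAtTime "2013-10-21T17:00:00Z"^^xsd:dateTime .
>>
>> # The current disabled state: same status value, but a separate entity
>> # with its own life span.
>> <alarm/1/state/124> a prov:Entity, ex:AlarmState ;
>>   prov:specializationOf <alarm/1> ;
>>   ex:currentStatus "disabled" ;
>>   prov:generatedAtTime "2013-10-28T23:50:00Z"^^xsd:dateTime .
>> ```
>>
>> Only the first one has an invalidation time, as the second state is
>> still ongoing.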
>>
>>
>> You might want to organize these states in an order so you don't need
>> to compare the start/end timestamps.
>>
>> <alarm/1/state/124> prov:wasRevisionOf <alarm/1/state/123> .
>>
>>
>> So what if we want to describe who disabled it? A simple solution is
>> to just add prov:wasAttributedTo to each state:
>>
>> <alarm/1/state/124> prov:wasAttributedTo <customer/5>.
>>
>>
>> So now we know that customer/5 caused the alarm to be disabled somehow
>> (it was not a supervisor at the security company).
>>
>> If you want to detail this more, say to record how the customer did
>> this (e.g. clicking the alarm panel) - then you can introduce an
>> activity to describe the transition:
>>
>> <alarm/1/state/123> prov:wasInvalidatedBy <activities/987> .
>> <alarm/1/state/124> prov:wasGeneratedBy <activities/987> .
>>
>> <activities/987> a prov:Activity, ex:AlarmPanelAction ;
>>     prov:wasAssociatedWith <customer/5> .
>>
>>
>> Now to get back to the bundles - if you have separate bundles of alarm
>> activities, for instance one per week, then you could refer back to
>> the original bundle in your specialization, as described in
>> http://www.w3.org/TR/prov-links/ :
>>
>> <alarm/1/state/124> prov:mentionOf <alarm/1> ;
>>   prov:asInBundle <http://example.com/alarms> .
>>
>>
>> In a way this is just a more formal way of saying:
>>
>> <alarm/1/state/124> prov:specializationOf <alarm/1> .
>> <alarm/1> prov:has_provenance <http://example.com/alarms> .
>>
>> (using has_provenance from http://www.w3.org/TR/prov-aq/ ), since
>> mentionOf/asInBundle adds the additional promise that you will
>> find <alarm/1> described as an entity inside that bundle.
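>>
>> To sketch what that promise could look like in practice (the contents
>> below are an assumption for illustration, including the @base and the
>> ex: namespace), the document retrieved from <http://example.com/alarms>
>> might contain something like:
>>
>> ```turtle
>> @base <http://example.com/> .
>> @prefix prov: <http://www.w3.org/ns/prov#> .
>> @prefix ex:   <http://example.com/ns#> .
>>
>> # The document identifies itself as a bundle...
>> <http://example.com/alarms> a prov:Bundle .
>>
>> # ...and describes the long-lived alarm as an entity, once.
>> <alarm/1> a prov:Entity, ex:Alarm ;
>>    prov:atLocation <customer/5> .
>> ```
>>
>> The weekly bundles would then use the mentionOf/asInBundle pair to point
>> back here instead of repeating the description of <alarm/1>.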
>>
>>
>>
>> On 29 October 2013 11:26, Mike Loll <mike.loll@gmail.com> wrote:
>> > Thanks, Stian.
>> >
>> > My understanding is that an entity referenced in a bundle (e.g. via
>> > wasGeneratedBy) must be in the bundle...but I do not wish to duplicate
>> > entity definitions throughout my bundles.  My entities are long-lived
>> > and will exist in multiple bundles.
>> >
>> > So let's say I have a resource for alarms which contains a list of all
>> > alarms my company monitors.  If I turn off the alarm at /alarm/1, my
>> > understanding is that in PROV a new entity is created for the new state
>> > of /alarm/1.  But in my actual data store, I don't create a new record,
>> > I just toggle a flag.
>> >
>> > So there is a disconnect between how my PROV looks and how my data
>> > looks.  My understanding is that this is by design.  So I would have a
>> > new entity in my PROV for /alarm/1 in the new state, which is a
>> > specialization of /alarm/1, yes?
>> >
>> > Ultimately, I want to display all of the provenance for /alarm/1 so I
>> > can see its history from creation to invalidation.  Am I going about
>> > this the wrong way?
>> >
>> >
>> > --
>> > Mike Loll
>> >
>> >
>> > On Mon, Oct 28, 2013 at 9:54 AM, Stian Soiland-Reyes
>> > <soiland-reyes@cs.manchester.ac.uk> wrote:
>> >>
>> >> Hi!
>> >>
>> >> I would say that any resource that contains provenance statements (in
>> >> particular PROV statements) is a prov:Bundle. However that fact might
>> >> not be recorded anywhere, and it would generally only be used as a
>> >> term when you want to describe provenance of provenance records, or if
>> >> you are cataloguing provenance traces.
>> >>
>> >>
>> >> In my application I report the provenance of a scientific workflow run.
>> >>
>> >> When I save this provenance to a file, it includes its own
>> >> meta-provenance so you can tell how and when this file was recorded,
>> >> as it could have been saved from the internal database at an arbitrary
>> >> time after the run.
>> >>
>> >> In RDF this is normally quite easy by simply describing the relative
>> >> URI <> which would mean "this document" - wherever it ends up being
>> >> located:
>> >>
>> >>
>> >> <> a prov:Bundle ;
>> >>     foaf:primaryTopic <http://ns.taverna.org.uk/2011/run/5e93cdba-27ec-4757-addf-fc91be12c7a4/> ;
>> >>     prov:wasGeneratedBy <#taverna-prov-export> .
>> >>
>> >> <#taverna-prov-export> a prov:Activity ;
>> >>     rdfs:label "taverna-prov export of workflow run provenance"@en ;
>> >>     prov:startedAtTime "2013-09-02T15:22:25.961Z"^^xsd:dateTime ;
>> >>     prov:endedAtTime "2013-09-02T15:22:30.89Z"^^xsd:dateTime ;
>> >>     prov:wasInformedBy <http://ns.taverna.org.uk/2011/run/5e93cdba-27ec-4757-addf-fc91be12c7a4/> ;
>> >>     prov:wasAssociatedWith <#taverna-engine> ;
>> >>     prov:qualifiedAssociation [ a prov:Association ;
>> >>         prov:hadPlan <http://ns.taverna.org.uk/2011/software/taverna-2.4.0> ;
>> >>         prov:agent <#taverna-engine>
>> >>     ] .
>> >>
>> >> <http://ns.taverna.org.uk/2011/run/5e93cdba-27ec-4757-addf-fc91be12c7a4/> a prov:Activity, wfprov:WorkflowRun ;
>> >>     rdfs:label "Workflow run of GWAS_to_biomedical_c"@en ;
>> >>     prov:startedAtTime "2013-09-02T17:19:31.676+02:00"^^xsd:dateTime ;
>> >>     prov:endedAtTime "2013-09-02T17:20:00.662+02:00"^^xsd:dateTime .
>> >>
>> >> # .. followed by the actual workflow run provenance with many more
>> >> # activities and nested wfprov:WorkflowRuns
>> >>
>> >>
>> >> I am not saying that everyone should include this meta-provenance as
>> >> in many cases it would be self-evident or not relevant - but in my
>> >> case it is important for three reasons.
>> >>
>> >> 1 - I can see the version of the software used to generate the
>> >> provenance (as I am still developing that)
>> >> 2 - I can see when provenance was exported compared to when it was
>> >> run. In this case just 2 minutes after - and hence I can be fairly
>> >> certain about statements the provenance trace makes about generated
>> >> files etc.
>> >> 3 - I use foaf:primaryTopic (my own convention - which makes <> also
>> >> be a foaf:Document) to find the top-level "starting point" of the
>> >> provenance. (this is also indicated with the slightly weaker relation
>> >> prov:wasInformedBy)
>> >>
>> >>
>> >>
>> >> On 28 October 2013 11:26, Mike Loll <mike.loll@gmail.com> wrote:
>> >> > I'm having some difficulty wrapping my head around when bundles would
>> >> > be used.  Is it so we can describe how a set of provenance records
>> >> > came to be (the provenance of the provenance)?
>> >> >
>> >> > I'm having a little difficulty wrapping my head around the use cases.
>> >> >
>> >> > Example 40 from
>> >> > http://www.w3.org/TR/2013/REC-prov-dm-20130430/#component4
>> >> > shows two reports (r1, r2) being generated with r2 derived from r1.
>> >> > It then describes a bundle describing that "Bob" witnessed r1 being
>> >> > generated.  The example goes on to show a bundle for "Alice" observing
>> >> > the generation of r2.
>> >> >
>> >> > How is this useful?  I think my real question is shouldn't all
>> >> > provenance events be contained in a bundle?
>> >> >
>> >> > Any insight is appreciated.
>> >> >
>> >> > I'm working on a Clojure implementation of the provenance model as an
>> >> > exercise and I want to be sure I have my understanding set.
>> >> >
>> >> > Thanks.
>> >> >
>> >> >
>> >> > --
>> >> > Mike Loll
>> >>
>> >>
>> >>
>> >> --
>> >> Stian Soiland-Reyes, myGrid team
>> >> School of Computer Science
>> >> The University of Manchester
>> >> http://soiland-reyes.com/stian/work/
>> >> http://orcid.org/0000-0001-9842-9718
>> >
>> >
>>
>>
>>
>> --
>> Stian Soiland-Reyes, myGrid team
>> School of Computer Science
>> The University of Manchester
>> http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
>
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
Received on Tuesday, 29 October 2013 12:19:55 UTC