Re: How to associate a Plan with a Bundle

On Tue, Jan 29, 2013 at 4:37 PM, Timothy Lebo <lebot@rpi.edu> wrote:
> The key is that the Bundle was generated by the Activity of executing the workflow, and the Activity's Association had a Plan that was followed to perform the Activity.

This kind of "upper-level activity" is very similar to how we include
'meta-provenance' in the provenance traces of the Taverna workflow
engine.

With one slight difference: we distinguish between the activity of
exporting provenance and the activity of executing the workflow, as
the export can be done at an arbitrary point in the future, while in
the meantime the provenance data is kept in an internal database.
Another reason is that in Taverna, the workflow definition does not
say anything about making provenance traces or where outputs should
be saved.
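Condensed to the links that matter for the original question, the
chain from the bundle to the workflow definition looks roughly like
this. It is a simplified sketch: the usual prefixes are assumed to be
declared, and `:workflow-run` and `:workflow-definition` are
hypothetical placeholders standing in for the full Taverna run and
workflow URIs used in the example below.

```turtle
# Simplified sketch; :workflow-run and :workflow-definition are
# hypothetical placeholders for the full Taverna run/workflow URIs.
<> a prov:Bundle ;
    # bundle -> the export activity that generated it
    prov:wasGeneratedBy :taverna-prov-export .

:taverna-prov-export a prov:Activity ;
    # export activity -> the (earlier) workflow run whose data it exported
    prov:wasInformedBy :workflow-run .

:workflow-run a wfprov:WorkflowRun, prov:Activity ;
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent :taverna-engine ;
        # workflow run -> the plan (workflow definition) it followed
        prov:hadPlan :workflow-definition
    ] .
```

Compared with hanging the bundle directly off the workflow run, the
extra `prov:wasInformedBy` hop is what lets the export happen long
after the run has finished.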


The following is not exactly what comes out of the workflow engine,
as I still have to massage Sesame's AliBaba into being a bit more
verbose about class membership and superproperties (it does not
bother declaring class membership when it is implied by the
domain/range of a property).


Note that we use two extensions of PROV:

* <http://purl.org/wf4ever/wfprov#> (an attempt at a general
dataflow provenance model; see <http://purl.org/wf4ever/model> for a
pretty picture)
* <http://ns.taverna.org.uk/2012/tavernaprov/> (which is Taverna-specific)

The first might be relevant for you if your workflow system is
dataflow-oriented, but it would need further work if you need to
cover BPEL-like or Kepler-like workflows.
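The examples below leave out the namespace declarations. For anyone
wanting to load them into a Turtle parser, these are the prefixes they
appear to rely on. The PROV, RDFS, XSD, Dublin Core and FOAF
namespaces are the standard ones and the two extensions are the ones
listed above; the empty `:` prefix is not given in the original trace,
so the IRI used for it here is an assumed placeholder.

```turtle
@prefix prov:        <http://www.w3.org/ns/prov#> .
@prefix rdfs:        <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:         <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms:     <http://purl.org/dc/terms/> .
@prefix foaf:        <http://xmlns.com/foaf/0.1/> .
@prefix wfprov:      <http://purl.org/wf4ever/wfprov#> .
@prefix tavernaprov: <http://ns.taverna.org.uk/2012/tavernaprov/> .
# Assumption: placeholder namespace for the local ':' names
# (:taverna-prov-export, :taverna-engine); not taken from the trace
@prefix :            <#> .
```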


Example, based on
https://github.com/wf4ever/provenance-corpus/blob/master/Taverna_repository/workflow_3152_version_1/run_1/workflowrun.prov.ttl

### Self-documenting metadata - we don't bother with named graphs for this purpose

<> a prov:Bundle ;
    prov:wasGeneratedBy :taverna-prov-export .

# This is the activity that generated all the files in
# https://github.com/wf4ever/provenance-corpus/tree/master/Taverna_repository/workflow_3152_version_1/run_1

:taverna-prov-export a prov:Activity ;
    rdfs:label "taverna-prov export of workflow run provenance" ;
    prov:wasAssociatedWith :taverna-engine ;
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent :taverna-engine ;
        # For the export, the plan is the workflow engine software
        prov:hadPlan <http://ns.taverna.org.uk/2011/software/taverna-2.4.0>
    ] ;
    prov:startedAtTime "2012-10-05T14:14:08.171+02:00"^^xsd:dateTime ;
    prov:endedAtTime "2012-10-05T14:14:37.750+02:00"^^xsd:dateTime ;
    # The link to the workflow run activity which provided the provenance data
    prov:wasInformedBy <http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/> .

#### End of metadata

:taverna-engine a prov:SoftwareAgent, wfprov:WorkflowEngine, foaf:Agent .

# We don't say much about the engine instance at all, as we found it
# difficult to identify (our engine runs on desktop computers, servers
# and cloud instances, and hence doesn't easily have a resolvable URI),
# and also difficult to scope. Is it one execution of a particular
# workflow (which is technically how the "engine" is instantiated in
# Taverna), one execution of the software code (an operating system
# process), one installation on a particular machine, that particular
# software/plugin combination on any machine, or any version of the
# Taverna software?


# The upper workflow run, corresponding to the master workflow.
# We have discussed making an even higher-level activity, covering
# "Starting the workflow run", which would not have a plan, would be
# associated with the user who clicked the Run button, and would cover
# further actions beyond the workflow definition, such as saving of
# outputs.

<http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/>
    a wfprov:WorkflowRun, prov:Activity ;
    rdfs:label "Workflow run of Workflow21" ;
    prov:startedAtTime "2012-10-05T14:13:14.921+02:00"^^xsd:dateTime ;
    prov:endedAtTime "2012-10-05T14:13:17.281+02:00"^^xsd:dateTime ;

    # The wfprov shortcut for showing the plan in the association below
    wfprov:describedByWorkflow <http://ns.taverna.org.uk/2010/workflowBundle/9f925cdf-98b9-4034-8d25-6b44b15a4635/workflow/Workflow21/> ;

    wfprov:wasEnactedBy :taverna-engine ;
    prov:wasAssociatedWith :taverna-engine ;
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent :taverna-engine ;
        # This is the identifier for the workflow definition.
        # Note: a workflowBundle is not a prov:Bundle, by the way;
        # it's just a zip of RDF files
        prov:hadPlan <http://ns.taverna.org.uk/2010/workflowBundle/9f925cdf-98b9-4034-8d25-6b44b15a4635/workflow/Workflow21/>
    ] ;

    # The PROV WG recommendation for associating this activity
    # with 'sub-activities'
    dcterms:hasPart
        <http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/process/1c4240de-4217-4fb7-b2f5-11626f584071/> ,
        <http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/process/d0a6455a-7c47-42f8-9bc7-8515bcd445aa/> .
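The even higher-level activity mentioned in the comment above
(starting the run, with no plan) is not something Taverna currently
produces. A minimal sketch of how it might be expressed, reusing the
`dcterms:hasPart` pattern used for sub-activities; `:start-workflow-run`,
`:the-user` and `:saved-output` are hypothetical names, not part of the
Taverna trace.

```turtle
# Hypothetical sketch only; none of these :names are emitted by Taverna.
:start-workflow-run a prov:Activity ;
    rdfs:label "User starts the workflow run and saves its outputs" ;
    # Associated with the user who clicked Run; deliberately no
    # qualified association with a prov:hadPlan, as no plan covers it
    prov:wasAssociatedWith :the-user ;
    # The actual workflow run is one part of this wider activity
    dcterms:hasPart <http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/> .

:the-user a prov:Person, prov:Agent .

# Actions beyond the workflow definition, such as saving outputs,
# would be generated by the wider activity rather than the workflow run
:saved-output a prov:Entity ;
    prov:wasGeneratedBy :start-workflow-run .
```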


# An execution of a particular step/process in the workflow
<http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/process/d0a6455a-7c47-42f8-9bc7-8515bcd445aa/>
    a wfprov:ProcessRun, prov:Activity ;
    rdfs:label "Processor execution Beanshell (facade0:Workflow21:Beanshell)" ;
    prov:startedAtTime "2012-10-05T14:13:17.171+02:00"^^xsd:dateTime ;
    prov:endedAtTime "2012-10-05T14:13:17.250+02:00"^^xsd:dateTime ;

    # A stronger link to the master run than dcterms:hasPart above
    wfprov:wasPartOfWorkflowRun <http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/> ;

    wfprov:usedInput <http://ns.taverna.org.uk/2011/data/f975fc2f-cb00-4421-874f-b123f8d11998/ref/52530175-ee7c-467a-aa7e-137289540b6a> ;
    prov:used <http://ns.taverna.org.uk/2011/data/f975fc2f-cb00-4421-874f-b123f8d11998/ref/52530175-ee7c-467a-aa7e-137289540b6a> ;

    # Link to the plan (definition) for this particular process.
    # Note: we distinguish between the activity of a particular execution
    # of the process and the process/service definition, which remains the
    # same across multiple executions of the same workflow definition.
    wfprov:describedByProcess <http://ns.taverna.org.uk/2010/workflowBundle/9f925cdf-98b9-4034-8d25-6b44b15a4635/workflow/Workflow21/processor/Beanshell/> ;
    prov:wasAssociatedWith :taverna-engine ;
    prov:qualifiedAssociation [
        a prov:Association ;
        # We consider each step of a workflow to be run by the same engine,
        # not by the parent activity.
        prov:agent :taverna-engine ;
        prov:hadPlan <http://ns.taverna.org.uk/2010/workflowBundle/9f925cdf-98b9-4034-8d25-6b44b15a4635/workflow/Workflow21/processor/Beanshell/>
    ] .


This allows workflow-level inputs and outputs to be used/generated by
both the upper workflow run and the individual processes:

<http://ns.taverna.org.uk/2011/data/f975fc2f-cb00-4421-874f-b123f8d11998/ref/a2282da2-fc5a-47e5-ad0e-106adaec7715>
    a prov:Entity ;
    tavernaprov:content <Beanshell_startTimeRange.txt> ;
    wfprov:wasOutputFrom
        <http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/> ,
        <http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/process/d0a6455a-7c47-42f8-9bc7-8515bcd445aa/> ;
    prov:wasGeneratedBy
        <http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/> ,
        <http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/process/d0a6455a-7c47-42f8-9bc7-8515bcd445aa/> .
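The same pattern applies on the input side. A hedged sketch, where
`:example-input` is a hypothetical placeholder for a workflow-level
input (not a real Taverna data URI), and assuming `wfprov:usedInput`
can be applied to a `wfprov:WorkflowRun` as well as to a
`wfprov:ProcessRun` (in the wfprov model a workflow run is understood
to be a kind of process run); plain `prov:used` works either way.

```turtle
# Hypothetical sketch: :example-input is a placeholder input entity.
:example-input a prov:Entity .

# The upper workflow run uses the workflow-level input...
<http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/>
    wfprov:usedInput :example-input ;
    prov:used :example-input .

# ...and so does the individual process run that consumed it
<http://ns.taverna.org.uk/2011/run/f975fc2f-cb00-4421-874f-b123f8d11998/process/d0a6455a-7c47-42f8-9bc7-8515bcd445aa/>
    wfprov:usedInput :example-input ;
    prov:used :example-input .
```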



>
> A quick example:
>
> :workflow_plan_1 a bpl:Workflow .  # bpl namespace is made up; I'm not a workflow guy.
>
> :workflow_engine_5 a bpl:WorkflowEngine .
>
> :workflow_execution_17
>   a prov:Activity;
>   prov:wasAssociatedWith :workflow_engine_5;
>   prov:qualifiedAssociation [ # Don't use bnodes in practice.
>       a prov:Association;
>       prov:hadPlan :workflow_plan_1;
>       prov:agent :workflow_engine_5;
>  ];
> .
>
> :my_bundle {  # Note that the PROV recommendation says nothing about _how_ one associates provenance assertions to a bundle. Named graphs is one way.
>     :my_bundle
>            a prov:Bundle;
>            prov:wasGeneratedBy :workflow_execution_17;  # This is the link from a Bundle to a Plan (via an Activity's Association).
>     .
>     :cake a prov:Entity;
>         prov:wasAttributedTo :jacco . # Whatever your execution engine wanted to say….
> }
>
>
> Regards,
> Tim
>
>> Thanks,
>>
>> Jacco



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Wednesday, 30 January 2013 10:00:42 UTC