Re: Reproducible software experiments through semantic configurations

On Fri, 19 May 2017 16:42:29 +0100, Hugh Glaser <hugh@glasers.org> wrote:
> In case the cultural memory is lost, so people don't remember this.
> 
> https://www.myexperiment.org is a Workflow management tool that supports Linked Data; it ran for many years, finishing a while ago, and seems to be impressive in that it is still being used (see https://www.myexperiment.org/workflows/4984 for the latest).
> I am nothing to do with it, so may be wrong about:
> It morphed into the Taverna project, which is now in Apache - https://en.wikipedia.org/wiki/Apache_Taverna, but I think the Linked Data may have got a bit lost on the way.
> However, the latest developments (https://taverna.incubator.apache.org/documentation/scufl2/ ) seem to suggest they are getting Linked Data capabilities again.

Hi, thanks for remembering us :)  Adjust your screens, another
Soiland-Reyes rant below... (If only I were able to express this in
paper style instead!)


Ruben does cite our https://doi.org/10.1016/j.websem.2015.01.003 which
shows Research Objects used with myExperiment to preserve (in
particular) workflows.

myExperiment.org is still up and running - but we are not adding any new
functionality at the moment. (Insert $ in slot) - it's used for more
than 20 different workflow systems -- somehow SPARQL is in there too :)


We expose RDF metadata about the workflow entries on
http://rdf.myexperiment.org/ -- note that this mainly uses its own
vocabulary and has not been updated to use any of the newer approaches.
Where supported it also includes inner details and annotations from the
uploaded workflows (e.g. listing of steps).


I don't think it's fair to say myXP morphed into Taverna (although
Taverna has myXP support built in, but so does RapidMiner). Rather,
Research Objects evolved from what we learnt with myExperiment and
Taverna, but are neutral to the execution technology, and
aggregate RDF annotations as well as local and remote resources.

Recent work from an NIH large-scale genome sequencing perspective
has moved into capturing the manifest and annotations as part of a "Big
Data Bag" (based on BagIt, which is popular in library communities) -
which adds long-term archival identifiers for the resources:
https://static.aminer.org/pdf/fa/bigdata2016/BigD418.pdf


See http://www.researchobject.org/publications/ for more, and feel
free to come chat at https://gitter.im/ResearchObject/ResearchObject :)


-

You are correct that Apache Taverna has an RDF-based workflow definition
format, SCUFL2 (in fact XML that just happens to be RDF/XML ... - no,
don't try this at home, kids) - but that is specific to our workflow
engine, and such RDF resources are mainly useful as annotation targets
(e.g. "What happens in this part of the workflow").

BTW, we had fun working out how to generate identifiers here, as
workflows live as ZIP files on random desktops and servers. See
https://taverna.incubator.apache.org/ns/
for how we solved that for SCUFL2, and
https://w3id.org/bundle/#absolute-uris for Research Object Bundles.
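
As a rough illustration of the identifier problem: a bundle sitting in a
ZIP file has no natural web address, so one option is to mint a random
UUID and use it as the root for every resource inside. The Python sketch
below shows only that idea. The function names are hypothetical, and the
"app" scheme is an assumption about what the linked section recommends,
so check it against the spec before relying on it.

```python
import uuid
from urllib.parse import quote


def bundle_base_uri(bundle_id=None, scheme="app"):
    """Return an absolute base URI for a bundle.

    A random UUID is minted when no bundle_id is given. The default
    scheme is an ASSUMPTION for illustration, not taken from the spec.
    """
    if bundle_id is None:
        bundle_id = uuid.uuid4()
    return f"{scheme}://{bundle_id}/"


def resolve(base, path):
    """Turn a path inside the ZIP into an absolute URI under base."""
    # Percent-encode unsafe characters but keep "/" as a separator.
    return base + quote(path.lstrip("/"))


if __name__ == "__main__":
    base = bundle_base_uri()
    print(resolve(base, "workflow/Hello World.wfbundle"))
```

Because the UUID is random, two copies of the same ZIP opened on
different machines only share identifiers if the UUID is stored in the
bundle rather than regenerated each time.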


Research Object defined the wfdesc model -
https://w3id.org/ro/2016-01-28/wfdesc - this can be useful to describe a
dataflow in the abstract as RDF (e.g. even a shell script).
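
To make the shell script case concrete, here is a minimal Python sketch
that writes out a wfdesc description (as Turtle) of a linear pipeline
such as "cat | sort | uniq". The helper name, the "#..." identifiers and
the single in/out port per step are assumptions made for illustration;
only the wfdesc terms themselves come from the vocabulary.

```python
WFDESC = "http://purl.org/wf4ever/wfdesc#"


def describe_pipeline(workflow, steps):
    """Return Turtle describing a linear pipeline as a wfdesc:Workflow.

    Each step becomes a wfdesc:Process with one input and one output;
    a wfdesc:DataLink connects each step's output to the next input.
    Names are assumed to be simple identifiers (no escaping is done).
    """
    if not steps:
        raise ValueError("need at least one step")
    out = [f"@prefix wfdesc: <{WFDESC}> .", ""]
    subs = ", ".join(f"<#{s}>" for s in steps)
    links = ", ".join(f"<#link{i}>" for i in range(1, len(steps)))
    out.append(f"<#{workflow}> a wfdesc:Workflow ;")
    if links:
        out.append(f"    wfdesc:hasDataLink {links} ;")
    out.append(f"    wfdesc:hasSubProcess {subs} .")
    for s in steps:
        out += ["",
                f"<#{s}> a wfdesc:Process ;",
                f"    wfdesc:hasInput <#{s}/in> ;",
                f"    wfdesc:hasOutput <#{s}/out> .",
                f"<#{s}/in> a wfdesc:Input .",
                f"<#{s}/out> a wfdesc:Output ."]
    for i, (src, dst) in enumerate(zip(steps, steps[1:]), start=1):
        out += ["",
                f"<#link{i}> a wfdesc:DataLink ;",
                f"    wfdesc:hasSource <#{src}/out> ;",
                f"    wfdesc:hasSink <#{dst}/in> ."]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(describe_pipeline("wordcount", ["cat", "sort", "uniq"]))
```

The result says nothing about how the steps are executed, which is the
point: it only records the abstract structure of the dataflow.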

https://w3id.org/ro/2016-01-28/wfprov shows how to describe the history
of a particular run of a wfdesc model in PROV.


Common Workflow Language http://www.commonwl.org/ borrows the wfdesc
model, but is executable (again) and with a strong focus on
reproducibility, portability and reuse, e.g. with Docker. Multiple
workflow engines have, or are building, CWL support - if you are doing
computational workflows do have a look!

CWL workflows can also be exported as RDF, try 
"cwltool --print-rdf" or see 
https://github.com/common-workflow-language/workflows#sparql


-- 
Stian Soiland-Reyes
The University of Manchester
http://www.esciencelab.org.uk/
http://orcid.org/0000-0001-9842-9718

Received on Thursday, 25 May 2017 15:04:18 UTC