Re: Reproducible software experiments through semantic configurations from Idafen Santana Pérez on 2017-05-19 (public-lod@w3.org from May 2017)

From: Idafen Santana Pérez <isantana@fi.upm.es>
Date: Fri, 19 May 2017 17:15:02 +0200
To: Ruben Taelman <ruben.taelman@ugent.be>
Cc: public-lod <public-lod@w3.org>
Message-ID: <CAHrn7Z8aeo4bjoq0Gn35W0rHft9RvUBXH=bkdKoEnPJ+dqofsg@mail.gmail.com>

Hi Ruben,

thanks for sharing your paper. The basic idea of publishing an LD dataset
containing descriptions of software packages and their dependencies is
really helpful for dealing with reproducibility, especially from the
developer's point of view, which I assume is the main target of your
approach, as discussed in the paper. During my PhD I also explored how
semantic technologies can be applied to experimental reproducibility,
focusing on scientific workflows [1], but we didn't cover publishing those
descriptions as proper Linked Data, which in my opinion is a really strong
point of your work.
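
For readers outside our community, here is a minimal sketch of what
describing one package version and its dependencies as RDF can look like,
serialized as N-Triples. It assumes a hypothetical vocabulary under
http://example.org/ (not the ontology from your paper), and the helper
names and package names are made up for illustration.

```typescript
// Hypothetical record for one published package version; the fields mirror
// npm's package.json, but the RDF vocabulary used below is illustrative only.
interface PackageVersion {
  name: string;
  version: string;
  dependencies: Record<string, string>; // dependency name -> semver range
}

// Assumed base IRIs (placeholders, not the vocabulary from the paper).
const BASE = "http://example.org/npm/";
const VOCAB = "http://example.org/vocab#";

// N-Triples term helpers: IRIs go in angle brackets, literals in quotes.
// For ordinary strings, JSON.stringify's escaping (\" \\ \n \t \uXXXX ...)
// is also valid N-Triples string syntax.
const iri = (value: string): string => `<${value}>`;
const literal = (value: string): string => JSON.stringify(value);

// Turn one package version and its declared dependencies into N-Triples lines.
function packageToNTriples(pkg: PackageVersion): string[] {
  const pkgIri = `${BASE}${encodeURIComponent(pkg.name)}/${encodeURIComponent(pkg.version)}`;
  const subject = iri(pkgIri);
  const triples: string[] = [
    `${subject} ${iri(VOCAB + "name")} ${literal(pkg.name)} .`,
    `${subject} ${iri(VOCAB + "version")} ${literal(pkg.version)} .`,
  ];
  for (const [depName, range] of Object.entries(pkg.dependencies)) {
    // One node per dependency edge keeps the declared version range
    // next to the link to the (unversioned) dependency package.
    const edge = iri(`${pkgIri}#dependency-${encodeURIComponent(depName)}`);
    triples.push(
      `${subject} ${iri(VOCAB + "dependency")} ${edge} .`,
      `${edge} ${iri(VOCAB + "package")} ${iri(BASE + encodeURIComponent(depName))} .`,
      `${edge} ${iri(VOCAB + "versionRange")} ${literal(range)} .`,
    );
  }
  return triples;
}

// Usage with made-up package names and versions.
const example: PackageVersion = {
  name: "my-experiment",
  version: "1.0.0",
  dependencies: { "example-lib": "^2.1.0" },
};
console.log(packageToNTriples(example).join("\n"));
```

Keeping the version range on its own node is just one design choice; a
real vocabulary might instead link directly to the resolved dependency
version, which is what matters most for reproducing an experiment.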

In our case we developed a more generic approach, not restricted to one
technology or software framework, so in the end we had to rely on scripts
for deployment and on generic parameters for the infrastructure
configuration. We developed a set of ontologies [2] for describing concepts
in a more top-level manner, as we assumed that, in general, most scientists
using computational tools don't have the required development skills. We
applied them to several computational experiments from different scientific
areas, testing how well they allowed us to reproduce those experiments.

As I said, I think this is a great contribution to supporting semantic
descriptions of experiments, and I would like to see more papers using this
kind of initiative in the future, not only within our community but also in
communities not related to the semantic web or to computational science in
general.

I will also try to add some comments/questions on some concrete parts of the
paper itself.

Regards,
Idafen

[1] http://dx.doi.org/10.1016/j.future.2015.12.017
[2] http://purl.org/net/wicus

On Fri, May 19, 2017 at 9:55 AM, Ruben Taelman <ruben.taelman@ugent.be> wrote:

> Dear all,
>
> Some of you may recognise the following problem:
>
> Let’s say you just read an article that is based on a software-driven experiment,
> and want to reproduce the results.
> While the article mentions what software it uses,
> it doesn’t mention the versions of that software and its dependencies,
> or the configuration that was used to run the experiment.
> Yet, these are essential details for reproducing experimental results,
> as a slightly different configuration might lead to significantly different results.
>
> That is why we (Joachim Van Herwegen, Sarven Capadisli, Ruben Verborgh and myself)
> decided to eat our own dogfood by describing and publishing these software
> configurations as Linked Data.
>
> In reply to the ISWC 2017 call for in-use papers,
> we wrote an article titled:
> Reproducible software experiments through semantic configurations
>
> In this work, we introduce ontologies to describe software components
> and their configurations to facilitate reproducible software experiments.
> For semantic interlinking between these components and their configurations,
> we publish the metadata of all 480,000+ JavaScript libraries on npm as
> 174,000,000+ RDF triples [1].
> Furthermore, we introduce a dependency injection framework [2] that understands these configurations
> and is able to instantiate software based on them.
>
> This article is self-published on:
> https://linkedsoftwaredependencies.org/articles/reproducibility/
>
> Public reviews, feedback or other comments on the article itself are welcome.
> You can comment by signing in with your WebID;
> the commenting system is powered by dokieli [3].
>
> [1] https://fragments.linkedsoftwaredependencies.org/npm
> [2] https://github.com/LinkedSoftwareDependencies/Components.js
> [3] https://dokie.li/
>
> Kind regards,
> Ruben Taelman
>
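
The dependency injection framework mentioned in the quoted message
instantiates software from declarative configurations. A minimal TypeScript
sketch of that general idea follows. It is not the Components.js API: the
Injector class, the component ids and the type names (ex:Logger, ex:Runner)
are all hypothetical, and plain objects stand in for RDF descriptions.

```typescript
// A parameter value is either a literal or a reference to another component id.
type ParamValue = string | number | { ref: string };

// Hypothetical declarative description of one component instance.
interface ComponentConfig {
  type: string; // would be an IRI in an RDF-based configuration
  params: Record<string, ParamValue>;
}

type Factory = (params: Record<string, unknown>) => unknown;

class Injector {
  private readonly factories: Record<string, Factory>;
  private readonly config: Record<string, ComponentConfig>;
  private readonly instances = new Map<string, unknown>();

  constructor(factories: Record<string, Factory>, config: Record<string, ComponentConfig>) {
    this.factories = factories;
    this.config = config;
  }

  // Instantiate a component by id, resolving references depth-first and
  // caching each instance so shared dependencies are created only once.
  get(id: string, resolving: Set<string> = new Set()): unknown {
    if (this.instances.has(id)) return this.instances.get(id);
    if (resolving.has(id)) throw new Error(`Cyclic dependency at ${id}`);
    const desc = this.config[id];
    if (!desc) throw new Error(`Unknown component ${id}`);
    const factory = this.factories[desc.type];
    if (!factory) throw new Error(`No factory registered for type ${desc.type}`);

    resolving.add(id);
    const args: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(desc.params)) {
      // Object values are references; everything else is passed through as a literal.
      args[key] = typeof value === "object" ? this.get(value.ref, resolving) : value;
    }
    resolving.delete(id);

    const instance = factory(args);
    this.instances.set(id, instance);
    return instance;
  }
}

// Usage: a hypothetical runner that receives a logger through injection.
const factories: Record<string, Factory> = {
  "ex:Logger": (p) => ({ prefix: p.prefix as string }),
  "ex:Runner": (p) => ({ logger: p.logger, iterations: p.iterations as number }),
};
const config: Record<string, ComponentConfig> = {
  logger: { type: "ex:Logger", params: { prefix: "[run]" } },
  runner: { type: "ex:Runner", params: { logger: { ref: "logger" }, iterations: 10 } },
};
const injector = new Injector(factories, config);
const runner = injector.get("runner") as { logger: unknown; iterations: number };
console.log(runner.iterations, runner.logger === injector.get("logger")); // 10 true
```

Because the whole wiring lives in the configuration, publishing that
configuration (and the exact versions behind each type) is enough for
someone else to rebuild the same object graph.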

-- 
PhD, Ontology Engineering Group

Received on Friday, 19 May 2017 15:15:57 UTC