- From: M. Scott Marshall <mscottmarshall@gmail.com>
- Date: Thu, 19 Aug 2010 17:30:40 -0700
- To: Tom Morris <tfmorris@gmail.com>
- Cc: HCLS <public-semweb-lifesci@w3.org>, ostolop@ebi.ac.uk, James Malone <malone@ebi.ac.uk>, Helen Parkinson <parkinson@ebi.ac.uk>
Tom, I agree with the need to record whatever provenance you can. That's why it's particularly useful that James Malone (CC'd) told us about the Software Ontology http://www.ebi.ac.uk/efo/swo that is being developed in the context of the Experimental Factor Ontology work at EBI. Quoting from the web page:

<QUOTE>
The software ontology (SWO) was a project initiated by Dr Helen Parkinson and Dr James Malone at EBI and implemented by Nandini Badarinarayan to describe software used in bioinformatics. The SWO describes components of software such as the software type, the manufacturer, the data inputs and outputs and the objectives of the software. The SWO uses a slim version of the Basic Formal Ontology (BFO) upper ontology and subclasses and relations from the Experimental Factor Ontology (EFO), the Ontology of Biomedical Investigations (OBI) and the Information Artifact Ontology (IAO).
</QUOTE>

For example, the identifier for (the BioConductor implementation of) LIMMA in SWO is http://www.ebi.ac.uk/efo/swo/SWO_0000593 . If you find this identifier alongside a gene list that is associated with a microarray study article (imagine for a moment that a gene list is provided in the associated MAGE-TAB of the data), that is far better than having to guess at the gene list yourself or having to read the article to decide whether you want to use the gene list.

Suppose that you 1) have access to gene lists and 2) prefer to use only gene lists produced by LIMMA. Then you can encode your inclusion criteria in a SPARQL query.

BTW, Gene Atlas http://www.ebi.ac.uk/gxa/ provides gene lists that have been uniformly selected using LIMMA from a subset of ArrayExpress. Currently, it is possible to access some of the service output as, for example, a list of strings in JSON format. Misha Kapushesky (CC'd) and colleagues are interested in eventually providing RDF renderings of the data as well.
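
As a rough illustration of encoding the LIMMA-only inclusion criterion, here is a minimal Python sketch that builds such a SPARQL query as a string. Only the LIMMA identifier comes from SWO. The `ex:` namespace and the terms `ex:GeneList`, `ex:producedBy` and `ex:derivedFromStudy` are hypothetical placeholders, since no RDF rendering of the gene lists exists yet; the `select_by_software` helper and its `produced_by` key are likewise assumptions, included only to show the same filter applied to records already in memory.

```python
# SWO identifier for the BioConductor implementation of LIMMA (from the message).
LIMMA_IRI = "http://www.ebi.ac.uk/efo/swo/SWO_0000593"

# Hypothetical namespace: these terms are NOT defined by SWO, MAGE-TAB or Gene Atlas.
EX = "http://example.org/provenance#"


def build_gene_list_query(software_iri: str) -> str:
    """Return a SPARQL SELECT query for gene lists produced by the given software."""
    return (
        f"PREFIX ex: <{EX}>\n"
        "SELECT ?geneList ?study\n"
        "WHERE {\n"
        "  ?geneList a ex:GeneList ;\n"
        "            ex:derivedFromStudy ?study ;\n"
        f"            ex:producedBy <{software_iri}> .\n"
        "}"
    )


def select_by_software(gene_lists, software_iri):
    """Apply the same criterion locally: keep records whose 'produced_by' matches."""
    return [g for g in gene_lists if g.get("produced_by") == software_iri]


if __name__ == "__main__":
    print(build_gene_list_query(LIMMA_IRI))
```

The query string could be sent to any SPARQL endpoint that exposes gene lists with software provenance, once the placeholder terms are replaced by whatever vocabulary the data actually uses.
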
-Scott

On Thu, Jul 22, 2010 at 1:49 PM, Tom Morris <tfmorris@gmail.com> wrote:
> This discussion about provenance:
>
> "Lena: But software packages change so any reference to the software
> will be stale over the years.
>
> "Scott: Many types of provenance will go stale but essential
> information about the origins of the information (provenance), such as
> the method used to produce the p-values, is important to anyone
> reusing the data. They want to know whether it's from LIMMA or MANOVA,
> just as they want to know Affy vs. other types of arrays."
>
> seems to assume that provenance information will unavoidably get stale.
>
> I don't think that needs to be the case. With a little forethought, I
> think one can collect enough information that you have a good chance
> of unambiguously identifying something like a software package. If
> rather than "LIMMA" you record something like "LIMMA v3.4.4 Windows
> 64-bit" (or even better, a structured version of that), you should be
> able to trace even things which are version specific or
> platform/compiler specific. If the package has multiple methods that
> might have been used for a task, include a reference to the
> method/process also.
>
> Tom

--
M. Scott Marshall, W3C HCLS IG co-chair
Leiden University Medical Center / University of Amsterdam
http://staff.science.uva.nl/~marshall
Received on Friday, 20 August 2010 00:31:08 UTC