Re: Task proposal: Distributed self-publishing of experiments from Tim Clark on 2006-05-09 (public-semweb-lifesci@w3.org from May 2006)

From: Tim Clark <tim_clark@harvard.edu>
Date: Tue, 9 May 2006 13:48:40 -0400
To: larry.hunter@uchsc.edu, public-semweb-lifesci@w3.org
Message-Id: <C66805E0-D02A-4091-A4EE-8F1DEE94687F@harvard.edu>

Larry & Group

I personally see the key elements here as placement of data  
publication in semantic relationship to *all the other elements of  
the knowledge lifecycle*, and supporting use cases that scientists  
would actually use, rather than semantic structure of the published  
data itself.

AFAIK, data is basically never published except as supporting  
evidence for a publication.  There are many variants on how this is  
done, but generally they are ad-hoc and none (again afaik) use formal  
semantics.

I would also claim that data is *not useful* unless linked to and  
explicated by a publication by those who derived it.  You know:  
"Hypothesis", "Materials and Methods", "Interpretation", etc. all the  
stuff you find in peer-reviewed papers (not to mention "References")!!

This is what we are attempting to work out in the SWAN project, for  
one research community, in basically a very simple way, without  
imposing any internal structure on the data itself.  The goal is  
rather different from reasoning across heterogeneous sets of  
data...it is much more about "finding useful stuff for my research I  
didn't know about already".  Which is why we deliberately treat the  
contents of published data files as semantically opaque.

This is not to say that we would object to someone exposing contents  
structured in a semantically rigorous way, just that we think it has  
more limited usefulness. People make the very deceptive analogy to  
what was done in creating and using the various DNA sequence  
databases, but that was quite a special situation, and I suspect not  
so generally applicable across science.  Many reasons for this...but  
even microarrays, I would assert, are not reasonable to compare  
across experiments and experimenters except with extreme caution and  
very thorough understanding of the experimental design and conditions.

In my opinion, *re-implementing* a huge unwieldy square wheel in a  
nice simple round version is not always so bad.  :-) :-)

Best

Tim

On May 9, 2006, at 12:14 PM, Larry Hunter wrote:

>
> On Mon, 2006-05-08 at 22:53 -0700, AJ Chen wrote:
>
>> Proposed task: Distributed self-publishing of experiments
>
>>      1. Ontology for publishing projects and experiments.  There are
>>         some domain-specific ontologies, such as the microarray
>>         experiment ontology, that already exist today.  This task is
>>         intended to develop a general-purpose ontology for describing
>>         projects and experiments in such a way that search and
>>         comparison of components of experiments is possible.
>
> Please, please do not reinvent this wheel.  There are already several
> existing ontologies for describing experiments and scientific projects.
> In addition to the MGED microarray ontology (see
> http://mged.sourceforge.net/ontologies/index.php) there are also the
> National Cancer Institute caBIG efforts in controlled vocabularies and
> common data elements (see https://cabig.nci.nih.gov/workspaces/VCDE/).
> And there are several entries in OBO (the open biological ontologies
> project) that are relevant (e.g. biological imaging methods and
> evidence codes):
>
>   http://obo.sourceforge.net/cgi-bin/table.cgi
>
> Although I couldn't find it in a few minutes of poking around, I think
> the myGrid folks also developed one, in RDF (iirc).  Maybe Alan Rector
> could update us on that one.
>
> Larry
>
Received on Wednesday, 10 May 2006 18:28:20 UTC