- From: Tim Clark <tim_clark@harvard.edu>
- Date: Tue, 9 May 2006 13:48:40 -0400
- To: larry.hunter@uchsc.edu, public-semweb-lifesci@w3.org
Larry & Group,

I personally see the key elements here as placing data publication in semantic relationship to *all the other elements of the knowledge lifecycle*, and supporting use cases that scientists would actually use, rather than the semantic structure of the published data itself.

AFAIK, data is basically never published except as supporting evidence for a publication. There are many variants on how this is done, but generally they are ad hoc, and none (again, AFAIK) use formal semantics.

I would also claim that data is *not useful* unless it is linked to and explicated by a publication from those who derived it. You know: "Hypothesis", "Materials and Methods", "Interpretation", etc., all the stuff you find in peer-reviewed papers (not to mention "References")!!

This is what we are attempting to work out in the SWAN project, for one research community, in a basically very simple way, without imposing any internal structure on the data itself. The goal is rather different from reasoning across heterogeneous sets of data... it is much more about "finding useful stuff for my research I didn't know about already". That is why we deliberately treat the contents of published data files as semantically opaque. This is not to say that we would object to someone exposing contents structured in a semantically rigorous way, just that we think it has more limited usefulness.

People make the very deceptive analogy to what was done in creating and using the various DNA sequence databases, but that was quite a special situation, and I suspect it is not so generally applicable across science. There are many reasons for this... but even microarrays, I would assert, are not reasonable to compare across experiments and experimenters except with extreme caution and a very thorough understanding of the experimental design and conditions.

In my opinion, *re-implementing* a huge unwieldy square wheel as a nice simple round version is not always so bad.
:-) :-)

Best

Tim

On May 9, 2006, at 12:14 PM, Larry Hunter wrote:

>
> On Mon, 2006-05-08 at 22:53 -0700, AJ Chen wrote:
>
>> Proposed task: Distributed self-publishing of experiments
>>
>> 1. Ontology for publishing projects and experiments. There are
>> some domain-specific ontologies, such as microarray experiment
>> ontology, already existed today. This task is intended to
>> develop a general purpose ontology for describing projects and
>> experiments in such a way that search and comparison of
>> components of experiments is possible.
>
> Please, please do not reinvent this wheel. There are already several
> existing ontologies for describing experiments and scientific projects.
> In addition to the MGED microarray ontology (see
> http://mged.sourceforge.net/ontologies/index.php) there is also the
> National Cancer Institute caBIG efforts in controlled vocabularies and
> common data elements (see https://cabig.nci.nih.gov/workspaces/VCDE/).
> And there are several entries in OBO (the open biological ontologies
> project) that are relevant, e.g. biological imaging methods and evidence
> codes):
>
> http://obo.sourceforge.net/cgi-bin/table.cgi
>
> Although I couldn't find it in a few minutes of poking around, I think
> the myGrid folks also developed one, in RDF (iirc). May Alan Rector
> could update us on that one.
>
> Larry
Received on Wednesday, 10 May 2006 18:28:20 UTC