Re: Experiment Ontology from Bill Bug on 2007-12-07 (public-semweb-lifesci@w3.org from December 2007)

From: Bill Bug <wbug@ncmir.ucsd.edu>
Date: Thu, 6 Dec 2007 23:16:40 -0500
To: Susie Stephens <STEPHENS_SUSIE_M@LILLY.COM>
Cc: Matthias Samwald <samwald@gmx.at>, "public-semweb-lifesci@w3.org hcls" <public-semweb-lifesci@w3.org>, Kei Cheung <kei.cheung@yale.edu>, "Karen (NIH/NIDA) [E] Skinner" <kskinner@nida.nih.gov>, Alan Ruttenberg <alanruttenberg@gmail.com>
Message-Id: <1C6CDC40-D570-4D4A-9B7D-F0A55DC3FC2D@ncmir.ucsd.edu>
Hi Susie,

We certainly do need an "Experiment Ontology" - or Ontology of  
Biomedical Investigation (OBI).

I believe Matthias, Michael, and Kei have all made exactly the points
that are most important to consider:
	1) Matthias's comments
		Are you following "best practices" in creating the ontology?  I
believe Matthias gives many instructive examples on how to adjust  
what is here to bring it much more in sync with the emerging "best  
practices" that are coming out of the community development  
surrounding a variety of OBO Foundry ontologies.  Matthias also makes  
the point that it's important to seek to re-use (or directly
contribute to) the emerging community ontologies to cover the  
required domains.  In the case of this particular Experiment  
Ontology, the ontologies to consider are Ontology of Biomedical  
Investigation (OBI), the OBO Relations Ontology, the Gene Ontology  
(specifically the Molecular Function and Cellular Component branches,  
the latter of which is designed to capture components down to the  
level of macromolecular complexes), the Sequence Ontology, Protein  
Ontology (nascent - but proceeding rapidly), the Cell Ontology - at a  
minimum.  As many on this list know - and I'm certain the talented  
folks at Lilly who invested time in assembling this ontology also  
learned - many of these are not fully ready for prime-time, and/or  
may not FULLY cover the breadth and depth of the domains a specific  
application requires.  However, if you don't seek to work with
these community efforts, you cannot expect to achieve the ultimate
goal, which is to make your data maximally "semantically sticky", so
that the least amount of custom logic and human effort is required
to get the most value from your data.  Otherwise, you run the risk
of creating an ontology that may be useful and meet your specific
requirements (as has been true of earlier "investigation"-oriented
ontologies such as the MAGE Ontology, ExperiBase, EXPO, myGRID KAVE,
etc.), but that doesn't help the community at large to appropriately
re-use your data.  In each case, these
ontologies or KR frameworks have been extremely useful in the local  
application context for which they were constructed, but they cannot  
be effectively employed as the basis for semantically-driven  
integration across data sets that may not be able to accept the  
constraints (or lack thereof) of such application-oriented ontologies.
		Would you know off-hand, Susie, whether the folks who worked on
this ontology at Lilly have reviewed the relevant community efforts
cited above and/or sought input from those groups on how best to meet
the overall requirements underlying this particular Experiment
Ontology with the minimum required effort, in a manner that could
help ensure Lilly's sunk investment benefits us all?

	2) Michael's comments
		It's very helpful to know what the target is when it comes to  
exporting/exchanging the actual data.  As Michael points out, a great  
deal of work has gone into the production of FuGE (and MAGE before
it) to arrive at the appropriate division of labor between the
semantically-opaque, syntactic requirements represented in a data
model such as MAGE or FuGE and the explicit semantics captured in the
ontology.  As Michael states, in the realm of syntax FuGE is intended
to provide a shared structure for universal elements such as
biomaterials, experiment populations/pools/groups, protocol details,
reagent details, etc.  Built on that shared, generic foundation, any
specific discipline (e.g., microarray expression, GC-MS, FISH, MRI)
can sub-class FuGE components and add whatever additional detail its
discipline requires.  In parallel with this effort on data structure, the OBI
ontology cooperative seeks to provide that same foundation for the  
shared semantic domains, and a clear set of recommended practices for  
how to re-use entities from other OBO Foundry ontologies such as  
ChEBI, Sequence Ontology, Protein Ontology, OBO Cell, Organism  
Taxonomy (OWL versions of NCBI Taxonomy), etc., to specify the critical
biomedical entities and their complex relations.  As I said above,
these are works in progress.  For those of us who must have something  
working now, the recommended practice is to actively participate in  
these projects with an eye toward following their practice - and  
replacing any "proxy" you create in the interim with the community  
ontology, when it is ready for use.  This is what we have done in the  
BIRN ontology BIRNLex.  We actually have an OWL module called  
"BIRNLex-OBI-Proxy.owl" which we fully intend to replace with OBI  
entities when they are ready for use.  We also have
"BIRNLex-Investigation.owl", which builds on this "proxy" to cover
entities BIRN researchers must capture.  We expect eventually to see
the contents of "BIRNLex-Investigation" in OBI in some form, and we
intend to "contribute" those elements from this OWL file directly to
OBI when OBI is ready for them and we have the time to work through
this migration process.

	3) Kei's comments
		Examples - examples - examples.  This is critical.  Working through  
the example Kei cites from the NIH Neuroscience Microarray Consortium  
is a wonderful way to determine:
			- whether there are existing community ontologies that can meet the KR and
processing requirements
			- where the gaps are in those community ontologies
			- whether the ontology you are creating effectively fills those  
gaps (if it does, that makes it very clear how the community effort  
can make effective use of your ontology)
		With regard to gene lists, Kei is certainly correct.  If these are
captured through algorithmic means, it's critical to capture the
details of that algorithm, typically both the version of the
algorithm and the version of the data repository you ran it
against.
		Also - where gene entities are concerned - there is ongoing work  
between the GO groups, the Sequence Ontology, and the Protein  
Ontology that is particularly targeted toward capturing the specific  
relations between types of genomic sequence elements and types of  
biologically active protein-based molecules (e.g., macromolecular  
complexes composed of a collection of proteins in a variety of
post-translationally modified states, such as G protein-coupled
receptors, ion channels, transporters, and pathway enzymes, i.e., Rx
drug targets).  These are the details we'll all require in order to
do round-trip pharmacogenetics, i.e., the effects of genetic
constructs on target susceptibility to drugs AND the ways in which
drugs ultimately alter macromolecular complexes by leading to changes
in gene expression.
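
The gene-list provenance in point 3 (and Kei's idea of comparing short gene lists across experiments) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, the `consistent_genes` helper, and the "reported by at least N experiments" rule are all hypothetical, not part of the Lilly ontology, OBI, or FuGE, and the toy data are placeholders rather than real results.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnalysisProvenance:
    """How a gene list was produced (hypothetical field names)."""
    package: str             # analysis package used to derive the list
    package_version: str     # version of that package
    repository: str          # data repository the package was run against
    repository_version: str  # release/version of that repository


@dataclass
class GeneList:
    """A short list of over/under-expressed genes for one experimental condition."""
    experiment_id: str
    condition: str
    provenance: AnalysisProvenance
    up: set[str] = field(default_factory=set)    # over-expressed genes
    down: set[str] = field(default_factory=set)  # under-expressed genes


def consistent_genes(lists, condition, min_experiments=2):
    """Return (up, down): genes reported in the same direction for
    `condition` by at least `min_experiments` distinct experiments."""
    up_support = defaultdict(set)
    down_support = defaultdict(set)
    for gl in lists:
        if gl.condition != condition:
            continue
        for gene in gl.up:
            up_support[gene].add(gl.experiment_id)
        for gene in gl.down:
            down_support[gene].add(gl.experiment_id)
    up = {g for g, exps in up_support.items() if len(exps) >= min_experiments}
    down = {g for g, exps in down_support.items() if len(exps) >= min_experiments}
    return up, down


# Toy data: package, repository, condition, and gene names are placeholders.
prov = AnalysisProvenance("ExamplePkg", "1.0", "ExampleRepo", "2007-11")
lists = [
    GeneList("exp1", "AD vs control", prov, up={"GENE_A", "GENE_B"}, down={"GENE_C"}),
    GeneList("exp2", "AD vs control", prov, up={"GENE_A"}, down={"GENE_C", "GENE_D"}),
    GeneList("exp3", "other condition", prov, up={"GENE_B"}),
]
up, down = consistent_genes(lists, "AD vs control")
print(sorted(up), sorted(down))  # ['GENE_A'] ['GENE_C']
```

Because the provenance travels with each list, a stricter comparison could first restrict itself to lists produced with the same package and repository versions; whether that is desirable depends on how much analysis-version variation the comparison should tolerate.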

Just my $0.02 filtering on these helpful comments from Matthias,  
Michael, and Kei.

Cheers,
Bill

On Dec 3, 2007, at 1:00 PM, Kei Cheung wrote:

>
> This is great!
>
> I have a microarray experiment description (that has to do with  
> Alzheimer Disease) extracted from NINDS microarray consortium:
>
> http://arrayconsortium.tgen.org/np2/viewProject.do?action=viewProject&projectId=433773
>
> I just wonder how this example would fit this experiment ontology
> (as well as others such as OBI). As shown in this example, we record
> information such as organ type, organ region, cell type (layer II
> pyramidal neuron), etc. The NINDS microarray consortium uses different
> array platforms (e.g., Agilent, Affymetrix, and cDNA) for
> different organisms, so one may need to divide chips into groups
> corresponding to different platform types. Each group can then be  
> further divided into subgroups corresponding to different organisms.
>
> We also would like to capture gene lists (not the raw gene lists  
> but the ones (much shorter) that indicate what genes are over/under  
> expressed under certain experimental conditions). Such gene lists  
> would usually be extracted from the literature. Also the analysis  
> package (including version) that was used to generate a gene list  
> should be identified. One possible use of these gene lists is to  
> compare them to identify genes that are differentially expressed under
> the same/similar experimental condition across different microarray
> experiments. This would help distinguish true signals from noise.
>
> Hope it helps.
>
> Cheers,
>
> -Kei
>
>
>
> Matthias Samwald wrote:
>>
>> Hi Susie,
>>
>> Susie wrote:
>>> It would be great if you could take a look at it and provide  
>>> comments. The
>>> ontology is available at:
>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/Experiment_Ontology
>>
>> * Some of the entities/properties are missing an rdfs:label or have
>> an empty label (a string of length 0).
>> * Some of the entities could be taken from existing ontologies  
>> like OBI, RO or some of the OBO Foundry ontologies. This would  
>> save work and make integration with other data sources and
>> ontologies much easier. By the way, there seem to be several
>> groups working on ontologies for microarray experiments, or at
>> least planning to do so. It would be great if these groups could
>> work together.
>> * The class 'Chip type' should be removed and be replaced by  
>> subclasses of 'chip', e.g., 'chip (human)', 'chip (mouse)' etc.
>> * Some of the object properties appear like they are intended to  
>> be datatype properties (e.g., 'has proteome id').
>> * Many of the datatype properties could be replaced with object  
>> properties, possibly referring to third party ontologies -- of  
>> course this would require a richer ontology and more work spent on  
>> creating mappings. 'has molecular function' could refer to  
>> entities from the Gene Ontology, 'has associated organ' could
>> refer to an ontology about anatomy and so on.
>> * Object properties and their ranges are quite redundant. Property  
>> 'has reagent' has range 'Reagent', property 'has treatment' has  
>> range 'Treatment' and so on. Maybe the ontology could be designed
>> in such a way that there are only some generic properties such as  
>> 'has part'. This would make the ontology much easier to maintain,  
>> query and understand in the long term.
>> * It is unclear how 'Gene list' is intended to be used.
>> * 'Hardware' and 'Software' should not be subclasses of 'Protocol'.
>>
>>
>> Many of the datatype properties in this ontology look very  
>> interesting and might provide requirements for other ontologies.  
>> It would be great if some of them could be described/commented in  
>> more detail so that we know more about the requirements that  
>> motivated the creation of these properties.
>>
>> I hope that was somewhat helpful.
>>
>> cheers,
>> Matthias Samwald
>>
>>
>>
>
>
>
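
Matthias's first point (entities or properties with no rdfs:label, or an empty one) is easy to check mechanically. A minimal sketch in plain Python, assuming the ontology has already been parsed into (subject, predicate, object) string triples by whatever RDF/OWL parser is in use; the `label_problems` helper and the example IRIs under http://example.org/expo# are hypothetical and not taken from the Lilly ontology. As a small design choice, a label consisting only of whitespace is treated as empty too.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL = "http://www.w3.org/2002/07/owl#"
DECLARED_KINDS = {OWL + "Class", OWL + "ObjectProperty", OWL + "DatatypeProperty"}


def label_problems(triples):
    """Return {entity: problem} for declared classes/properties whose
    rdfs:label is missing or consists only of whitespace."""
    declared = {s for s, p, o in triples if p == RDF_TYPE and o in DECLARED_KINDS}
    labels = {}
    for s, p, o in triples:
        if p == RDFS_LABEL:
            labels.setdefault(s, []).append(o)
    problems = {}
    for entity in sorted(declared):
        values = labels.get(entity)
        if not values:
            problems[entity] = "missing rdfs:label"
        elif all(not v.strip() for v in values):
            problems[entity] = "empty rdfs:label"
    return problems


EX = "http://example.org/expo#"  # hypothetical namespace for illustration
triples = [
    (EX + "Chip", RDF_TYPE, OWL + "Class"),
    (EX + "Chip", RDFS_LABEL, "chip"),
    (EX + "Reagent", RDF_TYPE, OWL + "Class"),
    (EX + "Reagent", RDFS_LABEL, ""),
    (EX + "hasReagent", RDF_TYPE, OWL + "ObjectProperty"),
]
for entity, problem in label_problems(triples).items():
    print(entity, "->", problem)
```

Only declared classes and properties are checked, so labels on individuals or annotation properties would need the `DECLARED_KINDS` set extended.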



William Bug, M.S., M.Phil.                                         		 
email: wbug@ncmir.ucsd.edu
Ontological Engineer (Programmer Analyst III)		work: (610) 457-0443
Biomedical Informatics Research Network (BIRN)
and
National Center for Microscopy & Imaging Research (NCMIR)
Dept. of Neuroscience, School of Medicine
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093

Please note my email has recently changed
Received on Friday, 7 December 2007 04:17:00 UTC