Re: Experiment Ontology from Kei Cheung on 2007-12-10 (public-semweb-lifesci@w3.org from December 2007)

From: Kei Cheung <kei.cheung@yale.edu>
Date: Sun, 09 Dec 2007 21:40:30 -0500
To: Bill Bug <wbug@ncmir.ucsd.edu>
Cc: Susie Stephens <STEPHENS_SUSIE_M@LILLY.COM>, Matthias Samwald <samwald@gmx.at>, "public-semweb-lifesci@w3.org hcls" <public-semweb-lifesci@w3.org>, "Karen (NIH/NIDA) [E] Skinner" <kskinner@nida.nih.gov>, Alan Ruttenberg <alanruttenberg@gmail.com>
Message-id: <475CA71E.4040207@yale.edu>
Nice summary and comments, Bill. This is the idea of open innovation and 
open community.

The example I gave includes a hypothesis. In addition to the ontologies 
you mentioned, we might also need to think about the SWAN ontology, 
which captures hypotheses.

Cheers,

-Kei

Bill Bug wrote:

> Hi Susie,
>
> We certainly do need an "Experiment Ontology" - or Ontology of 
> Biomedical Investigation (OBI).
>
> I believe Matthias, Michael, and Kei have all made exactly the points 
> I think are most important to consider:
> 1) Matthias's comments
> Are you following "best practices" in creating the ontology?  I 
> believe Matthias gives many instructive examples on how to adjust what 
> is here to bring it much more in sync with the emerging "best 
> practices" that are coming out of the community development 
> surrounding a variety of OBO Foundry ontologies.  Matthias also makes 
> the point that it's important to seek to re-use (or directly contribute 
> to) the emerging community ontologies to cover the required domains. 
>  In the case of this particular Experiment Ontology, the ontologies to 
> consider are Ontology of Biomedical Investigation (OBI), the OBO 
> Relations Ontology, the Gene Ontology (specifically the Molecular 
> Function and Cellular Component branches, the latter of which is 
> designed to capture components down to the level of macromolecular 
> complexes), the Sequence Ontology, Protein Ontology (nascent - but 
> proceeding rapidly), the Cell Ontology - at a minimum.  As many on 
> this list know - and I'm certain the talented folks at Lilly who 
> invested time in assembling this ontology also learned - many of these 
> are not fully ready for prime-time, and/or may not FULLY cover the 
> breadth and depth of the domains a specific application requires. 
>  However, if one doesn't seek to work with these community efforts, 
> you cannot expect to achieve the ultimate goal, which is to make 
> your data maximally "semantically sticky", so as to ensure the least 
> amount of custom logic and human effort will be required to get the 
> most value from your data.  Otherwise, you stand the chance of 
> creating what may be a useful ontology that meets your specific 
> requirements (as has been true of "investigation"-oriented ontologies 
> that have come before such as the MAGE Ontology, ExperiBase, EXPO, 
> myGRID KAVE, etc.), but doesn't help the community at-large to 
> appropriately re-use your data.  In each case, these ontologies or KR 
> frameworks have been extremely useful in the local application context 
> for which they were constructed, but they cannot be effectively 
> employed as the basis for semantically-driven integration across data 
> sets that may not be able to accept the constraints (or lack thereof) 
> of this application-oriented ontology.
> Would you know off-hand, Susie, whether the folks who worked on this 
> ontology at Lilly have reviewed the relevant community efforts 
> cited above and/or sought to interact with those groups to get 
> some input on how best to meet the overall requirements that underlie 
> this particular Experiment Ontology with the minimal required effort 
> and in a manner that could help to ensure Lilly's sunk investment 
> could be of benefit to us all?
>
> 2) Michael's comments
> It's very helpful to know what the target is when it comes to 
> exporting/exchanging the actual data.  As Michael points out, a great 
> deal of work has gone into the production of FuGE (and MaGE before it) 
> to come up with the appropriate division of labor between the 
> semantically-opaque, syntactical requirements as represented in a data 
> model such as MaGE or FuGE and the explicit semantics as captured in 
> the ontology.  For those using FuGE, as Michael states, in the realm 
> of syntax, the intention for FuGE is to provide a shared structure for 
> universal elements such as biomaterials, experiment 
> populations/pools/groups, protocol details, reagents details, etc. 
>  Built on that shared, generic foundation, any specific discipline - 
> e.g., microarray expression, GC-MS, FISH, MRI, etc. - can sub-class 
> FuGE components and add whatever additional detail is required in their 
> discipline.  In parallel with this effort on data structure, the OBI 
> ontology cooperative seeks to provide that same foundation for the 
> shared semantic domains, and a clear set of recommended practices for 
> how to re-use entities from other OBO Foundry ontologies such as 
> ChEBI, Sequence Ontology, Protein Ontology, OBO Cell, Organism 
> Taxonomy (OWL versions of NCBI Tax), etc. to specify the critical 
> biomedical entities and their complex relations.  As I say above, 
> these are works in progress.  For those of us who must have something 
> working now, the recommended practice is to actively participate in 
> these projects with an eye toward following their practice - and 
> replacing any "proxy" you create in the interim with the community 
> ontology, when it is ready for use.  This is what we have done in the 
> BIRN ontology BIRNLex.  We actually have an OWL module called 
> "BIRNLex-OBI-Proxy.owl" which we fully intend to replace with OBI 
> entities, when they are ready for use.  We also have 
> "BIRNLex-Investigation.owl" that builds on this "proxy" to cover 
> entities BIRN researchers must capture.  We expect to eventually see 
> the contents of "BIRNLex-Investigation" in OBI in some form.  We 
> intend to "contribute" those elements from this OWL file directly to 
> OBI, when OBI is ready for them, and we have the time to work through 
> this migration process.
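
The proxy-then-replace practice described above can be sketched roughly in Python. This is only an illustration under stated assumptions: triples are plain (subject, predicate, object) string tuples, the function name `replace_proxies` is hypothetical, and every IRI below is a made-up placeholder rather than a real BIRNLex or OBI identifier.

```python
def replace_proxies(triples, proxy_to_community):
    """Rewrite interim proxy IRIs to their community-ontology equivalents.

    Terms without an entry in the mapping are left untouched, so the
    migration can proceed one entity at a time.
    """
    def swap(term):
        return proxy_to_community.get(term, term)

    return [(swap(s), swap(p), swap(o)) for s, p, o in triples]


# Placeholder IRIs, assumed for illustration only.
PROXY_ASSAY = "http://example.org/proxy#Assay"
COMMUNITY_ASSAY = "http://example.org/community#Assay"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

local_triples = [
    ("http://example.org/local#MicroarrayAssay", SUBCLASS_OF, PROXY_ASSAY),
    ("http://example.org/local#MRIAssay", SUBCLASS_OF, PROXY_ASSAY),
]

migrated = replace_proxies(local_triples, {PROXY_ASSAY: COMMUNITY_ASSAY})
```

Keeping the proxy terms in one module and the mapping in one table means the later migration touches a single place, which is roughly the point of isolating a proxy file.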
>
> 3) Kei's comments
> Examples - examples - examples.  This is critical.  Working through 
> the example Kei cites from the NIH Neuroscience Microarray Consortium 
> is a wonderful way to determine whether:
> - there are existing community ontologies that can meet the KR and 
> processing requirements
> - where the gaps are in those community ontologies
> - whether the ontology you are creating effectively fills those gaps 
> (if it does, that makes it very clear how the community effort can 
> make effective use of your ontology)
> In regards to Gene Lists, Kei is certainly correct.  If these are 
> captured through algorithmic means, it's critical to capture the 
> details on that algorithm - typically both the version of the 
> algorithm as well as the version of the data repository you ran it 
> against.
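
One way to picture the provenance details mentioned above is a small record attached to each gene list. The Python sketch below is an assumption-laden illustration: the class and field names are hypothetical, and the sample values are placeholders, not real methods, repositories, or genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneListProvenance:
    """Hypothetical record of how a gene list was derived."""
    algorithm: str           # name of the analysis method or package
    algorithm_version: str   # version of that method
    repository: str          # data repository the method was run against
    repository_version: str  # release or snapshot of that repository


@dataclass(frozen=True)
class GeneList:
    genes: frozenset
    provenance: GeneListProvenance


def same_derivation(a, b):
    """True only if both lists come from the same method and data versions."""
    return a.provenance == b.provenance


# Placeholder values, assumed for illustration only.
prov_v1 = GeneListProvenance("example-method", "1.0", "example-repository", "2007-11")
prov_v2 = GeneListProvenance("example-method", "1.1", "example-repository", "2007-11")

list_a = GeneList(frozenset({"GENE1", "GENE2"}), prov_v1)
list_b = GeneList(frozenset({"GENE2", "GENE3"}), prov_v1)
list_c = GeneList(frozenset({"GENE2", "GENE3"}), prov_v2)
```

With the versions recorded, a difference between two lists can be traced to either the biology or a change in the method or the underlying data release.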
> Also - where gene entities are concerned - there is ongoing work 
> between the GO groups, the Sequence Ontology, and the Protein Ontology 
> that is particularly targeted toward capturing the specific relations 
> between types of genomic sequence elements and types of biologically 
> active protein-based molecules (e.g., macromolecular complexes 
> composed of a collection of proteins in a variety of 
> post-translationally modified states - e.g., GPC receptors, ion 
> channels, transporters, pathway enzymes, etc. - i.e., Rx drug 
> targets).  These are the details we'll all require in order to do 
> round-trip pharmacogenetics - i.e., effects of genetic constructs on 
> target susceptibility to drugs - AND - the ways in which drugs 
> ultimately alter macromolecular complexes by leading to changes in 
> gene expression.
>
> Just my $0.02 filtering on these helpful comments from Matthias, 
> Michael, and Kei.
>
> Cheers,
> Bill
>
> On Dec 3, 2007, at 1:00 PM, Kei Cheung wrote:
>
>>
>> This is great!
>>
>> I have a microarray experiment description (that has to do with 
>> Alzheimer Disease) extracted from NINDS microarray consortium:
>>
>> http://arrayconsortium.tgen.org/np2/viewProject.do?action=viewProject&projectId=433773 
>>
>> I just wonder how this example would fit this experiment ontology (as 
>> well as others such as OBI). As shown in this example, we record 
>> information such as organ type, organ region, cell type (layer II 
>> pyramidal neuron), etc. NINDS microarray consortium uses different 
>> array platforms (e.g., Agilent, Affymetrix, and cDNA)  for different 
>> organisms so one may need to divide chips into groups corresponding 
>> to different platform types. Each group can then be further divided 
>> into subgroups corresponding to different organisms.
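
The two-level grouping described above (platform first, then organism) might look like the following minimal Python sketch. The record layout of (chip id, platform, organism) and the sample chips are assumptions for illustration and are not taken from the consortium's data.

```python
from collections import defaultdict


def group_chips(chips):
    """Group chip ids by platform, then by organism.

    `chips` is an iterable of (chip_id, platform, organism) tuples.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for chip_id, platform, organism in chips:
        groups[platform][organism].append(chip_id)
    # Convert to plain dicts so the result compares and prints cleanly.
    return {platform: dict(orgs) for platform, orgs in groups.items()}


# Made-up sample records.
example_chips = [
    ("chip-1", "Affymetrix", "human"),
    ("chip-2", "Affymetrix", "mouse"),
    ("chip-3", "Agilent", "human"),
    ("chip-4", "Affymetrix", "human"),
]

grouped = group_chips(example_chips)
```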
>>
>> We also would like to capture gene lists (not the raw gene lists but 
>> the ones (much shorter) that indicate what genes are over/under 
>> expressed under certain experimental conditions). Such gene lists 
>> would usually be extracted from the literature. Also the analysis 
>> package (including version) that was used to generate a gene list 
>> should be identified. One possible use of these gene lists is to 
>> compare them to identify genes that are differentially expressed under 
>> the same/similar experimental condition across different microarray 
>> experiments. This would help distinguish true signals from noise.
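
The cross-experiment comparison described above could be sketched as a simple recurrence count. This is a hedged illustration, not a statistical method: the function name `recurring_genes`, the threshold parameter, and the gene and experiment names are all placeholders.

```python
from collections import Counter


def recurring_genes(gene_lists, min_experiments=2):
    """Return genes that appear in at least `min_experiments` gene lists.

    `gene_lists` maps an experiment name to the genes it reports as
    over- or under-expressed under a comparable condition.
    """
    counts = Counter()
    for genes in gene_lists.values():
        counts.update(set(genes))  # count each gene once per experiment
    return {gene for gene, n in counts.items() if n >= min_experiments}


# Placeholder experiments and gene names.
lists = {
    "experiment_A": ["GENE1", "GENE2", "GENE3"],
    "experiment_B": ["GENE2", "GENE3", "GENE4"],
    "experiment_C": ["GENE3", "GENE5"],
}

shared = recurring_genes(lists)
in_all = recurring_genes(lists, min_experiments=3)
```

Genes that recur across independent experiments are more plausible signals, while genes seen in only one list are candidates for noise.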
>>
>> Hope it helps.
>>
>> Cheers,
>>
>> -Kei
>>
>>
>>
>> Matthias Samwald wrote:
>>
>>>
>>> Hi Susie,
>>>
>>> Susie wrote:
>>>
>>>> It would be great if you could take a look at it and provide 
>>>> comments. The
>>>> ontology is available at:
>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/Experiment_Ontology
>>>
>>>
>>> * Some of the entities/properties are missing a rdfs:label or have 
>>> an empty label (a string with length 0).
>>> * Some of the entities could be taken from existing ontologies like 
>>> OBI, RO or some of the OBO Foundry ontologies. This would save work 
>>> and make integration with other data sources and ontologies much 
>>> easier. By the way, there seem to be several groups working on 
>>> ontologies for microarray experiments, or are at least planning to 
>>> do that. It would be great if these groups could work together.
>>> * The class 'Chip type' should be removed and be replaced by 
>>> subclasses of 'chip', e.g., 'chip (human)', 'chip (mouse)' etc.
>>> * Some of the object properties appear like they are intended to be 
>>> datatype properties (e.g., 'has proteome id').
>>> * Many of the datatype properties could be replaced with object 
>>> properties, possibly referring to third party ontologies -- of 
>>> course this would require a richer ontology and more work spent on 
>>> creating mappings. 'has molecular function' could refer to entities 
>>> from the gene ontology, 'has associated organ' could refer to an 
>>> ontology about anatomy and so on.
>>> * Object properties and their ranges are quite redundant. Property 
>>> 'has reagent' has range 'Reagent', property 'has treatment' has 
>>> range 'Treatment' and so on. Maybe the ontology could be designed in 
>>> such a way that there are only some generic properties such as 'has 
>>> part'. This would make the ontology much easier to maintain, query 
>>> and understand in the long term.
>>> * It is unclear how 'Gene list' is intended to be used.
>>> * 'Hardware' and 'Software' should not be subclasses of 'Protocol'.
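
The first point in the list above (missing or empty rdfs:label values) is easy to check mechanically. The Python sketch below assumes the ontology has already been loaded as plain (subject, predicate, object) string tuples; the function name and the example.org IRIs are hypothetical, and only the RDF, RDFS and OWL IRIs are the standard ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"


def entities_missing_label(triples):
    """Return subjects that have no rdfs:label, or only empty ones."""
    subjects = set()
    labelled = set()
    for s, p, o in triples:
        subjects.add(s)
        # A label counts only if it contains non-whitespace characters.
        if p == RDFS_LABEL and o.strip():
            labelled.add(s)
    return subjects - labelled


# Placeholder entities, assumed for illustration only.
EX = "http://example.org/experiment#"
triples = [
    (EX + "Chip", RDF_TYPE, OWL_CLASS),
    (EX + "Chip", RDFS_LABEL, "chip"),
    (EX + "Reagent", RDF_TYPE, OWL_CLASS),
    (EX + "Reagent", RDFS_LABEL, ""),       # empty label
    (EX + "GeneList", RDF_TYPE, OWL_CLASS),  # no label at all
]

missing = entities_missing_label(triples)
```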
>>>
>>>
>>> Many of the datatype properties in this ontology look very 
>>> interesting and might provide requirements for other ontologies. It 
>>> would be great if some of them could be described/commented in more 
>>> detail so that we know more about the requirements that motivated 
>>> the creation of these properties.
>>>
>>> I hope that was somewhat helpful.
>>>
>>> cheers,
>>> Matthias Samwald
>>>
>>>
>>>
>>
>>
>>
>
>
>
> William Bug, M.S., M.Phil.                                          
> email: wbug@ncmir.ucsd.edu
> Ontological Engineer (Programmer Analyst III) work: (610) 457-0443
> Biomedical Informatics Research Network (BIRN)
> and
> National Center for Microscopy & Imaging Research (NCMIR)
> Dept. of Neuroscience, School of Medicine
> University of California, San Diego
> 9500 Gilman Drive
> La Jolla, CA 92093
>
> Please note my email has recently changed
>
>
Received on Monday, 10 December 2007 02:40:47 UTC