- From: Kei Cheung <kei.cheung@yale.edu>
- Date: Sun, 09 Dec 2007 21:40:30 -0500
- To: Bill Bug <wbug@ncmir.ucsd.edu>
- Cc: Susie Stephens <STEPHENS_SUSIE_M@LILLY.COM>, Matthias Samwald <samwald@gmx.at>, "public-semweb-lifesci@w3.org hcls" <public-semweb-lifesci@w3.org>, "Karen (NIH/NIDA) [E] Skinner" <kskinner@nida.nih.gov>, Alan Ruttenberg <alanruttenberg@gmail.com>
Nice summary and comments, Bill. This is the idea of open innovation and open community. The example I gave includes a hypothesis. In addition to the ontologies you mentioned, we might also need to think about the SWAN ontology, which captures hypotheses.

Cheers,

-Kei

Bill Bug wrote:

> Hi Susie,
>
> We certainly do need an "Experiment Ontology" - or Ontology of Biomedical Investigation (OBI).
>
> I believe Matthias, Michael, and Kei have all made exactly the points I think are most important to consider:
>
> 1) Matthias's comments
> Are you following "best practices" in creating the ontology? I believe Matthias gives many instructive examples on how to adjust what is here to bring it much more in sync with the emerging "best practices" that are coming out of the community development surrounding a variety of OBO Foundry ontologies. Matthias also makes the point that it's important to seek to re-use (or directly contribute to) the emerging community ontologies to cover the required domains. In the case of this particular Experiment Ontology, the ontologies to consider are, at a minimum, the Ontology of Biomedical Investigation (OBI), the OBO Relations Ontology, the Gene Ontology (specifically the Molecular Function and Cellular Component branches, the latter of which is designed to capture components down to the level of macromolecular complexes), the Sequence Ontology, the Protein Ontology (nascent - but proceeding rapidly), and the Cell Ontology. As many on this list know - and I'm certain the talented folks at Lilly who invested time in assembling this ontology also learned - many of these are not fully ready for prime-time, and/or may not FULLY cover the breadth and depth of the domains a specific application requires.
> However, if you don't seek to work with these community efforts, you cannot expect to achieve the ultimate goal, which is to make your data maximally "semantically sticky", so as to ensure the least amount of custom logic and human effort will be required to get the most value from your data. Otherwise, you stand the chance of creating what may be a useful ontology that meets your specific requirements (as has been true of "investigation"-oriented ontologies that have come before such as the MAGE Ontology, ExperiBase, EXPO, myGRID KAVE, etc.), but doesn't help the community at-large to appropriately re-use your data. In each case, these ontologies or KR frameworks have been extremely useful in the local application context for which they were constructed, but they cannot be effectively employed as the basis for semantically-driven integration across data sets that may not be able to accept the constraints (or lack thereof) of this application-oriented ontology.
> Would you know off-hand, Susie, whether the folks who worked on this ontology at Lilly have reviewed the relevant community efforts cited above and/or sought to interact with those groups to get some input on how best to meet the overall requirements that underlie this particular Experiment Ontology with the minimal required effort and in a manner that could help to ensure Lilly's sunk investment could be of benefit to us all?
>
> 2) Michael's comments
> It's very helpful to know what the target is when it comes to exporting/exchanging the actual data. As Michael points out, a great deal of work has gone into the production of FuGE (and MaGE before it) to come up with the appropriate division of labor between the semantically-opaque, syntactical requirements as represented in a data model such as MaGE or FuGE and the explicit semantics as captured in the ontology.
> For those using FuGE, as Michael states, in the realm of syntax, the intention for FuGE is to provide a shared structure for universal elements such as biomaterials, experiment populations/pools/groups, protocol details, reagent details, etc. Built on that shared, generic foundation, any specific discipline - e.g., microarray expression, GC-MS, FISH, MRI, etc. - can sub-class FuGE components and add whatever additional detail is required in their discipline. In parallel with this effort on data structure, the OBI ontology cooperative seeks to provide that same foundation for the shared semantic domains, and a clear set of recommended practices for how to re-use entities from other OBO Foundry ontologies such as ChEBI, Sequence Ontology, Protein Ontology, OBO Cell, Organism Taxonomy (OWL versions of NCBI Tax), etc. to specify the critical biomedical entities and their complex relations. As I say above, these are works in progress. For those of us who must have something working now, the recommended practice is to actively participate in these projects with an eye toward following their practice - and replacing any "proxy" you create in the interim with the community ontology, when it is ready for use. This is what we have done in the BIRN ontology BIRNLex. We actually have an OWL module called "BIRNLex-OBI-Proxy.owl" which we fully intend to replace with OBI entities, when they are ready for use. We also have "BIRNLex-Investigation.owl" that builds on this "proxy" to cover entities BIRN researchers must capture. We expect to eventually see the contents of "BIRNLex-Investigation" in OBI in some form. We intend to "contribute" those elements from this OWL file directly to OBI, when OBI is ready for them, and we have the time to work through this migration process.
>
> 3) Kei's comments
> Examples - examples - examples. This is critical.
> Working through the example Kei cites from the NIH Neuroscience Microarray Consortium is a wonderful way to determine:
> - whether there are existing community ontologies that can meet the KR and processing requirements
> - where the gaps are in those community ontologies
> - whether the ontology you are creating effectively fills those gaps (if it does, that makes it very clear how the community effort can make effective use of your ontology)
> In regards to Gene Lists, Kei is certainly correct. If these are captured through algorithmic means, it's critical to capture the details on that algorithm - typically both the version of the algorithm as well as the version of the data repository you ran it against.
> Also - where gene entities are concerned - there is ongoing work between the GO groups, the Sequence Ontology, and the Protein Ontology that is particularly targeted toward capturing the specific relations between types of genomic sequence elements and types of biologically active protein-based molecules (e.g., macromolecular complexes composed of a collection of proteins in a variety of post-translationally modified states - e.g., GPC receptors, ion channels, transporters, pathway enzymes, etc. - i.e., Rx drug targets). These are the details we'll all require in order to do round-trip pharmacogenetics - i.e., effects of genetic constructs on target susceptibility to drugs - AND - the ways in which drugs ultimately alter macromolecular complexes by leading to changes in gene expression.
>
> Just my $0.02 filtering on these helpful comments from Matthias, Michael, and Kei.
>
> Cheers,
> Bill
>
> On Dec 3, 2007, at 1:00 PM, Kei Cheung wrote:
>
>> This is great!
>>
>> I have a microarray experiment description (that has to do with Alzheimer's disease) extracted from the NINDS microarray consortium:
>>
>> http://arrayconsortium.tgen.org/np2/viewProject.do?action=viewProject&projectId=433773
>>
>> I just wonder how this example would fit this experiment ontology (as well as others such as OBI). As shown in this example, we record information such as organ type, organ region, cell type (layer II pyramidal neuron), etc. The NINDS microarray consortium uses different array platforms (e.g., Agilent, Affymetrix, and cDNA) for different organisms, so one may need to divide chips into groups corresponding to different platform types. Each group can then be further divided into subgroups corresponding to different organisms.
>>
>> We also would like to capture gene lists (not the raw gene lists but the much shorter ones that indicate what genes are over/under expressed under certain experimental conditions). Such gene lists would usually be extracted from the literature. Also the analysis package (including version) that was used to generate a gene list should be identified. One possible use of these gene lists is to compare them to identify genes that are differentially expressed under the same/similar experimental condition across different microarray experiments. This would help distinguish true signals from noise.
>>
>> Hope it helps.
>>
>> Cheers,
>>
>> -Kei
>>
>> Matthias Samwald wrote:
>>
>>> Hi Susie,
>>>
>>> Susie wrote:
>>>
>>>> It would be great if you could take a look at it and provide comments. The ontology is available at:
>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/Experiment_Ontology
>>>
>>> * Some of the entities/properties are missing an rdfs:label or have an empty label (a string with length 0).
>>> * Some of the entities could be taken from existing ontologies like OBI, RO or some of the OBO Foundry ontologies. This would save work and make integration with other data sources and ontologies much easier. By the way, there seem to be several groups working on ontologies for microarray experiments, or at least planning to do that. It would be great if these groups could work together.
>>> * The class 'Chip type' should be removed and be replaced by subclasses of 'chip', e.g., 'chip (human)', 'chip (mouse)' etc.
>>> * Some of the object properties appear like they are intended to be datatype properties (e.g., 'has proteome id').
>>> * Many of the datatype properties could be replaced with object properties, possibly referring to third party ontologies -- of course this would require a richer ontology and more work spent on creating mappings. 'has molecular function' could refer to entities from the gene ontology, 'has associated organ' could refer to an ontology about anatomy and so on.
>>> * Object properties and their ranges are quite redundant. Property 'has reagent' has range 'Reagent', property 'has treatment' has range 'Treatment' and so on. Maybe the ontology could be designed in such a way that there are only some generic properties such as 'has part'. This would make the ontology much easier to maintain, query and understand in the long term.
>>> * It is unclear how 'Gene list' is intended to be used.
>>> * 'Hardware' and 'Software' should not be subclasses of 'Protocol'.
>>>
>>> Many of the datatype properties in this ontology look very interesting and might provide requirements for other ontologies. It would be great if some of them could be described/commented in more detail so that we know more about the requirements that motivated the creation of these properties.
>>>
>>> I hope that was somewhat helpful.
>>>
>>> cheers,
>>> Matthias Samwald
>
> William Bug, M.S., M.Phil.
> email: wbug@ncmir.ucsd.edu
> Ontological Engineer (Programmer Analyst III)
> work: (610) 457-0443
> Biomedical Informatics Research Network (BIRN)
> and
> National Center for Microscopy & Imaging Research (NCMIR)
> Dept. of Neuroscience, School of Medicine
> University of California, San Diego
> 9500 Gilman Drive
> La Jolla, CA 92093
>
> Please note my email has recently changed
Received on Monday, 10 December 2007 02:40:47 UTC
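
The gene-list points raised in this thread (record the analysis package and its version, record the version of the data repository it was run against, then compare lists across experiments run under the same condition) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the GeneList record, its field names, the shared_genes helper, and all experiment IDs, package names, versions and gene symbols are hypothetical placeholders. None of them come from the Experiment Ontology, OBI, or the NINDS consortium data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneList:
    """One short, curated gene list plus the provenance the thread asks for.

    All field names are hypothetical; they are not terms from any ontology.
    """
    experiment_id: str        # placeholder identifier for the source experiment
    condition: str            # experimental condition the list was derived under
    analysis_package: str     # software that produced the list
    package_version: str      # version of that software
    repository_version: str   # version of the data repository it was run against
    over_expressed: frozenset
    under_expressed: frozenset


def shared_genes(gene_lists, condition):
    """Return (over, under): genes reported in the same direction by every
    gene list recorded for the given condition."""
    matching = [gl for gl in gene_lists if gl.condition == condition]
    if not matching:
        return frozenset(), frozenset()
    over = frozenset.intersection(*(gl.over_expressed for gl in matching))
    under = frozenset.intersection(*(gl.under_expressed for gl in matching))
    return over, under


# Placeholder data: the IDs, package names, versions and gene symbols are invented.
lists = [
    GeneList("exp-1", "condition-A", "package-x", "1.0", "2007-11",
             frozenset({"GENE_A", "GENE_B"}), frozenset({"GENE_X"})),
    GeneList("exp-2", "condition-A", "package-y", "2.3", "2007-06",
             frozenset({"GENE_A", "GENE_C"}), frozenset({"GENE_X", "GENE_Y"})),
    GeneList("exp-3", "condition-B", "package-x", "1.0", "2007-11",
             frozenset({"GENE_B"}), frozenset()),
]

over, under = shared_genes(lists, "condition-A")
print(sorted(over), sorted(under))
```

Matching on an exact condition string is a deliberate simplification. Comparing "similar" conditions, as suggested in the thread, would need the conditions to be expressed with shared ontology terms rather than free text, which is the argument made above for re-using community ontologies.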