- From: Chris Mungall <cjm@fruitfly.org>
- Date: Thu, 5 Jul 2007 09:01:04 -0700
- To: Kei Cheung <kei.cheung@yale.edu>
- Cc: "Skinner, Karen (NIH/NIDA) [E]" <kskinner@nida.nih.gov>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
On Jul 4, 2007, at 8:27 PM, Kei Cheung wrote:

> As a follow-up example, a study estimating the error rate of
> Gene Ontology (GO) annotations was done:
>
> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1892569#id2674403
>
> The study estimated GO term annotation error rates for the
> GoSeqLite database of 13% to 18% for curated non-ISS annotations,
> 49% for ISS annotations, and 28% to 30% for all curated
> annotations. (ISS stands for "inferred from sequence similarity".)
> Despite these findings, the authors concluded that GO is a
> comparatively high-quality source of information. Integrating
> databases with significant error rates, however, can negatively
> affect the quality of science.

I have not yet properly digested this paper, but on a cursory reading
there appear to be a few serious flaws. First, the paper shows a lack
of understanding of basic ontology principles: annotations to less
specific classes in the graph are treated as errors. Second, the
authors appear to make many incorrect assumptions about how ISS
annotations are curated. It's curious that they predict such a high
error rate yet don't provide any examples.

> -Kei
>
> Kei Cheung wrote:
>
>> Hi Karen,
>>
>> Your questions remind me of the following classic article by
>> Robert Robbins, "Challenges in the Human Genome Project":
>>
>> http://www.esp.org/umdnj.pdf
>>
>> Although it doesn't directly answer your questions, its
>> "Nomenclature Problems" section (pp. 20-21) discusses the
>> significant problem of inconsistent knowledge representation. It
>> says that it's a mistake to believe that terminology fluidity is
>> not an issue in biological database design. It also says that many
>> biologists don't realize that, in a database built with a 5% error
>> rate in the definitions of individual concepts, a query that joins
>> across 15 concepts has less than a 50% chance of returning an
>> adequate answer.
>> The section also points out the importance of formal
>> representation of scientific knowledge in addressing the
>> inconsistency and nomenclature problems. The Semantic Web and
>> standard ontologies offer a solution to these database problems.
>> We should not simply convert an existing database syntactically
>> into a Semantic Web format; we also need to perform careful
>> semantic conversion to eliminate as many errors, ambiguities, and
>> inconsistencies as possible, in order to reduce the costs of
>> knowledge retrieval and discovery.
>>
>> -Kei
>>
>> Skinner, Karen (NIH/NIDA) [E] wrote:
>>
>>> Recently I read somewhere (on this list, a blog, a news story,
>>> where...?) an assertion that struck me as an interesting passing
>>> fact at the time. As I recall, it indicated that more websites
>>> are accessed via a search engine than by typing a URL into a
>>> browser's address bar.
>>>
>>> Alas, I did not save the reference, and now I am looking for the
>>> proverbial needle in a haystack. Namely, what is the exact
>>> assertion, who asserted it, and where did they make it? If
>>> anyone in the world has this information, knows how to get it,
>>> or has related data, I imagine they would belong to this list.
>>> I would be most grateful for any useful pointer.
>>>
>>> In the same vein, if anyone has statistics, data, anecdotes, or
>>> other information related to
>>> (1) the cost of "friction" arising from inefficient or
>>> inappropriate efforts at information retrieval, and
>>> (2) the cost of "negative knowledge" about an existing resource
>>> or data,
>>> these, too, would be helpful.
>>>
>>> (For example, with respect to #2 above, we are all familiar with
>>> comparison shopping for goods and services. We seek data and
>>> information about prices and quality, but at what point does the
>>> effort spent exceed the value of the information learned?)
>>>
>>> I am not looking for examples at the level of a philosophy or
>>> economics Ph.D. thesis, but rather a few examples in the sciences
>>> that can be used at the level of an "elevator speech."
>>>
>>> Karen Skinner
>>> Deputy Director for Science and Technology Development
>>> Division of Basic Neuroscience and Behavior Research
>>> National Institute on Drug Abuse/NIH
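The 50% figure attributed to Robbins in Kei's message is consistent with simple arithmetic if, as an assumption for this illustration only (not a claim from the article), each concept definition is independently correct with probability 0.95: the chance that all 15 joined concepts are correct is 0.95^15, about 0.463. A minimal sketch; the function name `join_success_probability` is hypothetical:

```python
def join_success_probability(per_concept_accuracy: float, num_concepts: int) -> float:
    """Chance that every concept touched by a join is correct.

    Assumes (hypothetically) that errors in different concept definitions
    are independent, so the per-concept accuracies simply multiply.
    """
    if not 0.0 <= per_concept_accuracy <= 1.0:
        raise ValueError("per_concept_accuracy must be between 0 and 1")
    if num_concepts < 0:
        raise ValueError("num_concepts must be non-negative")
    return per_concept_accuracy ** num_concepts


if __name__ == "__main__":
    # 5% error per concept means 95% accuracy; join across 15 concepts.
    p = join_success_probability(0.95, 15)
    print(f"{p:.3f}")  # 0.463, i.e. below 50%

    # Smallest number of joined concepts at which the chance drops below 50%.
    n = 0
    while join_success_probability(0.95, n) >= 0.5:
        n += 1
    print(n)  # 14
```

Under the independence assumption the decline is geometric, so even 14 joined concepts fall below 50%; correlated errors between concepts would give different numbers.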
Received on Thursday, 5 July 2007 16:01:26 UTC