- From: Kei Cheung <kei.cheung@yale.edu>
- Date: Wed, 04 Jul 2007 23:27:34 -0400
- To: "Skinner, Karen (NIH/NIDA) [E]" <kskinner@nida.nih.gov>
- Cc: public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
As a follow-up example, a study estimating the error rate of Gene Ontology (GO) annotations was done:

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1892569#id2674403

The study estimated the GO term annotation error rate for the GoSeqLite database at 13% to 18% for curated non-ISS annotations, 49% for ISS annotations, and 28% to 30% for all curated annotations. (ISS stands for "inferred from sequence similarity".) Despite these findings, the authors concluded that GO is a comparatively high-quality source of information. Integration of databases involving significant error rates, however, can negatively impact the quality of science.

-Kei

Kei Cheung wrote:

> Hi Karen,
>
> Your questions remind me of the following classic article written by
> Robert Robbins on "Challenges in the Human Genome Project".
>
> http://www.esp.org/umdnj.pdf
>
> Although it doesn't directly answer the questions, in the
> "Nomenclature Problems" section (pp. 20-21), it discusses the
> significant problem of inconsistent knowledge representation. It says
> that it's a mistake to believe that terminology fluidity is not an
> issue in biological database design. It also says that many biologists
> don't realize that, in a database built with 5% error in the
> definition of individual concepts, a query that joins across 15
> concepts has a less than 50% chance of returning an adequate answer. The
> section also points out the importance of formal representation of
> scientific knowledge in addressing the inconsistency and nomenclature
> problems. Semantic Web and standard ontologies provide a solution to
> these database problems. We shouldn't simply convert an existing
> database syntactically into a Semantic Web format; we also need to
> do careful semantic conversion to eliminate as many errors,
> ambiguities, and inconsistencies as possible in order to reduce the
> costs of knowledge retrieval and discovery.
>
> -Kei
>
> Skinner, Karen (NIH/NIDA) [E] wrote:
>
>> Recently I read somewhere (on this list, a blog, a news story,
>> where...?) an assertion that struck me as an interesting passing fact
>> at the time. As I recall, it indicated that more websites are
>> accessed via a search engine than by typing a URL into a browser web
>> address bar.
>>
>> Alas, I did not save the reference, and now I am looking for the
>> proverbial needle in a haystack. Namely, what is the exact assertion,
>> who asserted it, and where did they make it? If anyone in the world
>> has this information or knows how to get it, or has related data,
>> I imagine they would belong to this list. I would be most grateful
>> for any useful pointer.
>>
>> Along this same vein, if anyone has any statistics, data, anecdotes
>> or information related to the cost of
>> (1) "friction" arising from inefficient or inappropriate efforts at
>> information retrieval
>> and
>> (2) the cost of "negative knowledge" about an existing resource or data,
>>
>> these, too, would be helpful.
>>
>> (For example, with respect to #2 above, we are all familiar with
>> comparison shopping for goods and services. We seek data/information
>> about prices and quality, but at what point does the expenditure of
>> that effort exceed the value of the information learned?)
>>
>> I am not looking for examples at the level of a philosophy or
>> economics Ph.D. thesis, but rather a few examples in the sciences that
>> can be used at the level of an "elevator speech."
>>
>>
>> Karen Skinner
>> Deputy Director for Science and Technology Development
>> Division of Basic Neuroscience and Behavior Research
>> National Institute on Drug Abuse/NIH
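
The figure quoted from the Robbins article (5% error per concept, a join across 15 concepts, less than a 50% chance of an adequate answer) can be checked with a few lines of arithmetic. This is only a rough sketch under an assumption that is mine, not necessarily the article's: that errors in the individual concept definitions are independent, so the chance that every concept in the join is correct is (1 - error rate) raised to the number of concepts. The function name `join_success_probability` is made up for illustration.

```python
def join_success_probability(error_rate: float, n_concepts: int) -> float:
    """Chance that all n_concepts definitions used by a join are correct.

    Assumes (hypothetically) that each concept is wrong independently
    with probability error_rate.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if n_concepts < 0:
        raise ValueError("n_concepts must be non-negative")
    return (1.0 - error_rate) ** n_concepts


if __name__ == "__main__":
    # 5% per-concept error, joins of increasing width.
    for n in (1, 5, 10, 15):
        p = join_success_probability(0.05, n)
        print(f"{n:2d} concepts: {p:.3f}")
```

Under that independence assumption, a 15-concept join at 5% error per concept comes out at roughly 0.46, which is consistent with the "less than 50%" statement in the article.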
Received on Thursday, 5 July 2007 03:27:49 UTC