Re: Seeking Help with finding an assertion

On Jul 4, 2007, at 8:27 PM, Kei Cheung wrote:

>
> As a follow-up example, a study for estimating the error rate of  
> Gene Ontology (GO) was done:
>
> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1892569#id2674403
>
> The study estimated GO term annotation error rates for the
> GoSeqLite database at 13% to 18% for curated non-ISS annotations,
> 49% for ISS annotations, and 28% to 30% for all curated
> annotations (ISS stands for "inferred from sequence similarity").
> Despite these findings, the authors concluded that GO is a
> comparatively high-quality source of information. Integrating
> databases with significant error rates, however, can negatively
> impact the quality of science.

I have not yet properly digested this paper, but on a cursory reading
there appear to be a few serious flaws. First, it shows a lack of
understanding of basic ontology principles: annotations to less
specific classes in the graph are treated as errors. Second, the
authors appear to make a lot of incorrect assumptions about how ISS
annotations are curated.

It's curious they predict such a high error rate yet don't provide  
any examples.

>
> -Kei
>
> Kei Cheung wrote:
>
>>
>> Hi Karen,
>>
>> Your questions remind me of the following classic article written  
>> by Robert Robbins on "Challenges in the Human Genome Project".
>>
>> http://www.esp.org/umdnj.pdf
>>
>> Although it doesn't directly answer the questions, the
>> "Nomenclature Problems" section (pp. 20-21) discusses the
>> significant problem of inconsistent knowledge representation. It
>> says that it's a mistake to believe that terminology fluidity is
>> not an issue in biological database design. It also says that many
>> biologists don't realize that, in a database built with 5% error
>> in the definition of individual concepts, a query that joins
>> across 15 concepts has less than a 50% chance of returning an
>> adequate answer. The section also points out the importance of
>> formal representation of scientific knowledge in addressing the
>> inconsistency and nomenclature problems. Semantic Web and standard
>> ontologies provide a solution to these database problems. We can't
>> simply convert an existing database syntactically into a
>> semantic web format; we also need to do careful semantic
>> conversion to eliminate as many errors, ambiguities, and
>> inconsistencies as possible in order to reduce the costs of
>> knowledge retrieval and discovery.
>>
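
The "less than 50%" figure quoted above can be checked with a quick
calculation. This is only a sketch, and it assumes that errors in the 15
concepts are independent of one another, which the quoted summary does
not state. The function name chance_all_correct is made up for
illustration.

```python
def chance_all_correct(per_concept_error: float, concepts: int) -> float:
    """Probability that every concept in a join is correctly defined.

    Assumes (hypothetically) that errors in different concepts are
    independent, so the per-concept success rates simply multiply.
    """
    return (1.0 - per_concept_error) ** concepts


# 5% error per concept, query joining across 15 concepts
p = chance_all_correct(0.05, 15)
print(f"{p:.3f}")  # prints 0.463
```

Under that independence assumption, 0.95^15 is about 0.46, which agrees
with the claim of a less than 50% chance of an adequate answer.
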
>> -Kei
>>
>> Skinner, Karen (NIH/NIDA) [E] wrote:
>>
>>> Recently I read somewhere (on this list, a blog, a news story,
>>> where...?) an assertion that struck me as an interesting passing
>>> fact at the time. As I recall, it indicated that more websites
>>> are accessed via a search engine than by typing a URL into a
>>> browser's address bar.
>>>
>>> Alas, I did not save the reference, and now I am looking for the
>>> proverbial needle in a haystack. Namely, what is the exact
>>> assertion, who asserted it, and where did they make it? If
>>> anyone in the world has this information, knows how to get it,
>>> or has related data, I imagine they would belong to this list.
>>> I would be most grateful for any useful pointer.
>>>
>>> In this same vein, if anyone has any statistics, data,
>>> anecdotes or information related to the cost of
>>> (1) "friction" arising from inefficient or inappropriate efforts  
>>> at information retrieval
>>> and
>>> (2) the cost of "negative knowledge" about an existing resource  
>>> or data,
>>>
>>> these, too, would be helpful.
>>>
>>> (For example, with respect to #2 above, we are all familiar with
>>> comparison shopping for goods and services. We seek
>>> data/information about prices and quality, but at what point does
>>> the expenditure of that effort exceed the value of the information
>>> learned?)
>>>
>>> I am not looking for examples at the level of a philosophy or
>>> economics Ph.D. thesis, but rather a few examples in the sciences
>>> that can be used at the level of an "elevator speech."
>>>
>>>
>>> Karen Skinner
>>> Deputy Director for Science and Technology Development
>>> Division of Basic Neuroscience and Behavior Research
>>> National Institute on Drug Abuse/NIH
>>>

Received on Thursday, 5 July 2007 16:01:26 UTC