Domain specific ontologies in the sciences

In thinking about applications of semantic web technology in the life
sciences, it will be important to address issues of domain specific
vocabulary.  This can occur at multiple levels.

Term collision

Problems of term collision are well known (e.g. the multiple meanings of
"china" or "tire").  These can occur within a domain, particularly when
using acronyms.  For example, to a cardiologist PCR might mean "premature
contraction" while in the metabolism community it is an abbreviation for
"phosphocreatine" and of course in molecular biology it is widely used
as an abbreviation for "polymerase chain reaction".  Similarly, "P
names" for proteins are very widely used (e.g. P17, P21 and P53
referring to well known proteins with molecular weights of approximately
17, 21 and 53 kilodaltons, respectively), but with many thousands of
proteins, imprecise molecular weight determinations and a limited range
of molecular weights, such terms are highly ambiguous.
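
As a minimal sketch of the acronym problem, the PCR example above can be
written as a lookup keyed by community.  The table, the community labels
and the function name expand are all hypothetical, chosen only for
illustration; they are not taken from any existing vocabulary or tool.

```python
# Hypothetical lookup table built only from the PCR example above;
# the community labels are illustrative, not a controlled vocabulary.
EXPANSIONS = {
    "PCR": {
        "cardiology": "premature contraction",
        "metabolism": "phosphocreatine",
        "molecular biology": "polymerase chain reaction",
    },
}


def expand(term, community=None):
    """Return the candidate expansions of a term as a list.

    With a community, at most one expansion is returned.  Without one,
    every known expansion is returned, which makes the ambiguity explicit.
    """
    senses = EXPANSIONS.get(term, {})
    if community is None:
        return sorted(senses.values())
    return [senses[community]] if community in senses else []
```

Calling expand("PCR") with no community returns all three expansions,
while expand("PCR", "metabolism") returns only "phosphocreatine".  The
acronym alone does not carry enough information to pick one.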

Specific vs. general use

Many terms have specific as well as general definitions.  For example,
"association" in a statistical genetics paper is likely to mean
"co-occurrence of a trait with a genetic marker that would arise at
random with a chance of less than one in 1,000" while in many other
settings it will be used with the general meaning "related".  The phrase
"For the purposes of this discussion, we will define..." is often, but
not always, used to declare such specialized definitions.

Polymorphic definitions

Molecular biologists sometimes joke that if you put 3 scientists in a
room and ask them to define a gene, you will get 5 definitions and 7
dissenting opinions.  Comedy often has its roots in reality, and there
is considerable truth underlying this joke.  A gene might be a genetic
locus without reference to a particular molecular structure, it might be
a region of genomic DNA, or it might be specifically the transcribed
exons, to name just a few possibilities.  Often it is difficult to know
in exactly what sense a term is being used.

The point of this discussion is that domain and range properties are
often uncertain and may only be probabilistically assigned, even within
a single scientific domain.  RDF seems to define domain and range in
absolute logical terms.  In applying RDF encoded semantic webs to the
life sciences, it will be important to allow for uncertainty in meaning.
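
One way to picture what "probabilistically assigned" could mean is to
attach a weight to each candidate sense of a term and commit to a sense
only when one clearly dominates.  The sketch below does this for the
three senses of "gene" listed above.  It is an assumption about how such
uncertainty might be represented, not a feature of RDF, and the weights
are invented placeholders rather than measurements.

```python
# Placeholder weights only: these numbers are invented for illustration
# and are not estimates from any corpus or survey.
GENE_SENSES = {
    "genetic locus": 2.0,
    "region of genomic DNA": 5.0,
    "transcribed exons": 3.0,
}


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {sense: w / total for sense, w in weights.items()}


def resolve(weights, threshold=0.8):
    """Return the most probable sense, or None if none reaches the threshold."""
    probs = normalize(weights)
    sense, p = max(probs.items(), key=lambda item: item[1])
    return sense if p >= threshold else None
```

With these placeholder weights no sense reaches the default threshold,
so resolve(GENE_SENSES) returns None and the term stays ambiguous.  A
strictly logical assignment would have to pick one sense regardless.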

David

David J. States, M.D., Ph.D.
Professor of Human Genetics
Director of Bioinformatics
University of Michigan School of Medicine
Medical Science Building I, Room 5443
Ann Arbor, MI 48109
USA
email: dstates@umich.edu
tel: (734) 615-5510
fax: (734) 615-6553
URL: http://stateslab.bioinformatics.med.umich.edu
 
 

Received on Sunday, 25 July 2004 16:38:30 UTC