- From: David States <dstates@bioinformatics.med.umich.edu>
- Date: Sun, 25 Jul 2004 16:37:48 -0400
- To: <public-semweb-lifesci@w3.org>
In thinking about applications of semantic web technology in the life sciences, it will be important to address issues of domain-specific vocabulary. This can occur at multiple levels.

Term collision

Problems of term collision are well known (e.g. the multiple meanings of "china" or "tire"). These can occur within a domain, particularly when using acronyms. For example, to a cardiologist PCR might mean "premature contraction", while in the metabolism community it is an abbreviation for "phosphocreatine", and of course in molecular biology it is widely used as an abbreviation for "polymerase chain reaction". Similarly, "P names" for proteins are very widely used (e.g. P17, P21 and P53, referring to well-known proteins with molecular weights of approximately 17, 21 and 53 kilodaltons, respectively), but with many thousands of proteins, imprecise molecular weight determinations and a limited range of molecular weights, such terms are highly ambiguous.

Specific vs. general use

Many terms have specific as well as general definitions. For example, "association" in a statistical genetics paper is likely to mean "co-occurrence of a trait with a genetic marker that has less than a one in 1,000 chance of arising at random", while in many other settings it will be used with the general meaning "related". The phrase "For the purposes of this discussion, we will define..." is often, but not always, used to declare such specialized definitions.

Polymorphic definitions

Molecular biologists sometimes joke that if you put 3 scientists in a room and ask them to define a gene, you will get 5 definitions and 7 dissenting opinions. Comedy often has its roots in reality, and there is considerable truth underlying this joke. A gene might be a genetic locus without reference to a particular molecular structure, it might be a region of genomic DNA, or it might be specifically the transcribed exons, to name just a few possibilities.
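
One way to picture the ambiguity in the PCR example is to treat the sense of a term as a distribution that depends on the community using it, rather than as a single fixed meaning. The following is only a minimal sketch under stated assumptions: the names `SENSE_WEIGHTS`, `sense_distribution` and `most_likely_sense` are hypothetical, and the weights are invented placeholders for illustration, not measured usage frequencies.

```python
from typing import Dict, Optional

# Hypothetical weights: illustrative placeholders, NOT measured usage data.
# Layout: term -> community -> sense -> relative weight.
SENSE_WEIGHTS: Dict[str, Dict[str, Dict[str, float]]] = {
    "PCR": {
        "cardiology": {
            "premature contraction": 8.0,
            "polymerase chain reaction": 2.0,
        },
        "metabolism": {
            "phosphocreatine": 7.0,
            "polymerase chain reaction": 3.0,
        },
        "molecular biology": {
            "polymerase chain reaction": 9.0,
            "phosphocreatine": 1.0,
        },
    }
}


def sense_distribution(term: str, community: Optional[str] = None) -> Dict[str, float]:
    """Return a normalized distribution over senses of `term`.

    If `community` is given, only that community's weights are used;
    otherwise the weights of all known communities are pooled.
    An unknown term or community yields an empty dict.
    """
    by_community = SENSE_WEIGHTS.get(term, {})
    if community is not None:
        pools = [by_community.get(community, {})]
    else:
        pools = list(by_community.values())

    totals: Dict[str, float] = {}
    for pool in pools:
        for sense, weight in pool.items():
            totals[sense] = totals.get(sense, 0.0) + weight

    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {sense: weight / norm for sense, weight in totals.items()}


def most_likely_sense(term: str, community: Optional[str] = None) -> Optional[str]:
    """Return the highest-probability sense, or None if the term is unknown."""
    dist = sense_distribution(term, community)
    return max(dist, key=dist.get) if dist else None
```

Calling `sense_distribution("PCR", "cardiology")` and `sense_distribution("PCR")` on the same term gives different distributions, which is the point: knowing the community narrows the meaning but does not necessarily make it certain.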
Often it is difficult to know in exactly what sense a term is being used.

The point of this discussion is that domain and range properties are often uncertain and may only be probabilistically assigned, even within a domain. RDF seems to define domain and range in absolute logical terms. In applying RDF-encoded semantic webs to the life sciences, it will be important to allow for uncertainty in meaning.

David

David J. States, M.D., Ph.D.
Professor of Human Genetics
Director of Bioinformatics
University of Michigan School of Medicine
Medical Science Building I, Room 5443
Ann Arbor, MI 48109 USA
email: dstates@umich.edu
tel: (734) 615-5510
fax: (734) 615-6553
URL: http://stateslab.bioinformatics.med.umich.edu
Received on Sunday, 25 July 2004 16:38:30 UTC