- From: William Bug <William.Bug@DrexelMed.edu>
- Date: Wed, 23 Aug 2006 02:44:37 -0400
- To: Marja Koivunen <marja@annotea.org>, Tim Berners-Lee <timbl@w3.org>, Xiaoshu Wang <wangxiao@musc.edu>, "Miller, Michael D (Rosetta)" <Michael_Miller@Rosettabio.com>, Alan Ruttenberg <alanruttenberg@gmail.com>, Mark Wilkinson <markw@illuminae.com>, w3c semweb hcls <public-semweb-lifesci@w3.org>, www-tag@w3.org, Adrian Walker <adrianw@snet.net>
- Message-Id: <9F820BC2-E8C7-4877-8099-98FFD7A3F8A9@DrexelMed.edu>
Hi All,

There are examples of systems that strive to separate the lexicon from the ontology, so as to ensure one particular lexical view of the underlying semantics doesn't "lock out" either humans or machines who do not "understand" that lexicon. Few are perfect, but many have effectively handled the issue of semantic interoperability, though often not at the level of semantic granularity required by experts at the bleeding edge of a specific scientific field.

An ontology is of little use to anyone - person or machine - without instantiating it via a lexicon. Very significant problems arise when the lexicon is confused with the universals the ontology is intended to formally represent. I realize this boundary may appear artificial to some, but those who've worked on such issues for decades in the library & info sciences and in computational linguistics - despite some disagreement at the edges - will generally see this boundary as useful, even if they agree to disagree on whether it is in fact an artifact of human linguistic expression or a more fundamental expression of a sort of Heisenberg Uncertainty Principle of KE/KR/KD.

What I mean is that the moment an algorithm tries to compute on an ontological expression in the context of specific data instances - whether the algorithm resides in silico or in a human brain - it "breaks" the universal nature of the principles and grounds it in a lexicon used to address the specific existential instances being manipulated within the domain of a specific application.

I believe this issue is at the heart of some significant confusion regarding what an ontology is and the tasks it can help to implement. An effective and practical knowledge resource needs to include both ontological graphs and a complex lexical repository.
I think where "ontology" construction often goes wrong is when it is not EXPLICIT and - of equal importance - quite SYSTEMATIC regarding the lexical extensions it includes - e.g., abbreviations, misspellings, various types of synonyms, homographic homonyms (the bane of NLP efforts everywhere), etc.

I was just listening to Michio Kaku discussing the recent controversy regarding the redefinition of "planet" status. As he and the astronomer Ken Croswell were discussing the issue, Dr. Kaku brought up the story from Richard Feynman's biography regarding the difference between "naming" an entity and studying the fundamental properties and rules relating the continuum of entities in the physical world. Both the naming and the formalisms for characterizing the fundamentals are human artifacts - BUT what separates the naming from the expression of universals is that the latter is guided by our increasing level of insight and understanding of real-world entities and the ways in which they relate to one another. No such criterion exists for the naming process, and this is why it is extremely helpful to keep the lexicon characterizing these names distinct from the expression of fundamentals (the ontologies).

This is also an issue addressed by Gottfried W. von Leibniz in his philosophical works, which all derived from the insight he had as a child that it MIGHT be possible to create a computable formalism for ontological entities analogous to the system created by mathematicians for performing axiomatic proofs in geometry. In MANY ways, our efforts here date back to this work by Leibniz via several related historical threads in mathematics, philosophy, and various computationally-oriented scientific fields.

One other general point - obviously the strategies and "best practices" for addressing these issues in the context of existing (and historical) data records, including the literature, are somewhat different from what we hope to see researchers doing going forward.
In an ideal world - say 10 years from now - we can hope to see publication mechanisms in place both for primary data, supporting reduction/analysis/interpretation, and the larger world of the scientific literature - systems such as SWAN and some of the more advanced systems in development at BioMed Central and PLoS - to help reduce the complexity of the lexical Babel-esque landscape we must currently contend with. This needs to be done in a manner that doesn't in any way restrict the expressiveness of the lexicon or the ontological foundations, while also being implemented in a highly intuitive manner not requiring the researcher to learn a complex formal means to express themselves beyond the existing complexity typically used amongst domain experts. This is why I'd still place this 10 years out. I don't think that's too optimistic a duration, however, given some of the revolutionary changes being introduced both by the SWTech C.S. community, as well as by the community of researchers embedded in the increasingly less messy process of biomedical ontology development and use.

Some of these more modern scientific publication systems will come on line much sooner than this, but probably only in restricted contexts where there is a centralized authority that can both provide technical resources to develop, support, and evolve the systems, as well as enforce a certain level of compliance amongst its users - e.g., caBIG, the eScience myGRID project, REWERSE, The MIND Center at MGH, the BIRN project, etc. For better or worse, as great a profile as these organizations represent, the landscape of working neuroscientists extends way beyond this privileged environment, and we all hope to see our efforts be of use and relevant to all neuroscientists (given the current scope of the HCLSIG hosted efforts is focussed on the neurosciences) and the value it can help neuroscientists realize for society at-large.
As an example of where things can go wrong when convolving the lexicon with the ontology, take an artifact as relatively simple and seemingly "self-evident" as the "preferred label" or "preferred term" for a node in an ontological graph. In making the assertion "preferred", there is the implication that some person or agency has passed judgement on the term. Reconciling two ontologies with overlapping knowledge domains can be made unnecessarily difficult when this implied contract is not made explicit. In other words, if you focus on reconciling the terms rather than reconciling the underlying semantic graphs, you can run into many unnecessary problems. I believe this issue is related to many of the discussions we've had on this list over the past 3 months both regarding ontology construction and use, as well as URI uniqueness and versioning contract. Formalisms such as SKOS can be extremely helpful in this regard, as we need to compute on the lexicon, as well as the ontological graph.

To offer a relatively simple and ubiquitous example from neuroscience - on one side of the pond they prefer "neurone", while on the other "neuron" is the standard term. Is one more true? Do they refer to different, underlying fundamental entities? Can we even call the underlying entities "fundamental" when any neuroscientist would admit there is no neuron/neurone which has been explicitly qualified down to the level of all its constituent molecules**, along with their explicit disposition in space and time?

I won't hold you at bay. I'll give you my sense of the "practical" answers to these questions.

Is one more true? Obviously not, since they are just lexical habits, as opposed to fundamental differences in the view of the world.

Do they refer to different, underlying fundamental entities? This is a harder call - and very context dependent, obviously. It will be acutely sensitive to the level of granularity of the information provided on the neuron/neurone.
If you presented two neuroscientists with coarse-grained data on a neuron/neurone, it is likely they could come to agreement they both were referring to the same fundamental entity when they named the source of that data as a neuron/neurone.

Can we call the underlying entities "fundamental" when any neuroscientist would admit there is no neuron/neurone which has been explicitly qualified down to the level of all its constituent molecules, along with their explicit disposition through time? What happens when you provide more detailed information regarding the purported neuron/neurone - say sufficient detail so that the two neuroscientists find aspects of data interpretation that are incommensurable in the Kuhnian sense (http://plato.stanford.edu/entries/thomas-kuhn/)? Then, even if the two referred to the biological material entity that was the source of the data as a "neuron", they would likely not agree they were referring to the same, underlying fundamental entity.

This is not unlike the situation described several posts below in this thread regarding a "gene". There could be a gene X identified by gene finding algorithm 1, an "identical" gene X (in terms of the coding sequences it contains) derived from gene finding algorithm 2, the same gene X defined via a chromosomal walk, and finally a gene X defined via conventional genetic complementarity or hybrid mapping. They could all contain the same coding sequence - or the same as yet functionally unidentified ESTs. What it comes down to here, as Mark Wilkinson stated deep in the thread, is that there is much confusion regarding what actual material entity is being referenced - or whether a material entity is being referenced at all.
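To make the neuron/neurone point above a little more concrete, here is a minimal Python sketch of a lexicon kept separate from the ontological graph. Everything in it is an assumption made for illustration: the concept identifiers (C0000, C0001, C0900), the authority names, and the helper functions are all hypothetical and are not drawn from SKOS or from any existing ontology.

```python
from collections import defaultdict
from dataclasses import dataclass

# Ontology side: opaque concept identifiers and the relations between them.
# All identifiers here are hypothetical placeholders.
ONTOLOGY = {
    "C0000": {"is_a": []},         # biological cell
    "C0001": {"is_a": ["C0000"]},  # the cell type both spellings name
    "C0900": {"is_a": []},         # electrochemical cell (a homograph of "cell")
}

@dataclass(frozen=True)
class LexicalEntry:
    form: str       # surface string as written
    concept: str    # concept identifier in ONTOLOGY
    kind: str       # "preferred", "alternate", "misspelling", ...
    authority: str  # who asserted the label, so the "preferred" contract is explicit

# Lexicon side: many surface forms may point at one concept, and one
# surface form may point at several concepts.
LEXICON = [
    LexicalEntry("neuron", "C0001", "preferred", "us-usage"),
    LexicalEntry("neurone", "C0001", "preferred", "uk-usage"),
    LexicalEntry("nerve cell", "C0001", "alternate", "none"),
    LexicalEntry("nueron", "C0001", "misspelling", "none"),
    LexicalEntry("cell", "C0000", "preferred", "us-usage"),
    LexicalEntry("cell", "C0900", "preferred", "us-usage"),
]

def build_index(entries):
    """Map each lower-cased surface form to the set of concepts it may denote."""
    index = defaultdict(set)
    for entry in entries:
        index[entry.form.lower()].add(entry.concept)
    return index

def resolve(index, term):
    """Return the sorted candidate concept identifiers for a term."""
    return sorted(index.get(term.lower(), set()))

def same_concept(index, a, b):
    """True when both terms are known and resolve to the same candidates."""
    candidates_a, candidates_b = resolve(index, a), resolve(index, b)
    return bool(candidates_a) and candidates_a == candidates_b

def preferred_label(entries, concept, authority):
    """Return the label a given authority prefers for a concept, if any."""
    for entry in entries:
        if (entry.concept == concept and entry.kind == "preferred"
                and entry.authority == authority):
            return entry.form
    return None

INDEX = build_index(LEXICON)
```

With this layout, reconciling "neuron" with "neurone" is a lookup on the lexicon rather than a change to the graph, and a homograph such as "cell" comes back with more than one candidate instead of being silently merged. The sketch only covers the lexical side; it says nothing about whether two parties agree on the underlying entity once finer-grained data is involved.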
In the end, I hope what SWTech can help us do is provide a robust, shared means to express the semantic facts about the data collected, as well as providing a dynamic and semi-automatic means to improve our characterization of the fundamentals - semi-automatic in the sense of "augmentation" of human intellectual abilities along the lines pursued by Doug Engelbart and Vannevar Bush before him. If we can devise a technical infrastructure allowing the formal, shared, semantic description of data to evolve toward an ever converging sense of what the true underlying entities are, then many of the misgivings folks have regarding the use of ontological frameworks to formally express semantic information will very likely fade.

Cheers,
Bill

**Biophysicists who study ion-channel kinetics, protein folding dynamics, rhodopsin-based photon detection, mitochondrial energy transfer, etc. would probably also include quantum level formalisms to represent the states and dynamics of atoms, electrons, and subatomic particles.

On Aug 22, 2006, at 3:57 PM, Marja Koivunen wrote:

> I agree, consistent use of terms makes life easier for machines and
> for humans too when the terms have been agreed on, learned, and
> understood. Unfortunately, this takes a lot of effort and
> dedication from the humans. Learning a whole ontology before
> anything can be done is a bit like reading the whole manual of a
> DVD player before one can use that. And we all know that while
> there are people who actually read the whole manual, they are a
> minority.
>
> As a usability person I always like to see the machines support the
> humans as much as possible and not vice versa.
> In my view, new inventions often start from not so great terms and
> evolve stepwise as learning happens. Often terms are first shared
> and polished in small groups and later links are made between
> groups that may use different terminologies for similar things. If
> we want to support humans doing inventions I think we should
> support the use of different terms, their evolution, and making
> connections between similar terms when they are discovered as much
> as possible. And I think Semantic Web is great for that.
>
> Marja
>
> Tim Berners-Lee wrote:
>
>> Yes, indeed. Machine processing of information relies on
>> consistent usage of terms. You can't reuse information for
>> new problems when its use requires human intervention to
>> disambiguate it.
>>
>> Tim Berners-Lee
>>
>> On Aug 10, 2006, at 21:54, wangxiao@musc.edu wrote:
>>
>>> Quoting "Miller, Michael D (Rosetta)"
>>> <Michael_Miller@Rosettabio.com>:
>>>
>>>> You're correct here but it is the state of the art. Interestingly
>>>> enough, I've found that in general the biology-based scientists and
>>>> investigators are not all that bothered by this confusion and
>>>> despite the confusion seem to make their way through it.
>>>
>>> The problem is that semantic web is intended to make machine to
>>> understand. And
>>> the clarity is a prerequisite to instruct machine unambigously.
>>>
>>> Xiaoshu

Bill Bug
Senior Research Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)

Please Note: I now have a new email - William.Bug@DrexelMed.edu
Received on Wednesday, 23 August 2006 06:44:58 UTC