
Re: [VM] Natasha and Alan

From: Alan Rector <rector@cs.man.ac.uk>
Date: Thu, 28 Oct 2004 12:37:39 +0100
Message-ID: <4180DA03.118CD2F3@cs.man.ac.uk>
To: Thomas Baker <thomas.baker@bi.fhg.de>
CC: Natasha Noy <noy@smi.stanford.edu>, best-practice <public-swbp-wg@w3.org>, Harold Solbrig <Solbrig.Harold@mayo.edu>

Thomas

I am something of a stand-in for the experience of the medical community. There are several relevant groups to point to, although I am not in a position to write a definitive paper on any of them:

SNOMED-CT (and its ancestor in the UK Clinical Terms) - the officially mandated terminology

The HL7 (the main healthcare information standards body) Vocabulary group, which manages a variety of terminologies.

The National Cancer Institute's Metathesaurus

The Unified Medical Language System (UMLS)

The Gene Ontology and the Open Bio Ontologies (OBO) group more generally.

In all these cases the interest would be in the principles, issues and use cases encountered, e.g. tracking versions, handling retired terms, handling the splitting and joining of terms, deciding when a change of label indicates a new concept, etc. None of them uses URIs as identifiers. HL7 uses OIDs; SNOMED has its own system of 64-bit (or perhaps now larger) identifiers, partitioned so as to allow a namespace-like construct. All make a sharp separation between 'concept' and 'term', and use separate 'nonsemantic' identifiers for each. All use "pre-web" technology.

The other work in the medical area which may have an impact here is the Mayo Clinic's work on the terminology services engine, CTS. The best way to find out about that is to contact Harold Solbrig directly, whom I am copying on this email.

SNOMED had several papers on their updating procedure that used to be on their web site during development, but they seem to have disappeared. I can see whether I can retrieve any of them, although they may be considered commercially confidential.

The strongest line in the medical community - because group after group has had to rediscover it, or at least rediscover that the community was right - is to separate the "concept" and the "term", and to use "meaningless" identifiers of some sort which are not subject to arguments over spelling, differences in usage between communities, etc. Also: never, ever, re-use an identifier.
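
To make the principle concrete, here is a minimal sketch in Python. It is an illustration only, not how SNOMED, HL7 or any of the other systems actually work: the class names, the "C"/"T" prefixes and the counter-based identifier scheme are all assumptions made up for the example. Concepts and terms each get their own opaque identifier, and retiring a concept leaves its identifier in place so it cannot be issued again.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str          # opaque identifier, carries no meaning
    active: bool = True
    term_ids: list = field(default_factory=list)


@dataclass
class Term:
    term_id: str             # terms get their own opaque identifiers
    concept_id: str
    text: str


class Vocabulary:
    """Hypothetical registry; not modelled on any real terminology server."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.concepts = {}
        self.terms = {}

    def _new_id(self, prefix):
        # The counter only ever increases, so no identifier is issued twice.
        return f"{prefix}{next(self._counter):06d}"

    def add_concept(self):
        cid = self._new_id("C")
        self.concepts[cid] = Concept(cid)
        return cid

    def add_term(self, concept_id, text):
        tid = self._new_id("T")
        self.terms[tid] = Term(tid, concept_id, text)
        self.concepts[concept_id].term_ids.append(tid)
        return tid

    def retire_concept(self, concept_id):
        # Retired concepts are flagged, never deleted, so the id stays taken.
        self.concepts[concept_id].active = False


vocab = Vocabulary()
c1 = vocab.add_concept()
t1 = vocab.add_term(c1, "myocardial infarction")
t2 = vocab.add_term(c1, "heart attack")
vocab.retire_concept(c1)
c2 = vocab.add_concept()     # gets a fresh identifier, not the retired one
```

Changing the spelling of a term, or adding a synonym, touches only the term records; the concept identifier that other systems refer to stays stable.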

I am not quite sure how this translates into the Semantic Web world. The rdfs:label property is a relatively weak mechanism: as far as I know it is just string-valued and has no inverse, so going from label to concept isn't really supported, nor is any notion of labels as things in their own right. There is also no obvious notion of "preferred term" - something the medical community has found essential as soon as you start to allow multiple labels. Potentially the language field could be used for this purpose, but this really needs testing against the use cases.
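
As a rough sketch of the two things that seem to be missing, the following Python treats each label as a record in its own right rather than a bare string. Everything here is an assumption for illustration: the Label record, the "preferred" flag, the identifier "C000001" and the sample labels do not come from any existing vocabulary or RDF library. It shows an inverse index from label to concept, and a per-language preferred term.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Label(NamedTuple):
    concept_id: str   # hypothetical opaque concept identifier
    text: str
    lang: str
    preferred: bool   # at most one preferred label per concept and language


LABELS = [
    Label("C000001", "myocardial infarction", "en", True),
    Label("C000001", "heart attack", "en", False),
    Label("C000001", "Herzinfarkt", "de", True),
]


def build_label_index(labels):
    """Inverse mapping: (case-folded text, language) -> set of concept ids."""
    index = defaultdict(set)
    for label in labels:
        index[(label.text.casefold(), label.lang)].add(label.concept_id)
    return index


def preferred_term(labels, concept_id, lang) -> Optional[str]:
    """Return the preferred label for a concept in one language, if any."""
    for label in labels:
        if (label.concept_id == concept_id
                and label.lang == lang
                and label.preferred):
            return label.text
    return None


index = build_label_index(LABELS)
concepts_for_label = index[("heart attack", "en")]
english_preferred = preferred_term(LABELS, "C000001", "en")
```

The index maps to a set because the same string may label more than one concept; whether that should be allowed is exactly the kind of question the use cases need to settle.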

The classic paper, although way pre-web, is Cimino's Desiderata paper, Methods of Information in Medicine 37(4-5): pp. 394-403, available from http://www.dbmi.columbia.edu/cimino/Publications/1998%20-%20Meth%20Inf%20Med%20-%20Desiderata%20for%20Controlled%20Medical%20Vocabularies%20in%20the%20Twenty-First%20Century.pdf
Some of the issues are clearly parochial to medicine, but others seem entirely general.

Regards

Alan

Thomas Baker wrote:

> Natasha, Alan,
>
> On the basis of previous discussion of the VM draft, it
> was clear to me exactly what role you would want to have in
> the paper.
>
> Note that in its current form, the draft is set up to cite
> practices from a handful of example vocabularies (FOAF, Dublin
> Core, SKOS, etc) to illustrate some general principles.
>
> I was wondering whether, between the two of you, there is
> a major vocabulary in medicine or the life sciences about
> which you would be willing to provide some explanatory text
> and references (with regard to use of URI references, formal
> schemas, etc).  Having one really large-scale, tightly designed
> ontology would be perfect to round out the set of examples.
>
> Any other ideas most welcome.
>
> Thank you,
> Tom
>
> >      -- A major medical or life-sciences vocabulary?
> >         TASK: Alan or Natasha - An example of a large-scale ontology?
> >            Do we perhaps need another major example?  It would
> >            be good to have a "large-scale" vocabulary of the
> >            "ontology" sort, preferably with some well-defined
> >            maintenance and versioning policies...
>
> --
> Dr. Thomas Baker                        Thomas.Baker@izb.fraunhofer.de
> Institutszentrum Schloss Birlinghoven         mobile +49-160-9664-2129
> Fraunhofer-Gesellschaft                          work +49-30-8109-9027
> 53754 Sankt Augustin, Germany                    fax +49-2241-144-2352
> Personal email: thbaker79@alumni.amherst.edu

--
Alan L Rector
Professor of Medical Informatics
Department of Computer Science
University of Manchester
Manchester M13 9PL, UK
TEL: +44-161-275-6188/6149/7183
FAX: +44-161-275-6236/6204
Room: 2.88a, Kilburn Building
email: rector@cs.man.ac.uk
web: www.cs.man.ac.uk/mig
        www.opengalen.org
        www.clinical-escience.org
        www.co-ode.org
Received on Thursday, 28 October 2004 11:37:33 GMT
