Re: UMTHES and SKOS-XL and Others! from Thomas Bandholtz on 2009-10-24 (public-esw-thes@w3.org from October 2009)

From: Thomas Bandholtz <thomas.bandholtz@innoq.com>
Date: Sat, 24 Oct 2009 19:19:46 +0200
To: Christophe Dupriez <christophe.dupriez@destin.be>
CC: Richard Light <richard@light.demon.co.uk>, Stella Dextre Clarke <stella@lukehouse.org>, Antoine Isaac <aisaac@few.vu.nl>, SKOS <public-esw-thes@w3.org>
Message-ID: <4AE33732.6070200@innoq.com>

Dear Christophe,

I am not familiar enough with the MeSH/UMLS schema to comment on your SKOS
mapping off the cuff.
So I will limit myself to your more general statements:

>
> * Full Natural Language Processing needs a way to efficiently treat
> the EXCEPTIONS: the intuition believes that 80/20 rule is good enough.
>    Reality is much more demanding: "small" linguistic errors are never
> accepted by humans (when visible: this is why Google does not document
> them!).
>    So the representation of exceptions must be in the design of data
> structures for Natural Language Processing systems.
>    It is their main use (the general 80% rules can even be hard coded).
>    This is way too complex to be seen as a simple SKOS extension.

I agree, more or less. SKOS is not made to express rules. But you may
enhance xl:Label instances with certain linguistic data (specific to the
given language) to help NLP systems cope with the remaining 20%. At
least this is what we are trying in UMTHES.
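
A minimal sketch of what such an enriched xl:Label might look like when
serialised as Turtle. The skosxl: terms are from SKOS-XL; the ex: namespace
and the properties "lemma" and "partOfSpeech" are hypothetical placeholders
and are not taken from SKOS-XL or from the actual UMTHES vocabulary:

```python
from dataclasses import dataclass, field

SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
EX = "http://example.org/ling#"  # hypothetical extension namespace

PREFIXES = f"@prefix skosxl: <{SKOSXL}> .\n@prefix ex: <{EX}> .\n"


def escape(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


@dataclass
class XLLabel:
    """An skosxl:Label plus language-specific linguistic annotations."""
    uri: str
    literal_form: str
    lang: str
    # Hypothetical linguistic properties, e.g. {"lemma": "soil"}
    linguistic: dict = field(default_factory=dict)

    def to_turtle(self) -> str:
        lines = [
            f"<{self.uri}> a skosxl:Label ;",
            f'    skosxl:literalForm "{escape(self.literal_form)}"@{self.lang}',
        ]
        for prop, value in sorted(self.linguistic.items()):
            lines[-1] += " ;"  # continue the predicate list
            lines.append(f'    ex:{prop} "{escape(value)}"')
        lines[-1] += " ."  # close the statement
        return "\n".join(lines)


label = XLLabel(
    uri="http://example.org/label/1",
    literal_form="soils",
    lang="en",
    linguistic={"lemma": "soil", "partOfSpeech": "noun"},
)
turtle = PREFIXES + "\n" + label.to_turtle()
print(turtle)
```

An NLP component could then read the extra properties from the label
resource itself instead of hard-coding exceptions per term.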

>
> * Thesaurus "projection" over a text has been used with success to
> generate suggestions to human indexers (not for fully automatic
> indexation).

In practice, we once built a wizard that made suggestions to human
indexers, and after some tests people used it for fully automatic
indexing.
This was not because the wizard was perfect; it was because
80% (or even 70%) was found to be "good enough". This depends strongly
on the use case.

>    It is very useful and it is true that having the necessary lexical
> information in a SKOS extension to achieve this would be nice.
>    It is limited to the detection of nominal groups but it may have
> problems with different grammatical ways to express coordination
> between elementary concepts in a term.
>    To succeed, this "extension" normalization effort should be done to
> define properties only for that precise purpose

Can this be "normalized"? I don't see any normalized NLP methods, so I
wonder how we can normalize the properties that will support such
methods. Do you have something in mind?

>
>    In general, focused "purpose", open to the different applications
> with that purpose, is the only way to deliver a working standard...

To me, any real-world conceptScheme is individual to a certain extent.
SKOS (XL included) covers the common patterns and gives room for
necessarily individual extensions. Over time, we might discover more
common patterns even in the individuality of each scheme, but some
diversity will always remain. I don't think this is a problem.

Regarding the UMTHES extensions, it was not the intention to provide
a standardisation proposal.
UMTHES just needs a lossless RDF serialisation making the most of SKOS
and extending it for our specific demands, and we need all this now.
But I would be enthusiastic about some future extensions of SKOS towards
linguistics and NLP support, if they may arise from this discussion.

Kind regards,
Thomas

-- 
Thomas Bandholtz, thomas.bandholtz@innoq.com, http://www.innoq.com 
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491

Received on Saturday, 24 October 2009 17:20:18 UTC