- From: Antoine Isaac <aisaac@few.vu.nl>
- Date: Sat, 24 Oct 2009 17:45:45 -0400
- To: Thomas Bandholtz <thomas.bandholtz@innoq.com>
- CC: Christophe Dupriez <christophe.dupriez@destin.be>, Richard Light <richard@light.demon.co.uk>, Stella Dextre Clarke <stella@lukehouse.org>, SKOS <public-esw-thes@w3.org>
Hello everyone,

Johan's suggestion

> There are three levels of organization.
> - Concepts (SKOS talk)
> - Labels
> - Text processing

makes sense indeed. Like Thomas, however, I would think that the label
layer falls at least partly within the SKOS(XL) scope. And within the
ISO/BS one.

But to answer Stella specifically, on what I think belongs to the third
point of Johan:

> I sometimes wonder if a future revised version of BS 8723 or ISO 25964
> should include some recommendations to this effect. What do you think?

I also think that this is a dangerous road to go down. I mean, I
certainly think that the effort of representing lexical information is
very useful, and I believe that interesting things can be achieved based
on it. But for us (simpler KOS-oriented efforts like SKOS/ISO/BS) it
would be better to just focus on:
- pointing to initiatives, such as WordNet and [1], which try to
  represent lexical information for NLP tools to work with;
- allowing those initiatives to be plugged onto our KOS-related efforts
  (or vice versa) by providing sufficient extension hooks. That was the
  main rationale for SKOS-XL, in fact.

Trying to cope with all the required details is out of our scope and, I
think, our expertise, even if the ISO/BS committees have bright people
involved ;-)
In fact, finding a core model for lexical information modelling (such as
[1]) is still ongoing work, and there are multiple proposals around,
which shows that it is indeed a complex problem.

Cheers,

Antoine

[1] http://code.google.com/p/lexinfo/

> Dear Christophe,
>
> I am not familiar enough with the MeSH/UMLS schema to comment on your
> SKOS mapping spontaneously.
> So I limit myself to your more general statements:
>
>> * Full Natural Language Processing needs a way to efficiently treat
>> the EXCEPTIONS: intuition suggests that the 80/20 rule is good enough.
>> Reality is much more demanding: "small" linguistic errors are never
>> accepted by humans (when visible: this is why Google does not document
>> them!).
>> So the representation of exceptions must be part of the design of data
>> structures for Natural Language Processing systems.
>> It is their main use (the general 80% rules can even be hard-coded).
>> This is way too complex to be seen as a simple SKOS extension.
>
> I agree, more or less. SKOS is not made to express rules. But you may
> enhance xl:Label instances with certain linguistic data (specific to
> the given language) in order to enable NLP systems to cope with the
> remaining 20%. At least this is what we try in UMTHES.
>
>> * Thesaurus "projection" over a text has been used with success to
>> generate suggestions to human indexers (not for fully automatic
>> indexing).
>
> In practice, we once built a wizard making suggestions to human
> indexers, and after some tests people used it as fully automatic
> indexing.
> This was not because the wizard was perfect; it was because 80% (or
> even 70%) was found to be "good enough". This depends strongly on the
> use case.
>
>> It is very useful, and it is true that having the necessary lexical
>> information in a SKOS extension to achieve this would be nice.
>> It is limited to the detection of nominal groups, but it may have
>> problems with the different grammatical ways to express coordination
>> between elementary concepts in a term.
>> To succeed, this "extension" normalization effort should define
>> properties only for that precise purpose.
>
> Can this be "normalized"? I don't see any normalized NLP methods, so I
> wonder how we can normalize the properties that would support such
> methods. Do you have something in mind?
>
>> In general, a focused "purpose", open to the different applications
>> with that purpose, is the only way to deliver a working standard...
>
> To me, any real-world conceptScheme is an individual to a certain
> extent. SKOS (XL included) covers the common patterns and gives room
> for necessarily individual extensions. Over time, we might discover
> more common patterns even in the individuality of each scheme, but
> some diversity will always remain. I don't think this is a problem.
>
> Referring to the UMTHES extensions, it was not the intention to
> provide a standardisation proposal.
> UMTHES just needs a lossless RDF serialisation making the most of SKOS
> and extending it for our specific demands, and we need all this now.
> But I would be enthusiastic about some future extensions of SKOS
> towards linguistics and NLP support, if they may arise from this
> discussion.
>
> Kind regards,
> Thomas
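[Editor's illustration] The "extension hook" pattern discussed above, where an xl:Label carries extra linguistic data for NLP tools, might look roughly like the following Turtle sketch. The `skos:` and `skosxl:` namespaces are the standard W3C ones; the `lex:` namespace and its properties are purely hypothetical placeholders for whatever lexical vocabulary (LexInfo [1], or UMTHES-specific properties) would actually be plugged in:

```turtle
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ex:     <http://example.org/thesaurus/> .
# Hypothetical lexical-extension namespace, standing in for a real one:
@prefix lex:    <http://example.org/lex#> .

ex:concept1 a skos:Concept ;
    skosxl:prefLabel ex:label1 .

# The label is a resource in its own right, so extension
# properties can be attached to it without touching the concept.
ex:label1 a skosxl:Label ;
    skosxl:literalForm "databases"@en ;
    # Hypothetical linguistic annotations for NLP consumers:
    lex:partOfSpeech "noun" ;
    lex:stem "database" .
```

Because SKOS-XL reifies labels as first-class resources, such annotations stay outside SKOS proper while remaining attached to the label they describe, which is exactly the plug-in arrangement described above.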
Received on Saturday, 24 October 2009 21:46:33 UTC