Re: UMTHES and SKOS-XL and Others! from Christophe Dupriez on 2009-10-23 (public-esw-thes@w3.org from October 2009)

From: Christophe Dupriez <christophe.dupriez@destin.be>
Date: Fri, 23 Oct 2009 09:50:48 +0200
To: Thomas Bandholtz <thomas.bandholtz@innoq.com>
CC: Richard Light <richard@light.demon.co.uk>, Stella Dextre Clarke <stella@lukehouse.org>, Antoine Isaac <aisaac@few.vu.nl>, SKOS <public-esw-thes@w3.org>
Message-ID: <4AE16058.20905@destin.be>
Hi!

A few thoughts coming from this discussion:

* Indexing Authority List vs Existing Concepts Inventory: the MeSH is an 
example of merging both.
   In MeSH/UMLS, Concepts have their specific labels (terms) but they 
are grouped in micro-hierarchies to form a Heading entry.
  Example:
http://www.nlm.nih.gov/cgi/mesh/2010/MB_cgi?mode=&index=877&view=expanded
  I believe SKOS is able to represent most of MeSH attributes:
    * Concept Unique identifier is the "about"
    * Tree numbers (changing from one year to another) are a notation system
    * (Heading) Entry Unique Id is another notation system (an id within 
a sub-scheme)
    * Registry Number (CAS) is another notation system (an id within 
another scheme)
    * Terms are preferred labels or synonyms (depending on the lexical tag 
value)
    * Scope Notes are SKOS Scope notes. The concept references within 
Scope Notes have to be represented somehow.
    * Annotations and other fields are editorial notes or other types of SKOS notes
    * Previous indexing: relatedMatch with older Heading Schemes?
  It remains to find a good way to represent Semantic Types 
(collections?) and Allowable Qualifiers (collections too? or a SKOS 
extension?)
  This example also shows a difficult problem: the Heading entry is 
narrower (more specific), not broader (more generic), than the two other 
"non preferred" concepts!
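
  As a rough illustration of the mapping listed above, here is a minimal 
Python sketch. The record layout, the "ex:" datatype names and all the 
sample values are hypothetical placeholders (they are not real MeSH data 
or a real MeSH export format); only the skos: property names come from SKOS.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MeshLikeRecord:
    """Hypothetical, simplified stand-in for one MeSH concept record."""
    concept_uri: str                      # Concept Unique Identifier -> rdf:about
    tree_numbers: List[str] = field(default_factory=list)
    heading_id: Optional[str] = None      # (Heading) Entry Unique Id
    registry_number: Optional[str] = None  # e.g. a CAS number
    terms: List[Tuple[str, bool]] = field(default_factory=list)  # (label, preferred?)
    scope_note: Optional[str] = None
    annotation: Optional[str] = None


def to_skos_triples(rec: MeshLikeRecord) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) triples following the mapping above."""
    s = rec.concept_uri
    triples = [(s, "rdf:type", "skos:Concept")]
    # Each notation system gets its own (assumed) datatype so they stay distinct.
    for tree_number in rec.tree_numbers:
        triples.append((s, "skos:notation", f'"{tree_number}"^^ex:treeNumber'))
    if rec.heading_id:
        triples.append((s, "skos:notation", f'"{rec.heading_id}"^^ex:headingId'))
    if rec.registry_number:
        triples.append((s, "skos:notation", f'"{rec.registry_number}"^^ex:registryNumber'))
    # Terms become preferred or alternative labels depending on their flag.
    for label, preferred in rec.terms:
        prop = "skos:prefLabel" if preferred else "skos:altLabel"
        triples.append((s, prop, f'"{label}"@en'))
    if rec.scope_note:
        triples.append((s, "skos:scopeNote", f'"{rec.scope_note}"@en'))
    if rec.annotation:
        triples.append((s, "skos:editorialNote", f'"{rec.annotation}"@en'))
    return triples


# Placeholder values only, to show the shape of the output.
example = MeshLikeRecord(
    concept_uri="ex:EXAMPLE-CONCEPT-1",
    tree_numbers=["X01.234"],
    heading_id="EXAMPLE-HEADING-1",
    terms=[("example substance", True), ("sample substance", False)],
    scope_note="A placeholder scope note.",
)
example_triples = to_skos_triples(example)
```

  Semantic Types, Allowable Qualifiers and the concept references inside 
Scope Notes are deliberately left out of this sketch, since how to 
represent them is still an open question.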

* Full Natural Language Processing needs a way to treat the EXCEPTIONS 
efficiently: intuition suggests that the 80/20 rule is good enough.
   Reality is much more demanding: "small" linguistic errors are never 
accepted by humans (when visible: this is why Google does not document 
them!).
   So the representation of exceptions must be part of the design of the 
data structures for Natural Language Processing systems.
   Handling exceptions is their main use (the general 80% rules can even 
be hard coded).
   This is way too complex to be seen as a simple SKOS extension.
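
   A toy Python sketch of that idea, assuming a deliberately naive English 
singularization task: the general rules are hard coded, and the data 
structure exists only to hold the exceptions. The function name and the 
exception entries are illustrative assumptions, not part of any existing 
system.

```python
# The exception table is the real "data": it is consulted before any rule.
EXCEPTIONS = {
    "mice": "mouse",
    "children": "child",
    "analyses": "analysis",
    "news": "news",
}


def singularize(word: str) -> str:
    """Naive singularization: exceptions first, then hard-coded general rules."""
    lower = word.lower()
    if lower in EXCEPTIONS:
        return EXCEPTIONS[lower]
    # Hard-coded general rules (the "80%").
    if lower.endswith("ies") and len(lower) > 4:
        return lower[:-3] + "y"
    if lower.endswith("s") and not lower.endswith("ss"):
        return lower[:-1]
    return lower
```

   With these rules, "bus" is wrongly reduced to "bu" until it is added to 
the exception table, which is exactly the kind of "small" error humans do 
not accept.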

* Thesaurus "projection" over a text has been used with success to 
generate suggestions to human indexers (not for fully automatic indexing).
   It is very useful and it is true that having the necessary lexical 
information in a SKOS extension to achieve this would be nice.
   It is limited to the detection of nominal groups but it may have 
problems with different grammatical ways to express coordination between 
elementary concepts in a term.
   To succeed, this "extension" standardization effort should define 
properties only for that precise purpose.
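
   A minimal sketch of such a projection in Python, assuming a simple 
longest-match lookup of thesaurus labels over the tokens of a text. The 
term index, the "ex:" concept identifiers and the sample sentence are 
invented for illustration; a real system would also need the lexical 
information discussed above (inflection, coordination, and so on).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical index: lower-cased label -> candidate concepts.
# A homograph ("bank") maps to several concepts and is left to the human indexer.
TERM_INDEX: Dict[str, List[str]] = {
    "water": ["ex:water"],
    "water pollution": ["ex:waterPollution"],
    "bank": ["ex:riverBank", "ex:financialBank"],
}


def project(text: str, term_index: Dict[str, List[str]],
            max_len: int = 4) -> List[Tuple[str, List[str]]]:
    """Suggest concepts by matching the longest known term at each position."""
    tokens = re.findall(r"\w+", text.lower())
    suggestions = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest window first so compound terms win over their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in term_index:
                suggestions.append((candidate, term_index[candidate]))
                matched = n
                break
        i += matched if matched else 1
    return suggestions


suggestions = project("Water pollution near the bank affects drinking water.",
                      TERM_INDEX)
```

   Here "water pollution" is suggested as one compound term rather than as 
"water", and "bank" comes back with both candidate concepts, so the final 
choice stays with the indexer.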

   In general, a focused "purpose", open to the different applications 
sharing that purpose, is the only way to deliver a working standard...

I am very sorry I cannot attend "Classification at CrossRoads" and 
the SKOS day, October the 30th in the Netherlands: I hope to be able to 
on another occasion.
I suppose the presentations will be made available?

Have a nice day!

Christophe Dupriez

Thomas Bandholtz wrote:
> Dear Robert,
>>
>> I would say not.  "Machines detecting concepts" strikes me as an 
>> unachievable goal, certainly with our current capabilities.  
>> "Machines detecting the presence of words which are also terms in a 
>> thesaurus" is achievable, but it _isn't_ the same thing.
> Richard, when the machine has detected a term (which is quite easy so 
> far) there are some remaining problems to be solved. I give only two 
> examples:
>
>     * the term may be simply ambiguous (a homograph). It may designate
>       more than one Concept. Qualifiers may help in this case (Stella
>       mentioned this), but such qualifiers may not appear literally in
>       the same text context ...
>     * the term can designate a Concept by itself, but it may also
>       occur as part of a compound term which designates a different
>       Concept.
>
> You can add more cases.
> "Machines detecting concepts" means getting closer and closer towards 
> a safe automatic decision in such cases.
> This will not be finalised by a "big bang", but it is not "an 
> unachievable goal" as you say.
> It is not yet achieved completely, but there are many approaches 
> coming closer every time you revisit them.
> Give this a little more time!
>
> Best regards,
> Thomas
>
> PS: On the other hand, if someone wants to expose her knowledge to 
> the Semantic Web, she should use a formal language such as RDF 
> directly and not human lingo. This would make everything much easier! 
> (Dreaming ;-)
>
>
> -- 
> Thomas Bandholtz, thomas.bandholtz@innoq.com, http://www.innoq.com 
> innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
> Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
>
Received on Friday, 23 October 2009 07:44:45 UTC