- From: Thomas Bandholtz <thomas.bandholtz@innoq.com>
- Date: Sat, 24 Oct 2009 19:19:46 +0200
- To: Christophe Dupriez <christophe.dupriez@destin.be>
- CC: Richard Light <richard@light.demon.co.uk>, Stella Dextre Clarke <stella@lukehouse.org>, Antoine Isaac <aisaac@few.vu.nl>, SKOS <public-esw-thes@w3.org>
Dear Christophe,

I am not familiar enough with the MeSH/UMLS schema to comment on your SKOS mapping spontaneously, so I will limit myself to your more general statements:

> * Full Natural Language Processing needs a way to efficiently treat
> the EXCEPTIONS: intuition believes that the 80/20 rule is good enough.
> Reality is much more demanding: "small" linguistic errors are never
> accepted by humans (when visible: this is why Google does not document
> them!).
> So the representation of exceptions must be in the design of data
> structures for Natural Language Processing systems.
> It is their main use (the general 80% rules can even be hard-coded).
> This is way too complex to be seen as a simple SKOS extension.

I agree, more or less. SKOS is not made to express rules. But you may enhance xl:Label instances with certain linguistic data (specific to the given language) so that NLP systems can get along with the remaining 20%. At least this is what we try in UMTHES.

> * Thesaurus "projection" over a text has been used with success to
> generate suggestions to human indexers (not for fully automatic
> indexation).

In practice, we once built a wizard that made suggestions to human indexers, and after some tests people used it as fully automatic indexation. This was not because the wizard was perfect; it was because 80% (or even 70%) was found to be "good enough". This depends strongly on the use case.

> It is very useful, and it is true that having the necessary lexical
> information in a SKOS extension to achieve this would be nice.
> It is limited to the detection of nominal groups, but it may have
> problems with different grammatical ways to express coordination
> between elementary concepts in a term.
> To succeed, this "extension" normalization effort should be done to
> define properties only for that precise purpose.

Can this be "normalized"?
I don't see any normalized NLP methods, so I wonder how we can normalize the properties that would support such methods. Do you have something in mind?

> In general, a focused "purpose", open to the different applications
> with that purpose, is the only way to deliver a working standard...

To me, any real-world conceptScheme is an individual to a certain extent. SKOS (XL included) covers the common patterns and gives room for necessarily individual extensions. Over time we might discover more common patterns even in the individuality of each scheme, but some diversity will always remain. I don't think this is a problem.

Referring to the UMTHES extensions: it was not the intention to provide a standardisation proposal. UMTHES just needs a lossless RDF serialisation that makes the most of SKOS and extends it for our specific demands, and we need all of this now. But I would be enthusiastic about some future extensions of SKOS towards linguistics and NLP support, if they arise from this discussion.

Kind regards,
Thomas

-- 
Thomas Bandholtz, thomas.bandholtz@innoq.com, http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
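[The thesaurus "projection" discussed in this thread — matching concept labels against a text to suggest candidate index terms — can be sketched roughly as follows. This is a minimal illustration only: the concept URIs, labels, and the naive suffix-stripping stemmer are all invented for the example and stand in for the richer per-label linguistic data an xl:Label extension would carry.]

```python
import re

# Toy thesaurus: concept URI -> preferred/alternative label literals.
# A real system would draw these from skos:prefLabel / skosxl:Label data.
THESAURUS = {
    "ex:WaterPollution": ["water pollution", "polluted water"],
    "ex:Groundwater": ["groundwater"],
}

def naive_stem(token):
    """Crude suffix stripping -- a stand-in for real morphological data
    that could be attached to each label in an xl:Label extension."""
    for suffix in ("ing", "ion", "ed", "er", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def stems_of(text):
    """Set of stems of all alphabetic tokens in the text."""
    return {naive_stem(t) for t in re.findall(r"[a-z]+", text.lower())}

def suggest_concepts(text):
    """Project the thesaurus over the text: suggest every concept
    for which all stems of at least one label occur in the text."""
    text_stems = stems_of(text)
    suggestions = []
    for concept, labels in THESAURUS.items():
        for label in labels:
            if stems_of(label) <= text_stems:
                suggestions.append(concept)
                break  # one matching label per concept is enough
    return suggestions

print(suggest_concepts("Polluted water threatens the groundwater table."))
# → ['ex:WaterPollution', 'ex:Groundwater']
```

Because labels and text run through the same stemmer, "polluted water" still matches "water pollution" — and this is exactly where the 80/20 problem shows: the hard-coded suffix list covers the regular cases, while real vocabularies need per-label exception data of the kind discussed above.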
Received on Saturday, 24 October 2009 17:20:18 UTC