- From: Antoine Isaac <aisaac@few.vu.nl>
- Date: Sat, 24 Oct 2009 17:45:45 -0400
- To: Thomas Bandholtz <thomas.bandholtz@innoq.com>
- CC: Christophe Dupriez <christophe.dupriez@destin.be>, Richard Light <richard@light.demon.co.uk>, Stella Dextre Clarke <stella@lukehouse.org>, SKOS <public-esw-thes@w3.org>
Hello everyone,

Johan's suggestion

> There are three levels of organization.
> - Concepts (SKOS talk)
> - Labels
> - Text processing

makes sense indeed. Like Thomas, however, I would think that the label
layer falls at least partly within the SKOS(XL) scope. And within the
ISO/BS one.

But to answer Stella specifically, on what I think belongs to the third
point of Johan:

> I sometimes wonder if a future revised version of BS 8723 or ISO 25964
> should include some recommendations to this effect. What do you think?

I also think that this is a dangerous road to go down. I mean, I
certainly think that the effort of representing lexical information is
very useful, and I believe that interesting things can be achieved based
on it. But for us (simpler KOS-oriented efforts like SKOS/ISO/BS) it
would be better to just focus on:
- pointing to initiatives, such as WordNet and [1], which try to
  represent lexical information for NLP tools to work with;
- allowing those initiatives to be plugged onto our KOS-related efforts
  (or vice versa) by providing sufficient extension hooks. That was the
  main rationale for SKOS-XL, in fact.

Trying to cope with all the required details is out of our scope and, I
think, our expertise, even if the ISO/BS committees have bright people
involved ;-)
In fact, finding a core model for lexical information modelling (such as
[1]) is still ongoing work, and there are multiple proposals around,
which shows that it is indeed a complex problem.

Cheers,

Antoine

[1] http://code.google.com/p/lexinfo/

> Dear Christophe,
>
> I am not familiar enough with the MeSH/UMLS schema to comment on your
> SKOS mapping spontaneously.
> So I limit myself to your more general statements:
>
>> * Full Natural Language Processing needs a way to efficiently treat
>> the EXCEPTIONS: intuition suggests that the 80/20 rule is good enough.
>> Reality is much more demanding: "small" linguistic errors are never
>> accepted by humans (when visible: this is why Google does not document
>> them!).
>> So the representation of exceptions must be part of the design of data
>> structures for Natural Language Processing systems.
>> It is their main use (the general 80% rules can even be hard-coded).
>> This is way too complex to be seen as a simple SKOS extension.
>
> I agree, more or less. SKOS is not made to express rules. But you may
> enhance xl:Label instances with certain linguistic data (specific to
> the given language) in order to enable NLP systems to cope with the
> remaining 20%. At least this is what we try in UMTHES.
>
>> * Thesaurus "projection" over a text has been used with success to
>> generate suggestions to human indexers (not for fully automatic
>> indexing).
>
> In practice, we once built a wizard making suggestions to human
> indexers, and after some tests people used it as fully automatic
> indexing.
> This was not because the wizard was perfect; it was because 80% (or
> even 70%) was found to be "good enough". This depends strongly on the
> use case.
>
>> It is very useful, and it is true that having the necessary lexical
>> information in a SKOS extension to achieve this would be nice.
>> It is limited to the detection of nominal groups, but it may have
>> problems with the different grammatical ways to express coordination
>> between elementary concepts in a term.
>> To succeed, this "extension" normalization effort should define
>> properties only for that precise purpose.
>
> Can this be "normalized"? I don't see any normalized NLP methods, so I
> wonder how we can normalize the properties that would support such
> methods. Do you have something in mind?
>
>> In general, a focused "purpose", open to the different applications
>> with that purpose, is the only way to deliver a working standard...
>
> To me, any real-world conceptScheme is an individual to a certain
> extent. SKOS (XL included) covers the common patterns and gives room
> for necessarily individual extensions. Over time, we might discover
> more common patterns even in the individuality of each scheme, but
> some diversity will always remain. I don't think this is a problem.
>
> Referring to the UMTHES extensions, it was not the intention to
> provide a standardisation proposal.
> UMTHES just needs a lossless RDF serialisation making the most of SKOS
> and extending it for our specific demands, and we need all this now.
> But I would be enthusiastic about some future extensions of SKOS
> towards linguistics and NLP support, if they may arise from this
> discussion.
>
> Kind regards,
> Thomas
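[Editor's illustration] The "extension hook" pattern discussed above, where an xl:Label carries extra linguistic data for NLP tools, might look roughly like the following Turtle sketch. The `skos:` and `skosxl:` namespaces are the standard W3C ones; the `lex:` namespace and its properties are purely hypothetical placeholders for whatever lexical vocabulary (LexInfo [1], or UMTHES-specific properties) would actually be plugged in:

```turtle
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ex:     <http://example.org/thesaurus/> .
# Hypothetical lexical-extension namespace, standing in for a real one:
@prefix lex:    <http://example.org/lex#> .

ex:concept1 a skos:Concept ;
    skosxl:prefLabel ex:label1 .

# The label is a resource in its own right, so extension
# properties can be attached to it without touching the concept.
ex:label1 a skosxl:Label ;
    skosxl:literalForm "databases"@en ;
    # Hypothetical linguistic annotations for NLP consumers:
    lex:partOfSpeech "noun" ;
    lex:stem "database" .
```

Because SKOS-XL reifies labels as first-class resources, such annotations stay outside SKOS proper while remaining attached to the label they describe, which is exactly the plug-in arrangement described above.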
Received on Saturday, 24 October 2009 21:46:33 UTC