- From: Thomas Bandholtz <thomas.bandholtz@innoq.com>
- Date: Thu, 22 Oct 2009 22:40:05 +0200
- To: Johan De Smedt <Johan.De-smedt@tenforce.com>
- CC: Stella Dextre Clarke <stella@lukehouse.org>, Antoine Isaac <aisaac@few.vu.nl>, SKOS <public-esw-thes@w3.org>
Hi Johan,

> Suggestion: There are three levels of organization.
> - Concepts (SKOS talk)
> - Labels
> - Text processing

Good idea! I would add: labels are covered by skosxl; text processing is not yet really covered by skos(xl), but can be supported by extending skosxl locally.

> A significant part of the issues discussed related to what is on the label management level
> and what is on the text processing level (thus needing a proper definition)
>
> Language specific text processing and analysis (including inflection)
> seems to me a specialized area which global resources (language dictionaries)
> like word-net can solve.

http://wordnet.princeton.edu/wordnet/ starts with this sentence: "WordNet® is a large lexical database of English". Right. We have more than 20 languages in the European GEMET. Believe me, when it comes to language specific text processing, English is the simplest language.

> Stemming also is in this area.
> It seems to me costly if this would be managed in every thesaurus.

It is costly, sure, but as I have expressed before, UMTHES has already invested in this. The question now is how to express the results in a skosxl extension, not whether UMTHES should forget all the results of this investment. You are right on one point: in general, a thesaurus need not care about this. It is not a general requirement. But language specific text processing needs to be solved on a language specific level by someone, somehow.

> Label management can focus on standard terms and term decomposition as relevant within a
> thesaurus or taxonomy. (equivalence relation, compound equivalence, acronym,
> short-name, qualifiers ...)

Right so far. What we try to handle is this: each such term (= label) has multiple spelling conventions, and a spelling variant does not make a different term on the same level. Maybe this is specific to some languages only and not such an issue in English.
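As a rough illustration of the three levels discussed above, here is a minimal Python sketch. It is not the UMTHES format and not part of SKOS or SKOS-XL: the class names, the `spelling_variants` field, the `find_label` helper, and the example data are all assumptions made up for illustration. It only shows the idea that a spelling variant hangs off an existing label instead of becoming a new term.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Label:
    """Label level: one term, in the spirit of a skosxl:Label."""
    literal_form: str
    lang: str
    # Hypothetical local extension: other spellings of the SAME term.
    spelling_variants: list[str] = field(default_factory=list)

    def surface_forms(self) -> set[str]:
        """All spellings a text processing engine might meet in text."""
        return {self.literal_form, *self.spelling_variants}


@dataclass
class Concept:
    """Concept level: plain SKOS, identified by a URI."""
    uri: str
    pref_labels: list[Label] = field(default_factory=list)
    alt_labels: list[Label] = field(default_factory=list)

    def labels(self) -> list[Label]:
        return self.pref_labels + self.alt_labels


def find_label(concept: Concept, token: str, lang: str) -> Optional[Label]:
    """Text processing level, greatly simplified: map a token to a label."""
    needle = token.casefold()
    for label in concept.labels():
        forms = {form.casefold() for form in label.surface_forms()}
        if label.lang == lang and needle in forms:
            return label
    return None


# Invented example data, not taken from UMTHES or GEMET.
photo = Label("Fotografie", "de", spelling_variants=["Photographie"])
concept = Concept("http://example.org/concept/1", pref_labels=[photo])
```

With this data, `find_label(concept, "photographie", "de")` resolves to the same `Label` object as `"Fotografie"`, so the concept still has exactly one German term. Real inflection and stemming are deliberately left out; they would live in the text processing engine.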
> Indexing and search engines combining thesaurus and text processing should be able to use the label
> management layer (of the thesaurus) to configure the relevant text processing.

I think this needs a third, dedicated layer.

> Concept and label processing surely belong to the thesaurus/taxonomy/... management.
> Text processing, I would suggest, is in the text processing engines.

Right, but text processing engines need some structure to express the diversity of term (label) occurrence in natural language.

> PS:
> - thanks for the UMTHES presentation - very instructive.

Thanks for the flowers, I tried hard to provide some valuable contribution. As always, one has to surrender at some point of complexity (just to be on time for the meeting) and leave the rest to the next presentation, ...

> - would it be an idea to build on further SKOS extensions to have common schema for
> artefacts like equivalence relation and compound equivalence; or for specializing
> some xl:labelRelation ?

I think we should collect more examples and patterns, and we should not try to harmonise this too strictly. What we tried to implement in UMTHES: separate a pure SKOS CORE representation, which everybody can handle, from a somewhat more experimental (admittedly) extension which goes beyond established skos(xl) patterns. But for UMTHES we need it now (!) as an exchange format in a real production scenario, so we cannot wait.

Thanks Johan for your comments, really helpful to think this over more thoroughly!

--
Thomas Bandholtz, thomas.bandholtz@innoq.com, http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
Received on Thursday, 22 October 2009 20:40:41 UTC