Re: UMTHES and SKOS-XL

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Sat, 24 Oct 2009 17:18:05 -0400
Message-ID: <4AE36F0D.4070301@few.vu.nl>
To: Thomas Bandholtz <thomas.bandholtz@innoq.com>
CC: SKOS <public-esw-thes@w3.org>

Dear Thomas,

The discussion has gone quite wild, I see :-)
I'll try to come back to the original UMTHES issue, first...

> Speaking in ISO Thesaurus lingo: we do not want inflectional forms etc. to become entry terms.

> Why then do we need all those lexical variants at all?
> At first, UMTHES just has them. It is my job to serialise UMTHES in SKOS, not to change UMTHES.
> Secondly, we need this stuff to support automated indexing of full text documents. Machines need to be enabled to detect the Concepts behind this weird mess of character strings that makes a document (more on this in the ecoterm presentation).


I think everything is here, and you don't need to say much more!
Especially the first sentence, which can be enough to define a practice (or actually to recall one). I now see clearly the point of the example in your slides [2], where the main form xl:Label has dozens of variants in German. Exposing all of those could be counter-productive for most user-oriented applications, sophisticated NLP-based tools excepted.

Please keep the hiddenLabel solution in mind, however. I share your reservations about creating more instances of xl:Label, but if you see even a slight chance that UMTHES could evolve towards something even more lexically intensive, having the instances of xl:Label could spare you a painful model change...
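
To make the hiddenLabel option concrete, here is a minimal sketch in Turtle, reusing the "waste water" strings from your own example. The ":" namespace URI and the @en tags are assumptions for illustration, not taken from the UMTHES data.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <http://example.org/umthes#> .   # hypothetical namespace

# Lexical variants stay out of the pref/alt labels but remain
# available to indexing tools as hidden labels.
:4711 a skos:Concept ;
   skos:prefLabel   "waste water"@en ;
   skos:altLabel    "sewage"@en ;
   skos:hiddenLabel "wastewater"@en , "waste waters"@en , "wastewaters"@en .
```

This keeps everything in basic SKOS, at the price of losing the grouping of variants under one term.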

Picking some elements from your mail at the bottom:

> This usage for acronyms (see also Stella's example above) is just an 
> example, not part of the standard. We considered following this 
> example in the beginning, but then we found "subproperty of 
> lexicalVariant" more convenient. It still conforms, as far as I see.


Yes!


> Why should we introduce such a complex linkage chain here and waste all 
> those resources needed to handle linked class instances instead of 
> simple string properties ?


The overhead is not really huge, in fact. I mean, it adds only a fraction of the triples that you already have in UMTHES; it's not as if it multiplied them by ten.
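
As a rough illustration of that overhead, here is a sketch of the linked-label pattern, with ext:lexicalVariant declared as a sub-property of skosxl:labelRelation. The ext: and ":" namespace URIs, the name :wastewaterVariant and the @en tags are assumptions made up for this example.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/ext#> .      # hypothetical namespace
@prefix :       <http://example.org/umthes#> .   # hypothetical namespace

# String version, for comparison: one triple per variant.
#   :wasteWater ext:lexicalVariant "wastewater"@en .

# Linked version: the same link, plus two triples describing the variant.
ext:lexicalVariant rdfs:subPropertyOf skosxl:labelRelation .

:wasteWater a skosxl:Label ;
   skosxl:literalForm "waste water"@en ;
   ext:lexicalVariant :wastewaterVariant .

:wastewaterVariant a skosxl:Label ;            # not attached to any concept
   skosxl:literalForm "wastewater"@en .
```

So each variant costs three triples instead of one, and only the variants are affected, not the rest of the thesaurus.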

 
> Furthermore, and maybe more importantly, I see a considerable semantic 
> difference between a term (label) and a spelling variant of a term. 
> That's why I do not want to handle them both equally on the model level.


Yes, but since SKOS would not make that distinction (other than treating the variants as hiddenLabel, whereas the others would be pref or alt labels), there would be no strong counter-argument from the SKOS perspective. And from your more practical perspective, you could still create two sub-classes of xl:Label, a bit like what you hint at in your presentation, in fact.
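
A possible shape for those two sub-classes, as a sketch only: the class names ext:Term and ext:SpellingVariant, the name :wastewaterVariant, the namespace URIs and the @en tags are all hypothetical and do not come from UMTHES.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/ext#> .      # hypothetical namespace
@prefix :       <http://example.org/umthes#> .   # hypothetical namespace

# The term / spelling-variant distinction is kept at the model level.
ext:Term            rdfs:subClassOf skosxl:Label .
ext:SpellingVariant rdfs:subClassOf skosxl:Label .

:4711 a skos:Concept ;
   skosxl:prefLabel   :wasteWater ;
   skosxl:hiddenLabel :wastewaterVariant .

:wasteWater a ext:Term ;
   skosxl:literalForm "waste water"@en .

:wastewaterVariant a ext:SpellingVariant ;
   skosxl:literalForm "wastewater"@en .
```

With the XL property chains, a reasoner would derive skos:prefLabel "waste water"@en and skos:hiddenLabel "wastewater"@en for :4711, while applications that care about the difference can still filter on the class.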


>>> I guess we can have 
>>> several ways of handling a relation such as acronymy co-exist.
> I would appreciate this and I am expecting nothing else. There are more 
> patterns which have not been "harmonised" in SKOS, such as 
> narrowerPartitive etc. for good reasons. I don't think this is a 
> problem. Any standard should give room for some diversity at its borders.
>>> But well, having one of the first XL deployments departing from the 
>>> meager guidelines we had put in the Reference would not be a great 
>>> sign for us :-/
> why this? The pattern you recommend is not bad, but its usability 
> depends on the intentions of the thesaurus providers.
> Anyway, I can think about this for acronyms.


You're right, much of that depends on the intention of thesaurus providers. And the pattern we had is certainly not intended as normative. 


>>> 1. Are you planning to add the language tags that seem to be missing 
>>> on some slides (e.g. for the ext:inflection objects) in the real data?
> I can do so, though this is not what we want to express. As each 
> xl:Label has exactly one xl:literalForm, this necessarily has a single 
> language. From this it can be inferred that lexical variants of this 
> literalForm have the same language. This is what we want to express, 
> but I see no way to do this in Turtle or even safely in RDF/XML ...


Yes. The only way to proceed is to simulate that rule by simply putting the tags on all the literals in your data :-/
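
In practice that just means repeating the tag everywhere, as in this sketch (the namespace URIs and the @en tag are assumptions for illustration):

```turtle
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/ext#> .      # hypothetical namespace
@prefix :       <http://example.org/umthes#> .   # hypothetical namespace

# The "same language as the literalForm" rule is simulated by
# tagging every variant literal explicitly.
:wasteWater a skosxl:Label ;
   skosxl:literalForm "waste water"@en ;
   ext:lexicalVariant "wastewater"@en , "wastewaters"@en .
```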

If you want to do it in a neat way, with rules, then you have to represent languages as full-fledged resources, and build axioms using them.
Note that there is some logic, in a way. You cannot expect the syntax to allow you to deal with something that seems very much at the model level, at least to me!
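
One way this could look, purely as a sketch: it assumes the variants are themselves resources (a literal cannot be the subject of a triple), and ext:Language, ext:inLanguage, ext:en and :wastewaterVariant are hypothetical names, not part of SKOS-XL or UMTHES.

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/ext#> .      # hypothetical namespace
@prefix :       <http://example.org/umthes#> .   # hypothetical namespace

ext:en a ext:Language .                          # language as a resource

ext:lexicalVariant a owl:ObjectProperty .

# If L has variant V, and L is in language X, then V is in language X.
ext:inLanguage a owl:ObjectProperty ;
   owl:propertyChainAxiom ( [ owl:inverseOf ext:lexicalVariant ] ext:inLanguage ) .

:wasteWater a skosxl:Label ;
   skosxl:literalForm "waste water" ;
   ext:inLanguage ext:en ;
   ext:lexicalVariant :wastewaterVariant .

:wastewaterVariant a skosxl:Label ;
   skosxl:literalForm "wastewater" .

# An OWL 2 reasoner could then infer:
#   :wastewaterVariant ext:inLanguage ext:en .
```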

Cheers,

Antoine


> Dear Stella & Antoine,
> 
> Antoine has raised the essential issue, Stella came up with a related 
> use case which can be solved using the UMTHES patterns.
> UMTHES distinguishes not only prefLabel from altLabel, but also both 
> from multiple spelling conventions of any label.
> We see abbreviations/acronyms as part of such spelling conventions, 
> others are inflectional forms of the same term, or even common 
> misspellings. If we mix this all together into altLabel instances, it 
> would not make sense any more.
> 
> Stella's example about abbrev is similar, but we separate spelling 
> conventions ("lexical variants") from labels regardless whether they may 
> be pref or alt.
> Example:
> 
> :4711 rdf:type skos:Concept;
>    skos:prefLabel "waste water";
>    skos:altLabel "sewage".
> 
> makes sense, but
> 
> #not recommended:
> :4711 rdf:type skos:Concept;
>    skos:prefLabel "waste water";
>    skos:prefLabel "waste waters";
>    skos:prefLabel "wastewater";
>    skos:prefLabel "wastewaters";
>    skos:altLabel "sewage".
> 
> looks at least somehow "unbalanced".
> 
> UMTHES knows even more about lexical complexity (a really awful issue 
> in German), that is why we decided to use xl:Label extensions to 
> separate such complexity from the more prominent list of labels which 
> are directly assigned to a skos:Concept:
> 
> # hiding lexical complexity from the list of labels
> :wasteWater rdf:type skosxl:Label;
>    skosxl:literalForm "waste water";
>    ext:lexicalVariant "wastewater";
>    ext:lexicalVariant "wastewaters";
>    ext:compoundFrom (:waste :water).
> 
> Speaking in ISO Thesaurus lingo: we do not want inflectional forms etc. 
> to become entry terms.
> (see 
> http://www.w3.org/2004/02/skos/core/proposals.html#thesaurusRepresentation-11 
> ...)
> 
> This is also why we really do not want to have a property chain from a 
> ext:lexicalVariant to a skos:Concept.
> We appreciate the property chain from the skosxl:literalForm to the 
> skos:Concept.
> 
> Why then do we need all those lexical variants at all?
> At first, UMTHES just has them. It is my job to serialise UMTHES in 
> SKOS, not to change UMTHES.
> Secondly, we need this stuff to support automated indexing of full text 
> documents. Machines need to be enabled to detect the Concepts behind this 
> weird mess of character strings that makes a document (more on this in 
> the ecoterm presentation).
> 
> See some more notes inline below.
> 
> Stella Dextre Clarke wrote:
>> Ah yes. We discovered a similar problem during work on BS 8723. It was 
>> about whether to introduce a specialisation of USE/UF to cater for 
>> abbreviations/acronyms and their expansions, for which you might use 
>> tags such as AB/FT. A problem arises when the abbreviation is short 
>> for another non-preferred term rather than the preferred term.
>> (For example, the preferred term "Information and communication 
>> technology" can have non-preferred terms "Information technology", 
>> "IT" and "ICT")
>> It becomes apparent that the proposed specialisation is not really a 
>> type of USE/UF. It is an inter-term relationship that can sometimes 
>> apply between non-preferred terms. Obviously it is possible to find a 
>> way of representing this accurately, but at the expense of making the 
>> whole model more complicated and the tagging conventions more cumbersome.
>>
>> My personal view on this is that if you try to add more value in the 
>> shape of lexical/terminological information, you lose the virtue of 
>> simplicity. To put it another way, if you have mixed objectives 
>> (trying to achieve  terminological objectives as well as enabling 
>> information retrieval) these tend to detract from each other.
> right. If someone only wants the pure thesaurus, she might get along 
> with the skos: part of UMTHES only and simply ignore the skosxl:+extensions.
> Kudos to the property chain which Antoine has mentioned: each 
> skosxl:literalForm is equivalent to a directly assigned skos:pref/altLabel.
> So, nothing would be missing.
> 
>>
>> Cheers
>> Stella
>>
>> *****************************************************
>> Stella Dextre Clarke
>> Information Consultant
>> Luke House, West Hendred, Wantage, OX12 8RR, UK
>> Tel: 01235-833-298
>> Fax: 01235-863-298
>> stella@lukehouse.org
>> *****************************************************
>>
>>
>> Antoine Isaac wrote:
>>> Hi everyone,
>>>
>>> I'm putting here a discussion we started with Thomas Bandholtz on 
>>> UMTHES [1] on the use of SKOS-XL there (see slides at [2]). A long 
>>> mail, but it can be interesting for a wider audience, as UMTHES is 
>>> one of the first SKOS-XL implementations!
>>>
>>> ===
>>>
>>> Dear Thomas,
>>>
>>> So let's go. The main issue I have is that xl:Label is used in a very 
>>> "term-oriented" way in UMTHES.
>>> More precisely, I feel that you are using labels to aggregate lexical 
>>> entities which indeed belong to the same "term". But 
>>> these literals would be introduced as labels in basic SKOS, I think. Trying 
>>> to use a concrete example from your slides:
>>>
>>> :4711 rdf:type skos:Concept;
>>>    skosxl:prefLabel :wasteWater.
>>>
>>> :wasteWater rdf:type skosxl:Label;
>>>    skosxl:literalForm "waste water";
>>>    ext:lexicalVariant "wastewater";
>>>    ext:compoundFrom (:waste :water).
>>>
>>> "wastewater" is introduced as a lexical variant of "waste water". Per 
>>> se, this is of course ok.
>>> But in basic SKOS, I would have modelled that "wastewater" as a 
>>> skos:altLabel or a skos:hiddenLabel of :4711. Not attaching that 
>>> string to an instance of xl:Label using xl:literalForm prevents you 
>>> from benefiting from the useful property chains given in XL. So I 
>>> would have represented "wastewater" as an instance of xl:Label.
>>>
>>> Of course, you may object that you can declare yourself a property 
>>> chain (or property chains) that would allow to infer that the 
>>> literals that are objects of ext:lexicalVariant triples (or the ones 
>>> involving sub-properties of ext:lexicalVariant) are also objects of 
>>> skos:hiddenLabel (or skos:altLabel) statements attached to the 
>>> skos:Concept to which your xl:Label is attached.
> as said, above: we do not want such property chains.
> Anyway, hiddenLabel might also hide the lexical complexity, this might 
> be an idea.
> But I don't like the idea of creating thousands of xl:Label class 
> instances when each of them only carries "exactly one" xl:literalForm 
> and I do not really need class instances for anything else.

>>> But then I'd still be uncomfortable with an xl:Label giving rise to 
>>> several (SKOS-basic) labels.
>>> Additionally, we actually introduced xl:labelRelation to handle cases 
>>> like acronyms [1]. In your approach, acronym is a subproperty of 
>>> lexicalVariant, which is clearly a different pattern from ours.
> This usage for acronyms (see also Stella's example above) is just an 
> example, not part of the standard. We considered following this 
> example in the beginning, but then we found "subproperty of 
> lexicalVariant" more convenient. It still conforms, as far as I see.
>>>
>>> As I feel it, your choice may be perfectly grounded in terminology. 
>>> Still, I'd be curious to hear whether this is a strong position of 
>>> yours, or if you could accommodate a different pattern.
>>>
>>> Maybe there can indeed be a solution accommodating both points of view 
>>> (if I interpreted one correctly, of course). Namely, introducing 
>>> "wastewater" as the literalForm of an xl:Label which is not connected 
>>> to any concept; just connected (by an ext:lexicalVariant which would 
>>> be then a sub-property of xl:labelRelation) to :wasteWater.
> Why should we introduce such a complex linkage chain here and waste all 
> those resources needed to handle linked class instances instead of 
> simple string properties ?
> 
> Furthermore, and maybe more importantly, I see a considerable semantic 
> difference between a term (label) and a spelling variant of a term. 
> That's why I do not want to handle them both equally on the model level.
> 
>>>
>>> Of course you can say then that the distinction between "waste water" 
>>> and "wastewater" is something very important for your UMTHES and the 
>>> applications you envision with it, and that "wastewater" should never 
>>> be used as a basic concept label, even a hidden one. Or not even 
>>> interpreted as something that could be a label...
> see above
>>>
>>> You can also argue that the xl:Label story is quite thin in the SKOS 
>>> Reference anyway, and that you can use that class as a purely 
>>> technical hook for any purpose. That's indeed not far from being the 
>>> truth, and if all are rightfully motivated, well, I guess we can have 
>>> several ways of handling a relation such as acronymy co-exist.
> I would appreciate this and I am expecting nothing else. There are more 
> patterns which have not been "harmonised" in SKOS, such as 
> narrowerPartitive etc. for good reasons. I don't think this is a 
> problem. Any standard should give room for some diversity at its borders.
> 
>>> But well, having one of the first XL deployments departing from the 
>>> meager guidelines we had put in the Reference would not be a great 
>>> sign for us :-/
> why this? The pattern you recommend is not bad, but its usability 
> depends on the intentions of the thesaurus providers.
> Anyway, I can think about this for acronyms.
>>>
>>>
>>> Apart from this issue of ext:lexicalVariant and its sub-properties, I 
>>> found the rest really good, confirming my first enthusiastic reaction 
>>> after your talk :-)
>>>
>>> Two comments/questions, maybe:
>>>
>>> 1. Are you planning to add the language tags that seem to be missing 
>>> on some slides (e.g. for the ext:inflection objects) in the real data?
> I can do so, though this is not what we want to express. As each 
> xl:Label has exactly one xl:literalForm, this necessarily has a single 
> language. From this it can be inferred that lexical variants of this 
> literalForm have the same language. This is what we want to express, 
> but I see no way to do this in Turtle or even safely in RDF/XML ...
>>>
>>> 2. Intuitively, I feel that the definition of :NonPreferredTerm (on 
>>> slide 33) is too strong. I would have said that everything that is 
>>> related via xl:altLabel to a concept cannot be a PreferredTerm. 
>>> Otherwise there would be a conflict with the inferred basic SKOS 
>>> labelling triples [2]. So the complementOf axiom would not be really 
>>> needed. 
> You may be right, I'll think this over, but now I have to go out for 
> dinner first :-)
> 
> Many thanks for your rich comments, Antoine!
> 
> 
> Best regards, Thomas
>>> But again, it's late, and I prefer to send this mail rather than 
>>> letting you wait more time for my answer...
>>> Cheers,
>>>
>>> Antoine
>>>
>>> [1] http://www.w3.org/2006/07/SWD/track/issues/215
>>> [2] 
>>> http://eea.eionet.europa.eu/Public/irc/envirowindows/jad/library?l=/ecoinformatics_indicator/ecoterm_5-6102009/ecoterm09-bandholtzppt/_EN_1.0_&a=d 
>>>
>>>
>>>
>>>
>>
> 
> 
> -- 
> Thomas Bandholtz, thomas.bandholtz@innoq.com, http://www.innoq.com 
> innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
> Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
> 
Received on Saturday, 24 October 2009 21:18:34 GMT
