Re: Modeling acronym and abbreviation Labels scenario from Stella Dextre Clarke on 2013-02-24 (public-esw-thes@w3.org from February 2013)

From: Stella Dextre Clarke <stella@lukehouse.org>
Date: Sun, 24 Feb 2013 18:30:28 +0000
To: Antoine Isaac <aisaac@few.vu.nl>
CC: public-esw-thes@w3.org
Message-ID: <512A5C44.1000903@lukehouse.org>
Dear Bradley/Antoine,
I suspect that Antoine intended to write:
<skos:Concept rdf:about="#:tripleDataEncryptionAlgorithm">
   <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
   <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
   <skos:altLabel xml:lang="en">Triple DES</skos:altLabel>
   <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">3DES</skos:notation>
</skos:Concept>

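Apart from the small correction above (the altLabel element now closes with
</skos:altLabel> rather than </skos:prefLabel>), I wonder why "3DES" would
be handled as a notation. Bradley's message does not mention needing a
notation. A notation has a different function from either abbreviations
or acronyms: it is a code, such as a classification number, that identifies
the concept within a concept scheme, not a natural-language label.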
Why not treat all the respectable alternatives as altLabels, thus:

<skos:Concept rdf:about="#:tripleDataEncryptionAlgorithm">
   <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
   <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
   <skos:altLabel xml:lang="en">Triple DES</skos:altLabel>
   <skos:altLabel xml:lang="en">3DES</skos:altLabel>
</skos:Concept>

If you do this, you fail to declare whether 3DES is an abbreviation, an
acronym, a synonym, a near-synonym, a common name, a scientific name,
etc., but in most applications it is unnecessary to specify what kind of
non-preferred term it is. Just ignore this suggestion if you DO need to
pick out those non-preferred terms that are acronyms or abbreviations
(or if you cannot accept more than one ordinary non-preferred term).
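If you do need the distinction, the sub-property route Antoine suggests costs little at export time. This Python sketch is a minimal illustration under stated assumptions: ex:acronymLabel and ex:abbreviationLabel are hypothetical properties in a made-up namespace (http://example.org/labels#), declared rdfs:subPropertyOf skos:altLabel, and the output layout (one tab-separated line per concept, preferred label first) is an assumption, not a GATE input format. It uses only the standard library's ElementTree, so it understands just this simple RDF/XML layout (hash namespaces, one level of sub-properties), not RDF/XML in general; a production pipeline would use a proper RDF parser.

```python
import xml.etree.ElementTree as ET

# Real W3C namespaces for RDF, RDFS and SKOS.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical namespace for locally defined label sub-properties.
EX = "http://example.org/labels#"

# The altLabel example above, with the two alternatives typed by
# hypothetical sub-properties of skos:altLabel.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}"
         xmlns:skos="{SKOS}" xmlns:ex="{EX}">
  <rdf:Description rdf:about="{EX}acronymLabel">
    <rdfs:subPropertyOf rdf:resource="{SKOS}altLabel"/>
  </rdf:Description>
  <rdf:Description rdf:about="{EX}abbreviationLabel">
    <rdfs:subPropertyOf rdf:resource="{SKOS}altLabel"/>
  </rdf:Description>
  <skos:Concept rdf:about="#:tripleDataEncryptionAlgorithm">
    <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
    <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
    <ex:abbreviationLabel xml:lang="en">Triple DES</ex:abbreviationLabel>
    <ex:acronymLabel xml:lang="en">3DES</ex:acronymLabel>
  </skos:Concept>
</rdf:RDF>"""


def clark(iri):
    """Turn a hash-namespace IRI into ElementTree's '{ns}local' tag form."""
    ns, _, local = iri.rpartition("#")
    return f"{{{ns}#}}{local}"


RDF_ABOUT = clark(RDF + "about")
RDF_RESOURCE = clark(RDF + "resource")
PREF = clark(SKOS + "prefLabel")


def label_kinds(root):
    """Map each label element tag to the SKOS label property it counts as.

    Only one level of rdfs:subPropertyOf is followed, which is enough
    for this sketch."""
    kinds = {clark(SKOS + p): clark(SKOS + p)
             for p in ("prefLabel", "altLabel", "hiddenLabel")}
    for desc in root.findall(clark(RDF + "Description")):
        prop = desc.get(RDF_ABOUT)
        for sup in desc.findall(clark(RDFS + "subPropertyOf")):
            parent = clark(sup.get(RDF_RESOURCE, ""))
            if prop and parent in kinds:
                kinds[clark(prop)] = kinds[parent]
    return kinds


def export_lines(rdf_xml):
    """Flatten each skos:Concept to one tab-separated line: preferred
    label(s) first, then alternative and hidden labels in document order.
    Elements that are not labels (e.g. skos:notation) are skipped."""
    root = ET.fromstring(rdf_xml)
    kinds = label_kinds(root)
    lines = []
    for concept in root.findall(clark(SKOS + "Concept")):
        preferred, others = [], []
        for child in concept:
            kind = kinds.get(child.tag)
            if kind is None:
                continue
            text = (child.text or "").strip()
            (preferred if kind == PREF else others).append(text)
        lines.append("\t".join(preferred + others))
    return lines


if __name__ == "__main__":
    for line in export_lines(SAMPLE):
        print(line)
```

Because both sub-properties specialise skos:altLabel, an application that does not care about the distinction can fold them into ordinary alternatives, as the flattened line does, while one that does care can still select on ex:acronymLabel.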
Stella

*****************************************************
Stella Dextre Clarke
Information Consultant and Chair, ISKO UK
Luke House, West Hendred, Wantage, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
stella@lukehouse.org
*****************************************************




On 24/02/2013 16:27, Antoine Isaac wrote:
> Dear Bradley,
>
> First, sorry for the time it took...
>
> I'm actually not sure I understand the question. Are you looking
> for something more complex than
>
> <skos:Concept rdf:about"#:tripleDataEncryptionAlgorithm">
>   <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
>   <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
>   <skos:altLabel xml:lang="en">Triple DES</skos:prefLabel>
>   <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">3DES</skos:notation>
>
> ?
> You can indeed create sub-properties of skos:prefLabel, skos:altLabel,
> skos:hiddenLabel and skos:notation to represent the exact "flavor"
> of your acronyms and abbreviations, but I'm not sure this is what you
> really need for simply text-mining the occurrences of concepts in
> documents.
>
> MADS/RDF offers finer grain. But similarly, I'm not sure you need it...
>
> Best,
>
> Antoine
>
>
>> Dear mailing list,
>>
>> I am trying to build a controlled vocabulary schema that can
>> model something like RFC 4949: http://tools.ietf.org/html/rfc4949
>>
>> This controlled vocabulary has “separate” entries for the acronym,
>> the abbreviation, each slang term/synonym, and the canonical term.
>> There are also deprecated labels (deprecatedLabel).
>>
>> I do not want separate entries for each acronym/abbreviation, as the
>> MADS/RDF object properties hasAcronymVariant and
>> hasAbbreviationVariant suggest. Instead I want everything in one
>> canonical entry (reasons outlined in the Use Case Scenario below).
>>
>> For example, in RFC 4949, page 9:
>>
>> prefLabel: Triple Data Encryption Algorithm
>>
>> hiddenLabel: Triple DEZ [I made up this slang]
>>
>> How would you model these two alternatives to the canonical label in
>> MADS/RDF?
>>
>> acronym:3DES
>>
>> abbreviation: Triple DES
>>
>> Use Case Scenario
>>
>> We want to build a master controlled vocabulary by text mining many 
>> glossaries such as RFC 4949. So we have to be able to process these 
>> varying labels and cross references.
>>
>> One approach is to model RFC 4949 using MADS/RDF as the specification
>> suggests, and then use some sort of inferencing/query to make the
>> acronyms/abbreviations “appear” as part of the canonical term
>> via object properties. This leads to more term entries but makes it
>> easy to text mine. However, it complicates the XSLT transformation to
>> .txt for further text mining.
>>
>> An alternative approach is to make one canonical entry for all label
>> types, for the text-mining reason given next, which would simplify the
>> XSLT transformation from OWL to .txt.
>>
>> We curate the multiple glossary inputs to ensure there is only one
>> canonical idea presented ontologically/conceptually by an SME (either
>> manual curation to ensure syntactically different labels for the same
>> term are matched, a SPARQL query to isolate duplicates, or both
>> techniques).
>>
>> Then we export the master term list as a .txt with preferred label, 
>> acronyms, symbols (QUDT ontology), abbreviations, and synonyms 
>> (altLabel). This acts as an input again for GATE so that we can text 
>> mine the true corpus that describes a product to build the knowledge 
>> base for that product.
>>
>> Right now our glossary has over 20,000 telecommunications terms (with
>> many complex and simple labels), so the design matters: we want to
>> avoid a big job correcting design errors once the vocabulary is populated.
>>
>> Of course I could just model owl:acronym and owl:abbreviation under the
>> appropriate imported SKOS, SKOS-XL, and MADS/RDF data properties, but
>> I would like to stay as close as possible to customary modeling.
>>
>> Any thoughts?
>>
>> Bradley Shoebottom
>> Senior Information Architect – Research and Product Development
>> Phone: (506) 674-5439 | Toll-Free: (800) 363-3358
>> Skype: bradley.shoebottom
>>
>> Email: bradley.shoebottom@innovatia.net
>>
>
>


--
Received on Sunday, 24 February 2013 18:31:02 UTC