
RE: Modeling acronym and abbreviation Labels scenario

From: Bradley Shoebottom <bradley.shoebottom@Innovatia.net>
Date: Mon, 25 Feb 2013 13:29:51 +0000
To: Stella Dextre Clarke <stella@lukehouse.org>, Antoine Isaac <aisaac@few.vu.nl>
CC: "public-esw-thes@w3.org" <public-esw-thes@w3.org>
Message-ID: <1B8EDAD4532ABF41A819B3E5845062DB33106EDD@MBX245.domain.local>

I can't use a generic altLabel for my acronym because I want to also use this controlled vocabulary to generate actual definitions with the format:

Preferred Label (acronym/Abbreviation)
Synonyms/Alternate Labels
Definition 1, Broader/Narrower or associated concept
Definition 2, Broader/Narrower or associated concept
Definition n.....

And then in a generated HTML glossary, I would get separate acronym, abbreviation, and synonym entries, each with a reference back to the original term.
Some acronyms will potentially have several preferred terms, so I will eventually do as Wikipedia does and list the contexts in which the acronym appears, so that users can go to the correct preferred term.
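
As a rough sketch of that glossary generation (Python, plain-text output rather than HTML): the `Concept` class, its field names, and the "see ..." wording are assumptions made up for illustration, not part of SKOS or of any existing tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One canonical entry holding every label type (hypothetical structure)."""
    pref_label: str
    acronyms: list = field(default_factory=list)
    abbreviations: list = field(default_factory=list)
    synonyms: list = field(default_factory=list)
    context: str = ""  # short hint shown when an acronym is ambiguous


def cross_references(concepts):
    """Map each non-preferred label to the preferred terms it points back to."""
    index = defaultdict(list)
    for c in concepts:
        for kind, labels in (("acronym", c.acronyms),
                             ("abbreviation", c.abbreviations),
                             ("synonym", c.synonyms)):
            for label in labels:
                index[label].append((kind, c.pref_label, c.context))
    return index


def render(index):
    """Render one glossary line per (label, preferred term) pair."""
    lines = []
    for label in sorted(index):
        for kind, pref, context in index[label]:
            hint = f" [{context}]" if context else ""
            lines.append(f"{label} ({kind}): see {pref}{hint}")
    return lines


concepts = [
    Concept("Triple Data Encryption Algorithm",
            acronyms=["3DES"], abbreviations=["Triple DES"],
            context="cryptography"),
]
glossary_lines = render(cross_references(concepts))
```

Because the index maps a label to a list, an acronym shared by several preferred terms simply yields several lines, each carrying its own context hint.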

Further, our text mining pipeline has algorithms that are "trained" to link acronyms occurring in certain contexts to specific concepts based on the concepts occurring in the sentence/paragraph.
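
To illustrate the idea only: the sketch below replaces the trained algorithms with a plain word-overlap count, which is not what the pipeline does. The `CANDIDATES` table, the context words, and the `disambiguate` function are all hypothetical.

```python
import re

# Hypothetical table: acronym -> {preferred term: concepts that tend to co-occur}.
CANDIDATES = {
    "ATM": {
        "Asynchronous Transfer Mode": {"cell", "switch", "network"},
        "Automated Teller Machine": {"bank", "cash", "card"},
    },
}


def disambiguate(acronym, sentence):
    """Return the preferred term whose context words overlap the sentence most.

    Returns None when the acronym is unknown or no context word matches.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    best_term, best_score = None, 0
    for term, context in CANDIDATES.get(acronym, {}).items():
        score = len(words & context)
        if score > best_score:
            best_term, best_score = term, score
    return best_term
```

For example, `disambiguate("ATM", "The ATM switch forwards each cell.")` picks the telecommunications sense, because "switch" and "cell" both appear in that candidate's context set.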

Bradley Shoebottom
Senior Information Architect - Research and Product Development
Phone: (506) 674-5439   |   Toll-Free: (800) 363-3358
Skype: bradley.shoebottom
Email: bradley.shoebottom@innovatia.net 


-----Original Message-----
From: Stella Dextre Clarke [mailto:stella@lukehouse.org] 
Sent: Sunday, February 24, 2013 2:30 PM
To: Antoine Isaac
Cc: public-esw-thes@w3.org
Subject: Re: Modeling acronym and abbreviation Labels scenario

Dear Bradley/Antoine,
I suspect that Antoine intended to write:
<skos:Concept rdf:about="#:tripleDataEncryptionAlgorithm">
   <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
   <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
   <skos:altLabel xml:lang="en">Triple DES</skos:altLabel>
</skos:Concept>

Apart from the small correction above, I wonder why "3DES" would be handled as a notation. Bradley's message does not mention needing a notation. A notation has a different function from either abbreviations or acronyms.
Why not treat all the respectable alternatives as altLabel, thus:

<skos:Concept rdf:about="#:tripleDataEncryptionAlgorithm">
   <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
   <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
   <skos:altLabel xml:lang="en">Triple DES</skos:altLabel>
   <skos:altLabel xml:lang="en">3DES</skos:altLabel>
</skos:Concept>

If you do this, you fail to declare whether 3DES is an abbreviation, an acronym, a synonym, a near-synonym, a common name, a scientific name, etc., but in most applications it is unnecessary to specify what kind of non-preferred term it is. Just ignore this suggestion if you DO need to pick out those non-preferred terms that are acronyms or abbreviations (or if you cannot accept more than one ordinary non-preferred term).
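
A minimal sketch of what an application sees with this modelling, using only Python's standard library. The `rdf:RDF` wrapper and namespace declarations are added here so the fragment parses, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The concept from this message, wrapped so that it is well-formed RDF/XML.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="#:tripleDataEncryptionAlgorithm">
    <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
    <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
    <skos:altLabel xml:lang="en">Triple DES</skos:altLabel>
    <skos:altLabel xml:lang="en">3DES</skos:altLabel>
  </skos:Concept>
</rdf:RDF>"""


def labels_by_property(rdf_xml):
    """Group the label text of every skos:Concept by SKOS property name."""
    root = ET.fromstring(rdf_xml)
    grouped = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        for child in concept:
            # Tags look like "{namespace}localName"; split the two parts.
            namespace, _, local = child.tag[1:].partition("}")
            if namespace == SKOS and local.endswith("Label"):
                grouped.setdefault(local, []).append(child.text)
    return grouped


labels = labels_by_property(DOC)
```

Both "Triple DES" and "3DES" end up under `altLabel`, with nothing to tell the abbreviation from the acronym, which is exactly the trade-off described above.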

Stella Dextre Clarke
Information Consultant and Chair, ISKO UK
Luke House, West Hendred, Wantage, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298

On 24/02/2013 16:27, Antoine Isaac wrote:
> Dear Bradley,
> First sorry for the time it took...
> I'm actually not sure I understand the question. Are you looking for
> something more complex than
> <skos:Concept rdf:about"#:tripleDataEncryptionAlgorithm">
>   <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
>   <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
>   <skos:altLabel xml:lang="en">Triple DES</skos:prefLabel>
>   <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">3DES</skos:notation>
> ?
> You can indeed create sub-properties of skos:prefLabel, skos:altLabel,
> skos:hiddenLabel and skos:notation for representing the exact "flavor"
> of your acronyms and abbreviations, but I'm not sure this is what you
> really need for simply text mining the occurrences of concepts in
> documents.
> MADS/RDF offers finer granularity. But similarly, I'm not sure you need it...
> Best,
> Antoine
>> Dear mailing list,
>> I am trying to build a controlled vocabulary schema to be able to 
>> model something like RFC 4949 http://tools.ietf.org/html/rfc4949
>> This controlled vocabulary has "separate" entries for the acronym,
>> abbreviation, each slang/synonym, and canonical term. There are also
>> deprecatedLabel entries.
>> I do not want separate entries for each acronym/abbreviation as the
>> MADS/RDF object properties hasAcronymVariant and
>> hasAbbreviationVariant suggest. Instead I want everything in one
>> canonical entry (reasons outlined in the Use Case Scenario below).
>> For example, in RFC 4949, page 9:
>> prefLabel: Triple Data Encryption Algorithm
>> hiddenLabel: Triple DEZ [I made up this slang]
>> How would you model these 2 alternatives to the canonical label in
>> MADS/RDF?
>> acronym: 3DES
>> abbreviation: Triple DES
>> Use Case Scenario
>> We want to build a master controlled vocabulary by text mining many 
>> glossaries such as RFC 4949. So we have to be able to process these 
>> varying labels and cross references.
>> One approach is to model RFC 4949 using MADS/RDF as the specification
>> suggests, and then use some sort of inferencing/query to get the
>> acronyms/abbreviations to "appear" as part of the canonical term
>> using object properties. This leads to more term entries but makes it
>> easy to text mine. This complicates the XSLT transformation to .txt
>> for further text mining.
>> An alternate approach is to make one canonical entry for all label
>> types, for the text mining reason listed next, which would simplify
>> the XSLT transformation from OWL to .txt
>> We curate the multiple glossary inputs to ensure there is only one
>> canonical idea presented ontologically/conceptually by an SME (either
>> manual curation to ensure syntactically different labels for the same
>> term are matched, or a SPARQL query to isolate duplicates, or both
>> techniques).
>> Then we export the master term list as a .txt with preferred label, 
>> acronyms, symbols (QUDT ontology), abbreviations, and synonyms 
>> (altLabel). This acts as an input again for GATE so that we can text 
>> mine the true corpus that describes a product to build the knowledge 
>> base for that product.
>> Right now our glossary has over 20,000 telecommunications terms (many 
>> complex and simple labels). So the design is important so we do not 
>> have a big job correcting populated design errors.
>> Of course I can just model owl:acronym and owl:abbreviation under the
>> appropriate imported SKOS, SKOS-XL, and MADS/RDF data properties, but
>> I would like to remain as close as possible to customary modeling.
>> Any thoughts?
>> Bradley Shoebottom
>> Senior Information Architect - Research and Product Development
>> Phone: (506) 674-5439 | Toll-Free: (800) 363-3358
>> Skype: bradley.shoebottom
>> Email: bradley.shoebottom@innovatia.net

Received on Monday, 25 February 2013 13:30:25 UTC
