Re: Modeling acronym and abbreviation Labels scenario

By the way, is your case similar to concepts such as
http://id.loc.gov/vocabulary/cryptographicHashFunctions/sha-256
?

If yes, then it would be interesting to look at the patterns represented there, both in "pure" MADS/RDF
http://id.loc.gov/vocabulary/cryptographicHashFunctions/sha-256.madsrdf.rdf
and in SKOS
http://id.loc.gov/vocabulary/cryptographicHashFunctions/sha-256.skos.rdf
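
To compare such patterns side by side, it can help to list which label properties a SKOS file actually uses for each concept. Below is a minimal sketch using only Python's standard library. The SAMPLE document is the Triple DES example from earlier in this thread (with its markup corrected), not the actual id.loc.gov data, and the function name labels_by_concept is made up for illustration. It only handles concepts written as skos:Concept elements; a real RDF parser would be more robust against other RDF/XML layouts.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumption: illustrative sample taken from this thread, not id.loc.gov data.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="#:tripleDataEncryptionAlgorithm">
    <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
    <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
    <skos:altLabel xml:lang="en">Triple DES</skos:altLabel>
    <skos:notation>3DES</skos:notation>
  </skos:Concept>
</rdf:RDF>"""

# SKOS properties treated as "labels" for this comparison.
LABEL_PROPERTIES = ("prefLabel", "altLabel", "hiddenLabel", "notation")


def labels_by_concept(rdf_xml):
    """Map each concept's rdf:about value to {property name: [literal values]}."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        about = concept.get(f"{{{RDF}}}about")
        labels = defaultdict(list)
        for prop in LABEL_PROPERTIES:
            for element in concept.findall(f"{{{SKOS}}}{prop}"):
                labels[prop].append((element.text or "").strip())
        result[about] = dict(labels)
    return result


if __name__ == "__main__":
    for about, labels in labels_by_concept(SAMPLE).items():
        print(about)
        for prop, values in labels.items():
            print(f"  skos:{prop}: {', '.join(values)}")
```

Running the same function over a locally saved copy of a SKOS file would show at a glance whether acronyms end up as altLabel, notation, or something else.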

Antoine

> Dear Bradley,
>
> First sorry for the time it took...
>
> I'm actually not sure I understand the question. Are you looking for something more complex than
>
> <skos:Concept rdf:about="#:tripleDataEncryptionAlgorithm">
> <skos:prefLabel xml:lang="en">Triple Data Encryption Algorithm</skos:prefLabel>
> <skos:hiddenLabel xml:lang="en">Triple DEZ</skos:hiddenLabel>
> <skos:altLabel xml:lang="en">Triple DES</skos:altLabel>
> <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">3DES</skos:notation>
> </skos:Concept>
>
> ?
> You can indeed create sub-properties of skos:prefLabel, skos:altLabel, skos:hiddenLabel and skos:notation to represent the exact "flavor" of your acronyms and abbreviations, but I'm not sure this is what you really need if the goal is simply to text mine occurrences of concepts in documents.
>
> MADS/RDF offers finer granularity. But similarly, I'm not sure you need it...
>
> Best,
>
> Antoine
>
>
>> Dear mailing list,
>>
>> I am trying to build a controlled vocabulary schema to be able to model something like RFC 4949 http://tools.ietf.org/html/rfc4949
>>
>> This controlled vocabulary has “separate” entries for the acronym, the abbreviation, each slang term/synonym, and the canonical term. There are also deprecated labels (deprecatedLabel).
>>
>> I do not want separate entries for each acronym/abbreviation, as the MADS/RDF object properties hasAcronymVariant and hasAbbreviationVariant suggest. Instead I want everything in one canonical entry (reasons outlined in the Use Case Scenario below).
>>
>> For example in the RFC 4949, page 9 :
>>
>> prefLabel: Triple Data Encryption Algorithm
>>
>> hiddenLabel: Triple DEZ [I made up this slang]
>>
>> How would you model these two alternatives to the canonical label in MADS/RDF?
>>
>> acronym: 3DES
>>
>> abbreviation: Triple DES
>>
>> Use Case Scenario
>>
>> We want to build a master controlled vocabulary by text mining many glossaries such as RFC 4949. So we have to be able to process these varying labels and cross references.
>>
>> One approach is to model RFC 4949 using MADS/RDF as the specification suggests, and then use some sort of inferencing/query to get the acronyms/abbreviations to “appear” as part of the canonical term using object properties. This leads to more term entries but makes it easy to text mine. However, it complicates the XSLT transformation to .txt for further text mining.
>>
>> An alternative approach is to make one canonical entry holding all label types, for the text mining reason described next, which would simplify the XSLT transformation from OWL to .txt.
>>
>> An SME curates the multiple glossary inputs to ensure that each idea is represented ontologically/conceptually by only one canonical entry (either by manual curation, so that syntactically different labels for the same term are matched, or by SPARQL queries to isolate duplicates, or both techniques).
>>
>> Then we export the master term list as a .txt with preferred label, acronyms, symbols (QUDT ontology), abbreviations, and synonyms (altLabel). This acts as an input again for GATE so that we can text mine the true corpus that describes a product to build the knowledge base for that product.
>>
>> Right now our glossary has over 20,000 telecommunications terms (many complex and simple labels). So the design is important so we do not have a big job correcting populated design errors.
>>
>> Of course I can just model owl:acronym and owl:abbreviation under the appropriate imported SKOS, SKOS-XL, and MADS/RDF data properties, but I would like to remain as close as possible to customary modeling.
>>
>> Any thoughts?
>>
>> Bradley Shoebottom
>> Senior Information Architect – Research and Product Development
>> Phone: (506) 674-5439 | Toll-Free: (800) 363-3358
>> Skype: bradley.shoebottom
>>
>> Email: bradley.shoebottom@innovatia.net
>>
>
>
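
The curation and export steps described in the original message (isolating duplicate labels across glossaries, then writing a flat term list for text mining) can be sketched roughly as follows. This is only an illustration under stated assumptions: the entry layout, the ids, the second entry standing in for another glossary, and the tab-separated output are all made up for the example, and the output is not claimed to match any particular GATE gazetteer format.

```python
from collections import defaultdict

# Assumption: one canonical entry per concept, with all label types inline.
# The ids and the second entry (standing in for another glossary) are hypothetical.
ENTRIES = [
    {"id": "rfc4949:3des", "prefLabel": "Triple Data Encryption Algorithm",
     "acronym": ["3DES"], "abbreviation": ["Triple DES"],
     "hiddenLabel": ["Triple DEZ"]},
    {"id": "other:tdea", "prefLabel": "Triple DES",
     "acronym": ["TDEA"], "abbreviation": [], "hiddenLabel": []},
]

LIST_FIELDS = ("acronym", "abbreviation", "hiddenLabel")


def normalize(label):
    """Case-fold and collapse whitespace so syntactic variants compare equal."""
    return " ".join(label.casefold().split())


def iter_labels(entry):
    """Yield (kind, label) pairs for every label of one entry."""
    yield "prefLabel", entry["prefLabel"]
    for kind in LIST_FIELDS:
        for label in entry.get(kind, []):
            yield kind, label


def duplicate_candidates(entries):
    """Return {normalized label: sorted entry ids} for labels shared by several entries."""
    seen = defaultdict(set)
    for entry in entries:
        for _, label in iter_labels(entry):
            seen[normalize(label)].add(entry["id"])
    return {label: sorted(ids) for label, ids in seen.items() if len(ids) > 1}


def gazetteer_lines(entries):
    """Render one tab-separated line per label: label, entry id, label kind."""
    return [f"{label}\t{entry['id']}\t{kind}"
            for entry in entries
            for kind, label in iter_labels(entry)]


if __name__ == "__main__":
    print(duplicate_candidates(ENTRIES))
    print("\n".join(gazetteer_lines(ENTRIES)))
```

With a single entry per concept, the export is a plain loop over labels, and the duplicate report gives the SME a short list of entries to review and merge.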

Received on Sunday, 24 February 2013 16:32:06 UTC