- From: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
- Date: Mon, 20 Jan 2014 18:52:50 +0200
- To: "'Christian Mader'" <c.mader@semantic-web.at>, "'Osma Suominen'" <osma.suominen@helsinki.fi>
- Cc: <public-esw-thes@w3.org>, "'Gregg Garcia'" <GGarcia@getty.edu>, "'Joan Cobb'" <JCobb@getty.edu>
> Maybe the fastest way to learn about them is this joint paper?
> Osma Suominen and Christian Mader: Assessing and Improving the Quality of SKOS Vocabularies. Journal on Data Semantics, 2013.
> http://www.seco.tkk.fi/publications/2013/suominen-mader-skosquality.pdf

The paper is very nice indeed! I've read it in detail, and here are some remarks on some of the validation criteria from AAT's standpoint.

** 4.2.1 Omitted or Invalid Language Tags

Ok, but make sure you're not too restrictive when parsing the tags. E.g. we use qqq-002 ("private language, region Africa") to denote what Getty calls "African language".
We also use private subtags in various positions, e.g.
  la vs la-x-liturgic vs la-x-medieval
and
  zh-Latn-pinyin vs zh-Latn-pinyin-x-hanyu vs zh-Latn-pinyin-x-notone

** 4.2.2 Incomplete Language Coverage

This may be relevant to Eurovoc (a relatively small vocab that's intended to have uniform/full coverage in numerous languages). But it's not relevant to AAT, which has:
- 3 core languages (English, Spanish, Dutch)
- 1 core language in progress (Chinese)
- over 100 languages (from Afrikaans to Zulu) that provide a few vernacular/loan terms and were never intended to have complete coverage.
The same can be observed for Rameau, and I'd guess for any large Library or Cultural Heritage vocab.

So it would be nice to add a "core languages" option and check coverage only against them. And take only the first part of the langtag (as SPARQL's langMatches() does), because in AAT Chinese is covered with different transcriptions:
  zh-Hant
  zh-Latn-wadegile
  zh-Latn-pinyin-x-hanyu
  zh-Latn-pinyin-x-notone

** 4.2.4 Overlapping Labels

Two problems with this criterion as formulated:

a. AAT systematically includes the plural noun as prefLabel and the singular noun as altLabel. E.g. the @en labels of http://getty.ontotext.com/resource/aat/300198841 include:
  prefLabel=rhyta, altLabel=rhyton, altLabel=rhytons
Your default similarity matching (I guess Levenshtein with distance 1) would flag those.

b. It is quite legitimate for the prefLabel of one concept and the altLabel of another to be the same. The query
  select ?l ?x ?y {?x skos:prefLabel ?l. ?y skos:altLabel ?l}
at http://getty.ontotext.com/sparql finds 866 such pairs. E.g.
  300055155 prefLabel=awe (positive emotions, emotion, ... Associated Concepts Facet)
vs
  300387898 altLabel=awe (the Aweti language)
Please note that AAT often includes a (qualifier) in parens to ensure that prefLabels are unique, e.g.:
  300111178 English (culture or style)
vs
  300388277 English (language)

** 4.3.1 Orphan Concepts

AAT is 8-9 levels deep. Yet there is a surprisingly large number of topConcepts: 4291 out of 37058, or 11.6%; and many of them may not have skos:Concept children.

E.g. 300054031 "drawing (metalworking)" is a top concept, although it's nested 8 levels deep:
  <metal forming processes and techniques>,
  <metalworking processes and techniques>,
  <metalworking and metalworking processes and techniques>,
  <processes and techniques by material>,
  <processes and techniques by specific type>,
  <processes and techniques>,
  Processes and Techniques,
  Activities Facet
But all these levels are NOT skos:Concepts.

E.g. 300388277 English (language) http://getty.ontotext.com/resource/aat/300388277 is nested 5 levels deep:
  <languages and writing systems by specific type>,
  <languages and writing systems>,
  language-related concepts,
  Associated Concepts,
  Associated Concepts Facet
but it has neither children nor skos:Concept parents.

Furthermore, AAT has 70 associative relations. But none of them are mapped to skos:related yet, because some connect non-Concepts, while skos:related can connect only concepts.

So this criterion should take into account skos:Collection parents (i.e. skos:member^, the inverse of skos:member).

** 4.3.2 Disconnected Concept Clusters

Similarly for this criterion, you should consider the deeper skos:Collection structure. Getty even has Concepts above some Collections (called Guide Terms). In such a case the links are:
  skos:Collection -> skos:member -> skos:Concept -> iso:subordinateArray -> skos:Collection,iso:ThesaurusArray
(You'll find illustrations in other posts in this mailing list.)

So: you should consider iso:subordinateArray and skos:member in addition to skos:narrower when making up the structure.

Best regards!
--
Vladimir Alexiev, PhD, PMP
Lead, Data and Ontology Management Group
Ontotext Corp, www.ontotext.com
Sirma Group Holding, www.sirma.com
Email: vladimir.alexiev@ontotext.com, skype:valexiev1
Mobile: +359 888 568 132, SMS: 359888568132@sms.mtel.net
Landline: +359 (988) 106 084, Fax: +359 (2) 975 3226
Calendar: https://www.google.com/calendar/embed?src=vladimir%40sirma.bg
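
Editorial illustration of the "core languages" suggestion under 4.2.2: a minimal Python sketch, not taken from the paper or from any existing tool. It assumes prefix matching in the spirit of SPARQL's langMatches() (basic filtering of language tags), so that private subtags such as zh-Latn-pinyin-x-notone still count toward "zh". The function names, the example tag list, and the `core` option are hypothetical.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Prefix match on whole subtags, case-insensitive (langMatches-style)."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


def missing_core_languages(label_tags, core_languages):
    """Return the core languages for which no label tag matches."""
    return [
        core for core in core_languages
        if not any(lang_matches(tag, core) for tag in label_tags)
    ]


# Hypothetical concept whose labels carry tags like those quoted in 4.2.1/4.2.2.
tags = ["en", "nl", "zh-Hant", "zh-Latn-pinyin-x-notone", "la-x-medieval"]
core = ["en", "es", "nl", "zh"]  # assumed "core languages" option

print(missing_core_languages(tags, core))  # ['es']
```

Here only Spanish is reported as missing: the two Chinese transcriptions satisfy "zh", and the Latin label is ignored because "la" is not among the assumed core languages.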
Received on Monday, 20 January 2014 16:53:15 UTC