- From: Christian Mader <c.mader@semantic-web.at>
- Date: Mon, 20 Jan 2014 18:12:25 +0100
- To: vladimir.alexiev@ontotext.com
- Cc: Osma Suominen <osma.suominen@helsinki.fi>, public-esw-thes@w3.org, Gregg Garcia <GGarcia@getty.edu>, Joan Cobb <JCobb@getty.edu>
- Message-ID: <CAD8=V5y9LQzDDKZcioST=BtXVSxY3hssjH6EoyDdFoDd4kXiPQ@mail.gmail.com>
Hi Vladimir,

Thanks for sharing your findings and for the suggestions! I will definitely consider them in the next versions of the qSKOS/online checker tool!

best,
Christian

2014/1/20 Vladimir Alexiev <vladimir.alexiev@ontotext.com>

> Maybe the fastest way to learn about them is this joint paper?
>
> Osma Suominen and Christian Mader: Assessing and Improving the Quality
> of SKOS Vocabularies. Journal on Data Semantics, 2013.
>
> http://www.seco.tkk.fi/publications/2013/suominen-mader-skosquality.pdf
>
> The paper is very nice indeed!
> I've read it in detail, and here are some remarks on some of the
> validation criteria from AAT's standpoint:
>
> ** 4.2.1 Omitted or Invalid Language Tags
>
> OK, but make sure you're not too restrictive when parsing the tags. E.g.
> we use qqq-002 ("private language, region Africa") to denote what Getty
> calls "African language".
>
> We also use private-use subtags in various positions, e.g.
> la vs
> la-x-liturgic vs
> la-x-medieval
>
> and
> zh-Latn-pinyin vs
> zh-Latn-pinyin-x-hanyu vs
> zh-Latn-pinyin-x-notone
>
> ** 4.2.2 Incomplete Language Coverage
>
> This may be relevant to EuroVoc (a relatively small vocabulary that's
> intended to have uniform/full coverage in numerous languages).
>
> But it's not relevant to AAT, which has:
> 3 core languages (English, Spanish, Dutch)
> 1 core language in progress (Chinese)
> over 100 languages (from Afrikaans to Zulu) that provide a few
> vernacular/loan terms and are never intended to have complete coverage.
>
> The same can be observed for Rameau, and I'd guess for any large library
> or cultural heritage vocabulary.
>
> So it would be nice to add a "core languages" option and check coverage
> only against them.
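The two language-tag suggestions for 4.2.1 and 4.2.2 can be sketched roughly in Python. Parsing should be permissive enough to accept private-use codes and subtags, and coverage should be checked only against a configured set of core languages, compared by primary subtag. This is not qSKOS code. The regular expression is a simplified assumption rather than the full RFC 5646 grammar. The names `is_valid_langtag`, `primary_subtag` and `missing_core_languages` are hypothetical. Labels are plain `(text, langtag)` pairs instead of RDF literals.

```python
import re

# Simplified, permissive language-tag pattern (an assumption, NOT the full
# RFC 5646 grammar: extlang and extension subtags are not handled).
LANGTAG = re.compile(
    r"[a-z]{2,3}"                            # primary subtag; also admits qaa-qtz private use, e.g. qqq
    r"(-[a-z]{4})?"                          # optional script, e.g. Latn, Hant
    r"(-([a-z]{2}|[0-9]{3}))?"               # optional region, e.g. 002 (Africa)
    r"(-([a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*"  # variants, e.g. pinyin, wadegile
    r"(-x(-[a-z0-9]{1,8})+)?",               # private-use tail, e.g. -x-hanyu
    re.IGNORECASE,
)


def is_valid_langtag(tag: str) -> bool:
    """Accept tags such as qqq-002, la-x-liturgic, zh-Latn-pinyin-x-notone."""
    return LANGTAG.fullmatch(tag) is not None


def primary_subtag(tag: str) -> str:
    """First subtag only, similar in effect to SPARQL langMatches(tag, "zh")."""
    return tag.split("-", 1)[0].lower()


def missing_core_languages(labels, core_languages):
    """Return the core languages for which a concept has no label at all.

    labels: iterable of (text, langtag) pairs for one concept.
    core_languages: hypothetical configuration, e.g. {"en", "es", "nl", "zh"}.
    Labels in non-core languages are ignored rather than counted as gaps.
    """
    present = {primary_subtag(tag) for _, tag in labels}
    return {lang.lower() for lang in core_languages} - present


if __name__ == "__main__":
    for tag in ["qqq-002", "la-x-medieval", "zh-Latn-wadegile", "en_US"]:
        print(tag, is_valid_langtag(tag))  # True, True, True, False
    # Hypothetical labels: any zh-* transcription counts as Chinese coverage.
    labels = [("rhyta", "en"), ("rhyton", "en"), ("label", "zh-Latn-pinyin-x-hanyu")]
    print(sorted(missing_core_languages(labels, {"en", "es", "nl", "zh"})))  # ['es', 'nl']
```

Comparing only primary subtags means that zh-Hant, zh-Latn-wadegile and zh-Latn-pinyin-x-hanyu all count toward the same core language. That is the effect of the langMatches() suggestion.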
> And take only the first part of the langtag (as with SPARQL's langMatches()),
> because in AAT Chinese is covered with different transcriptions:
> zh-Hant
> zh-Latn-wadegile
> zh-Latn-pinyin-x-hanyu
> zh-Latn-pinyin-x-notone
>
> ** 4.2.4 Overlapping Labels
>
> There are two problems with this criterion as formulated:
>
> a. AAT systematically includes the plural noun as prefLabel and the
> singular noun as altLabel.
> E.g. the @en labels of http://getty.ontotext.com/resource/aat/300198841 include:
> prefLabel=rhyta, altLabel=rhyton, altLabel=rhytons
> Your default similarity matching (I guess Levenshtein with distance 1)
> would flag those.
>
> b. It is quite legitimate for the prefLabel of one concept and the altLabel
> of another to be the same.
> The query select ?l ?x ?y {?x skos:prefLabel ?l. ?y skos:altLabel ?l}
> at http://getty.ontotext.com/sparql
> finds 866 such pairs.
>
> E.g. 300055155 prefLabel=awe (positive emotions, emotion, ... Associated
> Concepts Facet)
> vs 300387898 altLabel=awe (the Aweti language)
>
> Please note that AAT often includes a (qualifier) in parentheses to ensure
> that prefLabels are unique, e.g.:
> 300111178 English (culture or style) vs
> 300388277 English (language)
>
> ** 4.3.1 Orphan Concepts
>
> AAT is 8-9 levels deep.
> Yet there is a surprisingly large number of topConcepts: 4291 out of
> 37058, or about 11.6%,
> and many of them may not have skos:Concept children.
>
> E.g. 300054031 "drawing (metalworking)" is a top concept, although it's
> nested 8 levels deep:
> <metal forming processes and techniques>, <metalworking processes and
> techniques>, <metalworking and metalworking processes and techniques>,
> <processes and techniques by material>, <processes and techniques by
> specific type>, <processes and techniques>, Processes and Techniques,
> Activities Facet
> But none of these levels are skos:Concepts.
>
> E.g. 300388277 English (language)
> http://getty.ontotext.com/resource/aat/300388277 is nested 5 levels deep:
> <languages and writing systems by specific type>, <languages and writing
> systems>, language-related concepts, Associated Concepts, Associated
> Concepts Facet
> but it has neither children nor skos:Concept parents.
>
> Furthermore, AAT has 70 associative relations.
> But none of them are mapped to skos:related yet, because some connect
> non-Concepts, while skos:related can connect only concepts.
>
> So this criterion should take into account skos:Collection parents (i.e.
> skos:member^).
>
> ** 4.3.2 Disconnected Concept Clusters
>
> Similarly, for this criterion you should consider the deeper
> skos:Collection structure.
> Getty even has Concepts above some Collections (called Guide Terms).
> In such cases the links are:
> skos:Collection -> skos:member -> skos:Concept -> iso:subordinateArray ->
> skos:Collection, iso:ThesaurusArray
> (You'll find illustrations in other posts on this mailing list.)
>
> So you should consider iso:subordinateArray and skos:member in addition
> to skos:narrower when building the structure.
>
> Best regards!
> --
> Vladimir Alexiev, PhD, PMP
> Lead, Data and Ontology Management Group
> Ontotext Corp, www.ontotext.com
> Sirma Group Holding, www.sirma.com
> Email: vladimir.alexiev@ontotext.com, skype:valexiev1
> Mobile: +359 888 568 132, SMS: 359888568132@sms.mtel.net
> Landline: +359 (988) 106 084, Fax: +359 (2) 975 3226
> Calendar: https://www.google.com/calendar/embed?src=vladimir%40sirma.bg
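The structural suggestions for 4.3.1 and 4.3.2 can be sketched in Python. The idea is to treat skos:member and iso:subordinateArray as structural links alongside skos:narrower/skos:broader, then look for orphans and disconnected clusters over that wider graph. This is a sketch under stated assumptions, not how qSKOS actually implements these criteria. Triples are plain `(subject, predicate, object)` string tuples with prefixed names rather than an RDF store. All `ex:` identifiers and the function names are hypothetical.

```python
from collections import defaultdict, deque

# Predicates treated as structural links. skos:narrower/broader are the usual
# ones; skos:member and iso:subordinateArray are added per the suggestion.
STRUCTURAL = {"skos:narrower", "skos:broader", "skos:member", "iso:subordinateArray"}


def structure_graph(triples):
    """Undirected adjacency over every node joined by a structural predicate."""
    adj = defaultdict(set)
    for s, p, o in triples:
        if p in STRUCTURAL:
            adj[s].add(o)
            adj[o].add(s)
    return adj


def orphan_concepts(concepts, triples):
    """Concepts with no structural link at all (a Collection parent counts)."""
    adj = structure_graph(triples)
    return {c for c in concepts if not adj.get(c)}


def components(concepts, triples):
    """Connected components reachable from the given concepts (BFS).

    Each component may also contain Collection/ThesaurusArray nodes.
    """
    adj = structure_graph(triples)
    seen, result = set(), []
    for start in concepts:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        result.append(comp)
    return result


if __name__ == "__main__":
    # Hypothetical Collection -> member -> Concept -> subordinateArray -> array pattern.
    triples = [
        ("ex:collection1", "skos:member", "ex:conceptA"),
        ("ex:conceptA", "iso:subordinateArray", "ex:array1"),
        ("ex:array1", "skos:member", "ex:conceptB"),
    ]
    concepts = ["ex:conceptA", "ex:conceptB", "ex:conceptC"]
    print(sorted(orphan_concepts(concepts, triples)))  # ['ex:conceptC']
    print(len(components(concepts, triples)))          # 2
```

With skos:narrower alone, ex:conceptA and ex:conceptB would both be reported as orphans in separate clusters. Counting the Collection links joins them into one cluster.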
Received on Monday, 20 January 2014 17:12:54 UTC