
Re: SKOS Quality Checkers

From: Christian Mader <c.mader@semantic-web.at>
Date: Mon, 20 Jan 2014 18:12:25 +0100
Message-ID: <CAD8=V5y9LQzDDKZcioST=BtXVSxY3hssjH6EoyDdFoDd4kXiPQ@mail.gmail.com>
To: vladimir.alexiev@ontotext.com
Cc: Osma Suominen <osma.suominen@helsinki.fi>, public-esw-thes@w3.org, Gregg Garcia <GGarcia@getty.edu>, Joan Cobb <JCobb@getty.edu>
Hi Vladimir,

Thanks for sharing your findings and for the suggestions! I will definitely
consider them in the next versions of the qSKOS/online checker tool!

best,
Christian


2014/1/20 Vladimir Alexiev <vladimir.alexiev@ontotext.com>

> > Maybe the fastest way to learn about them is this joint paper?
> > Osma Suominen and Christian Mader: Assessing and Improving the Quality
> > of SKOS Vocabularies. Journal on Data Semantics, 2013.
> > http://www.seco.tkk.fi/publications/2013/suominen-mader-skosquality.pdf
>
> The paper is very nice indeed!
> I've read it in detail, and here are some remarks on some of the
> validation criteria from AAT's standpoint:
>
> ** 4.2.1 Omitted or Invalid Language Tags
>
> OK, but make sure you're not too restrictive when parsing the tags. E.g.
> we use qqq-002 ("private language, region Africa") to denote what Getty
> calls "African language".
>
> We also use private subtags in various positions, e.g.
> la vs
> la-x-liturgic vs
> la-x-medieval
>
> and
> zh-Latn-pinyin vs
> zh-Latn-pinyin-x-hanyu vs
> zh-Latn-pinyin-x-notone
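
The lenient parsing asked for above can be sketched in Python as follows. This is only an illustration, not how qSKOS parses tags: the pattern `LANGTAG` and the helper `parse_langtag` are hypothetical names, and the pattern covers a simplified subset of BCP 47 (language, script, region, variants, private use), deliberately leaving out extlang, extension and grandfathered tags.

```python
import re

# Simplified BCP 47 shape: language[-script][-region][-variant...][-x-private...]
# Assumption: extlang, extension and grandfathered tags are not handled here.
LANGTAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)"
    r"(?:-x(?P<private>(?:-[a-z0-9]{1,8})+))?$",
    re.IGNORECASE,
)


def parse_langtag(tag):
    """Return the subtags of a well-formed tag as a dict, or None otherwise."""
    match = LANGTAG.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    # Turn "-a-b" runs into lists so callers do not have to re-split them.
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    parts["private"] = [p for p in (parts["private"] or "").split("-") if p]
    return parts


# The tags quoted in this message should all be accepted.
for tag in ["qqq-002", "la", "la-x-liturgic", "la-x-medieval",
            "zh-Latn-pinyin", "zh-Latn-pinyin-x-hanyu",
            "zh-Latn-pinyin-x-notone"]:
    print(tag, parse_langtag(tag))
```

A checker built this way would report a tag as invalid only when `parse_langtag` returns `None`, so private-use subtags after `-x-` never trigger a warning.
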
>
> ** 4.2.2 Incomplete Language Coverage
> This may be relevant to Eurovoc (a relatively small vocab that's intended
> to have uniform/full coverage in numerous languages).
>
> But it's not relevant to AAT, which has:
> 3 core languages (English, Spanish, Dutch)
> 1 core language in-progress (Chinese)
> over 100 languages (from Afrikaans to Zulu) that provide a few
> vernacular/loan terms, and were never intended to have complete coverage.
>
> The same can be observed for Rameau, and I'd guess any large Library or
> Cultural Heritage vocab.
>
> So it would be nice to add a "core languages" option and check coverage
> only against them.
> And compare only the first part of the langtag (as SPARQL's langMatches()
> does), because in AAT Chinese is covered by several transcriptions:
> zh-Hant
> zh-Latn-wadegile
> zh-Latn-pinyin-x-hanyu
> zh-Latn-pinyin-x-notone
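
A minimal Python sketch of such a "core languages" coverage check, under stated assumptions: `lang_matches` mimics the basic filtering that SPARQL's langMatches() performs, `missing_core_languages` is a hypothetical helper, and the concept ids and labels in the toy data are placeholders rather than AAT content.

```python
def lang_matches(tag, lang_range):
    """Basic filtering as in SPARQL langMatches(): 'zh' matches 'zh-Hant'."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


def missing_core_languages(labels, core_languages):
    """Map each concept to the core languages in which it has no label.

    labels: dict of concept id -> list of (label text, language tag).
    Concepts that cover every core language are left out of the report.
    """
    report = {}
    for concept, tagged_labels in labels.items():
        missing = [core for core in core_languages
                   if not any(lang_matches(tag, core)
                              for _, tag in tagged_labels)]
        if missing:
            report[concept] = missing
    return report


# Hypothetical toy data: ids and label texts are placeholders.
labels = {
    "c1": [("label-a", "en"), ("label-b", "es"), ("label-c", "nl"),
           ("label-d", "zh-Latn-pinyin-x-notone")],
    "c2": [("label-e", "en"), ("label-f", "zu")],
}
print(missing_core_languages(labels, ["en", "es", "nl", "zh"]))
```

Here "c1" counts as covered for Chinese because any tag starting with "zh-" satisfies the range "zh", and the extra "zu" label on "c2" is ignored because Zulu is not in the core list.
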
>
> ** 4.2.4 Overlapping Labels
>
> Two problems with this criterion as formulated:
>
> a. AAT systematically includes the plural noun as prefLabel, and singular
> noun as altLabel.
> E.g. the @en labels of http://getty.ontotext.com/resource/aat/300198841 include:
>   prefLabel=rhyta, altLabel=rhyton, altLabel=rhytons
> Your default similarity matching (I guess Levenshtein with distance 1)
> would flag those.
>
> b. It is quite legitimate to have the prefLabel of one concept and altLabel
> of another be the same.
> The query  select ?l ?x ?y {?x skos:prefLabel ?l. ?y skos:altLabel ?l}
> at http://getty.ontotext.com/sparql
> finds 866 such pairs.
>
> E.g. 300055155 prefLabel=awe (positive emotions, emotion, ... Associated
> Concepts Facet)
> vs 300387898 altLabel=awe (the Aweti language)
>
> Please note that AAT often includes a (qualifier) in parens to ensure that
> prefLabels are unique, e.g.:
> 300111178 English (culture or style) vs
> 300388277 English (language)
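
One way to read the two points above is that only identical prefLabels on different concepts should be reported. The Python sketch below illustrates that narrower check; it is not the qSKOS implementation. The function `preflabel_clashes` and the dict layout are hypothetical, the "en" tags on the "awe" and "English" labels are an assumption, and the entry "hypothetical-1" is made up purely to trigger a clash.

```python
from collections import defaultdict


def preflabel_clashes(concepts):
    """Return {(label, lang): [concept ids]} for prefLabels shared by 2+ concepts.

    altLabels are ignored, so prefLabel/altLabel overlaps across concepts and
    singular/plural variants inside one concept are never reported.
    """
    owners = defaultdict(set)
    for concept_id, labels in concepts.items():
        for text, lang in labels.get("pref", []):
            # Case-insensitive comparison is an assumption of this sketch.
            owners[(text.casefold(), lang)].add(concept_id)
    return {key: sorted(ids) for key, ids in owners.items() if len(ids) > 1}


# Labels quoted in this message; language tags other than rhyta's are assumed.
concepts = {
    "300198841": {"pref": [("rhyta", "en")],
                  "alt": [("rhyton", "en"), ("rhytons", "en")]},
    "300055155": {"pref": [("awe", "en")]},
    "300387898": {"alt": [("awe", "en")]},
    "300111178": {"pref": [("English (culture or style)", "en")]},
    "300388277": {"pref": [("English (language)", "en")]},
}
print(preflabel_clashes(concepts))  # nothing to report for these concepts

# A made-up concept that reuses an existing prefLabel does get reported.
concepts["hypothetical-1"] = {"pref": [("Awe", "en")]}
print(preflabel_clashes(concepts))
```
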
>
> ** 4.3.1 Orphan Concepts
>
> AAT is 8-9 levels deep.
> Yet, there is a surprisingly large number of topConcepts: 4291 out of
> 37058, or about 11.6%;
> and many of them may not have skos:Concept children.
>
> E.g. 300054031 "drawing (metalworking)" is a top concept, although it's
> nested 8 levels deep:
> <metal forming processes and techniques>, <metalworking processes and
> techniques>, <metalworking and metalworking processes and techniques>,
> <processes and techniques by material>, <processes and techniques by
> specific type>, <processes and techniques>, Processes and Techniques,
> Activities Facet
> But all these levels are NOT skos:Concepts.
>
> Another example: 300388277 English (language)
> http://getty.ontotext.com/resource/aat/300388277 is nested 5 levels deep:
> <languages and writing systems by specific type>, <languages and writing
> systems>, language-related concepts, Associated Concepts, Associated
> Concepts Facet
> but it doesn't have any children, nor skos:Concept parents.
>
> Furthermore, AAT has 70 associative relations.
> But none of them are mapped to skos:related yet, because some connect
> non-Concepts while skos:related can connect only concepts.
>
> So this criterion should take into account skos:Collection parents (i.e.
> the inverse of skos:member, written ^skos:member in SPARQL property paths).
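
A minimal Python sketch of an orphan check that also counts collection membership. It assumes the vocabulary is available as plain (subject, predicate, object) tuples with prefixed names; `orphan_concepts`, the collection id and the "lonely" concept are hypothetical and only serve the illustration.

```python
def orphan_concepts(concepts, triples, count_collections=True):
    """Concepts with no broader/narrower/related link and, optionally,
    no skos:Collection that lists them as a member."""
    linking = {"skos:broader", "skos:narrower", "skos:related"}
    connected = set()
    for subject, predicate, obj in triples:
        if predicate in linking:
            connected.update((subject, obj))
        elif predicate == "skos:member" and count_collections:
            # The member is attached to the structure via ^skos:member.
            connected.add(obj)
    return sorted(set(concepts) - connected)


# Toy data: the collection id and the second concept are placeholders.
concepts = ["aat:300388277", "aat:hypothetical-lonely"]
triples = [
    ("coll:hypothetical-languages", "skos:member", "aat:300388277"),
]
print(orphan_concepts(concepts, triples))
print(orphan_concepts(concepts, triples, count_collections=False))
```

With membership counted, only the concept that appears in no triple at all is reported; with `count_collections=False` the collection member is reported as an orphan as well, which is the behaviour this remark argues against.
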
>
> ** 4.3.2 Disconnected Concept Clusters
>
> Similarly for this criterion, you should consider the deeper
> skos:Collection structure.
> Getty even has Concepts above some Collections (called Guide Terms).
> In such cases the links are:
> skos:Collection -> skos:member -> skos:Concept -> iso:subordinateArray ->
> skos:Collection,iso:ThesaurusArray
> (You'll find illustrations in other posts in this mailing list)
>
> So: you should consider iso:subordinateArray and skos:member in addition
> to skos:narrower when making up the structure.
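
The effect of the extra predicates on cluster detection can be sketched in Python as follows. This is an illustration under assumptions, not the qSKOS algorithm: the graph is a list of (subject, predicate, object) tuples, edges are treated as undirected, `clusters` and `STRUCTURAL` are hypothetical names, and every node id in the toy data is made up.

```python
from collections import defaultdict

# Predicates treated as structural links (skos:broader added for symmetry).
STRUCTURAL = frozenset({"skos:narrower", "skos:broader",
                        "skos:member", "iso:subordinateArray"})


def clusters(nodes, triples, predicates=STRUCTURAL):
    """Weakly connected components over the chosen predicates."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p in predicates:
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Made-up ids following the chain Collection -> member -> Concept
# -> subordinateArray -> Collection -> member -> Concept.
nodes = ["coll:top", "concept:guide", "coll:array", "concept:leaf",
         "concept:other", "concept:other-child"]
triples = [
    ("coll:top", "skos:member", "concept:guide"),
    ("concept:guide", "iso:subordinateArray", "coll:array"),
    ("coll:array", "skos:member", "concept:leaf"),
    ("concept:other", "skos:narrower", "concept:other-child"),
]
print(len(clusters(nodes, triples)))
print(len(clusters(nodes, triples, predicates={"skos:narrower", "skos:broader"})))
```

On this toy graph the full predicate set yields two clusters, while restricting the walk to skos:narrower and skos:broader splits the guide-term chain into four singletons and yields five.
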
>
> Best regards!
> --
> Vladimir Alexiev, PhD, PMP
> Lead, Data and Ontology Management Group
> Ontotext Corp, www.ontotext.com
> Sirma Group Holding, www.sirma.com
> Email: vladimir.alexiev@ontotext.com, skype:valexiev1
> Mobile: +359 888 568 132, SMS: 359888568132@sms.mtel.net
> Landline: +359 (988) 106 084, Fax: +359 (2) 975 3226
> Calendar: https://www.google.com/calendar/embed?src=vladimir%40sirma.bg
>
Received on Monday, 20 January 2014 17:12:54 UTC
