Re: Reasoning with ontologies and knowledge graphs?

Indeed, Wikidata has many quality problems. We recently did a study that tries to measure different quality aspects of WD; see [1].
My own observation is that such a large, crowdsourced KG, with an essentially unpredictable base of contributors (wrt formal training), requires pretty extensive machinery to ensure a reasonable level of consistency. I think it is unrealistic to expect that most contributors will be attuned to the logical/formal distinctions between sub-class-of and instance-of relations, among others. Not to mention the multiplication of similar (if not identical) concepts, super-specializations, etc.
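To make the instance-of / sub-class-of point concrete, here is a minimal sketch of the usual type inference (direct types plus everything reachable via sub-class-of), run over the hops Margaret listed in the quoted message below. The dictionaries, function names, and the use of labels instead of QIDs are assumptions made for illustration; this is not how any particular reasoner or Wikidata tool is implemented. The "repair" at the end only approximates Pat's suggestion by cutting the edge where 'monarchy' changes sense and recording the form of government as a separate, hypothetical property.

```python
# Toy graph copied from the hops quoted below; labels stand in for Wikidata QIDs.
INSTANCE_OF = {"New Zealand": ["Commonwealth realm"]}
SUBCLASS_OF = {
    "Commonwealth realm": ["kingdom"],
    "kingdom": ["monarchy"],
    "monarchy": ["monarchic system"],
    "monarchic system": ["form of government"],
    "form of government": ["administrative type"],
    "administrative type": ["classification system"],
    "classification system": ["knowledge organization system"],
    "knowledge organization system": ["communication medium"],
}


def superclasses(cls, subclass_of):
    """Return every class reachable from cls via sub-class-of edges."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(entity, instance_of, subclass_of):
    """Direct types of entity plus all of their superclasses."""
    types = set()
    for direct in instance_of.get(entity, []):
        types.add(direct)
        types |= superclasses(direct, subclass_of)
    return types


# Naive inference over the data as given.
naive = inferred_types("New Zealand", INSTANCE_OF, SUBCLASS_OF)
print("communication medium" in naive)  # True

# Sketch of a repair: drop the edge where 'monarchy' shifts from a class of
# countries to a system of government, and state the latter as a property.
# HAS_GOVERNMENTAL_SYSTEM is a hypothetical property, not a Wikidata one.
repaired = {k: v for k, v in SUBCLASS_OF.items() if k != "monarchy"}
HAS_GOVERNMENTAL_SYSTEM = {"New Zealand": "monarchy"}
fixed = inferred_types("New Zealand", INSTANCE_OF, repaired)
print("communication medium" in fixed)  # False
```

With the chain cut at that one edge, New Zealand is still inferred to be a kingdom and a monarchy (in the class-of-countries sense), but nothing above that.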
For anyone interested in seeing how such characteristics influence different similarity measures between concepts, you can explore [2]: just enter a concept name, then add other concepts to see how different similarity measures compare them, and whether they match your intuition as to which concepts should be more similar than others.
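As a rough illustration of why hierarchy quality matters here, the sketch below computes one very simple class-based measure: the Jaccard overlap of two concepts' superclass sets. This measure is chosen only for illustration and is not necessarily what the tool at [2] computes; the toy hierarchy, including the 'newspaper' and 'republic' entries and the compressed spurious edge, is made up for the example.

```python
def ancestors(cls, subclass_of):
    """All classes reachable from cls via sub-class-of edges."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b, subclass_of):
    """Jaccard overlap of {concept + its ancestors} for a and b."""
    set_a = ancestors(a, subclass_of) | {a}
    set_b = ancestors(b, subclass_of) | {b}
    return len(set_a & set_b) / len(set_a | set_b)


# Toy hierarchy (an assumption, not Wikidata content). The edge marked
# 'spurious' compresses the questionable upper part of the chain into one hop.
WITH_SPURIOUS = {
    "kingdom": ["monarchy"],
    "monarchy": ["form of government"],
    "republic": ["form of government"],
    "form of government": ["communication medium"],  # spurious
    "newspaper": ["communication medium"],
}
CLEANED = {k: v for k, v in WITH_SPURIOUS.items() if k != "form of government"}

for name, graph in [("with spurious edge", WITH_SPURIOUS), ("cleaned", CLEANED)]:
    print(name,
          "kingdom~newspaper:", jaccard("kingdom", "newspaper", graph),
          "kingdom~republic:", jaccard("kingdom", "republic", graph))
```

In this toy setup a single dubious upper-level edge is enough to give 'kingdom' and 'newspaper' a non-zero similarity, and it also shifts the score between 'kingdom' and 'republic'.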

Cheers
Daniel

[1] A Study of the Quality of Wikidata - https://arxiv.org/abs/2107.00156
[2] https://kgtk.isi.edu/similarity/

> On 13 Dec 2021, at 23:23, Patrick J. Hayes <phayes@ihmc.org> wrote:
> 
> I want to pick up on the example that Margaret gave, below, to make some points about inference. Basically, even simple 'basic' inferences (like following subclass chains) are put at risk by the poor quality of the available data, especially at the more abstract layers that one would hope might be deserving of the title 'ontology'. 
> 
>> On Dec 12, 2021, at 2:46 PM, Margaret Warren <mm@zeroexp.com> wrote:
>> ...Our search paths function is also quite revealing about oddities that come up in places like the subclasses used in Wikidata for example - when you can do things like get a result of an image of a Bay in New Zealand for a search for a term like: 'communication medium' 
>> 
>> The hops returned are as follows: 
>> 
>> http://dbpedia.org/resource/New_Zealand (sameAs)
>> Wikidata: New Zealand http://www.wikidata.org/entity/Q664 a Commonwealth realm http://www.wikidata.org/entity/Q202686
>> subclass of kingdom http://www.wikidata.org/entity/Q417175 subclass of monarchy http://www.wikidata.org/entity/Q7269
>> subclass of monarchic system http://www.wikidata.org/entity/Q22676587 subclass of form of government http://www.wikidata.org/entity/Q1307214
>> subclass of administrative type http://www.wikidata.org/entity/Q2752458 subclass of classification system http://www.wikidata.org/entity/Q5962346
>> subclass of knowledge organization system http://www.wikidata.org/entity/Q6423319 subclass of communication medium http://www.wikidata.org/entity/Q340169
> So New Zealand is a communication medium. Hmmm. 
> 
> Leaving aside some of the factually doubtful claims here (such as Commonwealth Realms being a subclass of Monarchies), the main problem seems to be that the meaning of 'monarchy' shifts from being a class of countries to being a type of government system. New Zealand is at least the right kind of thing to be an instance of the first sense of 'monarchy', but only the second sense can be asserted to be a monarchic system, and then surely it is an /instance/ of such a system, not a subclass, so it is again an instance, not a subclass, of a 'form of government'. 
> 
> This could have been fixed by distinguishing between the actual country and its form of government, e.g. by saying that New Zealand hasGovernmentalSystem Monarchy, thus breaking the subclass chain (and allowing the first 'class' sense to be defined as an OWL restriction on the value of the property, in a decent ontology.) But going up from there feels like getting lost in a conceptual fog. Perhaps forms of government are administrative types, although I would have no real sense of why; but surely a form of government is not a knowledge organization system? (What does this even mean? What would make anyone feel that this assertion was required or useful, let alone true? Some very abstract theory of types of 'system', perhaps?) And then the final piece of insanity has 'communication medium' as the most overarching concept in this hypothetical theory of government semiotics. Really? Even if classifications and knowledge organizations are both forms of communication (a doubtful claim), surely they are not communication /media/. 
> 
> All this stuff at a higher level than 'form of government' is largely meaningless, not the slightest use to anyone, and potentially dangerous. And even just two levels above something as concrete as New Zealand, we have isa/subclass confusions, which seem to be ubiquitous.
> 
> Wikidata is one of the best curated large-scale linked-data corpora, yet it contains stuff like this, so that even inferences as simple and basic as running up subclass chains are liable to result in nonsense. Maybe we would be better off NOT doing too much inference.
> 
> Margaret was gracious enough to add that Wikidata is not always as bad as this and often gives great results. And yes, OK, but how do our inference engines avoid the bad stuff and only use the good? 
> 
> Pat Hayes
> 

Received on Tuesday, 14 December 2021 16:42:53 UTC