- From: Mike Bergman <mike@mkbergman.com>
- Date: Wed, 17 Jun 2020 19:57:59 -0500
- To: paoladimaio10@googlemail.com
- Cc: W3C AIKR CG <public-aikr@w3.org>
- Message-ID: <e30d9fe6-f22f-e9f8-1ae8-865de442b281@mkbergman.com>
Hi Paola,

You always ask the big questions. ;) So, I will try to limit my response to big answers.

As for Wikipedia, I know or suspect there is bias and falsity in some of the information. I see little of it directly; it shows up more as errors of omission or viewpoint than as outright falsehoods. My suspicion is that the actual percentage of unreliable information is quite low, though the information may still be incomplete.

One point worth making has to do with the so-called 'gold standards' that are essential to all science-based assessments, particularly with regard to human language or knowledge. Studies often see interannotator agreement in the 75-80% range, and only very widely used standards (like WordNet or various language corpora) reach agreement in the 90-95% range. This is an actual error term, so when one sees F1 stats or similar, perhaps of 80% or whatever, you need to discount that figure by the interannotator agreement percentage. Many NLP tests claiming 85-90% agreement are actually closer to 64% to 85% once we adjust for interannotator agreement (e.g., an 85% result scored against a gold standard with 75% interannotator agreement is effectively about 0.85 × 0.75 ≈ 64%). Is 35% to 15% of the information on Wikipedia bad?

As a general matter, I am extremely leery of fact-checking services: what are the standards? Who are the annotators? What is their interannotator agreement? These are science-based concerns, and I have ethical ones as well.

As for the information in KBpedia, we tend to check most if not all of our links with each release (releases have averaged every 4-6 months or so). That is perhaps not frequent enough, but we also tend to tie into the more central or structural concepts in these external sources, rather than the leaves, which are more dynamic. The way KBpedia works is to tie into a key linkage point in an external source, and then use that linkage point to retrieve current instances from that source. That is one reason why there are only 58 K concepts in KBpedia, yet they tie into tens of millions of instances as maintained by the external sources.

The reasoning we do is of the traditional deductive kinds (consistency, satisfiability, and subsumption) using reasoners like Pellet or HermiT, plus inductive reasoning based on various supervised machine learning approaches. We are not using abductive reasoning, but one reason for trying to follow the insights of Charles Peirce is that we then have a means to get into that hypothesis-generating and -screening logic, which Peirce did more than anyone to explicate. It is an area I personally want to pursue further.

The management of information follows a triple/quad store that handles the overall reasoning knowledge graph, with direct retrievals of instance data from the source knowledge bases (the seven specifically mentioned, plus another score of minor ones). Thus, KBpedia is not a massive, centralized system, but a rather lightweight one with distributed access and retrieval from its contributor sources. Of course, this kind of Web-oriented architecture with all resources identified by IRIs is one of the reasons semantic technologies make such great sense.

Lastly, in terms of big useful lessons, I would point to the power of having "correct" KR distinctions, on the noun side, between instances (individuals), types (generals or concepts), events, and attributes (monadic characteristics like color or shape); and, on the verb side, relations that split between attributes, direct relations, and representations (indexes and denotations). Look at any top-level ontology or knowledge graph and ask yourself whether and how it handles these distinctions.
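To make that concrete, here is a toy sketch in Python with the Owlready2 library. All of the class, property, and individual names below are invented for illustration; none of this is KBpedia or KKO code.

# A toy sketch only -- the names are made up for illustration and are not KBpedia's.
from owlready2 import get_ontology, Thing, DataProperty, ObjectProperty

onto = get_ontology("http://example.org/kr-sketch.owl")

with onto:
    class Product(Thing):             # a type (a general, or concept)
        pass

    class SaleEvent(Thing):           # an event, kept distinct from ordinary types
        pass

    class has_color(DataProperty):    # an attribute: a monadic characteristic
        domain = [Product]
        range  = [str]

    class sold_in(ObjectProperty):    # a direct relation between individuals
        domain = [Product]
        range  = [SaleEvent]

# Instances (individuals) are distinct from the types they instantiate.
widget  = Product("widget_001")
auction = SaleEvent("auction_2020_06")

widget.has_color = ["red"]            # attribute assertion
widget.sold_in   = [auction]          # relation assertion

onto.save(file="kr_sketch.owl", format="rdfxml")

The point is simply that the type, its instances, the monadic attribute, and the direct relation each get their own construct, rather than everything being flattened into generic "concepts".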
Most top-level ontologies do not, or they only hand-wave. The distinctions that we use on these matters again come from the insights of Charles Sanders Peirce.

Best, Mike

On 6/16/2020 6:49 PM, Paola Di Maio wrote:
> Thank you Mike, looks like a big interesting project, congrats for the release.
>
> Now, the problem I have with Wikipedia is that in addition to containing good articles sometimes, it is not fact checked; there is a lot of rubbish/false information (true, there is quite a lot of rubbish outside of Wikipedia too).
>
> A few questions: how often is the data pulled/updated from these databases? Is the data stored in SQL, or how? How does the system manage the integration of different data sets/data structures? Can you share the design of the inference model/reasoning architecture? What are the implications/useful lessons for KR we can learn from this project?
>
> On Tue, Jun 16, 2020 at 10:27 PM Mike Bergman <mike@mkbergman.com <mailto:mike@mkbergman.com>> wrote:
>
> To All,
>
> I am pleased to announce that we have released KBpedia <http://kbpedia.org/> v 2.50 with e-commerce and logistics capabilities, as well as significant other refinements. This upgrade comes from adding the entire top structure and the most common products and services of the United Nations Standard Products and Services Code. UNSPSC <https://en.wikipedia.org/wiki/UNSPSC> is a comprehensive, multi-lingual taxonomy for products and services, organized into four levels, with third-party crosswalks to economic and demographic data sources. It is a leading standard for many industrial and economic applications. UNSPSC is KBpedia's seventh core knowledge base, joining the public knowledge bases of Wikipedia <https://en.wikipedia.org/wiki/Wikipedia>, Wikidata <https://en.wikipedia.org/wiki/Wikidata>, GeoNames <https://en.wikipedia.org/wiki/GeoNames>, DBpedia <https://en.wikipedia.org/wiki/DBpedia>, schema.org <https://en.wikipedia.org/wiki/Schema.org>, and OpenCyc <https://en.wikipedia.org/wiki/Cyc> already integrated into the system.
>
> KBpedia is a knowledge graph that provides a coherent scaffolding to achieve its twin goals of data interoperability and knowledge-based artificial intelligence (KBAI <http://www.mkbergman.com/category/kbai/>). KBpedia now contains more than 58,000 reference concepts and nearly 200,000 unique mappings to its knowledge bases, enabling links to more than 40 million entities. It is written in the standard OWL 2 <https://en.wikipedia.org/wiki/Web_Ontology_Language> semantic language from the W3C <https://en.wikipedia.org/wiki/World_Wide_Web_Consortium>.
>
> KBpedia consists of 73 mostly disjoint typologies organized under an upper KBpedia Knowledge Ontology (KKO), which is designed according to the universal categories and knowledge representation insights of the great American 19th-century scientist, logician, and polymath, Charles Sanders Peirce <https://en.wikipedia.org/wiki/Charles_Sanders_Peirce>. KBpedia, KKO, and all of its mappings and files are open source under the Creative Commons Attribution 4.0 International (CC BY 4.0) <https://creativecommons.org/licenses/by/4.0/> license.
>
> For more details, see the release announcement <http://kbpedia.org/resources/news/kbpedia-adds-ecommerce/> or go to GitHub <https://github.com/Cognonto/kbpedia/blob/master/versions/2.50/> to download <http://kbpedia.org/resources/downloads/> the distro.
>
> Thanks, Mike
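As a quick way to poke at the release described in the announcement above, the OWL 2 distro can be loaded and classified with Owlready2 and its bundled HermiT reasoner. This is a minimal sketch, not KBpedia's own tooling, and the file path is a placeholder for wherever you unpack the download.

# Minimal sketch, assuming a local copy of the KBpedia OWL 2 distro; the
# filename below is a placeholder, not necessarily the name in the download.
from owlready2 import get_ontology, sync_reasoner, default_world

kb = get_ontology("file:///path/to/kbpedia_reference_concepts.owl").load()

print(len(list(kb.classes())), "classes loaded")  # on the order of the ~58,000 reference concepts

# sync_reasoner() runs the HermiT reasoner bundled with Owlready2 (Java required):
# it checks consistency and computes the subsumption (class) hierarchy.
# Unsatisfiable classes end up equivalent to owl:Nothing.
with kb:
    sync_reasoner()

print("unsatisfiable classes:", list(default_world.inconsistent_classes()))

This is the same consistency/satisfiability/subsumption trio mentioned in the reply; Pellet can be swapped in via Owlready2's sync_reasoner_pellet().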
Received on Thursday, 18 June 2020 00:58:14 UTC