- From: Paola Di Maio <paoladimaio10@gmail.com>
- Date: Sat, 27 Jun 2020 11:35:04 +0800
- To: Mike Bergman <mike@mkbergman.com>
- Cc: W3C AIKR CG <public-aikr@w3.org>
- Message-ID: <CAMXe=SpO2JGU_ja1NTZ=gjucnXoyggQ=t6TwrHOH955YUr3DNg@mail.gmail.com>
Thank you, Mike, for the reply. Yes - the big questions, because of an innate desire to tackle big problems. Regarding fact checking: well, it's necessary in reasoning. Many of the facts upon which reasoning is based - including human reasoning, not just AI - may be false, or arguable. We need to place fact checking alongside truth preservation, logic, etc.

P

On Thu, Jun 18, 2020 at 8:58 AM Mike Bergman <mike@mkbergman.com> wrote:

> Hi Paola,
>
> You always ask the big questions. ;) So, I will try to limit my response to big answers.
>
> As for Wikipedia, I know/suspect there is bias and falsity in some of the information. I see little of it directly, more in terms of errors of omission or viewpoint rather than direct falsities. My suspicion is that the actual percentage of unreliable information is quite low, though the information may still be incomplete. One point worth making has to do with so-called 'gold standards' that are essential to all science-based assessments, particularly with regard to human language or knowledge. Studies often see interannotator agreements in the 75-80% range, and only very widely used standards (like WordNet or various language corpora) get to agreements in the 90-95% range. This is an actual error term, so when one sees F1 stats or similar, perhaps of 80% or whatever, you need to decrement that amount by the interannotator agreement percentage. Many tests claiming 85-90% performance for NLP are actually closer to 64% to 85% once we adjust for interannotator agreement. Is 35% to 15% of the information on Wikipedia bad?
>
> As a general matter, I am extremely leery of fact-checking services, because what are the standards? Who are the annotators? What is their interannotator agreement? These are science-based concerns, and I have ethical ones as well.
>
> As for the information in KBpedia, we tend to check most if not all of our links each release (releases have averaged every 4-6 months or so). That is perhaps not frequent enough, but we also tend to tie into the more central or structural concepts in these external sources, rather than the leaves, which are more dynamic. The way KBpedia works is to tie into a key linkage point in an external source, and use that linkage point to retrieve current instances from that source. That is one reason why there are only 58 K concepts in KBpedia, but they tie into tens of millions of instances as maintained by the external sources.
>
> The reasoning we do is the traditional deductive kind (consistency, satisfiability, and subsumption) using reasoners like Pellet or HermiT, plus inductive reasoning based on various supervised machine learning approaches. We are not using abductive reasoning, but a reason for trying to follow the insights of Charles Peirce is that we have a means to get into that hypothesis-generating and -screening logic, which Peirce did more than anyone to explicate. It is an area I personally want to pursue further.
>
> The management of information follows a triple/quad store that handles the overall reasoning knowledge graph, with direct retrievals of instance data from the source knowledge bases (the seven specifically mentioned, plus another score of minor ones). Thus, KBpedia is not a massive, centralized system, but a rather lightweight one with distributed access and retrieval from its contributor sources.
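To make Mike's interannotator decrement concrete, here is a small arithmetic sketch - one plausible reading of the adjustment, treating the gold standard's agreement as a reliability ceiling on any score measured against it; it is not code from the KBpedia project:

    # A reported score cannot be more reliable than the gold standard
    # it was measured against; one simple adjustment is to scale the
    # reported score by the interannotator agreement.

    def adjusted_score(reported: float, agreement: float) -> float:
        """Scale a reported evaluation score by gold-standard agreement."""
        return reported * agreement

    # Mike's ranges: 85-90% reported, against 75-95% agreement.
    print(adjusted_score(0.85, 0.75))  # 0.6375 -> ~64%
    print(adjusted_score(0.90, 0.95))  # 0.855  -> ~85%

This reproduces the 64% and 85% endpoints quoted in the message above.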
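The deductive checks Mike lists (consistency, satisfiability, and subsumption) can be run against the published KBpedia OWL file with standard tooling. A minimal sketch using the owlready2 Python library, which bundles the HermiT and Pellet reasoners; the local file path is hypothetical, and a Java runtime is assumed:

    # Load a downloaded KBpedia OWL file and run a deductive reasoner.
    from owlready2 import get_ontology, sync_reasoner, default_world

    onto = get_ontology("file:///path/to/kbpedia.owl").load()  # hypothetical path

    with onto:
        sync_reasoner()  # HermiT; sync_reasoner_pellet() runs Pellet instead

    # Classes inferred equivalent to owl:Nothing are unsatisfiable;
    # an empty list means the loaded ontology passed the checks.
    print(list(default_world.inconsistent_classes()))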
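The linkage-point pattern Mike describes can also be sketched: a single mapped concept in an external source such as Wikidata is enough to pull current instances on demand. This is an illustration of the general technique, not KBpedia's actual retrieval code, and the concept QID below is an arbitrary example rather than a KBpedia mapping:

    # Retrieve current instances from an external source via one
    # linkage point, using Wikidata's public SPARQL endpoint.
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
    endpoint.setQuery("""
        SELECT ?instance ?instanceLabel WHERE {
          ?instance wdt:P31 wd:Q11424 .  # instances of the 'film' concept
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
        }
        LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)

    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["instanceLabel"]["value"])

The knowledge graph itself stays small this way, while instance data remains fresh in, and maintained by, the contributing source.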
> Of course, this kind of Web-oriented architecture with all resources identified by IRIs is one of the reasons semantic technologies make such great sense.
>
> Lastly, in terms of big useful lessons, I would point to the power of having "correct" KR distinctions between instances (individuals), types (generals or concepts), events, and attributes (monadic characteristics like color or shape) on the noun side. And, on the verb side, relations that split between attributes, direct relations, and representations (indexes and denotations). Look at any top-level ontology or knowledge graph and ask yourself whether and how they handle these distinctions. Most do not, or only hand wave. The distinctions that we use on these matters again come from the insights of Charles Sanders Peirce.
>
> Best, Mike
>
> On 6/16/2020 6:49 PM, Paola Di Maio wrote:
>
> Thank you Mike, looks like a big, interesting project; congrats on the release.
>
> Now, the problem I have with Wikipedia is that, in addition to sometimes containing good articles, it is not fact checked; there is a lot of rubbish/false information (true, there is quite a lot of rubbish outside of Wikipedia too).
>
> A few questions: How often is the data pulled/updated from these databases? Is the data stored in SQL, or how? How does the system manage the integration of different data sets/data structures? Can you share the design of the inference model/reasoning architecture? What are the implications/useful lessons for KR we can learn from this project?
>
> On Tue, Jun 16, 2020 at 10:27 PM Mike Bergman <mike@mkbergman.com> wrote:
>
>> To All,
>>
>> I am pleased to announce that we have released KBpedia <http://kbpedia.org/> v 2.50 with e-commerce and logistics capabilities, as well as significant other refinements. This upgrade comes from adding the entire top structure and the most common products and services of the United Nations Standard Products and Services Code. UNSPSC <https://en.wikipedia.org/wiki/UNSPSC> is a comprehensive, multi-lingual taxonomy for products and services, organized into four levels, with third-party crosswalks to economic and demographic data sources. It is a leading standard for many industrial and economic applications. UNSPSC is KBpedia's seventh core knowledge base, joining the public knowledge bases of Wikipedia <https://en.wikipedia.org/wiki/Wikipedia>, Wikidata <https://en.wikipedia.org/wiki/Wikidata>, GeoNames <https://en.wikipedia.org/wiki/GeoNames>, DBpedia <https://en.wikipedia.org/wiki/DBpedia>, schema.org <https://en.wikipedia.org/wiki/Schema.org>, and OpenCyc <https://en.wikipedia.org/wiki/Cyc> already integrated into the system.
>>
>> KBpedia is a knowledge graph that provides a coherent scaffolding to achieve its twin goals of data interoperability and knowledge-based artificial intelligence (KBAI <http://www.mkbergman.com/category/kbai/>). KBpedia now contains more than 58,000 reference concepts and nearly 200,000 unique mappings to its knowledge bases, enabling links to more than 40 million entities. It is written in the standard OWL 2 <https://en.wikipedia.org/wiki/Web_Ontology_Language> semantic language from the W3C <https://en.wikipedia.org/wiki/World_Wide_Web_Consortium>.
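Mike's noun-side and verb-side distinctions above are easy to see in triple form. A toy illustration using the rdflib Python library; the vocabulary and namespace are invented for the example and are not KBpedia's:

    # Toy triples separating instance, type, attribute, direct
    # relation, and representation (denotation).
    from rdflib import Graph, Literal, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/kr/")
    g = Graph()

    g.add((EX.Rover, RDF.type, EX.Dog))               # instance -> type
    g.add((EX.Dog, RDFS.subClassOf, EX.Mammal))       # type -> more general type
    g.add((EX.Rover, EX.hasColor, Literal("brown")))  # attribute (monadic)
    g.add((EX.Rover, EX.chases, EX.Felix))            # direct relation
    g.add((EX.Rover, RDFS.label, Literal("Rover")))   # representation (denotation)

    print(g.serialize(format="turtle"))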
>> KBpedia consists of 73 mostly disjoint typologies organized under an upper KBpedia Knowledge Ontology (KKO), which is designed according to the universal categories and knowledge representation insights of the great American 19th-century scientist, logician, and polymath, Charles Sanders Peirce <https://en.wikipedia.org/wiki/Charles_Sanders_Peirce>. KBpedia, KKO, and all of its mappings and files are open source under the Creative Commons Attribution 4.0 International (CC BY 4.0) <https://creativecommons.org/licenses/by/4.0/> license.
>>
>> For more details, see the release announcement <http://kbpedia.org/resources/news/kbpedia-adds-ecommerce/> or go to GitHub <https://github.com/Cognonto/kbpedia/blob/master/versions/2.50/> to download <http://kbpedia.org/resources/downloads/> the distro.
>>
>> Thanks, Mike
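Once the distro is downloaded, a quick sanity check of the OWL 2 file is straightforward. A hedged sketch using rdflib; the filename is hypothetical, so substitute whichever OWL file the GitHub release provides:

    # Parse a downloaded KBpedia OWL file and count its classes.
    from rdflib import Graph
    from rdflib.namespace import OWL, RDF

    g = Graph()
    g.parse("kbpedia_reference_concepts.owl", format="xml")  # hypothetical filename

    classes = set(g.subjects(RDF.type, OWL.Class))
    print(f"{len(classes)} owl:Class terms loaded")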
Received on Saturday, 27 June 2020 03:35:57 UTC