Re: Linked data and rdf:type in dbpedia

Hi Tim,

> That was a great session in Banff. 

Yes. I think we were pretty successful in bringing the concept of Linked Data to the conference participants' attention. Our next steps, here in Berlin, will be to:
- publish some more data sources as linked data (CIA Factbook, Project Gutenberg, Eurostat).
- write a tutorial about how to publish information as Linked Data on the Web.
- improve the quality of DBpedia data.

Which brings me to your actual question. Yes, you are right: dbpedia:Category:English_people_by_county shouldn't be a class.

It currently is a class because there is still some old data loaded into the DBpedia SPARQL endpoint / linked data interface which shouldn't be there. The old data is not present in the RDF dumps and will also be removed from the endpoint with the next release of the DBpedia dataset.

Our current approach to categorization/classification is to:

- represent Wikipedia categories using the SKOS vocabulary. That is why you find all the skos:subject statements in the dataset.
- use rdf:type statements only for classes from the YAGO classification, which forms a proper hierarchy.
  An example of such a statement is the :TBL a </class/yago/person> statement you mention in your mail.
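
A minimal sketch of how the leftover statements could be spotted, assuming the data is available as plain (subject, predicate, object) tuples. The helper name, the prefixed predicate strings and the absolute form of the YAGO class URI are assumptions made for illustration; this is not part of the DBpedia extraction code.

```python
# Illustrative only: flag rdf:type statements that point at a Wikipedia
# category instead of a YAGO class. All names here are hypothetical.
RDF_TYPE = "rdf:type"
SKOS_SUBJECT = "skos:subject"
CATEGORY_PREFIX = "http://dbpedia.org/resource/Category:"


def misused_type_statements(triples):
    """Return the rdf:type triples whose object is a Category: resource."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if p == RDF_TYPE and o.startswith(CATEGORY_PREFIX)
    ]


triples = [
    # Intended: category membership expressed with skos:subject.
    ("http://dbpedia.org/resource/Tim_Berners-Lee", SKOS_SUBJECT,
     "http://dbpedia.org/resource/Category:People_from_London"),
    # Intended: rdf:type pointing at a YAGO class (absolute URI assumed).
    ("http://dbpedia.org/resource/Tim_Berners-Lee", RDF_TYPE,
     "http://dbpedia.org/class/yago/person"),
    # Old-style statement of the kind reported in this thread.
    ("http://dbpedia.org/resource/Category:People_from_London", RDF_TYPE,
     "http://dbpedia.org/resource/Category:English_people_by_county"),
]

stale = misused_type_statements(triples)
for s, p, o in stale:
    print(s, p, o)
```

Only the third statement is reported; the skos:subject and YAGO statements pass through untouched.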

We are currently working on refactoring the DBpedia information extraction process in order to improve data quality and currency.

For this, it is extremely helpful if DBpedia users report errors they discover in the dataset.
Our preferred channel for bug reports is the DBpedia bug tracker on SourceForge:
http://sourceforge.net/tracker/?atid=935520&group_id=190976&func=browse


Cheers

Chris



--
Dr. Chris Bizer
Freie Universität Berlin
+49 30 838 54057
chris@bizer.de
www.bizer.de
----- Original Message ----- 
From: Tim Berners-Lee 
To: SW-forum 
Sent: Sunday, May 20, 2007 7:07 PM
Subject: Fwd: Linked data and rdf:type in dbpedia






Begin forwarded message:


From: Tim Berners-Lee <timbl@w3.org>
Date: 2007-05-19 19:27:56 EDT
To: Chris Bizer <chris@bizer.de>
Cc: tabultor@csai.mit.edu
Subject: Linked data and rdf:type in dbpedia


Hi Chris.


That was a great session in Banff. 


I'm looking now at a problem where the Tabulator sucks in huge amounts of dbpedia.  The problem is some rather random rdf:type links.


1. My home page says:


<http://dbpedia.org/resource/Tim_Berners-Lee> = card:i.


2. That causes tab'r to bring in http://dbpedia.org/resource/Tim_Berners-Lee
which in turn says


   <http://dbpedia.org/resource/Tim_Berners-Lee>     a   <http://dbpedia.org/resource/Category:People_from_London>.


3. That causes Tab'r to look up the class Category:People_from_London


$ cwm http://dbpedia.org/resource/Category:People_from_London


This says that a bunch of people have that category as their subject:
   <http://dbpedia.org/resource/Catherine_of_York>     :subject <> .
which is fine, but it also says:


   <>     a </class/yago/person>,
                <http://dbpedia.org/resource/Category:English_people_by_county>,
                <http://dbpedia.org/resource/Category:London>,
<http://dbpedia.org/resource/Category:People_by_city_or_town_in_England>,
                :Concept.


  Here I think the use of rdf:type is incorrect.   The class People_from_London is a class of people.  It is a subclass of Person.


  It has no simple relationship to London. (It is in fact an owl:Restriction on property origin to value london, but I doubt if you can generalize that across dbpedia).


  English people by County  *could* be a class of classes.




The tabulator assumes that every time it follows rdf:type it is going meta: from classes to classes of classes, etc.  It does this because in every other case so far, there have been only a few levels (like 2).


Currently, it can't use dbpedia, as it pulls in memory-busting amounts of it.  It is not even clear that the rdf:type links don't have cycles.
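
One way to keep such a crawl bounded, sketched here under loud assumptions: this is not the Tabulator's actual code, and the function name, the dictionary representation of the links and the default depth of 2 are all hypothetical. The sample links are taken from the Wikipedia category chain discussed in this message.

```python
from collections import deque


def follow_types(start, type_links, max_depth=2):
    """Breadth-first walk along rdf:type links.

    The walk stops expanding a node once it is max_depth hops from the
    start, and a visited set keeps cycles from causing endless fetching.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not look any further up
        for parent in type_links.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return seen


# Hypothetical in-memory stand-in for the links a crawler would fetch.
type_links = {
    "Tim_Berners-Lee": ["Category:Unitarian_Universalists"],
    "Category:Unitarian_Universalists": ["Category:People_by_religion"],
    "Category:People_by_religion": ["Category:People"],
    "Category:People": ["Category:Reference"],
    "Category:Reference": ["Category:Knowledge"],
}

fetched = follow_types("Tim_Berners-Lee", type_links, max_depth=2)
print(sorted(fetched))
```

With a limit of 2 the walk stops at Category:People_by_religion and never reaches Category:Reference, at the cost of possibly missing legitimate classes further up.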


Anyone using OWL with this data will of course find it impossible to deal with classes of classes at all.  I don't know to what extent the issue is an 




Example:


me : Unitarian
http://en.wikipedia.org/wiki/Tim_Berners-Lee


    is a member of the class of


http://en.wikipedia.org/wiki/Category:Unitarian_Universalists


    this is a member of the metaclass:


http://en.wikipedia.org/wiki/Category:People_by_religion


    this is a member of the metametametaclass of ways in which people are categorized


http://en.wikipedia.org/wiki/Category:People


   What follows here is the weak link.  Reference is a section of the library.
    "This category is for information typically found in the reference section of a library: reference works."  Now the meta meta class  is regarded as a work? o-oh.


http://en.wikipedia.org/wiki/Category:Reference


   it continues, following Category (rdf:type in dbpedia):


http://en.wikipedia.org/wiki/Category:Knowledge
http://en.wikipedia.org/wiki/Category:Information
http://en.wikipedia.org/wiki/Category:Physical_quantity
http://en.wikipedia.org/wiki/Category:Measurement
http://en.wikipedia.org/wiki/Category:Scientific_observation
http://en.wikipedia.org/wiki/Category:Data_collection
http://en.wikipedia.org/wiki/Category:Data_management
http://en.wikipedia.org/wiki/Category:Product_development
http://en.wikipedia.org/wiki/Category:Product_management
http://en.wikipedia.org/wiki/Category:Engineering
http://en.wikipedia.org/wiki/Category:Applied_sciences
http://en.wikipedia.org/wiki/Category:Science
http://en.wikipedia.org/wiki/Category:Knowledge


Ooops! It is cyclic.
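
The repeat can be checked mechanically. A small sketch, assuming the chain is given as an ordered list of category names; the function name is made up for illustration, and the list is the one above with the URL prefix dropped.

```python
def first_repeat(chain):
    """Return (name, first_index, repeat_index) for the first entry that
    occurs twice in the chain, or None if every entry is unique."""
    seen = {}
    for i, name in enumerate(chain):
        if name in seen:
            return name, seen[name], i
        seen[name] = i
    return None


chain = [
    "Knowledge", "Information", "Physical_quantity", "Measurement",
    "Scientific_observation", "Data_collection", "Data_management",
    "Product_development", "Product_management", "Engineering",
    "Applied_sciences", "Science", "Knowledge",
]

result = first_repeat(chain)
print(result)  # ('Knowledge', 0, 12)
```

The chain returns to Knowledge after twelve hops, so a crawler that follows these links without a visited set would never terminate.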


The logical relationships are not consistent.  I don't know whether there are a finite number of categories for which rdfs:Class does not work, which could be put into a stop list.  "Reference" would be one.


I wonder whether dbpedia could either find a way of judging which ones are really rdf:type relationships, or just use something vaguer for the relationship.
Maybe wikipedia:category would be best, as that is what it is in general.


Tim

Received on Sunday, 20 May 2007 17:58:21 UTC