Re: Linked data and rdf:type in dbpedia

From: Chris Bizer <chris@bizer.de>
Date: Sun, 20 May 2007 19:58:04 +0200
Message-ID: <009301c79b08$6a245c80$c4e84d57@named4gc1asnuj>
To: "SW-forum" <semantic-web@w3.org>, "Tim Berners-Lee" <timbl@w3.org>
Hi Tim,

> That was a great session in Banff. 

Yes. I think we were pretty successful in bringing the concept of Linked Data to the conference participants' attention. Our next steps, here in Berlin, will be to:
- publish some more data sources as linked data (CIA Factbook, Project Gutenberg, Eurostat).
- write a tutorial about how to publish information as Linked Data on the Web.
- improve the quality of DBpedia data.

Which brings me to your actual question. Yes, you are right: dbpedia:Category:English_people_by_county shouldn't be a class.

It currently is a class because there is still some old data loaded into the DBpedia SPARQL endpoint / linked data interface which shouldn't be there. The old data is not present in the RDF dumps and will also be removed from the endpoint with the next release of the DBpedia dataset.

Our current approach to categorization/classification is to:

- represent Wikipedia categories using the SKOS vocabulary. Therefore you find all the skos:subject statements in the dataset.
- use rdf:type statements only for classes from the YAGO classification, which forms a proper hierarchy.
  An example of such a statement is the :TBL a </class/yago/person> statement you mention in your mail.
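
A minimal sketch of the two rules above, not the actual DBpedia extraction code. The function names are hypothetical, and the two URI prefixes are assumptions inferred from the URIs quoted in this thread (the relative `</class/yago/person>` resolved against `http://dbpedia.org/`); the real layout may differ.

```python
# Hypothetical sketch: pick the predicate for a link from a resource to a
# Wikipedia category or a YAGO class, following the two rules above.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_SUBJECT = "http://www.w3.org/2004/02/skos/core#subject"

# Assumed URI prefixes, inferred from the examples in this thread.
YAGO_PREFIX = "http://dbpedia.org/class/yago/"
CATEGORY_PREFIX = "http://dbpedia.org/resource/Category:"


def predicate_for(target):
    """Return rdf:type for YAGO classes and skos:subject for categories."""
    if target.startswith(YAGO_PREFIX):
        return RDF_TYPE
    if target.startswith(CATEGORY_PREFIX):
        return SKOS_SUBJECT
    raise ValueError("neither a YAGO class nor a Wikipedia category: " + target)


def make_triple(resource, target):
    """Build a (subject, predicate, object) triple for the link."""
    return (resource, predicate_for(target), target)
```

Under this sketch, a link to `Category:People_from_London` would come out as a skos:subject statement, so a client that only follows rdf:type would never be led into the category graph.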

We are currently working on refactoring the DBpedia information extraction process in order to improve data quality and currency.

For this, it is extremely helpful if DBpedia users report errors they discover in the dataset.
Our preferred channel for bug reports is the DBpedia bug tracker on SourceForge.



Dr. Chris Bizer
Freie Universität Berlin
+49 30 838 54057
----- Original Message ----- 
From: Tim Berners-Lee 
To: SW-forum 
Sent: Sunday, May 20, 2007 7:07 PM
Subject: Fwd: Linked data and rdf:type in dbpedia

Begin forwarded message:

From: Tim Berners-Lee <timbl@w3.org>
Date: 2007-05-19 19:27:56 EDT
To: Chris Bizer <chris@bizer.de>
Cc: tabultor@csai.mit.edu
Subject: Linked data and rdf:type in dbpedia

Hi Chris.

That was a great session in Banff. 

I'm looking now at a problem where the Tabulator sucks in huge amounts of dbpedia.  The problem is rather random rdf:type links.

1. My home page says:

<http://dbpedia.org/resource/Tim_Berners-Lee> = card:i.

2. That causes tab'r to bring in http://dbpedia.org/resource/Tim_Berners-Lee
which in turn says

   <http://dbpedia.org/resource/Tim_Berners-Lee>     a   <http://dbpedia.org/resource/Category:People_from_London>.

3. That causes Tab'r to look up the class Category:People_from_London

$ cwm http://dbpedia.org/resource/Category:People_from_London

This lists a bunch of people who have that category as their subject:
   <http://dbpedia.org/resource/Catherine_of_York>     :subject <> .
which is fine, but it also says:

   <>     a </class/yago/person>,

  Here I think the use of rdf:type is incorrect.   The class People_from_London is a class of people.  It is a subclass of Person.

  It has no simple relationship to London. (It is in fact an owl:Restriction on property origin to value london, but I doubt if you can generalize that across dbpedia).

  English people by County  *could* be a class of classes.

The tabulator assumes that every time it follows rdf:type it is going meta: from classes to classes of classes, etc.  It does this because, in every other case so far, there have been only a few levels (like 2).

Currently, it can't use dbpedia as it pulls in memory-busting amounts of it.  It is not even clear that the rdf:type links don't have cycles.
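
One way a client could protect itself is to cap the number of rdf:type levels it follows and to visit each resource only once. The sketch below illustrates that idea; it is not how the Tabulator actually works. The function name, the `types_of` mapping (resource to the list of its rdf:type values, standing in for real fetches) and the default of 2 levels are all assumptions made for the example.

```python
from collections import deque


def collect_types(start, types_of, max_levels=2):
    """Breadth-first walk along rdf:type links, at most max_levels hops deep.

    types_of is a hypothetical in-memory mapping from a resource to the
    list of its rdf:type values. Each resource is visited once, so the
    walk terminates even if the links form a cycle.
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, level = queue.popleft()
        if level == max_levels:
            continue  # do not look up the types of resources at the cap
        for t in types_of.get(node, ()):
            if t not in seen:
                seen.add(t)
                order.append(t)
                queue.append((t, level + 1))
    return order
```

The `seen` set bounds the work by the number of distinct resources, and the level cap bounds it further when the type graph is large but only the first few meta levels are of interest.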

Anyone using OWL with this data will of course find it impossible to deal with classes of classes at all.  I don't know to what extent the issue is an 


me : Unitarian

    is a member of the class of


    this is a member of the metaclass:


    this is a member of the metametametaclass of ways in which people are categorized


   What follows here is the weak link.  Reference is a section of the library.
    "This category is for information typically found in the reference section of a library: reference works."  Now the meta meta class  is regarded as a work? o-oh.


   it continues, following Category (rdf:type in dbpedia):


Ooops! It is cyclic.

The logical relationships are not consistent.  I don't know whether there are a finite number of 
categories for which rdfs:Class does not work, which could be put into a stop list.  "Reference" would be one.
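
The cycle check and the stop-list idea can be sketched together as follows. This is an illustration under stated assumptions, not an existing tool: `find_type_cycle` and the `types_of` mapping (resource to the list of its rdf:type values) are hypothetical, and any data passed in would have to come from the actual dataset.

```python
def find_type_cycle(start, types_of, stop_list=frozenset()):
    """Depth-first search along rdf:type links from start.

    Returns the first cycle found as a list of resources (first and last
    element equal), or None if no cycle is reachable. Resources named in
    stop_list are never followed.
    """
    path = []        # resources on the current chain, in order
    on_path = set()  # same resources, for fast membership tests
    done = set()     # resources fully explored without finding a cycle

    def visit(node):
        if node in stop_list or node in done:
            return None
        if node in on_path:
            return path[path.index(node):] + [node]
        path.append(node)
        on_path.add(node)
        for t in types_of.get(node, ()):
            cycle = visit(t)
            if cycle is not None:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)
```

If a chain loops back through a category such as "Reference", the function reports the loop; calling it again with that category in `stop_list` shows whether cutting it is enough to make the remaining links acyclic from that starting point.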

I wonder whether dbpedia could either find a way of judging which ones are really rdf:type relationships, or just use something vaguer for the relationship.
Maybe wikipedia:category would be best as that is what it is in general.

Received on Sunday, 20 May 2007 17:58:21 UTC
