
Fwd: Linked data and rdf:type in dbpedia

From: Tim Berners-Lee <timbl@w3.org>
Date: Sun, 20 May 2007 13:07:50 -0400
To: SW-forum <semantic-web@w3.org>
Message-Id: <2519F1D0-9C45-4A8C-9FCF-400BC2DF361E@w3.org>


Begin forwarded message:

From: Tim Berners-Lee <timbl@w3.org>
Date: 2007-05-19 19:27:56 EDT
To: Chris Bizer <chris@bizer.de>
Cc: tabultor@csai.mit.edu
Subject: Linked data and rdf:type in dbpedia

Hi Chris.

That was a great session in Banff.

I'm looking now at a problem where the Tabulator sucks in huge
amounts of dbpedia.  The problem is some rather random rdf:type links.

1. My home page says:

	<http://dbpedia.org/resource/Tim_Berners-Lee> = card:i.

2. That causes tab'r to bring in
http://dbpedia.org/resource/Tim_Berners-Lee
which in turn says

    <http://dbpedia.org/resource/Tim_Berners-Lee>     a   <http://dbpedia.org/resource/Category:People_from_London>.

3. That causes Tab'r to look up the class Category:People_from_London

$ cwm http://dbpedia.org/resource/Category:People_from_London

This says a bunch of people have that category as their subject:
    <http://dbpedia.org/resource/Catherine_of_York>     :subject <> .
which is fine, but it also says:

    <>     a </class/yago/person>,
                 <http://dbpedia.org/resource/Category:English_people_by_county>,
                 <http://dbpedia.org/resource/Category:London>,
                 <http://dbpedia.org/resource/Category:People_by_city_or_town_in_England>,
                 :Concept.

   Here I think the use of rdf:type is incorrect.   The class
People_from_London is a class of people.  It is a subclass of Person.

   It has no simple relationship to London. (It is in fact an
owl:Restriction on property origin to value London, but I doubt if
you can generalize that across dbpedia).

   English people by County  *could* be a class of classes.


The tabulator assumes that every time it follows rdf:type it is going
meta: from classes to classes of classes, etc.  It does this because, in
every other case so far, there have been only a few levels (like 2).

Currently, it can't use dbpedia as it pulls in memory-busting amounts
of it.  It is not even clear that the rdf:type links don't have cycles.
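
A minimal sketch of the kind of guard that would stop a client pulling in unbounded data: follow rdf:type links only to a fixed depth, and remember what has already been seen. This is not the Tabulator's code. The `types_of` dict stands in for dereferencing each URI, the function name is made up, and the `ex:FurtherUp` entry is a hypothetical extra level added only to show the cut-off.

```python
from collections import deque

# Stand-in for the web: resource -> its rdf:type values (shortened names).
TYPES = {
    "Tim_Berners-Lee": ["Category:People_from_London"],
    "Category:People_from_London": [
        "yago:person",
        "Category:English_people_by_county",
        "Category:London",
        "Category:People_by_city_or_town_in_England",
        "Concept",
    ],
    "Category:London": ["ex:FurtherUp"],  # hypothetical, for illustration only
}

def follow_types(start, types_of, max_depth=2):
    """Breadth-first walk over rdf:type links, at most max_depth levels up.

    Returns the set of resources reached, excluding start itself.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not go any further "meta" than max_depth
        for cls in types_of.get(node, []):
            if cls not in seen:  # also protects against cycles
                seen.add(cls)
                queue.append((cls, depth + 1))
    seen.discard(start)
    return seen

print(sorted(follow_types("Tim_Berners-Lee", TYPES, max_depth=2)))
```

With max_depth=2 the walk stops at the categories of People_from_London and never asks for anything above Category:London.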

Anyone using OWL with this data will of course find it impossible to
deal with classes of classes at all.  I don't know to what extent the
issue is an


Example:

me : Unitarian
http://en.wikipedia.org/wiki/Tim_Berners-Lee

     is a member of the class of

http://en.wikipedia.org/wiki/Category:Unitarian_Universalists

     this is a member of the metaclass:

http://en.wikipedia.org/wiki/Category:People_by_religion

     this is a member of the metametaclass of ways in which people
are categorized

http://en.wikipedia.org/wiki/Category:People

    What follows here is the weak link.  Reference is a section of  
the library.
     "This category is for information typically found in the  
reference section of a library: reference works."  Now the meta meta  
class  is regarded as a work? o-oh.

http://en.wikipedia.org/wiki/Category:Reference

    it continues, following Category (rdf:type in dbpedia):

http://en.wikipedia.org/wiki/Category:Knowledge
http://en.wikipedia.org/wiki/Category:Information
http://en.wikipedia.org/wiki/Category:Physical_quantity
http://en.wikipedia.org/wiki/Category:Measurement
http://en.wikipedia.org/wiki/Category:Scientific_observation
http://en.wikipedia.org/wiki/Category:Data_collection
http://en.wikipedia.org/wiki/Category:Data_management
http://en.wikipedia.org/wiki/Category:Product_development
http://en.wikipedia.org/wiki/Category:Product_management
http://en.wikipedia.org/wiki/Category:Engineering
http://en.wikipedia.org/wiki/Category:Applied_sciences
http://en.wikipedia.org/wiki/Category:Science
http://en.wikipedia.org/wiki/Category:Knowledge

Ooops! It is cyclic.
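
The loop can be found mechanically. A minimal sketch, assuming each category is reduced to the single parent listed in this message (real categories usually have several parents, so a real check would need a graph walk); the function name is made up for illustration.

```python
# The chain as listed above, each entry linking to the next via Category.
CHAIN = [
    "Reference", "Knowledge", "Information", "Physical_quantity",
    "Measurement", "Scientific_observation", "Data_collection",
    "Data_management", "Product_development", "Product_management",
    "Engineering", "Applied_sciences", "Science", "Knowledge",
]

def find_cycle(chain):
    """Return the part of chain from the first repeated name back to itself,
    or None if no name occurs twice."""
    first_seen = {}
    for i, name in enumerate(chain):
        if name in first_seen:
            return chain[first_seen[name]:i + 1]
        first_seen[name] = i
    return None

cycle = find_cycle(CHAIN)
print(len(cycle) - 1, "links from", cycle[0], "back to", cycle[-1])
```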

The logical relationships are not consistent.  I don't know whether
there are a finite number of categories for which rdfs:class does not
work, which could be put into a stop list.  "Reference" would be one.
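
If such a stop list did turn out to be feasible, applying it would be simple. A sketch only: the list holds just the one category named here, its dbpedia URI is assumed from the URI pattern seen earlier, and the function name is hypothetical.

```python
# Assumed dbpedia URI for the "Reference" category; the only entry so far.
STOP_LIST = {"http://dbpedia.org/resource/Category:Reference"}

def usable_types(type_uris, stop_list=STOP_LIST):
    """Drop rdf:type values that are on the stop list, keeping the order."""
    return [uri for uri in type_uris if uri not in stop_list]

print(usable_types([
    "http://dbpedia.org/resource/Category:Reference",
    "http://dbpedia.org/resource/Category:London",
]))
```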

I wonder whether dbpedia could either find a way of judging which  
ones are really rdf:type relationships, or just use something vaguer  
for the relationship.
Maybe wikipedia:category would be best, as that is what it is in general.

Tim






Received on Sunday, 20 May 2007 17:07:59 UTC
