- From: Chris Bizer <chris@bizer.de>
- Date: Sun, 20 May 2007 19:58:04 +0200
- To: "SW-forum" <semantic-web@w3.org>, "Tim Berners-Lee" <timbl@w3.org>
- Message-ID: <009301c79b08$6a245c80$c4e84d57@named4gc1asnuj>
Hi Tim,

> That was a great session in Banff.

Yes. I think we were pretty successful in bringing the concept of Linked Data to the conference participants' attention.

Our next steps, here in Berlin, will be to:

- publish some more data sources as Linked Data (CIA Factbook, Project Gutenberg, Eurostat).
- write a tutorial about how to publish information as Linked Data on the Web.
- improve the quality of DBpedia data.

Which brings me to your actual question.

Yes, you are right: dbpedia:Category:English_people_by_county shouldn't be a class. It currently is a class because there is still some old data loaded into the DBpedia SPARQL endpoint / Linked Data interface which shouldn't be there. The old data is not present in the RDF dumps and will also be removed from the endpoint with the next release of the DBpedia dataset.

Our current approach to categorization/classification is to:

- represent Wikipedia categories using the SKOS vocabulary. This is why you find all the skos:subject statements in the dataset.
- use rdf:type statements only for classes from the YAGO classification, which forms a proper hierarchy. An example of such a statement is the :TBL a </class/yago/person> statement you mention in your mail.

We are currently working on refactoring the DBpedia information extraction process in order to improve data quality and currency. For this, it is extremely helpful if DBpedia users report errors they discover in the dataset. Our preferred channel for bug reports is the DBpedia bug tracker on SourceForge:

http://sourceforge.net/tracker/?atid=935520&group_id=190976&func=browse

Cheers

Chris

--
Dr. Chris Bizer
Freie Universität Berlin
+49 30 838 54057
chris@bizer.de
www.bizer.de

----- Original Message -----
From: Tim Berners-Lee
To: SW-forum
Sent: Sunday, May 20, 2007 7:07 PM
Subject: Fwd: Linked data and rdf:type in dbpedia

Begin forwarded message:

From: Tim Berners-Lee <timbl@w3.org>
Date: 2007-05-19 19:27:56 EDT
To: Chris Bizer <chris@bizer.de>
Cc: tabultor@csai.mit.edu
Subject: Linked data and rdf:type in dbpedia

Hi Chris.

That was a great session in Banff.

I'm looking now at a problem where the Tabulator sucks in huge amounts of dbpedia. The problem is rather random rdf:type links.

1. My home page says:

   <http://dbpedia.org/resource/Tim_Berners-Lee> = card:i.

2. That causes tab'r to bring in http://dbpedia.org/resource/Tim_Berners-Lee which in turn says

   <http://dbpedia.org/resource/Tim_Berners-Lee> a <http://dbpedia.org/resource/Category:People_from_London>.

3. That causes Tab'r to look up the class Category:People_from_London

   $ cwm http://dbpedia.org/resource/Category:People_from_London

This lists a bunch of people who have that category as their subject:

   <http://dbpedia.org/resource/Catherine_of_York> :subject <> .

which is fine, but it also says:

   <> a </class/yago/person>,
        <http://dbpedia.org/resource/Category:English_people_by_county>,
        <http://dbpedia.org/resource/Category:London>,
        <http://dbpedia.org/resource/Category:People_by_city_or_town_in_England>,
        :Concept.

Here I think the use of rdf:type is incorrect. The class People_from_London is a class of people. It is a subclass of Person. It has no simple relationship to London. (It is in fact an owl:Restriction on property origin to value london, but I doubt if you can generalize that across dbpedia.) English people by County *could* be a class of classes.

The tabulator assumes that every time it follows rdf:type it is going meta: from classes to classes of classes, etc. It does this because in every other case so far, there have been only a few levels (like 2).
Currently, it can't use dbpedia, as it pulls in memory-busting amounts of it. It is not even clear that the rdf:type links don't have cycles. Anyone using OWL with this data will of course find it impossible to deal with classes of classes at all. I don't know to what extent the issue is an

Example: me : Unitarian

http://en.wikipedia.org/wiki/Tim_Berners-Lee
is a member of the class of
http://en.wikipedia.org/wiki/Category:Unitarian_Universalists
this is a member of the metaclass:
http://en.wikipedia.org/wiki/Category:People_by_religion
this is a member of the metametametaclass of ways in which people are categorized
http://en.wikipedia.org/wiki/Category:People

What follows here is the weak link. Reference is a section of the library. "This category is for information typically found in the reference section of a library: reference works." Now the meta meta class is regarded as a work? o-oh.

http://en.wikipedia.org/wiki/Category:Reference

it continues, following Category (rdf:type in dbpedia):

http://en.wikipedia.org/wiki/Category:Knowledge
http://en.wikipedia.org/wiki/Category:Information
http://en.wikipedia.org/wiki/Category:Physical_quantity
http://en.wikipedia.org/wiki/Category:Measurement
http://en.wikipedia.org/wiki/Category:Scientific_observation
http://en.wikipedia.org/wiki/Category:Data_collection
http://en.wikipedia.org/wiki/Category:Data_management
http://en.wikipedia.org/wiki/Category:Product_development
http://en.wikipedia.org/wiki/Category:Product_management
http://en.wikipedia.org/wiki/Category:Engineering
http://en.wikipedia.org/wiki/Category:Applied_sciences
http://en.wikipedia.org/wiki/Category:Science
http://en.wikipedia.org/wiki/Category:Knowledge

Ooops! It is cyclic. The logical relationships are not consistent.

I don't know whether there are a finite number of categories for which rdfs:class does not work, which could be put into a stop list. "Reference" would be one.
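
The chain listed above can be checked for loops mechanically. The following is only a minimal sketch, not anything taken from the Tabulator or DBpedia: the PARENT table and the find_cycle helper are hypothetical names, and the table assumes each category has the single parent that was followed in the walk above (real Wikipedia categories usually have several).

```python
from typing import Dict, List, Optional

# Parent links copied from the category walk above. Names are the part of
# the URL after "Category:". One parent per category is an assumption made
# for illustration only.
PARENT: Dict[str, str] = {
    "Unitarian_Universalists": "People_by_religion",
    "People_by_religion": "People",
    "People": "Reference",
    "Reference": "Knowledge",
    "Knowledge": "Information",
    "Information": "Physical_quantity",
    "Physical_quantity": "Measurement",
    "Measurement": "Scientific_observation",
    "Scientific_observation": "Data_collection",
    "Data_collection": "Data_management",
    "Data_management": "Product_development",
    "Product_development": "Product_management",
    "Product_management": "Engineering",
    "Engineering": "Applied_sciences",
    "Applied_sciences": "Science",
    "Science": "Knowledge",
}


def find_cycle(start: str, parent: Dict[str, str]) -> Optional[List[str]]:
    """Follow parent links from start; return the looping part of the path,
    or None if the walk ends at a category with no recorded parent."""
    position: Dict[str, int] = {}  # category -> index in path
    path: List[str] = []
    node: Optional[str] = start
    while node is not None:
        if node in position:
            # Seen before: everything from its first visit onward is the loop.
            return path[position[node]:]
        position[node] = len(path)
        path.append(node)
        node = parent.get(node)
    return None


cycle = find_cycle("Unitarian_Universalists", PARENT)
print(cycle)
```

With the table as given, the walk re-enters Knowledge after Science, so the returned loop starts at Knowledge and ends at Science.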
I wonder whether dbpedia could either find a way of judging which ones are really rdf:type relationships, or just use something vaguer for the relationship. Maybe wikepedia:category would be best as that is what it is in general.

Tim
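
To illustrate why the choice of predicate matters to a client that follows rdf:type, here is a rough sketch under stated assumptions. The follow_types helper, the triple lists and the depth limit of 2 are all hypothetical; this is not the Tabulator's code, and the data is a hand-written fragment based on the statements quoted in this thread.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def follow_types(start: str, triples: Iterable[Triple], max_depth: int = 2) -> Set[str]:
    """Breadth-first walk over rdf:type links only, at most max_depth hops."""
    type_of: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == "rdf:type":  # every other predicate is ignored
            type_of.setdefault(s, []).append(o)

    reached: Set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for cls in type_of.get(node, []):
            if cls not in reached:
                reached.add(cls)
                queue.append((cls, depth + 1))
    return reached


# Categories expressed with rdf:type, as in the data described in the
# forwarded message.
CATEGORIES_AS_TYPES: List[Triple] = [
    (":TBL", "rdf:type", "Category:People_from_London"),
    (":TBL", "rdf:type", "/class/yago/person"),
    ("Category:People_from_London", "rdf:type", "Category:London"),
    ("Category:People_from_London", "rdf:type", "Category:English_people_by_county"),
]

# Categories expressed with skos:subject, rdf:type kept for the YAGO class,
# as in the approach described in the reply.
CATEGORIES_AS_SUBJECTS: List[Triple] = [
    (":TBL", "skos:subject", "Category:People_from_London"),
    (":TBL", "rdf:type", "/class/yago/person"),
]

print(sorted(follow_types(":TBL", CATEGORIES_AS_TYPES)))
print(sorted(follow_types(":TBL", CATEGORIES_AS_SUBJECTS)))
```

In this fragment the first walk pulls in four resources within two hops, while the second reaches only the YAGO class, because the category link is no longer an rdf:type link.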
Received on Sunday, 20 May 2007 17:58:21 UTC