Re: DBpedia hosting burden

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Thu, 15 Apr 2010 15:08:02 +0100
Message-ID: <4BC71DC2.4090002@talis.com>
To: Kingsley Idehen <kidehen@openlinksw.com>
CC: public-lod@w3.org, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>


On 15/04/2010 2:44 PM, Kingsley Idehen wrote:
> Andy,
>
> Great stuff, this is also why we are going to leave the current DBpedia
> 3.5 instance to stew for a while (until end of this week or a little
> later).
>
> DBpedia users:
> Now is the time to identify problems with the DBpedia 3.5 dataset dumps.
> We don't want to keep reloading DBpedia (the Static Edition, then
> recalibrating DBpedia-Live) because of faulty-dataset issues; we
> do have other operational priorities.

"Faulty" is a bit strong.

Many of the warnings are about data that is still legal RDF: literals
whose lexical form is bad for their datatype, or IRIs that trigger some
of the standard warnings but are still legal IRIs.  Should they be
included or not?  It seems to me you can argue both for and against.
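
To make "legal RDF, but a bad lexical form" concrete: a literal such as
"2010-02-30"^^xsd:date parses without complaint, yet its lexical form is
not in the lexical space of xsd:date. The sketch below is only an
illustration, not the checker that produced the warnings discussed here.
The function name lexical_form_ok and the two simplified patterns are
assumptions made for the example; they cover a small subset of XSD.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical-space patterns (assumption: a small subset of XSD,
# with years restricted to exactly four digits).
_INTEGER = re.compile(r"[+-]?[0-9]+")
_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})(Z|[+-][0-9]{2}:[0-9]{2})?")


def lexical_form_ok(lexical, datatype):
    """Return True if `lexical` looks valid for `datatype` (hypothetical helper)."""
    if datatype == XSD + "integer":
        return _INTEGER.fullmatch(lexical) is not None
    if datatype == XSD + "date":
        m = _DATE.fullmatch(lexical)
        if m is None:
            return False
        try:
            # Rejects impossible calendar dates such as 30 February.
            date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        except ValueError:
            return False
        return True
    # Datatypes this sketch does not know about are not checked.
    return True
```

A loader could use such a check either to drop the triple or to keep it
and log a warning, which is exactly the choice being argued about.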

external_links_en.nt.bz2 is the largest source of broken IRIs.
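
For anyone who wants a rough count per dump file, here is a minimal
sketch that separates IRIs that are broken from IRIs that are legal but
would typically draw a warning. It is not the tool that produced the
warnings discussed here. The names classify_iri, count_iri_issues and
scan_dump are hypothetical, the "warning" rule (upper-case letters in
the scheme or host) is just one assumed example of such a warning, and
N-Triples \u escapes are not decoded.

```python
import bz2
import re
from collections import Counter

# Characters never allowed unescaped in an IRI (backslash is left out
# because this sketch does not decode N-Triples \u escapes).
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`]')
# scheme ":" then an optional "//authority"
_PREFIX = re.compile(r"([A-Za-z][A-Za-z0-9+.-]*):(?://([^/?#]*))?")
# Anything between angle brackets; this also matches "<...>" inside
# literals, which is acceptable for a rough count.
_IRI_TERM = re.compile(r"<([^>]*)>")


def classify_iri(iri):
    """Return 'broken', 'warning' or 'ok' for one IRI string."""
    m = _PREFIX.match(iri)
    if m is None or _FORBIDDEN.search(iri):
        return "broken"
    scheme, authority = m.group(1), m.group(2) or ""
    host = authority.rsplit("@", 1)[-1]  # ignore any userinfo part
    if scheme != scheme.lower() or host != host.lower():
        return "warning"  # legal IRI, but not in the usual lower-case form
    return "ok"


def count_iri_issues(lines):
    """Tally classifications over an iterable of N-Triples lines."""
    counts = Counter()
    for line in lines:
        for iri in _IRI_TERM.findall(line):
            counts[classify_iri(iri)] += 1
    return counts


def scan_dump(path):
    """Stream a .nt.bz2 dump without unpacking it to disk first."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return count_iri_issues(f)
```

Calling scan_dump("external_links_en.nt.bz2") on a local copy of the
dump would give one Counter for that file; running it per file makes it
easy to see which dumps contribute most of the problems.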

DBpedia is a wonderful and important dataset and, being derived from
elsewhere, is unlikely ever to be "perfect" (for some definition of
"perfect").  Better to have the data than to wait for perfection.

	Andy
Received on Thursday, 15 April 2010 14:08:29 UTC
