
Re: DBpedia hosting burden

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 15 Apr 2010 09:44:20 -0400
Message-ID: <4BC71834.4000202@openlinksw.com>
To: Andy Seaborne <andy.seaborne@talis.com>
CC: public-lod@w3.org, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
Andy Seaborne wrote:
> I ran the files from 
> http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt 
> through an N-Triples parser with checking:
> The report is here (it's 25K lines long):
> http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
> It covers both strict errors and warnings of ill-advised forms.
> A few examples:
> Bad IRI: <=?(''[[Nepenthes>
> Bad IRI: <http://www.european-athletics.org‎>
> Bad lexical forms for the value space:
> "1967-02-31"^^http://www.w3.org/2001/XMLSchema#date
> (there is no February the 31st)
> Warning of well known ports of other protocols:
> http://stream1.securenetsystems.net:443
> Warning about explicit port 80:
> http://bibliotecadigitalhispanica.bne.es:80/
> and use of . and .. in absolute URIs which are all from the standard 
> list of IRI warnings.
> Bad IRI: <http://dbpedia.org/resource/..> Code: 
> 8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../ 
> not at the beginning of a relative reference, or it contains a /./ 
> These should be removed.
>     Andy
> Software used:
> The IRI checker, by Jeremy Carroll, is available from
> http://www.openjena.org/iri/ and Maven.
> The lexical form checking is done by Apache Xerces.
> The N-triples parser is the one from TDB v0.8.5 which bundles the 
> above two together.
> On 15/04/2010 9:54 AM, Malte Kiesel wrote:
>> Ivan Mikhailov wrote:
>>> If I were The Emperor of LOD I'd ask all grand dukes of datasources to
>>> put fresh dumps at some torrent with control of UL/DL ratio :)
>> Last time I checked (which was quite a while ago though), loading
>> DBpedia in a normal triple store such as Jena TDB didn't work very well
>> due to many issues with the DBpedia RDF (e.g., problems with the URIs of
>> external links scraped from Wikipedia).
>> I don't know whether this is a bug in TDB or DBpedia but I guess this is
>> one of the problems causing people to use DBpedia online only - even if,
>> due to performance reasons, running it locally would be far better.
>> Regards
>> Malte
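
For anyone who wants to pre-screen a dump before loading it, the kinds of checks Andy lists can be approximated with a few lines of Python. This is only a rough sketch using the standard library, not the Jena IRI checker or Xerces, and it covers far fewer cases than either. The names is_valid_date, iri_problems and DEFAULT_PORTS are made up for illustration, and the date check assumes the plain YYYY-MM-DD form (no time zone, no negative years).

```python
import re
import unicodedata
from datetime import date
from urllib.parse import urlsplit

# Hypothetical table of default ports; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_valid_date(lexical):
    """True if lexical is a plain YYYY-MM-DD string naming a real calendar day."""
    m = DATE_RE.fullmatch(lexical)
    if m is None:
        return False
    try:
        date(*map(int, m.groups()))  # raises ValueError for e.g. February 31
    except ValueError:
        return False
    return True


def iri_problems(iri):
    """Return a list of human-readable problems found in an absolute IRI."""
    problems = []
    # Whitespace, control characters and invisible format characters
    # (such as U+200E) are easy to pick up when scraping.
    if any(ch.isspace() or unicodedata.category(ch) in ("Cc", "Cf") for ch in iri):
        problems.append("contains whitespace, control or invisible format characters")
    try:
        parts = urlsplit(iri)
        port = parts.port  # may raise ValueError for a malformed port
    except ValueError as exc:
        return problems + ["unparseable: %s" % exc]
    if not parts.scheme:
        problems.append("not an absolute IRI (no scheme)")
    if port is not None:
        if DEFAULT_PORTS.get(parts.scheme) == port:
            problems.append("explicit default port")
        elif port in DEFAULT_PORTS.values():
            problems.append("well-known port of another protocol")
    if any(seg in (".", "..") for seg in parts.path.split("/")):
        problems.append("path contains a . or .. segment")
    return problems


if __name__ == "__main__":
    print(is_valid_date("1967-02-31"))
    for example in (
        "=?(''[[Nepenthes",
        "http://www.european-athletics.org\u200e",
        "http://stream1.securenetsystems.net:443",
        "http://bibliotecadigitalhispanica.bne.es:80/",
        "http://dbpedia.org/resource/..",
    ):
        print(repr(example), iri_problems(example))
```

Running it over the examples quoted above should flag each of them. A real loader would still want the full checkers Andy mentions.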

Great stuff. This is also why we are going to leave the current DBpedia 
3.5 instance to stew for a while (until the end of this week or a little later).

DBpedia users:
Now is the time to identify problems with the DBpedia 3.5 dataset dumps. 
We don't want to keep reloading DBpedia (the Static Edition, followed by 
recalibrating DBpedia-Live) because of faulty datasets; we do have other 
operational priorities.



Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 
Received on Thursday, 15 April 2010 13:44:52 UTC
