- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Thu, 15 Apr 2010 09:44:20 -0400
- To: Andy Seaborne <andy.seaborne@talis.com>
- CC: public-lod@w3.org, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
Andy Seaborne wrote:
> I ran the files from
> http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
> through an N-Triples parser with checking:
>
> The report is here (it's 25K lines long):
>
> http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
>
> It covers both strict errors and warnings of ill-advised forms.
>
> A few examples:
>
> Bad IRI: <=?(''[[Nepenthes>
> Bad IRI: <http://www.european-athletics.org>
>
> Bad lexical forms for the value space:
> "1967-02-31"^^http://www.w3.org/2001/XMLSchema#date
> (there is no February the 31st)
>
>
> Warning of well known ports of other protocols:
> http://stream1.securenetsystems.net:443
>
> Warning about explicit port 80:
>
> http://bibliotecadigitalhispanica.bne.es:80/
>
> and use of . and .. in absolute URIs which are all from the standard
> list of IRI warnings.
>
> Bad IRI: <http://dbpedia.org/resource/..> Code:
> 8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../
> not at the beginning of a relative reference, or it contains a /./
> These should be removed.
>
> Andy
>
> Software used:
>
> The IRI checker, by Jeremy Carroll, is available from
> http://www.openjena.org/iri/ and Maven.
>
> The lexical form checking is done by Apache Xerces.
>
> The N-triples parser is the one from TDB v0.8.5 which bundles the
> above two together.
>
>
> On 15/04/2010 9:54 AM, Malte Kiesel wrote:
>> Ivan Mikhailov wrote:
>>
>>> If I were The Emperor of LOD I'd ask all grand dukes of datasources to
>>> put fresh dumps at some torrent with control of UL/DL ratio :)
>>
>> Last time I checked (which was quite a while ago though), loading
>> DBpedia in a normal triple store such as Jena TDB didn't work very well
>> due to many issues with the DBpedia RDF (e.g., problems with the URIs of
>> external links scraped from Wikipedia).
>>
>> I don't know whether this is a bug in TDB or DBpedia but I guess this is
>> one of the problems causing people to use DBpedia online only - even if,
>> due to performance reasons, running it locally would be far better.
>>
>> Regards
>> Malte
>>
>
>

Andy,

Great stuff. This is also why we are going to leave the current DBpedia 3.5
instance to stew for a while (until the end of this week or a little later).

DBpedia users:
Now is the time to identify problems with the DBpedia 3.5 dataset dumps. We
don't want to keep reloading DBpedia (the Static Edition, followed by
recalibrating DBpedia-Live) because of faulty datasets; we do have other
operational priorities.

--
Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
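For anyone who wants a rough feel for the categories in Andy's report (IRIs that are not well formed, suspicious ports, dot segments, impossible dates), here is a minimal Python sketch. It is not the Jena IRI checker or Xerces, and it will not reproduce the full report: the function names check_iri and check_xsd_date, the forbidden-character set, and the port table are all assumptions made for illustration, and the date check ignores time zones and years outside four digits.

```python
import re
from datetime import date
from urllib.parse import urlsplit

# Assumption: a simplified set of characters that may not appear literally in an IRI.
FORBIDDEN = set('<>"{}|\\^` ')

# Assumption: a small illustrative table of well-known ports of other protocols.
OTHER_PROTOCOL_PORTS = {21: "ftp", 22: "ssh", 25: "smtp", 443: "https"}


def check_iri(iri):
    """Return a list of (level, message) tuples for one IRI (without the <> delimiters)."""
    if any(ch in FORBIDDEN for ch in iri):
        return [("error", "forbidden character in IRI")]
    try:
        parts = urlsplit(iri)
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return [("error", "IRI could not be parsed")]
    if not parts.scheme:
        return [("error", "not an absolute IRI (no scheme)")]

    problems = []
    if parts.scheme == "http":
        if port == 80:
            problems.append(("warning", "explicit default port 80"))
        elif port in OTHER_PROTOCOL_PORTS:
            other = OTHER_PROTOCOL_PORTS[port]
            problems.append(("warning", f"port {port} is the well-known port of {other}"))
    # A "." or ".." path segment in an absolute IRI should have been removed.
    segments = parts.path.split("/")
    if "." in segments or ".." in segments:
        problems.append(("warning", "path contains a . or .. segment"))
    return problems


def check_xsd_date(lexical):
    """Return True if lexical is a YYYY-MM-DD form that names a real calendar day."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", lexical)
    if match is None:
        return False
    year, month, day = (int(group) for group in match.groups())
    try:
        date(year, month, day)  # rejects e.g. February 31st
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for iri in (
        "=?(''[[Nepenthes",
        "http://stream1.securenetsystems.net:443",
        "http://bibliotecadigitalhispanica.bne.es:80/",
        "http://dbpedia.org/resource/..",
    ):
        print(iri, check_iri(iri))
    print("1967-02-31", check_xsd_date("1967-02-31"))
```

Running a pass like this over a sample of a dump before a full load could give an early hint of how much cleanup is needed, though a real checker such as the ones Andy lists covers far more cases.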
Received on Thursday, 15 April 2010 13:44:52 UTC