- From: Andy Seaborne <andy.seaborne@talis.com>
- Date: Thu, 15 Apr 2010 13:36:04 +0100
- To: public-lod@w3.org
- CC: dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
I ran the files from http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt through an N-Triples parser with checking. The report is here (it's 25K lines long):

http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt

It covers both strict errors and warnings of ill-advised forms. A few examples:

Bad IRI: <=?(''[[Nepenthes>
Bad IRI: <http://www.european-athletics.org>

Bad lexical forms for the value space:
"1967-02-31"^^<http://www.w3.org/2001/XMLSchema#date>
(there is no February the 31st)

Warning of well-known ports of other protocols:
http://stream1.securenetsystems.net:443

Warning about an explicit port 80:
http://bibliotecadigitalhispanica.bne.es:80/

and use of . and .. in absolute URIs, which are all from the standard list of IRI warnings:

Bad IRI: <http://dbpedia.org/resource/..> Code: 8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../ not at the beginning of a relative reference, or it contains a /./ These should be removed.

Andy

Software used:

The IRI checker, by Jeremy Carroll, is available from http://www.openjena.org/iri/ and Maven.
The lexical form checking is done by Apache Xerces.
The N-Triples parser is the one from TDB v0.8.5, which bundles the above two together.

On 15/04/2010 9:54 AM, Malte Kiesel wrote:
> Ivan Mikhailov wrote:
>
>> If I were The Emperor of LOD I'd ask all grand dukes of datasources to
>> put fresh dumps at some torrent with control of UL/DL ratio :)
>
> Last time I checked (which was quite a while ago though), loading
> DBpedia in a normal triple store such as Jena TDB didn't work very well
> due to many issues with the DBpedia RDF (e.g., problems with the URIs of
> external links scraped from Wikipedia).
>
> I don't know whether this is a bug in TDB or DBpedia but I guess this is
> one of the problems causing people to use DBpedia online only - even if,
> due to performance reasons, running it locally would be far better.
>
> Regards
> Malte
>
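For readers who want to see what the reported problems mean in practice, here is a minimal Python sketch using only the standard library. It approximates a few of the checks described in the message: a date lexical form that names no real day, a relative (non-absolute) IRI, an explicit default port, a well-known port of another protocol, and "." or ".." path segments. It is not how the Jena IRI checker or Xerces work. The function names, the warning labels, and the small port table are hypothetical, and only plain YYYY-MM-DD dates are handled (no timezones).

```python
import re
from datetime import date
from urllib.parse import urlsplit

# Hypothetical, hand-picked table mapping well-known ports to the scheme that
# normally uses them (not the real checker's list).
WELL_KNOWN_PORTS = {21: "ftp", 22: "ssh", 25: "smtp", 80: "http", 443: "https"}

# Simplification: only plain YYYY-MM-DD, no timezone, no negative years.
_XSD_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def valid_xsd_date(lexical: str) -> bool:
    """True if a plain YYYY-MM-DD lexical form names a real calendar day."""
    m = _XSD_DATE.fullmatch(lexical)
    if m is None:
        return False
    try:
        # date() raises ValueError for impossible days such as 1967-02-31.
        date(*(int(g) for g in m.groups()))
    except ValueError:
        return False
    return True


def iri_warnings(iri: str) -> list[str]:
    """Return hypothetical warning labels for a few issues like those reported."""
    parts = urlsplit(iri)
    found = []
    if not parts.scheme:
        # N-Triples requires absolute IRIs, so a missing scheme is an error.
        found.append("not-absolute")
    try:
        port = parts.port  # None if absent; ValueError if non-numeric/out of range
    except ValueError:
        found.append("bad-port")
        port = None
    if port is not None:
        owner = WELL_KNOWN_PORTS.get(port)
        if owner == parts.scheme:
            found.append("explicit-default-port")  # e.g. http://host:80/
        elif owner is not None:
            found.append("other-protocol-port")  # e.g. http://host:443
    # Dot segments should have been removed from an absolute IRI (RFC 3986, 5.2.4).
    if any(seg in (".", "..") for seg in parts.path.split("/")):
        found.append("dot-segment")
    return found
```

Run against the examples from the report, `valid_xsd_date("1967-02-31")` returns `False`, `iri_warnings("http://dbpedia.org/resource/..")` returns `["dot-segment"]`, and `iri_warnings("http://stream1.securenetsystems.net:443")` returns `["other-protocol-port"]`. A real checker applies the full RFC 3987 grammar and the complete XML Schema lexical rules, so this sketch only illustrates the idea.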
Received on Thursday, 15 April 2010 12:36:33 UTC