- From: Chris Bizer <chris@bizer.de>
- Date: Thu, 15 Apr 2010 16:40:32 +0200
- To: <public-lod@w3.org>, "'dbpedia-discussion'" <dbpedia-discussion@lists.sourceforge.net>
- Cc: "'Kingsley Idehen'" <kidehen@openlinksw.com>, "'Andy Seaborne'" <andy.seaborne@talis.com>
Hi all,

> Great stuff, this is also why we are going to leave the current DBpedia
> 3.5 instance to stew for a while (until end of this week or a little later).
>
> DBpedia users:
> Now is the time to identify problems with the DBpedia 3.5 dataset dumps.
> We don't want to continue reloading DBpedia (Static Edition and then
> recalibrating DBpedia-Live) based on faulty datasets related matters, we
> do have other operational priorities etc..

Yes, community testing has exposed enough small and medium-sized bugs in the
datasets that we are going to extract a new, fixed 3.5.1 release next week.

In my opinion, the bugs do not diminish Robert's and Anja's great achievement
of porting the extraction framework from PHP to Scala. If you rewrite more
than 10,000 lines of code for something as complex as a multilingual
Wikipedia extraction, I think it is normal that some minor bugs remain even
after their rigorous testing.

So, if you have discovered additional bugs and want them fixed, please report
them to the DBpedia bug tracker by Friday EOB.

http://sourceforge.net/tracker/?group_id=190976

Cheers,

Chris

> -----Original Message-----
> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On Behalf Of
> Kingsley Idehen
> Sent: Thursday, 15 April 2010 15:44
> To: Andy Seaborne
> Cc: public-lod@w3.org; dbpedia-discussion
> Subject: Re: DBpedia hosting burden
>
> Andy Seaborne wrote:
> > I ran the files from
> > http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
> > through an N-Triples parser with checking:
> >
> > The report is here (it's 25K lines long):
> >
> > http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
> >
> > It covers both strict errors and warnings of ill-advised forms.
> >
> > A few examples:
> >
> > Bad IRI: <=?(''[[Nepenthes>
> > Bad IRI: <http://www.european-athletics.org‎>
> >
> > Bad lexical forms for the value space:
> > "1967-02-31"^^http://www.w3.org/2001/XMLSchema#date
> > (there is no February the 31st)
> >
> >
> > Warning of well known ports of other protocols:
> > http://stream1.securenetsystems.net:443
> >
> > Warning about explicit port 80:
> >
> > http://bibliotecadigitalhispanica.bne.es:80/
> >
> > and use of . and .. in absolute URIs which are all from the standard
> > list of IRI warnings.
> >
> > Bad IRI: <http://dbpedia.org/resource/..> Code:
> > 8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../
> > not at the beginning of a relative reference, or it contains a /./
> > These should be removed.
> >
> > Andy
> >
> > Software used:
> >
> > The IRI checker, by Jeremy Carroll, is available from
> > http://www.openjena.org/iri/ and Maven.
> >
> > The lexical form checking is done by Apache Xerces.
> >
> > The N-triples parser is the one from TDB v0.8.5 which bundles the
> > above two together.
> >
> >
> > On 15/04/2010 9:54 AM, Malte Kiesel wrote:
> >> Ivan Mikhailov wrote:
> >>
> >>> If I were The Emperor of LOD I'd ask all grand dukes of datasources to
> >>> put fresh dumps at some torrent with control of UL/DL ratio :)
> >>
> >> Last time I checked (which was quite a while ago though), loading
> >> DBpedia in a normal triple store such as Jena TDB didn't work very well
> >> due to many issues with the DBpedia RDF (e.g., problems with the URIs of
> >> external links scraped from Wikipedia).
> >>
> >> I don't know whether this is a bug in TDB or DBpedia but I guess this is
> >> one of the problems causing people to use DBpedia online only - even if,
> >> due to performance reasons, running it locally would be far better.
> >>
> >> Regards
> >> Malte
> >>
> >
> >
> Andy,
>
> Great stuff, this is also why we are going to leave the current DBpedia
> 3.5 instance to stew for a while (until end of this week or a little later).
>
> DBpedia users:
> Now is the time to identify problems with the DBpedia 3.5 dataset dumps.
> We don't want to continue reloading DBpedia (Static Edition and then
> recalibrating DBpedia-Live) based on faulty datasets related matters, we
> do have other operational priorities etc..
>
>
> --
>
> Regards,
>
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
>
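
For anyone who wants to pre-screen a dump for the kinds of problems listed in Andy's report before filing bugs, here is a minimal Python sketch. It is not the tooling used in this thread (that was the Jena IRI checker and Apache Xerces, bundled in the TDB N-Triples parser). The function names and warning strings are hypothetical, the xsd:date check is simplified (plain YYYY-MM-DD only, no time zones or negative years), and the port table covers only http and https.

```python
import re
from datetime import date
from typing import List
from urllib.parse import urlsplit

# Assumption: only these two schemes are considered; a real IRI checker
# knows many more scheme/port pairs.
DEFAULT_PORTS = {"http": 80, "https": 443}


def is_valid_xsd_date(lexical: str) -> bool:
    """Simplified xsd:date check: the form must be YYYY-MM-DD and the
    day must exist in the calendar (so 1967-02-31 is rejected)."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", lexical)
    if match is None:
        return False
    try:
        date(*(int(part) for part in match.groups()))
    except ValueError:
        return False
    return True


def iri_warnings(iri: str) -> List[str]:
    """Return hypothetical warning strings for an absolute http(s) IRI."""
    parts = urlsplit(iri)
    warnings = []
    try:
        port = parts.port
    except ValueError:
        # Port is not a number or is out of range.
        return ["unparseable port"]
    if port is not None:
        if DEFAULT_PORTS.get(parts.scheme) == port:
            warnings.append("explicit default port %d" % port)
        elif port in DEFAULT_PORTS.values():
            warnings.append("port %d is the default of another scheme" % port)
    # "." and ".." segments should already be resolved in an absolute IRI.
    if any(segment in (".", "..") for segment in parts.path.split("/")):
        warnings.append("dot segment in path")
    return warnings


if __name__ == "__main__":
    print(is_valid_xsd_date("1967-02-31"))
    for example in (
        "http://bibliotecadigitalhispanica.bne.es:80/",
        "http://stream1.securenetsystems.net:443",
        "http://dbpedia.org/resource/..",
    ):
        print(example, iri_warnings(example))
```

A screen like this only catches a small subset of what a full IRI and lexical-form checker reports, so it complements a strict parser run and does not replace one.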
Received on Thursday, 15 April 2010 14:39:03 UTC