- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Thu, 15 Apr 2010 10:56:50 -0400
- To: Chris Bizer <chris@bizer.de>
- CC: public-lod@w3.org, 'dbpedia-discussion' <dbpedia-discussion@lists.sourceforge.net>, 'Andy Seaborne' <andy.seaborne@talis.com>
Chris Bizer wrote:
> Hi all,
>
>> Great stuff, this is also why we are going to leave the current DBpedia
>> 3.5 instance to stew for a while (until end of this week or a little later).
>>
>> DBpedia users:
>> Now is the time to identify problems with the DBpedia 3.5 dataset dumps.
>> We don't want to continue reloading DBpedia (Static Edition and then
>> recalibrating DBpedia-Live) based on faulty datasets related matters, we
>> do have other operational priorities etc.
>
> Yes, the testing by the community has exposed enough small and medium bugs
> in the datasets so that we are going to extract a new fixed 3.5.1 release
> next week.
>
> In my opinion the bugs do not impair Robert's and Anja's great achievement
> of porting the extraction framework from PHP to Scala.

Oh! Certainly not! That is a major contribution etc.

> If you rewrite more than 10.000 lines of code for something as complex as a
> multilingual Wikipedia extraction, I think it is normal that some minor bugs
> remain even after their tough testing.

Of course.

> So, if you have discovered additional bugs and want them fixed, please
> report them to the DBpedia bug tracker by Friday EOB.
>
> http://sourceforge.net/tracker/?group_id=190976

Yes, and then we can schedule a reload such that 3.5.1 is live come Monday (maybe even earlier).

Kingsley

> Cheers,
>
> Chris
>
>> -----Original Message-----
>> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On Behalf
>> Of Kingsley Idehen
>> Sent: Thursday, 15 April 2010 15:44
>> To: Andy Seaborne
>> Cc: public-lod@w3.org; dbpedia-discussion
>> Subject: Re: DBpedia hosting burden
>>
>> Andy Seaborne wrote:
>>> I ran the files from
>>> http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
>>> through an N-Triples parser with checking:
>>>
>>> The report is here (it's 25K lines long):
>>>
>>> http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
>>>
>>> It covers both strict errors and warnings of ill-advised forms.
>>>
>>> A few examples:
>>>
>>> Bad IRI: <=?(''[[Nepenthes>
>>> Bad IRI: <http://www.european-athletics.org‎>
>>>
>>> Bad lexical forms for the value space:
>>> "1967-02-31"^^http://www.w3.org/2001/XMLSchema#date
>>> (there is no February the 31st)
>>>
>>> Warning of well known ports of other protocols:
>>> http://stream1.securenetsystems.net:443
>>>
>>> Warning about explicit port 80:
>>>
>>> http://bibliotecadigitalhispanica.bne.es:80/
>>>
>>> and use of . and .. in absolute URIs which are all from the standard
>>> list of IRI warnings.
>>>
>>> Bad IRI: <http://dbpedia.org/resource/..> Code:
>>> 8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../
>>> not at the beginning of a relative reference, or it contains a /./
>>> These should be removed.
>>>
>>> Andy
>>>
>>> Software used:
>>>
>>> The IRI checker, by Jeremy Carroll, is available from
>>> http://www.openjena.org/iri/ and Maven.
>>>
>>> The lexical form checking is done by Apache Xerces.
>>>
>>> The N-triples parser is the one from TDB v0.8.5 which bundles the
>>> above two together.
>>>
>>> On 15/04/2010 9:54 AM, Malte Kiesel wrote:
>>>> Ivan Mikhailov wrote:
>>>>
>>>>> If I were The Emperor of LOD I'd ask all grand dukes of datasources to
>>>>> put fresh dumps at some torrent with control of UL/DL ratio :)
>>>>
>>>> Last time I checked (which was quite a while ago though), loading
>>>> DBpedia in a normal triple store such as Jena TDB didn't work very well
>>>> due to many issues with the DBpedia RDF (e.g., problems with the URIs of
>>>> external links scraped from Wikipedia).
>>>>
>>>> I don't know whether this is a bug in TDB or DBpedia but I guess this is
>>>> one of the problems causing people to use DBpedia online only - even if,
>>>> due to performance reasons, running it locally would be far better.
>>>>
>>>> Regards
>>>> Malte
>>
>> Andy,
>>
>> Great stuff, this is also why we are going to leave the current DBpedia
>> 3.5 instance to stew for a while (until end of this week or a little later).
>>
>> DBpedia users:
>> Now is the time to identify problems with the DBpedia 3.5 dataset dumps.
>> We don't want to continue reloading DBpedia (Static Edition and then
>> recalibrating DBpedia-Live) based on faulty datasets related matters, we
>> do have other operational priorities etc.
>>
>> --
>>
>> Regards,
>>
>> Kingsley Idehen
>> President & CEO
>> OpenLink Software
>> Web: http://www.openlinksw.com
>> Weblog: http://www.openlinksw.com/blog/~kidehen
>> Twitter/Identi.ca: kidehen

--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
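
For anyone who wants to spot-check a dump before filing a bug, the kinds of problems in Andy's report (impossible xsd:date values, explicit port 80, well-known ports of other protocols, and dot segments in absolute URIs) can be approximated with a short script. This is only a rough sketch using the Python standard library; it is not how the Jena IRI checker or Xerces implement their checks, it covers only a small subset of them, and the function names `is_valid_xsd_date` and `iri_warnings` are hypothetical names chosen for illustration.

```python
import re
from datetime import date
from urllib.parse import urlsplit


def is_valid_xsd_date(lexical: str) -> bool:
    """Check a plain YYYY-MM-DD lexical form against the calendar.

    Simplification (assumption): timezone suffixes and negative years,
    which xsd:date also permits, are treated as invalid here.
    """
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", lexical):
        return False
    try:
        date.fromisoformat(lexical)  # raises ValueError for e.g. 31 February
        return True
    except ValueError:
        return False


def iri_warnings(iri: str) -> list[str]:
    """Return a list of warnings loosely modelled on those in the report."""
    warnings = []
    try:
        parts = urlsplit(iri)
        port = parts.port  # may raise ValueError for a malformed port
    except ValueError:
        return ["unparseable IRI"]
    if not parts.scheme:
        warnings.append("missing scheme")
    if parts.scheme == "http" and port == 80:
        warnings.append("explicit default port 80")
    if parts.scheme == "http" and port == 443:
        warnings.append("well-known port of another protocol")
    # A "." or ".." path segment in an absolute IRI should have been removed.
    if any(segment in (".", "..") for segment in parts.path.split("/")):
        warnings.append("dot segment in path")
    return warnings


if __name__ == "__main__":
    print(is_valid_xsd_date("1967-02-31"))
    for example in (
        "http://stream1.securenetsystems.net:443",
        "http://bibliotecadigitalhispanica.bne.es:80/",
        "http://dbpedia.org/resource/..",
    ):
        print(example, iri_warnings(example))
```

Running it against the examples quoted from the report flags each of them; a full N-Triples parser with checking, such as the one Andy used, remains the right tool for a complete pass over the dumps.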
Received on Thursday, 15 April 2010 14:57:19 UTC