Re: DBpedia hosting burden

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Thu, 15 Apr 2010 18:01:43 +0700
To: Malte Kiesel <malte.kiesel@dfki.de>
Cc: public-lod@w3.org, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
Message-Id: <1271329303.3114.967.camel@octo.iv.dev.null>

> Last time I checked (which was quite a while ago though), loading 
> DBpedia in a normal triple store such as Jena TDB didn't work very well 
> due to many issues with the DBpedia RDF (e.g., problems with the URIs of 
> external links scraped from Wikipedia).

Agreed. Common errors in LOD are:

-- single-quoted and double-quoted strings containing raw newlines;
-- bnode predicates (though a SPARQL processor may ignore them!);
-- variables (triples containing variables are ignored);
-- literal subjects (triples with literal subjects are ignored);
-- '/', '#', '%' and '+' in the local part of a QName ("QName with path");
-- invalid characters between '<' and '>', i.e. in relative IRIs.
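To make the list above concrete, here is a minimal Python sketch that sorts a single triple, given as three term strings, into those error classes. The term syntax it recognizes (`<...>`, `prefix:local`, `_:x`, `?v`/`$v`, quoted literals) is a simplified assumption, not the full Turtle grammar, and `term_kind` and `triple_errors` are hypothetical helpers, not part of Virtuoso or any other parser.

```python
# Hypothetical, simplified term syntax (an assumption, not full Turtle):
#   <...> IRI reference, prefix:local QName, _:x blank node,
#   ?v or $v variable, "..." or '...' literal ("""...""" = long string).

IRI_FORBIDDEN = set('<>"{}|^`\\')  # control chars and space are checked too


def term_kind(term: str) -> str:
    """Guess the kind of a term from its first characters."""
    if term.startswith("<") and term.endswith(">"):
        return "iri"
    if term.startswith("_:"):
        return "bnode"
    if term[:1] in ("?", "$"):
        return "variable"
    if term[:1] in ('"', "'"):
        return "literal"
    return "qname" if ":" in term else "other"


def triple_errors(s: str, p: str, o: str) -> list[str]:
    """Return the error classes found in one (subject, predicate, object)."""
    errors = []
    kinds = [term_kind(t) for t in (s, p, o)]
    if "variable" in kinds:
        errors.append("variable")  # such triples are ignored
    if kinds[0] == "literal":
        errors.append("literal subject")  # such triples are ignored
    if kinds[1] == "bnode":
        errors.append("bnode predicate")
    for term, kind in zip((s, p, o), kinds):
        if kind == "literal" and "\n" in term and not term.startswith(('"""', "'''")):
            # Only triple-quoted (long) strings may span lines.
            errors.append("newline in short string")
        elif kind == "qname" and any(c in term.split(":", 1)[1] for c in "/#%+"):
            errors.append("QName with path")
        elif kind == "iri" and any(c in IRI_FORBIDDEN or ord(c) <= 0x20 for c in term[1:-1]):
            errors.append("invalid character in IRI")
    return errors
```

For example, `triple_errors("ex:a/b", "ex:p", "ex:o")` reports a "QName with path", while a long string such as `"""line1\nline2"""` in object position passes.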

That's why my own Turtle parser can be configured to report or ignore
each of these errors selectively. In addition, I can relax the Turtle
syntax to accept popular violations such as redundant delimiters, and/or
try to recover from lexical errors as far as possible, even if that means
losing some ill-formed triples together with a limited number of proper
triples around them ("GIGO mode", for "Garbage In, Garbage Out").
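A toy illustration of per-error report/ignore settings and of GIGO-style recovery might look like the following Python sketch. Everything in it is an assumption for illustration only: the `Policy` enum, the `classify` and `load` functions, the N-Triples-like token set and the "skip past the next '.'" recovery strategy are hypothetical and do not describe how Virtuoso's parser is actually implemented.

```python
import re
from enum import Enum


class Policy(Enum):
    REPORT = "report"  # record a diagnostic and drop the triple
    IGNORE = "ignore"  # drop the triple silently
    ACCEPT = "accept"  # relaxed mode: keep it anyway


# Hypothetical N-Triples-like token set (an assumption, not real Turtle).
TOKENS = [
    ("iri", re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')),
    ("bnode", re.compile(r"_:[A-Za-z0-9]+")),
    ("variable", re.compile(r"[?$][A-Za-z0-9_]+")),
    ("literal", re.compile(r'"(?:[^"\\\n]|\\.)*"')),
    ("qname", re.compile(r"[A-Za-z0-9_-]*:[A-Za-z0-9_-]*")),
    ("dot", re.compile(r"\.")),
]
WS = re.compile(r"\s*")


def classify(stmt):
    """Map a list of three (kind, text) terms to an error class, or None."""
    kinds = [k for k, _ in stmt]
    if "variable" in kinds:
        return "variable"
    if kinds[0] == "literal":
        return "literal subject"
    if kinds[1] == "bnode":
        return "bnode predicate"
    return None


def load(text, policies, default=Policy.REPORT):
    """Parse statements; on a lexical error, skip past the next '.' (GIGO)."""
    triples, diagnostics, stmt, pos = [], [], [], 0
    while True:
        pos = WS.match(text, pos).end()
        if pos >= len(text):
            break
        for kind, pattern in TOKENS:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            # Lexical error: drop the partial statement and resynchronise
            # after the next '.', possibly losing a proper triple on the way.
            diagnostics.append(f"lexical error at offset {pos}")
            nxt = text.find(".", pos)
            pos, stmt = (len(text) if nxt < 0 else nxt + 1), []
            continue
        pos = m.end()
        if kind != "dot":
            stmt.append((kind, m.group()))
            continue
        if not stmt:
            # Redundant '.' delimiter: tolerated unless set to REPORT.
            if policies.get("redundant delimiter", default) is Policy.REPORT:
                diagnostics.append(f"redundant delimiter at offset {m.start()}")
        elif len(stmt) != 3:
            diagnostics.append(f"malformed statement ending at offset {m.start()}")
        else:
            err = classify(stmt)
            policy = policies.get(err, default) if err else Policy.ACCEPT
            if policy is Policy.ACCEPT:
                triples.append(tuple(t for _, t in stmt))
            elif policy is Policy.REPORT:
                diagnostics.append(f"{err}: {' '.join(t for _, t in stmt)}")
        stmt = []
    if stmt:
        diagnostics.append("unterminated statement at end of input")
    return triples, diagnostics
```

For example, if the line `ex:a ex:p @oops` (no terminating '.') is followed by `ex:a ex:p ex:e .`, the lexical error at `@` is reported and recovery skips to the '.' that ends the second line, so the well-formed `ex:e` triple is lost as collateral damage while later statements still load.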

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com
Received on Thursday, 15 April 2010 11:02:16 UTC