
Re: DBpedia hosting burden

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Thu, 15 Apr 2010 18:01:43 +0700
To: Malte Kiesel <malte.kiesel@dfki.de>
Cc: public-lod@w3.org, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
Message-Id: <1271329303.3114.967.camel@octo.iv.dev.null>
> Last time I checked (which was quite a while ago though), loading 
> DBpedia in a normal triple store such as Jena TDB didn't work very well 
> due to many issues with the DBpedia RDF (e.g., problems with the URIs of 
> external links scraped from Wikipedia).

Agreed. Common errors in LOD are:

-- single-quoted and double-quoted strings containing newlines;
-- bnode predicates (though a SPARQL processor may ignore them!);
-- variables (triples containing variables are ignored);
-- literal subjects (triples containing them are ignored);
-- '/', '#', '%' and '+' in the local part of a QName ("QName with path");
-- invalid symbols between '<' and '>', i.e. in relative IRIs.
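
To make the list above concrete, here is a minimal Python sketch that flags each of these error classes in a single whitespace-separated statement. It is only an illustration, not the parser discussed in this message: the names `find_problems`, `TOKEN`, `BAD_IRI_CHAR` and `QNAME` are hypothetical, the tokenizer is deliberately crude (no triple-quoted strings, no ';' or ',' abbreviations), and the sets of "bad" characters are assumptions taken from the list rather than from any particular version of the grammar.

```python
import re

# Crude tokenizer: assumes terms are separated by whitespace.
TOKEN = re.compile(
    r'<[^>]*>'               # IRI reference, validated separately below
    r'|"(?:[^"\\]|\\.)*"'    # double-quoted string (may contain raw newlines)
    r"|'(?:[^'\\]|\\.)*'"    # single-quoted string
    r'|\S+',                 # any other whitespace-delimited term
    re.S,
)
# Characters assumed to be invalid between '<' and '>'.
BAD_IRI_CHAR = re.compile(r'[\x00-\x20<>"{}|^`\\]')
# Optional prefix, a colon, then the local part (captured).
QNAME = re.compile(r'^(?:[A-Za-z][\w.-]*)?:(.*)$', re.S)


def find_problems(statement: str) -> list[str]:
    """Return labels for the problems found in subject, predicate, object."""
    problems = []
    terms = TOKEN.findall(statement)[:3]  # only the first three terms
    for position, term in enumerate(terms):
        if term[0] in '?$':
            problems.append('variable')
        elif term[0] in '"\'':
            if '\n' in term:
                problems.append('newline in quoted string')
            if position == 0:
                problems.append('literal subject')
        elif term.startswith('_:'):
            if position == 1:
                problems.append('bnode predicate')
        elif term.startswith('<'):
            if BAD_IRI_CHAR.search(term[1:-1]):
                problems.append('invalid IRI character')
        else:
            match = QNAME.match(term)
            if match and any(ch in '/#%+' for ch in match.group(1)):
                problems.append('qname with path')
    return problems
```

For example, `find_problems('ex:a _:b ex:c .')` flags the blank node in predicate position, while a well-formed statement yields an empty list.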

That's why my own Turtle parser can be configured to selectively report or
ignore these errors. In addition, I can relax the Turtle syntax to accept
popular violations such as redundant delimiters, and/or try to recover from
lexical errors as far as possible, even if that means losing some ill-formed
triples together with a limited number of valid triples around them
("GIGO mode", for "Garbage In, Garbage Out").
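
As a rough illustration of the difference between strict loading and a "GIGO"-style recovery, the Python sketch below reads a simplified one-triple-per-line format and either aborts on the first bad line or skips it and carries on. Everything here is an assumption made for the example: `LoaderConfig`, `LoadResult`, `load` and the IRI-only line pattern are hypothetical, and the two switches are much coarser than the per-error options described above.

```python
import re
from dataclasses import dataclass, field

# Simplified line format: <s> <p> <o> .  (IRIs only; an assumption made to
# keep the sketch short, not the real Turtle grammar).
TRIPLE = re.compile(
    r'^\s*<([^<>\s]*)>\s+<([^<>\s]*)>\s+<([^<>\s]*)>\s*\.\s*$'
)


@dataclass
class LoaderConfig:
    gigo: bool = False    # True: skip bad lines instead of aborting
    report: bool = True   # True: record a message for each skipped line


@dataclass
class LoadResult:
    triples: list = field(default_factory=list)
    messages: list = field(default_factory=list)


def load(text: str, config: LoaderConfig) -> LoadResult:
    """Load triples line by line, applying the configured error policy."""
    result = LoadResult()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith('#'):
            continue  # blank line or comment
        match = TRIPLE.match(line)
        if match:
            result.triples.append(match.groups())
            continue
        if not config.gigo:
            # Strict mode: the first malformed line aborts the load.
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        if config.report:
            result.messages.append(f"line {lineno}: skipped {line!r}")
    return result
```

Because recovery here resynchronises at the next line, a whole line is dropped even when only one term in it is malformed; with Turtle's ';' and ',' abbreviations the same strategy would also discard valid triples that share a statement with the bad one.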

Best Regards,

Ivan Mikhailov
OpenLink Software
Received on Thursday, 15 April 2010 11:02:16 UTC
