Re: DBpedia hosting burden

> Last time I checked (which was quite a while ago though), loading 
> DBpedia in a normal triple store such as Jena TDB didn't work very well 
> due to many issues with the DBpedia RDF (e.g., problems with the URIs of 
> external links scraped from Wikipedia).

Agreed. Common errors in LOD datasets are:

-- single-quoted and double-quoted strings containing newlines;
-- bnode predicates (though a SPARQL processor may ignore them!);
-- variables (triples containing variables are ignored);
-- literal subjects (triples with them are likewise ignored);
-- '/', '#', '%' and '+' in the local part of a QName ("QName with path");
-- invalid characters between '<' and '>', i.e. in relative IRIs.
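
To make these classes concrete, here is a rough Python sketch (not the
code of any real parser) that flags them on terms that are already
tokenized. The function names, the error labels and the exact set of
characters rejected inside '<' and '>' are assumptions made for
illustration only.

```python
import re

# Characters treated as invalid between '<' and '>' in this sketch.
# Backslash escapes such as \uXXXX are deliberately not handled here.
BAD_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`]')


def term_errors(term):
    """Return hypothetical error labels for one already tokenized term."""
    errors = []
    if term.startswith('<') and term.endswith('>'):
        if BAD_IRI_CHARS.search(term[1:-1]):
            errors.append('bad-iri-char')
    elif term[:1] in ('"', "'"):
        # Only triple-quoted ("long") strings may span several lines.
        is_long = term.startswith(('"""', "'''"))
        if not is_long and ('\n' in term or '\r' in term):
            errors.append('newline-in-short-string')
    elif term[:1] in ('?', '$'):
        errors.append('variable')
    elif ':' in term and not term.startswith('_:'):
        # prefix:local, where local contains one of the listed characters
        local = term.split(':', 1)[1]
        if any(c in local for c in '/#%+'):
            errors.append('qname-with-path')
    return errors


def triple_errors(s, p, o):
    """Add the two position-dependent classes to the per-term checks."""
    errors = [e for t in (s, p, o) for e in term_errors(t)]
    if s[:1] in ('"', "'"):
        errors.append('literal-subject')
    if p.startswith('_:'):
        errors.append('bnode-predicate')
    return errors
```

A real parser would detect all of this while tokenizing rather than
afterwards; the sketch only shows that each class is cheap to recognize
once the term boundaries are known.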

That's why my own Turtle parser can be configured to selectively report
or ignore these errors. In addition, I can relax the Turtle syntax to
accept popular violations such as redundant delimiters, and/or try to
recover from lexical errors as far as possible, even if that means
losing some ill-formed triples together with a limited number of valid
triples around them ("GIGO mode", for "Garbage In, Garbage Out").
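
As an illustration of the idea only (this is a sketch, not the actual
parser), the report/ignore switches and the recovery mode could look
roughly like this in Python. It assumes one N-Triples-like statement per
line, which is far simpler than real Turtle; the option names `ignore`
and `gigo`, the error labels, and the helpers `classify` and `load` are
all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumption: one statement per line, three whitespace-free terms and a dot.
STATEMENT = re.compile(r'^\s*(\S+)\s+(\S+)\s+(\S+)\s*\.\s*$')


class LoadError(Exception):
    """Raised on the first reported error when recovery is switched off."""


@dataclass
class LoaderConfig:
    # Error classes whose triples are dropped silently (hypothetical option).
    ignore: set = field(default_factory=set)
    # "GIGO mode": record reported errors and keep going instead of stopping.
    gigo: bool = False


def classify(s, p, o):
    """Return an error label for a triple, or None if it looks acceptable."""
    if s.startswith('"'):
        return 'literal-subject'
    if p.startswith('_:'):
        return 'bnode-predicate'
    if any(t.startswith('?') for t in (s, p, o)):
        return 'variable'
    return None


def load(lines, config):
    """Return (accepted triples, list of (line number, error) that were dropped)."""
    triples, dropped = [], []
    for lineno, line in enumerate(lines, 1):
        if not line.strip():
            continue
        m = STATEMENT.match(line)
        error = 'lexical' if m is None else classify(*m.groups())
        if error is None:
            triples.append(m.groups())
        elif error in config.ignore:
            continue                      # dropped without any report
        elif config.gigo:
            dropped.append((lineno, error))   # reported, then skipped
        else:
            raise LoadError(f'line {lineno}: {error}')
    return triples, dropped
```

With one statement per line, recovery loses only the offending line.
In real Turtle the parser has to resynchronize at the next plausible
end of statement, which is why some valid triples near the error can be
lost as well.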

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

Received on Thursday, 15 April 2010 11:02:16 UTC