- From: Ivan Mikhailov <imikhailov@openlinksw.com>
- Date: Thu, 15 Apr 2010 18:01:43 +0700
- To: Malte Kiesel <malte.kiesel@dfki.de>
- Cc: public-lod@w3.org, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
> Last time I checked (which was quite a while ago though), loading
> DBpedia in a normal triple store such as Jena TDB didn't work very well
> due to many issues with the DBpedia RDF (e.g., problems with the URIs of
> external links scraped from Wikipedia).
Agreed. Common errors in LOD are:
-- single-quoted and double-quoted strings containing newlines;
-- bnode predicates (though a SPARQL processor may ignore them!);
-- variables, though triples with variables are ignored;
-- literal subjects, though triples with them are ignored;
-- '/', '#', '%' and '+' in the local part of a QName ("QName with path");
-- invalid symbols between '<' and '>', i.e. in relative IRIs.
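
The last two items in the list above are easy to check mechanically. A
minimal sketch in Python, not taken from any real parser: the function
names are hypothetical, the IRI character set follows the IRIREF
production of the later W3C Turtle grammar (an assumption about which
rules apply), and the QName check flags only the four characters named
above.

```python
import re

# \uXXXX and \UXXXXXXXX escapes are legal inside <...>, so drop them first.
_UCHAR = re.compile(r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}')
# Characters the Turtle IRIREF production forbids between '<' and '>'.
_BAD_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')
# Characters listed above as problematic in the local part of a QName.
_BAD_LOCAL_CHARS = set('/#%+')


def bad_iri_chars(iri_body):
    """Return the sorted, de-duplicated forbidden characters found in the
    text between '<' and '>' (delimiters not included)."""
    stripped = _UCHAR.sub('', iri_body)
    return sorted(set(_BAD_IRI_CHARS.findall(stripped)))


def bad_local_chars(qname):
    """Return the flagged characters found after the first ':' of a QName.
    A string without ':' is not treated as a QName and yields []."""
    _prefix, sep, local = qname.partition(':')
    if not sep:
        return []
    return sorted(set(local) & _BAD_LOCAL_CHARS)
```

For example, bad_local_chars('dbpedia:AC/DC') flags the '/', and
bad_iri_chars('http://example.org/a b') flags the space.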
That's why my own TURTLE parser can be configured to selectively report
or ignore these errors. In addition, I can relax the TURTLE syntax to
accept popular violations such as redundant delimiters, and/or try to
recover from lexical errors as far as possible, even if that means
losing some ill-formed triples together with a limited number of proper
triples around them ("GIGO mode", for "Garbage In, Garbage Out").
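
To illustrate the report-or-ignore idea, here is a toy sketch in Python.
It is not the parser described above: it handles only a simplified
one-statement-per-line syntax, and the names parse, ParseError, ignore
and recover are hypothetical. Because it recovers line by line, it never
loses neighbouring good triples; a real TURTLE parser, where statements
span lines, may have to skip more.

```python
import re

# One simplified statement per line: subject predicate object "."
# Subject/object: <iri> or "literal"; predicate: <iri> or _:bnode.
_TRIPLE = re.compile(
    r'^\s*(<[^<>\s]*>|"[^"]*")'
    r'\s+(<[^<>\s]*>|_:\w+)'
    r'\s+(<[^<>\s]*>|"[^"]*")\s*\.\s*$'
)


class ParseError(ValueError):
    """Raised for an error kind that is neither ignored nor recovered."""


def parse(text, ignore=frozenset(), recover=False):
    """Return (triples, skipped).

    ignore:  error kinds to drop without raising.
    recover: if True, skip every malformed line (the "GIGO" behaviour).
    """
    triples, skipped = [], []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue
        m = _TRIPLE.match(line)
        if m is None:
            kind = 'lexical'
        elif m.group(1).startswith('"'):
            kind = 'literal-subject'
        elif m.group(2).startswith('_:'):
            kind = 'bnode-predicate'
        else:
            triples.append(m.groups())
            continue
        if recover or kind in ignore:
            skipped.append((lineno, kind))
        else:
            raise ParseError('line %d: %s' % (lineno, kind))
    return triples, skipped
```

Calling parse(text) is strict, parse(text, ignore={'literal-subject'})
tolerates one class of error, and parse(text, recover=True) keeps
whatever can be salvaged and lists the rest in skipped.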
Best Regards,
Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com
Received on Thursday, 15 April 2010 11:02:16 UTC