- From: Ivan Mikhailov <imikhailov@openlinksw.com>
- Date: Thu, 15 Apr 2010 18:01:43 +0700
- To: Malte Kiesel <malte.kiesel@dfki.de>
- Cc: public-lod@w3.org, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
> Last time I checked (which was quite a while ago though), loading
> DBpedia in a normal triple store such as Jena TDB didn't work very well
> due to many issues with the DBpedia RDF (e.g., problems with the URIs of
> external links scraped from Wikipedia).

Agreed. Common errors in LOD are:

-- single-quoted and double-quoted strings with newlines;
-- bnode predicates (but a SPARQL processor may ignore them!);
-- variables, but triples with variables are ignored;
-- literal subjects, but triples with them are ignored;
-- '/', '#', '%' and '+' in the local part of a QName ("QName with path");
-- invalid symbols between '<' and '>', i.e. in relative IRIs.

That's why my own TURTLE parser is configurable to selectively report or ignore these errors. In addition, I can relax the TURTLE syntax to accept popular violations such as redundant delimiters, and/or try to recover from lexical errors as far as possible, even if that means losing some ill-formed triples together with a limited number of proper triples around them ("GIGO mode", for "Garbage In, Garbage Out").

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com
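
Editorial sketch, not part of the original message: the "selectively report or ignore" idea described above could look roughly like the following. This is not the Virtuoso parser and does not reflect its options. The names `Policy`, `CHECKS` and `apply_policy`, the error labels, and the simplified character checks are all assumptions made for illustration, and only two of the listed error classes are covered.

```python
import re
from enum import Enum


class Policy(Enum):
    REPORT = "report"   # record the error and drop the offending term
    IGNORE = "ignore"   # tolerate the error and keep the term


# Simplified assumption: whitespace/control characters and a few delimiters
# are treated as invalid between '<' and '>'.
BAD_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`]')

# Characters from the message's "QName with path" error class.
PATH_CHARS = set("/#%+")


def check_iri(text):
    """Return an error label if the IRI text contains an invalid symbol."""
    return "bad_iri_char" if BAD_IRI_CHARS.search(text) else None


def check_qname(text):
    """Return an error label if the local part of prefix:local has path characters."""
    _prefix, _sep, local = text.partition(":")
    return "qname_with_path" if any(c in PATH_CHARS for c in local) else None


CHECKS = {"iri": check_iri, "qname": check_qname}


def apply_policy(terms, policy):
    """Split (kind, text) terms into kept terms and reported errors.

    Error classes missing from `policy` default to Policy.REPORT.
    """
    kept, reported = [], []
    for kind, text in terms:
        error = CHECKS[kind](text)
        if error is not None and policy.get(error, Policy.REPORT) is Policy.REPORT:
            reported.append((error, text))
        else:
            kept.append((kind, text))
    return kept, reported


policy = {"bad_iri_char": Policy.REPORT, "qname_with_path": Policy.IGNORE}
terms = [
    ("iri", "http://example.org/a b"),   # space inside <...>
    ("qname", "dbpedia:AC/DC"),          # '/' in the local part
    ("iri", "http://example.org/ok"),
]
kept, reported = apply_policy(terms, policy)
```

With this hypothetical policy, the IRI containing a space is reported and dropped, while the QName with a '/' in its local part is tolerated and kept.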
Received on Thursday, 15 April 2010 11:02:16 UTC