- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Tue, 09 Apr 2013 21:36:39 -0400
- To: "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <5164C227.4050501@openlinksw.com>
FYI

On 4/9/13 8:01 PM, Paul A. Houle wrote:
> PRESS RELEASE
>
> Paul Houle, Ontology2 founder, stated that "we updated Infovore to
> accept data from DBpedia, and ran a head to head test, in terms of RDF
> validity, between Freebase and DBpedia Live."
>
> "Unlike most scientific results", he said, "these results are
> repeatable, because you can reproduce them yourself with Infovore 1.1.
> I encourage you to use this tool to put other RDF data sets, large and
> small, to the test."
>
> The tool parallelSuperEyeball was run against both the 2013-03-31
> Freebase RDF dump and the 2012-04-30 edition of DBpedia Live.
>
> Although Freebase asserts roughly 1.2 billion facts, Infovore rejects
> roughly 200 million useless facts in pre-filtering. Downstream of that
> we found 944,909,025 valid facts and 66,781,906 invalid facts, in
> addition to 5 especially malformed facts.
>
> This is a serious regression compared to the 2013-01-27 RDF dump, in
> which only about 13 million invalid triples were discovered. The main
> cause of the increase is the introduction of 40 million or so "triples"
> lacking an object connected with the predicate
> ns:common.topic.notable_for. Previously, the bulk of the invalid
> triples were incorrectly formatted dates.
>
> The rate of invalid triples in DBpedia Live was found to be orders of
> magnitude lower than in Freebase.
>
> Only 8,664 invalid facts were found in DBpedia Live, compared to
> 247,557,030 valid facts. The predominant problem in DBpedia Live turned
> out to be nonconformant IRIs that came in from Wikipedia. This is
> comparable in magnitude to the number of facts found invalid in the old
> Freebase quad dump in the process of creating :BaseKB Pro.
>
> Just one of the tools included with Infovore, parallelSuperEyeball is
> an industrial strength RDF validator that uses streaming processing and
> the Map/Reduce paradigm to attain nearly perfect parallel speedup on
> many tasks on common four-core computers. Infovore 1.1 brings many
> improvements, including a threefold speedup of parallelSuperEyeball and
> the new Infovore shell.
>
> Please take a look at our github project at
>
> https://github.com/paulhoule/infovore/wiki
>
> and feel free to fork or star it. Note that many infovore data products
> are also available at
>
> http://basekb.com/
>
> Because infovore is memory efficient, it is possible to use it to
> handle much larger data sets than can be kept in a triple store on any
> given hardware. The main limitation in handling large RDF data sets is
> running out of disk space; Infovore handles them quickly by avoiding
> random access I/O.
>
> "We challenge RDF data providers to put their data to the test", said
> Paul Houle, "Today it's an expectation that people and organizations
> publish only valid XML files, and the publication of
> parallelSuperEyeball is a step to a world that speaks valid RDF and
> that can clean and repair invalid files."
>
> Ontology2 is a privately held company that develops web sites and data
> products based on Freebase, DBpedia, and other sources. Contact
> paul@ontology2.com with questions about Ontology2 products and
> services.

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Wednesday, 10 April 2013 01:37:02 UTC