- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Wed, 13 Feb 2013 17:33:24 -0500
- To: "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <511C14B4.3050203@openlinksw.com>
FYI

On 2/13/13 5:26 PM, paul@ontology2.com wrote:
> A system called parallelSuperEyeball has been added to the Freebase
> processing chain. I took apart the parser from the Jena framework to
> extract something that parses individual nodes in N-Triples files so
> that invalid triples do not stop the triple parsing process. The
> earlier partitionFreebaseRDF removes superfluous information and
> reformats the data for scalable parallel processing.
>
> I call the resulting product, which partitions valid and invalid
> facts from Freebase, ":BaseKB Lime", and it's a refreshing
> alternative to the difficulties that people have with off-brand
> Linked Data products that don't conform to industry standards.
>
> You can confirm these claims for yourself by downloading
>
> https://github.com/paulhoule/infovore/archive/t20130213.tar.gz
>
> cd infovore
> mvn clean install
> cd hydroxide-apps
> mvn appassembler::assemble
> cd ..
> source ./hydroxide-apps/path.sh
> export INFOVORE_BASE=/freebase/
> export INFOVORE_FREEBASE_FILE=/freebase/freebase-rdf-2013-01-27-00-00.gz
> export INFOVORE_INSTANCE=2013-01-27
> mkdir /freebase/data/$INFOVORE_INSTANCE
> partitionFreebaseRDF
> superParallelEyeball
>
> And then in /freebase/data/2013-01-27/work you'll find
>
> baseKBLime -- 716 million valid triples to load in your RDF store or
> otherwise use
> baseKBLimeRejected -- 13 million invalid "triples"
> freebase-raw-rejected.tsv -- quite literally a handful of completely
> broken lines from the quad dump that don't even end with a period.
>
> I'm planning on fine-tuning the rules on what the first stage
> accepts, getting a newer version of the quad dump, and publishing
> :BaseKB Lime for download soon.
>
>
> _______________________________________________
> You are receiving this message because you are subscribed to the Freebase-discuss mailing list.
> To post a message to the list: Freebase-discuss@freebase.com
> To unsubscribe, view archives, etc: http://lists.freebase.com/mailman/listinfo/freebase-discuss

-- 

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
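
For readers who want a feel for the partitioning idea in the quoted message (check each line on its own, so one bad triple never aborts the run), here is a minimal sketch in Python. It is not the infovore code and does not use the Jena parser. The regular expressions are a deliberately simplified approximation of the N-Triples grammar, and the name partition_lines is hypothetical.

```python
import re

# Simplified N-Triples term patterns. ASSUMPTION: these are much looser than
# the real grammar (no validation of escape sequences or IRI characters).
IRI = r'<[^<>"\s]*>'
BNODE = r'_:[A-Za-z0-9]+'
LITERAL = (r'"(?:[^"\\]|\\.)*"'
           r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?')

# One triple per line: subject, predicate, object, terminating period.
TRIPLE = re.compile(
    r'^(?:{iri}|{bnode})\s+{iri}\s+(?:{iri}|{bnode}|{lit})\s*\.$'.format(
        iri=IRI, bnode=BNODE, lit=LITERAL))


def partition_lines(lines):
    """Split lines into (accepted, rejected) without stopping on bad input.

    Hypothetical helper for illustration; blank lines and comment lines
    are skipped rather than rejected.
    """
    accepted, rejected = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue
        if TRIPLE.match(stripped):
            accepted.append(stripped)
        else:
            rejected.append(stripped)
    return accepted, rejected


sample = [
    '<http://example.org/s> <http://example.org/p> <http://example.org/o> .',
    '<http://example.org/s> <http://example.org/p> "hello"@en .',
    '<http://example.org/s> <http://example.org/p> "unterminated .',
    '<http://example.org/s> <http://example.org/p> <http://example.org/o>',
]
accepted, rejected = partition_lines(sample)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping the rejected lines in their own list, rather than discarding them, mirrors the baseKBLime / baseKBLimeRejected split described above and leaves the invalid facts available for inspection.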
Received on Wednesday, 13 February 2013 22:33:52 UTC