- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Mon, 23 Sep 2013 18:12:20 -0400
- To: "Paul A. Houle" <paul@ontology2.com>
- CC: dbpedia-discussion@lists.sourceforge.net, "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <5240BCC4.7030905@openlinksw.com>
On 9/23/13 3:48 PM, Paul A. Houle wrote:
> One of the goals of the infovore project is to develop something
> that targets this latency problem.
>
> https://github.com/paulhoule/infovore/wiki
>
> I’ve talked with a number of organizations that use DBpedia and
> Freebase data and almost all of them have either no solution or an
> incomplete solution for dealing with changes over time, something
> that’s absolutely necessary for sustainable social-semantic systems.
> Many of them have considered developing it but decided against
> developing it in house.

I bet they have :-)

> When Freebase changed the format of the RDF dump I was able to
> adapt in less than a week (most of the time delay was that no official
> dump came out that week and I didn’t know what was going on); after
> fixing my code I was able to run against it interactively.
>
> Infovore is not using Hadoop so much for “big data”, but rather for
> “low latency”. Not extremely low latency, but once I trust the system
> enough it ought to have Freebase processed before I wake up on
> Sunday. The files are smaller than the official dump and will load
> faster, both things that will lower latency for the consumer.
>
> Right now the process is limited by the not-so-parallel process of
> ungzipping and re-gzipping the Freebase dump, but I believe a
> processing pipeline much more complex than the current one could still
> be run in less than an hour if you throw enough AWS instances at it.
>
> The framework ought to work for any RDF data, including DBpedia
> (for which it has been tested), and I have a lot of stuff planned,
> including something that could “smush” DBpedia identifiers to Freebase
> identifiers or the other way around to create a merged data set.

Nice!

> Yes, what I am doing today is much simpler than what DBpedia is
> doing, but I’m taking a multi-pronged approach that focuses on
> process as much as technology. I’m keeping a notebook of how much
> time it takes me to do everything and learning how to squeeze out the
> errors and wasted time with a battery of methods that are being
> documented.

Yes, that's the way to approach this matter. First pass, manual so you
can get a good handle on the real time costs.

> It is possible to run clusters in Amazon EMR by simply providing a
> credential pair – you don’t need to know much at all about AWS or Hadoop.
>
> I invite all of you to follow this project on GitHub and also
> follow the Google Group
>
> https://groups.google.com/forum/#!forum/infovore-basekb

I am following it.

> where you’ll get roughly two status reports a week and where
> people with questions get quick answers.
>
> I can definitely use contributions too, because the list of
> things I’d like to see is long and my own work will be focused on my
> own needs. Even if you don’t contribute, I welcome feature requests
> on the issue tracker.

This should be interesting to fellow DBpedia and LOD folk, for sure.

Kingsley

> *From:* Kingsley Idehen <mailto:kidehen@openlinksw.com>
> *Sent:* Monday, September 23, 2013 1:37 PM
> *To:* dbpedia-discussion@lists.sourceforge.net
> *Subject:* Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released,
> including wider infobox coverage, additional type statements, and new
> YAGO and Wikidata links
>
> On 9/23/13 1:00 PM, Tom Morris wrote:
>> Congratulations on the new release!
>>
>> On Mon, Sep 23, 2013 at 6:27 AM, Christian Bizer <chris@bizer.de> wrote:
>>
>>     1. the new release is based on updated Wikipedia dumps dating
>>     from March / April 2013 (the 3.8 release was based on dumps from
>>     June 2012), leading to an overall increase in the number of
>>     concepts in the English edition from 3.7 to 4.0 million things.
>>
>> What accounts for the long latency between the date of the dumps and
>> the date of the release?
>>
>> Tom
>
> A number of things:
>
> 1. Dataset QA -- the datasets are generated from mapping efforts
> 2. Dataset Loading & QA
>    -- Linked Data Deployment (i.e., new URIs resolve to the new data)
>    -- SPARQL Endpoint (new data is accessible via SPARQL endpoint).
>
> Kingsley
>>
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> Dbpedia-discussion@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
> --
>
> Regards,
>
> Kingsley Idehen
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: https://plus.google.com/112399767740508618350/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen

--

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Attachments
- application/pkcs7-signature attachment: S/MIME Cryptographic Signature
Received on Monday, 23 September 2013 22:12:45 UTC