Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

On 9/23/13 3:48 PM, Paul A. Houle wrote:
>     One of the goals of the infovore project is to develop something 
> that targets this latency problem.
> https://github.com/paulhoule/infovore/wiki
>     I’ve talked with a number of organizations that use DBpedia and 
> Freebase data, and almost all of them have either no solution or an 
> incomplete solution for dealing with changes over time, something 
> that’s absolutely necessary for sustainable social-semantic systems. 
> Many of them have considered building such a system in house but 
> decided against it.

I bet they have :-)
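The change-tracking problem Paul describes can be made concrete with a minimal sketch (an illustration, not Infovore's code): treat each dump as a set of N-Triples lines and compute what was added and removed between two releases. It assumes canonical N-Triples (one triple per line, equal triples serialized identically, no blank nodes relabelled between dumps) and dumps small enough to hold in memory; `diff_dumps`, `read_triples` and `open_dump` are hypothetical names.

```python
import gzip
from typing import Iterable, Set, TextIO, Tuple


def open_dump(path: str) -> TextIO:
    """Open a plain or gzip-compressed N-Triples file as UTF-8 text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_triples(lines: Iterable[str]) -> Set[str]:
    """Collect triple lines, skipping blank lines and '#' comments.

    Assumes canonical N-Triples, so string equality means triple equality.
    """
    triples: Set[str] = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            triples.add(line)
    return triples


def diff_dumps(old_lines: Iterable[str],
               new_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed): triples only in the new dump, only in the old one."""
    old = read_triples(old_lines)
    new = read_triples(new_lines)
    return new - old, old - new
```

Called as `diff_dumps(open_dump(old_path), open_dump(new_path))`, this yields the delta a consumer would apply instead of reloading a whole dump. For full-size dumps, the same set difference would more plausibly be done by sorting both files and merging them in one pass.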

>    When Freebase changed the format of the RDF dump I was able to 
> adapt in less than a week (most of the delay was because no official 
> dump came out that week and I didn’t know what was going on); after 
> fixing my code I was able to run against it interactively.
>    Infovore is not using Hadoop so much for “big data”, but rather for 
> “low latency”. Not extremely low latency, but once I trust the system 
> enough it ought to have Freebase processed before I wake up on 
> Sunday. The files are smaller than the official dump and will load 
> faster, both of which lower latency for the consumer.
>    Right now the process is limited by the not-so-parallel step of 
> ungzipping and re-gzipping the Freebase dump, but I believe a 
> processing pipeline much more complex than the current one could still 
> be run in less than an hour if you throw enough AWS instances at it.
>    The framework ought to work for any RDF data, including DBpedia 
> (for which it has been tested), and I have a lot of stuff planned, 
> including something that could “smush” DBpedia identifiers to Freebase 
> identifiers or the other way around to create a merged data set.

Nice!
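The "smushing" Paul mentions, merging DBpedia and Freebase identifiers that denote the same thing, could be sketched with a union-find structure, assuming the equivalences are already available as identifier pairs (for example, from owl:sameAs links). Picking the lexicographically smallest identifier in each group as canonical, and every name below, are illustrative assumptions rather than Infovore's actual design.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


class UnionFind:
    """Disjoint-set forest over identifier strings, with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent, then step up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # The smaller root wins, so each group's root is its smallest identifier.
        if rb < ra:
            ra, rb = rb, ra
        self.parent[rb] = ra


def smush(same_as: Iterable[Tuple[str, str]],
          triples: Iterable[Triple]) -> List[Triple]:
    """Rewrite subjects and objects to one canonical identifier per group."""
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)

    def canon(term: str) -> str:
        # Terms never mentioned in an equivalence (e.g. literals) pass through.
        return uf.find(term) if term in uf.parent else term

    rewritten = [(canon(s), p, canon(o)) for s, p, o in triples]
    # Merging can create duplicate triples; drop them, keeping first-seen order.
    return list(dict.fromkeys(rewritten))
```

Union-find makes the merge transitive: if A = B and B = C, all three collapse to one identifier without an explicit closure step. Predicates are deliberately left alone here, since aligning DBpedia and Freebase properties is a separate mapping problem.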

>    Yes, what I am doing today is much simpler than what DBpedia is 
> doing, but I’m taking a multi-pronged approach that focuses on 
> process as much as technology. I’m keeping a notebook of how much 
> time it takes me to do everything and learning how to squeeze out 
> errors and wasted time with a battery of methods that are being 
> documented.

Yes, that's the way to approach this matter. Do the first pass manually 
so you can get a good handle on the real time costs.

> It is possible to run clusters in Amazon EMR by simply providing a 
> credential pair; you don’t need to know much at all about AWS or Hadoop.
>     I invite all of you to follow this project on GitHub and also 
> follow the Google Group
> https://groups.google.com/forum/#!forum/infovore-basekb

I am following it.

>     where you’ll get roughly two status reports a week and where 
> people with questions get quick answers.
>      I can definitely use contributions too, because the list of 
> things I’d like to see is long and my own work will be focused on my 
> own needs. Even if you don’t contribute, I welcome feature requests 
> on the issue tracker.

This should be interesting to fellow DBpedia and LOD folk, for sure.

Kingsley
> *From:* Kingsley Idehen <mailto:kidehen@openlinksw.com>
> *Sent:* Monday, September 23, 2013 1:37 PM
> *To:* dbpedia-discussion@lists.sourceforge.net
> *Subject:* Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, 
> including wider infobox coverage, additional type statements, and new 
> YAGO and Wikidata links
> On 9/23/13 1:00 PM, Tom Morris wrote:
>> Congratulations on the new release!
>> On Mon, Sep 23, 2013 at 6:27 AM, Christian Bizer <chris@bizer.de> wrote:
>>
>>
>>     1. the new release is based on updated Wikipedia dumps dating
>>     from March / April 2013 (the 3.8 release was based on dumps from
>>     June 2012), leading to an overall increase in the number of
>>     concepts in the English edition from 3.7 to 4.0 million things.
>>
>> What accounts for the long latency between the date of the dumps and 
>> the date of the release?
>> Tom
>
> A number of things:
>
> 1. Dataset QA -- the datasets are generated from mapping efforts
> 2. Dataset Loading & QA
>     -- Linked Data Deployment (i.e., new URIs resolve to the new data)
>     -- SPARQL Endpoint (new data is accessible via the SPARQL endpoint).
>
>
> Kingsley
>>
>>
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> Dbpedia-discussion@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
> -- 
>
> Regards,
>
> Kingsley Idehen 
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: https://plus.google.com/112399767740508618350/about
> LinkedIn Profile:http://www.linkedin.com/in/kidehen

Received on Monday, 23 September 2013 22:12:45 UTC