RE: The Guardian Open Platform and Data Store

On Tue, 2009-03-10 at 22:16 +0100, Georgi Kobilarov wrote:
> Hi Andraz,
> 
> > We actually created mappings between Guardian tags and Linked Data
> > entities, if anyone wants to explore that path further...  (they use
> > controlled vocabulary).
> 
> very interesting, can you tell a little bit more about that? what kind
> of controlled vocabulary is that, which sources to you map to and how?

Well, you can get the full set of tags (around 7k of them) from their API.
We took that, plus some extra information we were given, and reconciled
those 'controlled tags' with DBpedia & Freebase. I am not sure whether we
also used MusicBrainz and Semantic Crunchbase; I can look it up
tomorrow.

So basically, links between those 7k Guardian tags and the corresponding
LOD entities were established where possible. But there are some messy
details. I can provide the mapping we created to anyone interested
(tomorrow). We then took a different route to build a demo with their
API (http://labs.zemanta.com/guardian), so we didn't use the mapping
there.

How did we do it?
For each tag in the vocabulary, we looked up the Guardian stories tagged
with it in our aggregator (the Guardian puts those tags into its RSS
feeds, so they land in our engine). This gave us background knowledge
about each tag (i.e. what kind of stories it was used for). Then we used
that background knowledge to disambiguate the tags into LOD entities by
calling the Zemanta API.
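For anyone picking this up without the Zemanta API, the disambiguation step can be roughly approximated as follows. This is NOT what Zemanta actually does internally; it is a minimal bag-of-words stand-in, assuming that each tag's "background knowledge" is simply the text of the stories tagged with it and that each candidate LOD entity comes with a short text description. The function names, the stop-word list, the 0.1 threshold and the example URIs are all hypothetical and for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "at", "in", "and", "on"}


def bag_of_words(text):
    """Lower-cased word counts, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate_tag(stories, candidates, threshold=0.1):
    """Pick the candidate entity whose description best matches a tag's stories.

    stories:    texts of the stories tagged with this tag (the context)
    candidates: {entity_uri: short_description}
    Returns the best-scoring URI, or None if no candidate reaches the
    (hypothetical) threshold, i.e. the tag is left unlinked.
    """
    context = Counter()
    for story in stories:
        context.update(bag_of_words(story))
    best_uri, best_score = None, 0.0
    for uri, description in candidates.items():
        score = cosine(context, bag_of_words(description))
        if score > best_score:  # on ties, the first candidate seen wins
            best_uri, best_score = uri, score
    return best_uri if best_score >= threshold else None


# Illustrative data only: two stories for an ambiguous tag "Jaguar".
EXAMPLE_STORIES = [
    "Jaguar unveils a new luxury car model at the motor show",
    "The car maker Jaguar reports sales figures",
]
EXAMPLE_CANDIDATES = {
    "http://dbpedia.org/resource/Jaguar":
        "The jaguar is a big cat, a feline species native to the Americas",
    "http://dbpedia.org/resource/Jaguar_Cars":
        "Jaguar Cars is a British luxury car maker",
}

if __name__ == "__main__":
    print(disambiguate_tag(EXAMPLE_STORIES, EXAMPLE_CANDIDATES))
    # -> http://dbpedia.org/resource/Jaguar_Cars
```

The threshold is what makes linking happen only "where possible": a tag whose stories share no vocabulary with any candidate stays unmapped instead of being forced onto a wrong entity. A real disambiguator would use far richer features than raw word overlap.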

However, there are some messy details to take care of if anyone picks
this up from here.

bye
andraz


> Cheers,
> Georgi
> 
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
-- 
Andraz Tori, CTO
Zemanta Ltd, London, Ljubljana
www.zemanta.com
mail: andraz@zemanta.com
tel: +386 41 515 767
twitter: andraz, skype: minmax_test

Received on Tuesday, 10 March 2009 21:32:31 UTC