
RE: The Guardian Open Platform and Data Store

From: Andraz Tori <andraz@zemanta.com>
Date: Tue, 10 Mar 2009 22:31:47 +0100
To: Georgi Kobilarov <georgi.kobilarov@gmx.de>
Cc: John Goodwin <John.Goodwin@ordnancesurvey.co.uk>, public-lod@w3.org
Message-Id: <1236720707.29197.599.camel@minmax-laptop>
On Tue, 2009-03-10 at 22:16 +0100, Georgi Kobilarov wrote:
> Hi Andraz,
> > We actually created mappings between Guardian tags and Linked Data
> > entities, if anyone wants to explore that path further...  (they use
> > controlled vocabulary).
> very interesting, can you tell a little bit more about that? what kind
> of controlled vocabulary is that, which sources do you map to and how?

Well, you can get the full set of tags from their API (around 7k of them).
We took that, plus a bit more info that we were provided with, and
reconciled those 'controlled tags' with DBpedia & Freebase. I am not
sure whether we also used MusicBrainz and Semantic Crunchbase; I can
look it up.

So basically, links between those 7k Guardian tags and the corresponding
LOD entities were established where possible. But there are some messy
details. What I can do is provide the mapping we created to anyone
interested (tomorrow). We then took a different route to create a demo
with their API (http://labs.zemanta.com/guardian), so we didn't use

How did we do it?
For each tag in the vocabulary, we looked up Guardian stories tagged with
it in our aggregator (the Guardian puts those tags into its RSS feeds, so
they land in our engine). This provided us with background knowledge
about each tag (= what kind of stories it was used for). Then we
disambiguated the tags (with that background knowledge) into LOD entities
by calling the Zemanta API.
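
A minimal sketch of the steps above, in Python. This is not our actual
code: the Story type, map_tags, and toy_disambiguate are hypothetical
names, and the disambiguation step is passed in as a plain callable
standing in for the Zemanta API call, whose real interface is not shown
here. The candidate URIs and keywords are illustrative only.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional


class Story(NamedTuple):
    """One aggregated story: the tags found in its feed entry plus its text."""
    tags: List[str]
    text: str


# Hypothetical signature: (tag, background texts) -> LOD URI, or None if unsure.
Disambiguator = Callable[[str, List[str]], Optional[str]]


def collect_background(vocabulary: Iterable[str],
                       stories: Iterable[Story]) -> Dict[str, List[str]]:
    """Gather, for each vocabulary tag, the texts of the stories tagged with it."""
    vocab = set(vocabulary)
    background: Dict[str, List[str]] = defaultdict(list)
    for story in stories:
        for tag in story.tags:
            if tag in vocab:
                background[tag].append(story.text)
    return background


def map_tags(vocabulary: Iterable[str], stories: Iterable[Story],
             disambiguate: Disambiguator) -> Dict[str, str]:
    """Link each tag to a LOD entity where possible; leave the rest unmapped."""
    vocabulary = list(vocabulary)
    background = collect_background(vocabulary, stories)
    mapping: Dict[str, str] = {}
    for tag in vocabulary:
        context = background.get(tag, [])
        if not context:
            continue  # assumption: no stories means no background, so skip
        uri = disambiguate(tag, context)
        if uri is not None:
            mapping[tag] = uri
    return mapping


# Toy stand-in for the real disambiguation service (illustrative data only).
CANDIDATES = {
    "Jaguar": {
        "http://dbpedia.org/resource/Jaguar": {"cat", "rainforest", "species"},
        "http://dbpedia.org/resource/Jaguar_Cars": {"car", "factory", "sales"},
    },
}


def toy_disambiguate(tag: str, context: List[str]) -> Optional[str]:
    """Pick the candidate whose keywords overlap most with the background text."""
    words = set(" ".join(context).lower().split())
    best_uri, best_score = None, 0
    for uri, keywords in CANDIDATES.get(tag, {}).items():
        score = len(words & keywords)
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


STORIES = [
    Story(tags=["Jaguar"], text="Jaguar car sales rise as factory expands"),
    Story(tags=["Unknown"], text="A story about something else entirely"),
]

if __name__ == "__main__":
    print(map_tags(["Jaguar", "Unknown", "Orphan"], STORIES, toy_disambiguate))
```

Tags with no stories in the aggregator, or with no confident candidate,
simply stay out of the mapping, which matches "where possible" above.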

However there are some messy details to take care of if anyone picks it
up from here.


> Cheers,
> Georgi
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
Andraz Tori, CTO
Zemanta Ltd, London, Ljubljana
mail: andraz@zemanta.com
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
Received on Tuesday, 10 March 2009 21:32:31 UTC
