- From: Andraz Tori <andraz@zemanta.com>
- Date: Tue, 10 Mar 2009 22:31:47 +0100
- To: Georgi Kobilarov <georgi.kobilarov@gmx.de>
- Cc: John Goodwin <John.Goodwin@ordnancesurvey.co.uk>, public-lod@w3.org
On Tue, 2009-03-10 at 22:16 +0100, Georgi Kobilarov wrote:
> Hi Andraz,
>
> > We actually created mappings between Guardian tags and Linked Data
> > entities, if anyone wants to explore that path further... (they use
> > controlled vocabulary).
>
> very interesting, can you tell a little bit more about that? what kind
> of controlled vocabulary is that, which sources do you map to and how?

Well, you can get the full set of tags from their API (around 7k of them).
We took that, plus a bit more info that we were provided with, and
reconciled those 'controlled tags' with DBpedia & Freebase. I am not sure
if we also used MusicBrainz and Semantic Crunchbase; I can look it up
tomorrow.

So basically, links between those 7k Guardian tags and the corresponding
LOD entities were established where possible. But there are some messy
details.

What I can do is provide the mapping we created to anyone interested
(tomorrow). We then took a different route to create a demo with their
API (http://labs.zemanta.com/guardian), so we didn't use the mappings
there.

How did we do it?

For each tag in the vocabulary, we looked up Guardian stories tagged with
it in our aggregator (the Guardian puts those tags into their RSS, so they
land in our engine). This provided us with background knowledge about each
tag (= what kind of stories it was used for). Then we disambiguated the
tags (with that background knowledge) into LOD by calling the Zemanta API.

However, there are some messy details to take care of if anyone picks it
up from here.

bye
andraz

> Cheers,
> Georgi
>
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com

-- 
Andraz Tori, CTO
Zemanta Ltd, London, Ljubljana
www.zemanta.com
mail: andraz@zemanta.com
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
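
For anyone picking this up, here is a rough sketch of the procedure described in the message above: gather the stories seen for each tag as background knowledge, then pass the tag together with that background to a disambiguation step. It is only an illustration under stated assumptions. The names `build_tag_context`, `map_tags`, `toy_disambiguate` and the `CANDIDATES` table are hypothetical, the keyword-overlap scorer is a stand-in for the Zemanta API call (it is not that API), and the story records are made-up sample data.

```python
from collections import defaultdict

# Hypothetical candidate table: tag -> list of (LOD URI, keywords).
# In the real setup the candidates and scoring would come from the
# disambiguation service; this table exists only for illustration.
CANDIDATES = {
    "Apple": [
        ("http://dbpedia.org/resource/Apple_Inc.", {"iphone", "company", "mac"}),
        ("http://dbpedia.org/resource/Apple", {"fruit", "tree", "orchard"}),
    ],
}


def build_tag_context(stories):
    """Group story texts by tag: the 'background knowledge' for each tag."""
    context = defaultdict(list)
    for story in stories:
        for tag in story["tags"]:
            context[tag].append(story["text"])
    return dict(context)


def toy_disambiguate(tag, background):
    """Stand-in disambiguator: pick the candidate whose keywords overlap
    most with the background text; return None if nothing matches."""
    words = set(background.lower().split())
    best_uri, best_score = None, 0
    for uri, keywords in CANDIDATES.get(tag, []):
        score = len(words & keywords)
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


def map_tags(vocabulary, stories, disambiguate):
    """Link each tag to a LOD entity where possible; skip the rest."""
    context = build_tag_context(stories)
    mapping = {}
    for tag in vocabulary:
        background = " ".join(context.get(tag, []))
        uri = disambiguate(tag, background)
        if uri is not None:
            mapping[tag] = uri
    return mapping


# Made-up sample data standing in for stories collected from the RSS feed.
SAMPLE_STORIES = [
    {"tags": ["Apple"], "text": "iPhone sales lift the company"},
]
```

Calling `map_tags(["Apple", "Unknown"], SAMPLE_STORIES, toy_disambiguate)` links "Apple" to the company entity and leaves "Unknown" unmapped, which mirrors the "where possible" caveat: tags with no usable background or no matching candidate simply get no link.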
Received on Tuesday, 10 March 2009 21:32:31 UTC