Re: General tuning for Dbpedia Spotlight

Hugh, my difficulties with this service have been with looking up place references. Unfortunately, the service seems to favor entities with more inbound links over those that are closer textual matches.
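To illustrate the alternative (preferring the closer text match over the more-linked entity), here is a minimal client-side re-ranking sketch. The candidate list, its keys `label`, `uri` and `support`, and the support counts in the sample are hypothetical. This is not how Spotlight ranks internally, and `difflib`'s similarity ratio simply stands in for whatever text-distance measure you prefer.

```python
from difflib import SequenceMatcher


def pick_by_text_match(surface_form, candidates):
    """Return the candidate whose label best matches the surface form.

    `candidates` is a list of dicts with hypothetical keys "label", "uri"
    and "support" (inbound-link count). Textual similarity decides first;
    support only breaks ties. Returns None for an empty list.
    """
    def key(candidate):
        similarity = SequenceMatcher(
            None, surface_form.lower(), candidate["label"].lower()
        ).ratio()
        return (similarity, candidate["support"])

    return max(candidates, key=key, default=None)


# Illustrative candidates; the support counts are made up.
candidates = [
    {"label": "Paris", "uri": "http://dbpedia.org/resource/Paris", "support": 50000},
    {"label": "Paris, Texas", "uri": "http://dbpedia.org/resource/Paris,_Texas", "support": 900},
]
print(pick_by_text_match("Paris, Texas", candidates)["uri"])
# prints http://dbpedia.org/resource/Paris,_Texas
```

A purely popularity-based ranking would pick the first entry here; ranking on similarity first picks the exact textual match.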

One suggested workaround was to do the lookups in Freebase, which seems to do a better job of text matching, and then pull the DBpedia URIs out of the descriptions retrieved for the matching entities.
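A sketch of that last step, assuming the retrieved description text contains English Wikipedia or DBpedia links. DBpedia resource names derive from English Wikipedia article titles, so a `http://en.wikipedia.org/wiki/` link maps onto `http://dbpedia.org/resource/` with the same title. The function name is hypothetical, no Freebase API calls are shown, and encoding differences for non-ASCII titles are not handled.

```python
import re

# Matches English Wikipedia article links and DBpedia resource URIs,
# capturing the title part; stops at whitespace, quotes, brackets, '#' or '?'.
LINK = re.compile(
    r"https?://(?:en\.wikipedia\.org/wiki|dbpedia\.org/resource)/([^\s\"'<>#?]+)"
)


def dbpedia_uris(description):
    """Return DBpedia URIs for every linked title in `description`."""
    uris = []
    for title in LINK.findall(description):
        uri = "http://dbpedia.org/resource/" + title
        if uri not in uris:  # keep first-seen order, drop duplicates
            uris.append(uri)
    return uris
```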

Cheers,

  Sands Fish
  Data Scientist / Software Engineer
  MIT Libraries
  sands@mit.edu



________________________________
From: Hugh Glaser <hugh@glasers.org>
Date: January 16, 2014 at 6:33:50 AM
To: public-lod community <public-lod@w3.org>
Subject: General tuning for Dbpedia Spotlight
Hi.
I am trying to use DBpedia Spotlight to find entities in arbitrary English text.
Following the instructions, I found it very easy to download and install the whole shebang on my Mac laptop - thanks!
It does pretty well at finding things, but gets some strange ones wrong for me (for example, choosing people called Monday instead of the day of the week, or Municipalities of Germany for "Municipalities").
That’s fine - I understand that there is always a precision/recall trade-off.
But I want to use it to mark up web pages, so even a small number of strange links is a problem.
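One way to push precision further is to filter the results on the client. The sketch below assumes the JSON that `/rest/annotate` returns when asked for `application/json`: each hit under `Resources` carries `@URI`, `@surfaceForm`, `@similarityScore` and `@percentageOfSecondRank`, all as strings. As we understand it, the last of these is the runner-up candidate's score relative to the winner's, so a small value means a clear winner. The thresholds and the `skip` set (surface forms such as "Monday" that should never be linked) are hypothetical starting points to tune.

```python
import json


def precise_annotations(response_json, min_similarity=0.9,
                        max_second_rank=0.1, skip=frozenset()):
    """Return (surface form, URI) pairs for annotations that look unambiguous.

    Assumes Spotlight's JSON annotate output, where numeric fields are strings
    and "Resources" may be absent when nothing was found.
    """
    doc = json.loads(response_json)
    kept = []
    for res in doc.get("Resources", []):
        if res["@surfaceForm"] in skip:  # never link these surface forms
            continue
        similarity = float(res["@similarityScore"])
        # Runner-up score relative to the winner: small means a clear winner.
        second = float(res["@percentageOfSecondRank"])
        if similarity >= min_similarity and second <= max_second_rank:
            kept.append((res["@surfaceForm"], res["@URI"]))
    return kept
```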

So my question is:
Which parameters should I set to get high-precision results (even at the cost of low recall) for arbitrary English text?
I assume that I need to set Confidence and Annotation Score, and probably some Types.
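For reference, the annotate service takes these settings as request parameters. Below is a minimal sketch of building such a request. The endpoint assumes a local install listening on port 2222, and the default values are illustrative rather than recommended settings. `confidence`, `support` and `types` are the parameter names we believe the service uses; types go in as a comma-separated list (e.g. `DBpedia:Place`), as far as we know.

```python
from urllib.parse import urlencode

# Assumption: a local Spotlight server on port 2222; adjust to your install.
ANNOTATE_URL = "http://localhost:2222/rest/annotate"


def annotate_url(text, confidence=0.7, support=50, types=()):
    """Build a GET URL for the annotate service.

    Raising `confidence` and `support` trades recall for precision;
    `types` restricts results to the listed ontology classes.
    """
    params = {"text": text, "confidence": confidence, "support": support}
    if types:
        params["types"] = ",".join(types)
    return ANNOTATE_URL + "?" + urlencode(params)
```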

Related to this, I am using the Lucene version. I see there is a Statistical version, but can’t work out what the difference might be. Should I be using that to get more precise results?

Sorry if this is somewhere in the docs, but I couldn’t find it easily.
My guess is that this is something that quite a few people have been through?

I am calling it from PHP over HTTP, if anyone can actually provide the code! :-)
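A minimal Python sketch of the HTTP call, assuming the same local endpoint. It POSTs the text form-encoded, since whole web pages can exceed practical URL lengths, and asks for JSON via the `Accept` header. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: a local Spotlight server on port 2222; adjust to your install.
ANNOTATE_URL = "http://localhost:2222/rest/annotate"


def build_request(text, confidence=0.7, support=50):
    """Build a form-encoded POST request that asks for a JSON response."""
    body = urlencode({"text": text, "confidence": confidence,
                      "support": support}).encode("utf-8")
    return urllib.request.Request(
        ANNOTATE_URL, data=body, method="POST",
        headers={"Accept": "application/json",
                 "Content-Type": "application/x-www-form-urlencoded"})


def annotate(text, **params):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, **params)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In PHP the translation is direct: `http_build_query` for the body, and the curl extension (`CURLOPT_POSTFIELDS`, `CURLOPT_HTTPHEADER`) for the request.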

Best
Hugh

Received on Thursday, 16 January 2014 18:57:09 UTC