General tuning for Dbpedia Spotlight

-----Original Message-----
From: Margaret Warren [mailto:mm@zeroexp.com] 
Sent: Friday, January 17, 2014 4:59 PM
To: 'Hugh Glaser'
Subject: RE: General tuning for Dbpedia Spotlight

Hi Hugh, 

You can try out various word combinations at: http://www.imagesnippets.com

If you register an account, all you have to do is select one of the sample
images (so you don't have to upload any), go to the 'Description' tab, type
in any text and push the auto-entity extraction button.

We return matching entities from DBpedia based on how Michael Brunnbauer has
it configured; we use a combination of DBpedia Spotlight and TextRazor.

Bottom line: feel free to type in lots of word combinations. They don't
have to match the image, and you don't have to 'do' anything with the
responses when they come back; just type in more text and try again.
Ultimately, if you want to create triples in the triple-editor window, you
can. You are not restricted to our properties: just type in any properly
formatted URI (or add the namespace using the namespace button).
Once you create the triples, you can copy and paste them out via the 'View
HTML/RDFa' button.

We don't have a way for users to tweak parameters here, but you can
certainly tweak certain word combinations and see what is returned. 

I think we have a per-day limit right now with TextRazor before I need to
pay for it, but we haven't come close to reaching that limit yet.

Best,
Margaret Warren



-----Original Message-----
From: hugh.glaser@seme4.com [mailto:hugh.glaser@seme4.com] On Behalf Of Hugh
Glaser
Sent: Friday, January 17, 2014 11:26 AM
To: public-lod community
Subject: Re: General tuning for Dbpedia Spotlight

Thank you for the responses, both on- and off-list.

So I see perhaps I should recast my question, with maybe wider scope.

I have a load of abstract-style text fragments - that is perhaps 100 words
each, on a wide variety of topics, although there is a bit of a technical
bent.

I want to be able to do linkage between them and to other things, based
around our lovely Linked Data world.
That is, have lots of triples something like
   :docIDn :some-pred :conceptURI
It would be a bonus to know which words in the text triggered the
generation of the triple.
Of course, the system doesn't actually have to generate the triples - I can
build them if I get sufficiently sensible output, including the sort of html
output that Spotlight does.
And because it goes automatically to users, I need quite high precision,
even if recall suffers (I think that is the terminology).
Oh, and ideally free, although not necessarily.
My current preference is for DBpedia or Freebase URIs, but WordNet is
probably OK too.
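
To make the target concrete, here is a minimal sketch of the "build the
triples myself" step, assuming the annotator's output has already been parsed
into simple records. The Annotation record, the to_ntriples and trigger_words
helpers, the 0.8 cut-off, the example.org document URI and the choice of
dcterms:subject as the predicate are all illustrative assumptions, not the
output format or API of Spotlight or any other tool. Raising the cut-off is
the crude way to trade recall for precision.

```python
from typing import Dict, Iterable, List, NamedTuple


class Annotation(NamedTuple):
    """One entity match. Hypothetical record, not any tool's real output."""
    surface_form: str   # the words in the text that triggered the match
    uri: str            # concept URI, e.g. a DBpedia resource
    confidence: float   # annotator's score, assumed to lie in [0, 1]


def to_ntriples(doc_uri: str, predicate_uri: str,
                annotations: Iterable[Annotation],
                min_confidence: float = 0.8) -> List[str]:
    """Drop low-confidence matches; emit one N-Triples line per distinct URI."""
    seen = set()
    lines = []
    for ann in annotations:
        if ann.confidence < min_confidence or ann.uri in seen:
            continue
        seen.add(ann.uri)
        lines.append(f"<{doc_uri}> <{predicate_uri}> <{ann.uri}> .")
    return lines


def trigger_words(annotations: Iterable[Annotation],
                  min_confidence: float = 0.8) -> Dict[str, List[str]]:
    """Map each kept concept URI to the surface forms that produced it."""
    triggers: Dict[str, List[str]] = {}
    for ann in annotations:
        if ann.confidence >= min_confidence:
            triggers.setdefault(ann.uri, []).append(ann.surface_form)
    return triggers


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [
        Annotation("Linked Data", "http://dbpedia.org/resource/Linked_data", 0.93),
        Annotation("bank", "http://dbpedia.org/resource/Bank", 0.31),
    ]
    for line in to_ntriples("http://example.org/doc/1",
                            "http://purl.org/dc/terms/subject", sample):
        print(line)
    print(trigger_words(sample))
```

Keeping the surface forms in a separate mapping leaves the triples themselves
plain, while still recording which words triggered each one.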

I think this must be something that people have done (a lot). Or at least
it should be.
There are certainly quite a lot of systems that can do it, some more or less
playing well with Linked Data URIs.

I think my problem (apart from laziness) is that the systems I look at seem
to want me to care about what they do, or at least engage with tuning and
things, which means I need some understanding of what they do, which I don't
have (and I probably don't care either :-) ).

So, does anyone (else) feel they can point me at a system for doing this
that I can just use out of the box (possibly having been told some
parameters to use)?

Of course, maybe I am just asking too much of the technology at the moment,
but I can hope!
Best
Hugh
--
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652

Received on Friday, 17 January 2014 22:00:06 UTC