[ANN] DBpedia Spotlight v0.5 Released (Text Annotation with DBpedia)

Hi all,
We are happy to announce the release of DBpedia Spotlight v0.5 - Shedding
Light on the Web of Documents.

DBpedia Spotlight is a tool for annotating mentions of DBpedia entities and
concepts in text, providing a solution for linking unstructured information
sources to the Linked Open Data cloud through DBpedia. The DBpedia Spotlight
architecture is composed of the following modules:
    * Web application, a demonstration client (HTML/Javascript UI) that
allows users to enter/paste text into a Web browser and visualize the
resulting annotated text.
    * Web Service, a RESTful Web API that exposes the functionality of
annotating and/or disambiguating resources in text. The service returns XML,
JSON or XHTML+RDFa.
    * Annotation Java / Scala API, exposing the underlying logic that
performs the annotation/disambiguation.
    * Indexing Java / Scala API, executing the data processing necessary to
enable the annotation/disambiguation algorithms used.

In this release we have provided many enhancements to the Web Service and
installation process, as well as to the spotting, candidate selection,
disambiguation and annotation stages. More details on the enhancements are
provided below.

The new version is deployed at:
* http://spotlight.dbpedia.org/dev/demo/ (Demonstration Web Interface)
* http://spotlight.dbpedia.org/dev/rest/ (Web Service)

Instructions on how to use the Web Service are available at:
http://spotlight.dbpedia.org
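
For readers who want to try the dev Web Service from a script, here is a
minimal Python sketch that builds an annotation request. The base URL is the
dev endpoint listed above. The "annotate" path, the "text" and "confidence"
parameter names and the 0.2 value are assumptions not stated in this
announcement, so please check the usage instructions before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Dev endpoint taken from this announcement.
BASE_URL = "http://spotlight.dbpedia.org/dev/rest/"


def build_annotate_request(text, confidence=0.2, accept="application/json"):
    """Build (but do not send) a GET request asking the service to annotate text.

    Assumptions: the 'annotate' path and the 'text' / 'confidence' parameter
    names are illustrative; consult the usage instructions for the real ones.
    """
    query = urlencode({"text": text, "confidence": confidence})
    # The service returns XML, JSON or XHTML+RDFa; this sketch asks for JSON.
    return Request(BASE_URL + "annotate?" + query, headers={"Accept": accept})


if __name__ == "__main__":
    request = build_annotate_request("Berlin is the capital of Germany.")
    print(request.get_method(), request.full_url)
    # To actually call the service:
    #   from urllib.request import urlopen
    #   with urlopen(request) as response:
    #       print(response.read().decode("utf-8"))
```

The request is only constructed here, not sent, so the sketch can be tried
offline; passing it to urlopen performs the actual call.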

We invite your comments on the new version before we deploy it on our
production server. We will keep it on the "dev" server until October 6th,
when we will finally make the switch to the production server
(http://spotlight.dbpedia.org/demo/ and http://spotlight.dbpedia.org/rest/).
If you are a user of DBpedia Spotlight, please join
dbp-spotlight-users@lists.sourceforge.net for announcements and other
discussions.

Changelog

Changes since last public release (v0.1):
* Uses DBpedia 3.7 resources, including types from DBpedia Ontology,
Freebase and Schema.org.
* New Web API method /rest/candidates provides a ranked list of candidates
for each surface form. This will allow the use of DBpedia Spotlight in
semi-automatic annotation (e.g. of blog posts), where users can "fix" a
mistake made by our system by choosing another candidate from the
suggestions provided by the service.
* New disambiguation implementations, including a two-step disambiguator
with simpler context scoring that provides up to 200x faster annotation with
modest accuracy loss (~7%) in our preliminary tests.
* SpotSelector classes allow one to discard non-entities early in the
process to improve time performance and conformance with annotation policies
(e.g. do not annotate common words).
* jQuery plugin for DBpedia Spotlight allows one to annotate a Web page with
one line of JavaScript code: $('div').annotate();
* Cross-Origin Resource Sharing (CORS) is now enabled by default on the Web
API, allowing JavaScript code on your page to call our service without the
need for proxies.
* Enhanced candidate selection stage (with approximate matching) improves
coverage of candidate URIs for surface forms with small variations in
spelling.
* Debian packaging allows one to install DBpedia Spotlight via the package
manager in many Linux distros.
* Easier installation: fully mavenized process, auto-generated jars, more
configuration parameters accessible via property files.
* Better modularization: the dependency on the DBpedia Extraction Framework
was moved to module "index". Users who only want to run the service can now
ignore that dependency.
* Web API description provided via Web Application Description Language
(WADL). It allows you to create clients automatically via IDEs such as
Eclipse, NetBeans, etc.
* Downloads: full index, compressed index, spotter dictionaries with
different thresholds, etc. available from
http://spotlight.dbpedia.org/download
* Removed restriction on the number of characters. Beware that short texts
will have lower annotation quality since they normally provide less context
for disambiguation.
* Accepts POST requests in addition to GET. This allows longer text. Unless
explicitly specified, long texts automatically use the Document-centric
(faster) disambiguation algorithm.
* A bookmarklet allows users to select text in any Web page using their good
old browser and call DBpedia Spotlight directly from there in order to
obtain annotated text.
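
To illustrate how the /rest/candidates and POST items above could fit together
in a semi-automatic annotation client, here is a rough Python sketch. It is
only an illustration: the form-encoded "text" parameter, the JSON Accept
header and the shape of the ranked list passed to choose_candidate are
assumptions, since the actual response format is not described in this
announcement.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Dev endpoint from this announcement plus the new 'candidates' method.
CANDIDATES_URL = "http://spotlight.dbpedia.org/dev/rest/candidates"


def build_candidates_request(text):
    """Build (but do not send) a POST request, which allows longer text.

    Assumption: the text is sent as a form-encoded 'text' parameter.
    """
    body = urlencode({"text": text}).encode("utf-8")
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(CANDIDATES_URL, data=body, headers=headers, method="POST")


def choose_candidate(ranked_uris, rejected_uri):
    """Return the best-ranked candidate that the user has not rejected.

    'ranked_uris' is a hypothetical, already-parsed list of candidate URIs
    for one surface form, ordered best first. Returns None if nothing is left.
    """
    for uri in ranked_uris:
        if uri != rejected_uri:
            return uri
    return None


if __name__ == "__main__":
    request = build_candidates_request("Washington was the first president.")
    print(request.get_method(), request.full_url)

    # Hypothetical ranked list for the surface form "Washington".
    ranked = [
        "http://dbpedia.org/resource/Washington,_D.C.",
        "http://dbpedia.org/resource/George_Washington",
    ]
    print(choose_candidate(ranked, rejected_uri=ranked[0]))
```

A real client would parse the service response into such a ranked list and
let the user pick the replacement, rather than always taking the next
candidate.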

Acknowledgements

Many thanks to the growing community of DBpedia Spotlight users for your
feedback and energetic support. We would like to especially thank:
* Jo Daiber for his great work on better spotters, additional types, cuter
interfaces and many other improvements to the tool;
* Paul Houle for the extensive feedback on the system, great suggestions for
improvement and patches;
* Scott White for the invaluable discussions on the architecture and other
advice;
* Rob DiCiuccio for his real-world use case description and PHP client
implementation;
* Giuseppe Rizzo for his friendly push for releasing the /candidates API and
feedback on the API's design;
* Thomas Steiner and Rainer Simon for opening the Known Uses list (
http://dbpedia.org/spotlight/knownuses), and Rob DiCiuccio, A. Elizabeth
Cano et al., Ali Khalili, Raphaël Troncy and Giuseppe Rizzo for letting us
know of their uses of DBpedia Spotlight.

With this release we also have the pleasure of welcoming Jo Daiber as a
committer. We are looking forward to continuing this fruitful collaboration.

This release of DBpedia Spotlight was supported by the European Commission
through the project LOD2 – Creating Knowledge out of Linked Data (
http://lod2.eu/).

DBpedia Spotlight's source code is provided under the terms of the Apache
License, Version 2.0. Part of the code uses LingPipe under the Royalty Free
License. The source code can be downloaded from:
http://sourceforge.net/projects/dbp-spotlight

A paper describing DBpedia Spotlight was published at I-SEMANTICS 2011:

Pablo N. Mendes, Max Jakob, Andrés García-Silva and Christian Bizer. DBpedia
Spotlight: Shedding Light on the Web of Documents. In the Proceedings of the
7th International Conference on Semantic Systems (I-Semantics). Graz,
Austria, 7–9 September 2011.

Happy annotating!

Cheers,
Pablo, Max, Jo, Chris

Received on Thursday, 29 September 2011 16:52:43 UTC