ANN: Silk - Link Discovery Framework Version 2.4 release

Hi all,

We are happy to announce version 2.4 of the Silk - Link Discovery
Framework for the Web of Data.

The central idea of the Web of Data is to interlink data items using
RDF links. However, in practice most data sources are not sufficiently
interlinked with related data sources. The Silk Link Discovery
Framework addresses this problem by providing tools to generate links
between data items based on user-provided link specifications. It can
be used by data publishers to generate links between datasets as well
as by Linked Data consumers to augment Web data with additional RDF
links.

Link specifications can either be written manually or developed using
the new Silk Workbench. The Silk Workbench is a web application that
guides the user through the process of interlinking different data
sources. It ships with version 2.4 of Silk.
The Silk Workbench offers the following features:
- It enables the user to manage different sets of data sources and
linking tasks.
- It offers a graphical editor which enables the user to easily create
and edit link specifications.
- As finding a good linking heuristic is usually an iterative
process, the Silk Workbench makes it possible for the user to quickly
evaluate the links which are generated by the current link
specification.
- It allows the user to create and edit a set of reference links used
to evaluate the current link specification.
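
To illustrate the idea of evaluating a link specification against
reference links, here is a minimal Python sketch. It is not code from
Silk; the function name, the example URIs and the choice of precision,
recall and F1 as measures are assumptions made for illustration only.

```python
# Illustrative sketch only, not part of Silk.
# A link is modelled as a (source URI, target URI) pair.

def evaluate_links(generated, positive_refs):
    """Compare generated links with a set of known-correct reference links.

    Returns a (precision, recall, f1) tuple.
    """
    generated = set(generated)
    positive_refs = set(positive_refs)

    # Links that were generated and are confirmed by the reference set.
    true_positives = len(generated & positive_refs)

    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(positive_refs) if positive_refs else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data.
generated = {("ex:a1", "ex:b1"), ("ex:a2", "ex:b2"), ("ex:a3", "ex:b9")}
reference = {("ex:a1", "ex:b1"), ("ex:a2", "ex:b2"), ("ex:a4", "ex:b4")}

precision, recall, f1 = evaluate_links(generated, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Re-running such a comparison after each change to the link
specification is one way to see whether the change helped.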

The Silk Link Discovery Framework includes three applications for
executing link specifications, each addressing a different use case:
1. Silk Single Machine is used to generate RDF links on a single
machine. The datasets that should be interlinked can either reside on
the same machine or on remote machines which are accessed via the
SPARQL protocol. Silk Single Machine provides multithreading and
caching. In addition, the performance can be further enhanced using an
optional blocking feature.
2. Silk Server can be used as an identity resolution component within
applications that consume Linked Data from the Web. Silk Server
provides an HTTP API for matching instances from an incoming stream of
RDF data while keeping track of known entities. It can be used for
instance together with a Linked Data crawler to populate a local
duplicate-free cache with data from the Web.
3. Silk MapReduce is used to generate RDF links between datasets using
a cluster of multiple machines. Silk MapReduce is based on Hadoop and
can for instance be run on Amazon Elastic MapReduce. Silk MapReduce
enables Silk to scale out to very big datasets by distributing the
link generation to multiple machines.
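
The blocking feature mentioned under Silk Single Machine is not
described further here. As a rough illustration of the general
technique, the following Python sketch compares only entities that
share a blocking key instead of comparing every source entity with
every target entity. The key function (first letter of the label), the
function names and the example data are assumptions and do not reflect
how Silk implements blocking.

```python
# Illustrative sketch of blocking in general, not Silk's implementation.
from collections import defaultdict


def blocking_key(label):
    """Hypothetical blocking key: the lowercased first character of a label."""
    return label.strip().lower()[:1]


def candidate_pairs(source, target):
    """Yield (source URI, target URI) pairs whose labels share a blocking key.

    `source` and `target` map entity URIs to labels.
    """
    # Index the target entities by their blocking key.
    blocks = defaultdict(list)
    for uri, label in target.items():
        blocks[blocking_key(label)].append(uri)

    # Only look up the block that matches each source entity.
    for s_uri, s_label in source.items():
        for t_uri in blocks.get(blocking_key(s_label), []):
            yield s_uri, t_uri


# Hypothetical example data.
source = {"ex:s1": "Berlin", "ex:s2": "Paris"}
target = {"ex:t1": "berlin", "ex:t2": "Bonn", "ex:t3": "paris"}

pairs = sorted(candidate_pairs(source, target))
print(pairs)
```

In this example three candidate pairs are left for detailed comparison
instead of the six pairs of the full cross product. The trade-off is
that a poorly chosen key can place matching entities in different
blocks, so that correct links are missed.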

More information about the Silk framework and the Silk Link
Specification Language, as well as several examples that demonstrate
how Silk is used to set links between different data sources in the
LOD cloud, can be found at:

http://www4.wiwiss.fu-berlin.de/bizer/silk/

The Silk framework is provided under the terms of the Apache License,
Version 2.0 and can be downloaded from

http://www4.wiwiss.fu-berlin.de/bizer/silk/releases/

The development of Silk was supported by Vulcan Inc. as part of its
Project Halo (www.projecthalo.com) and by the EU FP7 project LOD2 -
Creating Knowledge out of Interlinked Data (http://lod2.eu/, Ref. No.
257943).

Thanks to Christian Becker, Michal Murawicki and Andrea Matteini for
contributing to the Silk Workbench.

Happy linking,

Robert Isele, Anja Jentzsch and Chris Bizer

Received on Wednesday, 1 June 2011 14:38:06 UTC