[extended CFP] Semantic Web Journal - Special Issue on Linked Data for Information Extraction (deadline extension)

Special Issue on Linked Data for Information Extraction
EXTENDED submission deadline: 05 May 2017 (Hawaii time)

Information Extraction (IE) is the task of automatically extracting
structured information from unstructured and/or semi-structured
machine-readable documents. It is a crucial technology to enable the
Semantic Web vision.
One of the major bottlenecks for the current state of the art in IE is
the availability of learning materials (e.g., seed data, training
corpora), which are traditionally created manually and are expensive to
build and maintain. Linked Data (LD) defines best practices for
exposing, sharing, and connecting data, information, and knowledge on
the Semantic Web using uniform means such as URIs and RDF. It has so far
produced a gigantic knowledge source of Linked Open Data (LOD), which now
comprises billions of triples (facts). This has created unprecedented
opportunities for Information Extraction. Linked Data offers a uniform
approach to linking resources that are uniquely identifiable by URIs.
This creates a large knowledge base of entities and concepts, connected
by semantic relations. Such resources can be valuable for seeding
distantly supervised learning. Moreover, initiatives such as RDFa
(supported by W3C) or Microdata (used by schema.org and supported by
major search engines) constantly produce a vast amount of annotated web
pages, which can be used as training data in the traditional machine
learning paradigm.
However, powering IE using LOD faces major challenges. First,
discovering relevant learning materials on LOD for specific IE tasks is
non-trivial due to (i) the highly heterogeneous vocabularies used by
data publishers and (ii) the lack of contextual information for
annotated content on web pages (e.g., annotations are often found
predominantly in page headers), as well as the skewed distribution of
such annotations towards popular entities. Users are often required to
be familiar with the datasets, vocabularies, and query languages that
data publishers use to expose their data. Unfortunately, given the
sheer size and diversity of LOD, imposing such requirements on users is
infeasible.
Second, it is known that the coverage of domains can be highly
imbalanced, and for certain domains the data can be very sparse.
Furthermore, most LOD is created automatically by converting legacy
databases with limited or no human validation; thus, data inconsistency
and redundancy are widespread.
Another crucial aspect of IE research is the shift of attention from
purely unstructured text to semi-structured content. Two main sources of
interest are Web tables and Open Data (often available as CSV files).
These data are particularly rich in content and relations but often lack
the contextual information on which classical IE methods typically rely.
The aim of this special issue is to foster research on methodologies
that exploit Linked Data for Information Extraction, to answer questions
such as: to what extent can we identify domain-specific learning
resources for IE; how to identify and deal with noise in the learning
resources; how can these learning resources be used to train IE models,
both for classical unstructured text and for semi-structured content;
and how the information extracted by such models should be integrated
into the existing LOD.

Topics of Interest
We solicit original papers addressing the challenges and research
questions mentioned above. Topics of interest include, but are not
limited to, those listed below. Note that work must make use of Linked
Data in some form and must be related to Information Extraction in some
way. Please contact the editors if in doubt.

- Methods for generating seed data for IE (e.g., distant supervision)
from Linked Data
- Methods for identifying labelled data for IE from annotated web page
content produced under initiatives such as RDFa and Microdata
- IE tasks exploiting Linked Data in any form, including but not limited to:
     * wrapper induction
     * table annotation
     * named entity recognition
     * relation extraction
     * ontology population, ontology expansion (A-box)
     * ontology learning (T-box)
- Methods for identifying and reducing noise in the context of IE tasks
- Disambiguation using Linked Data
- IE for knowledge graph construction

Submission Instructions

Submissions shall be made through the Semantic Web journal website at
http://www.semantic-web-journal.net. Prospective authors must take
notice of the submission guidelines posted at
http://www.semantic-web-journal.net/authors. Note that you need to
request an account on the website to submit a paper. Please
indicate in the cover letter that it is for the "Linked Data for
Information Extraction" special issue.
All manuscripts will be reviewed based on the SWJ open and transparent
review policy and will be made available online during the review process.

Guest editors
Anna Lisa Gentile, University of Mannheim, Germany
Ziqi Zhang, Nottingham Trent University, UK

The call is also available at the official journal website:



Dr. Ziqi Zhang

ERD288, Erasmus Darwin Building, Clifton Campus

Nottingham Trent University

Tel: +44 (0)115 848 8348

Web: https://ziqizhang.github.io/

Received on Monday, 3 April 2017 08:48:16 UTC