- From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
- Date: Wed, 11 Nov 2015 21:30:49 +0100
- To: public-ld4lt@w3.org, "public-bpmlod@w3.org" <public-bpmlod@w3.org>
- Message-ID: <5643A579.7090402@cit-ec.uni-bielefeld.de>
Dear all,
the LIDER project has developed guidelines describing how to discover
and exploit language resources published as Linked Open Data.
The guidelines are summarized here:
https://www.w3.org/community/bpmlod/wiki/LLD_Exploitation
As a motivating example, imagine a company developing sentiment analysis
and opinion mining software that has a working system for the English
language and wants to port the system to also support German. The
company wants to find a corpus that is annotated at the sentiment level
and extract a first seed lexicon of German subjective expressions with
their polarity (positive, negative, neutral).
How could Linked Data support them in finding and exploiting a German
sentiment lexicon easily?
According to our guidelines, they would perform the following steps:
* Search and discovery: the company would enter the query "sentiment
corpus German" into LingHub and reach the following page:
http://linghub.lider-project.eu/search/?property=&query=sentiment+corpus+german
The search would return two results. Clicking, for instance, on the
usage review dataset, they would reach the following page:
http://linghub.lider-project.eu/datahub/usage-review-corpus#Nedfa753871df4052a5e6074d9389e901
* Licensing: They would check the license
http://opendatacommons.org/licenses/by/1.0/ and see that it is
compatible with their purposes.
* Distribution: The company would understand from the metadata page of
the usage review dataset that a download is available at
http://data.lider-project.eu/usage/usage.nt.gz and that a SPARQL
endpoint is available at: http://data.lider-project.eu/usage/sparql
* Extraction: Using the following SPARQL query
    SELECT ?string ?polarity
    WHERE {
      ?phrase <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#anchorOf> ?string ;
              <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#lang> <http://www.lexvo.org/page/iso639-3/deu> ;
              <http://www.gsi.dit.upm.es/ontologies/marl/ns#hasPolarity> ?polarity .
    }
they could easily extract a list of German subjective phrases together
with their polarity:
http://data.lider-project.eu/usage/sparql?query=SELECT+%3Fstring+%3Fpolarity+WHERE+{%3Fphrase+%3Chttp%3A%2F%2Fpersistence.uni-leipzig.org%2Fnlp2rdf%2Fontologies%2Fnif-core%23anchorOf%3E+%3Fstring+%3B%0D%0A%3Chttp%3A%2F%2Fpersistence.uni-leipzig.org%2Fnlp2rdf%2Fontologies%2Fnif-core%23lang%3E+%3Chttp%3A%2F%2Fwww.lexvo.org%2Fpage%2Fiso639-3%2Fdeu%3E+%3B%0D%0A%3Chttp%3A%2F%2Fwww.gsi.dit.upm.es%2Fontologies%2Fmarl%2Fns%23hasPolarity%3E+%3Fpolarity+.}
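The extraction step could also be scripted. The following is a minimal
Python sketch, assuming that the endpoint accepts the query as a "query"
GET parameter (as in the URL above) and that it can return the standard
SPARQL JSON results format when asked via the Accept header. The
function names are illustrative and not part of the guidelines.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the Distribution step of the walk-through.
ENDPOINT = "http://data.lider-project.eu/usage/sparql"

# Same query as in the Extraction step.
QUERY = """
SELECT ?string ?polarity
WHERE {
  ?phrase <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#anchorOf> ?string ;
          <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#lang> <http://www.lexvo.org/page/iso639-3/deu> ;
          <http://www.gsi.dit.upm.es/ontologies/marl/ns#hasPolarity> ?polarity .
}
"""


def build_query_url(endpoint, query):
    """Encode the query as the `query` parameter of a SPARQL GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query.strip()})


def parse_results(document):
    """Turn a SPARQL JSON results document into (string, polarity) pairs."""
    rows = []
    for binding in document["results"]["bindings"]:
        rows.append((binding["string"]["value"], binding["polarity"]["value"]))
    return rows


def fetch_phrases(endpoint=ENDPOINT, query=QUERY):
    """Run the query over HTTP.

    Assumption: the endpoint honours the JSON results media type.
    """
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_results(json.load(response))
```

Calling fetch_phrases() would then yield one (phrase, polarity URI) pair
per matching annotation, provided the endpoint is reachable.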
The company could then easily integrate these results into their workflow.
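What the integration looks like depends on the company's own pipeline.
As one hedged illustration, the sketch below groups (phrase, polarity
URI) pairs into a seed lexicon keyed by a short polarity label. It
assumes that the polarity values are URIs whose fragment names the
polarity class; the helper names are hypothetical.

```python
from collections import defaultdict


def polarity_label(polarity_uri):
    """Use the fragment (or last path segment) of the URI as a short label."""
    tail = polarity_uri.rsplit("#", 1)[-1]
    return tail.rsplit("/", 1)[-1].lower()


def build_seed_lexicon(rows):
    """Group phrases by polarity label, dropping blanks and duplicates."""
    lexicon = defaultdict(list)
    for phrase, polarity_uri in rows:
        label = polarity_label(polarity_uri)
        normalized = phrase.strip()
        # Skip empty strings and phrases already recorded for this label.
        if normalized and normalized not in lexicon[label]:
            lexicon[label].append(normalized)
    return dict(lexicon)
```

A phrase that occurs with more than one polarity is kept under each
label, so conflicting annotations stay visible to whoever reviews the
lexicon.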
Most importantly: they would accomplish this by using only open and
non-proprietary technologies and web standards, and building on linked
data principles.
Any feedback on the guideline document is more than welcome!
Kind regards,
Philipp Cimiano
--
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld
Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de
Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany
Received on Wednesday, 11 November 2015 20:31:21 UTC