How to discover and exploit Linked Open Data resources from Philipp Cimiano on 2015-11-11 (public-ld4lt@w3.org from November 2015)

From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Date: Wed, 11 Nov 2015 21:30:49 +0100
To: public-ld4lt@w3.org, "public-bpmlod@w3.org" <public-bpmlod@w3.org>
Message-ID: <5643A579.7090402@cit-ec.uni-bielefeld.de>

Dear all,

  the LIDER project has developed guidelines describing how to discover 
and exploit language resources published as Linked Open Data.

The guidelines are summarized here: 
https://www.w3.org/community/bpmlod/wiki/LLD_Exploitation

As a motivating example, imagine a company developing sentiment analysis 
and opinion mining software that has a working system for the English 
language and wants to port the system to also support German. The 
company wants to find a corpus that is annotated at the sentiment level 
and extract a first seed lexicon of German subjective expressions with 
their polarity (positive, negative, neutral).

How could Linked Data support them in finding and exploiting a German 
sentiment lexicon easily?

According to our guidelines, they would perform the following steps:

  * Search and discovery: the company would enter the query "sentiment
    corpus German" into LingHub and reach the following page:
    http://linghub.lider-project.eu/search/?property=&query=sentiment+corpus+german.
    The search would return two results. Clicking, for instance, on the
    usage review dataset, they would reach the following page:
    http://linghub.lider-project.eu/datahub/usage-review-corpus#Nedfa753871df4052a5e6074d9389e901

  * Licensing: They would check the license
    http://opendatacommons.org/licenses/by/1.0/ and see that it is
    compatible with their purposes.
  * Distribution: The company would understand from the metadata page of
    the usage review dataset that a download is available at
    http://data.lider-project.eu/usage/usage.nt.gz and that a SPARQL
    endpoint is available at: http://data.lider-project.eu/usage/sparql
  * Extraction: Using the following SPARQL query

SELECT ?string ?polarity
WHERE {
     ?phrase <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#anchorOf> ?string ;
             <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#lang> <http://www.lexvo.org/page/iso639-3/deu> ;
             <http://www.gsi.dit.upm.es/ontologies/marl/ns#hasPolarity> ?polarity .
}


they could easily extract a list of German subjective phrases together 
with their polarity:

http://data.lider-project.eu/usage/sparql?query=SELECT+%3Fstring+%3Fpolarity+WHERE+{%3Fphrase+%3Chttp%3A%2F%2Fpersistence.uni-leipzig.org%2Fnlp2rdf%2Fontologies%2Fnif-core%23anchorOf%3E+%3Fstring+%3B%0D%0A%3Chttp%3A%2F%2Fpersistence.uni-leipzig.org%2Fnlp2rdf%2Fontologies%2Fnif-core%23lang%3E+%3Chttp%3A%2F%2Fwww.lexvo.org%2Fpage%2Fiso639-3%2Fdeu%3E+%3B%0D%0A%3Chttp%3A%2F%2Fwww.gsi.dit.upm.es%2Fontologies%2Fmarl%2Fns%23hasPolarity%3E+%3Fpolarity+.}

The company could then easily integrate these results into their workflow.
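
As a minimal sketch of how such an integration might look, assuming the 
endpoint follows the standard SPARQL protocol and can return results in 
the W3C SPARQL JSON results format (this is an assumption, not something 
stated on the dataset page), the following Python code builds the request 
URL, parses the result bindings and groups the phrases by polarity. The 
function names (build_query_url, parse_results, build_lexicon, 
fetch_lexicon) are hypothetical and chosen only for illustration.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint and query are taken from the message above.
ENDPOINT = "http://data.lider-project.eu/usage/sparql"

QUERY = """
SELECT ?string ?polarity
WHERE {
  ?phrase <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#anchorOf> ?string ;
          <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#lang> <http://www.lexvo.org/page/iso639-3/deu> ;
          <http://www.gsi.dit.upm.es/ontologies/marl/ns#hasPolarity> ?polarity .
}
"""


def build_query_url(endpoint, query):
    """Encode the query as the 'query' parameter of a SPARQL GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def parse_results(json_text):
    """Turn a SPARQL JSON results document into (string, polarity) pairs."""
    doc = json.loads(json_text)
    return [
        (binding["string"]["value"], binding["polarity"]["value"])
        for binding in doc["results"]["bindings"]
    ]


def build_lexicon(pairs):
    """Group phrases by polarity value; a set drops duplicate phrases."""
    lexicon = defaultdict(set)
    for phrase, polarity in pairs:
        lexicon[polarity].add(phrase.strip())
    return dict(lexicon)


def fetch_lexicon(endpoint=ENDPOINT, query=QUERY):
    """Query the endpoint over HTTP.

    Assumption: the server honours the Accept header and answers with
    application/sparql-results+json.
    """
    request = Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=30) as response:
        return build_lexicon(parse_results(response.read().decode("utf-8")))
```

Calling fetch_lexicon() would return a dictionary mapping each polarity 
value to the set of German phrases carrying it, which could serve as the 
seed lexicon. Parsing and grouping are kept separate from the network call 
so that they can also be applied to results obtained in another way, for 
example from the downloadable dump.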

Most importantly: they would accomplish this by using only open and 
non-proprietary technologies and web standards, and building on linked 
data principles.

Any feedback on the guideline document is more than welcome!

Kind regards,

Philipp Cimiano

-- 
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld

Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de

Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany

Received on Wednesday, 11 November 2015 20:31:20 UTC