How to discover and exploit Linked Open Data resources

Dear all,

  the LIDER project has developed guidelines describing how to discover 
and exploit language resources published as Linked Open Data.

The guidelines are summarized here: 
https://www.w3.org/community/bpmlod/wiki/LLD_Exploitation

As a motivating example, imagine a company developing sentiment analysis 
and opinion mining software that has a working system for the English 
language and wants to port the system to also support German. The 
company wants to find a corpus that is annotated at the sentiment level 
and extract a first seed lexicon of German subjective expressions with 
their polarity (positive, negative, neutral).

How could Linked Data support them in finding and exploiting a German 
sentiment lexicon easily?

According to our guidelines, they would perform the following steps:

  * Search and discovery: the company would enter the query "sentiment
    corpus German" into LingHub and reach the following page:
    http://linghub.lider-project.eu/search/?property=&query=sentiment+corpus+german.
    It would get two results. Clicking, for instance, on the usage review
    dataset, it would reach the following page:
    http://linghub.lider-project.eu/datahub/usage-review-corpus#Nedfa753871df4052a5e6074d9389e901

  * Licensing: The company would check the license
    http://opendatacommons.org/licenses/by/1.0/ and see that it is
    compatible with their purposes.
  * Distribution: The company would understand from the metadata page of
    the usage review dataset that a download is available at
    http://data.lider-project.eu/usage/usage.nt.gz and that a SPARQL
    endpoint is available at: http://data.lider-project.eu/usage/sparql
  * Extraction: Using the following SPARQL query

SELECT ?string ?polarity
WHERE {
     ?phrase <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#anchorOf> ?string ;
             <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#lang> <http://www.lexvo.org/page/iso639-3/deu> ;
             <http://www.gsi.dit.upm.es/ontologies/marl/ns#hasPolarity> ?polarity .
}


they could easily extract a list of German subjective phrases together 
with their polarity:

http://data.lider-project.eu/usage/sparql?query=SELECT+%3Fstring+%3Fpolarity+WHERE+{%3Fphrase+%3Chttp%3A%2F%2Fpersistence.uni-leipzig.org%2Fnlp2rdf%2Fontologies%2Fnif-core%23anchorOf%3E+%3Fstring+%3B%0D%0A%3Chttp%3A%2F%2Fpersistence.uni-leipzig.org%2Fnlp2rdf%2Fontologies%2Fnif-core%23lang%3E+%3Chttp%3A%2F%2Fwww.lexvo.org%2Fpage%2Fiso639-3%2Fdeu%3E+%3B%0D%0A%3Chttp%3A%2F%2Fwww.gsi.dit.upm.es%2Fontologies%2Fmarl%2Fns%23hasPolarity%3E+%3Fpolarity+.}
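
For those who prefer to run the query from a script rather than a browser, 
here is a minimal Python sketch using only the standard library. It is not 
part of the guidelines. The function names (build_query_url, parse_bindings, 
fetch_phrases) are made up for illustration, and the sketch assumes that the 
endpoint returns the standard SPARQL JSON results format when asked for it 
via the Accept header, which I have not verified for this endpoint.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://data.lider-project.eu/usage/sparql"

# Same query as above, kept verbatim.
QUERY = """
SELECT ?string ?polarity
WHERE {
     ?phrase <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#anchorOf> ?string ;
             <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#lang> <http://www.lexvo.org/page/iso639-3/deu> ;
             <http://www.gsi.dit.upm.es/ontologies/marl/ns#hasPolarity> ?polarity .
}
"""


def build_query_url(endpoint, query):
    """Append the query to the endpoint as a URL-encoded 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results):
    """Turn a SPARQL JSON results document into (string, polarity) pairs."""
    return [
        (binding["string"]["value"], binding["polarity"]["value"])
        for binding in results["results"]["bindings"]
    ]


def fetch_phrases(endpoint=ENDPOINT, query=QUERY, timeout=30):
    """Send the query over HTTP GET and return the parsed pairs.

    Assumption: the endpoint honours the Accept header below.
    """
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Calling fetch_phrases() should then give a list of (phrase, polarity URI) 
tuples, provided the endpoint is reachable and behaves as assumed.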

The company could then easily integrate these results into their workflow.
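
As one possible illustration of such an integration step, the sketch below 
groups (phrase, polarity) pairs into a seed lexicon keyed by polarity. The 
helper names are hypothetical, and it assumes that each polarity value is a 
URI whose last segment names the class (for example a URI ending in 
"#Positive"). Whether the dataset uses exactly such URIs should be checked 
against the actual query results.

```python
from collections import defaultdict


def local_name(uri):
    """Return the part of a URI after its last '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def build_seed_lexicon(pairs):
    """Group phrases by polarity label, dropping duplicates.

    pairs: iterable of (phrase, polarity_uri) tuples.
    Returns a dict mapping a lower-cased polarity label to a sorted
    list of phrases. Phrase case is kept, since German capitalisation
    can be meaningful.
    """
    lexicon = defaultdict(set)
    for phrase, polarity_uri in pairs:
        lexicon[local_name(polarity_uri).lower()].add(phrase.strip())
    return {label: sorted(phrases) for label, phrases in lexicon.items()}
```

The resulting dictionary can be serialised to whatever format the existing 
English pipeline already uses for its lexicon.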

Most importantly: they would accomplish this by using only open and 
non-proprietary technologies and web standards, and building on linked 
data principles.

Any feedback on the guideline document is more than welcome!

Kind regards,

Philipp Cimiano

-- 
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld

Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de

Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany

Received on Wednesday, 11 November 2015 20:31:21 UTC