RELEASE OF sar-graph 2.0

Apologies for cross-posting
Please forward this message to colleagues in the areas of interest

Changes at a glance:
- integration of WSD results into relation extraction patterns
- annotation of word senses of content words in the patterns and sar-graphs
- new representation of vertices
- new Java API
- WSD results and relevancy assessments of synsets with respect to semantic relations available as a separate download

The resource is available at

A sar-graph is a graph containing linguistic knowledge at the syntactic and lexical-semantic levels for a given language and target relation. The sar-graph for a target relation assembles many linguistic patterns that are used in texts to mention this relation. The name "semantically associated relations" graph was chosen because the patterns may express the target relation either directly or by expressing a semantically associated relation. The nodes in a sar-graph contain information from various levels of abstraction, including semantic arguments of the target relation, content words, word senses, etc., all of which are needed to express and recognize an instance of the target relation. The nodes are connected by two kinds of edges, syntactic dependency-structure relations and lexical-semantic relations; accordingly, each edge is labelled either with a dependency-structure tag provided by a parser or with a lexical-semantic relation tag. A definition can be found in (Uszkoreit and Xu, 2013). The individual patterns are assembled into one graph per target relation so that mentions gathered across sentences can be combined more easily, but every pattern can also be employed individually.
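The structure described above can be illustrated with a minimal Python sketch. It is not the released Java API: the class names, the field names, the dependency labels, the argument names and the placeholder sense id are all assumptions made for illustration only. The sketch merges two hypothetical patterns for a marriage relation into one graph and records which patterns contributed each edge.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """A node: either a semantic argument or a content word (hypothetical layout)."""
    kind: str        # "argument" or "word"
    label: str       # argument role, or lemma of the content word
    sense: str = ""  # placeholder for a word-sense id; empty for arguments


class SarGraph:
    """Toy graph that merges dependency patterns of one target relation."""

    def __init__(self, relation):
        self.relation = relation
        self.vertices = set()
        # (head, dependency label, dependent) -> ids of patterns containing the edge
        self.edges = defaultdict(set)

    def add_pattern(self, pattern_id, pattern_edges):
        """Merge one pattern, given as (head, label, dependent) triples."""
        for head, label, dependent in pattern_edges:
            self.vertices.update((head, dependent))
            self.edges[(head, label, dependent)].add(pattern_id)


# Illustrative vertices; labels and the sense id are made up for this sketch.
marry = Vertex("word", "marry", sense="sense-id-placeholder")
spouse1 = Vertex("argument", "SPOUSE1")
spouse2 = Vertex("argument", "SPOUSE2")
date = Vertex("argument", "DATE")

graph = SarGraph("marriage")
graph.add_pattern("p1", [(marry, "nsubj", spouse1), (marry, "dobj", spouse2)])
graph.add_pattern("p2", [(marry, "nsubj", spouse1), (marry, "dobj", spouse2),
                         (marry, "prep_in", date)])

# Edges shared by both patterns are stored once and remember their sources.
shared = [edge for edge, sources in graph.edges.items() if len(sources) > 1]
print(len(graph.vertices), len(graph.edges), len(shared))
```

Keeping the contributing pattern ids on each edge is one way to preserve the property mentioned above, namely that every pattern can still be recovered and used on its own after merging.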

For a more detailed description see:
 From Strings to Things -- SAR-Graphs: A New Type of Resource for Connecting Knowledge and Language
 Hans Uszkoreit and Feiyu Xu (2013)
 In Proceedings of the 1st International Workshop on NLP and DBpedia (NLP&DBpedia), CEUR Workshop Proceedings, volume 1064, Sydney, NSW, Australia, October 2013

The current sar-graph version 2.0 contains syntactic dependency relations between content words, word senses, and semantic arguments; future versions will also integrate lexical semantic relations between word senses.

In the current release, the patterns have been automatically learned by the web-scale version (Krause et al., 2012) of the relation extraction system DARE (Xu et al., 2007) from dependency structures obtained by parsing sentential mentions of the target relation. The vertices in a sar-graph are either semantic arguments of a target relation or content words (more precisely, their word senses) needed to express or recognize an instance of the target relation. Several dependency parsers have been employed, but the current set of sar-graphs is built from the parsing results of MaltParser. In contrast to the first release in mid-2014, this release includes results from word-sense disambiguation (WSD) on the source sentences of patterns and sar-graphs. This WSD information, together with assessments of how relevant BabelNet synsets are to each target relation, is available as a separate download. In addition, a new, more flexible API has been implemented, designed in particular with future extensions of the sar-graph data structure in mind. The API includes a simplified XML format as well as updated GraphML export functionality.
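To give a rough idea of what a GraphML serialization of labelled dependency edges looks like, here is a small Python sketch using only the standard library. It follows the general GraphML conventions (a `key` declaration plus `node`, `edge` and `data` elements), but it is not the export produced by the released API; the function name, the attribute key `label`, and the example node ids and dependency labels are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def to_graphml(edges):
    """Serialize (source, label, target) triples as a directed GraphML graph."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(q("graphml"))
    # Declare a string attribute named "label" that edges may carry.
    ET.SubElement(root, q("key"), {
        "id": "label", "for": "edge",
        "attr.name": "label", "attr.type": "string",
    })
    graph = ET.SubElement(root, q("graph"), {"id": "G", "edgedefault": "directed"})
    # Emit each distinct vertex once, in a stable order.
    for node_id in sorted({v for s, _, t in edges for v in (s, t)}):
        ET.SubElement(graph, q("node"), {"id": node_id})
    for source, label, target in edges:
        edge = ET.SubElement(graph, q("edge"), {"source": source, "target": target})
        ET.SubElement(edge, q("data"), {"key": "label"}).text = label
    return ET.tostring(root, encoding="unicode")


# Hypothetical pattern: a verb vertex with two argument vertices.
xml_text = to_graphml([("marry", "nsubj", "SPOUSE1"), ("marry", "dobj", "SPOUSE2")])
print(xml_text)
```

A file written this way can be opened by generic graph tools that read GraphML, which is the usual motivation for offering such an export next to a resource-specific XML format.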

Applications of sar-graphs include information extraction, question answering and summarisation. The resource might also be useful for research on paraphrases, textual entailment and syntactic variation within a language.

Release 2.0 has the following properties:

* Language: English
* Number of target relations: 25
* Arity of relations: n-ary relations (2≤n≤5)
* Domains of relations: biographic information, corporations, awards
* Format of patterns: DARE patterns in lemon format and in a specific XML schema (DTD provided)
* Format of sar-graphs: specific XML schema (DTD provided)
* API supports: reading and storing patterns and sar-graphs, accessing vertex
 and edge information of DARE patterns and sar-graphs, pattern visualization

More references:
Feedback via email:

Sar-graphs were conceived and defined at DFKI LT-Lab Berlin and then realized in a collaboration between DFKI LT-Lab and the BabelNet group at Sapienza University of Rome.

The development of sar-graphs is partially supported by
* the German Federal Ministry of Education and Research (BMBF) through the project Deependance (contract 01IW11003)
* the project LUcKY, a Google Focused Research Award in the area of Natural Language Understanding.


Feiyu Xu

Dr. Feiyu Xu

Senior Researcher
DFKI Research Fellow

DFKI  Projektbüro Berlin
Alt Moabit 91c
D-10559 Berlin
Phone    +49-30-23895-1812
Office   +49-30-23895-1800
Fax      +49-30-23895-1810





Received on Monday, 10 November 2014 18:45:39 UTC