CfP Workshop Discourse studies and linguistic data science (DiSLiDaS 2022), Jerusalem, May 24, 2022

Dear colleagues, (apologies for cross-posting)

the COST Action CA18209 NexusLinguarum (https://nexuslinguarum.eu) is glad
to announce the Workshop on

Discourse studies and linguistic data science: Addressing challenges in
interoperability, multilinguality and linguistic data processing (DiSLiDaS
2022)

The workshop will be held in hybrid mode at the Jerusalem College of
Technology, Jerusalem, on 24 May 2022.

We invite extended abstracts on topics tackling discourse studies,
linguistic data science or interoperability challenges in the discourse
domain (detailed description below and available from the website:
http://dislidas.mozajka.co). As the workshop brings together aspects of
linguistics and web technologies, it may be of interest to the participants
of this list, in particular with respect to pragmatic features in lexical
resources and their use.

*Schedule and Submission*

- March 20, 2022 submission of extended abstracts
- April 20, 2022 notification of acceptance
- May 24, 2022 workshop
- July 20, 2022 full papers due (postproceedings)
- Oct 15, 2022 notification of acceptance (postproceedings)

Authors are invited to submit an extended abstract of up to 4 pages in PDF
format, using the Springer LNCS proceedings template, available here:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines

Submissions must be anonymous and should be submitted electronically via
EasyChair: https://easychair.org/conferences/?conf=dislidas2022.

At least one author of each accepted extended abstract is required to
register for the workshop and present the work there.

Accepted papers are expected to be published in a postproceedings volume by
John Benjamins.

More detailed information can be found on the website (
http://dislidas.mozajka.co).

*Detailed Description*

The purpose of the workshop is to gather current research advances in
discourse analysis and representation, in the context of multilinguality,
from a linguistic and computational perspective. We invite submissions
addressing challenges such as interoperability, linguistic linked open data
(LLOD), and language processing and analysis.

Discourse comprises a wide variety of linguistic phenomena, such as
discourse markers, discourse relations and speaker attitude, which have
been studied extensively by different communities of practice in
linguistics and computation. This work has yielded several theoretical
frameworks (for instance, RST, SDRT and PDTB for discourse relations;
appraisal theory for sentiment analysis) and technological approaches, such
as transformer models, embeddings and the like. Nonetheless, open issues
remain with regard to interoperability, multilinguality and language
processing: in particular, the existence of different annotation schemas,
disambiguation, the lack of training data for machine learning, the
scarcity of effective methods for detecting and interpreting language
phenomena, diverse vocabularies, insufficient multilingual parallel corpora
of non-dialog and dialog texts, and the early stage of exploration of
multimodality.

Discourse research is also one of the central research areas of natural
language processing (NLP). NLP research focuses on the formalization,
identification and discovery of semantic phenomena, dialogue exchange
structure, and coherence of text. Some of the technological approaches of
NLP include the use of transformer models, word embeddings, linguistic
linked open data, the construction of aligned multilingual corpora,
vocabularies of language phenomena and the like. Computational discourse
research builds on the evidence that language consists not only in placing
words in the right order but also in detecting and interpreting meaning and
deeper textual relations, as well as organizing ideas into a logical
textual flow. Linguistic approaches study language phenomena related to the
coherence and cohesion of discourse, and the lexical, phrasal, syntactic,
semantic and pragmatic means of expressing discourse relations; they also
represent the roles of these means and build language resources for them.

Despite all the advances, there are still plenty of unresolved problems
related to interoperability, multilinguality, and language processing. With
the growth of the Semantic Web and Linguistic Linked Data, interoperability
is key to reading, interpreting and adopting language resources. The
existence of different annotation schemas for encoding discourse relations
hinders data exchange and re-use on the one hand, and theoretical
consistency in the production of annotated corpora on the other. Ideally,
an annotation model is designed to handle all the specificities of a
particular dataset, yet is broad enough to be applied to other datasets.
Many proposals try to achieve this balance, one of them being ISO 24617.
The treatment of multilinguality is also complicated by the shortage of
multilingual parallel corpora of non-dialog and dialog texts that would
allow systematic contrastive studies. As to language processing, the lack
of training data for machine learning, the scarcity of effective methods
for detecting and interpreting language phenomena, the coexistence of
diverse vocabularies, and the minimal attention paid to the contribution of
tone of voice, intonation and gestures to the meaning and informative value
of discourse elements together make discourse processing still very
challenging.

The workshop is intended as a forum for discussion among researchers
interested in addressing the aforementioned challenges and in advancing the
state of the art in discourse studies and linguistic data science.

*Topics*

Workshop topics include, but are not limited to, the following:
- Discourse and dialog annotation: Parsing and representation across
languages and frameworks
- Discourse markers and discourse relations (RST, PDTB, SDRT):
Identification, prediction and extraction
- Attitude discovery and interpretation in discourse: Appraisal and
sentiment
- Effects of multimodality on discourse interpretation: Intonation, gesture
and text
- Interoperability for multilingual language data: Challenges of rich and
distributed data
- Discourse data and machine learning: Methods and tools

*Program*

The scientific program will include one invited talk and oral presentations.

Invited Speaker
Bonnie Webber, University of Edinburgh

Program committee
Nicolas Asher, CNRS/IRIT, Toulouse, France
Johan Bos, University of Groningen, Groningen, The Netherlands
Paul Buitelaar, NUI Galway, Ireland
Harry Bunt, Tilburg University, Netherlands
Philipp Cimiano, Bielefeld University, Germany
Ludivine Crible, Ghent University
Maria Josep Cuenca, Universitat de València
Vera Demberg, University of Saarland, Germany
Jorge Garcia, University of Zaragoza, Spain
Mikel Iruskieta, University of the Basque Country, Spain
John McCrae, NUI Galway, Ireland
Ted Sanders, Utrecht University
Merel Scholman, University of Saarland, Germany
Manfred Stede, University of Potsdam, Germany
Radoslava Trnavac, University of Belgrade, Serbia
Amir Zeldes, Georgetown University, USA

Organizing committee
Chaya Liebeskind, Jerusalem College of Technology, Jerusalem (Local
organizer)
Purificação Silvano, Faculty of Arts and Humanities of the University of
Porto, CLUP, Porto, Portugal
Christian Chiarcos, Applied Computational Linguistics, Goethe-Universität,
Frankfurt am Main, Germany
Mariana Damova, Mozaika, Ltd., Sofia, Bulgaria
Giedre Valunaite Oleskevicienė, Mykolas Romeris University, Institute of
Humanities, Vilnius, Lithuania
Dimitar Trajanov, Faculty of Computer Science and Engineering Ss. Cyril and
Methodius University, Skopje, North Macedonia
Ciprian-Octavian Truica, Faculty of Automatic Control and Computers,
University Politehnica of Bucharest, Bucharest, Romania
Elena-Simona Apostol, Faculty of Automatic Control and Computers,
University Politehnica of Bucharest, Bucharest, Romania
Anna Bączkowska, Institute of English and American Studies, University of
Gdansk, Gdansk, Poland

Contact:
organizers@dislidas.mozajka.co

Received on Tuesday, 22 February 2022 09:01:58 UTC