CONcreTEXT, Concreteness in Context: call for task participation - @Evalita 2020 from daniele radicioni on 2020-05-14 (semantic-web@w3.org from May 2020)

From: daniele radicioni <radicion@di.unito.it>
Date: Thu, 14 May 2020 12:31:48 +0200
To: semantic-web@w3.org
Message-Id: <F4E39CCE-D908-4309-9016-33A504B8E50B@di.unito.it>
[Apologies if you receive multiple copies of this announcement]


## Call for participants ##


### Task: CONcreTEXT @ EVALITA 2020 ###

website: https://lablita.github.io/CONcreTEXT/

---

The CONcreTEXT task (CONcreteness in conTEXT) focuses on automatic recognition of concreteness (and, conversely, abstractness). Given a sentence and a target word in it, we ask participants to build a system that rates the concreteness of the target word on a 1-7 scale, where 1 stands for fully abstract (e.g., 'idempotence') and 7 for maximally concrete (e.g., 'car').

The concreteness score assigned to the word must reflect its use in context: the word should not be considered in isolation, but as part of the given sentence. For example, systems are expected to assign different scores to the verb 'COVER' in the following two sentences:

- COVER the pot and bring the water to a vigorous boil;
- Your fees and tuition help to COVER the costs of providing these services.
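
To make the expected input and output concrete, here is a minimal sketch in Python. It is not an official baseline: the `Instance` type, the `TOY_LEXICON` ratings and the `baseline_score` function are hypothetical names and values made up for illustration, and the official data format may differ. The sketch shows a context-blind lookup, which by construction returns the same score for both sentences above; this is exactly the behaviour an in-context system is expected to improve on.

```python
from dataclasses import dataclass

# Bounds of the rating scale described in the call (1 = abstract, 7 = concrete).
SCALE_MIN, SCALE_MAX = 1.0, 7.0


@dataclass(frozen=True)
class Instance:
    """One task item: a sentence plus the target word to be rated (hypothetical format)."""
    sentence: str
    target: str


# Hypothetical out-of-context ratings, invented for illustration only.
TOY_LEXICON = {"cover": 4.0, "car": 7.0, "idempotence": 1.0}


def clamp(score: float) -> float:
    """Keep a score inside the 1-7 scale."""
    return max(SCALE_MIN, min(SCALE_MAX, score))


def baseline_score(instance: Instance) -> float:
    """Context-blind baseline: look up the target word and ignore the sentence."""
    midpoint = (SCALE_MIN + SCALE_MAX) / 2  # fallback for unknown words
    return clamp(TOY_LEXICON.get(instance.target.lower(), midpoint))


examples = [
    Instance("COVER the pot and bring the water to a vigorous boil", "COVER"),
    Instance("Your fees and tuition help to COVER the costs of providing these services", "COVER"),
]
scores = [baseline_score(ex) for ex in examples]
```

Because the lookup never reads `sentence`, both entries of `scores` are identical; a system suited to this task would need to rate the first (physical) use of 'COVER' differently from the second (figurative) one.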


Target words may be either verbs or nouns. The dataset used for this task is taken from the English-Italian parallel section of The Human Instruction Dataset, derived from WikiHow instructions (https://www.kaggle.com/paolop/human-instructions-multilingual-wikihow). The released dataset will comprise 1,000 sentences in total (500 Italian sentences plus 500 English sentences), all annotated with concreteness judgments by native speakers.

Participants may take part in either language track (Italian or English), or in both.

We invite participants to exploit all possible strategies to solve the task, including (but not limited to) knowledge bases, external training data, word embeddings, and multimodal systems.

Registration is now open at:
  http://www.evalita.it/2020/taskregistration



#### Definition of concreteness ####

Operationally, the very first issue is that concreteness/abstractness is not straightforward to define. Although finer-grained distinctions between abstract and concrete word meanings can be drawn, the term 'concrete' has two main interpretations:

- what is closer to perception (as opposed to what cannot be experienced directly through the senses);
- what is more specific (as opposed to high-level, abstract).

We are mostly interested in the first aspect, that is, perceptually salient concreteness/abstractness.



#### Motivation and state of the art ####

Ordinary experience suggests that the semantic representation, lexical access and processing of concepts can be affected by their concrete/abstract status: concrete meanings, being closer to perceptual experience, are acknowledged to be conveyed more quickly and easily in human communication than abstract meanings. This kind of information captures a complex combination of experiential (e.g., sensory, motor) and strictly linguistic features, such as verbal associations arising through co-occurrence patterns and syntactic information. These features make conceptual concreteness/abstractness a challenging though only marginally explored field, with the notable exception of some works at the intersection of Computational Linguistics and Cognitive Science.

The CONcreTEXT task aims to investigate how concreteness information affects sense selection: unlike past research, we are interested in assessing the concreteness of terms in context rather than in isolation. The concreteness score is assumed to be a property of word meanings rather than of word forms; thus, scoring the concreteness of a term in context implicitly requires identifying its underlying sense, by handling lexical phenomena such as polysemy and homonymy.

The CONcreTEXT task may also be relevant to the psycholinguistics community, where ratings of concreteness, imageability and other features are widely used as control variables in many experiments. The resulting annotated dataset itself (for both Italian and English) will be a resource for future research on concreteness in a more contextual, and thus more ecological, setting.



#### Important Dates #### 

- 29th May 2020: development and distribution of trial datasets

- 4th September 2020: development and distribution of datasets for testing

- 4th - 24th September 2020: evaluation window and collection of participants' results

- 2nd October 2020: assessment returned to participants

- 6th November 2020: technical reports due from participants

- 30th November – 3rd December 2020: EVALITA 2020 (co-located with CLiC-it 2020)

  

#### Organisers ####

- Lorenzo Gregori, Università di Firenze, lorenzo.gregori@unifi.it
- Maria Montefinese, Università di Padova, maria.montefinese@gmail.com
- Daniele Radicioni, Università di Torino, daniele.radicioni@unito.it
- Andrea Amelio Ravelli, Università di Firenze, andreaamelio.ravelli@unifi.it
- Rossella Varvara, Università di Firenze, rossella.varvara@unifi.it



: : : : : : : : : : : : : : : : : : :

Daniele Radicioni, PhD
Department of Computer Science
University of Turin
Corso Svizzera, 185
10149 - Torino
phone: +39 011 6706802
fax:   +39 011 751603
http://www.di.unito.it/~radicion
Received on Thursday, 14 May 2020 10:32:08 UTC