- From: Jorge Gracia <jgracia@fi.upm.es>
- Date: Mon, 6 Feb 2017 13:01:47 +0100
- To: "A list for those interested in open data in linguistics." <open-linguistics@lists.okfn.org>, "public-ontolex@w3.org" <public-ontolex@w3.org>, "public-bpmlod@w3.org" <public-bpmlod@w3.org>, public-ld4lt@w3.org
- Message-ID: <CANzuSaPciWLU+eykzJnFVjiDQHFKNE1DfCXMqS7U-bu4ACmBNg@mail.gmail.com>
(Please excuse cross-postings.)

TIAD 2017 Shared Task: Translation Inference Across Dictionaries
2nd Call for participation – Review Committee announced
https://tiad2017.wordpress.com

Overview

Various methods and techniques have been explored in the past with the aim of automatically generating new bilingual (and multilingual) dictionaries from existing ones, for instance by using one or more languages as a pivot between the source and target languages. However, such efforts were usually conducted on different types of datasets and evaluated in different ways, so their results are difficult to compare because of the differing experimental setups and evaluation metrics.

TIAD-2017 is launched with the intention of offering quality lexical resources for a coherent experiment that enables reliable validation of results and solid comparison of the methods and techniques used for the automatic generation of translations across languages. The initiative also aims to stimulate and enhance further research on the topic. It will make use of cross-lingual lexicographic data from K Dictionaries (KD), which will also serve to validate the results, along with human assessment.

The systems developed by participants and their results will be presented at a workshop held as part of the first Language, Data and Knowledge conference in Galway, Ireland, on 18 June 2017 (http://ldk2017.org). The papers describing the participant systems will be published on CEUR-WS (http://ceur-ws.org).

Task definition

The objective of the task is to indirectly generate translations for three language pairs, based on already known translations among eight languages in 14 bilingual dictionaries, involving four possible paths, all from German to Brazilian Portuguese, that feature between 1 and 4 pivot languages.

The test dataset was built from 100 randomly selected German dictionary entries with their translations into a second language, recursively followed through further translations in the chained dictionaries; the largest language pair provided contains up to 817 entries with 1,532 translation equivalents. Besides the headwords and translations, the data includes information about parts of speech, subject domains and synonyms, as well as examples of usage and their translations.

The following language pairs are provided for the four paths:

(a) German > English > Portuguese
(b) German > Japanese > Spanish > Portuguese
(c) German > Danish > French > Spanish > Portuguese
(d) German > Dutch > Spanish > Danish > French > Portuguese

Also included are four Portuguese > German datasets, for *closing the loop* in each path, to help with the validation of the results.

The three new language pairs that should be generated are:

(1) German > Portuguese
(2) Danish > Spanish
(3) Dutch > French

Evaluation of the results of each system will be carried out against KD’s manually compiled dictionaries for these pairs, from the Global Series and other resources, as well as by human translators.
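
For readers new to the topic, the idea of inferring translations through pivot languages can be sketched in a few lines of Python. This is only a naive illustration, not the organizers' baseline or evaluation procedure: the function names (compose, infer_along_path, close_loop) and the toy entries are invented here, and the sketch ignores the parts of speech, domains and usage examples that the KD data provides.

```python
from collections import defaultdict
from typing import Dict, List, Set

# A bilingual dictionary: source headword -> set of target translations.
Dictionary = Dict[str, Set[str]]


def compose(first: Dictionary, second: Dictionary) -> Dictionary:
    """Chain two dictionaries through the pivot language they share."""
    result: Dictionary = defaultdict(set)
    for source, pivots in first.items():
        for pivot in pivots:
            result[source] |= second.get(pivot, set())
    # Drop headwords for which no translation survived the chain.
    return {s: t for s, t in result.items() if t}


def infer_along_path(path: List[Dictionary]) -> Dictionary:
    """Compose every dictionary of a path, e.g. [de_en, en_pt] for path (a)."""
    inferred = path[0]
    for next_dictionary in path[1:]:
        inferred = compose(inferred, next_dictionary)
    return inferred


def close_loop(inferred: Dictionary, reverse: Dictionary) -> Dictionary:
    """Keep only candidates that the reverse dictionary maps back to the source."""
    kept = {
        source: {t for t in targets if source in reverse.get(t, set())}
        for source, targets in inferred.items()
    }
    return {s: t for s, t in kept.items() if t}


# Toy entries, invented for illustration only (not taken from the KD data).
de_en = {"Bank": {"bank", "bench"}}
en_pt = {"bank": {"banco"}, "bench": {"banco", "bancada"}}
pt_de = {"banco": {"Bank"}, "bancada": {"Werkbank"}}

candidates = infer_along_path([de_en, en_pt])
validated = close_loop(candidates, pt_de)
print(candidates)
print(validated)
```

Naive composition over-generates as soon as a pivot word is polysemous, which is why the toy example filters the candidates with a Portuguese > German dictionary; longer paths such as (d) compound the effect.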

Participants can contribute to either or both of the following tracks:

(1) Systems that use only the KD data released for the task
(2) Systems that exploit, in addition to the KD data, other freely available sources of background knowledge (e.g., lexical linked open data and parallel corpora) to improve performance

Beyond performance, participants are encouraged to consider the following issues in particular:

· The role of the language family with respect to the newly generated pairs
· The asymmetry of pairs, and how translation direction affects the results
· The behavior of different parts of speech among different languages
· The role the number of pivots plays in the process

Important Dates

· 23.1.2017 – Call for participation / Test data released
· 15.4.2017 – Submission of results by participants
· 30.4.2017 – Evaluation of results communicated by organizers
· 01.6.2017 – Submission of system description papers
· 18.6.2017 – Workshop

Organizers

· Jorge Gracia, Ontology Engineering Group, Universidad Politécnica de Madrid
· Noam Ordan, K Dictionaries and The Arab Academic College of Education, Haifa
· Ilan Kernerman, K Dictionaries, Tel Aviv

Review Committee

Irith Ben-Arroyo Hartman, University of Haifa, Israel
Thierry Declerck, German Research Center for Artificial Intelligence, Germany
Thierry Fontenelle, Translation Center for the Bodies of the EU, Luxembourg
Mikel Forcada, Universidad de Alicante, Spain
Jorge Gracia, Universidad Politécnica de Madrid, Spain
Miloš Jakubíček, Lexical Computing, Czech Republic
Jelena Kallas, Institute of the Estonian Language, Estonia
Ilan Kernerman, K Dictionaries, Israel
Iztok Kosem, Trojina Institute and University of Ljubljana, Slovenia
Nikola Ljubešić, University of Zagreb, Croatia
Shervin Malmasi, Harvard University, USA
John McCrae, National University of Ireland, Galway
Elena Montiel-Ponsoda, Universidad Politécnica de Madrid, Spain
Preslav Nakov, Hamad Bin Khalifa University, Qatar
Noam Ordan, K Dictionaries and The Arab Academic College of Education, Israel
Georg Rehm, German Research Center for Artificial Intelligence, Germany
Victor Rodriguez-Doncel, Universidad Politécnica de Madrid, Spain
Liling Tan, Nanyang Technological University, Singapore
Carole Tiberius, Institute of Dutch Language, Netherlands
Marta Villegas, Spain
Marcos Zampieri, University of Köln, Germany

Terms and Website

A full description of TIAD-2017 and its binding terms and regulations are available on the website: https://tiad2017.wordpress.com.

Contact

Noam Ordan: noam@kdictionaries.com

Received on Monday, 6 February 2017 12:02:44 UTC