- From: Surupendu Gangopadhyay <surupendu.g@gmail.com>
- Date: Tue, 25 Jul 2023 12:42:12 +0530
- To: undisclosed-recipients:;
- Message-ID: <CA+k+8boNRka6R_Nyxo3u3NXYWELudJMpc5oqA_+B-kdyxuDX0Q@mail.gmail.com>
Apologies for multiple posting
***********************************
*Machine Translation for Indian Languages (MTIL) 2023*
We invite all IR and NLP researchers and enthusiasts to participate in the
MTIL track (https://mtilfire.github.io/mtil/2023/), held in conjunction with
the Forum for Information Retrieval Evaluation (FIRE) 2023
(http://fire.irsi.res.in/).
Indian languages pose several linguistic challenges for machine
translation. Some of them share syntactic similarities, while others have
intricate morphological structures, and several are low-resource. Machine
translation models therefore need to address these challenges when
translating between Indian languages.
The MTIL track consists of two tasks:
1. *General Translation Task (Task 1):* Participants should build a
machine translation model to translate sentences for the following language
pairs:
   1. Hindi-Gujarati
   2. Hindi-Kannada
   3. Kannada-Hindi
   4. Hindi-Odia
   5. Odia-Hindi
   6. Hindi-Punjabi
   7. Punjabi-Hindi
   8. Hindi-Sindhi
   9. Urdu-Kashmiri
   10. Telugu-Hindi
   11. Hindi-Telugu
   12. Urdu-Hindi
   13. Hindi-Urdu
2. *Domain Specific Translation Task (Task 2)*: Task participants will
build machine translation models for Governance and Healthcare domains.
    1. Healthcare:
        a. Hindi-Gujarati
        b. Kannada-Hindi
        c. Hindi-Odia
        d. Odia-Hindi
        e. Hindi-Punjabi
        f. Kannada-Hindi
    2. Governance:
        a. Hindi-Gujarati
        b. Kannada-Hindi
        c. Hindi-Odia
        d. Odia-Hindi
        e. Hindi-Punjabi
        f. Kannada-Hindi
*Dataset:*
The primary source of parallel language pairs is the Bharat Parallel Corpus
Collection (BPCC), released by AI4Bharat
(https://ai4bharat.iitm.ac.in/bpcc).
Participants are encouraged to add datasets of their choice, including
parallel corpora and monolingual datasets, to train their models.
More information on registration and participation in the track can be
found here: https://mtilfire.github.io/mtil/2023/
This track is organised in association with BHASHINI
(https://bhashini.gov.in/).
*Organisers*
   - Prasenjit Majumder, DAIICT Gandhinagar, India and TCG CREST,
   Kolkata, India
   - Arafat Ahsan, IIIT-Hyderabad, India
   - Asif Ekbal, IIT-Patna, India
   - Saran Pandian, DAIICT Gandhinagar, India
   - Ramakrishna Appicharla, IIT-Patna, India
   - Surupendu Gangopadhyay, DAIICT Gandhinagar, India
   - Ganesh Epili, DAIICT Gandhinagar, India
   - Dreamy Pujara, DAIICT Gandhinagar, India
   - Misha Patel, DAIICT Gandhinagar, India
   - Aayushi Patel, DAIICT Gandhinagar, India
   - Bhargav Dave, DAIICT Gandhinagar, India
   - Mukesh Jha, DAIICT Gandhinagar, India
Received on Tuesday, 25 July 2023 07:12:29 UTC