- From: Nathalie Aussenac <nathalie.aussenac-gilles@irit.fr>
- Date: Thu, 23 Apr 2015 11:23:03 +0200
- To: Elsnet-list@elsnet.org, semanticweb@yahoogroups.com, CORPORA@UIB.NO, TextAnalytics@yahoogroups.com, public-lod@w3.org, wikidata-l@lists.wikimedia.org
- CC: mouna kamel <kamel@irit.fr>, Cécile Fabre <Cecile.Fabre@univ-tlse2.fr>
- Message-ID: <5538B9F7.1080304@irit.fr>
*PhD position: Knowledge extraction from semi-structured documents – enrichment of DBpedia in French*

*Context*

We are seeking a candidate for a PhD position in the context of a collaboration between the MELODI group (http://www.irit.fr/-Equipe-MELODI-) of the Research Institute in Informatics of Toulouse (IRIT, CNRS UMR 5505) and the CLLE-ERSS team (http://w3.erss.univ-tlse2.fr/) of the Cognition, Languages, Ergonomics laboratory (CLLE, UMR 5263 CNRS). These laboratories are among the strongest research centres in France in informatics and linguistics, respectively. The teams have been collaborating for 20 years and are recognized experts in natural language processing, linguistic analysis of corpora, and knowledge engineering. One of their research areas is the linguistic characterization of semantic relations in corpora and the operationalization of these characterizations to facilitate the construction of knowledge models. The teams have developed methods for analyzing both the written text, using lexico-syntactic patterns (Aussenac-Gilles and Jacques, 2008) or distributional analysis (Fabre et al., 2014), and the text structure (Kamel et al., 2014). Methods have also been proposed for integrating different fragments of knowledge within a single model by means of ontology alignment (Euzenat et al., 2013). This thesis aims at adapting and combining these methods and at proposing novel ones, with a special focus on enriching the Web of data.

The candidate will be co-supervised by Cécile Fabre, Professor of Linguistics at the University of Toulouse 2, and Mouna Kamel, Assistant Professor at IRIT. The thesis will be funded by a « Communauté d’Universités et d’Établissements Toulouse – Région Midi-Pyrénées » (COMUE-Région) project.

*Object*

This thesis addresses the problem of building semantic resources from semi-structured text.
The layout attributes of a text, which organize it and contribute significantly to its semantics, are underexploited by most classical NLP methods. A first aim of this thesis is to study the interaction between the visual structure and discourse analysis, and thus to specify how the analysis of natural language and the analysis of text structure can be combined. The second aim is to evaluate the contribution of linguistic information to automated processes for the construction of semantic resources, the identification of semantic relations, and their integration into a knowledge model. The theoretical results will help develop different knowledge extractors (in particular, semantic relation extractors) for semi-structured texts in French, in order to enrich a knowledge base. Each extractor will apply one particular technique (inspired or not by the methods developed by the teams) and will exploit the different properties (content and structure) of these texts.

The experimental scenario will concern the enrichment of the French DBpedia resource (http://fr.dbpedia.org/) with knowledge extracted from French Wikipedia pages. These pages are semi-structured and rich in knowledge: concepts (domain-specific or general), relations, and rules that associate these concepts and give them meaning. However, as with the English DBpedia, this resource is currently built from very specific structured data in Wikipedia pages (infoboxes, categories, links, etc.).

*Profile*

We are looking for a candidate with an MSc in computer engineering/science or a related field. The candidate must have taken courses in natural language processing. She/he must be interested in both linguistic aspects (corpus analysis, study and description of linguistic phenomena, etc.) and statistical aspects, the latter allowing her/him to develop learning-based approaches and distributional analysis techniques.
Interest in the Semantic Web in general, and ontologies in particular, would also be appreciated. The student must be fluent in French and have a very good command of English.

We are currently offering a 3-year fully funded studentship starting in autumn 2015, funded by the Toulouse COMUE and the Midi-Pyrénées Region. The income will be about 20,000 euros per year.

*Contact*

If you are interested in this position, please contact:
Cécile Fabre: cecile.fabre@univ-tlse2.fr
Mouna Kamel: mouna.kamel@irit.fr

*References*

(Aussenac-Gilles and Jacques, 2008) Aussenac-Gilles, N., Jacques, M.-P.: Designing and Evaluating Patterns for Relation Acquisition from Texts with Caméléon. Terminology 14(1), 145-173 (2008).
(Euzenat et al., 2013) Euzenat, J., Rosoiu, M., Trojahn dos Santos, C.: Ontology matching benchmarks: Generation, stability, and discriminability. Journal of Web Semantics 21, 30-48 (2013).
(Fabre et al., 2014) Fabre, C., Hathout, N., Ho-Dac, L.-M., Morlane-Hondère, F., Muller, P., Sajous, F., Tanguy, L., Van de Cruys, T.: Présentation de l'atelier SemDis 2014: sémantique distributionnelle pour la substitution lexicale et l'exploration de corpus spécialisés. Actes de l'atelier SemDis 2014, 21e Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2014), pp. 196-205 (2014).
(Kamel et al., 2014) Kamel, M., Rothenburger, B., Fauconnier, J.-P.: Identification de relations sémantiques portées par les structures énumératives paradigmatiques : une approche symbolique et une approche par apprentissage supervisé. Revue d'Intelligence Artificielle, Hermès Science, Numéro spécial Ingénierie des Connaissances. Nouvelles évolutions, Vol. 28, No. 2-3, pp. 271-296 (2014).
Received on Friday, 24 April 2015 07:20:56 UTC