- From: Cassia TROJAHN <Cassia.Trojahn@irit.fr>
- Date: Mon, 25 Oct 2021 11:37:27 +0200
- To: "Cassia TROJAHN" <Cassia.Trojahn@irit.fr>
- Cc: "Olivier Teste" <Olivier.Teste@irit.fr>
- Archived-At: <http://sympa.inria.fr/sympa/arcsearch_id/mailing-list-cla-2011/2021-10/183c-61767b00-f-5ac6fc80%40227684457>
Hello everyone,

(Apologies for cross-posting.)

We are recruiting a post-doc at IRIT on data linking, as part of the ANR project DACE-DL: DAta-CEntric AI-driven Data Linking. Recruitment is planned for the very beginning of 2022, for 24 months.

Please circulate these two offers in your networks.

Best regards,
Cassia Trojahn and Olivier Teste

-----------------------------------------------------------------

** Post-doctoral position at IRIT: Data Linking **

* Context: ANR project DACE-DL (DAta-CEntric AI-driven Data Linking) *

Data linking is the scientific challenge of automatically establishing typed links between the entities of two or more structured datasets. A variety of complex data linking systems exist, evaluated on public benchmarks. While they have allowed for the generation of vast amounts of linked data in the context of various dedicated projects, generic data linking systems often have limited applicability in real-world scenarios, where data are highly heterogeneous and domain-specific. DACE-DL targets a paradigm shift in the data linking field with a data-centric, bottom-up methodology relying on machine learning and representation learning models. We hypothesize that there exists a finite number of identifiable and generalisable linking problem types (LPTs), which we need to categorize and analyse in order to provide better linking results.

* Topic: Data collection, consolidation, and modularization of data linking systems *

This research is organized into two main tasks.

The first task consists in (1) carrying out an in-depth analysis of the quality of the existing data linking datasets, identifying erroneous statements and providing a high-quality set of datasets by correcting those statements; and (2) generating additional links using existing high-precision linking systems on the chosen datasets. Data quality metrics such as accuracy, consistency and conciseness will be considered.
The aim of the second task is threefold: (1) to provide an inventory of publicly available and functional linking tools that are able to deal with a large spectrum of data linking problems; (2) to propose a theoretical approach for the modularization of these tools into atomic modules that are easy to combine in order to build more complex solutions in a linking ecosystem; (3) to make the produced modules available to the data linking community. To carry out the modularization at scale, we plan to use unsupervised ML algorithms, enhanced by a human-in-the-loop approach. The objective is to provide a set of correspondences between the modules and the LPTs.

Start date: January 2022; duration: 24 months.

* Work environment and Salary *

Location: Institut de Recherche en informatique de Toulouse (IRIT) – Universite Toulouse - Jean Jaures / Maison de la Recherche, 5, allees Antonio Machado 31058 Toulouse.

Gross monthly salary between 2200€ and 2700€, depending on qualifications and situation.

* How to apply *

Applicants are required to have a PhD in Computer Science and a strong background in semantic web technologies, ontology matching and data linking. Fluency in written and spoken English is also required. A good publication record and strong programming skills will be a plus.

Applications will be accepted until the position is filled. Applicants should send a full CV including a complete list of publications, a cover letter indicating their research interests, achievements to date and vision for the future, as well as either support letters or the names of 2 persons who have worked with them.

Contact: Cassia Trojahn (cassia.trojahn@irit.fr) and Olivier Teste (olivier.teste@irit.fr)
Received on Monday, 25 October 2021 09:38:13 UTC