[job offer] Postdoc in AI for Omics Data Analysis: accessing a metabolomics Knowledge Graph with a Large Language Model

Apply online: https://jobs.inria.fr/public/classic/en/offres/2024-07437



Contract type: Fixed-term contract


Level of qualifications required: PhD or equivalent


Function: Post-Doctoral Research Visit


About the research centre or Inria department 


The Inria centre at Université Côte d'Azur includes 37 research teams and 8 support services. The centre's staff (about 500 people) is made up of scientists of different nationalities, engineers, technicians and administrative staff. The teams are mainly located on the university campuses of Sophia Antipolis and Nice, as well as in Montpellier, and work in close collaboration with research and higher education laboratories and establishments (Université Côte d'Azur, CNRS, INRAE, INSERM, ...) and with regional economic players.

With a presence in the fields of computational neuroscience and biology, data science and modelling, software engineering and certification, as well as collaborative robotics, the Inria Centre at Université Côte d'Azur is a major player in terms of scientific excellence through its results and collaborations at both European and international levels.

Context


Recent advances in computational mass spectrometry-based metabolomics have unleashed a massive amount of chemical knowledge from metabolomics analysis. Yet, there is a need for a comprehensive computational framework that can better integrate the information derived from both experimental and computational analyses and help chemists analyse their results. 


We invite applications for a Postdoctoral Researcher position at the Nice Chemistry Institute (ICN) and the Wimmics team at Inria, both hosted at Université Côte d’Azur. The selected candidate will be jointly embedded in both groups, will interact with PhD students and research engineers of the Interdisciplinary Institute for Artificial Intelligence (3iA) TechPool, and will work in the context of an international collaboration between France and Switzerland. The position is initially offered for one year, with the possibility of extension up to three years.

Key contacts are: 

    * Louis-Felix Nothias (project leader), Nice Chemistry Institute (ICN), Université Côte d'Azur, at Louis-Felix.NOTHIAS@univ-cotedazur.fr
    * Fabien Gandon, Inria Université Côte d'Azur, at Fabien.Gandon@inria.fr



Assignment 


The selected candidate will contribute to the development of a proof of concept obtained at Université Côte d’Azur for accessing the content of a metabolomics knowledge graph (KG) with a large language model. It is a Python prototype of a metabolomics assistant, available as a conversational agent, that:

    1. represents the core experimental information from processed and annotated mass spectrometry data results using the standardized Resource Description Framework (RDF). 
    2. enables advanced data mining queries using the SPARQL query language. 
    3. provides a natural language-based interface to perform these queries on the knowledge graph using a large language model (LLM). 
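
The three steps above can be illustrated with a deliberately tiny, standard-library-only Python sketch. It is not the project's prototype: the `ex:` identifiers, the m/z values, and the function names are hypothetical, the in-memory triple list stands in for a real RDF store, the pattern matcher stands in for a SPARQL engine, and a keyword lookup stands in for the LLM.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Step 1 (assumption): experimental results as subject-predicate-object triples.
# All identifiers and values below are made up for illustration.
TRIPLES: List[Triple] = [
    ("ex:feature_1", "ex:hasPrecursorMz", "195.0877"),
    ("ex:feature_1", "ex:hasAnnotation", "ex:caffeine"),
    ("ex:feature_2", "ex:hasPrecursorMz", "181.0720"),
    ("ex:feature_2", "ex:hasAnnotation", "ex:theobromine"),
    ("ex:caffeine", "rdfs:label", "caffeine"),
    ("ex:theobromine", "rdfs:label", "theobromine"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def features_annotated_as(label: str) -> List[str]:
    """Step 2: a two-pattern join, similar in spirit to the SPARQL query
    SELECT ?f WHERE { ?f ex:hasAnnotation ?c . ?c rdfs:label "<label>" }
    """
    compounds = {t[0] for t in match(TRIPLES, p="rdfs:label", o=label)}
    return sorted(
        t[0] for t in match(TRIPLES, p="ex:hasAnnotation") if t[2] in compounds
    )


def answer(question: str) -> List[str]:
    """Step 3: stand-in for the LLM. A real system would have a model
    translate the question into a query; here we only look for a known
    compound label inside the question text."""
    q = question.lower()
    for _, _, label in match(TRIPLES, p="rdfs:label"):
        if label in q:
            return features_annotated_as(label)
    return []


if __name__ == "__main__":
    print(answer("Which features are annotated as caffeine?"))
```

In practice the triples would live in an RDF store queried through SPARQL, and the question-to-query step would be delegated to an LLM; the sketch only shows how the three layers relate.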

Main activities


The topic of natural language access to knowledge graphs is gaining growing attention both in top international academic conferences and in international industrial conferences. The ICN has a background in knowledge graph representation and processing for mass spectrometry and metabolomics. The Wimmics team specializes in AI techniques for knowledge graphs, provides open-source tools, and has a long history of coupling natural language techniques and knowledge graphs.

The R&D programme for this position includes several tasks:

    * Generalize and abstract the bot from specific large language models (LLMs) and specific knowledge graphs (KGs): (1) We will survey and study the impact of using different LLMs on the quality of the results and the potential cost/benefit trade-off in choosing models of different sizes, availability, freshness, etc. (2) We intend to propose and evaluate declarative approaches to loosely couple the workflow of the bot to the knowledge graph and maximize domain independence, with the goal of incrementally moving from a prototype dedicated to a specific chemistry knowledge graph to a domain-independent solution.
    * Design a generic and declarative method for tool integration, selection and automated use by the bot: (1) We propose to identify and implement a library of tools to perform important generic tasks on the knowledge graph, including named-entity recognition against a specific knowledge graph, knowledge extraction from schemata and graph data for context/prompt generation, query solving, etc. (2) We will compare approaches for integrating a library of tools, and the calls to them, into interactions with an LLM, applied to the task of question answering (Q&A) on a knowledge graph.
    * Go beyond the current textual interface to support richer interactions: (1) We propose to study the realization of tasks more complex than generating textual answers, including the generation of graphical widgets and data visualizations when appropriate to an answer. (2) We will consider generalizing our approach by treating these as a special family of tools that the bot uses to express results.
    * Support longer dialogical interactions: (1) We will investigate the different alternatives for encoding the context of the users’ questions in terms of background knowledge, previous interactions, available tools, etc. (2) We intend to leverage increasing context and prompt sizes to design a chaining mechanism that supports dialogs and follow-up queries.



A longer-term perspective of the project is to consider tasks other than supporting access to the content of a graph. Methodologically, we imagine extending the previous steps to tasks such as contributing to, maintaining, validating or semantically enriching a knowledge graph.

Skills


    * Technical skills: knowledge graphs, Semantic Web, Linked Data, LLMs, chatbots
    * Programming: Python, RDF, SPARQL. 
    * Relational skills: Ability to work in an interdisciplinary and international network of collaborators 
    * Other valued qualities: autonomy, proactivity, focus on the research programme, on-time delivery
    * Languages: English 


Received on Sunday, 7 April 2024 09:09:35 UTC