Research engineer (ingénieur d'étude) or Master 2 (M2) internship
Other · Other · 6 months (renewable) · Bac+5 / Master's degree · INRAe / Institut Sophia Agrobiotech · Sophia Antipolis (France)
Start date: 1 March 2025
Keywords
LLMs, omics, autonomous pipeline
Description
Development of an autonomous pipeline to process raw omics data based on LLMs
Context and objectives
The global population is increasing rapidly, which poses a major challenge for food supply, exacerbated by climate change and environmental degradation. Most of this food supply depends on agriculture; however, plant health and survival are threatened by various biotic stressors. Despite the massive use of chemicals since the end of the Second World War, plant pathogens remain the major cause of crop losses every year. While crop yields need to increase, the food production system needs a profound transformation to eliminate pesticide use, which causes environmental pollution and is no longer viable. Advancing our knowledge of how plant-pathogen interactions function and succeed is of prime importance to sustainably improve global plant health.
The biotechnological and digital advances of the last decade offer a great opportunity to overcome this stalemate. The flourishing of omics techniques has made it possible to study complex biological systems through systematic analysis of their content at the molecular level. A common paradox across scientific domains, and particularly in biology, is that the ability to collect and create observational data is growing far faster than the ability to extract interpretable information and knowledge from this data deluge. Therefore, providing novel methodologies to address this challenge would represent a breakthrough.
Using tomato (Solanum lycopersicum), one of the most economically important vegetables worldwide, as a biological model, we aim to address these challenges by leveraging multi-omics integration, artificial intelligence, network inference and topological data analysis. So far, we have collected more than 2000 multi-omics samples of tomato under biotic stress from publicly available data. Although the metadata are already organized in an internal meta-database, the processing of raw transcriptomics data is tedious, and several steps require human checks to decide on and adapt the subsequent steps. Recent developments and the success of large language models (LLMs) offer a unique opportunity to create autonomous systems capable of making decisions and calling the appropriate external tools for task execution. The main objective of this project is to set up an autonomous pipeline based on LLMs to analyse the 2000 multi-omics datasets collected by our team.
The main activities will consist of:
Objective 1: Production of a generalized, optimized and time-saving workflow for parallel multi-omics processing.
- Development/optimization of a processing workflow to analyse raw omics data, minimizing manual curation and human intervention in the quality checks of the different steps.
- Parallelization to process multiple samples at the same time, given the large number of samples available (more than 2000 so far).
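As an illustration of the parallelization activity, here is a minimal sketch, assuming each sample can be processed independently of the others. The function `process_sample`, the sample identifiers and the worker count are hypothetical placeholders, not part of the team's existing workflow. Threads are used on the assumption that the real work would be done by external tools launched as subprocesses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_sample(sample_id: str) -> dict:
    """Hypothetical placeholder for one sample's processing chain.

    A real implementation would launch external tools (for example through
    subprocess) and apply automated quality checks between steps.
    """
    return {"sample": sample_id, "status": "ok"}


def process_all(sample_ids, max_workers=8):
    """Process samples concurrently; a failing sample does not stop the batch."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted job back to its sample identifier.
        futures = {pool.submit(process_sample, s): s for s in sample_ids}
        for future in as_completed(futures):
            sample_id = futures[future]
            try:
                results[sample_id] = future.result()
            except Exception as exc:
                # Record the failure for later inspection instead of aborting.
                results[sample_id] = {
                    "sample": sample_id,
                    "status": "failed",
                    "error": str(exc),
                }
    return results


if __name__ == "__main__":
    # Illustrative identifiers, not taken from the real meta-database.
    summary = process_all([f"sample_{i:04d}" for i in range(10)])
    print(sum(r["status"] == "ok" for r in summary.values()), "samples processed")
```

Recording failures per sample, rather than stopping at the first error, is one possible design choice for a batch of this size: problematic samples can be reviewed afterwards without rerunning the whole batch.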
Objective 2: Production of an autonomous pipeline based on LLMs for raw omics data processing.
- Taking inspiration from SeqMate (https://arxiv.org/html/2407.03381v1), create an automated pipeline that takes raw data as input and yields cleaned, mapped, annotated and quantified data.
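The general idea of an LLM choosing which tool to call next can be sketched as a simple control loop. This is only an illustrative skeleton under stated assumptions, not SeqMate's design or the team's implementation. The function `choose_next_step` is a rule-based stand-in for the LLM call, and the tools are stubs that only record which step was run. The step names follow the four outputs listed above (cleaned, mapped, annotated, quantified).

```python
# Step names taken from the posting; everything else here is hypothetical.
STEPS = ["clean", "map", "annotate", "quantify"]


def make_tool(name):
    """Build a stub tool that only records that its step was run.

    A real tool would wrap an external program and return its outputs
    and quality-check reports.
    """
    def tool(state):
        return {**state, "done": state["done"] + [name]}
    return tool


TOOLS = {name: make_tool(name) for name in STEPS}


def choose_next_step(state):
    """Stand-in for the LLM decision.

    A real agent would send the current state and the list of available
    tools to a language model and parse its answer. This stub simply
    picks the first step that has not been run yet.
    """
    for step in STEPS:
        if step not in state["done"]:
            return step
    return "stop"


def run_pipeline(raw_data_path, max_iterations=10):
    """Run tools one at a time until the decision function says to stop."""
    state = {"input": raw_data_path, "done": []}
    for _ in range(max_iterations):  # hard cap guards against endless loops
        step = choose_next_step(state)
        if step == "stop":
            break
        if step not in TOOLS:
            # Reject any answer that does not name a registered tool.
            raise ValueError(f"Unknown step requested: {step}")
        state = TOOLS[step](state)
    return state


if __name__ == "__main__":
    final_state = run_pipeline("example_reads.fastq")  # illustrative file name
    print(final_state["done"])
```

Restricting the decision function to a fixed registry of tools and capping the number of iterations are two simple safeguards that keep such a loop predictable even when the decision comes from a language model.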
Candidate's profile
Good knowledge of LLMs, pipeline optimization and workflows.
Knowledge of omics and omics processing is a plus but not necessary.
Six-month Master 2 internship or research engineer (ingénieur d'étude, IE) position, starting in March/April 2025.
Contact:
Dr. Silvia Bottini, junior professor chair and PI of the SMILE team: https://eng-institut-sophia-agrobiotech.paca.hub.inrae.fr/research-teams/smile
Email: silvia.bottini@inrae.fr
Application
Procedure: Send an email to silvia.bottini@inrae.fr
Deadline: 31 January 2025
Contacts
Silvia Bottini
silvia.bottini@inrae.fr
Offer published on 16 December 2024, displayed until 31 January 2025