AI-Driven Drug Discovery in Oncology: Harnessing LLMs for Molecular Insights

Internship · M2 internship · 6 months · Bac+5 / Master's level · Oncodesign Precision Medicine · Dijon (France) · Gross monthly stipend of €1,000 + meal vouchers + transport

Start date: 1 February 2024


LLM - Python - Molecular Biology - AlphaMissense


Our Company

OPM is a biopharmaceutical company specializing in precision medicine. OPM's mission is to bring innovative therapeutic and diagnostic solutions to treat therapeutic resistance and metastatic progression. The patient is at the center of our thinking, of our unique innovative model, and of our investments. For OPM, "our collective success is paramount": there can be no value creation without exchange, without dialogue. For us, value creation results from reciprocity, i.e. balanced and fair exchanges at all levels, whether between internal collaborators or with our partners, therapists, patients, experts and investors.


The language of molecular biology, once deciphered only through rigorous experimentation, is on the brink of a revolution. As we enter the realm of large language models (LLMs) and deep learning, the promise of highly accurate in silico models for comprehending the intricate world of molecular biology, from DNA to gene expression to proteins, is within reach.

The fusion of cutting-edge artificial intelligence and molecular biology holds the potential to reshape medicine and pharmaceutical discovery. For our biopharmaceutical company, Oncodesign Precision Medicine (OPM), dedicated to identifying new therapeutic targets and developing drugs for the treatment of advanced, resistant, and metastatic cancers, harnessing the power of LLMs is not just a possibility; it's a necessity. LLMs emerge as a crucial tool in our pursuit of breakthroughs in oncology, enabling us to explore the complex molecular landscape of cancer, understand its underlying mechanisms, and discover novel avenues for treatment. Some examples of recent LLMs that our team is investigating:

  • scGPT (1), a foundation model designed for single-cell transcriptomics, chromatin accessibility, and protein abundance.
  • ESM-1b (2) and AlphaMissense (3), LLMs that predict the pathogenic effect of missense variants in the human genome.
  • Geneformer (4), a foundation model pretrained on ~30 million single-cell transcriptomes from a broad range of human tissues to enable context-aware predictions in settings with limited data in network biology.

Missions & activities of the internship

Under the co-supervision of two Senior Data Scientists holding PhDs, with interdisciplinary backgrounds in artificial intelligence, immunology, mathematics, genetics, genomics and bioinformatics, your duties are to deliver:

  • Evaluation of State-of-the-Art LLMs in Molecular Biology: Conduct an extensive review of the latest developments in LLMs, assessing their performance and identifying the most promising LLMs that can be leveraged for the discovery of new therapeutic targets.
  • Model Implementation: Develop the technical proficiency to apply LLMs. This involves understanding the practical aspects of model implementation and fine-tuning.
  • Data Preprocessing and Analysis: Extract meaningful insights from public and/or proprietary datasets.
  • Git code repositories with well-documented scripts in Python and notebooks with any conducted analysis.
  • A report summarizing your findings and contributions.

Expected student background and knowledge

M2 student or final-year engineering school student with an educational background in a relevant field (computational biology, bioinformatics, artificial intelligence or related).

Essential skills include programming, machine learning, and an understanding of key concepts of molecular biology.

Familiarity with NLP and LLMs is a significant advantage.

Fluent in French and English.


Procedure: Send your application (CV + cover letter) under reference "LLD4Molins" by email. Applications are first ranked on the quality of the application file. The best candidates will be invited to interviews on site or by videoconference.

Deadline: 1 December 2023


Thierry Billoué

Offer published on 2 November 2023, displayed until 1 December 2023