M2 internship: Analysis of GRNs with multiparameter persistent homology
Internship · M2 internship · 6 months · Bac+4 · Inria / INRAE · Sophia Antipolis (France)
Start date: 1 April 2025
Keywords
topological data analysis, gene regulatory networks, plant-pathogen interactions, persistence diagrams
Description
Topological Data Analysis (TDA) is a growing field of data science whose connections to machine learning have grown tremendously over the last few years [1]. This is mostly due to TDA’s ability to produce compact descriptors, called persistence diagrams, that encode complex topological information from data sets. Applications in bioinformatics are of particular interest, since the continuous biological phenomena underlying many data sets can be efficiently retrieved with topology [10,11]. A recent example of such an application [2] deals with gene regulatory networks (GRNs). That article showed that the topological structures present in GRNs, as encoded in persistence diagrams, are particularly useful and efficient at capturing high-order interactions between genes, which in turn helps characterize subtle deviations of cancerous cells from healthy ones.
However, a major limitation of persistence diagrams is that they have to be computed from a given, usually user-defined, scalar-valued function. This has somewhat limited the capabilities of persistence diagrams, since the topological structures they encode depend heavily on that function: if the function is poorly chosen, the information retrieved by the persistence diagram will be poor as well. For instance, when dealing with a noisy data set, one might want to capture the geometric properties of the similarities between the points and, at the same time, the signal vs. noise information given by some density estimate, two pieces of information that a single scalar-valued function cannot easily encode.
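To make the dependence on the filtering function concrete, here is a minimal sketch in pure Python. It computes the 0-dimensional persistence diagram (connected components only) of a scalar function defined on the vertices of a graph, using a union-find structure and the elder rule. The function name `zero_dim_persistence`, the toy path graph and the two vertex functions are illustrative assumptions, not part of any library or of the project data.

```python
import math

def zero_dim_persistence(values, edges):
    """0-dimensional sublevel-set persistence of a vertex function on a graph.

    values[i] is the function value at vertex i; an edge appears at the
    larger of its two endpoint values (lower-star filtration).
    """
    parent = list(range(len(values)))
    birth = list(values)  # birth[r]: smallest value in the component rooted at r

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    pairs = []
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        t = max(values[u], values[v])
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle, no component dies
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # make ru the elder component
        if t > birth[rv]:
            pairs.append((birth[rv], t))  # the younger component dies at t
        parent[rv] = ru
    roots = {find(i) for i in range(len(values))}
    pairs.extend((birth[r], math.inf) for r in roots)  # components that never die
    return sorted(pairs)

# Toy example (assumed data): the same path graph with two different functions.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
bumpy = [0.0, 2.0, 1.0, 3.0, 0.5]
monotone = [0.0, 1.0, 2.0, 3.0, 4.0]

print(zero_dim_persistence(bumpy, path_edges))
print(zero_dim_persistence(monotone, path_edges))
```

On the same graph, the first function yields two finite intervals, (0.5, 3.0) and (1.0, 2.0), in addition to the infinite one, while the monotone function yields only the infinite interval (0.0, inf): the diagram reflects the chosen function as much as the underlying data.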
One possible solution is to use multiparameter persistent homology [12], a theoretical framework that makes it possible to work with multiple functions on the data. Unfortunately, the theoretical results are not as advanced as in the one-parameter setting, and the corresponding topological descriptors are not always well-defined. Furthermore, their complicated definitions make them difficult to compute, to use, and to feed to common statistical and machine learning pipelines. Some efforts have recently been made in this direction with the introduction of, e.g., MMA [14,15] or signed barcodes [16,17], and their corresponding vectorizations. However, it remains unclear how to apply them properly in several application scenarios, and to what extent they allow for new discoveries, especially in the context of GRNs. The objective of the internship is thus to extend, deploy and investigate the efficiency of multiparameter persistent homology tools in bioinformatics, with a specific focus on GRNs.
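As a rough illustration of what working with two functions at once can look like, the sketch below filters a small weighted graph by two parameters: a threshold s on a vertex score (for instance a codensity, where low values mean dense regions) and a threshold t on edge weights (dissimilarities). For each pair (s, t) it counts connected components, which is the Hilbert function of 0-dimensional homology of the bifiltration. This is a much coarser invariant than MMA or signed barcodes, and the function names and toy data are assumptions made for the example.

```python
def count_components(vertices, edges):
    """Number of connected components of the graph induced on `vertices`."""
    parent = {v: v for v in vertices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    n = len(parent)
    for u, v in edges:
        if u in parent and v in parent:  # skip edges with a missing endpoint
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                n -= 1
    return n

def h0_hilbert_function(vertex_values, weighted_edges, s_grid, t_grid):
    """table[i][j] = number of components at parameters (s_grid[i], t_grid[j])."""
    table = []
    for s in s_grid:
        alive = [i for i, f in enumerate(vertex_values) if f <= s]
        row = []
        for t in t_grid:
            kept = [(u, v) for u, v, w in weighted_edges if w <= t]
            row.append(count_components(alive, kept))
        table.append(row)
    return table

# Toy example (assumed data): 4 vertices, vertex 2 lies in a sparse region.
codensity = [0.1, 0.2, 0.9, 0.3]
weighted_edges = [(0, 1, 0.5), (1, 2, 0.4), (2, 3, 0.4)]

table = h0_hilbert_function(codensity, weighted_edges,
                            s_grid=[0.25, 0.5, 1.0], t_grid=[0.45, 1.0])
print(table)
```

Here the table is [[2, 1], [3, 2], [2, 1]]: along the s axis the number of components first increases and then decreases, a behaviour that no single one-parameter filtration of this graph captures on its own.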
Specifically, we will focus on GRNs built to study plant-pathogen interactions. Plant-pathogen interactions are based on a molecular dialogue between the pathogen and its host that is influenced by factors at different biological layers, which together determine whether an infection succeeds or fails. The study of gene regulatory networks can shed light on this complex network of interactions. Leveraging TomTom, a knowledge network developed in Dr. Bottini’s team at INRAE that collects around 3 million molecular interactions in tomato, we will build GRNs from multi-transcriptomics data of tomato during infection by multiple pathogens. Identifying communities and hubs with the novel extensions developed during the internship will make it possible to characterize whether the tomato plant activates similar and/or specific mechanisms in response to the pathogens.
Tasks
To achieve this goal, the student will work on one, or several, of the following objectives:
- From a theoretical perspective, the student will develop a theoretical framework for defining, computing and understanding the statistical and robustness properties of communities in GRNs with multiparameter persistent homology combined with the persistence-based clustering algorithm ToMATo [9,13]. This will begin with an in-depth analysis of some recently proposed descriptors [4,6,14,15].
- From a practical perspective, the student will work on implementing new architectures and algorithms (in Python or C++) to apply the aforementioned community detection and statistical inference methods to real GRNs from studies of tomato infected by multiple pathogens.
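For orientation, the one-parameter starting point can be sketched as follows. This is a simplified pure-Python version of persistence-based clustering in the spirit of ToMATo [9]: vertices are processed by decreasing density, each one is attached to its highest-density neighbor, and a cluster is merged into a neighboring one when the prominence of its peak is below a threshold `tau`. It is not the GUDHI implementation, nor the multiparameter extension targeted by the internship; the function name, the toy path graph and the density values are assumptions.

```python
def tomato_clusters(density, neighbors, tau):
    """Label each vertex by the peak (root vertex) of its cluster.

    density[i]: density estimate at vertex i (assumed distinct here);
    neighbors[i]: vertices adjacent to i; tau: prominence threshold.
    """
    order = sorted(range(len(density)), key=lambda i: -density[i])
    rank = {v: k for k, v in enumerate(order)}
    parent = {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for v in order:
        higher = [u for u in neighbors[v] if rank[u] < rank[v]]
        if not higher:
            parent[v] = v  # local density peak: start a new cluster
            continue
        steepest = min(higher, key=lambda u: rank[u])  # highest-density neighbor
        root = find(steepest)
        parent[v] = root
        for u in higher:
            other = find(u)
            if other == root:
                continue
            low, high = sorted((root, other), key=lambda r: density[r])
            # Merge when the lower peak is not prominent enough above v.
            if density[low] - density[v] < tau:
                parent[low] = high
                root = high
    return [find(v) for v in range(len(density))]

# Toy example (assumed data): a path graph with three density peaks.
density = [1.0, 5.0, 2.0, 4.0, 1.5, 2.2, 0.5]
n = len(density)
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

for tau in (0.5, 1.0, 2.5):
    print(tau, tomato_clusters(density, neighbors, tau))
```

With `tau = 0.5` the three peaks (vertices 1, 3 and 5) give three clusters; with `tau = 1.0` the low-prominence peak at vertex 5 is absorbed into the cluster of vertex 3; with `tau = 2.5` everything merges into the cluster of vertex 1. In a GRN setting the vertices would be genes and the density a score chosen by the user, which is exactly the kind of choice the multiparameter framework is meant to relax.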
Keywords
Topological data analysis, multiparameter persistence, machine learning, computational biology, gene regulatory networks, plant-pathogen interactions
Complementary information
Advisors:
Mathieu Carrière, mathieu.carriere@inria.fr, 04.92.38.77.57
Silvia Bottini, silvia.bottini@inrae.fr, 06.85.63.49.48
Place: The internship will take place at Centre Inria d’Université Côte d’Azur, SIRET 18008904700013, 2004 Route des Lucioles, 06902 Biot. As the internship is based on a collaboration between Inria and INRAE, frequent discussions and visits to INRAE are to be expected as well.
Dates: 6 months, starting in April/May 2025
Prerequisites
Level: Master 2
Language: French or English
Programming skills: Python, C++
Libraries: GUDHI, Scikit-Learn, TensorFlow.
Bibliography
[1] Gunnar Carlsson.
Topology and data.
Bulletin of the American Mathematical Society, 46(2):255–308, 2009.
[2] Hosein Masoomy, Behrouz Askari, Samin Tajik, Abbas K. Rizi, and G. Reza Jafari.
Topological analysis of interaction patterns in cancer-specific gene regulatory network: persistent homology approach.
Scientific Reports, 11(1), 2021.
[3] Andrew Aukerman, Mathieu Carrière, Chao Chen, Kevin Gardner, Raúl Rabadán, and Rami Vanguri.
Persistent homology based characterization of the breast cancer immune microenvironment: a feasibility study.
In 36th International Symposium on Computational Geometry (SoCG 2020), pages 11:1–11:20. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2020.
[4] René Corbet, Ulderico Fugacci, Michael Kerber, Claudia Landi, and Bei Wang.
A kernel for multi-parameter persistent homology.
Computers & Graphics: X, 2:100005, 2019.
[5] Michael Kerber, Michael Lesnick, and Steve Oudot.
Exact computation of the matching distance on 2-parameter persistence modules.
In 35th International Symposium on Computational Geometry (SoCG 2019), volume 129, pages 46:1–46:15. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019.
[6] Oliver Vipond.
Multiparameter persistence landscapes.
Journal of Machine Learning Research, 21(61):1–38, 2020.
[7] Michael Lesnick and Matthew Wright.
Interactive visualization of 2D persistence modules.
In CoRR. ArXiv:1512.00180, 2015.
[8] David Cohen-Steiner, Herbert Edelsbrunner, and Dmitriy Morozov.
Vines and vineyards by updating persistence in linear time.
In 22nd Annual Symposium on Computational Geometry (SoCG 2006), pages 119–126. Association for Computing Machinery, 2006.
[9] Frédéric Chazal, Leonidas Guibas, Steve Oudot, and Primoz Skraba.
Persistence-based clustering in Riemannian manifolds.
Journal of the ACM, 60(6), 2013.
[10] Joseph Chan, Gunnar Carlsson, and Raúl Rabadán.
Topology of viral evolution.
Proceedings of the National Academy of Sciences of the United States of America, 110(46):18566–18571, 2013.
[11] Abbas Rizvi, Pablo Cámara, Elena Kandror, Thomas Roberts, Ira Schieren, Tom Maniatis, and Raúl Rabadán.
Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development.
Nature Biotechnology, 35:551–560, 2017.
[12] Gunnar Carlsson and Afra Zomorodian.
The theory of multidimensional persistence.
Discrete & Computational Geometry, 42:71–93, 2009.
[13] Luis Scoccola and Alexander Rolle.
Persistable: persistent and stable clustering.
Journal of Open Source Software, 8(83):5022, 2023.
[14] David Loiseaux, Mathieu Carrière, and Andrew Blumberg.
Fast, stable and efficient approximation of multi-parameter persistence modules with MMA.
In CoRR. ArXiv:2206.02026, 2022.
[15] David Loiseaux, Mathieu Carrière, and Andrew Blumberg.
A framework for fast and stable representations of multiparameter persistent homology decompositions.
In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023.
[16] David Loiseaux, Luis Scoccola, Mathieu Carrière, Magnus Botnan, and Steve Oudot.
Stable vectorization of multiparameter persistent homology using signed barcodes as measures.
In Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023.
[17] Luis Scoccola, Siddharth Setlur, David Loiseaux, Mathieu Carrière, and Steve Oudot.
Differentiability and convergence of filtration learning with multiparameter persistence.
In 41st International Conference on Machine Learning (ICML 2024). PMLR, 2024.
Application
Procedure:
Deadline: 31 December 2024
Contacts
Silvia Bottini
siNOSPAMlvia.bottini@inrae.fr
Offer published on 29 October 2024, displayed until 31 December 2024