M2 Internship: Integrating short- and long-read RNA-seq data into evolutionary splicing graphs

Internship · M2 internship · 6 months · Bac+5 / Master · Laboratory of Computational and Quantitative Biology (LCQB), Sorbonne Université · Paris (France)

Start date: 6 January 2025

Keywords

alternative splicing, evolutionary splicing graphs, long reads, short reads, RNA-seq

Description

Recent advances in high-throughput sequencing have revealed a vast array of RNA transcripts generated from single genes through alternative splicing (AS) [1], some of which give rise to distinct protein isoforms (proteoforms) [2] with unique structures [3], interactions [4], and functions [5]. By modulating protein functions and interactions, AS plays essential roles in muscle fiber diversification, nervous system development, and innate immunity. Moreover, the combinatorial expression of proteoforms can influence disease susceptibility and signaling outcomes in response to drugs, and AS misregulation is often linked to diseases, including cancer. Yet experimentally determining how much of the splicing complexity uncovered by RNA-seq contributes to protein diversity remains a long-standing challenge [6].

In the absence of direct evidence, evolutionary conservation is a strong indicator of the functional relevance of splicing variants. In this context, we developed ThorAxe, a method that matches proteoform sequences from multiple genes or species into a single structure called an Evolutionary Splicing Graph (ESG). We applied ThorAxe to the proteomes of 12 species (including human) from the Ensembl database, revealing a clear correlation between transcript conservation, tissue-specific regulation, and functional relevance [7]. We now aim to extend ThorAxe to integrate into the ESG short- and long-read RNA-seq data, which capture many variations not annotated in Ensembl. On the one hand, short reads allow accurate detection of splice junctions, but their limited length means that exon combinations cannot be validated without assembly. On the other hand, long-read sequencing technologies may provide access to full-length transcripts, increasing the accuracy of splice junction prediction, but at the cost of a much higher error rate.

To support this development, we are offering a second-year Master's (M2) internship in Bioinformatics. You will first become familiar with what an ESG is, then explore two alternatives. First, you will develop an algorithm to align a read to an ESG that has already been built (is there a path? If so, which one is the best?). This step will make it possible to assess how much an ESG would change if it were built with sequencing data included. Second, you will integrate the reads directly into the construction of the ESG and assess how the result compares with an ESG built solely from annotations (are the paths robust? How many new events can we detect?).
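To give a concrete feel for the first alternative (is there a path for a read, and which one is the best?), here is a minimal Python sketch. It is only an illustration under strong assumptions: the graph is a toy splicing graph with nucleotide strings on its nodes, whereas ThorAxe builds ESGs from protein sequences; all names (seqs, edges, best_path, max_errors) are hypothetical and not part of ThorAxe; and paths are enumerated by brute force and scored with a plain edit distance, which would not scale to real data.

```python
# Toy illustration only: not the ThorAxe data model or API.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # character of a left unmatched
                curr[j - 1] + 1,           # character of b left unmatched
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def enumerate_paths(edges, node, sink):
    """Yield every path from node to sink in a DAG given as an adjacency dict."""
    if node == sink:
        yield [node]
        return
    for nxt in edges.get(node, []):
        for tail in enumerate_paths(edges, nxt, sink):
            yield [node] + tail


def best_path(read, seqs, edges, source="start", sink="stop", max_errors=None):
    """Return (distance, path) for the path closest to the read, or None.

    None means the graph has no source-to-sink path, or the closest path
    needs more than max_errors edits (hypothetical acceptance threshold).
    """
    best = None
    for path in enumerate_paths(edges, source, sink):
        # Nodes without a sequence (start/stop) contribute nothing.
        transcript = "".join(seqs.get(n, "") for n in path)
        dist = edit_distance(read, transcript)
        if best is None or dist < best[0]:
            best = (dist, path)
    if best is None or (max_errors is not None and best[0] > max_errors):
        return None
    return best


# Hypothetical graph: e2 and e3 are mutually exclusive, and e1 -> e4 skips both.
seqs = {"e1": "ATGGC", "e2": "TTAAC", "e3": "GGCAT", "e4": "CCTGA"}
edges = {
    "start": ["e1"],
    "e1": ["e2", "e3", "e4"],
    "e2": ["e4"],
    "e3": ["e4"],
    "e4": ["stop"],
}

# A read spanning e1-e3-e4 with one substitution (simulated sequencing error).
read = "ATGGCGGCTTCCTGA"
print(best_path(read, seqs, edges))
# prints (1, ['start', 'e1', 'e3', 'e4', 'stop'])
```

In this toy setting, a read for which best_path returns None under a chosen error threshold would be a candidate for a variation that is absent from the annotation-based graph. A practical algorithm would presumably replace the path enumeration with dynamic programming over the graph.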

Profile and environment:

We are looking for a student with knowledge of molecular biology and algorithms, along with strong programming skills in Python. The project is a collaboration between two computational biology and bioinformatics teams: the BONSAI team, which specializes in sequence algorithms (CRIStAL, Lille), and the PROMISE team, which has expertise in evolutionary structural biology (LCQB, Paris). You will present and discuss your results with both teams, and you will also interact with close collaborators from the Robert Koch Institute (RKI), Berlin. We have extensive experience in high-throughput sequencing data analysis and will guide you in your analytical and methodological choices. However, the ability to work independently towards defined objectives and to solve programming challenges using online resources is essential.

References:

[1] Morillon, A.; Gautheret, D. Bridging the Gap between Reference and Real Transcriptomes. Genome Biol. 2019, 20, 112. https://doi.org/10.1186/s13059-019-1710-7.

[2] Smith, L. M.; Kelleher, N. L. Proteoforms as the next Proteomics Currency. Science 2018, 359 (6380), 1106–1107. https://doi.org/10.1126/science.aat1884.

[3] Birzele, F.; Csaba, G.; Zimmer, R. Alternative Splicing and Protein Structure Evolution. Nucleic Acids Res. 2008, 36 (2), 550–558. https://doi.org/10.1093/nar/gkm1054.

[4] Yang, X.; Coulombe-Huntington, J.; Kang, S.; Sheynkman, G. M.; Hao, T.; Richardson, A.; Sun, S.; Yang, F.; Shen, Y. A.; Murray, R. R.; Spirohn, K.; Begg, B. E.; Duran-Frigola, M.; MacWilliams, A.; Pevzner, S. J.; Zhong, Q.; Trigg, S. A.; Tam, S.; Ghamsari, L.; Sahni, N.; Yi, S.; Rodriguez, M. D.; Balcha, D.; Tan, G.; Costanzo, M.; Andrews, B.; Boone, C.; Zhou, X. J.; Salehi-Ashtiani, K.; Charloteaux, B.; Chen, A. A.; Calderwood, M. A.; Aloy, P.; Roth, F. P.; Hill, D. E.; Iakoucheva, L. M.; Xia, Y.; Vidal, M. Widespread Expansion of Protein Interaction Capabilities by Alternative Splicing. Cell 2016, 164 (4), 805–817. https://doi.org/10.1016/j.cell.2016.01.029.

[5] Tapial, J.; Ha, K. C. H.; Sterne-Weiler, T.; Gohr, A.; Braunschweig, U.; Hermoso-Pulido, A.; Quesnel-Vallières, M.; Permanyer, J.; Sodaei, R.; Marquez, Y.; Cozzuto, L.; Wang, X.; Gómez-Velázquez, M.; Rayon, T.; Manzanares, M.; Ponomarenko, J.; Blencowe, B. J.; Irimia, M. An Atlas of Alternative Splicing Profiles and Functional Associations Reveals New Regulatory Programs and Genes That Simultaneously Express Multiple Major Isoforms. Genome Res. 2017, 27 (10), 1759–1768. https://doi.org/10.1101/gr.220962.117.

[6] Tress, M. L.; Abascal, F.; Valencia, A. Alternative Splicing May Not Be the Key to Proteome Complexity. Trends Biochem. Sci. 2017, 42 (2), 98–110. https://doi.org/10.1016/j.tibs.2016.08.008.

[7] Zea, D. J.; Laskina, S.; Baudin, A.; Richard, H.; Laine, E. Assessing Conservation of Alternative Splicing with Evolutionary Splicing Graphs. Genome Res. 2021, 31 (8), 1462–1473. https://doi.org/10.1101/gr.274696.120.

Application

How to apply: send a CV and cover letter to elodie.laine@sorbonne-universite.fr, jean-stephane.varre@univ-lille.fr, and arnaud.liehrmann@sorbonne-universite.fr

Deadline: none

Contacts

Elodie Laine

elodie.laine@sorbonne-universite.fr

Offer published on 15 October 2024, displayed until 31 December 2024