Development of in silico samples and benchmarking of splicing and fusion detection tools applied in

 Internship · M2 internship · 6 months · Bac+5 / Master's level · Hospices Civils de Lyon · Bron (France)

 Start date: 6 January 2025

Keywords

diagnostic, oncology, RNA-seq, splicing, fusion, in silico samples, simulation

Description

Context

Tumours are made up of diverse cells carrying distinct genetic variants (DNA) and abnormal transcript expression patterns
(RNA), which can vary widely between tumours even when the clinical presentation is similar. To address this heterogeneity,
treatments are now prescribed according to the genetic characteristics of each patient's own tumour, an approach known as
“personalised medicine”.
RNA sequencing can be used to detect abnormal transcript expression, including splicing events (alternative transcripts
originating from the same gene) and fusion events (transcripts from two different genes merged into a single chimeric
transcript). However, its application in a clinical context remains challenging, especially with Formalin-Fixed
Paraffin-Embedded (FFPE) tumour samples, in which DNA and RNA are scarcer and of lower quality. Assessing the limits
of bioinformatics tools, whether developed in-house or published by others, in such a challenging context requires a
thorough benchmark. As very few control samples with known events are available, synthetic datasets could provide the
statistical robustness this benchmark needs.
A previous internship has already produced a proof of concept: in silico samples showing a skip of exon 14 of the
MET gene.

Mission

The M2 student will develop a protocol for generating synthetic sequencing data (FASTQ) that are as similar as possible
to the real data produced in our facility. This protocol should enable the following:
1. Adjusting parameters such as the total number of reads, the number of altered reads, insert lengths, or read quality.
2. Introducing known splicing and fusion events supported by a specified number of reads.
3. Generalising the results of the proof of concept to other events (exon skipping or fusions).
4. Automating the tool and implementing it in a pipeline so that it can be used routinely to validate new targets.
The student will also test splicing and fusion detection tools on the generated in silico samples.
The student will be supervised by a bioinformatician and a clinician.
Depending on the student's interests, other bioinformatics pipeline developments may also be considered: methylation,
variant calling, identity monitoring, etc.

Location

The student will be based at the Hospices Civils de Lyon (HCL), Groupement Hospitalier Est (59 Boulevard
Pinel, 69677 Bron, France), within the bioinformatics group of the Centre de Biologie Pathologie Est
(CBPE).
The team develops and implements bioinformatics tools for DNA and RNA analysis in a clinical context. Molecular
pathologists then use these tools to identify genetic variants relevant to diagnosis and to guide treatment
decisions.

Profile

We are seeking a highly motivated and detail-oriented M2 student. The candidate should possess the following
qualifications and skills:
- Educational Background: Currently enrolled in or recently graduated from an M2 program in bioinformatics.
- Programming Skills: Strong programming skills in scripting languages such as Bash, Python or R are essential.
Knowledge of Next Generation Sequencing (NGS) pipeline implementation (Nextflow) and of working with containers
(Singularity) is also expected.
- Knowledge: Proficiency in NGS data analysis and bioinformatics tools, as well as knowledge of molecular biology
and pipeline implementation. Experience in RNA-seq data analysis, especially in the context of cancer biology,
will be appreciated.

Contact

Marc Barritault: marc.barritault@chu-lyon.fr
Valentin Wucher: valentin.wucher@univ-lyon1.fr

Application

Procedure: Send an email with a CV and a cover letter to Marc Barritault (marc.barritault@chu-lyon.fr) and Valentin Wucher (valentin.wucher@univ-lyon1.fr).

Deadline: 4 July 2025


Offer published on 18 November 2024, displayed until 3 March 2025.