Master 2 internship: Bioinformatics analysis of SARS-CoV-2: evaluating and improving sgRNA detection methods
Internship · M2 internship · 6 months Bac+5 / Master CRIStAL · Lille (France)
Keywords
SARS-COV bioinformatic long read nanopore mapping
Description
"Bioinformatics analysis of SARS-CoV-2: evaluating and improving sgRNA detection methods"
Context
This internship takes place at the University of Lille (Cité Scientifique campus, Villeneuve d'Ascq) and will be supervised by members of the BONSAI team (sequence bioinformatics, CRIStAL laboratory).
Viral subgenomic RNA (sgRNA) plays a major role in the replication, pathogenicity, and evolution of SARS-CoV-2. Sequencing protocols such as the ARTIC[1] protocol for Nanopore data have recently been established. However, because of virus-specific biological processes, analyzing sgRNA from viral sequencing reads is a computational challenge. Current methods rely on computational tools designed for eukaryotic genomes, which leaves a gap in tools designed specifically for sgRNA detection. Periscope[2] is a tool specifically designed to detect and quantify sgRNA using a two-step approach.
The first step uses a mapping tool called minimap2[3]. Mapping tools locate each sequence produced by a sequencing run (called a read) on a reference. The second step consists of locally aligning part of the sequence to a reference in order to find a specific motif.
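To make the second step more concrete, here is a minimal sketch of local alignment using plain Smith-Waterman scoring with a linear gap penalty. It is not Periscope's implementation: the function names, the scoring values, the 80% threshold, and the motif itself are hypothetical placeholders chosen for illustration only.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_in_query, end_in_target) of the best local alignment.

    Scores are illustrative defaults, not values taken from any published tool.
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    cols = len(target) + 1
    prev = [0] * cols
    best = (0, 0, 0)
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            substitution = match if query[i - 1] == target[j - 1] else mismatch
            score = max(
                0,                        # start a new local alignment
                prev[j - 1] + substitution,  # align the two characters
                prev[j] + gap,            # gap in the target
                curr[j - 1] + gap,        # gap in the query
            )
            curr[j] = score
            if score > best[0]:
                best = (score, i, j)
        prev = curr
    return best


def has_motif(read, motif, min_fraction=0.8, match=2):
    """Report whether the read contains an approximate copy of the motif.

    min_fraction is a hypothetical threshold: the fraction of the
    perfect-match score that the best local alignment must reach.
    """
    score, _, _ = smith_waterman(motif, read, match=match)
    return score >= min_fraction * match * len(motif)


if __name__ == "__main__":
    motif = "ACGTTGCA"  # placeholder motif, not a real viral sequence
    print(has_motif("TTTTACGTTGCAGGGG", motif))  # exact copy inside the read
    print(has_motif("GGGGGGGGGGGG", motif))      # no copy of the motif
```

Local alignment suits this step because the motif may appear anywhere in the read and may carry sequencing errors, which a strict substring search would miss.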
A previous study conducted by our team highlighted some biases in Periscope. To fix these biases, our team developed Periscope_multi, a new approach that uses multiple references for the mapping step.
The next question is whether using another mapper can improve the results on such data, and how the parameterization of minimap2 affects them.
Tasks
- Create a new implementation of Periscope and Periscope_multi with different mappers.
- Improve an existing pipeline to compare the effect of the parameterization of the mapper and of the local aligner on the results.
- Perform benchmarks using synthetic and real Nanopore data.
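As an illustration of what comparing mapper parameterizations could look like, the sketch below builds one minimap2 command line per combination of k-mer size (`-k`) and minimizer window size (`-w`). The helper name, the file names, and the parameter values are assumptions made for the example; they are neither recommended settings nor part of the existing pipeline. The commands are only built, not executed.

```python
from itertools import product


def minimap2_commands(reference, reads, kmer_sizes, window_sizes, preset="map-ont"):
    """Build one minimap2 command (as an argument list) per (k, w) combination.

    The preset is passed before -k and -w so that the explicit values
    override the preset's defaults. Output file names are hypothetical.
    """
    commands = []
    for k, w in product(kmer_sizes, window_sizes):
        output = f"k{k}_w{w}.sam"
        commands.append([
            "minimap2",
            "-a",            # write alignments in SAM format
            "-x", preset,    # Nanopore preset
            "-k", str(k),    # k-mer size
            "-w", str(w),    # minimizer window size
            "-o", output,    # output file
            reference,
            reads,
        ])
    return commands


if __name__ == "__main__":
    # Arbitrary example values, chosen only to show the grid.
    for command in minimap2_commands("reference.fa", "reads.fastq", [11, 15], [5, 10]):
        print(" ".join(command))
```

Each argument list could then be passed to `subprocess.run` within a benchmark, so that every run is identified by its parameters in the output file name.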
Profile
Applicants should ideally have a background in programming, computer science, or bioinformatics.
Contacts: thomas.baudeau, mikael.salson at univ-lille
References
[1] Improvements to the ARTIC multiplex PCR method for SARS-CoV-2 genome sequencing using nanopore, Tyson et al., bioRxiv, 2020, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7480024/
[2] Subgenomic RNA identification in SARS-CoV-2 genomic sequencing data, Parker et al., Genome Research, 2021, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8015849/
[3] Minimap2: pairwise alignment for nucleotide sequences, Li, Bioinformatics, 2018, https://doi.org/10.1093/bioinformatics/bty191
Application
Procedure:
Deadline: None
Contacts
Thomas Baudeau
thNOSPAMomas.baudeau+stage@univ-lille.fr
Offer published on 2 November 2023, displayed until 11 December 2023