Master 2 Internship in Bioinformatics
Internship · M2 internship · 6 months · Bac+5 / Master · INSERM U1052 - Hepatitis Viruses and Pathobiology of Chronic Liver Disease - CRCL · Lyon (France)
Keywords
Nanopore, Nextflow, Genomics, Hepatitis B Virus
Description
Title: Development and optimization of a Nextflow pipeline for HBV variant genotyping from Nanopore sequencing data
Background:
Hepatitis B virus (HBV) infection remains a major global health problem, affecting over 250 million people worldwide. So far, 10 genotypes (named A to J) have been described, and their distribution varies by region (Liu et al., 2021; McNaughton et al., 2020; Toyé et al., 2023). Accurate identification and characterization of HBV variants are crucial for monitoring the evolution and spread of the virus. With the advent of long-read sequencing technologies such as Oxford Nanopore Technologies (ONT), it is now possible to generate full-length HBV genome sequences, which can provide important insights into viral diversity and evolution. However, standardized bioinformatic pipelines for processing and analyzing ONT sequencing data for HBV are lacking.
Objective:
The goal of this project is to develop, adapt, and optimize a draft Nextflow pipeline for genotyping HBV variants from ONT sequencing data. The pipeline will involve several steps, including quality control, read alignment, variant calling, and post-processing of results. The student will use existing bioinformatic tools and workflows as a starting point and modify them as needed to improve accuracy, speed, and usability, including by developing ad hoc scripts in R and/or Python.
Methodology:
The following tasks will be carried out during the internship:
- Tool selection: The student will select appropriate bioinformatic tools for each step of the pipeline. These may include tools for quality control (e.g., FastQC, pycoQC), read alignment (e.g., Minimap2), variant calling (e.g., Nanopolish), and post-processing.
- Workflow implementation: The student will design a Nextflow workflow that integrates the selected tools into a cohesive pipeline. The workflow will be modular, allowing users to easily customize parameters.
- Testing and validation: The student will test the pipeline on simulated and/or real ONT sequencing datasets to evaluate its performance.
- Documentation: The student will create comprehensive user documentation, including a detailed manual and a README.
- Production: The student will participate in translational studies, including the genotyping of patient samples.
By the end of the internship, the student will have developed a robust and user-friendly Nextflow pipeline for HBV variant genotyping from ONT sequencing data. The pipeline will be tested and validated on multiple datasets and will be made publicly available in a Git repository.
Required skills:
Knowledge of Nextflow and Docker,
Programming skills in R and Python,
Proficiency in a UNIX environment, including bash.
The student will join the bioinformatics team, attend weekly team meetings, and have the opportunity to present their work at those meetings.
Languages: English (mandatory), French (greatly appreciated but not mandatory).
References:
Liu, Z., Zhang, Y., Xu, M., Li, X., Zhang, Z., 2021. Distribution of hepatitis B virus genotypes and subgenotypes. Medicine (Baltimore) 100, e27941. https://doi.org/10.1097/MD.0000000000027941
McNaughton, A.L., Revill, P.A., Littlejohn, M., Matthews, P.C., Ansari, M.A., 2020. Analysis of genomic-length HBV sequences to determine genotype and subgenotype reference sequences. J Gen Virol 101, 271–283. https://doi.org/10.1099/jgv.0.001387
Toyé, R.M., Loureiro, C.L., Jaspe, R.C., Zoulim, F., Pujol, F.H., Chemin, I., 2023. The Hepatitis B Virus Genotypes E to J: The Overlooked Genotypes. Microorganisms 11, 1908. https://doi.org/10.3390/microorganisms11081908
Application
Procedure:
Deadline: None
Contacts
Xavier Grand
xaNOSPAMvier.grand@inserm.fr
Offer published on 29 July 2024, displayed until 31 December 2024