Bioinformatician (M/F) - Description, Storage, and Standardization of Datasets and Workflows

 CDD-OD (fixed-term project contract) · Engineer (other) · 23 months · Bac+5 / Master’s · Institut Pasteur · Paris (France)

 Start date: 1 October 2023

Keywords

Workflows, annotation, standards, knowledge base

Description

Context: 

The ShareFAIR project (PEPR “Digital Health”) aims to promote the sharing and exchange of health data and their analysis, with a focus on interoperability, reusability, and transparency.

Bioinformatic analyses are complex and rely on many tools that must be configured and chained together. Ensuring that their results are reproducible is therefore of paramount importance, especially in the health field. This is typically achieved by designing, implementing, and executing workflows (e.g. Snakemake, Nextflow), which make analyses easier to reproduce and improve the tracking of data provenance.

However, these workflows are generally scattered across public repositories, poorly annotated, and difficult to query. The challenges therefore include standardizing and annotating datasets and workflows, and synthesizing them into interoperable, shareable, and reusable workflows.

Subject:

Within the scope of this project, we are seeking an engineer specialized in bioinformatics workflows, data, and knowledge engineering to contribute to the definition and implementation of standards and best practices to achieve these objectives. The successful candidate will work closely with a multidisciplinary team, including bioinformatics researchers and engineers, developers, and data management experts.

Main Missions and Activities: 


  • Identification of standards for the representation and annotation of workflows:
      ◦ Perform an in-depth analysis of existing standards such as RO-Crate, EDAM, and other relevant standards.
      ◦ Evaluate their applicability to the specific needs of the ShareFAIR project.
      ◦ Recommend and justify appropriate choices of standards for the representation and annotation of workflows.
  • Construction of a knowledge base integrating the identified standards:
      ◦ Design and implement an infrastructure for building a consolidated knowledge base based on the selected standards.
      ◦ Develop automated pipelines for integrating and managing data from different sources.
      ◦ Collaborate with the team to ensure the quality, consistency, and accuracy of the data in the knowledge base.
  • Adaptation and improvement of concepts borrowed from standards:
      ◦ Examine the scope and limitations (in terms of quality and coverage) of the identified standards.
      ◦ Propose improvements and adaptations to meet the specific needs of the ShareFAIR project.
      ◦ Implement these improvements in collaboration with the development team.

Candidate Profile: 

Master’s degree (Bac+5) in computer science or bioinformatics.

The Hub of Bioinformatics and Biostatistics and Institut Pasteur are committed to promoting gender equality, and female candidates are encouraged to apply.

Required Education and Skills:

Proficiency in Python and/or Java for software development.

Solid knowledge of databases, including SQL and/or NoSQL.

Familiarity with knowledge representation formats such as JSON and RDF.

Understanding of ontologies and bioinformatics workflows (an advantage).

Ability to work independently and collaborate effectively within a multidisciplinary team.

Good communication and documentation skills.

Proficiency in professional English.

Application

Procedure: Send your CV and cover letter by email.

Deadline: None

Contacts

Hervé Ménager, Frédéric Lemoine

 heNOSPAMrve.menager@pasteur.fr

 https://research.pasteur.fr/en/job/bioinformatician-m-f-description-storage-and-standardization-of-datasets-and-workflows/

Offer published on 26 July 2023, displayed until 31 December 2023.