Keywords
automation of subfamily creation within the CAZy data
Description
Subfamilies of fungal carbohydrate-active enzymes
Filamentous fungi degrade plant cell wall polymers to acquire nutrients. They therefore play pivotal roles in the carbon cycle through the degradation of dead organic matter, and constitute invaluable resources for the biotechnological production of chemicals from renewable biomass, as an alternative to fossil reserves. To improve our knowledge of fungal carbohydrate-active enzymes, a precise mapping of the protein sequence space into families (or subfamilies) of specific (or unknown) function is highly desirable. With the deluge of sequences, phylogenetic approaches have become obsolete and sequence similarity networks (SSN) have emerged as a promising alternative. After using SSN to divide the large multifunctional GH16 family into function-specific subfamilies [2], we recently designed a method to guide and accelerate subfamily creation [3].
The first goal of the internship will be to apply the aforementioned method to a defined set of families (mostly of fungal origin) that are to be divided into subfamilies.
The project could then develop in either of two directions:
• methodological developments: implementation of (i) solutions to cope with the largest and/or most diverse families; (ii) visual representations of information (e.g. taxonomy, modularity) to better guide human decisions; (iii) automation of the creation of the designed subfamilies within the CAZy data (SQL DB, multiple sequence alignments and HMM searches).
• data analysis: identification of subfamilies with no characterized member and selection of the most relevant targets for future biochemical characterization based on (i) their taxonomic/ecological distribution; (ii) potential syntenies; (iii) shared companion modules; (iv) secretomics data in BBF fsec-DB.
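For candidates unfamiliar with SSN, the general idea can be illustrated with a minimal Python sketch. It is not the method of [3]: it only assumes that pairwise similarity scores are available (for example from an all-vs-all BLAST run), keeps the pairs scoring at or above a chosen threshold, and reports the resulting connected components as candidate groups. The function name `ssn_components`, the sequence names and the scores are hypothetical.

```python
from collections import defaultdict


def ssn_components(hits, threshold):
    """Group sequences into connected components of a similarity network.

    hits: iterable of (seq_a, seq_b, score) tuples (hypothetical input format).
    threshold: minimum score for two sequences to be linked by an edge.
    Returns a list of sets, largest component first.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        root_a, root_b = find(a), find(b)  # registers both nodes
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Hypothetical pairwise scores (e.g. percent identity) between six sequences.
example_hits = [
    ("seqA", "seqB", 80),
    ("seqB", "seqC", 75),
    ("seqC", "seqD", 30),
    ("seqD", "seqE", 90),
    ("seqF", "seqF", 100),
]

for component in ssn_components(example_hits, threshold=60):
    print(sorted(component))
```

Raising the threshold splits the network into smaller, more homogeneous groups, while lowering it merges them; choosing the threshold is precisely the kind of decision that the planned visual representations are meant to guide.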
The candidate will join the OxyMist project, which aims to decipher the role of O2 in the degradation capacity of microbial communities. This collaborative project involves teams from Copenhagen, Cambridge and Marseille. In Marseille, the OxyMist project is led by the INRAE laboratory BBF, which has a long-standing collaboration with the Glycogenomics group of the AFMB laboratory. The Glycogenomics group, which will host the internship, has maintained and developed for more than 20 years the CAZy database, the worldwide reference classification of enzyme families involved in glycan assembly and breakdown [1]. Both laboratories are located on the Luminy campus in the Calanques National Park.
Expected skills:
Bioinformatic sequence analysis (BLAST, HMMER, Cytoscape)
Programming (e.g. Python)
Database querying (SQL) or comparative genomics
Communication (oral, written and electronic)
Additional knowledge in biotechnology, glycobiology or fungal secretomics would be an asset.
A personal computer will be provided in the office, and access to an in-house computing cluster is possible.