Advancing algorithm design for phylogenetic inference using agreement forests and graph exploration

  • Whidden, Christopher (PI)

Project: Research project

Project details

Description

Evolutionary analysis is fundamental to modern medicine and biology. For example, disease-causing bacteria evolve drug resistance, and influenza viruses require new vaccines annually because of their high mutation rates. Researchers around the world use evolutionary tree-building techniques, called phylogenetics, to learn about these evolutionary processes from data such as DNA and protein sequences. Phylogenetic "family trees" are a primary tool for studying evolution. Their applications span biology, from reconstructing the spread of antimicrobial resistance through mutation and gene sharing to understanding why some subspecies of Atlantic salmon grow larger before returning from the ocean.

Although thousands of researchers worldwide use phylogenetic inference methods, the development of new phylogenetic algorithms has not kept pace with the astounding increase in available data. Current methods depend on effectively searching very large sets of possible phylogenetic trees, yet this mathematical "tree space" is still poorly understood. As a direct result, current methods either rapidly seek a single estimate or require weeks to explore statistically representative sets of candidate trees. In addition, a single tree shows only part of the picture: evolutionary processes such as gene transfer and recombination do not follow a single tree and must also be considered. We need new, fast algorithms for these problems.

This proposal encompasses three complementary projects that will advance our ability to infer, compare, and evaluate phylogenies. First, we will develop new phylogenetic inference software, "Phylogenetic Topographer" (PT), that uses a better understanding of tree space to directly explore the most likely trees and estimate their probabilities. This will enable fast tree inference with statistical confidence. Second, we will develop new gene transfer inference software that combines all plausible transfer scenarios. Unlike my previous work on "highways" of gene transfer, this will infer specific transfers between specific bacteria. Third, we will develop novel algorithms for computing distances between time trees and reconciling their differences. This will enable rapid analysis of recombination in viruses such as the novel coronavirus.

My research program will allow phylogenetic algorithm development to catch up with the scale of modern datasets. Taken together, these algorithms will help us estimate the evolutionary tree for thousands of organisms with statistical confidence and understand evolutionary patterns that deviate from that tree. In particular, this research will help us find specific transfers of antibiotic resistance between bacteria and specific recombination of infection traits in viruses. This understanding will be essential as we seek solutions to many problems in biodiversity and human health.

Status: Active
Actual start/end date: 1/1/22 → …

Funding

  • Natural Sciences and Engineering Research Council of Canada: US$21,515.00

ASJC Scopus Subject Areas

  • Ecology, Evolution, Behavior and Systematics
  • Computer Science(all)