Abstract
Background: With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, use of rigorous tree-building methods on such large datasets is prohibitive, and manual 'pruning' of sequence alignments is time-consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to objectively carry out such pruning procedures. Findings: Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis. Conclusions: TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.
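The pruning idea described in the abstract can be illustrated with a small toy sketch. This is not the TreeTrimmer implementation: the nested-tuple tree representation, the `taxon_of` mapping, the `keep` parameter, and the rule "collapse any clade whose leaves all share one user-assigned taxon label" are assumptions made here purely for illustration, and the abstract does not specify how redundancy is defined or which representatives are retained.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representation: a tree is a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """Return leaf names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    result: List[str] = []
    for child in tree:
        result.extend(leaves(child))
    return result


def trim(tree: Tree, taxon_of: Dict[str, str], keep: int = 1) -> Tree:
    """Reduce each clade whose leaves share one taxon label to `keep` leaves.

    Assumption: the first `keep` leaves in traversal order serve as the
    representatives; a real tool might choose them by other criteria.
    """
    if isinstance(tree, str):
        return tree
    names = leaves(tree)
    if len({taxon_of[name] for name in names}) == 1:
        kept = names[:keep]
        # A single representative replaces the clade; several form a polytomy.
        return kept[0] if len(kept) == 1 else tuple(kept)
    # Mixed clade: keep this node and try to trim each child independently.
    return tuple(trim(child, taxon_of, keep) for child in tree)


# Invented example data: six OTUs assigned to three taxon labels.
tree = ((("human_a", "human_b"), "mouse_a"), (("yeast_a", "yeast_b"), "yeast_c"))
taxon_of = {
    "human_a": "Homo",
    "human_b": "Homo",
    "mouse_a": "Mus",
    "yeast_a": "Saccharomyces",
    "yeast_b": "Saccharomyces",
    "yeast_c": "Saccharomyces",
}

trimmed = trim(tree, taxon_of)
print(trimmed)
```

In this toy case the two `Homo` leaves and the three `Saccharomyces` leaves each reduce to one representative, while the branching order among the remaining OTUs is unchanged, which mirrors the stated goal of lowering OTU density without losing taxonomic diversity.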
Original language | English |
---|---|
Article number | 145 |
Journal | BMC Research Notes |
Volume | 6 |
Issue number | 1 |
Publication status | Published - 2013 |
Bibliographical note
Funding Information: We thank the Bigelowiella natans and Guillardia theta nuclear genome sequencing project members for contributions to the early stages of program development, and F. Burki for valuable feedback and for providing and testing training data. L. Eme is thanked for critical reading of the manuscript and helpful suggestions. S.M. is a JSPS Postdoctoral Fellow for Research Abroad of the Japan Society for the Promotion of Science. J.M.A. is a Fellow of the Canadian Institute for Advanced Research, Program in Integrated Microbial Biodiversity, and a New Investigator Award holder from the Canadian Institutes of Health Research. Support in the form of a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada is also acknowledged.
ASJC Scopus Subject Areas
- General Biochemistry, Genetics and Molecular Biology
PubMed: MeSH publication types
- Journal Article
- Research Support, Non-U.S. Gov't