A phylogenetic mixture model for the identification of functionally divergent protein residues

Daniel Gaston; Edward Susko; Andrew J. Roger

doi:10.1093/bioinformatics/btr470

Research output: Contribution to journal › Article › peer-review

24 Citations (Scopus)

Abstract

Motivation: To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples.

Results: We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions.

Original language	English
Article number	btr470
Pages (from-to)	2655-2663
Number of pages	9
Journal	Bioinformatics
Volume	27
Issue number	19
DOIs	https://doi.org/10.1093/bioinformatics/btr470
Publication status	Published - Oct 2011

Bibliographical note

Funding Information:
D.G. would like to thank William Fletcher for implementing several suggested changes to INDELible, B.W. Brandt for providing a script for the Multi-Harmony web server that allowed testing of a large number of datasets, and Olivier Lichtarge and Angela Dawn Wilkins for running real value ET on the 11 biological datasets.

Funding: Nova Scotia Health Research Foundation graduate student research award (to D.G.); Natural Sciences and Engineering Research Council of Canada, Discovery Grant (227085-2011 to A.J.R. and E.S.).

ASJC Scopus Subject Areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

PubMed: MeSH publication types

Journal Article
Research Support, Non-U.S. Gov't

Cite this

@article{baf969bfc650494ea9324e43f340c25f,

title = "A phylogenetic mixture model for the identification of functionally divergent protein residues",

abstract = "Motivation: To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples. Results: We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions.",

author = "Daniel Gaston and Edward Susko and Roger, {Andrew J.}",

note = "Funding Information: D.G. would like to thank William Fletcher for implementing several suggested changes to INDELible, B.W. Brandt for providing a script for the Multi-Harmony web server that allowed testing of a large number of datasets, and Olivier Lichtarge and Angela Dawn Wilkins for running real value ET on the 11 biological datasets. Funding: Nova Scotia Health Research Foundation graduate student research award (to D.G.); Natural Sciences and Engineering Research Council of Canada, Discovery Grant (227085-2011 to A.J.R. and E.S.).",

year = "2011",

month = oct,

doi = "10.1093/bioinformatics/btr470",

language = "English",

volume = "27",

pages = "2655--2663",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "19",

}

TY - JOUR

T1 - A phylogenetic mixture model for the identification of functionally divergent protein residues

AU - Gaston, Daniel

AU - Susko, Edward

AU - Roger, Andrew J.

N1 - Funding Information: D.G. would like to thank William Fletcher for implementing several suggested changes to INDELible, B.W. Brandt for providing a script for the Multi-Harmony web server that allowed testing of a large number of datasets, and Olivier Lichtarge and Angela Dawn Wilkins for running real value ET on the 11 biological datasets. Funding: Nova Scotia Health Research Foundation graduate student research award (to D.G.); Natural Sciences and Engineering Research Council of Canada, Discovery Grant (227085-2011 to A.J.R. and E.S.).

PY - 2011/10

Y1 - 2011/10

N2 - Motivation: To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples. Results: We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions.

AB - Motivation: To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples. Results: We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions.

UR - http://www.scopus.com/inward/record.url?scp=80053447612&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053447612&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btr470

DO - 10.1093/bioinformatics/btr470

M3 - Article

C2 - 21840876

AN - SCOPUS:80053447612

SN - 1367-4803

VL - 27

SP - 2655

EP - 2663

JO - Bioinformatics

JF - Bioinformatics

IS - 19

M1 - btr470

ER -
