TY - JOUR
T1 - parallelnewhybrid
T2 - an R package for the parallelization of hybrid detection using newhybrids
AU - Wringe, Brendan F.
AU - Stanley, Ryan R.E.
AU - Jeffery, Nicholas W.
AU - Anderson, Eric C.
AU - Bradbury, Ian R.
N1 - Funding Information:
The authors wish to thank Marion Sinclair-Waters and Mallory Van Wynegaarden for their help bug checking the code. We also thank Thierry Gosselin for encouraging us to publish this package. This work was supported by a Natural Sciences and Engineering Research Council Strategic Project Grant and Fisheries and Oceans Canada funding (International Governance Strategy; Program for Aquaculture Regulatory Research; Genomics Research and Development Initiative) to I.R.B.
Publisher Copyright:
© 2016 John Wiley & Sons Ltd
PY - 2017/1/1
Y1 - 2017/1/1
N2 - Hybridization among populations and species is a central theme in many areas of biology, and the study of hybridization has direct applicability to testing hypotheses about evolution, speciation and genetic recombination, as well as having conservation, legal and regulatory implications. Yet, despite being a topic of considerable interest, the identification of hybrid individuals, and quantification of the (un)certainty surrounding the identifications, remains difficult. Unlike other programs that exist to identify hybrids based on genotypic information, newhybrids is able to assign individuals to specific hybrid classes (e.g. F1, F2) because it makes use of patterns of gene inheritance within each locus, rather than just the proportions of gene inheritance within each individual. For each comparison and set of markers, multiple independent runs of each data set should be used to develop an estimate of the hybrid class assignment accuracy. The necessity of analysing multiple simulated data sets, constructed from large genomewide data sets, presents significant computational challenges. To address these challenges, we present parallelnewhybrid, an R package designed to decrease user burden when undertaking multiple newhybrids analyses. parallelnewhybrid does so by taking advantage of the parallel computational capabilities inherent in modern computers to efficiently and automatically execute separate newhybrids runs in parallel. We show that parallelization of analyses using this package affords users several-fold reductions in time over a traditional serial analysis. parallelnewhybrid consists of an example data set, a readme and three operating system-specific functions to execute parallel newhybrids analyses on each of a computer's c cores. parallelnewhybrid is freely available on the long-term software hosting site GitHub (www.github.com/bwringe/parallelnewhybrid).
AB - Hybridization among populations and species is a central theme in many areas of biology, and the study of hybridization has direct applicability to testing hypotheses about evolution, speciation and genetic recombination, as well as having conservation, legal and regulatory implications. Yet, despite being a topic of considerable interest, the identification of hybrid individuals, and quantification of the (un)certainty surrounding the identifications, remains difficult. Unlike other programs that exist to identify hybrids based on genotypic information, newhybrids is able to assign individuals to specific hybrid classes (e.g. F1, F2) because it makes use of patterns of gene inheritance within each locus, rather than just the proportions of gene inheritance within each individual. For each comparison and set of markers, multiple independent runs of each data set should be used to develop an estimate of the hybrid class assignment accuracy. The necessity of analysing multiple simulated data sets, constructed from large genomewide data sets, presents significant computational challenges. To address these challenges, we present parallelnewhybrid, an R package designed to decrease user burden when undertaking multiple newhybrids analyses. parallelnewhybrid does so by taking advantage of the parallel computational capabilities inherent in modern computers to efficiently and automatically execute separate newhybrids runs in parallel. We show that parallelization of analyses using this package affords users several-fold reductions in time over a traditional serial analysis. parallelnewhybrid consists of an example data set, a readme and three operating system-specific functions to execute parallel newhybrids analyses on each of a computer's c cores. parallelnewhybrid is freely available on the long-term software hosting site GitHub (www.github.com/bwringe/parallelnewhybrid).
UR - http://www.scopus.com/inward/record.url?scp=84994702025&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994702025&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.12597
DO - 10.1111/1755-0998.12597
M3 - Article
C2 - 27617417
AN - SCOPUS:84994702025
SN - 1755-098X
VL - 17
SP - 91
EP - 95
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 1
ER -