A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

Cindy Xin Feng

doi:10.1186/s40488-021-00121-4

Medicine

Research output: Review article › peer-review

168 Citations (Scopus)

Abstract

Count data with excess zeros are frequently encountered in practice. For example, the number of health service visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of this type of data is that the count measure has more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, there is still a lack of investigation of the fundamental differences between these two types of models. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data-generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
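
The difference in data-generating processes mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Poisson count component (the abstract also names the negative binomial), and the parameter names `pi` and `lam`, the helper functions, and the numeric values are illustrative choices, not taken from the article. In the zero-inflated process, zeros come from two sources (a structural-zero state and the count distribution itself); in the hurdle process, all zeros come from a single binary part and positive counts follow a zero-truncated distribution.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def zero_inflated_poisson(rng, pi, lam):
    """Mixture: structural zero with probability pi, otherwise Poisson(lam).

    The Poisson part can itself produce zeros (sampling zeros).
    """
    if rng.random() < pi:
        return 0
    return poisson(rng, lam)


def hurdle_poisson(rng, pi, lam):
    """Two-part process: zero with probability pi, otherwise a
    zero-truncated Poisson(lam), so every zero comes from the binary part."""
    if rng.random() < pi:
        return 0
    while True:  # rejection sampling implements the truncation at zero
        y = poisson(rng, lam)
        if y > 0:
            return y


def zip_zero_prob(pi, lam):
    """P(Y = 0) under the zero-inflated Poisson: pi + (1 - pi) * exp(-lam)."""
    return pi + (1.0 - pi) * math.exp(-lam)


def hurdle_zero_prob(pi, lam):
    """P(Y = 0) under the hurdle Poisson: pi, regardless of lam."""
    return pi


def zero_fraction(sampler, n, seed, pi, lam):
    """Empirical share of zeros among n draws from the given sampler."""
    rng = random.Random(seed)
    return sum(sampler(rng, pi, lam) == 0 for _ in range(n)) / n


if __name__ == "__main__":
    pi, lam, n = 0.3, 1.5, 20000  # illustrative values only
    print("ZIP    zeros:", zero_fraction(zero_inflated_poisson, n, 1, pi, lam),
          "theory:", zip_zero_prob(pi, lam))
    print("Hurdle zeros:", zero_fraction(hurdle_poisson, n, 2, pi, lam),
          "theory:", hurdle_zero_prob(pi, lam))
```

With the same `pi` and `lam`, the zero-inflated process yields a larger share of zeros than the hurdle process, because the count component contributes additional zeros. The parameter `pi` therefore has a different interpretation in the two models.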

Original language	English
Article number	8
Journal	Journal of Statistical Distributions and Applications
Volume	8
Issue number	1
DOI	https://doi.org/10.1186/s40488-021-00121-4
Publication status	Published - Dec 2021

Bibliographical note

Funding Information:
This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant. The funder played no role in any of the design of the study, analysis, interpretation of data, and writing the manuscript.

Publisher Copyright:
© 2021, The Author(s).

ASJC Scopus Subject Areas

Statistics and Probability
Computer Science Applications
Statistics, Probability and Uncertainty

PubMed: MeSH publication types

Journal Article
Review

Access to document

10.1186/s40488-021-00121-4

Cite

@article{3a32b794332c441fb59ecd3d56df5c9b,

title = "A comparison of zero-inflated and hurdle models for modeling zero-inflated count data",

abstract = "Count data with excess zeros are frequently encountered in practice. For example, the number of health service visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of this type of data is that the count measure has more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, there is still a lack of investigation of the fundamental differences between these two types of models. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data-generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.",

author = "Feng, {Cindy Xin}",

note = "Funding Information: This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant. The funder played no role in any of the design of the study, analysis, interpretation of data, and writing the manuscript. Publisher Copyright: {\textcopyright} 2021, The Author(s).",

year = "2021",

month = dec,

doi = "10.1186/s40488-021-00121-4",

language = "English",

volume = "8",

journal = "Journal of Statistical Distributions and Applications",

issn = "2195-5832",

publisher = "Springer Open",

number = "1",

}

TY - JOUR

T1 - A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

AU - Feng, Cindy Xin

N1 - Funding Information: This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant. The funder played no role in any of the design of the study, analysis, interpretation of data, and writing the manuscript. Publisher Copyright: © 2021, The Author(s).

PY - 2021/12

Y1 - 2021/12

N2 - Count data with excess zeros are frequently encountered in practice. For example, the number of health service visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of this type of data is that the count measure has more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, there is still a lack of investigation of the fundamental differences between these two types of models. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data-generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.

AB - Count data with excess zeros are frequently encountered in practice. For example, the number of health service visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of this type of data is that the count measure has more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, there is still a lack of investigation of the fundamental differences between these two types of models. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data-generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.

UR - http://www.scopus.com/inward/record.url?scp=85108826270&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85108826270&partnerID=8YFLogxK

U2 - 10.1186/s40488-021-00121-4

DO - 10.1186/s40488-021-00121-4

M3 - Review article

C2 - 34760432

AN - SCOPUS:85108826270

SN - 2195-5832

VL - 8

JO - Journal of Statistical Distributions and Applications

JF - Journal of Statistical Distributions and Applications

IS - 1

M1 - 8

ER -
