Analyzing clustered count data with a cluster specific random effect zero-inflated Conway-Maxwell-Poisson distribution


Bibliographic details
Published in: Journal of applied statistics. - 1991. - 45(2018), 5 from: 28., pages 799-814
Main author: Choo-Wosoba, Hyoyoung (Author)
Other authors: Datta, Somnath
Format: Online article
Language: English
Published: 2018
Access to the parent work: Journal of applied statistics
Subjects: Journal Article; Gaussian-Hermite (G-H) quadrature; Mixed effects model; Next-generation sequencing (NGS) data; Poisson distribution; Under- and over-dispersions
Description
Abstract: In recent years, data analysis techniques have been developed in biological and medical research areas with different types of count distributions. In particular, zero-inflated versions of parametric count distributions have been used to model excessive zeros that are often present in these assays. Perhaps the most common count distribution used for analyzing such data is the Poisson distribution. However, a Poisson distribution, having a single underlying parameter, cannot accommodate any dispersion pattern other than equidispersion. A negative binomial distribution is capable of modeling overdispersed, but not underdispersed, data. In contrast, a Conway-Maxwell-Poisson (CMP) distribution (Conway, R. W., and Maxwell, W. L., 1962) can handle not only overdispersion but also underdispersion. We show with an illustrative data set on next-generation sequencing of maize hybrids that both underdispersion and overdispersion can be present in genetic data. Furthermore, if count data consist of clustered observations, one of the most efficient statistical techniques is to introduce a cluster specific random effect term. Once again, the maize hybrids data present such a situation. We develop inference procedures for a zero-inflated CMP regression that incorporates a cluster specific random effect term. Unlike in Gaussian models, the underlying likelihood is computationally challenging. We use a numerical approximation via a Gaussian quadrature to circumvent this issue. A test for checking zero-inflation has also been developed in our setting. Finite sample properties of our estimators and test have been investigated by extensive simulations. Finally, the statistical methodology has been applied to analyze the maize data mentioned before.
Description: Date Revised 25.02.2020
published: Print-Electronic
Citation Status PubMed-not-MEDLINE
ISSN: 0266-4763
DOI: 10.1080/02664763.2017.1312299