Title: | Unified Zero-Inflated Hurdle Regression Models |
---|---|
Description: | Run a Gibbs sampler for hurdle models to analyze data with an excess of zeros, as is common in zero-inflated count and semi-continuous outcomes. The package includes the hurdle model under Gaussian, Gamma, inverse Gaussian, Weibull, Exponential, Beta, Poisson, negative binomial, logarithmic, Bell, generalized Poisson, and binomial distributional assumptions. The models are described in Ganjali et al. (2024) <doi:...>. |
Authors: | Taban Baghfalaki [cre, aut] , Mojtaba Ganjali [aut] , Narayanaswamy Balakrishnan [aut] |
Maintainer: | Taban Baghfalaki <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 0.3.0 |
Built: | 2024-11-18 04:53:46 UTC |
Source: | https://github.com/tbaghfalaki/uhm |
Simulated data was generated with x1 following a Bernoulli distribution with a success probability of 0.4, x2 following a standard normal distribution, and y following a zero-inflated Beta regression model.
dataB
A data frame which contains x1, x2 and y.
y |
the response variable |
x1 |
binary covariate |
x2 |
continuous covariate |
Simulated data was generated with x1 following a Bernoulli distribution with a success probability of 0.4, x2 following a standard normal distribution, and y following a zero-inflated Gaussian regression model.
dataC
A data frame which contains x1, x2 and y.
y |
the response variable |
x1 |
binary covariate |
x2 |
continuous covariate |
Simulated data was generated where x1 follows a Bernoulli distribution with a success probability of 0.2, x2 follows a standard normal distribution, and y follows a zero-inflated Poisson regression model.
dataD
A data frame which contains x1, x2 and y.
y |
the response variable |
x1 |
binary covariate |
x2 |
continuous covariate |
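As a rough illustration of the data-generating description above, the following base R sketch simulates a zero-inflated Poisson data set with the same covariate distributions. The sample size, the coefficient vectors `alpha` and `beta`, the choice of a logit link for the zero probability, and the object name `sim_zip` are all assumptions made for illustration; the settings actually used to create dataD are not stated here.

```r
# Hypothetical simulation sketch: every numeric setting below is an assumption.
set.seed(1)
n  <- 500
x1 <- rbinom(n, size = 1, prob = 0.2)  # Bernoulli covariate, success probability 0.2
x2 <- rnorm(n)                         # standard normal covariate

alpha <- c(-0.5, 1.0)       # assumed coefficients for the zero probability (logit scale)
beta  <- c(0.5, 0.3, -0.2)  # assumed coefficients for the Poisson mean (log scale)

p_zero <- plogis(alpha[1] + alpha[2] * x1)            # probability of an excess zero
mu     <- exp(beta[1] + beta[2] * x1 + beta[3] * x2)  # Poisson mean

# With probability p_zero the response is an excess zero,
# otherwise it is drawn from Poisson(mu).
excess_zero <- rbinom(n, size = 1, prob = p_zero)
y <- ifelse(excess_zero == 1, 0, rpois(n, mu))

sim_zip <- data.frame(y = y, x1 = x1, x2 = x2)
mean(sim_zip$y == 0)  # observed share of zeros
```

The resulting data frame has the same three columns (y, x1, x2) as the package data sets, so it could be used in their place when experimenting.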
Simulated data was generated with x1 following a Bernoulli distribution with a success probability of 0.4, x2 following a standard normal distribution, and y following a zero-inflated inverse Gaussian regression model.
dataI
A data frame which contains x1, x2 and y.
y |
the response variable |
x1 |
binary covariate |
x2 |
continuous covariate |
Simulated data was generated with x1 following a Bernoulli distribution with a success probability of 0.4, x2 following a standard normal distribution, and y following a zero-inflated exponential regression model.
dataP
A data frame which contains x1, x2 and y.
y |
the response variable |
x1 |
binary covariate |
x2 |
continuous covariate |
Computing a prediction for new observations
Prediction(object, data)
object |
an object inheriting from class ZIHR |
data |
dataset of observed variables with the same format as the data in the object |
It provides a summary of the output of the ZIHR function, including parameter estimations.
Estimation, standard errors and 95% credible intervals for predictions
Taban Baghfalaki [email protected], Mojtaba Ganjali [email protected]
# Example 1: count response (dataD) with a training/validation split
data(dataD)
index <- 1:(dim(dataD)[1])
IND_new <- sample(index, .5 * length(index))
datat <- dataD[IND_new, ]   # training half
datav <- dataD[-IND_new, ]  # validation half
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
D1 <- ZIHR(modelY, modelZ,
  data = datat, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Poisson"
)
SummaryZIHR(D1)
Prediction(D1, data = datav)
D2 <- ZIHR(modelY, modelZ,
  data = datat, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Bell"
)
SummaryZIHR(D2)

# Example 2: semi-continuous response (dataC), Gaussian family
data(dataC)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
C <- ZIHR(modelY, modelZ,
  data = dataC, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Gaussian"
)
SummaryZIHR(C)
# datav is the validation half of dataD created in Example 1
Prediction(C, data = datav)

# Example 3: positive continuous response (dataP)
data(dataP)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
P1 <- ZIHR(modelY, modelZ,
  data = dataP, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Exponential"
)
SummaryZIHR(P1)
P2 <- ZIHR(modelY, modelZ,
  data = dataP, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Gamma"
)
SummaryZIHR(P2)
P3 <- ZIHR(modelY, modelZ,
  data = dataP, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Weibull"
)
SummaryZIHR(P3)

# Example 4: Beta family (dataB)
data(dataB)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
P <- ZIHR(modelY, modelZ,
  data = dataB, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Beta"
)
SummaryZIHR(P)

# Example 5: inverse Gaussian family (dataI)
data(dataI)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
P4 <- ZIHR(modelY, modelZ,
  data = dataI, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "inverse.gaussian"
)
SummaryZIHR(P4)
Computing a summary of the outputs of the ZIHR function
SummaryZIHR(object)
object |
an object inheriting from class ZIHR |
It provides a summary of the output of the ZIHR function, including parameter estimations.
Estimation list of posterior summaries: estimate, standard deviation, lower and upper bounds of the 95% credible interval, and Rhat (when n.chains > 1)
DIC deviance information criterion
LPML Log Pseudo Marginal Likelihood (LPML) criterion
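For readers unfamiliar with the two comparison criteria, the sketch below computes LPML and DIC from a matrix of pointwise log-likelihood values, using their standard textbook definitions: LPML is the sum of log conditional predictive ordinates, and DIC is the posterior mean deviance plus the effective number of parameters. The helper name `lpml_from_loglik`, the toy Poisson model, and the Gamma(1, 1) prior are assumptions for illustration; the package may compute these quantities differently internally (for example, JAGS-based tools often use a variance-based estimate of the effective number of parameters).

```r
# loglik: S x n matrix with loglik[s, i] = log f(y_i | theta^(s)),
# where s indexes posterior draws and i indexes observations.
lpml_from_loglik <- function(loglik) {
  S <- nrow(loglik)
  log_cpo <- apply(loglik, 2, function(l) {
    m <- max(-l)
    # log CPO_i = -log(mean(exp(-l))), evaluated with a log-sum-exp shift
    -(m + log(sum(exp(-l - m))) - log(S))
  })
  sum(log_cpo)
}

# Toy model (assumed): Poisson counts with a Gamma(1, 1) prior on the rate,
# so the posterior can be sampled directly.
set.seed(2)
y <- rpois(30, lambda = 3)
lambda_draws <- rgamma(1000, shape = 1 + sum(y), rate = 1 + length(y))
loglik <- sapply(y, function(yi) dpois(yi, lambda_draws, log = TRUE))  # 1000 x 30

lpml <- lpml_from_loglik(loglik)

# DIC in the form mean deviance + pD, with pD = mean deviance - deviance at the posterior mean
dev_draws <- -2 * rowSums(loglik)
dev_at_mean <- -2 * sum(dpois(y, mean(lambda_draws), log = TRUE))
pD  <- mean(dev_draws) - dev_at_mean
dic <- mean(dev_draws) + pD

c(LPML = lpml, DIC = dic, pD = pD)
```

Larger LPML and smaller DIC indicate a better-supported model, so the two criteria are read in opposite directions when comparing families.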
Taban Baghfalaki [email protected], Mojtaba Ganjali [email protected]
# Example 1: count response (dataD) with a training/validation split
data(dataD)
index <- 1:(dim(dataD)[1])
IND_new <- sample(index, .5 * length(index))
datat <- dataD[IND_new, ]   # training half
datav <- dataD[-IND_new, ]  # validation half
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
D1 <- ZIHR(modelY, modelZ,
  data = datat, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Poisson"
)
SummaryZIHR(D1)
Prediction(D1, data = datav)
D2 <- ZIHR(modelY, modelZ,
  data = datat, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Bell"
)
SummaryZIHR(D2)

# Example 2: semi-continuous response (dataC), Gaussian family
data(dataC)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
C <- ZIHR(modelY, modelZ,
  data = dataC, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Gaussian"
)
SummaryZIHR(C)
# datav is the validation half of dataD created in Example 1
Prediction(C, data = datav)

# Example 3: positive continuous response (dataP)
data(dataP)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
P1 <- ZIHR(modelY, modelZ,
  data = dataP, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Exponential"
)
SummaryZIHR(P1)
P2 <- ZIHR(modelY, modelZ,
  data = dataP, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Gamma"
)
SummaryZIHR(P2)
P3 <- ZIHR(modelY, modelZ,
  data = dataP, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Weibull"
)
SummaryZIHR(P3)

# Example 4: Beta family (dataB)
data(dataB)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
P <- ZIHR(modelY, modelZ,
  data = dataB, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Beta"
)
SummaryZIHR(P)

# Example 5: inverse Gaussian family (dataI)
data(dataI)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
P4 <- ZIHR(modelY, modelZ,
  data = dataI, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "inverse.gaussian"
)
SummaryZIHR(P4)
Run a Gibbs sampler for hurdle models. The package includes the hurdle generalized linear model under Gaussian, exponential, Gamma, Weibull, inverse Gaussian, Poisson, negative binomial, logarithmic, Bell, and binomial distributional assumptions. The package also covers hurdle generalized Poisson models and hurdle Beta regression models. For model comparison, the Deviance Information Criterion (DIC) and the Log Pseudo Marginal Likelihood (LPML) are reported.
Taban Baghfalaki [email protected], Mojtaba Ganjali [email protected], Narayanaswamy Balakrishnan [email protected]
Ganjali, M., Baghfalaki, T. & Balakrishnan, N. (2024). A Unified Bayesian approach for Modeling Zero-Inflated count and continuous outcomes.
Fits zero-inflated hurdle regression models
ZIHR( modelY, modelZ, data, n.chains = n.chains, n.iter = n.iter, n.burnin = n.burnin, n.thin = n.thin, family = "Gaussian" )
modelY |
a formula for the mean of the response (count or semi-continuous, depending on the family). This argument has the same form as the formula argument of the "glm" function. |
modelZ |
a formula for the probability of zero. This argument is identical to the one in the "glm" function. |
data |
data set of observed variables. |
n.chains |
the number of parallel chains for the model; default is 1. |
n.iter |
integer specifying the total number of iterations; default is 1000. |
n.burnin |
integer specifying how many of n.iter to discard as burn-in; default is 5000. |
n.thin |
integer specifying the thinning of the chains; default is 1. |
family |
a character string naming the distribution of the response, one of "Gaussian", "Exponential", "Weibull", "Gamma", "Beta", "inverse.gaussian", "Poisson", "NB", "Logarithmic", "Bell", "GP", and "Binomial". "NB" and "GP" select the hurdle negative binomial and hurdle generalized Poisson models, respectively; each of the other values selects the hurdle model for the distribution it names. |
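To make the interaction of n.iter, n.burnin, n.thin and n.chains concrete, the sketch below counts the draws that remain after burn-in and thinning. It assumes that n.iter is the total chain length including burn-in, as the argument descriptions state, and that every n.thin-th iteration after burn-in is kept; the helper `kept_iterations` is hypothetical and not part of the package.

```r
# Hypothetical helper: indices of the iterations retained from one chain,
# assuming burn-in is discarded first and thinning is applied afterwards.
kept_iterations <- function(n.iter, n.burnin, n.thin) {
  stopifnot(n.burnin < n.iter, n.thin >= 1)
  seq(from = n.burnin + 1, to = n.iter, by = n.thin)
}

# Settings used in the package examples
n.chains <- 2
kept <- kept_iterations(n.iter = 1000, n.burnin = 500, n.thin = 1)
per_chain <- length(kept)
total <- per_chain * n.chains
c(per_chain = per_chain, total = total)
```

Under these assumptions the example settings leave 500 draws per chain, so posterior summaries would be based on 1000 pooled draws. Note that n.burnin must be smaller than n.iter for any draws to remain.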
A function utilizing the 'JAGS' software to estimate the linear hurdle regression model.
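The two formulas correspond to the two parts of a hurdle model: one part for the probability of a zero and one for the non-zero responses. As a minimal sketch for the Poisson case, the function below evaluates a hurdle Poisson log-likelihood in base R. The logit link for the zero probability, the log link for the mean, and the names `hurdle_pois_loglik`, `X`, `Z` and `toy` are assumptions for illustration; the JAGS model used by the package is not reproduced here.

```r
# Hurdle Poisson log-likelihood (illustrative sketch, not the package's code).
#   P(Y = 0)     = p_zero
#   P(Y = k > 0) = (1 - p_zero) * dpois(k, mu) / (1 - exp(-mu))
hurdle_pois_loglik <- function(y, X, Z, beta, alpha) {
  p_zero <- plogis(drop(Z %*% alpha))  # zero probability, logit link (assumed)
  mu     <- exp(drop(X %*% beta))      # untruncated Poisson mean, log link (assumed)
  ll_zero <- log(p_zero)
  ll_pos  <- log1p(-p_zero) + dpois(y, mu, log = TRUE) - log1p(-exp(-mu))
  sum(ifelse(y == 0, ll_zero, ll_pos))
}

# Toy data with the same column names as the package data sets
toy <- data.frame(
  y  = c(0, 0, 3, 1, 0, 2),
  x1 = c(0, 1, 0, 1, 1, 0),
  x2 = c(-0.3, 1.2, 0.4, -1.1, 0.0, 0.7)
)
X <- model.matrix(~ x1 + x2, data = toy)  # design matrix for the mean part
Z <- model.matrix(~ x1, data = toy)       # design matrix for the zero part

ll <- hurdle_pois_loglik(toy$y, X, Z,
  beta = c(0.5, 0.3, -0.2), alpha = c(-0.5, 1)
)
ll
```

Because the positive part uses a zero-truncated distribution, the zero probability is governed entirely by the modelZ part, which is what distinguishes a hurdle model from a mixture-type zero-inflated model.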
MCMC chains for the unknown parameters
Est list of posterior means for each parameter
SD list of standard errors (posterior standard deviations) for each parameter
L_CI list of 2.5th percentiles of the posterior distribution, serving as the lower bounds of the 95% Bayesian credible intervals
U_CI list of 97.5th percentiles of the posterior distribution, serving as the upper bounds of the 95% Bayesian credible intervals
Rhat Gelman and Rubin convergence diagnostic for all parameters
beta the regression coefficients of mean of the hurdle model
alpha the regression coefficients of probability of the hurdle model
The variance, over-dispersion, dispersion, or scale parameters of models depend on the family used
DIC deviance information criterion
LPML Log Pseudo Marginal Likelihood (LPML) criterion
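The summary quantities listed above can be illustrated for a single parameter with a few lines of base R. The sketch assumes the retained draws are arranged as a matrix with one column per chain, and it uses the basic Gelman and Rubin potential scale reduction factor without the degrees-of-freedom correction that some implementations apply. The function name `summarise_chains` and the simulated draws are hypothetical, and the values the package reports may be computed differently.

```r
# draws: matrix of retained MCMC draws for one parameter,
#        rows = iterations, columns = chains (assumed layout).
summarise_chains <- function(draws) {
  n <- nrow(draws)
  m <- ncol(draws)
  pooled <- as.vector(draws)
  out <- c(
    Est  = mean(pooled),                      # posterior mean
    SD   = sd(pooled),                        # posterior standard deviation
    L_CI = unname(quantile(pooled, 0.025)),   # lower bound of the 95% interval
    U_CI = unname(quantile(pooled, 0.975))    # upper bound of the 95% interval
  )
  if (m > 1) {
    W <- mean(apply(draws, 2, var))           # within-chain variance
    B <- n * var(colMeans(draws))             # between-chain variance
    out["Rhat"] <- sqrt(((n - 1) / n * W + B / n) / W)
  }
  out
}

# Simulated stand-in for two well-mixed chains of 500 draws each
set.seed(3)
draws <- matrix(rnorm(1000, mean = 0.3, sd = 0.1), ncol = 2)
s <- summarise_chains(draws)
round(s, 3)
```

Values of Rhat close to 1 suggest that the chains agree with each other; with a single chain the diagnostic is not available, which is why it is reported only when more than one chain is run.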
Taban Baghfalaki [email protected], Mojtaba Ganjali [email protected]
# Example 1: count response (dataD) with a training/validation split
data(dataD)
index <- 1:(dim(dataD)[1])
IND_new <- sample(index, .5 * length(index))
datat <- dataD[IND_new, ]   # training half
datav <- dataD[-IND_new, ]  # validation half
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
D1 <- ZIHR(modelY, modelZ,
  data = datat, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Poisson"
)
SummaryZIHR(D1)
Prediction(D1, data = datav)
D2 <- ZIHR(modelY, modelZ,
  data = datat, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Bell"
)
SummaryZIHR(D2)

# Example 2: semi-continuous response (dataC), Gaussian family
data(dataC)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
C <- ZIHR(modelY, modelZ,
  data = dataC, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Gaussian"
)
SummaryZIHR(C)
# datav is the validation half of dataD created in Example 1
Prediction(C, data = datav)

# Example 3: positive continuous response (dataP)
data(dataP)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
P1 <- ZIHR(modelY, modelZ,
  data = dataP, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Exponential"
)
SummaryZIHR(P1)
P2 <- ZIHR(modelY, modelZ,
  data = dataP, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Gamma"
)
SummaryZIHR(P2)
P3 <- ZIHR(modelY, modelZ,
  data = dataP, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Weibull"
)
SummaryZIHR(P3)

# Example 4: Beta family (dataB)
data(dataB)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
P <- ZIHR(modelY, modelZ,
  data = dataB, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "Beta"
)
SummaryZIHR(P)

# Example 5: inverse Gaussian family (dataI)
data(dataI)
modelY <- y ~ x1 + x2
modelZ <- z ~ x1
P4 <- ZIHR(modelY, modelZ,
  data = dataI, n.chains = 2, n.iter = 1000,
  n.burnin = 500, n.thin = 1, family = "inverse.gaussian"
)
SummaryZIHR(P4)