Estimation of marginal regression models using Generalized Estimating Equations (GEE).
Marginal regression model fit using Generalized Estimating Equations.
GEE can be used to fit Generalized Linear Models (GLMs) when the data have a grouped structure, and the observations are possibly correlated within groups but not between groups.
Parameters
endog : array-like
exog : array-like
groups : array-like
time : array-like
family : family class instance
cov_struct : CovStruct class instance
offset : array-like
dep_data : array-like
constraint : (ndarray, ndarray)
update_dep : bool
weights : array-like
missing : str
Notes
Only the following combinations make sense for family and link:
             | ident log logit probit cloglog pow opow nbinom loglog logc
Gaussian     |   x    x                        x
inv Gaussian |   x    x                        x
binomial     |   x    x    x     x       x     x   x            x     x
Poisson      |   x    x                        x
neg binomial |   x    x                        x         x
gamma        |   x    x                        x
Not all of these link functions are currently available.
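The family and link table can be encoded as a small lookup to check a choice before fitting. This is an illustrative sketch only: `SENSIBLE_LINKS` and `is_sensible` are hypothetical names that are not part of statsmodels, and the entries assume a particular column alignment for the x marks in the table.

```python
# Hypothetical lookup (not a statsmodels object). Link names are copied from
# the table header; which columns are marked is an assumed reading of the table.
SENSIBLE_LINKS = {
    "Gaussian": {"ident", "log", "pow"},
    "inv Gaussian": {"ident", "log", "pow"},
    "binomial": {"ident", "log", "logit", "probit", "cloglog",
                 "pow", "opow", "loglog", "logc"},
    "Poisson": {"ident", "log", "pow"},
    "neg binomial": {"ident", "log", "pow", "nbinom"},
    "gamma": {"ident", "log", "pow"},
}


def is_sensible(family, link):
    """Return True if the (family, link) pair is marked in the lookup."""
    return link in SENSIBLE_LINKS.get(family, set())


print(is_sensible("Poisson", "log"))    # True
print(is_sensible("Poisson", "logit"))  # False
```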
endog and exog are stored as references: if the data passed in are already arrays and those arrays are later modified, endog and exog change with them.
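The reference behaviour can be illustrated with NumPy alone. The sketch below does not call statsmodels; it assumes the model holds its inputs the way `np.asarray` does, which returns the original object when the input is already an array.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# np.asarray returns the same object when the input is already an ndarray,
# so no copy is made.
held = np.asarray(x)
print(held is x)   # True

# An in-place change to the caller's array is visible through the reference.
x[0] = 99.0
print(held[0])     # 99.0

# Passing an explicit copy decouples the two.
safe = np.array(x, copy=True)
x[1] = -1.0
print(safe[1])     # 2.0
```

Passing a copy of the data is one way to protect a model from later in-place changes.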
The “robust” covariance type is the standard “sandwich estimator” (e.g. Liang and Zeger (1986)). It is the default here and in most other packages. The “naive” estimator gives smaller standard errors, but is only correct if the working correlation structure is correctly specified. The “bias reduced” estimator of Mancl and DeRouen (Biometrics, 2001) reduces the downward bias of the robust estimator.
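To make the difference between the naive and robust estimators concrete, here is a minimal NumPy sketch for the simplest case: Gaussian family, identity link, and independence working correlation. The data are synthetic and all variable names are illustrative. It is not the statsmodels implementation, and it does not cover the bias-reduced estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 50 groups of 4 observations with a shared
# group effect, which induces within-group correlation.
n_groups, group_size = 50, 4
groups = np.repeat(np.arange(n_groups), group_size)
x = rng.normal(size=n_groups * group_size)
exog = np.column_stack([np.ones_like(x), x])
group_effect = rng.normal(size=n_groups)[groups]
endog = 1.0 + 0.5 * x + group_effect + rng.normal(size=x.size)

# With independence working correlation and identity link, the GEE point
# estimate coincides with ordinary least squares.
bread = np.linalg.inv(exog.T @ exog)
params = bread @ exog.T @ endog
resid = endog - exog @ params

# Naive covariance: trusts the working (independence) structure.
scale = resid @ resid / (endog.size - exog.shape[1])
cov_naive = scale * bread

# Robust covariance: the "meat" sums outer products of per-group scores.
meat = np.zeros((2, 2))
for g in range(n_groups):
    idx = groups == g
    score = exog[idx].T @ resid[idx]
    meat += np.outer(score, score)
cov_robust = bread @ meat @ bread

print(np.sqrt(np.diag(cov_naive)))   # naive standard errors
print(np.sqrt(np.diag(cov_robust)))  # robust (sandwich) standard errors
```

Because the working structure ignores the shared group effect, the two sets of standard errors generally differ in this setting.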
The robust covariance provided here follows Liang and Zeger (1986) and agrees with R’s gee implementation. To obtain the robust standard errors reported in Stata, multiply by sqrt(N / (N - g)), where N is the total sample size, and g is the average group size.
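The scaling factor can be computed directly from the total sample size and the number of groups. In this sketch `stata_scale_factor` is a hypothetical helper name, and g is taken to be the average group size exactly as stated in the note.

```python
import math


def stata_scale_factor(n_total, n_groups):
    """Return sqrt(N / (N - g)), with g the average group size."""
    g = n_total / n_groups
    return math.sqrt(n_total / (n_total - g))


factor = stata_scale_factor(n_total=100, n_groups=20)  # g = 5
print(round(factor, 4))  # 1.026
```

The factor would then multiply the robust standard errors of a fitted result, for example its `bse` values.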
Examples
Logistic regression with autoregressive working dependence:
>>> import statsmodels.api as sm
>>> family = sm.families.Binomial()
>>> va = sm.cov_struct.Autoregressive()
>>> model = sm.GEE(endog, exog, group, family=family, cov_struct=va)
>>> result = model.fit()
>>> print(result.summary())
Use formulas to fit a Poisson GLM with independent working dependence:
>>> import statsmodels.api as sm
>>> fam = sm.families.Poisson()
>>> ind = sm.cov_struct.Independence()
>>> model = sm.GEE.from_formula("y ~ age + trt + base", "subject", data, cov_struct=ind, family=fam)
>>> result = model.fit()
>>> print(result.summary())
Equivalent, using the formula API:
>>> import statsmodels.api as sm
>>> import statsmodels.formula.api as smf
>>> fam = sm.families.Poisson()
>>> ind = sm.cov_struct.Independence()
>>> model = smf.gee("y ~ age + trt + base", "subject", data, cov_struct=ind, family=fam)
>>> result = model.fit()
>>> print(result.summary())
Methods
cluster_list(array) | Returns array split into subarrays corresponding to the cluster structure. |
estimate_scale() | Returns an estimate of the scale parameter at the current parameter value. |
fit([maxiter, ctol, start_params, ...]) | Fits a marginal regression model using generalized estimating equations (GEE). |
from_formula(formula, groups, data[, ...]) | Create a model from a formula and a data frame. |
mean_deriv(exog, lin_pred) | Derivative of the expected endog with respect to the parameters. |
mean_deriv_exog(exog, params[, offset_exposure]) | Derivative of the expected endog with respect to exog. |
predict(params[, exog, offset, exposure, linear]) | Return predicted values for a marginal regression model fit using GEE. |
update_cached_means(mean_params) | cached_means should always contain the most recent calculation of the group-wise mean vectors. |
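As an illustration of what cluster_list describes, the sketch below splits an array into per-group subarrays with NumPy. `split_by_group` is a hypothetical stand-in that takes the group labels explicitly; it is not the statsmodels implementation.

```python
import numpy as np


def split_by_group(array, groups):
    """Split `array` into subarrays, one per group label, in order of first appearance."""
    array = np.asarray(array)
    groups = np.asarray(groups)
    labels, first_index = np.unique(groups, return_index=True)
    ordered = labels[np.argsort(first_index)]
    return [array[groups == label] for label in ordered]


values = np.array([10, 11, 20, 21, 22, 30])
groups = np.array(["a", "a", "b", "b", "b", "c"])
for chunk in split_by_group(values, groups):
    print(chunk)
# [10 11]
# [20 21 22]
# [30]
```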
Attributes
cached_means | |
endog_names | Names of endogenous variables |
exog_names | Names of exogenous variables |