Statistics stats

Introduction

This section collects various statistical tests and tools. Some can be used independently of any model; others are intended as extensions to the models and model results.

API Warning: The functions and objects in this category are spread out in various modules and might still be moved around.

Residual Diagnostics and Specification Tests

durbin_watson(resids) Calculates the Durbin-Watson statistic
jarque_bera(resids) Calculate residual skewness, kurtosis, and do the JB test for normality
omni_normtest(resids[, axis]) Omnibus test for normality
acorr_ljungbox(x[, lags, boxpierce]) Ljung-Box test for no autocorrelation
acorr_lm(x[, maxlag, autolag, store]) Lagrange Multiplier tests for autocorrelation
breaks_cusumolsresid(olsresidual) cusum test for parameter stability based on ols residuals
breaks_hansen(olsresults) test for model stability, breaks in parameters for ols, Hansen 1992
CompareCox Cox Test for non-nested models
CompareJ J-Test for comparing non-nested models
compare_cox Cox Test for non-nested models
compare_j J-Test for comparing non-nested models
het_breushpagan(resid, x[, exog]) Lagrange Multiplier Heteroscedasticity Test by Breusch-Pagan
HetGoldfeldQuandt test whether variance is the same in 2 subsamples
het_goldfeldquandt see class docstring
het_goldfeldquandt2(y, x, idx[, split, retres]) test whether variance is the same in 2 subsamples
het_white(y, x[, retres]) Lagrange Multiplier Heteroscedasticity Test by White
unitroot_adf(x[, maxlag, trendorder, ...])
neweywestcov(resid, x) Did not run yet
recursive_olsresiduals(olsresults[, skip, ...]) calculate recursive ols with residuals and cusum test statistic
recursive_olsresiduals2(olsresults, skip) this is my original version based on Greene and references

See also the notes on regression diagnostics.
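
To make the first entry in the list above concrete, here is a minimal sketch of the textbook Durbin-Watson formula: the sum of squared differences between consecutive residuals, divided by the sum of squared residuals. The function name durbin_watson_sketch is hypothetical and uses only plain Python; it is not the library's implementation, and it assumes a one-dimensional sequence of residuals.

```python
def durbin_watson_sketch(resids):
    """Textbook Durbin-Watson statistic for a 1-D sequence of residuals.

    Hypothetical helper for illustration only, not the library function.
    """
    resids = [float(r) for r in resids]
    # Numerator: squared differences between consecutive residuals.
    num = sum((b - a) ** 2 for a, b in zip(resids, resids[1:]))
    # Denominator: sum of squared residuals.
    den = sum(r * r for r in resids)
    return num / den


# Perfectly alternating residuals: strong negative autocorrelation.
dw_alternating = durbin_watson_sketch([1.0, -1.0, 1.0, -1.0])
# Constant residuals: strong positive autocorrelation.
dw_constant = durbin_watson_sketch([1.0, 1.0, 1.0, 1.0])
print(dw_alternating, dw_constant)
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.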

Goodness of Fit Tests and Measures

Some tests for goodness of fit for univariate distributions:
powerdiscrepancy(o, e[, lambd, axis, ddof]) Calculates power discrepancy, a class of goodness-of-fit tests as a measure of discrepancy between observed and expected data.
gof_chisquare_discrete(distfn, arg, rvs, ...) perform chisquare test for random sample of a discrete distribution
gof_binning_discrete(rvs, distfn, arg[, nsupp]) get bins for chisquare type gof tests for a discrete distribution
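
As an illustration of what a power discrepancy measures, the sketch below implements the Cressie-Read power divergence family in plain Python. The function name power_discrepancy_sketch is hypothetical, and the sketch assumes strictly positive observed and expected counts with equal totals; it is not the library's code and ignores the axis and ddof options listed above.

```python
import math


def power_discrepancy_sketch(observed, expected, lambd=1.0):
    """Cressie-Read power divergence between observed and expected counts.

    Hypothetical helper for illustration. Assumes all counts are > 0 and
    that observed and expected have the same total.
    """
    pairs = list(zip(observed, expected))
    if abs(lambd) < 1e-12:
        # Limit lambd -> 0: likelihood ratio (G) statistic.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs)
    if abs(lambd + 1.0) < 1e-12:
        # Limit lambd -> -1: modified likelihood ratio statistic.
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    scale = 2.0 / (lambd * (lambd + 1.0))
    return scale * sum(o * ((o / e) ** lambd - 1.0) for o, e in pairs)


observed = [10, 20, 30]
expected = [20, 20, 20]

# lambd = 1 reproduces Pearson's chi-square when the totals agree.
pd_pearson = power_discrepancy_sketch(observed, expected, lambd=1.0)
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# lambd = 0 gives the likelihood ratio statistic.
pd_lr = power_discrepancy_sketch(observed, expected, lambd=0.0)
print(pd_pearson, chi2, pd_lr)
```

Different values of lambd select different members of the same family, which is why a single function can cover several classical goodness-of-fit statistics.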

Non-Parametric Tests

mcnemar(x, y[, exact, correction]) McNemar test
median_test_ksample(x, groups) chisquare test for equality of median/location
runstest_1samp(x[, cutoff]) use runs test on binary discretized data above/below cutoff
runstest_2samp(x[, y, groups]) Wald-Wolfowitz runstest for two samples
cochran_q(x) Cochran’s Q test for identical effect of k treatments
Runs(x) class for runs in a binary sequence
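
The idea behind the McNemar test can be sketched in a few lines: only the discordant pairs matter, and under the null hypothesis each discordant pair is equally likely to fall in either direction. The following plain-Python sketch of the exact (binomial) variant uses a hypothetical function name, mcnemar_exact_sketch, and assumes two paired sequences of 0/1 outcomes; it does not reproduce the library's handling of the exact and correction options.

```python
from math import comb


def mcnemar_exact_sketch(x, y):
    """Exact McNemar test for paired binary samples.

    Hypothetical helper for illustration. Returns the smaller discordant
    count and a two-sided binomial p-value (capped at 1).
    """
    # Count pairs where the two measurements disagree.
    n1 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n2 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n = n1 + n2
    stat = min(n1, n2)
    # P(K <= stat) for K ~ Binomial(n, 0.5), doubled for a two-sided test.
    cdf = sum(comb(n, k) for k in range(stat + 1)) * 0.5 ** n
    return stat, min(1.0, 2.0 * cdf)


# One pair switches 1 -> 0, five pairs switch 0 -> 1, two pairs agree.
x = [1, 0, 0, 0, 0, 0, 1, 1]
y = [0, 1, 1, 1, 1, 1, 1, 1]
stat, pvalue = mcnemar_exact_sketch(x, y)
print(stat, pvalue)
```

Concordant pairs (both 0 or both 1) do not enter the statistic at all, so adding more of them leaves the result unchanged.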

Multiple Tests and Multiple Comparison Procedures

multipletests is a function for p-value correction; it also includes the p-value correction based on the false discovery rate (fdr) implemented in fdrcorrection. tukeyhsd performs simultaneous testing for the comparison of (independent) means. These three functions are verified. GroupsStats and MultiComparison are convenience classes for multiple comparisons, similar to one-way ANOVA, but are still in development.

multipletests(pvals[, alpha, method, ...]) test results and p-value correction for multiple tests
fdrcorrection0(pvals[, alpha, method]) pvalue correction for false discovery rate
tukeyhsd(mean_all, nobs_all, var_all[, df, ...]) simultaneous Tukey HSD
GroupsStats(x[, useranks, uni, intlab]) statistics by groups (another version)
MultiComparison(x, groups) Tests for multiple comparisons
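
To show what a false discovery rate correction does to a set of p-values, here is a plain-Python sketch of the Benjamini-Hochberg step-up procedure. The function name fdr_bh_sketch is hypothetical, and the sketch assumes independent (or positively dependent) tests; it is one common fdr method, not necessarily the default or the exact code path of the functions listed above.

```python
def fdr_bh_sketch(pvals, alpha=0.05):
    """Benjamini-Hochberg adjusted p-values and rejection decisions.

    Hypothetical helper for illustration only.
    """
    m = len(pvals)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    reject = [p <= alpha for p in adjusted]
    return reject, adjusted


pvals = [0.01, 0.04, 0.03, 0.5]
reject, adjusted = fdr_bh_sketch(pvals, alpha=0.05)
print(reject)
print(adjusted)
```

In this example only the smallest p-value survives the correction at alpha = 0.05, even though three of the raw p-values are below 0.05.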

The following functions are not (yet) public (here for my own benefit, JP)

varcorrection_pairs_unbalanced(nobs_all[, ...]) correction factor for variance with unequal sample sizes for all pairs
varcorrection_pairs_unequal(var_all, ...) return joint variance from samples with unequal variances and unequal
varcorrection_unbalanced(nobs_all[, srange]) correction factor for variance with unequal sample sizes
varcorrection_unequal(var_all, nobs_all, df_all) return joint variance from samples with unequal variances and unequal
StepDown(vals, nobs_all, var_all[, df]) a class for step down methods
catstack(args)
ccols
compare_ordered(vals, alpha) simple ordered sequential comparison of means
distance_st_range(mean_all, nobs_all, var_all) pairwise distance matrix, outsourced from tukeyhsd
ecdf(x) no frills empirical cdf used in fdrcorrection
get_tukeyQcrit(k, df[, alpha]) return critical values for Tukey’s HSD (Q)
homogeneous_subsets(vals, dcrit) recursively check all pairs of vals for minimum distance
line str(object) -> string
maxzero(x) find all up zero crossings and return the index of the highest
maxzerodown(x) find all down zero crossings and return the index of the highest
mcfdr([nrepl, nobs, ntests, ntrue, mu, ...]) MonteCarlo to test fdrcorrection
qcrit str(object) -> string
randmvn(rho[, size, standardize]) create random draws from equi-correlated multivariate normal distribution
rankdata(x) rankdata, equivalent to scipy.stats.rankdata
rejectionline(n[, alpha]) reference line for rejection in multiple tests
set_partition(ssli) extract a partition from a list of tuples
set_remove_subs(ssli) remove sets that are subsets of another set from a list of tuples
tiecorrect(xranks) should be equivalent of scipy.stats.tiecorrect

Basic Statistics and t-Tests with frequency weights

CompareMeans(d1, d2) temporary just to hold formulas
DescrStatsW(data[, weights, ddof]) descriptive statistics with weights for simple case
tstat_generic(value, value2, std_diff, dof, ...) generic ttest to save typing
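
Frequency weights can be read as "this observation occurs w times". The sketch below computes a weighted mean and variance under that reading and checks them against the same statistics on the expanded data. The function name weighted_mean_var_sketch is hypothetical, and the sketch assumes non-negative weights whose sum exceeds ddof; it is not the DescrStatsW implementation.

```python
import statistics


def weighted_mean_var_sketch(data, weights, ddof=0):
    """Mean and variance of data under frequency weights.

    Hypothetical helper for illustration only.
    """
    total = float(sum(weights))
    mean = sum(w * x for x, w in zip(data, weights)) / total
    # Weighted sum of squared deviations from the weighted mean.
    ss = sum(w * (x - mean) ** 2 for x, w in zip(data, weights))
    return mean, ss / (total - ddof)


data = [1.0, 2.0, 3.0]
weights = [1, 2, 1]
# With integer weights, the same data written out observation by observation.
expanded = [1.0, 2.0, 2.0, 3.0]

w_mean, w_var0 = weighted_mean_var_sketch(data, weights, ddof=0)
_, w_var1 = weighted_mean_var_sketch(data, weights, ddof=1)
print(w_mean, w_var0, w_var1)
print(statistics.mean(expanded), statistics.pvariance(expanded),
      statistics.variance(expanded))
```

Here ddof plays the same role as in the unweighted case: it is subtracted from the total weight, not from the number of distinct rows.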

Moment Helpers

These are utility functions to convert between central and non-central moments, skew, kurtosis and cumulants.

cum2mc(_kappa) convert cumulants to central moments
mc2mnc(_mc) convert central to non-central moments, uses recursive formula
mc2mvsk(args) convert central moments to mean, variance, skew, kurtosis
mnc2cum(_mnc) convert non-central moments to cumulants
mnc2mc(args) convert four non-central moments to central moments
mnc2mvsk(args) convert non-central moments to mean, variance, skew, kurtosis
mvsk2mc(args) convert mean, variance, skew, kurtosis to central moments
mvsk2mnc(args) convert mean, variance, skew, kurtosis to non-central moments
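
The conversions above follow standard identities. As a sketch, the plain-Python code below goes from mean, variance, skew and kurtosis to central moments and then to non-central moments for the first four moments. The function names end in _sketch to mark them as hypothetical, and the code assumes that kurtosis means excess kurtosis (0 for a normal distribution); check the docstrings of the actual helpers before relying on that convention.

```python
def mvsk2mc_sketch(mean, var, skew, kurt):
    """Mean, variance, skew, excess kurtosis -> first four central moments.

    Hypothetical helper. The first entry carries the mean, since the first
    central moment itself is always zero.
    """
    mc3 = skew * var ** 1.5
    mc4 = (kurt + 3.0) * var ** 2  # assumes kurt is excess kurtosis
    return mean, var, mc3, mc4


def mc2mnc_sketch(mean, mc2, mc3, mc4):
    """Central moments (with the mean first) -> non-central moments."""
    mnc2 = mc2 + mean ** 2
    mnc3 = mc3 + 3.0 * mean * mc2 + mean ** 3
    mnc4 = mc4 + 4.0 * mean * mc3 + 6.0 * mean ** 2 * mc2 + mean ** 4
    return mean, mnc2, mnc3, mnc4


# Standard normal: E[X] = 0, E[X^2] = 1, E[X^3] = 0, E[X^4] = 3.
mnc_std = mc2mnc_sketch(*mvsk2mc_sketch(0.0, 1.0, 0.0, 0.0))
# Normal with mean 1 and variance 1.
mnc_shifted = mc2mnc_sketch(*mvsk2mc_sketch(1.0, 1.0, 0.0, 0.0))
print(mnc_std, mnc_shifted)
```

The second and higher non-central moments are binomial expansions of E[(X - mean + mean)^k], which is where the coefficients 3, 4 and 6 come from.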