
scikits.statsmodels.sandbox.stats.runs.runstest_2samp

scikits.statsmodels.sandbox.stats.runs.runstest_2samp(x, y=None, groups=None)

Wald-Wolfowitz runs test for two samples

This tests whether two samples come from the same distribution.
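A minimal sketch of the test, assuming continuous data without ties: pool both samples, sort them, count the runs of group labels, and standardize the number of runs with its mean and variance under the null hypothesis. The helper names `count_runs` and `wald_wolfowitz` are hypothetical, no continuity correction is applied, and the library's own computation may differ in such details.

```python
import math


def count_runs(labels):
    """Number of maximal blocks of identical consecutive labels."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def wald_wolfowitz(x, y):
    """Two-sided Wald-Wolfowitz runs test via the normal approximation.

    Sketch only: assumes continuous data (no ties) and applies no
    continuity correction.
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    labels = [g for _, g in pooled]
    r = count_runs(labels)

    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Mean and variance of the number of runs under the null hypothesis.
    mean_r = 2.0 * n1 * n2 / n + 1.0
    var_r = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1.0))
    z = (r - mean_r) / math.sqrt(var_r)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Two completely separated samples such as `x = [1, 2, 3, 4, 5]` and `y = [6, 7, 8, 9, 10]` produce only two runs and a negative `z`, while perfectly interleaved samples produce the maximum number of runs and a positive `z`.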

Parameters :

x : array_like

Numeric data. Contains one group if y is also given, or both groups if a group indicator (groups) is provided instead.

y : array_like (optional)

Numeric data for the second group; x then holds the first group.

groups : array_like

Group labels or indicator, used when the data for both groups is given in a single 1-dimensional array, x. If group labels are not [0,1], then

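A minimal sketch of how the two calling conventions above can be reduced to one pooled array plus 0/1 labels. The helper `pool_samples` is hypothetical, and mapping the smaller of two distinct labels to 0 and the larger to 1 is an assumption, not taken from the library.

```python
def pool_samples(x, y=None, groups=None):
    """Hypothetical helper: return (values, labels) with labels in {0, 1}.

    - x and y given: x is the first group, y the second.
    - x and groups given: x holds both groups, groups labels each value.
    """
    if y is not None:
        values = list(x) + list(y)
        labels = [0] * len(x) + [1] * len(y)
    elif groups is not None:
        if len(groups) != len(x):
            raise ValueError("x and groups must have the same length")
        uniq = sorted(set(groups))
        if len(uniq) != 2:
            raise ValueError("groups must contain exactly two distinct labels")
        # Assumption: the smaller label becomes 0, the larger becomes 1.
        values = list(x)
        labels = [uniq.index(g) for g in groups]
    else:
        raise ValueError("provide either y or groups")
    return values, labels
```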

Returns :

z_stat : float

test statistic, asymptotically normally distributed

p-value : float

p-value; reject the null hypothesis if it is below the chosen type I error level, alpha.
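Because z_stat is asymptotically standard normal, the two-sided p-value and the decision rule can be sketched as follows. The function names and the default `alpha=0.05` are illustrative assumptions.

```python
import math


def two_sided_pvalue(z):
    """Two-sided p-value for a standard normal test statistic."""
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


def reject_null(p_value, alpha=0.05):
    """Reject H0 (same distribution) when p is below the type I error level."""
    return p_value < alpha
```

For example, `z = 2.5` gives a p-value below 0.05, so `reject_null(two_sided_pvalue(2.5))` is `True`.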

See also

runstest_1samp, Runs, RunsProb

Notes

Wald-Wolfowitz runs test.

If there are ties, the reported test statistic and p-value are those with the higher p-value of two orderings: in each, the tied observations of one group are sorted together after those of the other group, once for each group.

This test is intended for continuous distributions. SAS has a treatment for ties, but it is unclear and sounds more complicated (the minimum and maximum possible number of runs prevent the use of argsort). It may not be so difficult, though. The approach used here: add a small positive noise to the first group, run the test, then add it to the other group instead, run the test again, and take the maximum p-value. This does not give the minimum and maximum number of runs; it is close to the minimum but far from the maximum, because the maximum number of runs would use alternating groups within the ties. Adding random noise might be the better approach.
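To see why keeping a group's tied observations together gives close to the minimum number of runs, compare the two extreme arrangements of a single tied block: grouped by label versus alternating. The helper names are hypothetical, and the counts cover the tied block on its own, ignoring its neighbors.

```python
def count_runs(labels):
    """Number of maximal blocks of identical consecutive labels."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def alternating_labels(n0, n1):
    """Interleave as much as possible, starting with the more frequent label."""
    major, minor = (0, 1) if n0 >= n1 else (1, 0)
    n_major, n_minor = max(n0, n1), min(n0, n1)
    out = []
    for i in range(n_major):
        out.append(major)
        if i < n_minor:
            out.append(minor)
    return out


def tie_runs_range(n0, n1):
    """(grouped, alternating) run counts for one tied block of n0 + n1 labels."""
    grouped = [0] * n0 + [1] * n1
    return count_runs(grouped), count_runs(alternating_labels(n0, n1))
```

A tied block with three observations from each group yields `(2, 6)`: two runs when grouped, six when alternating.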

SAS provides the exact distribution for sample sizes <= 30; it doesn't look standard, but it should be easy to add.
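One way an exact small-sample version could be sketched is with the classical combinatorial distribution of the number of runs R, under which every arrangement of n1 zeros and n2 ones is equally likely. This is not SAS's implementation and is not part of this function; the function names and the two-sided convention (summing the probabilities of outcomes no more likely than the observed one) are assumptions.

```python
import math


def runs_pmf(n1, n2):
    """Exact null distribution {r: P(R = r)} of the number of runs (n1, n2 >= 1)."""
    total = math.comb(n1 + n2, n1)
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            # R = 2k: k runs of each group, starting with either group.
            ways = 2 * math.comb(n1 - 1, k - 1) * math.comb(n2 - 1, k - 1)
        else:
            # R = 2k + 1: one group has k + 1 runs, the other has k.
            ways = (math.comb(n1 - 1, k - 1) * math.comb(n2 - 1, k)
                    + math.comb(n1 - 1, k) * math.comb(n2 - 1, k - 1))
        if ways:
            pmf[r] = ways / total
    return pmf


def exact_two_sided_p(r_obs, n1, n2):
    """Assumed convention: sum P(R = r) over outcomes no more likely than r_obs."""
    pmf = runs_pmf(n1, n2)
    p_obs = pmf.get(r_obs, 0.0)
    return min(1.0, sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9)))
```

For n1 = n2 = 2 the six possible arrangements give 2, 3 and 4 runs twice each, so each value has probability 1/3.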

Currently only a two-sided test is available.
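For reference, one-sided alternatives could be sketched from the same z statistic. The helper below is hypothetical and not part of this function. With the Wald-Wolfowitz test, differences between the distributions typically show up as too few runs, which corresponds to the lower tail.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def runs_pvalues(z):
    """Hypothetical helper: p-values for each alternative, given the runs z statistic."""
    return {
        # Too few runs (samples clustered apart): lower tail.
        "less": normal_cdf(z),
        # Too many runs (excess alternation): upper tail, computed without cancellation.
        "greater": 0.5 * math.erfc(z / math.sqrt(2.0)),
        "two-sided": math.erfc(abs(z) / math.sqrt(2.0)),
    }
```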
