Create a mosaic plot from a contingency table.
It allows visualizing multivariate categorical data in a rigorous and informative way.
Parameters:
    data : dict, pandas.Series, np.ndarray, pandas.DataFrame
    index : list, optional
    ax : matplotlib.Axes, optional
    horizontal : bool, optional (default True)
    gap : float or array of floats
    labelizer : function (key) -> string, optional
    properties : function (key) -> dict, optional
    statistic : bool, optional (default False)
    title : string, optional
    axes_label : boolean, optional
    label_rotation : float or list of float
Returns:
    fig : matplotlib.Figure
    rects : dict
See also
Examples
The simplest use case is to pass a dictionary and plot the result:
>>> from matplotlib import pylab
>>> from statsmodels.graphics.mosaicplot import mosaic
>>> data = {'a': 10, 'b': 15, 'c': 16}
>>> mosaic(data, title='basic dictionary')
>>> pylab.show()
A more useful example is a dictionary with multiple indices. In this case we use a wider gap to get better visual separation in the resulting plot:
>>> data = {('a', 'b'): 1, ('a', 'c'): 2, ('d', 'b'): 3, ('d', 'c'): 4}
>>> mosaic(data, gap=0.05, title='complete dictionary')
>>> pylab.show()
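To make the geometry concrete, the following is a minimal sketch of how a two-level mosaic could divide the unit square, ignoring gaps. It assumes the first key level splits the width in proportion to its marginal totals and the second level splits each column's height in proportion to the counts within that column. The helper name `tile_rects` is hypothetical and this is not the library's implementation; it only illustrates the proportions behind the plot.

```python
def tile_rects(data):
    """Return {key: (x, y, width, height)} for two-level tuple keys.

    Hypothetical helper for illustration only; gaps are ignored.
    """
    total = float(sum(data.values()))

    # Marginal totals of the first key level, in order of first appearance.
    marginals = {}
    for (first, _second), count in data.items():
        marginals[first] = marginals.get(first, 0) + count

    rects = {}
    x = 0.0
    for first, marginal in marginals.items():
        width = marginal / total  # column width from the marginal share
        y = 0.0
        for (f, second), count in data.items():
            if f != first:
                continue
            height = count / float(marginal)  # share within this column
            rects[(f, second)] = (x, y, width, height)
            y += height
        x += width
    return rects


data = {('a', 'b'): 1, ('a', 'c'): 2, ('d', 'b'): 3, ('d', 'c'): 4}
rects = tile_rects(data)
for key, (x, y, w, h) in rects.items():
    print(key, round(x, 3), round(y, 3), round(w, 3), round(h, 3))
```

Under these assumptions the area of each tile equals its share of the total count, which is the property that makes a mosaic plot readable as a contingency table.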
The same kind of data can be given as a Series with a simple or hierarchical index:
>>> import numpy as np
>>> import pandas as pd
>>> from itertools import product
>>> rand = np.random.random
>>> tuples = list(product(['bar', 'baz', 'foo', 'qux'], ['one', 'two']))
>>> index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
>>> data = pd.Series(rand(8), index=index)
>>> mosaic(data, title='hierarchical index series')
>>> pylab.show()
The third accepted data structure is the NumPy array, for which a very simple index will be created:
>>> rand = np.random.random
>>> data = 1+rand((2,2))
>>> mosaic(data, title='random non-labeled array')
>>> pylab.show()
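As an illustration of what such a simple index might look like, the sketch below enumerates the positions of a 2-D nested list and uses each position as the key tuple. The helper `array_to_dict` is hypothetical, and labeling positions with their string forms is an assumption rather than documented behavior; the point is only that each cell gets a key built from its coordinates.

```python
from itertools import product


def array_to_dict(rows):
    """Map a rectangular 2-D nested list to {(str(i), str(j)): value}.

    Hypothetical helper; string position labels are an assumption.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    return {
        (str(i), str(j)): rows[i][j]
        for i, j in product(range(n_rows), range(n_cols))
    }


# Illustrative stand-in values for 1 + rand((2, 2)).
values = [[1.2, 1.7], [1.4, 1.9]]
keyed = array_to_dict(values)
print(keyed)
```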
If you need to modify the labeling and the coloring, you can pass one function that creates the labels and another that returns the graphical properties, both taking the key tuple as input:
>>> data = {'a': 10, 'b': 15, 'c': 16}
>>> props = lambda key: {'color': 'r' if 'a' in key else 'gray'}
>>> labelizer = lambda k: {('a',): 'first', ('b',): 'second', ('c',): 'third'}[k]
>>> mosaic(data, title='colored dictionary', properties=props, labelizer=labelizer)
>>> pylab.show()
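Because both callbacks receive the key tuple, they can be written and checked as ordinary functions before being handed to `mosaic`. The sketch below restates the lambdas above as named functions (the names `tile_props`, `tile_label` and `LABELS` are illustrative) and uses `dict.get` so that an unexpected key falls back to a default label instead of raising `KeyError`.

```python
def tile_props(key):
    """Color the tile red when the key contains 'a', gray otherwise."""
    return {'color': 'r' if 'a' in key else 'gray'}


LABELS = {('a',): 'first', ('b',): 'second', ('c',): 'third'}


def tile_label(key):
    """Look up a label; fall back to joining the key's levels."""
    return LABELS.get(key, ' '.join(key))


# Inspect what each callback returns for a few key tuples.
for key in [('a',), ('b',), ('c',), ('z',)]:
    print(key, tile_label(key), tile_props(key))
```

These would then be passed as `mosaic(data, properties=tile_props, labelizer=tile_label)`.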
Using a DataFrame as the source, specifying the names of the columns of interest:
>>> import pandas
>>> gender = ['male', 'male', 'male', 'female', 'female', 'female']
>>> pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
>>> data = pandas.DataFrame({'gender': gender, 'pet': pet})
>>> mosaic(data, ['pet', 'gender'])
>>> pylab.show()
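With a DataFrame, each row is one observation rather than a precomputed count. A minimal sketch of the tally this implies, assuming the tiles are sized by the number of rows for each combination of the selected columns (an assumption about the library's behavior, not taken from its source), needs only the standard library:

```python
from collections import Counter

gender = ['male', 'male', 'male', 'female', 'female', 'female']
pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']

# Count how often each (pet, gender) pair occurs, using the same column
# order as the index ['pet', 'gender'] passed to mosaic above.
counts = Counter(zip(pet, gender))
for key, n in sorted(counts.items()):
    print(key, n)
```

The resulting mapping has the same tuple-keyed shape as the dictionaries in the earlier examples.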