Find the nearest correlation matrix with factor structure to a given square matrix.
Parameters: | corr : square array
rank : positive integer
ctol : positive real
lam_min : float
lam_max : float
maxiter : integer
|
---|---|
Returns: | rslt : Bunch
|
Notes
A correlation matrix has factor structure if it can be written in the form I + XX’ - diag(XX’), where X is n x k with linearly independent columns, and with each row having sum of squares at most equal to 1. The approximation is made in terms of the Frobenius norm.
This routine is useful when one has an approximate correlation matrix that is not positive semidefinite, and there is need to estimate the inverse, square root, or inverse square root of the population correlation matrix. The factor structure allows these tasks to be done without constructing any n x n matrices.
This is a non-convex problem with no known gauranteed globally convergent algorithm for computing the solution. Borsdof, Higham and Raydan (2010) compared several methods for this problem and found the spectral projected gradient (SPG) method (used here) to perform best.
The input matrix corr can be a dense numpy array or any scipy sparse matrix. The latter is useful if the input matrix is obtained by thresholding a very large sample correlation matrix. If corr is sparse, the calculations are optimized to save memory, so no working matrix with more than 10^6 elements is constructed.
References
R Borsdof, N Higham, M Raydan (2010). Computing a nearest correlation matrix with factor structure. SIAM J Matrix Anal Appl, 31:5, 2603-2622. http://eprints.ma.man.ac.uk/1523/01/covered/MIMS_ep2009_87.pdf
Examples
Hard thresholding a correlation matrix may result in a matrix that is not positive semidefinite. We can approximate a hard thresholded correlation matrix with a PSD matrix as follows, where corr is the input correlation matrix.
>>> import numpy as np
>>> from statsmodels.stats.correlation_tools import corr_nearest_factor
>>> np.random.seed(1234)
>>> b = 1.5 - np.random.rand(10, 1)
>>> x = np.random.randn(100,1).dot(b.T) + np.random.randn(100,10)
>>> corr = np.corrcoef(x.T)
>>> corr = corr * (np.abs(corr) >= 0.3)
>>> rslt = corr_nearest_factor(corr, 3)