
Graph Lasso on sklearn Issue

Thread starter: Skander
Dear All,

I am working on replicating a paper titled “Improving Mean Variance Optimization through Sparse Hedging Restriction”. The authors’ idea is to use the graphical lasso algorithm to infuse some bias into the estimation of the inverse of the sample covariance matrix. The graphical lasso works perfectly fine in R, but when I use Python on the same data with the same parameters I get two sorts of errors:

1- If I use the coordinate descent (cd) mode as a solver, I get a floating point error saying that the matrix is not symmetric positive definite (SPD) and that the system is too ill-conditioned for this solver: “FloatingPointError: Non SPD result: the system is too ill-conditioned for this solver.” (The thing that bugs me is that I tried this solver on a simulated positive definite matrix and it gave me the same error.)

2- If I use the Least Angle Regression (LARS) mode (which is less stable but recommended for ill-conditioned matrices), I get an overflow error stating that an integer is too large to be converted to a float: “OverflowError: int too large to convert to float”.

To my knowledge, unlike C++ and other languages, Python does not impose an upper limit on integers (beyond the capacity of the machine itself), whereas floats are restricted. I think this might be the source of the latter problem. (I have also heard in the past that R is much more robust in dealing with ill-conditioned matrices.) I would be glad to hear your experiences with graph lasso in R or Python.
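(To illustrate the asymmetry I mean: Python integers are arbitrary precision, but converting one beyond the double range, roughly 1.8e308, to a float raises exactly this error.)
Code:
# Python ints grow without bound, but floats are IEEE 754 doubles
big = 10 ** 400        # fine as an arbitrary-precision int
try:
    float(big)         # exceeds the largest representable double (~1.8e308)
except OverflowError as e:
    print(e)           # "int too large to convert to float"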

Attached below is a little Python code that reproduces this problem in a few lines. Any input will be greatly appreciated.

Thank you all,
Code:
from sklearn.covariance import graph_lasso
from sklearn.datasets import make_spd_matrix

# simulate a 100x100 symmetric positive definite matrix
symmetric_pd_mx = make_spd_matrix(100)
# run graphical lasso on it directly; this raises the errors described above
glout = graph_lasso(emp_cov=symmetric_pd_mx, alpha=0.01, mode="lars")
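(As an aside: in scikit-learn 0.20 and later, graph_lasso was deprecated and renamed, so the equivalent call there, assuming scikit-learn >= 0.20, would be something like the following.)
Code:
from sklearn.covariance import graphical_lasso  # new name since sklearn 0.20
from sklearn.datasets import make_spd_matrix

symmetric_pd_mx = make_spd_matrix(100)
# graphical_lasso returns the estimated (covariance, precision) pair
covariance, precision = graphical_lasso(emp_cov=symmetric_pd_mx, alpha=0.01, mode="lars")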

Skander
 
I'm not familiar with this specific problem, but my hunch is that the code checks whether the matrix is PD and, if not, the deal is off, I reckon. What numerical analysts do is regularize the problem to make it well-posed. Maybe something similar applies in this case as well?
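For instance, a minimal sketch of the kind of regularization I mean (Tikhonov-style diagonal loading; the eps value here is an arbitrary choice):
Code:
import numpy as np

def regularize_cov(emp_cov, eps=1e-6):
    """Diagonal loading: shifts every eigenvalue up by eps,
    improving the conditioning of a near-singular covariance."""
    return emp_cov + eps * np.eye(emp_cov.shape[0])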

http://www.scottsarra.org/math/papers/sarra_RSPD.pdf

You might find some clues here as well.

The Nearest Correlation Matrix
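A crude one-shot version of that idea (a sketch using simple eigenvalue clipping, not Higham's full alternating-projections algorithm):
Code:
import numpy as np

def nearest_psd(a, eps=1e-8):
    """Project a symmetric matrix onto the PSD cone by clipping
    its eigenvalues at a small positive floor (one-shot version)."""
    a_sym = (a + a.T) / 2.0            # enforce symmetry first
    w, v = np.linalg.eigh(a_sym)       # real eigendecomposition
    w = np.clip(w, eps, None)          # floor the spectrum at eps
    return (v * w) @ v.T               # reassemble V diag(w) V^T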

BTW what is the 'quality' of the input data?

On Wiki they mention this(?)
skggm/skggm
 
Hi Professor Duffy,
I have heard about the skggm repo and I am investigating this route as we speak. My input matrix is PD, so there is really no need to map to the nearest PD matrix here (especially in the simulated example, because I simulate the SPD matrix to start with).
Also, the R code uses coordinate descent to solve the problem, whereas graph_lasso in Python throws an error saying that the matrix is too ill-conditioned (which seems incorrect, because I am inputting an SPD matrix). I have also posted the issue on the scikit-learn tracker; let's see what we can learn from there.
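For what it's worth, here is the kind of quick check I use to convince myself the input really is SPD and not wildly ill-conditioned (plain numpy, nothing specific to graph_lasso):
Code:
import numpy as np
from sklearn.datasets import make_spd_matrix

m = make_spd_matrix(100)
print(np.allclose(m, m.T))          # symmetric?
print(np.linalg.eigvalsh(m).min())  # smallest eigenvalue > 0  =>  PD
print(np.linalg.cond(m))            # condition number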

PS: The first paper you linked (Sarra's) is interesting.

Thank you again,

Skander
 
Did you post this question on Stack Overflow and the sklearn mailing list? They are usually very helpful.
 
Yes I did, I posted the question on Stack Overflow as well as on the scikit-learn tracker. My post was approved today, so let's see if we get any fruitful comments :)
 
Even typing the question into Google with "Stack Overflow" appended tends to give results, e.g.:

Graphical lasso numerical problem (not SPD matrix result)
Yes Professor Duffy, I came across this post, and the comments mention some pre-processing of the covariance matrix (a shrinkage technique such as Ledoit and Wolf's). But I am replicating a paper that compares the results of the graph_lasso method to the results of other methods (including shrinkage methods), so I feel it would be a sort of cheating if I pre-processed the covariance matrix and then plugged it into graph_lasso(). However, it would be nice to compare the results of graph_lasso() alone vs. graph_lasso() with shrinkage as a pre-processing/cleaning step.
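If I do end up running that comparison, a sketch of the combined pipeline (assuming the old graph_lasso name and a hypothetical returns matrix X of shape (n_obs, n_assets)):
Code:
import numpy as np
from sklearn.covariance import LedoitWolf, graph_lasso

# X: hypothetical asset-returns matrix, shape (n_obs, n_assets)
X = np.random.randn(500, 100)

# Ledoit-Wolf shrinkage as the pre-processing / cleaning step...
shrunk_cov = LedoitWolf().fit(X).covariance_

# ...then graphical lasso on the shrunk covariance
covariance, precision = graph_lasso(emp_cov=shrunk_cov, alpha=0.01)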
 