Trouble running code with CovarianceMatrix(sqrt=True) #108

Open
turmeric-blend opened this issue Dec 26, 2020 · 8 comments

@turmeric-blend
Contributor

turmeric-blend commented Dec 26, 2020

Merry Christmas @jankrepl

I am having trouble running code with:

cov_layer = CovarianceMatrix(sqrt=True)
...
covmat_sqrt = cov_layer(rets)

I was wondering if this would be an equivalent replacement:

cov_layer = CovarianceMatrix(sqrt=False)
...
covmat_sqrt = torch.sqrt(cov_layer(rets))

Just in case you were curious, this is the error I was getting with CovarianceMatrix(sqrt=True):

rets has NAN
warnings.warn('rets has NAN')
....
Intel MKL ERROR: Parameter 4 was incorrect on entry to SLASCL.
....
/deepdow/layers/misc.py", line 162, in compute_sqrt
_, s, v = m.svd()
RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 16)

However, when I checked rets before passing it into cov_layer with assert not torch.isnan(rets).any(), no assertion error occurred.

@jankrepl
Owner

jankrepl commented Dec 26, 2020

Merry Christmas @turmeric-blend!

Thank you for the question!

First of all, the matrix square root and the elementwise (Hadamard?) square root are two different operations.
CovarianceMatrix computes the matrix one, whereas torch.sqrt computes the elementwise one.

The reason deepdow cares about the matrix square root is the cvxpylayers-based allocation layers: they expect you to feed in the matrix square root of the covariance matrix. Why?

[image of the risk term rewritten: w^T Σ w = w^T S S w = ||S w||_2^2]

where S is the matrix square root of the covariance matrix. It is only the last expression (created via cvxpy.sum_squares) that leads to a disciplined parametrized program (see paper).
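To make the distinction concrete, here is a minimal standalone sketch (not deepdow code; it uses an eigendecomposition instead of the SVD that deepdow's compute_sqrt uses, and all names are made up) contrasting the matrix square root with torch.sqrt and checking the identity above:

```python
import torch

torch.manual_seed(0)

# Toy symmetric positive semidefinite matrix standing in for a covariance matrix
a = torch.rand(50, 4)
sigma = a.T @ a / 50

# Matrix square root via an eigendecomposition: sigma = q diag(lam) q^T,
# so s = q diag(sqrt(lam)) q^T satisfies s @ s == sigma
lam, q = torch.linalg.eigh(sigma)
s = q @ torch.diag(lam.clamp(min=0).sqrt()) @ q.T

# Elementwise square root is a different matrix altogether
s_elem = torch.sqrt(sigma)

print(torch.allclose(s @ s, sigma, atol=1e-5))            # True: matrix square root
print(torch.allclose(s_elem @ s_elem, sigma, atol=1e-5))  # False: not a matrix square root

# The identity the allocation layers rely on: w^T sigma w == ||s w||^2
w = torch.full((4,), 0.25)
print(torch.allclose(w @ sigma @ w, (s @ w).pow(2).sum(), atol=1e-5))  # True
```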

With that in mind, deepdow has a convenience layer CovarianceMatrix that can do two unrelated things:

  1. Estimation of a covariance matrix from data (input tensor)
  2. Numerical computation of a matrix square root (using a singular value decomposition). As noted on the wiki page for the matrix square root, one needs to make sure that the input matrix is positive semidefinite; otherwise the matrix square root is not guaranteed to exist or to be unique.

If one constructs CovarianceMatrix(sqrt=False), only step 1 is run during a forward pass. If one constructs CovarianceMatrix(sqrt=True), step 1 is run and then step 2. Unfortunately, step 2 might fail when the PSD assumption is not met (or is close to not being met), and that is most likely what happened in your case - I might be wrong though. See https://deepdow.readthedocs.io/en/latest/source/layers.html#covariancematrix for more details. With that being said, there are ways to enforce (in most cases) the PSD property:

  • Have more samples than variables (assets)
  • Try different shrinkage strategies in step 1 (a minimal sketch of the shrinkage idea follows this list)
  • Do not compute the square root at all (skip step 2) - whether this makes sense depends entirely on your application and on whether the input tensors you feed to CovarianceMatrix are just some abstract hidden features extracted by a neural network or actual time series of asset returns
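To illustrate the shrinkage bullet above, here is a minimal standalone sketch (my own illustration, not the deepdow implementation; shrunk_covariance is a made-up name) that pulls the sample covariance matrix towards a scaled identity. Any shrinkage coefficient > 0 then makes the result strictly positive definite, provided the returns are not all constant:

```python
import torch

def shrunk_covariance(rets, shrinkage_coef=0.1):
    """Sample covariance pulled towards a scaled identity (illustrative sketch only).

    rets : (lookback, n_assets) tensor of returns for a single sample.
    """
    rets = rets - rets.mean(dim=0, keepdim=True)          # center each asset
    sample_cov = rets.T @ rets / (rets.shape[0] - 1)       # plain sample covariance
    target = torch.eye(rets.shape[1]) * sample_cov.diagonal().mean()
    return (1 - shrinkage_coef) * sample_cov + shrinkage_coef * target

rets = torch.randn(20, 30)                   # lookback < n_assets -> sample covmat is singular
cov = shrunk_covariance(rets, shrinkage_coef=0.1)
print(torch.linalg.eigvalsh(cov).min() > 0)  # True: shrinkage restored positive definiteness
```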

@turmeric-blend
Contributor Author

turmeric-blend commented Dec 27, 2020

Thanks for the detailed explanation!

Have more samples than variables (assets)

  1. By samples, do you mean batch size?

Do not compute the square root at all (skip step 2) - this totally depends on your application and whether the input tensors you feed to CovarianceMatrix are just some abstract hidden features extracted by a neural network or time series of asset returns.

  2. (side note) For this I noticed that BachelierNet uses CovarianceMatrix(sqrt=False) and passes the covmat into the NumericalMarkowitz layer, but the NumericalMarkowitz layer description says it accepts the sqrt of the covmat instead. Is this a bug, since the covmat didn't pass through a neural network?

  3. Otherwise, my tensor that I feed into CovarianceMatrix is indeed hidden features from a neural network, which I then pass into the covmat_sqrt argument of the NumericalMarkowitz layer.
    In this case, are you saying that it doesn't matter whether I use CovarianceMatrix(sqrt=False) or CovarianceMatrix(sqrt=True), since what comes out of CovarianceMatrix is also a sort of hidden feature and could represent a covmat with or without the sqrt, or even neither? Am I understanding this right?

@jankrepl
Owner

  1. No, I mean the lookback. If you check the implementation, there is just a for loop over the batch_size that computes the sample covariance matrix over the lookback dimension (a simplified sketch of this loop follows the list below):
    return torch.stack([wrapper(self.compute_covariance(x[i].T.clone(), ...
  2. Fair point! However, to push back a little on what you said, there is an instance normalization layer before it with two learnable parameters per channel - shift and scale. The covariance matrix should be shift independent (adding a constant changes nothing), but multiplying all returns by a constant does have an effect (var(aX + b) = a^2 var(X)). In financial terms, this volatility multiplier a can be learned by the network. So one is not guaranteed to feed the original time series of returns into NumericalMarkowitz.
  3. Yeah, exactly as you described. One can even take a more extreme view and see the allocation layers as black boxes that are capable of returning valid allocation vectors. What happens inside does not matter as long as the validation loss is good :)
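Regarding point 1, a simplified sketch of that loop (my own paraphrase, not the actual deepdow code; the shrinkage options and the optional square-root step are omitted, and an input of shape (n_samples, lookback, n_assets) is assumed from the description above):

```python
import torch

def batched_sample_covariance(x):
    """Roughly what CovarianceMatrix(sqrt=False) computes (shrinkage omitted).

    x : tensor of shape (n_samples, lookback, n_assets)
    Returns a (n_samples, n_assets, n_assets) tensor - one covariance matrix per
    batch element, estimated over the lookback dimension.
    """
    covmats = []
    for i in range(x.shape[0]):                       # loop over the batch
        rets = x[i]                                   # (lookback, n_assets)
        rets = rets - rets.mean(dim=0, keepdim=True)  # center each asset over the lookback
        covmats.append(rets.T @ rets / (rets.shape[0] - 1))
    return torch.stack(covmats, dim=0)

x = torch.randn(2, 60, 5)                   # batch of 2, lookback=60, 5 assets
print(batched_sample_covariance(x).shape)   # torch.Size([2, 5, 5])
```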

@turmeric-blend
Contributor Author

Okay that makes sense.

Lastly, since we are on the topic of CovarianceMatrix, I have two questions about how it is used in Resample.

  1. Based on previous discussion,

Otherwise my tensor that I feed into CovarianceMatrix is indeed hidden features from a neural network, which I then pass into the covmat_sqrt argument of the NumericalMarkowitz layer.
In this case what you are saying is that it doesn't matter if I use CovarianceMatrix(sqrt=False) or CovarianceMatrix(sqrt=True) since what comes out of CovarianceMatrix is also sort of a hidden feature and it could imply a covmat with or without sqrt, or even neither.

Does the same thing apply here, i.e. it doesn't matter whether we use the sqrt or not, as long as what goes in comes from a neural network?

  2. I noticed that you pass covmat/rets to MultivariateNormal and then compute the covmat again after sampling. I know we need to pass a 'sampled' covmat into NumericalMarkowitz, but I am trying to make sense of computing the covmat twice - what are the effects, and does it matter?

@jankrepl
Owner

jankrepl commented Dec 29, 2020

  1. That is true. However, there it seems to be hardcoded that for the NumericalMarkowitz allocator we always compute the matrix square root. I do not really have a strong opinion on this and I am totally open to making things more unified / readable:)
  2. The Resample layer was conceptually motivated by this paper. Is it exactly the algorithm proposed there? I am not sure how exactly the sampling in step 1 was done, which is why I say "motivated". Anyway, the reasoning behind the approach is the following. We suspect that the point estimates of expected returns and of the covariance matrix (the parameters of forward) are not precise. Since Markowitz optimization is very sensitive to its input parameters, we try to make the weight allocation more "robust". How? We use the point estimates to define a probability distribution that we can sample from as many times as we want. We basically re-estimate the covmat and the expected returns n_portfolios times. Each of these estimates will most likely differ from the original point estimates, so each of the portfolio allocations will be different. Finally, we average the portfolio allocations and hope that the resulting allocation is more robust (a rough sketch of this idea in code is below).
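A rough sketch of the resampling idea described in point 2 (my own illustration of the reasoning, not the actual Resample layer; solver is a hypothetical callable standing in for any Markowitz-style allocator):

```python
import torch
from torch.distributions import MultivariateNormal

def resampled_allocation(exp_rets, covmat, solver, n_portfolios=10, n_draws=60):
    """Conceptual sketch of resampling (illustration only, not the Resample layer).

    exp_rets : (n_assets,) point estimate of expected returns
    covmat   : (n_assets, n_assets) point estimate of the covariance matrix (PD)
    solver   : hypothetical callable (exp_rets, covmat) -> (n_assets,) weights
    """
    dist = MultivariateNormal(loc=exp_rets, covariance_matrix=covmat)
    weights = []
    for _ in range(n_portfolios):
        draws = dist.sample((n_draws,))                # (n_draws, n_assets) simulated returns
        mu = draws.mean(dim=0)                         # re-estimated expected returns
        centered = draws - mu
        sigma = centered.T @ centered / (n_draws - 1)  # re-estimated covariance matrix
        weights.append(solver(mu, sigma))              # allocation for this re-estimate
    return torch.stack(weights).mean(dim=0)            # average over the n_portfolios runs
```

The only point of the sketch is that each re-estimated (mu, sigma) pair differs slightly from the original point estimates, so the averaged weights are less tied to any single estimate.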

@turmeric-blend
Contributor Author

turmeric-blend commented Jan 4, 2021

Hi @jankrepl, is there a rule of thumb for this? For example, what is a typical ratio of lookback to assets?

Have more samples than variables (assets)

@jankrepl
Owner

jankrepl commented Jan 4, 2021

Hi @jankrepl, is there a rule of thumb for this? For example, what is a typical ratio of lookback to assets?

Have more samples than variables (assets)

I guess the main goal is to make sure that the covariance matrix is positive definite (that way the standard Markowitz optimization problem has a unique solution); see https://quant.stackexchange.com/questions/57331/why-does-portfolio-optimization-require-a-positive-definite-covariance-matrix

However, if lookback < n_assets, then the sample covariance matrix is singular by construction; see https://stats.stackexchange.com/questions/60622/why-is-a-sample-covariance-matrix-singular-when-sample-size-is-less-than-number. This implies that it has a zero eigenvalue and therefore cannot be positive definite. Another implication is that the matrix cannot be inverted.

I guess I would just use shrinkage or make sure lookback > n_assets (not a sufficient condition for a PD sample covmat) and hope the resulting covariance matrix is nice enough.
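A quick numerical check of the singularity argument above (a standalone sketch, not deepdow code):

```python
import torch

torch.manual_seed(0)

def sample_covariance(rets):
    rets = rets - rets.mean(dim=0, keepdim=True)
    return rets.T @ rets / (rets.shape[0] - 1)

# lookback < n_assets: the sample covariance matrix is singular by construction
cov_small = sample_covariance(torch.randn(10, 20))
print(torch.linalg.matrix_rank(cov_small))                  # at most 9 < 20 -> not invertible
print(torch.linalg.eigvalsh(cov_small).min().abs() < 1e-5)  # smallest eigenvalue is ~0

# lookback > n_assets: generically positive definite (though not guaranteed)
cov_large = sample_covariance(torch.randn(100, 20))
print(torch.linalg.eigvalsh(cov_large).min() > 0)           # True for this draw
```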

@turmeric-blend
Contributor Author

turmeric-blend commented Feb 19, 2021

Hi @jankrepl, I have stumbled upon a promising covariance estimation method called T-CorEx that should do better than the current sample covariance method when lookback < n_assets. Compared to other SOTA covariance estimation methods, T-CorEx has

linear stepwise computational complexity

as mentioned in the paper.

I think this would be a good addition to the deepdow library, since one component of deepdow is having a model capable of estimating returns and covariance, and nowadays that model will most likely employ some sort of deep learning. However, deep learning models are notorious for being data hungry: the more data samples available, the better they learn. In a time-series sense, as we reserve more data for training, the lookback length naturally ends up limited and relatively small compared to n_assets. To put this in perspective, roughly 250 trading days a year for 20 years gives 5000 total data points (days). The S&P 500 already has 500 assets, so to get a positive definite matrix (with the sample covariance method) we require lookback > 500, say 600 (ideally more). That leaves 4400 data points, which is not a lot in a deep learning sense. Most of the time, lookback < n_assets would be the case.

One caveat: since my knowledge is limited, I'm not sure how it helps with the positive definite property (the authors use time-averaged negative log-likelihood as a measure of performance); maybe you could provide insights into this.

The code is here, and it would be nice to have it in the style of a deepdow layer :)
