Forums > Trading > PCA-based portfolio properties and trading implications


Total Posts: 15
Joined: Feb 2018
Posted: 2018-06-23 09:34
It's known that if we deal with portfolios based on sorts, then the covariance matrix of the history of those sorts resembles a special kind of Toeplitz matrix: the Kac-Murdock-Szego (KMS) matrix. The structure of the eigenvectors of these matrices is known, and the level-slope-curvature factors follow from the KMS structure. So KMS hints that the history of sort-based portfolios has special properties.
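If the KMS claim is unfamiliar, here is a minimal numpy sketch (the choices n=10 and rho=0.9 are arbitrary): for K[i, j] = rho**|i - j|, the top three eigenvectors show 0, 1 and 2 sign changes, i.e. the level, slope and curvature shapes.

```python
import numpy as np

def kms_matrix(n: int, rho: float) -> np.ndarray:
    """Kac-Murdock-Szego Toeplitz matrix: K[i, j] = rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

K = kms_matrix(10, 0.9)
vals, vecs = np.linalg.eigh(K)   # eigh returns ascending eigenvalues
vecs = vecs[:, ::-1]             # reorder columns to descending eigenvalue

level = vecs[:, 0]   # no sign change: "level" / market
slope = vecs[:, 1]   # one sign change: "slope" / long-short
curve = vecs[:, 2]   # two sign changes: "curvature"
```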

It is interesting to think about the trading implications for equities, since the wide cross-section allows one to construct sorted portfolios based on a wide variety of principles, and the form of the eigenvectors could hint at trading principles.

The first eigenvector can be interpreted as the market; the second hints at creating a long-short portfolio.

The third one bears some resemblance to the following:
- constructing a portfolio based on the illiquidity factor
- the principle of building a portfolio based on the stability of a score that has forecasting power
- if one treats the middle of the forecasting score as noise, then, since a portfolio based on the middle of the score is positively correlated with the portfolio based on the full original score, we can 'subtract' that noise either by going to the tails or by neutralizing the original portfolio against the noise component, and so improve the information ratio

As for the fourth component and beyond: they have an oscillating structure, so we can try to build portfolios on a relative-value principle within the original sorted structure and get a mean-reverting signal.

It would be great to uncover and understand other interpretations of the eigenvectors of sort-based portfolios, and possibly get other trading hints :)


Total Posts: 1013
Joined: May 2004
Posted: 2018-06-25 12:25
It is not entirely clear to me what you are trying to do, but I have looked at PCA-based portfolios in another context (i.e. not equities).

It is important to decide whether you want to do your PCA on the covariance or the correlation matrix, how much data to use, and at what frequency. Make sure the data are synchronous, or you are working with rubbish. I preferred correlation because it is then easier to use some basic Random Matrix Theory to figure out how many eigenvalues are significant. You'll find that only a handful are; the rest is noise. I found some trending behavior in the eigenvectors associated with "big" eigenvalues, and mean reversion in the "small" ones. Transaction costs killed that one though, as these portfolios have very high turnover. They are noise, after all. I am pretty sure there is much more one could do; for example, eigenvector returns may be more predictable than individual stock returns.
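For reference, a rough sketch of the RMT step being described, using numpy and synthetic returns in place of real data: compare the correlation eigenvalues against the Marchenko-Pastur upper edge for pure noise, and count how many exceed it.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 500, 100
returns = rng.standard_normal((T, N))           # pure-noise returns
returns[:, :10] += rng.standard_normal((T, 1))  # inject one common factor into 10 names

corr = np.corrcoef(returns, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)

q = N / T
lambda_max = (1 + np.sqrt(q)) ** 2              # Marchenko-Pastur upper edge for pure noise
n_significant = int(np.sum(eigvals > lambda_max))
```

With one planted factor, only a handful of eigenvalues should clear the edge; the rest sit inside the noise bulk.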


"Earth: some bacteria and basic life forms, no sign of intelligent life" (Message from a type III civilization probe sent to the solar system circa 2016)


Total Posts: 15
Joined: Feb 2018
Posted: 2018-06-26 20:59
I'm trying to get interpretations of the structure of the eigenvectors that will lead (perhaps not straightforwardly) to the construction of portfolios with alpha. For example, if we have 1000 assets in the universe and split them into deciles according to some score, then take the loadings of the 10th component: their oscillating structure could hint that, instead of trading 500 long / 500 short, we could attempt to trade 100 vs 100 within the first 200, and so on. If the score is just a few days of price momentum, this probably doesn't lead to anything of value; but if we blend that short-term momentum with some news-sentiment data, then trading long-shorts in small groups effectively does the neutralization, so we are betting on the clean new information.
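A toy numpy sketch of the mechanics, with placeholder random decile returns standing in for real sorted-portfolio history: take the most oscillatory component of the decile covariance and turn its loadings into dollar-neutral relative-value weights within the sort.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_deciles = 250, 10
decile_rets = 0.01 * rng.standard_normal((T, n_deciles))  # placeholder decile returns

cov = np.cov(decile_rets, rowvar=False)
vals, vecs = np.linalg.eigh(cov)  # ascending: in the KMS picture, column 0 is the most oscillatory
v = vecs[:, 0]

# Turn the oscillating loadings into dollar-neutral weights on the deciles
w = v - v.mean()
w /= np.abs(w).sum()
pnl = decile_rets @ w             # candidate mean-reverting series to evaluate
```

On real data one would of course evaluate `pnl` net of costs; the point is only how the loadings map to a relative-value book within the sorted structure.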


Total Posts: 315
Joined: Jan 2015
Posted: 2018-06-26 22:56
What you want is a factor model. You want to decompose the variance into orthogonal sub-components. No matter how you approach this, you're going to divide it into common factors and single-stock idiosyncratic variance. Then you use those orthogonally decomposed returns for your signal and risk models.

The PCA approach works and produces something usable. Like @NeroTulip said, the most difficult part of the problem is deciding how many eigenvalues N to retain. You can do this either through random matrix theory or through cross-validation.

However, PCA falls well short of the gold standard. From a Bayesian perspective, you're making a few major assumptions by using PCA. One is that you know nothing about the stocks themselves or how they correlate; this throws out a lot of information. Obviously we have priors about certain stocks being more likely to cluster with other stocks. The second is that you're assuming non-sparsity: the common factors are not penalized for non-zero weights, which makes you very likely to throw out important factors like industry groupings. Another is that volatility and correlation are constant across the fitted period, i.e. no heteroskedasticity.

Rather than reinventing the wheel, I suggest you just use an off-the-shelf commercial factor model that's ready to go. These models already incorporate market beta, country and regional factors, style factors (like momentum or value), and industry factors, among other things. The work is based on decades of research in academic finance, and incorporates much more information than you'll get from a black-box approach like PCA. Barra is one of the major providers, and it's worth exploring their options if you have the funds. For example, here's their brochure for their US equity factor model.

Good questions outrank easy answers. -Paul Samuelson


Total Posts: 15
Joined: Feb 2018
Posted: 2018-06-27 06:15
I'm not reinventing the wheel; I'm familiar with factors/RMT/shrinkage...
Just trying to understand if there's something else left to use.


Total Posts: 1041
Joined: Nov 2004
Posted: 2018-06-27 06:26
> The work is based off decades of research in academic finance,...

Well, that is the issue with this path. Pure overfit.

If you are not living on the edge you are taking up too much space.


Total Posts: 15
Joined: Feb 2018
Posted: 2018-06-27 06:30
goldorak, could you explain?


Total Posts: 125
Joined: Jul 2013
Posted: 2018-06-27 18:54
The past does not always explain the future, but with factor models, especially in the linear world, the issues come out when you really need them to work the most. That is why it is important to reduce to a minimum the amount of guessing a statistical model makes, for example by sampling stocks by groups or sectors. The idea is to identify strong and stable relationships that are robust to misspecification out of sample.
I don't know about the vendors, but I guess you can fairly well build baskets or hedging portfolios by exploring each factor and its relationship with exogenous variables, to make sure the hedge does not break when you need it.

"amicus Plato sed magis amica Veritas"


Total Posts: 315
Joined: Jan 2015
Posted: 2018-06-28 00:10

Ah, I see. If the intention is speculative research, let me suggest a more interesting approach. First, start with a Barra-like factor model. Then strip out the factors to get idiosyncratic returns, and run the PCA approach on that dataset. It would be enlightening to see what, if anything, isn't already being captured by Barra. Plus, you're not wasting statistical power explaining effects that you already know exist, e.g. market beta, HML, SMB, dollar exposure, etc.
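The two-step pipeline can be sketched in numpy with synthetic stand-ins for the known factor returns (real ones would come from a Barra-like model): regress stock returns on the factors, then run PCA on the residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, K = 500, 50, 3
factors = rng.standard_normal((T, K))        # stand-ins for known factor returns
betas = rng.standard_normal((K, N))
stock_rets = factors @ betas + 0.5 * rng.standard_normal((T, N))

# OLS exposures via least squares, then idiosyncratic residuals
B, *_ = np.linalg.lstsq(factors, stock_rets, rcond=None)
resid = stock_rets - factors @ B

# PCA on the residual correlation matrix: anything left after the factors?
resid_corr = np.corrcoef(resid, rowvar=False)
resid_vals = np.linalg.eigvalsh(resid_corr)[::-1]  # descending
```

After stripping the common factors, the top residual eigenvalue should sit far below the top eigenvalue of the raw return correlation matrix.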

I'd also suggest trying some of the various flavors of sparse PCA. The problem with regular PCA is that you're biased towards market-spanning factors. Because of the inherent noise in 1000+ columns, you can rarely use more than a handful of the recovered factors. But with sparse PCA, by limiting factors to smaller subsets of names, you're more likely to recover a larger number of stable factors.

Oil exposure is a good example. 90% of US stocks have de minimis exposure to oil prices, outside the general impact through market beta. But oil drillers' stocks have a strong positive correlation with oil, while airlines have a strong negative correlation. It's unlikely that this factor would be anywhere near the top of your eigenvectors, because it's penalized for not explaining any variance in the majority of names. So most likely it would get overwhelmed by the noisy eigenvalues and cut off by your threshold.
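A numpy-only toy of the sparsity idea (real sparse PCA solves a penalized optimization; this is just truncated power iteration with soft-thresholding): with a small correlated "oil block" buried among mostly uncorrelated names, the sparse leading vector loads only on the block.

```python
import numpy as np

def soft_threshold(x: np.ndarray, lam: float) -> np.ndarray:
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_leading_vector(C: np.ndarray, lam: float = 0.1, n_iter: int = 200) -> np.ndarray:
    """Crude sparse PCA: power iteration with soft-thresholded entries."""
    n = C.shape[0]
    v = np.ones(n) / np.sqrt(n)
    for _ in range(n_iter):
        v = soft_threshold(C @ v, lam)
        norm = np.linalg.norm(v)
        if norm == 0.0:
            break
        v /= norm
    return v

# Synthetic correlation: a 5-name "oil block" inside 50 mostly-uncorrelated names
N, block = 50, 5
C = np.eye(N)
C[:block, :block] = 0.8
np.fill_diagonal(C, 1.0)

v = sparse_leading_vector(C, lam=0.1)
```

Dense PCA would also find this block here, but with 1000+ noisy columns the sparse version is far more likely to keep such a localized factor above the noise.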

Good questions outrank easy answers. -Paul Samuelson


Total Posts: 15
Joined: Feb 2018
Posted: 2018-06-28 07:32
thanks, that is reasonable

> Then strip out the factors to get idiosyncratic returns, and run the PCA approach on that dataset

thanks for pointing this out. I expect this approach will reveal rather clean signals with high IR and turnover, which are factor-neutral flavors of the cross-sectional reversion and momentum factors, depending on the horizon. If this is what you meant, then it is also close to the idea of using residual momentum/reversion (as, for example, in the article by Blitz, "Residual Momentum").

>I'd also suggest trying some of the various flavors of sparse PCA
yeah, that is a good point too, thanks