Forums  > Trading  > PCA-based portfolio properties and trading implications  
     
Page 1 of 1

Zoho


Total Posts: 20
Joined: Feb 2018
 
Posted: 2018-06-23 09:34
It's known that if we deal with portfolios based on sorts, then the covariance matrix of the history of those sorts resembles a special kind of Toeplitz matrix: the Kac-Murdock-Szego (KMS) matrix. The eigenvector structure of such matrices is known, and the level-slope-curvature factors follow from the KMS structure. So KMS hints that the history of a sort-based portfolio has special properties.

It is interesting to think about the trading implications for equities, since the wide cross-section allows one to construct sorted portfolios based on a wide variety of principles, and the form of the eigenvectors could hint at trading principles.

The first eigenvector can be interpreted as the market; the second hints at constructing a long-short portfolio.

The third one has some resemblance to the following:
- construction of a portfolio based on the illiquidity factor
- the principle of building a portfolio based on the stability of a score that has forecasting power
- if one treats the middle of the forecasting score as noise, then, since a portfolio based on the middle of the forecasting score has positive correlation with the portfolio based on the full original score, we can 'subtract' that noise either by going to the tails or by neutralizing the original portfolio to the noise component, and thus obtain an improved information ratio

As for the fourth component and beyond: they have an oscillating structure, so we can try to build portfolios based on a relative-value principle within the original sorted structure and obtain a mean-reverting signal.
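As a small numerical check of the KMS claim above, one can build a matrix with entries rho^|i-j| and count the sign changes of its eigenvectors: the top three come out as level, slope and curvature (0, 1 and 2 sign changes), and higher components oscillate more and more. The parameters here are purely illustrative.

```python
import numpy as np

# Kac-Murdock-Szego matrix: K[i, j] = rho**|i - j| (illustrative rho, n)
n, rho = 10, 0.9
i = np.arange(n)
K = rho ** np.abs(i[:, None] - i[None, :])

eigvals, eigvecs = np.linalg.eigh(K)   # numpy returns ascending order
order = np.argsort(eigvals)[::-1]      # re-sort descending
eigvecs = eigvecs[:, order]

def sign_changes(v):
    """Count sign changes along an eigenvector (ignoring exact zeros)."""
    s = np.sign(v)
    s = s[s != 0]
    return int(np.sum(s[:-1] != s[1:]))

# level / slope / curvature / ... : 0, 1, 2, 3 sign changes
print([sign_changes(eigvecs[:, k]) for k in range(4)])
```

The k-th eigenvector (by descending eigenvalue) oscillating k times is exactly the structure that suggests relative-value trades within the sorted groups.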

It would be great to uncover and understand other interpretations of the eigenvectors of sort-based portfolios and possibly get other trading hints.

NeroTulip


Total Posts: 1013
Joined: May 2004
 
Posted: 2018-06-25 12:25
It is not entirely clear to me what you are trying to do, but I have looked at PCA-based portfolios in another context (i.e. not equities).

It is important to decide whether you want to do your PCA on the covariance or the correlation matrix, how much data to use, and at what frequency - and make sure the data is synchronous, or you are working with rubbish. I preferred correlation because it is then easier to use some basic Random Matrix Theory to figure out how many eigenvalues are significant. You'll find that only a handful are; the rest are noise. I found some trending behavior in the eigenvectors associated with "big" eigenvalues, and mean reversion in the "small" ones. Transaction costs killed that one though, as these portfolios have very high turnover. They are noise, after all. I am pretty sure there is much more one could do - for example, eigenvector returns may be more predictable than individual stock returns.
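In code, the RMT filter looks roughly like this: keep only eigenvalues of the correlation matrix above the Marchenko-Pastur upper edge for pure noise. The synthetic data (one common factor of assumed strength 0.3) is just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 100                       # observations x assets
market = rng.standard_normal((T, 1))   # one assumed common factor
returns = 0.3 * market + rng.standard_normal((T, N))

corr = np.corrcoef(returns, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)

q = N / T
lam_plus = (1 + np.sqrt(q)) ** 2       # Marchenko-Pastur upper edge
significant = eigvals[eigvals > lam_plus]
print(len(significant))                # only a handful survive the cut
```

Everything below `lam_plus` is statistically indistinguishable from noise, which is the formal version of "only a handful of eigenvalues are significant".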

HTH

"Earth: some bacteria and basic life forms, no sign of intelligent life" (Message from a type III civilization probe sent to the solar system circa 2016)

Zoho


Total Posts: 20
Joined: Feb 2018
 
Posted: 2018-06-26 20:59
I'm trying to get interpretations of the structure of the eigenvectors that will lead (perhaps not straightforwardly) to the construction of portfolios with alpha. For example, if we have 1000 assets in the universe and split them into deciles according to some score, then take the loadings of the 10th component: their oscillating structure could hint that instead of trading 500 long / 500 short we could attempt to trade 100 vs 100 within the first 200, and so on. If the score is just a few days of price momentum, this probably doesn't lead to anything of value, but if we blend that short-term momentum with some news-sentiment data, then trading long-shorts in small groups effectively performs the neutralization, so we are betting on the clean new information.
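The 100-vs-100-within-200 construction can be sketched like this (the score is a random stand-in, and the block size and weighting scheme are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
score = rng.standard_normal(n)        # stand-in forecasting score

order = np.argsort(-score)            # best score first
weights = np.zeros(n)
block = 200
for start in range(0, n, block):
    idx = order[start:start + block]
    weights[idx[:block // 2]] = 1.0 / (n // 2)    # top half of block: long
    weights[idx[block // 2:]] = -1.0 / (n // 2)   # bottom half: short

# dollar-neutral within every block, not just overall
print(weights.sum())
```

Each long-short pair lives inside one score block, so the book is neutral to the block-level "market" by construction, which is the neutralization described above.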

EspressoLover


Total Posts: 331
Joined: Jan 2015
 
Posted: 2018-06-26 22:56
What you want is a factor model: isolate the variance into orthogonal sub-components. No matter how you approach this, you're going to divide it into common factors and single-stock idiosyncratic variance, then use those orthogonally decomposed returns for signal and risk models.

The PCA approach works and produces something usable. Like @NeroTulip said, the most difficult part of the problem is deciding how many eigenvalues to retain. You can do this either through random matrix theory or through cross-validation.

However, PCA falls well short of the gold standard. From a Bayesian perspective you're making a few major assumptions by using PCA. One is that you know nothing about the stocks themselves or how they correlate; this throws out a lot of information, since we obviously have priors about certain stocks being more likely to cluster with others. The second is that you're assuming non-sparsity: the common factors are not penalized for non-zero weights, which makes you very likely to throw out important factors like industry grouping. Another is that volatility and correlation are constant across the fitted period, i.e. no heteroskedasticity.

Rather than reinventing the wheel, I suggest you just use an off-the-shelf commercial factor model that's ready to go. These models already incorporate market beta, country and regional factors, style factors (like momentum or value), and industry factors, among other things. The work is based on decades of research in academic finance, and incorporates much more information than you'll get from a black-box approach like PCA. Barra is one of the major providers, and it's worth exploring their options if you have the funds. For example, here's their brochure for their US equity factor model.

Good questions outrank easy answers. -Paul Samuelson

Zoho


Total Posts: 20
Joined: Feb 2018
 
Posted: 2018-06-27 06:15
Not reinventing the wheel; I'm familiar with factors/RMT/shrinkage...
Just trying to understand whether there's something else left to use.

goldorak


Total Posts: 1046
Joined: Nov 2004
 
Posted: 2018-06-27 06:26
> The work is based on decades of research in academic finance,...

Well, that is the issue with this path. Pure overfit.

If you are not living on the edge you are taking up too much space.

Zoho


Total Posts: 20
Joined: Feb 2018
 
Posted: 2018-06-27 06:30
goldorak, could you explain?

rickyvic


Total Posts: 125
Joined: Jul 2013
 
Posted: 2018-06-27 18:54
The past does not always explain the future, but with factor models, especially in the linear world, the issues come out exactly when you need them to work the most. That is why it is important to reduce to a minimum the amount of guessing a statistical model does, for example by sampling stocks by groups or sectors. The idea is to identify strong and stable relationships that are robust to misspecification out of sample.
I don't know about vendors, but I guess you can fairly well build baskets or hedging portfolios by exploring each factor and its relationship with exogenous variables, to make sure the hedge does not break when you need it.

"amicus Plato sed magis amica Veritas"

EspressoLover


Total Posts: 331
Joined: Jan 2015
 
Posted: 2018-06-28 00:10
@Zoho

Ah, I see. If the intention is speculative research, let me suggest a more interesting approach. Maybe first start with a Barra-like factor model, then strip out the factors to get idiosyncratic returns, and try the PCA approach on that dataset. It would be enlightening to see what, if anything, isn't already captured by Barra. Plus, you're not wasting statistical power explaining effects that you already know exist, e.g. market beta, HML, SMB, dollar exposure, etc.
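A minimal version of "strip out the factors, then PCA the residuals", with simulated factor returns and loadings standing in for a real factor model:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 750, 200, 5
factors = rng.standard_normal((T, K))   # stand-ins for market, HML, SMB, ...
betas = rng.standard_normal((K, N))
returns = factors @ betas + rng.standard_normal((T, N))

# OLS betas, then idiosyncratic (residual) returns
beta_hat, *_ = np.linalg.lstsq(factors, returns, rcond=None)
residuals = returns - factors @ beta_hat

# PCA on the residual correlation matrix: if the factor model captured
# everything common, the spectrum should look like the pure-noise bulk
corr = np.corrcoef(residuals, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]
print(eigvals[:3])
```

On real data, any residual eigenvalue that sticks out above the noise bulk is exactly the "what isn't already captured by Barra" that would be interesting.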

I'd also suggest trying some of the various flavors of sparse PCA. The problem with regular PCA is that you're biased towards market-spanning factors. Because of the inherent noise in 1000+ columns, you can rarely use more than a handful of recovered factors. But with sparse PCA, by limiting to smaller subsets, you're more likely to recover a larger number of stable factors.

Oil exposure is a good example. 90% of US stocks have de minimis exposure to oil prices, outside the general impact through market beta. But oil drillers' stocks have strong positive correlation with oil, while airlines have strong negative correlation. It's unlikely that this would be anywhere near the top of your eigenvectors, because it's penalized for not explaining any variance in the majority of names. Most likely it would get overwhelmed by the noisy eigenvalues and cut off by your threshold.
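To make the sparse-PCA point concrete, here is a toy soft-thresholded power iteration (one simple flavor of sparse PCA) on synthetic data with a small "oil-like" cluster; the data, penalty level `lam`, and cluster size are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 500, 60
returns = rng.standard_normal((T, N))
returns[:, :6] += rng.standard_normal((T, 1))   # small correlated cluster

C = np.cov(returns, rowvar=False)

def sparse_pc(C, lam=0.3, iters=200):
    """Power iteration with soft-thresholding: zeroes out small loadings."""
    v = np.linalg.eigh(C)[1][:, -1]             # warm start: top eigenvector
    for _ in range(iters):
        w = C @ v
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
        if np.linalg.norm(w) == 0:
            break
        v = w / np.linalg.norm(w)
    return v

v = sparse_pc(C)
print(np.flatnonzero(np.abs(v) > 1e-8))   # support: mostly the cluster names
```

The L1 penalty drives the loadings on the 54 unrelated names to exactly zero, so the localized cluster surfaces as its own factor instead of being diluted across the whole cross-section.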

Good questions outrank easy answers. -Paul Samuelson

Zoho


Total Posts: 20
Joined: Feb 2018
 
Posted: 2018-06-28 07:32
@rickyvic
thanks, that is reasonable

@EspressoLover
> Then strip out factors to get idiosyncratic returns, and try the PCA approach on that dataset

thanks for pointing this out. I expect this approach will reveal rather clean signals with high IR and high turnover, which are factor-neutral flavors of the cross-sectional reversion and momentum factors, depending on the horizon. If this is what you meant, then it is also close to the idea of using residual momentum/reversion (as, for example, in the article by Blitz, "Residual Momentum").

>I'd also suggest trying some of the various flavors of sparse PCA
yeah, that is a good point too, thanks

Zoho


Total Posts: 20
Joined: Feb 2018
 
Posted: 2018-09-04 23:28
As I'm still deeply interested in the topic and would really like to connect with someone working on something similar, and since there's no apparent interest so far, I will try to add some thoughts. They are sometimes not rigorous, but the key idea is to get practical results from analogies that might otherwise be considered purely theoretical.

If we sort stocks based on some forecasting score, then the eigenvectors of the return series of the ranked stocks have a specific structure: their elements oscillate around zero. We can interpret this fact as follows: the sorted stocks are cointegrated. How can one apply that observation to extract forecasting power? Let's take some score known to possess forecasting power; examples of such scores can be found in many articles. Since adjacent groups of stocks are going to be cointegrated, we can successfully build long-short portfolios. For example, we can take the average value of the score over a long period, rank according to that average, and form decile groups. If we succeed in increasing the spread between adjacent groups of stocks, we will get better performance of the resulting portfolio. So we can rank stocks within the original deciles based on fresh values of the score. Ranking within deciles is effectively the same as targeting zero exposure to the decile-group "market". We can attempt to do better by using portfolio optimization, specifying zero exposure of the resulting portfolio to the portfolio based on the long-period average score while keeping exposure to the fresh score.
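The rank-within-deciles step can be sketched as follows; the two scores are synthetic stand-ins (the "fresh" score is deliberately correlated with the slow one), and the weighting is a simple demeaned rank:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
slow = rng.standard_normal(n)                 # long-period average score
fresh = 0.7 * slow + rng.standard_normal(n)   # fresh score, correlated

df = pd.DataFrame({"slow": slow, "fresh": fresh})
df["decile"] = pd.qcut(df["slow"], 10, labels=False)

# demeaned within-decile fresh-score rank -> long-short weights per decile
df["w"] = df.groupby("decile")["fresh"].rank()
df["w"] -= df.groupby("decile")["w"].transform("mean")
df["w"] /= df["w"].abs().sum()

# exposure to the slow-score sort is approximately neutralized
print(float(np.corrcoef(df["w"], df["slow"])[0, 1]))
```

Because the weights are demeaned inside each decile, the book is dollar-neutral per decile and carries almost no exposure to the slow-score "market", even though the raw fresh score is heavily contaminated by it.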

The procedure described above strongly resembles Grinold and Kahn's description of estimating pure factor portfolio returns, and it also resembles the numerous articles devoted to "surprise in something" scores and their abnormal returns.

A practical issue with using long-period score averaging is an increase in exposure to cross-sectional price momentum. This can be overcome, for example, by incorporating a zero-exposure-to-momentum constraint into the optimization task.
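The simplest version of that constraint is a projection: remove from the weights their component along the momentum score. Both vectors here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
w = rng.standard_normal(n)     # raw long-short weights (stand-in)
mom = rng.standard_normal(n)   # cross-sectional momentum score (stand-in)

# project w orthogonal to mom -> exactly zero momentum exposure
w_neutral = w - (w @ mom / (mom @ mom)) * mom
print(abs(w_neutral @ mom))
```

In a full optimizer this becomes a linear equality constraint (`w @ mom == 0`) rather than a post-hoc projection, but the effect on the single momentum factor is the same.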

I hope these ideas help bring some interest to the topic. From my point of view, it is fascinating to see how purely statistical properties of cross-sectional portfolios can give rise to practical ideas.

Zoho


Total Posts: 20
Joined: Feb 2018
 
Posted: 2018-09-04 23:53
One can also extend the aforementioned ideas and apply improvements to PCA, such as enforcing sparsity as mentioned by EspressoLover. For example, one can find groups of cointegrating stocks, extract residual momentum/reversion by neutralizing to the desired factors, and, for example, run supervised learning on top of the residual scores.

Zoho


Total Posts: 20
Joined: Feb 2018
 
Posted: 2018-09-18 21:39
Implementation of "surprise in something"-type signals via XTX-style pure-factor portfolios can be seen as an implementation of the approach of Fernholz, who proposed getting smooth-information-ratio portfolios by hedging a frequently rebalanced portfolio with a slowly rebalanced one.