PCA for yield curve basics


Total Posts: 116
Joined: Nov 2005
Posted: 2016-09-15 23:30
I'm looking to use PCA to inform my yield curve trading. Using daily changes in yield I was able to find 2 or 3 principal components that explain 99% of variance. That matches what I read about PC1 being parallel yield curve moves and PC2 being slope.

I then broke the data up into 6 one-month periods, hoping to see that the PCs stayed fairly constant over time. Looking only at PC1, I have 5 that are similar and one that is similar but negative. What does that mean?
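For readers following along, the "share of variance explained by PC1" figure in the question can be reproduced with a minimal sketch like the one below. It is not the poster's actual workflow: the data are synthetic, the helper names `covariance` and `first_pc` are made up for illustration, and power iteration is used only to keep the example dependency-free (a real analysis would normally call a library eigen-solver).

```python
import math

def covariance(rows):
    """Sample covariance matrix of the columns of `rows` (one row per day)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
            for a in range(k)]

def first_pc(cov, iters=200):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    k = len(cov)
    v = [1.0 / math.sqrt(k)] * k  # start from a "parallel shift" guess
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(k) for b in range(k))
    return v, eigenvalue

# Synthetic daily yield changes (bp) for three tenors: a large parallel
# shock p plus a small steepening shock q. Made-up numbers, not market data.
parallel = [1.0, -2.0, 3.0, -1.0, 2.0, -3.0]
slope = [0.1, 0.1, -0.1, -0.1, 0.1, -0.1]
changes = [[p - q, p, p + q] for p, q in zip(parallel, slope)]

cov = covariance(changes)
pc1, lam1 = first_pc(cov)
total_variance = sum(cov[j][j] for j in range(len(cov)))
share = lam1 / total_variance
print("PC1 loadings:", [round(x, 3) for x in pc1])
print("share of variance explained by PC1:", round(share, 4))
```

Running the same two calls on each one-month window gives one PC1 vector per month, which is what the comparison above is looking at. Note that `-pc1` is an equally valid answer for any window.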


Total Posts: 202
Joined: Jan 2015
Posted: 2016-09-16 00:08
Doesn't mean anything. The sign of a principal component is completely arbitrary, and flipping it has no functional effect. Whatever sign your software picks is just an esoteric artifact of its linear algebra internals.

My only advice, if you end up using PCA regularly, is to enforce some sort of sign-consistency convention. Otherwise you risk running into a bug where you upload new production PCs for the month, but the sign has flipped and all your models are running backwards. Codifying the first term as always positive for any PC isn't a bad convention.
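A minimal sketch of that convention, assuming each PC is held as a plain list of loadings (the function name `first_term_positive` and the sample numbers are hypothetical):

```python
def first_term_positive(loadings):
    """Return the loading vector with its sign chosen so the first entry is >= 0."""
    if loadings[0] < 0:
        return [-x for x in loadings]
    return list(loadings)

# Hypothetical PC1 loadings from two monthly fits that differ only by sign.
september = [0.55, 0.58, 0.60]
october = [-0.56, -0.57, -0.60]

print(first_term_positive(september))  # unchanged, first entry already positive
print(first_term_positive(october))    # every entry negated
```

Applying this to every PC before it is uploaded means a sign flip in the solver cannot silently reverse a downstream model. The rule is less reliable when the first loading is close to zero, since small noise can then change which sign is chosen.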

"The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R."


Total Posts: 979
Joined: Nov 2004
Posted: 2016-09-16 06:48
> Otherwise you risk running into a bug

Distant, but vivid memories. Head against Wall

If you are not living on the edge you are taking up too much space.


Total Posts: 98
Joined: Feb 2009
Posted: 2016-09-16 14:42
Nice to see even more ppl trying to trade brazil rate futures........


Total Posts: 116
Joined: Nov 2005
Posted: 2016-09-21 21:06
Thanks EspressoLover, I like the "first term is positive" convention.

There are several ways to produce the set of daily changes. If I switch from exchange settlements to some sort of VWAP of trades over the last minute, and the percentage of variance explained by the first 3 principal components goes up, does that mean my new approach is better?

What if my new data set causes more variance to be explained by PC1 and less by PC3? What can I tell from that?


Total Posts: 202
Joined: Jan 2015
Posted: 2016-09-22 23:34
As long as the correlation matrix is the same*, then the PC loadings and percent of variance explained will remain the same. So if you're getting significantly** different loadings for different time-frames or price measures, that can only be due to the Epps effect. If the co-movement of the prices were perfectly synchronized, then PCs would be the same at every horizon and measure.

To that extent the best measure to use is probably the one that maximizes percent of variance explained. You want to be using PCs derived from the "true" correlation matrix, i.e. how the instruments would co-move absent any trading frictions. Asynchronous trading and discretization almost always decrease co-movement. So if one measure has stronger PCs, that's usually the measure least influenced by friction.

In the end though, I wouldn't sweat the choice too much. It's a pretty noisy measure, so minor methodology decisions are unlikely to have significant impact on the end trading model.

*This assumes that you're using correlation-based PCA (e.g. princomp(..., cor=TRUE)). If you're using covariance, then it's still true as long as A = yB, where A and B are the respective covariance matrices and y is a positive scalar. Either way, with asset returns, if two measures have the same correlation matrix, then this holds true for their covariance matrices, except under very weird circumstances.
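The scalar-multiple case can be checked with a short eigenvalue argument. This is a sketch in my own notation, not something from the original post: $v_i$ and $\lambda_i$ denote the eigenvectors and eigenvalues of B, and y is assumed to be positive.

```latex
\begin{aligned}
B v_i &= \lambda_i v_i, \qquad A = yB,\; y > 0 \\
A v_i &= y B v_i = (y\lambda_i)\, v_i
  && \text{(same eigenvectors, so same loadings)} \\
\frac{y\lambda_i}{\sum_j y\lambda_j} &= \frac{\lambda_i}{\sum_j \lambda_j}
  && \text{(same share of variance explained)}
\end{aligned}
```

So rescaling every variance and covariance by a common factor changes the eigenvalues but leaves both the loadings and the percentages untouched.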

** Since returns are noisy, the sample correlation matrix is never going to exactly equal the true correlation matrix for that measure. So even if two measures come from processes with equal correlation, their samples are never going to have exactly the same PCs. You can either do a whole bunch of pain-in-the-ass random matrix math to test the null hypothesis that the two PCAs are the same, or you can use sensible judgement and see if the differences are minor enough to probably not reflect any substantial difference. I'd recommend the latter.


Total Posts: 991
Joined: May 2004
Posted: 2016-09-23 05:35
I've screwed around a bit with PCA and RMT (more than anyone should, probably) and I don't think the "first term positive" convention is too good.

As EspressoLover said, the issue is that the sign of PCs is arbitrary. For example, R and Matlab can give you different signs for PCs on the same matrix. Or a slight perturbation in the matrix can flip the signs of some PCs.

Now, let's say you have a time series PC(i,t), where i indexes the PC and t is time. You may want to select the sign of each PC(i,t) such that correl(PC(i,t), PC(i,t-1)) > 0.

Typically, for the first component, you can get ~90% correlation, i.e., your first PC stays almost the same through time. It is very easy to see whether PC(i,t) "stays the same" or "had its sign flipped", as in your example.

As i increases, correl ->0. Your 10th PC is just noise pointing in some random direction... The correlation therefore gives you a clue about how many PCs are non-random and useful.
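A rough sketch of that alignment rule is below. One assumption to flag: it compares consecutive loading vectors with cosine similarity (a normalized dot product) as a stand-in for correl(), and the names `cosine` and `align_to_previous` plus the sample vectors are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two loading vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def align_to_previous(prev, curr):
    """Flip `curr` if it points away from `prev`.

    Returns (aligned_vector, stability), where stability is |cosine| in [0, 1].
    """
    c = cosine(prev, curr)
    aligned = [-x for x in curr] if c < 0 else list(curr)
    return aligned, abs(c)

# Hypothetical PC1 loadings for two consecutive months: same shape, flipped sign.
pc1_prev = [0.57, 0.58, 0.58]
pc1_curr = [-0.56, -0.58, -0.59]
pc1_aligned, pc1_stability = align_to_previous(pc1_prev, pc1_curr)

# Hypothetical high-order PC: successive estimates are nearly orthogonal.
noise_prev = [0.9, -0.3, 0.3]
noise_curr = [0.1, 0.7, 0.7]
_, noise_stability = align_to_previous(noise_prev, noise_curr)

print("PC1 aligned:", pc1_aligned, "stability:", round(pc1_stability, 3))
print("noisy PC stability:", round(noise_stability, 3))
```

The stability value doubles as the diagnostic described above: near 1 for components that persist from one period to the next, drifting toward 0 for components that are mostly noise.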

"Earth: some bacteria and basic life forms, no sign of intelligent life" (Message from a type III civilization probe sent to the solar system circa 2016)