
I'm looking to use PCA to inform my yield curve trading. Using daily changes in yields, I found that 2 or 3 principal components explain 99% of the variance. That matches what I've read about PC1 being parallel moves of the yield curve and PC2 being slope.
I then broke the data up into six one-month periods, hoping to see that the PCs stayed fairly constant over time. Looking only at PC1, five of the periods give similar loadings, and the sixth gives loadings that are similar but with the opposite sign. What does that mean?
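For readers following along, the procedure described in the question (PCA on daily yield changes, then counting how many PCs reach 99% of variance) can be sketched as below. This is a minimal sketch, not the poster's code: the function names `pca_daily_changes` and `n_components_for` are hypothetical, and the example yields are synthetic (a level factor plus a slope factor plus small noise), not real market data.

```python
import numpy as np

def pca_daily_changes(yields, use_correlation=True):
    """PCA of daily yield changes.

    yields: array of shape (n_days, n_tenors) holding yield levels.
    Returns (eigvals, loadings) sorted by decreasing eigenvalue;
    loadings[:, k] holds the tenor loadings of PC k+1.
    """
    changes = np.diff(yields, axis=0)            # daily changes, one row per day
    if use_correlation:
        m = np.corrcoef(changes, rowvar=False)   # same idea as princomp(..., cor=TRUE) in R
    else:
        m = np.cov(changes, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(m)         # symmetric matrix; eigenvalues ascending
    order = np.argsort(eigvals)[::-1]            # reorder to largest first
    return eigvals[order], eigvecs[:, order]

def n_components_for(eigvals, target=0.99):
    """Smallest number of leading PCs whose cumulative variance share reaches target."""
    share = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(share, target) + 1)

# Hypothetical example: 5 tenors driven by a level factor, a slope factor and small noise.
rng = np.random.default_rng(0)
tenors = np.array([1.0, 2.0, 5.0, 10.0, 30.0])
level = np.cumsum(rng.normal(0.0, 0.05, 250))
slope = np.cumsum(rng.normal(0.0, 0.02, 250))
yields = (4.0 + level[:, None] + slope[:, None] * np.log(tenors)[None, :]
          + rng.normal(0.0, 0.002, (250, tenors.size)))
eigvals, loadings = pca_daily_changes(yields)
print(n_components_for(eigvals), eigvals / eigvals.sum())
```

Running the same functions on each one-month block separately gives the per-period loadings the question compares.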




 
goldorak




> Otherwise you risk running into a bug
Distant, but vivid memories.

If you are not living on the edge you are taking up too much space. 


briant57




Nice to see even more ppl trying to trade Brazil rate futures...



Thanks EspressoLover, I like the "first term is positive" convention.
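Since an eigenvector v and -v describe the same component, the "first term is positive" convention just picks one of the two. A minimal sketch, assuming the convention means "flip each PC so its first (shortest-tenor) loading is non-negative"; `first_term_positive` is a hypothetical helper, not a library function. One plausible weakness: when a PC's first loading is close to zero, sampling noise alone can push it across zero and flip the whole vector.

```python
import numpy as np

def first_term_positive(loadings):
    """Flip each column so its first entry is non-negative.

    loadings: (n_tenors, n_pcs) matrix whose columns are eigenvectors.
    Only the arbitrary sign changes; the PCA itself is unchanged.
    """
    signs = np.where(loadings[0, :] < 0, -1.0, 1.0)   # one sign per PC column
    return loadings * signs[None, :]

# Hypothetical loadings: PC1 came out negative, PC2 already has a positive first term.
loadings = np.array([[-0.45,  0.60],
                     [-0.45,  0.30],
                     [-0.45, -0.10]])
fixed = first_term_positive(loadings)
print(fixed)
```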
There are several ways to produce the set of daily changes. If I switch from exchange settlements to some sort of VWAP of trades over the last minute, and the percentage of the variation explained by the first 3 principal components goes up, does that mean my new approach is better?
What if my new data set causes more variation to be explained by PC1 and less by PC3? What can I tell from that?





As long as the correlation matrix is the same*, the PC loadings and percent of variance explained will be the same. So if you're getting significantly** different loadings for different timeframes or price measures, that can only be due to the Epps effect. If the comovement of the prices were perfectly synchronized, the PCs would be the same at every horizon and for every measure.
To that extent, the best measure to use is probably the one that maximizes the percent of variance explained. You want to be using PCs derived from the "true" correlation matrix, i.e. how the instruments would comove absent any trading frictions. Asynchronous trading and discretization almost always decrease comovement. So if one measure has stronger PCs, that's usually the measure least influenced by friction.
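The argument that friction lowers comovement, and therefore lowers the share explained by the leading PC, can be illustrated with a toy simulation. This is a sketch under stated assumptions: the "friction" here is just independent per-tenor noise added to the true changes, a crude stand-in for stale settlements or bid-ask bounce, and not a model of asynchronous trading (the Epps effect proper). All data are simulated.

```python
import numpy as np

def pc1_share(changes):
    """Fraction of total variance explained by PC1 of the correlation matrix."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(changes, rowvar=False))
    return eigvals[-1] / eigvals.sum()   # eigvalsh returns eigenvalues in ascending order

rng = np.random.default_rng(42)
n_days, n_tenors = 2000, 5
factor = rng.normal(size=(n_days, 1))                       # common "true" move
true_changes = factor + 0.3 * rng.normal(size=(n_days, n_tenors))

# Hypothetical friction: independent measurement noise on each tenor.
noisy_changes = true_changes + 0.5 * rng.normal(size=(n_days, n_tenors))

print(pc1_share(true_changes), pc1_share(noisy_changes))
```

The noisier measure has weaker correlations and a smaller PC1 share, which is the pattern the post uses to pick the measure least affected by friction.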
In the end though, I wouldn't sweat the choice too much. It's a pretty noisy measure, so minor methodology decisions are unlikely to have significant impact on the end trading model.
*This assumes that you're using correlation-based PCA (e.g. princomp(..., cor=TRUE) in R). If you're using covariance, then it's still true as long as A = yB, where A and B are the respective covariance matrices and y is a positive scalar. Either way, with asset returns, if two measures have the same correlation matrix, then this holds true for their covariance matrices, except under very weird circumstances.
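The footnote's two claims can be checked numerically: scaling a covariance matrix by a positive scalar y leaves the eigenvectors (up to sign) and the variance fractions unchanged, and correlation-based PCA ignores how each column is scaled. A minimal sketch on arbitrary simulated data; `explained_share` is a hypothetical helper, and the per-column scale factors are made up for illustration.

```python
import numpy as np

def explained_share(m):
    """Variance fractions of the PCs of a symmetric matrix, largest first."""
    eigvals = np.linalg.eigvalsh(m)[::-1]
    return eigvals / eigvals.sum()

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # arbitrary correlated data

# Covariance case: A = y * B with y > 0.
a = np.cov(x, rowvar=False)
y = 2.5
b = a / y
same_shares = np.allclose(explained_share(a), explained_share(b))
same_vectors = np.allclose(np.abs(np.linalg.eigh(a)[1]), np.abs(np.linalg.eigh(b)[1]))

# Correlation case: rescaling columns (e.g. quoting one tenor in bp instead of %)
# leaves the correlation matrix, and hence the correlation-based PCA, unchanged.
x_scaled = x * np.array([1.0, 100.0, 0.01, 7.0])
same_corr = np.allclose(np.corrcoef(x, rowvar=False), np.corrcoef(x_scaled, rowvar=False))

print(same_shares, same_vectors, same_corr)
```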
** Since returns are noisy, the sample correlation matrix is never going to exactly equal the true correlation matrix for that measure. So even if two measures come from processes with equal correlation, their samples are never going to have exactly the same PCs. You can either do a whole bunch of pain-in-the-ass random matrix math to test the null hypothesis that two PCAs are the same, or you can use sensible judgement and see whether the differences are small enough that they're probably not due to any substantial difference. I'd recommend the latter.
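To build the "sensible judgement" the footnote recommends, it can help to see how much PC1 moves from sampling noise alone. The sketch below draws two independent ~250-day samples from the same hypothetical correlation matrix (equicorrelation 0.8 across 5 tenors, an assumption) and compares their PC1 loadings with the absolute cosine similarity. This is not the formal random-matrix test the post mentions; it only gives a feel for how large sampling-only differences are.

```python
import numpy as np

def pc1(changes):
    """Leading eigenvector of the sample correlation matrix (sign arbitrary)."""
    _, vecs = np.linalg.eigh(np.corrcoef(changes, rowvar=False))
    return vecs[:, -1]                       # eigh sorts eigenvalues ascending

def pc_similarity(u, v):
    """|cosine| between two loading vectors; 1 means identical up to sign."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(7)
true_corr = np.full((5, 5), 0.8) + 0.2 * np.eye(5)     # hypothetical "true" correlation
chol = np.linalg.cholesky(true_corr)

# Two independent samples from the *same* process.
sample_1 = rng.normal(size=(250, 5)) @ chol.T
sample_2 = rng.normal(size=(250, 5)) @ chol.T
print(pc_similarity(pc1(sample_1), pc1(sample_2)))      # close to, but not exactly, 1
```

Repeating this with different seeds, or with your own window length and number of tenors, shows roughly how far apart two PCAs can be before the gap looks like a real difference.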



I've screwed around with PCA and RMT (random matrix theory) more than anyone probably should, and I don't think the "first term positive" convention is very good.
As EspressoLover said, the issue is that the sign of PCs is arbitrary. For example, R and Matlab can give you different signs for PCs on the same matrix. Or a slight perturbation in the matrix can flip the signs of some PCs.
Now, let's say you have a time series PC(i,t), where i is the ith PC and t is time. You may want to select the sign of each PC(i,t) such that correl(PC(i,t), PC(i,t-1)) > 0.
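The sign rule above can be sketched as a pass over the periods that flips each PC whenever it points away from the same PC in the previous period. One deliberate deviation, labeled as an assumption: this sketch uses the plain dot product of the loading vectors (an uncentered correlation) rather than a demeaned correl(), because a nearly flat PC1 has almost no variation across tenors, which makes a demeaned correlation unstable. `align_signs` is a hypothetical helper.

```python
import numpy as np

def align_signs(loadings_by_period):
    """Flip signs so each PC points the same way as in the previous period.

    loadings_by_period: list of (n_tenors, n_pcs) arrays, one per period,
    with columns sorted by decreasing eigenvalue. Returns a new list.
    """
    aligned = [np.asarray(loadings_by_period[0], dtype=float).copy()]
    for current in loadings_by_period[1:]:
        cur = np.asarray(current, dtype=float).copy()
        prev = aligned[-1]
        dots = np.sum(cur * prev, axis=0)     # one dot product per PC column
        cur[:, dots < 0] *= -1.0              # flip PCs that point the "wrong" way
        aligned.append(cur)
    return aligned

# Hypothetical example: the same loadings, with signs flipped arbitrarily per period.
rng = np.random.default_rng(0)
base = np.linalg.qr(rng.normal(size=(5, 3)))[0]          # 5 tenors, 3 orthonormal PCs
periods = [base, -base, base * np.array([1.0, -1.0, 1.0])]
aligned = align_signs(periods)
```

After alignment, a period like the sixth one in the question (PC1 similar but negative) simply gets flipped back.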
Typically, for the first component, you can get ~90% correlation, i.e., your first PC stays almost the same through time. It is very easy to see whether PC(i,t) "stayed the same" or "had its sign flipped", as in your example.
As i increases, correl -> 0. Your 10th PC is just noise pointing in some random direction... The correlation therefore gives you a clue about how many PCs are nonrandom and useful.
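The diagnostic in the last paragraph can be sketched as follows: split the daily changes into periods, compute the loadings in each, and average how well each PC lines up with the same PC in the previous period. To sidestep the sign issue this sketch takes the absolute dot product of unit loading vectors; that choice, the helper names `loadings` and `pc_stability`, and the synthetic level-plus-slope data (6 periods of 21 days, 5 tenors) are all assumptions for illustration.

```python
import numpy as np

def loadings(changes):
    """Eigenvectors of the correlation matrix, columns sorted by decreasing eigenvalue."""
    vals, vecs = np.linalg.eigh(np.corrcoef(changes, rowvar=False))
    return vecs[:, np.argsort(vals)[::-1]]

def pc_stability(changes, n_periods):
    """Average |dot product| between each PC and the same PC in the previous period.

    Values near 1: the PC is stable through time (sign aside).
    Values near what random directions would give: the PC is probably noise.
    """
    blocks = [loadings(b) for b in np.array_split(changes, n_periods)]
    dots = [np.abs(np.sum(cur * prev, axis=0)) for prev, cur in zip(blocks, blocks[1:])]
    return np.mean(dots, axis=0)

rng = np.random.default_rng(3)
n_days, tilt = 126, np.linspace(-1.0, 1.0, 5)          # ~6 one-month periods, 5 tenors
level = rng.normal(size=(n_days, 1))
slope = 0.5 * rng.normal(size=(n_days, 1))
changes = level + slope * tilt + 0.1 * rng.normal(size=(n_days, 5))   # hypothetical data
print(pc_stability(changes, n_periods=6))
```

In data like this, the first entries stay close to 1 while the trailing ones fall toward values typical of random directions, which is the pattern the post describes.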

"Earth: some bacteria and basic life forms, no sign of intelligent life" (Message from a type III civilization probe sent to the solar system circa 2016) 

