JTDerp


Total Posts: 66 
Joined: Nov 2013 


I'm pondering a path forward to get a *legitimate* correlation matrix for the P&L returns yielded by several market-strategy combos. There are two types of issues as I see it:
1) returns are asynchronous (i.e. a function of market session hours, strategy order logic, etc)
2) array lengths are unequal - 'series A' might have only 20 data points in a span of, say, 6 months; 'series B' may have 2,000 points in same span. I.e. a very-low-freq strategy versus a relatively-higher one.
The first idea for resolving these issues is to resample to a common frequency, and truncate the series with the longer overall timeframe, then do the correlation. But that's sacrificing accuracy for sake of simplicity, and then there's some assumption about the moments of the returns distributions.
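A minimal sketch of that first idea, assuming each series is a list of (date, pnl) records and that "common frequency" means calendar months. The function names and the sample numbers are made up for illustration, and months where only one series has data are simply dropped, which is one possible reading of "truncate".

```python
from collections import defaultdict
from datetime import date
from math import sqrt

def bucket_pnl(records):
    """Sum (date, pnl) records into calendar-month buckets keyed by (year, month)."""
    buckets = defaultdict(float)
    for day, pnl in records:
        buckets[(day.year, day.month)] += pnl
    return dict(buckets)

def pearson(xs, ys):
    """Plain sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def common_frequency_corr(records_a, records_b):
    """Correlate two irregular P&L series on the months where both have data."""
    a, b = bucket_pnl(records_a), bucket_pnl(records_b)
    months = sorted(set(a) & set(b))  # implicit truncation to the shared span
    r = pearson([a[m] for m in months], [b[m] for m in months])
    return r, len(months)

# Made-up data: A trades rarely, B trades more often and runs one month longer.
series_a = [(date(2020, 1, 15), 1.0), (date(2020, 2, 10), -1.0),
            (date(2020, 3, 20), 2.0)]
series_b = [(date(2020, 1, 3), 0.5), (date(2020, 1, 20), 1.5),
            (date(2020, 2, 5), -2.0), (date(2020, 3, 1), 1.0),
            (date(2020, 3, 30), 3.0), (date(2020, 4, 2), 9.0)]

r, n_months = common_frequency_corr(series_a, series_b)
print(r, n_months)
```

The number of shared buckets that comes back alongside the correlation is the effective sample size, which is what the low-frequency series ends up dictating.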
However, with series A having only 20 records, a correlation calc seems very spurious / unstable. So I figured to separate the several time series (i.e. A & B among others, C/D/E/etc) into groups by some range of sampling frequency (e.g. 1-10 data points per month, 11-50/mo, etc.). Obviously the lower the frequency, the larger a minimum timeframe should be to capture more points. Going further, it seems appropriate to use some process like cross-validation (bagging or k-fold) to estimate a test error...but trying to do a train-test data set split of correlation values for forecasting seems totally wrong, as correlations in financial time series are notoriously unstable. Maybe something along lines of detrending/normalizing the data, doing cointegration, other?
Clearly I'm not a statistician :)
I've seen a phorum thread from '04 with a similar line of questioning here: https://nuclearphynance.com/Show%20Post.aspx?PostIDKey=11957
I've been playing with the R package 'highfrequency' - section 3.2 of this reference: http://highfrequency.herokuapp.com/index.html covers irregularly-spaced data being aggregated (I take it that's synonymous with resampling).
So the two questions are: 1) what to do about the async & unequal nature? 2) what 'rules of thumb' might be valid for minimum sample size to do the correlations?
P.S. I've dumbed-down the problem statement a fair bit, e.g. avoided extra references, for sake of brevity. I'll elaborate in a reply if asked.
"How dreadful...to be caught up in a game and have no idea of the rules." - C.S.



ronin


Total Posts: 585 
Joined: May 2006 


There is a mathematical answer, and there is a real world answer.
In the real world, a strategy that is flat is returning zero. Just plug zeros in the time series. It solves the out of hours problem too - you are presumably looking at close to close, not open to close. The London strategy will have returned something from yesterday's NY close to today's NY close - that's what you use. The only issue is whether there is funding cost when it is not trading.
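As a rough sketch of the zero-plugging idea: put both strategies on one daily close-to-close grid and record 0.0 on any day a strategy was flat. The helper names and the P&L numbers are invented for illustration, the grid uses plain calendar days with no holiday calendar, and funding cost is ignored.

```python
from datetime import date, timedelta
from math import sqrt

def daily_grid(start, end):
    """Every calendar day from start to end inclusive (no holiday calendar)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def zero_fill(pnl_by_day, grid):
    """Close-to-close P&L on the grid, with 0.0 on days the strategy was flat."""
    return [pnl_by_day.get(day, 0.0) for day in grid]

def pearson(xs, ys):
    """Plain sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

grid = daily_grid(date(2020, 1, 6), date(2020, 1, 10))

# Made-up numbers: A has P&L on two days only, B has P&L every day.
pnl_a = {date(2020, 1, 7): 1.0, date(2020, 1, 9): -2.0}
pnl_b = dict(zip(grid, [0.3, 0.8, -0.1, -1.5, 0.2]))

filled_a = zero_fill(pnl_a, grid)
filled_b = zero_fill(pnl_b, grid)
r = pearson(filled_a, filled_b)
print(filled_a, r)
```

Once every series lives on the same grid, the array lengths are equal by construction and a full correlation matrix can be built pairwise.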
Mathematically, it's less straightforward.
The unequal sampling is easier. Correlation is just regression. There is no reason you can't regress them when they are unequally sampled. It's one run of solver in excel, or use LINEST(). LINEST should also return the stats like confidence intervals etc.
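To make the "correlation is just regression" point concrete, here is a small sketch that fits an ordinary least-squares line and reports the correlation from the same sums. The confidence interval uses Fisher's z-transform, which is my choice of method rather than anything LINEST is documented to do here, and it assumes independent, roughly bivariate-normal pairs that have already been matched up. Function names are illustrative.

```python
from math import atanh, sqrt, tanh

def ols_and_corr(xs, ys):
    """Return (slope, intercept, r) for the regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)  # equals slope * sqrt(sxx / syy)
    return slope, intercept, r

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% interval for a correlation r estimated from n pairs."""
    z = atanh(r)
    se = 1.0 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

slope, intercept, r_line = ols_and_corr([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(slope, intercept, r_line)

# Same point estimate, very different certainty.
lo_20, hi_20 = fisher_ci(0.5, 20)
lo_2000, hi_2000 = fisher_ci(0.5, 2000)
print(lo_20, hi_20)
print(lo_2000, hi_2000)
```

With 20 pairs, an estimated correlation of 0.5 is compatible with anything from roughly 0.07 to 0.77 under these assumptions, while 2,000 pairs narrow that to roughly 0.47 to 0.53. That is one way to put a number on the minimum-sample-size question.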
Asynchronicity is more difficult, fwiw. The basic idea is that you interpolate prices from yesterday's close to today's open. It should really be Brownian bridge interpolation, but good luck estimating the out of hours vols. 
"There is a SIX am?" - Arthur


nikol


Total Posts: 1124 
Joined: Jun 2005 


Calculate, Eq[X], Eq[Y], Eq[X^2], Eq[Y^2] and Eq[X*Y]
The rest you know. Key is in choice of "q". Simplest - time-weighted EMA.
Or you can define X and Y as Brownian (any other?) with Poisson sampling and unknown correlation which you can extract with log-likelihood. If time series are shifted (X=NY close and Y=HK open) then you assume some process and correlate X propagated through the night with Y.
Or you can build bivariate (empirical) CDF and hang some (Gaussian) copula on that.





JTDerp




"Just plug zeros in the time series."
-> That would of course add downward bias to the correlation, but maybe that's appropriate for such a shifty proposition as this. Will give it a roll.
"It solves the out of hours problem too - you are presumably looking at close to close, not open to close."
-> CtC would help with the async issue...the thread that I mentioned in OP which NeroTulip started went into sample calcs in his last post, and 'jungle' posited to pick a time somewhere as your own "fix"...that seems fair when the frequency of trades for each market-strategy combo has some minimum threshold and up to some max, and a common overlap of session times, but I suppose it's still warranted to separate the super-low-freq strategies from the higher-freq ones and into several sub-portfolios, as it were.
Thanks for the Excel & LINEST() reference...making confidence intervals of these correlations was the general direction I was looking toward.



gaj


Total Posts: 103 
Joined: Apr 2018 


Playing devil's advocate: strategy A is buy and hold S&P500 ETF in Asia, strategy B is buy and hold S&P500 ETF in the US. The correlation should be 100%, but I'm pretty sure no amount of statistics can give you 100% correlation just by comparing daily returns of S&P500 at Asia close vs US close.
My first instinct is to downsample to a lower frequency, i.e., compare the weekly, monthly, or even quarterly P&Ls. Then the asynchronous part gets negligible compared to the overlapping period. In the case of asynchronous ETFs above, the correlation should get closer and closer to 100% as the downsampled period gets longer. But you need a large amount of data to get a good estimate. 
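The devil's-advocate case can be checked with a toy simulation. This is only a sketch: it assumes one underlying index with i.i.d. Gaussian hourly moves, strategy A marked at hour 8 and strategy B at hour 21 of each day. Those hours, the seed and all names are arbitrary choices for illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate_marks(n_days=2000, close_a=8, close_b=21, seed=0):
    """Mark the same random-walk index at two different daily closing hours."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(24 * (n_days + 1)):
        path.append(path[-1] + rng.gauss(0.0, 1.0))
    marks_a = [path[24 * d + close_a] for d in range(n_days + 1)]
    marks_b = [path[24 * d + close_b] for d in range(n_days + 1)]
    return marks_a, marks_b

def horizon_returns(marks, k):
    """Non-overlapping k-day changes in the marked level."""
    return [marks[i + k] - marks[i] for i in range(0, len(marks) - k, k)]

marks_a, marks_b = simulate_marks()
corr_by_horizon = {}
for k in (1, 5, 20):
    corr_by_horizon[k] = pearson(horizon_returns(marks_a, k),
                                 horizon_returns(marks_b, k))
    print(k, round(corr_by_horizon[k], 3))
```

In this toy setup the two k-day returns share 24k - 13 of their 24k hourly moves, so the population correlation is (24k - 13)/(24k): about 0.46 for daily returns and about 0.97 at 20 days. The price is the sample count, since 2,000 days leave only 100 non-overlapping 20-day returns.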




JTDerp




"Calculate, Eq[X], Eq[Y], Eq[X^2], Eq[Y^2] and Eq[X*Y]"
Sorry Nikol, I'm dense at the moment...by 'Eq', are you alluding to a spline/curve-fit type of parameterization of the returns?


nikol




No, usual notation for something like Eq[x] = \Int x * Prob(x) * dx
for example, for EMA it is iterative E[x,t] = x(t)*wgt + E[x,t-1]*(1-wgt) with wgt(t) ~ (period-1)/(period+1) or wgt(t) ~ exp(t_last - t)/period
choose q = choose Prob(.)
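A possible reading of the recipe in code: keep five running exponentially weighted averages, for X, Y, X^2, Y^2 and X*Y, then form the correlation as (E[XY] - E[X]E[Y]) / sqrt((E[X^2] - E[X]^2)(E[Y^2] - E[Y]^2)). This sketch uses a time-decay weight, where the old estimate keeps a fraction exp(-dt/tau) across a gap of dt. The parameter tau, the (t, x, y) input layout and the function name are assumptions made for illustration, and the observations are assumed to be already paired.

```python
from math import exp, sqrt

def ema_correlation(obs, tau):
    """Correlation from time-decayed averages of x, y, x^2, y^2 and x*y.

    obs: list of (t, x, y) tuples sorted by time t.
    tau: decay time scale, in the same units as t.
    """
    t_prev, x0, y0 = obs[0]
    ex, ey = x0, y0
    exx, eyy, exy = x0 * x0, y0 * y0, x0 * y0
    for t, x, y in obs[1:]:
        keep = exp(-(t - t_prev) / tau)  # share retained by the old estimate
        new = 1.0 - keep                 # share given to the new observation
        ex = keep * ex + new * x
        ey = keep * ey + new * y
        exx = keep * exx + new * x * x
        eyy = keep * eyy + new * y * y
        exy = keep * exy + new * x * y
        t_prev = t
    cov = exy - ex * ey
    var_x = exx - ex * ex
    var_y = eyy - ey * ey
    return cov / sqrt(var_x * var_y)

# Made-up, irregularly spaced observations with y = 2x + 1 exactly.
obs = [(0, 1.0, 3.0), (1, -0.5, 0.0), (4, 2.0, 5.0),
       (5, 0.3, 1.6), (9, -1.2, -1.4)]
rho = ema_correlation(obs, tau=5.0)
print(rho)
```

Because the weights are non-negative and sum to one, they act as the Prob(.) in the notation above, so the result always stays between -1 and 1. Changing tau is one way of choosing a different q.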





JTDerp




Thanks for clarifying.
Going further with correlation modeling & into trying to forecast them, something I picked up from one of Rob Carver's recent blog posts:
"The maximum expected risk measure assumes that Gaussian risk is sufficient, and that we can forecast its components (correlation, and standard deviation). Now let's relax that assumption. Correlation risk is the risk that instrument correlations will do scary unusual things, that happen to be bad for my position. If this has already happened (i.e. we have a correlation factor problem) then it will be dealt with in the expected risk calculation, that uses recent historic returns to calculate the instrument correlation. But what if it is about to happen?
There is a very simple way of dealing with this, which is that we replace the estimated correlation matrix with the worst possible correlation matrix. Then we re-estimate our expected risk, and plug it into a risk multiplier formula. Because we're shocking the correlations to the extreme, we allow expected risk to be 4 times larger than our target.
(There is no justification for this number 4, it's calibrated to target a particular point on the realised distribution of the estimate of relative risk.)" 
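One way to read the quoted passage, as a sketch and not Carver's actual code: for given position weights, the risk-maximising correlation matrix has +1 between positions of the same sign and -1 between positions of opposite sign, so shocked risk becomes the sum of |weight| * vol. The multiplier formula, the function names and the numbers below are my assumptions for illustration.

```python
from math import sqrt

def portfolio_risk(weights, vols, corr):
    """Standard deviation of the portfolio for given weights, vols and correlations."""
    n = len(weights)
    var = sum(weights[i] * vols[i] * corr[i][j] * weights[j] * vols[j]
              for i in range(n) for j in range(n))
    return sqrt(var)

def worst_case_corr(weights):
    """+1 between same-sign positions, -1 between opposite-sign positions."""
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return [[si * sj for sj in signs] for si in signs]

def risk_multiplier(shocked_risk, target_risk, max_ratio=4.0):
    """Scale positions down only if shocked risk exceeds max_ratio * target."""
    return min(1.0, max_ratio * target_risk / shocked_risk)

# Made-up two-instrument book: long one, short the other.
weights = [1.0, -0.5]
vols = [0.2, 0.4]
estimated_corr = [[1.0, 0.3], [0.3, 1.0]]

est_risk = portfolio_risk(weights, vols, estimated_corr)
shocked_risk = portfolio_risk(weights, vols, worst_case_corr(weights))
mult = risk_multiplier(shocked_risk, target_risk=0.05)
print(est_risk, shocked_risk, mult)
```

In this example the positive estimated correlation partly hedges the long/short book, and the shock flips it to -1 so the two legs add up instead.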


nikol




All this means that Carver's risk engine structurally embeds correlation as a parameter. Likely, he estimates correlation and is able to estimate its confidence level (CL) as well. So, perhaps he sets the correlation range as the min-max over some past historic period, including estimation uncertainty (CL), and plugs the worst-case scenario into his risk engine.
He could use a Bayesian approach to get a similar result.





ronin




> Going further with correlation modeling & into trying to forecast them, something I picked up from one of Rob Carver's recent blog posts:
Bumping the correlation matrix is fine. That's correlation trading 101. But it doesn't solve your problems with correlation.
Which is that sh.t has fat tails. Correlation only knows about what happens ATM. You are looking at ATM, hoping that it will tell you what happens in the tails. And then you'll be surprised and disappointed when it doesn't. 


deeds


Total Posts: 478 
Joined: Dec 2008 


In the past have considered conditional independence in Gaussian field (with log surprise measure) or non-parametric setting to try to understand correlation in 'tails'.
I think an algebraic approach (-> Pearl, Diaconis) here offers something
(does compressed sensing give us a signal processing type framework which could inform original question, though with an exotic perspective?)
Sensible?
Other alternatives in this direction?




