JTDerp


Total Posts: 85 
Joined: Nov 2013 


I'm pondering a path forward to get a *legitimate* correlation matrix for the P&L returns yielded by several market-strategy combos. There are two types of issues as I see it:
1) returns are asynchronous (i.e. a function of market session hours, strategy order logic, etc.)
2) array lengths are unequal - 'series A' might have only 20 data points in a span of, say, 6 months; 'series B' may have 2,000 points in the same span. I.e. a very-low-freq strategy versus a relatively-higher one.
The first idea for resolving these issues is to resample to a common frequency, and truncate the series with the longer overall timeframe, then do the correlation. But that's sacrificing accuracy for sake of simplicity, and then there's some assumption about the moments of the returns distributions.
However, with series A having only 20 records, a correlation calc seems very spurious / unstable. So I figured to separate the several time series (i.e. A & B among others, C/D/E/etc.) into groups by some range of sampling frequency (e.g. 1-10 data points per month, 11-50/mo, etc.). Obviously the lower the frequency, the larger a minimum timeframe should be to capture more points. Going further, it seems appropriate to use some process like cross-validation (bagging or k-fold) to estimate a test error...but trying to do a train-test data set split of correlation values for forecasting seems totally wrong, as correlations in financial time series are notoriously unstable. Maybe something along the lines of detrending/normalizing the data, doing cointegration, other?
Clearly I'm not a statistician :)
I've seen a phorum thread from '04 with a similar line of questioning here: https://nuclearphynance.com/Show%20Post.aspx?PostIDKey=11957
I've been playing with the R package 'highfrequency' - section 3.2 of this reference: http://highfrequency.herokuapp.com/index.html covers irregularly-spaced data being aggregated (I take it that's synonymous with resampling).
So the two questions are: 1) what to do about the async & unequal nature? 2) what 'rules of thumb' might be valid for minimum sample size to do the correlations?
P.S. I've dumbed-down the problem statement a fair bit, e.g. avoided extra references, for the sake of brevity. I'll elaborate in a reply if asked.
"How dreadful...to be caught up in a game and have no idea of the rules." - C.S.



ronin


Total Posts: 690 
Joined: May 2006 


There is a mathematical answer, and there is a real world answer.
In the real world, a strategy that is flat is returning zero. Just plug zeros in the time series. It solves the out of hours problem too - you are presumably looking at close to close, not open to close. The London strategy will have returned something from yesterday's NY close to today's NY close - that's what you use. The only issue is whether there is funding cost when it is not trading.
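A minimal sketch of the zero-fill idea, in Python. The P&L numbers, the integer day index and the helper names (`zero_filled`, `pearson`) are all made up for illustration; it assumes both strategies are already marked close to close on one shared calendar.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def zero_filled(pnl_by_day, calendar):
    """Daily P&L on a shared calendar; days with no recorded P&L count as 0.0."""
    return [pnl_by_day.get(day, 0.0) for day in calendar]

# Hypothetical close-to-close P&L keyed by day index (illustrative numbers only).
pnl_a = {2: 1.5, 7: -0.8}                             # low-frequency strategy
pnl_b = {d: 0.1 * ((d % 3) - 1) for d in range(10)}   # has P&L every day

calendar = range(10)
a = zero_filled(pnl_a, calendar)
b = zero_filled(pnl_b, calendar)
rho = pearson(a, b)   # both series now have the same length and the same clock
```

Both series end up with one point per calendar day, so the unequal-length problem goes away; whether the zeros distort the estimate is a separate question.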
Mathematically, it's less straightforward.
The unequal sampling is easier. Correlation is just regression. There is no reason you can't regress them when they are unequally sampled. It's one run of solver in excel, or use LINEST(). LINEST should also return the stats like confidence intervals etc.
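For a rough feel of how much a correlation estimate can be trusted at a given sample size, one common approximation is the Fisher z-transform interval. The sketch below assumes i.i.d., roughly bivariate-normal returns (a strong assumption for P&L series), and `fisher_ci` is a hypothetical helper name; it is not what LINEST computes.

```python
from math import atanh, tanh, sqrt

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a sample correlation r from n pairs.

    Uses the Fisher z-transform; assumes i.i.d. bivariate-normal data.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = atanh(r)                     # transform r to an approximately normal scale
    half = z_crit / sqrt(n - 3)      # standard error of z is 1/sqrt(n - 3)
    return tanh(z - half), tanh(z + half)

# Same sample correlation, very different sample sizes (illustrative values).
lo20, hi20 = fisher_ci(0.3, 20)
lo2000, hi2000 = fisher_ci(0.3, 2000)
```

Under these assumptions, a sample correlation of 0.3 from 20 points gives an interval of roughly -0.16 to 0.66, versus roughly 0.26 to 0.34 from 2,000 points. Fat tails and autocorrelation would make the real intervals wider.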
Asynchronicity is more difficult, fwiw. The basic idea is that you interpolate prices from yesterday's close to today's open. It should really be Brownian bridge interpolation, but good luck estimating the out of hours vols.
"There is a SIX am?" - Arthur


nikol


Total Posts: 1403 
Joined: Jun 2005 


Calculate, Eq[X], Eq[Y], Eq[X^2], Eq[Y^2] and Eq[X*Y]
The rest you know. Key is in choice of "q". Simplest - time-weighted EMA.
Or you can define X and Y as Brownian (any other?) with Poisson sampling and unknown correlation, which you can extract with log-likelihood. If the time series are shifted (X=NY close and Y=HK open) then you assume some process and correlate X propagated through the night with Y.
Or you can build bivariate (empirical) CDF and hang some (gaussian) copula on that.

... What is a man
If his chief good and market of his time
Be but to sleep and feed? (c) 



JTDerp


Total Posts: 85 
Joined: Nov 2013 


"Just plug zeros in the time series."
> That would of course add downward bias to the correlation, but maybe that's appropriate for such a shifty proposition as this. Will give it a roll.
"It solves the out of hours problem too - you are presumably looking at close to close, not open to close."
> CtC would help with the async issue...the thread that I mentioned in OP which NeroTulip started went into sample calcs in his last post, and 'jungle' posited to pick a time somewhere as your own "fix"...that seems fair when the frequency of trades for each market-strategy combo has some minimum threshold and up to some max, and a common overlap of session times, but I suppose it's still warranted to separate the super-low-freq strategies from the higher-freq ones and into several sub-portfolios, as it were.
Thanks for the Excel & LINEST() reference...making confidence intervals of these correlations was the general direction I was looking toward.


gaj


Total Posts: 117 
Joined: Apr 2018 


Playing devil's advocate: strategy A is buy and hold S&P500 ETF in Asia, strategy B is buy and hold S&P500 ETF in the US. The correlation should be 100%, but I'm pretty sure no amount of statistics can give you 100% correlation just by comparing daily returns of S&P500 at Asia close vs US close.
My first instinct is to downsample to a lower frequency, i.e., compare the weekly, monthly, or even quarterly P&Ls. Then the asynchronous part gets negligible compared to the overlapping period. In the case of asynchronous ETFs above, the correlation should get closer and closer to 100% as the downsampled period gets longer. But you need a large amount of data to get a good estimate. 
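A toy simulation of the downsampling argument, as a sketch only. It assumes one underlying return stream made of i.i.d. Gaussian half-day increments, observed as two "daily" series whose closes are offset by half a day; the names `us`, `asia` and `block_sums` are hypothetical. In this setup the theoretical correlation of k-day sums is (2k-1)/(2k).

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def block_sums(xs, k):
    """Non-overlapping k-period sums, i.e. returns downsampled by a factor k."""
    return [sum(xs[i:i + k]) for i in range(0, len(xs) - k + 1, k)]

rng = random.Random(0)
n_days = 10_000
# Half-day increments of the single underlying return stream (assumed i.i.d.).
half = [rng.gauss(0.0, 1.0) for _ in range(2 * n_days + 1)]

# Same asset, but the two daily closes are half a day apart.
us = [half[2 * d] + half[2 * d + 1] for d in range(n_days)]
asia = [half[2 * d + 1] + half[2 * d + 2] for d in range(n_days)]

# Correlation at daily, weekly (5-day) and roughly monthly (20-day) horizons.
corr_by_horizon = {
    k: pearson(block_sums(us, k), block_sums(asia, k)) for k in (1, 5, 20)
}
```

The estimates should sit near 0.5, 0.9 and 0.975 respectively, but note the 20-day figure is built from only 500 points out of 10,000 days, which is the data cost mentioned above.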




JTDerp


Total Posts: 85 
Joined: Nov 2013 


"Calculate, Eq[X], Eq[Y], Eq[X^2], Eq[Y^2] and Eq[X*Y]"
Sorry Nikol, I'm dense at the moment...by 'Eq', are you alluding to a spline/curve-fit type of parameterization of the returns?


nikol


Total Posts: 1403 
Joined: Jun 2005 


No, usual notation for something like Eq[x] = \int x * Prob(x) * dx
for example, for EMA it is iterative: E[x,t] = x(t)*wgt + E[x,t-1]*(1-wgt) with wgt(t) ~ (period-1)/(period+1) or wgt(t) ~ exp(t_last - t)/period
choose q = choose Prob(.)
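A rough sketch of the five-moment recipe with a time-weighted EMA as the choice of q. Assumptions that are mine and not from the posts above: observations arrive as (t, x, y) triples with both series observed at the same irregular times, the weight on a new point is 1 - exp(-dt/tau), and `ema_correlation` and `tau` are hypothetical names.

```python
from math import exp, sqrt

def ema_correlation(obs, tau):
    """Correlation from EMA estimates of E[X], E[Y], E[X^2], E[Y^2], E[XY].

    obs: iterable of (t, x, y) sorted by time t (irregular spacing allowed).
    tau: decay time constant, in the same units as t.
    """
    it = iter(obs)
    t_prev, x, y = next(it)
    # Initialise every moment at the first observation.
    m = {"x": x, "y": y, "xx": x * x, "yy": y * y, "xy": x * y}
    for t, x, y in it:
        w = 1.0 - exp(-(t - t_prev) / tau)   # longer gap -> more weight on new point
        new = {"x": x, "y": y, "xx": x * x, "yy": y * y, "xy": x * y}
        for key in m:
            m[key] += w * (new[key] - m[key])  # same as new*w + old*(1 - w)
        t_prev = t
    var_x = m["xx"] - m["x"] ** 2
    var_y = m["yy"] - m["y"] ** 2
    cov = m["xy"] - m["x"] * m["y"]
    return cov / sqrt(var_x * var_y)

# Irregularly spaced toy data where y is an exact linear function of x.
times = [0.0, 0.5, 3.0, 3.2, 7.0, 12.0]
xs = [0.1, -0.2, 0.4, 0.0, -0.3, 0.2]
obs = [(t, x, 2.0 * x + 1.0) for t, x in zip(times, xs)]
rho = ema_correlation(obs, tau=5.0)
```

Because the EMA weights form a proper probability weighting, an exact linear relationship comes out with correlation 1 up to rounding, which is a cheap sanity check before trying real data.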




JTDerp


Total Posts: 85 
Joined: Nov 2013 


Thanks for clarifying.
Going further with correlation modeling & into trying to forecast them, something I picked up from one of Rob Carver's recent blog posts:
"The maximum expected risk measure assumes that Gaussian risk is sufficient, and that we can forecast it's components (correlation, and standard deviation). Now let's relax that assumption. Correlation risk is the risk that instrument correlations will do scary unusual things, that happen to be bad for my position. If this has already happened (i.e. we have a correlation factor problem) then it will be dealt with in the expected risk calculation, that uses recent historic returns to calculate the instrument correlation. But what if it is about to happen?
There is a very simple way of dealing with this, which is that we replace the estimated correlation matrix with the worst possible correlation matrix. Then we re-estimate our expected risk, and plug it into a risk multiplier formula. Because we're shocking the correlations to the extreme, we allow expected risk to be 4 times larger than our target.
(There is no justification for this number 4, it's calibrated to target a particular point on the realised distribution of the estimate of relative risk.)"


nikol


Total Posts: 1403 
Joined: Jun 2005 


All this means that Carver's risk engine structurally embeds correlation as a parameter. Likely, he estimates correlation and is able to estimate its confidence level (CL) as well. So perhaps he sets the correlation range as the min-max over some past historical period, including estimation uncertainty (CL), and plugs the worst-case scenario into his risk engine.
He could use a Bayesian approach to get a similar result.
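One possible reading of the "worst possible correlation matrix" idea, as a sketch only. The assumptions here are mine: the worst case sets each pairwise correlation to +1 or -1 according to the sign of the two positions, so that nothing diversifies, and the multiplier is taken as min(1, 4 * target / worst-case risk). Neither is confirmed as Carver's actual formula, and the weights, vols and correlations are made up.

```python
from math import sqrt

def portfolio_risk(weights, vols, corr):
    """Portfolio standard deviation sqrt(e' C e), with exposures e_i = w_i * vol_i."""
    e = [w * v for w, v in zip(weights, vols)]
    n = len(e)
    return sqrt(sum(e[i] * corr[i][j] * e[j] for i in range(n) for j in range(n)))

def worst_case_corr(weights):
    """Assumed worst case: +1 between same-sign positions, -1 between opposite ones."""
    sign = [1.0 if w >= 0 else -1.0 for w in weights]
    return [[si * sj for sj in sign] for si in sign]

# Illustrative portfolio: two longs and one short.
weights = [0.5, -0.3, 0.4]
vols = [0.20, 0.15, 0.25]
est_corr = [[1.0, 0.6, 0.2],
            [0.6, 1.0, 0.1],
            [0.2, 0.1, 1.0]]

est_risk = portfolio_risk(weights, vols, est_corr)
worst_risk = portfolio_risk(weights, vols, worst_case_corr(weights))  # = sum |w_i| * vol_i

target_risk = 0.05
# Assumed multiplier form: scale positions down only if worst-case risk exceeds 4x target.
multiplier = min(1.0, 4.0 * target_risk / worst_risk)
```

With these numbers the worst-case risk is 0.245 against a cap of 0.20, so positions would be scaled to about 82% of their size under the assumed formula.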




ronin


Total Posts: 690 
Joined: May 2006 


> Going further with correlation modeling & into trying to forecast them, something I picked up from one of Rob Carver's recent blog posts:
Bumping the correlation matrix is fine. That's correlation trading 101. But it doesn't solve your problems with correlation.
Which is that sh.t has fat tails. Correlation only knows about what happens ATM. You are looking at ATM, hoping that it will tell you what happens in the tails. And then you'll be surprised and disappointed when it doesn't.


deeds


Total Posts: 516 
Joined: Dec 2008 


In the past have considered conditional independence in gaussian field (with log surprise measure) or nonparametric setting to try to understand correlation in 'tails'.
I think an algebraic approach (-> Pearl, Diaconis) here offers something
(does compressed sensing give us a signal processing type framework which could inform original question, though with an exotic perspective?)
Sensible?
Other alternatives in this direction?





JTDerp


Total Posts: 85 
Joined: Nov 2013 


> In the past have considered conditional independence in gaussian field (with log surprise measure) or nonparametric setting to try to understand correlation in 'tails'.
By 'conditional independence' are you alluding to a binary type of filter for deciding when correlations are (semi) stable? A crude example: "if volatility of A is less than x, and vol of B is less than y, then 1 else 0"?
On a side note, much thanks to you, ronin. Your perspective on this and many other threads is very useful/pragmatic.


nikol


Total Posts: 1403 
Joined: Jun 2005 


found in my archive directory https://epubs.siam.org/doi/pdf/10.1137/1.9781611972757.74
discussion is nearly basic, but it gives good basis. 



deeds


Total Posts: 516 
Joined: Dec 2008 


thank you! very grateful
regarding your signature - Before enlightenment: chopping wood, carrying water. After enlightenment: chopping wood, carrying water.
EDIT: JTDerp - apologies for delay in replying...will find a way to summarize.



nikol


Total Posts: 1403 
Joined: Jun 2005 


The main message I took away from that paper is that, in order to calibrate on async/uneven data: 1. we have to set up a process/model of price evolution and then 2. fit all data points to this model while adjusting parameters.
Clearly, this approach is sensitive to the model choice. However, that can be resolved via "Bayesian model selection" or something. Of course, the outcome will depend on the selected "space of models". And I think this is the best we can do here.
PS. don't you like my signature?? how dare you?



deeds


Total Posts: 516 
Joined: Dec 2008 


...like the signature a lot...think about the point a lot...trying to provide the seeds of an answer? 




contango


I experimented with something like this in general. Do your overlapping samples not provide sufficient insight?
Could you train something to learn the overlapping dynamic then test out of sample? Simulation based methods may allow you to get an "idea" of what would be happening when market1 open and market2 closed etc. 




nikol


Total Posts: 1403 
Joined: Jun 2005 


Indeed, controlled/supervised: AI/ML training over MC-generated samples (with a set of fixed models), and then unsupervised: transfer learning on real-life data, could be an interesting approach.


JTDerp


Total Posts: 85 
Joined: Nov 2013 


@contango: the issue seems to be more due to the large differential in frequency of returns...spurious correlations. So the unequal-length and asynchronous features are both inherent.
I've shelved this portion of the overall routine for the past few months to let ideas percolate - I paused with an idea to simply subdivide the collection of market-strategy combinations by frequency, so lower-frequency grouped in one "sub-portfolio" (what constitutes "lower-frequency" could be overfit/goal-seek'd too much & create another rabbit hole due to snooping), medium-freq, etc. Rob Carver's writings about portfolio optimization definitely frown on that, and I hold his views on avoiding overfitting in high regard, so it's still percolatin'.


