Mistro


Total Posts: 9 
Joined: Aug 2018 


Hi everyone, I am building a model for relative value equity options trading and could use some help with some of the statistical aspects. The two assets under the microscope are VNQ and SPY. Using OLS, we get a very nice relationship when we take the log of both their volatilities: log(Spy_vol) ~ log(VNQ_vol). The problem is that the autocorrelation of the residuals is very high.
I am looking at ARIMA models (generalized least squares did not seem to give a better fit), but I am not sure how to apply them to this data set, or whether there are other alternatives. Below are the OLS regression and the ACF of its residuals.
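For anyone wanting to reproduce the diagnostic, here is a minimal sketch of the setup described above: OLS of one log-vol series on the other, then the sample ACF of the residuals. The series are synthetic stand-ins (hypothetical random walks, not real SPY/VNQ data), and `ols_fit` and `acf` are illustrative helper names, not from any library.

```python
import numpy as np

def ols_fit(x, y):
    """OLS of y on [1, x]; returns (alpha, beta, residuals)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef[0], coef[1], resid

def acf(e, nlags):
    """Sample autocorrelation at lags 0..nlags (the usual biased estimator used in ACF plots)."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    denom = e @ e
    return np.array([(e[: len(e) - k] @ e[k:]) / denom for k in range(nlags + 1)])

# Hypothetical synthetic stand-ins for the log 20-day vol series (not real data).
rng = np.random.default_rng(0)
log_vnq_vol = np.log(0.15) + np.cumsum(rng.normal(0.0, 0.05, 500))
log_spy_vol = -0.2 + 0.8 * log_vnq_vol + np.cumsum(rng.normal(0.0, 0.02, 500))

alpha, beta, resid = ols_fit(log_vnq_vol, log_spy_vol)
rho = acf(resid, 5)
print(f"alpha={alpha:.3f}  beta={beta:.3f}  lag-1 residual ACF={rho[1]:.2f}")
```

A lag-1 residual ACF close to 1, as here, means the errors behave almost like a random walk, so the usual OLS standard errors cannot be trusted.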

men lie, women lie, numbers don't 



day1pnl


Total Posts: 50 
Joined: Jun 2017 


Does your time series consist of changes in vol ("returns") or outright vol ("levels")?
To give some guidance, here is how I would proceed:
0. Carve out the last 10% of the time points of your time series from your data set and put them in a drawer without ever looking at them.
1. Do some exploratory data analysis on the remaining data without making too many probabilistic assumptions or transformations (box plots, marginal empirical distributions of your response and covariate, their joint empirical distribution, scatter plots of x vs. y, etc.).
2. Write down your parametrized statistical model. E.g. it might simply be R(SPY_i) = alpha + beta * R(VNQ_i) + gamma * r_i + epsi_i for i = 1, ..., n, where the epsi_i are independent normals and R(*) is the daily return. The r_i could be anything you think makes economic sense to add, like an interbank rate or whatever. This is basically just the CAPM equation. For vols, I would try parametrizing from that starting point; just be mindful that if you apply a nonlinear transformation to your returns SPY_i, the original and the transformed series can't both be normally distributed. Also, in this case both your covariate and your response are stochastic, so in addition to considering on which scale normality applies, you need to consider whether your approach is going to be based on the full (joint) likelihood function or on the conditional likelihood from your regression.
3. Find estimators for the parameters by MLE and get confidence intervals.
4. Try to use your model to predict the last 10% of observations you put in the drawer.
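Steps 0, 3 and 4 can be sketched as follows, assuming the simple CAPM-style model from step 2 with iid normal errors. Under that assumption the conditional MLE of (alpha, beta, gamma) coincides with OLS and the MLE of sigma^2 is RSS/n. The data, the `r_rate` covariate and the helper names are hypothetical, and the 95% intervals are plain Wald intervals that rely on the iid assumption.

```python
import numpy as np

def holdout_index(n, frac=0.10):
    """Step 0: index where the last `frac` of the sample (the 'drawer') begins."""
    return int(np.floor(n * (1.0 - frac)))

def fit_gaussian_regression(X, y):
    """Conditional Gaussian MLE for y = X @ theta + eps, eps ~ iid N(0, sigma^2).

    The MLE of theta equals OLS and sigma^2_MLE = RSS / n. The 95% Wald
    intervals assume iid errors, which overlapping vol data will violate."""
    n = X.shape[0]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta
    sigma2 = resid @ resid / n
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    ci = np.column_stack([theta - 1.96 * se, theta + 1.96 * se])
    return theta, sigma2, ci

# Hypothetical daily data: VNQ returns, an "r_i" covariate, and SPY built from them.
rng = np.random.default_rng(42)
n = 2000
r_vnq = rng.normal(0.0, 0.012, n)
r_rate = rng.normal(0.0, 0.002, n)
r_spy = 0.0002 + 0.6 * r_vnq + 0.1 * r_rate + rng.normal(0.0, 0.005, n)

X = np.column_stack([np.ones(n), r_vnq, r_rate])   # columns: alpha, beta, gamma
cut = holdout_index(n)                              # fit only on the first 90%
theta, sigma2, ci = fit_gaussian_regression(X[:cut], r_spy[:cut])

# Step 4: open the drawer and score the out-of-sample predictions.
pred = X[cut:] @ theta
rmse = np.sqrt(np.mean((r_spy[cut:] - pred) ** 2))
print("theta:", theta.round(4), "out-of-sample RMSE:", round(float(rmse), 5))
```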



Mistro


Total Posts: 9 
Joined: Aug 2018 


It is the 20-day Yang-Zhang vol. Here is the non-log plot for a better understanding.
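For reference, a minimal sketch of a rolling 20-day Yang-Zhang estimator, assuming daily OHLC bars, the usual weighting k = 0.34 / (1.34 + (n + 1) / (n - 1)) and annualization with 252 trading days. The function names are hypothetical. Note that consecutive rolling values share 19 of their 20 days of data.

```python
import math
from statistics import mean, variance

def yang_zhang_vol(opens, highs, lows, closes, trading_days=252):
    """Annualized Yang-Zhang volatility over one window.

    Inputs hold n + 1 daily bars; bar 0 only supplies the previous close, so
    the estimate covers n = len(closes) - 1 days:
        sigma^2 = var(overnight) + k * var(open-to-close) + (1 - k) * mean(RS),
        k = 0.34 / (1.34 + (n + 1) / (n - 1)).
    """
    n = len(closes) - 1
    if n < 2:
        raise ValueError("need at least 3 bars")
    days = range(1, n + 1)
    overnight = [math.log(opens[i] / closes[i - 1]) for i in days]
    open_close = [math.log(closes[i] / opens[i]) for i in days]
    # Rogers-Satchell term: drift-independent, uses the intraday high and low.
    rs = [math.log(highs[i] / closes[i]) * math.log(highs[i] / opens[i])
          + math.log(lows[i] / closes[i]) * math.log(lows[i] / opens[i]) for i in days]
    k = 0.34 / (1.34 + (n + 1) / (n - 1))
    var = variance(overnight) + k * variance(open_close) + (1 - k) * mean(rs)
    return math.sqrt(var * trading_days)

def rolling_yang_zhang(opens, highs, lows, closes, window=20):
    """One estimate per day; consecutive windows share window - 1 days of data."""
    return [yang_zhang_vol(opens[e - window:e + 1], highs[e - window:e + 1],
                           lows[e - window:e + 1], closes[e - window:e + 1])
            for e in range(window, len(closes))]
```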




Mistro


Total Posts: 9 
Joined: Aug 2018 


Thanks, day1, for your in-depth response. The problem I have is that the residuals are correlated. This means the OLS estimator is not BLUE, no matter what I incorporate into the model. One of my models gets an R^2 of 0.89, but it still has correlated errors. Are you saying I should use the daily returns rather than the realized vol over n periods? Do you personally use linear regression for time series data?


schmitty


Total Posts: 57 
Joined: Jun 2006 


I suspect that your main problem is that you have overlapping data. If each data point represents a single day, then each YZ estimate shares 19 days with the previous day's value. Throw out your current model. Start with a model on a non-overlapping subset of your data (the current data sampled every 20 days). It will still have autocorrelation in both your predictor and your response, but nothing like the 0.97 at lag 1 in the ACF of the residuals that you have now. OLS as you have it set up in your current model is inherently misspecified here. Try fitting a bivariate VAR model to the data (e.g. with the R package vars; several other packages also do this), then report back here with the ACF and PACF of the residuals of that model.
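To see the overlap effect in isolation, here is a minimal sketch using iid returns, so that there are no true vol dynamics at all: the overlapping 20-day vols still show a lag-1 autocorrelation near 0.95, while the every-20th-day subsample does not. A plain close-to-close rolling std stands in for Yang-Zhang, and the VAR(1) is fitted by equation-by-equation least squares in NumPy as a stand-in for the R vars package. All data and helper names are hypothetical.

```python
import numpy as np

def rolling_vol(r, window=20):
    """Annualized close-to-close rolling std (a simpler stand-in for Yang-Zhang)."""
    return np.array([r[i - window:i].std(ddof=1) for i in range(window, len(r) + 1)]) * np.sqrt(252)

def lag1_autocorr(x):
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(x[:-1] @ x[1:] / (x @ x))

def fit_var1(Y):
    """Equation-by-equation least squares for Y_t = c + A @ Y_{t-1} + u_t, Y of shape (T, k)."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)   # B has shape (1 + k, k)
    resid = Y[1:] - X @ B
    return B[0], B[1:].T, resid                      # c, A, residuals

# Hypothetical iid daily returns for two assets.
rng = np.random.default_rng(7)
rets = rng.normal(0.0, 0.01, size=(5000, 2))
vols = np.column_stack([rolling_vol(rets[:, j]) for j in range(2)])
overlap_rho = lag1_autocorr(vols[:, 0])
nonoverlap = vols[::20]                              # every 20th value: disjoint windows
nonoverlap_rho = lag1_autocorr(nonoverlap[:, 0])
c, A, resid = fit_var1(np.log(nonoverlap))
print(f"lag-1 ACF overlapping={overlap_rho:.2f}, non-overlapping={nonoverlap_rho:.2f}")
```

The VAR residuals in `resid` are what you would then inspect with ACF and PACF plots.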




nikol


Total Posts: 594 
Joined: Jun 2005 


I would approach it this way:
- take daily rates, r, over the calibration period (from t = N to 0) for each series j;
- calibrate an ARIMA to each series, so that you get ARIMA(param_j);
- check the (ARIMA) residuals from each calibration: res_j (from t = N to 0);
- check the marginal distributions of these residuals and their covariance matrix;
- apply the PIT (probability integral transform) to each res_j and check the covariance matrix again. The PIT maps x to u = F_j(x), where F_j is the empirical CDF of res_j; passing u through the inverse normal CDF, Phi^{-1}(u), puts each series on a normal scale;
- try to model the (ARIMA) residuals with something like GARCH, or something else. The result of that will be the (GARCH) residuals.
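The first few layers above can be sketched like this, assuming an AR(1) fitted by least squares as a simplified stand-in for a full ARIMA, and a rank-based empirical CDF, u = rank / (n + 1), so that u stays strictly inside (0, 1). The GARCH layer is left out, and the fat-tailed return data and helper names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def ar1_residuals(r):
    """Least-squares AR(1), r_t = c + phi * r_{t-1} + e_t; a simplified stand-in for ARIMA."""
    X = np.column_stack([np.ones(len(r) - 1), r[:-1]])
    coef, *_ = np.linalg.lstsq(X, r[1:], rcond=None)
    return coef, r[1:] - X @ coef                    # (c, phi), residuals

def pit_normal_scores(e):
    """Empirical PIT u_i = rank_i / (n + 1), then z_i = Phi^{-1}(u_i)."""
    ranks = np.argsort(np.argsort(e)) + 1            # ranks 1..n (ties broken by position)
    u = ranks / (len(e) + 1)                         # strictly inside (0, 1)
    nd = NormalDist()
    return np.array([nd.inv_cdf(float(p)) for p in u])

# Hypothetical fat-tailed, correlated daily returns for two series.
rng = np.random.default_rng(3)
common = rng.standard_t(3, 3000)
rets = np.column_stack([0.6 * common + rng.standard_t(3, 3000),
                        0.6 * common + rng.standard_t(3, 3000)]) * 0.01
res = np.column_stack([ar1_residuals(rets[:, j])[1] for j in range(2)])
z = np.column_stack([pit_normal_scores(res[:, j]) for j in range(2)])
print("residual corr:", np.corrcoef(res.T)[0, 1].round(3),
      "normal-score corr:", np.corrcoef(z.T)[0, 1].round(3))
```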
When to stop this research?
The moment your (ARIMA-GARCH-xyz) residuals become normal, you can trust that you have a good model.
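One way to make "residuals becoming normal" operational is a normality test on the final residuals. The sketch below uses the Jarque-Bera statistic, n/6 * (S^2 + (K - 3)^2 / 4), which is approximately chi-square with 2 degrees of freedom (95% critical value about 5.99) under normality. Choosing Jarque-Bera is an assumption of this sketch; any normality test would do, and the residual samples are hypothetical.

```python
import numpy as np

def jarque_bera(e):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4); roughly chi-square(2) under normality."""
    d = np.asarray(e, dtype=float)
    d = d - d.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2
    return len(d) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

CHI2_2_95 = 5.991   # 95% quantile of chi-square with 2 degrees of freedom

rng = np.random.default_rng(11)
normal_resid = rng.standard_normal(10_000)   # hypothetical "good model" residuals
fat_resid = rng.standard_t(3, 10_000)        # hypothetical fat-tailed residuals
print(jarque_bera(normal_resid) < CHI2_2_95, jarque_bera(fat_resid) < CHI2_2_95)
```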
How to use it?
- Simulate a correlated normal random core.
- Apply the inverse PIT (or the GARCH layer) to model the ARIMA residuals.
- Apply the ARIMA to model the daily process of returns.
- Integrate your process to the horizon you need (10 days, 1 month, 1 year, etc.).
It is like a "Russian doll" (Matreshka): you open the layers until you reach the core of a normal correlated sample, then you have to reassemble reality back into the space of returns.
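The reassembly can be sketched as below, assuming a correlation matrix of the normal scores, historical residuals for the inverse empirical PIT (via `np.quantile`), an AR(1) layer as a stand-in for ARIMA, no GARCH layer, and a zero starting return. The function name and all inputs are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def simulate_horizon_returns(corr, resid_hist, c, phi, horizon, n_paths, seed=0):
    """Reassemble the layers: correlated normal core -> inverse empirical PIT -> AR(1) -> sum.

    corr: (k, k) correlation of the normal scores; resid_hist: k arrays of fitted
    residuals; c, phi: length-k AR(1) parameters (a stand-in for ARIMA).
    Returns an (n_paths, k) array of cumulative log returns over `horizon` days."""
    rng = np.random.default_rng(seed)
    c, phi = np.asarray(c, dtype=float), np.asarray(phi, dtype=float)
    k = corr.shape[0]
    chol = np.linalg.cholesky(corr)
    phi_cdf = np.vectorize(NormalDist().cdf)
    r_prev = np.zeros((n_paths, k))                       # assumption: start from a zero return
    total = np.zeros((n_paths, k))
    for _ in range(horizon):
        z = rng.standard_normal((n_paths, k)) @ chol.T    # correlated N(0, corr) core
        u = phi_cdf(z)                                    # back to uniforms
        e = np.column_stack([np.quantile(resid_hist[j], u[:, j]) for j in range(k)])
        r = c + phi * r_prev + e                          # AR(1) layer
        total += r                                        # integrate daily returns
        r_prev = r
    return total

# Hypothetical usage: 10-day cumulative returns for two series.
rng = np.random.default_rng(5)
hist = [rng.standard_t(4, 2000) * 0.01 for _ in range(2)]
paths = simulate_horizon_returns(np.array([[1.0, 0.7], [0.7, 1.0]]), hist,
                                 [0.0, 0.0], [0.05, 0.02], horizon=10, n_paths=2000)
print(paths.shape, np.corrcoef(paths.T)[0, 1].round(2))
```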



day1pnl


Total Posts: 50 
Joined: Jun 2017 


You can't really use R^2 for anything here.
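A quick illustration of why R^2 on persistent levels says little: regressing one independent random walk on another often gives a sizeable R^2, while the same regression on their changes gives an R^2 near zero. The data are hypothetical and the exact levels R^2 depends on the seed.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of an OLS fit of y on [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
walk_a = np.cumsum(rng.standard_normal(1000))   # two independent random walks ("levels")
walk_b = np.cumsum(rng.standard_normal(1000))
print("levels R^2:", round(r_squared(walk_a, walk_b), 3),
      "changes R^2:", round(r_squared(np.diff(walk_a), np.diff(walk_b)), 3))
```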
> Are you saying I should take the daily returns rather than the realized vol over n amount of periods? Do you personally use linear regression for time series data?
No, no, just that realized vol, (SSD)^(1/2), is a nonlinear transformation of the former, which in turn is described by “CAPM”. I personally don't do much time series work at all, but if you need to find the beta ratio in which it's appropriate to compose your portfolio, then regression is correct. I'm still not sure exactly what you want to achieve with your model, except that it's “for trading”?
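If the goal is the beta ratio for composing the portfolio, a minimal sketch on daily returns is below: the OLS slope with intercept equals cov(r_y, r_x) / var(r_x), and shorting beta units of the regressor leaves a position with zero sample covariance to it. The returns and the `hedge_beta` name are hypothetical.

```python
import numpy as np

def hedge_beta(r_y, r_x):
    """OLS slope of r_y on r_x (with intercept): cov(r_y, r_x) / var(r_x)."""
    return np.cov(r_y, r_x, ddof=1)[0, 1] / np.var(r_x, ddof=1)

rng = np.random.default_rng(9)
r_vnq = rng.normal(0.0, 0.012, 1500)                # hypothetical daily returns
r_spy = 0.5 * r_vnq + rng.normal(0.0, 0.006, 1500)
beta = hedge_beta(r_spy, r_vnq)
hedged = r_spy - beta * r_vnq    # long 1 unit of SPY, short beta units of VNQ
print(round(beta, 3), np.cov(hedged, r_vnq)[0, 1])
```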
On GARCH/ARIMA: I have yet to see an economic time series model that was not hysterically overfitted or that could actually predict anything, and you open a different can of worms there in terms of the T0 distribution for your data, imo. But I'm sure some people can actually make it work.
Anyway, just my 2c; hope it helps.




nikol


Total Posts: 594 
Joined: Jun 2005 


> time series model that was not hysterically overfitted or could actually predict anything
Totally agree. I described an example framework, where the modeler can add or remove components according to his/her taste or reasoning.
What I can say for sure is that the return distributions shown in the plots cannot be correlated directly in the model without a proper transformation. If you correlate them directly, as shown in the plot, you get into trouble.


