Machine Learning with Multivariate Data vs Multivariate Time Series Data
     

Jurassic


Total Posts: 358
Joined: Mar 2018
 
Posted: 2020-01-28 19:32
What are the differences when training models on multivariate data (no time structure) vs multivariate time series data? What I mean by this is that we have X_1, ..., X_N where N is large, and for each of these we have a Y. Assume that we have 10000 samples of (X_1, ..., X_N, Y).

Does it make a difference when you think about statistical methods vs machine learning methods vs time series analysis methods?


Nonius
Founding Member
Nonius Unbound
Total Posts: 12794
Joined: Mar 2004
 
Posted: 2020-02-05 18:10
the main difference is that if the data is temporal in nature, you want to be careful not to randomize/shuffle the data in a way that lets the out-of-sample (and/or validation) sets occur before the training set in time.
more minor differences are a) there are lots of natural ways of downsampling temporal data, and b) time (or some transformation of it) can itself be a feature in time series data.
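A minimal sketch of that first point, assuming a generic feature matrix X and target y (scikit-learn used only for the splitters; this is an illustration, not a recipe):

import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))   # toy feature matrix: 10000 samples, 20 features
y = rng.normal(size=10_000)         # one target per sample

# Cross-sectional (non-temporal) data: a random shuffle is fine.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=True)

# Temporal data: keep the time order, so every test fold lies strictly
# after the observations the model was trained on (no look-ahead).
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    assert train_idx.max() < test_idx.min()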

On your second question, I personally don't make much distinction between those three viewpoints, although ML people slice up datasets into three subsets (training, validation, testing) whereas the old skool stats way of doing things is sort of binary, ie, in-sample/out-of-sample.

Chiral is Tyler Durden

Maggette


Total Posts: 1233
Joined: Jun 2007
 
Posted: 2020-02-10 12:45
I do lots of time series analysis.

And I don't make a distinction between "classical techniques" (SARIMAX, ARCH, Exponential Smoothing, State Space/Kalman Filter, Empirical Mode Decomposition, Fourier transforms) and more ML-driven stuff. More often than not a stats/machine-learning hybrid approach is a good way to handle things. Have a look at everything signal processing has to offer.

The classical train/validation/test split is often dangerous. You should, in addition, work with some adapted form of cross-validation that takes into account the autocorrelation, or even the mutual information at the relevant lags.
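A rough sketch of one such adaptation, using scikit-learn's TimeSeriesSplit with a gap between train and test folds as an embargo against leakage through autocorrelation. Picking the embargo length from where the sample ACF dies out is just an illustrative rule of thumb, not a prescription:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from statsmodels.tsa.stattools import acf

# toy AR(1) series with noticeable autocorrelation
rng = np.random.default_rng(1)
y = np.zeros(2_000)
eps = rng.normal(size=2_000)
for t in range(1, 2_000):
    y[t] = 0.8 * y[t - 1] + eps[t]

# lagged features: row i holds y[i+lags-1], ..., y[i]; the target is y[i+lags]
lags = 5
X = np.column_stack([y[lags - k - 1:len(y) - k - 1] for k in range(lags)])
target = y[lags:]

# pick the embargo roughly where the sample ACF drops below 0.1
embargo = int(np.argmax(acf(target, nlags=100) < 0.1))

# each split leaves `embargo` observations between train and test,
# so test points are not trivially predictable from autocorrelated neighbours
for train_idx, test_idx in TimeSeriesSplit(n_splits=5, gap=embargo).split(X):
    assert test_idx.min() - train_idx.max() > embargo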

I came here, saw you and your people smiling, and said to myself: Maggette, screw the small talk, let your fists do the talking...

gaj


Total Posts: 103
Joined: Apr 2018
 
Posted: 2020-02-10 14:31
@Maggette: what are the benefits of using classical techniques compared to machine learning?

Correct me if I'm wrong, but it seems that most classical techniques fall under the category of generative models. This means you start with a parameterized model of the system, then estimate the parameters using MLE on the observed data, and finally make a prediction using the estimated parameters. In contrast, machine learning methods make the prediction directly by minimizing some objective function. So I feel like classical methods are often weaker than ML methods.
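To make the contrast concrete, a toy sketch of both routes on the same series (statsmodels for the MLE-fitted AR model, scikit-learn for the direct loss minimizer; the specific models are arbitrary choices, not a benchmark):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA            # generative: parametric model + MLE
from sklearn.ensemble import GradientBoostingRegressor   # direct: minimize a loss on lagged features

# toy AR(2) series
rng = np.random.default_rng(2)
y = np.zeros(600)
eps = rng.normal(size=600)
for t in range(2, 600):
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + eps[t]

# Generative route: posit an AR(2), estimate its parameters by MLE, forecast from them.
ar_fit = ARIMA(y[:-50], order=(2, 0, 0)).fit()
ar_forecast = ar_fit.forecast(steps=50)

# Direct route: build lagged features and minimize squared error, with no explicit
# model of how the series was generated (one-step-ahead predictions, for simplicity).
lags = 2
X = np.column_stack([y[k:len(y) - lags + k] for k in range(lags)])  # row i: y[i], y[i+1]
target = y[lags:]                                                    # target:  y[i+2]
gbr = GradientBoostingRegressor().fit(X[:-50], target[:-50])
gbr_pred = gbr.predict(X[-50:])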

Jurassic


Total Posts: 358
Joined: Mar 2018
 
Posted: 2020-02-10 17:55
@Maggette

you can't use cross-validation easily with time structure, as you need to avoid sampling points from the future and testing them on the past

Maggette


Total Posts: 1233
Joined: Jun 2007
 
Posted: 2020-02-10 18:22
@Jurassic That's why I wrote that you should use some variant of nested cross-validation. Normal nested CV does not take autocorrelation into account.

Something along the lines of this:
https://robjhyndman.com/papers/cv-wp.pdf
https://www.sciencedirect.com/science/article/abs/pii/S0304407600000300
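To make the first link concrete, here is a minimal sketch of evaluation on a rolling forecasting origin in that spirit; fit_predict is a hypothetical wrapper you would supply around whatever model you are testing:

import numpy as np

def rolling_origin_errors(y, fit_predict, initial=200, horizon=1):
    """Refit on y[:t] and forecast y[t:t+horizon] as the origin t rolls forward,
    so every forecast is evaluated strictly out of (temporal) sample."""
    errors = []
    for t in range(initial, len(y) - horizon + 1):
        forecast = fit_predict(y[:t], horizon)
        errors.append(y[t:t + horizon] - forecast)
    return np.asarray(errors)

# usage with a naive last-value forecast as a baseline:
# errs = rolling_origin_errors(series, lambda history, h: np.repeat(history[-1], h))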

And by the way: IMO it is important to "retrofit" your final model. Not to use it for forecasting, but for model selection and for checking your hyperparameter set. If you train your model on "future data" and
A) its hyperparameters look vastly different from those on your normal train/test split, or
B) it does not fit past data when trained on future data and tested on past data,

then more probably than not you had overfitted garbage from the start.
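A crude sketch of that reversed sanity check (the Ridge model, the metric, and the 70/30 split are placeholders, not part of any particular recipe):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def forward_and_reversed_scores(X, y, make_model, split=0.7):
    """Fit past->future and future->past. If the two fits (and the chosen
    hyperparameters) disagree wildly, the forward result is suspect."""
    n = int(len(y) * split)
    # normal direction: train on the past, test on the future
    fwd = make_model().fit(X[:n], y[:n])
    fwd_mse = mean_squared_error(y[n:], fwd.predict(X[n:]))
    # retrofit: train on the future, test on the past
    rev = make_model().fit(X[n:], y[n:])
    rev_mse = mean_squared_error(y[:n], rev.predict(X[:n]))
    return fwd_mse, rev_mse

# usage with a placeholder model:
# fwd_mse, rev_mse = forward_and_reversed_scores(X, y, lambda: Ridge(alpha=1.0))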


@gaj: I don't think so. IMHO Boltzmann machines are generative models. Also, in my book, regularization methods for linear regression like ridge or lasso tend to be from the ML tribe. But I do think I get what you are saying. And a Random Forest comes with a strict set of preset parameters as well (depth, leaves etc.). That's in general a very strong assumption about how the world you are trying to model works.

And I don't feel that they are weaker. They can be more complex.

It's all rather pointless semantics to me. I use what I think makes sense... and check it against stuff that I think doesn't make sense. Computation time is cheap these days. I recently had success in time series with deep ANNs where, at first sight, I was of the opinion that classical models would be hard to beat. And of course I have had the reverse case more than once.

At the end of the day, almost all time series models I have in production are some kind of hybrid.

Pure LSTMs sucked for me so far. DeepAR or a hybrid method of CNNs and other stuff worked fine so far.

I came here, saw you and your people smiling, and said to myself: Maggette, screw the small talk, let your fists do the talking...

ronin


Total Posts: 585
Joined: May 2006
 
Posted: 2020-02-11 20:46
> Does it make a difference when you think about statistical methods vs machine learning methods vs time series analysis methods?


Well.

When you are doing things like option pricing or portfolio optimization, your worst case scenario is that your time series is completely random. So that's what people looked at, and that is how we ended up with theories for option pricing and portfolio optimization for lognormal random variables.

But then, you can't arbitrage a lognormal process. We also employ people whose job is to arbitrage things. If that is you, you have to start with the assumption that things are slightly less random, and slightly more time-series-y. Otherwise you are out of a job.


> In contrast, machine learning methods make the prediction directly by minimizing some objective function.


Yeah. That's a bit older than machine learning. It is called non-parametric statistics. It existed long before there was machine learning. Or even machines.

Ed Thorp did a fair amount of work on that. But it was always niche.

Why? Because it is really difficult to judge when you are generalizing and when you are overtraining. Especially if your data set isn't unlimited. Does that sound familiar?

"There is a SIX am?" -- Arthur

nikol


Total Posts: 1126
Joined: Jun 2005
 
Posted: 2020-02-17 12:41
@Maggette

> Pure LSTMs sucked for me so far.

Same impression with LSTMs. But I used something like diff(prices) as input and positive PnL as output. Had to rethink the training strategy.

> DeepAR or a hybrid method of CNNs and other stuff worked fine so far.

Hm, I came to a similar conclusion trying to solve the LSTM problem.

Beer for the links.

Jurassic


Total Posts: 358
Joined: Mar 2018
 
Posted: 2020-03-15 16:51
" the main difference is if data is temporal in nature, then you want to be careful about not randomizing/shuffling data so that out-of-sample (and/or validation) sets occur before training sets."

@Nonius that's the answer I was hoping for. The problem of how to slice up datasets usually just comes down to applying common sense.