Forums > General > ML Methods: finding the right hammer

Strange


Total Posts: 1453
Joined: Jun 2004
 
Posted: 2018-08-20 00:46
I am trying to experiment with using ML for my little corner of the world. For starters, I am trying to figure out how to frame the problem. There are N assets with M features over T time periods.

On one hand, it could be thought of as a classification problem (i.e. "buy" or "sell" or whatever), which would probably make the problem more tractable and nudge it towards the ensemble methods like random forest or maybe some smarter forms of clustering. On the other hand, I can assume that I am looking for a linear relationship between the forward looking returns and some features - thus it might be wise to use regression methods.

The positive of the former, IMHO, is that I can actually review the trees and see if they make a modicum of sense. The positive of the latter is that it would produce a continuous signal and thus have preferable characteristics from the risk-management perspective. Now, maybe I can make it a two-pass model where first I identify alphas via classification and then build a continuous model using regression on the relevant features?
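
Concretely, a rough sketch of the two-pass idea (pandas/sklearn; all the column names are made up, purely illustrative):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LinearRegression

    # df: hypothetical panel, one row per (asset, date), with feature
    # columns "f1".."fM" and a forward-return column "fwd_ret".
    features = [c for c in df.columns if c.startswith("f")]

    # Pass 1: classification - label each row "buy" (1) vs "not buy" (0).
    y_class = (df["fwd_ret"] > 0).astype(int)
    clf = RandomForestClassifier(n_estimators=200, min_samples_leaf=100)
    clf.fit(df[features], y_class)

    # Pass 2: regression on whatever the classifier found useful
    # (the 0.05 importance cutoff is arbitrary).
    keep = [f for f, imp in zip(features, clf.feature_importances_) if imp > 0.05]
    reg = LinearRegression().fit(df[keep], df["fwd_ret"])
    signal = reg.predict(df[keep])  # continuous signal for sizing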

Also, I am sure people here have tackled these issues - any advice?

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

anonq


Total Posts: 13
Joined: Aug 2018
 
Posted: 2018-08-20 01:18
Clustering is unsupervised learning: it's for things like finding groups of related assets given a set of features, so it's likely not what you're looking to do.

If you're looking to do predictions, then you'll want to use classification or regression, and the tree methods can do either (including non-linear regression). Unless you're doing HFT, where one can look at things in almost discrete steps, you're probably going to want to do a regression, because otherwise you lose a ton of granularity/information in your dependent variable when classifying.

So that's the easy part. The "hard" part is going to be properly normalizing the features/dependent variable, understanding the different algorithms well enough not to overfit, and picking appropriate parameters.
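
fwiw, the kind of per-date normalization I mean, sketched with pandas (hypothetical panel, illustrative only):

    import pandas as pd

    # z-score each feature cross-sectionally within each date, so the model
    # sees how an asset ranks against its peers today rather than raw levels.
    def zscore_by_date(df: pd.DataFrame, cols, date_col="date"):
        grouped = df.groupby(date_col)[cols]
        return (df[cols] - grouped.transform("mean")) / grouped.transform("std")

    # df[cols] = zscore_by_date(df, cols)  # df and cols assumed to exist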



EspressoLover


Total Posts: 338
Joined: Jan 2015
 
Posted: 2018-08-20 21:30
I've had a fair bit of luck with ML approaches and techniques. One thing, though: quant trading tends to be a very different beast from most problem domains. In particular, when it comes to predicting forward returns, the signal-to-noise ratio is very, very low. So in general, most techniques don't work out of the box. At the very least you need to have a pretty deep familiarity with the model and an intuitive understanding of what kind of inductive biases you're assuming. Often you need to get under the hood and actually tweak the implementation.

Some random thoughts making up my 0.02 bps on this topic. As always, YMMV.

- Always start with the simplest methods first. 99% of the time an OLS is more appropriate than a deep convolutional network.
- Start with the cleanest datasets possible. The more complex the model, the cleaner the data needs to be. Since the actual signal is so small, even a small error can make it so that your technique spends all its explanatory power on the bad data instead of the actual alpha.
- Always keep some data as pure out-of-sample. Don't even look at it until you have a final product. Sometimes overfitting happens at the level of the researcher picking models, rather than inside the model itself.
- Be careful using techniques designed for classification. Remember in classification there is no penalty for overconfidence. In trading, there's a tremendous penalty for overconfidence.
- Bias towards fitting shorter horizons. Shorter horizons are less noisy, less prone to overfitting, and exhibit more stable behavior over time.
- Random forests are the closest thing to a free lunch in all of ML
- Unsupervised learning tends to not be that useful. Markets are not only noisy, but they're efficient. Whereas in most problem domains the most visible features tend to be the most predictive, in trading obvious things get arbitraged away quickly. For example let's say we're looking at a bunch of order flow features. If we do PCA, the biggest eigenvectors are probably things telling us what the total volume is in the market. That doesn't give us much useful alpha. Whereas the relative balance of buy/sell order flow, which is critically predictive, probably has comparatively small eigenvalues.
- For that reason deep learning also tends not to be that effective. A lot of deep learning is using some form of unsupervised learning to compress the data, before training on the extracted features.
- Ensemble methods are not only powerful, but also robust against changing market regimes. The original Netflix Prize paper is a great example. Fit a lot of different models with different techniques, hyperparameters and feature sets, then combine them together.
- Along the same lines, stacking models is also very powerful. Start with simple, linear techniques, then fit more complex models on the residuals. If 50% of your variance can be explained by a three-dimensional hyperplane, a tree-based implementation is going to waste a tremendous number of degrees of freedom doing something that's dead simple for least squares. (A minimal sketch follows at the end of this list.)
- In a lot of ways ML is often just a substitute for feature engineering. Many complex models could probably be replaced with linear fits of better features. (Not saying that the tradeoff isn't worth it, I'd rather spend 100 hours of compute time than 10 hours of research time.) But it helps to be aware of the tradeoff between modeling and feature engineering and direct your time towards where returns are highest.
- If you're doing anything latency critical, be aware that most common ML representations are wasteful and computationally expensive to evaluate. If you must have it, you need to be clever about avoiding touching these things in the hotpath.
- [Placeholder for further additions if I think of any...]
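
A quick sketch of the stacking point above, assuming sklearn-style APIs and placeholder X / y (illustrative, not a recommendation):

    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor

    # X: (n_samples, n_features) features, y: forward returns (both assumed).
    # Stage 1: let least squares soak up the linear component cheaply.
    ols = LinearRegression().fit(X, y)
    residuals = y - ols.predict(X)

    # Stage 2: spend the trees' degrees of freedom only on what OLS missed.
    rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=200)
    rf.fit(X, residuals)

    y_hat = ols.predict(X) + rf.predict(X)  # linear backbone + correction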

Good questions outrank easy answers. -Paul Samuelson

kuebiko


Total Posts: 24
Joined: May 2018
 
Posted: 2018-08-20 21:41
@EL, posts like this are a gem. I’ve got a few of yours bookmarked. Appreciate the tips.

katastrofa


Total Posts: 458
Joined: Jul 2008
 
Posted: 2018-08-21 00:17
Why would you even think about using a convolutional network? It's been designed for image data.

kuebiko


Total Posts: 24
Joined: May 2018
 
Posted: 2018-08-21 00:25
Convnets have become quite popular for sequence modeling tasks (where you're convolving over the time axis rather than over spatial dimensions), so they're conceivably applicable to financial time series. But I think in this case that's beside the point, as EL probably just meant it as a generic stand-in for "fancy sophisticated ML algorithm".

sharpe_machine


Total Posts: 16
Joined: Feb 2018
 
Posted: 2018-08-21 00:30
Conv nets work very well for sequence modelling problems.
For example, they show (near) state-of-the-art performance in music recognition tasks (see the WWW 2018 workshop on it). Also, they are very good at some language modelling tasks (and much better when stacked with LSTMs).
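
For concreteness, the time-axis convolution is just a 1-D convolution with left-only padding, so the filter never peeks at the future. A minimal PyTorch sketch (shapes and names are made up):

    import torch
    import torch.nn as nn

    # Causal 1-D convolution: the output at time t depends only on inputs <= t.
    class CausalConv1d(nn.Module):
        def __init__(self, in_ch, out_ch, kernel_size):
            super().__init__()
            self.pad = kernel_size - 1
            self.conv = nn.Conv1d(in_ch, out_ch, kernel_size)

        def forward(self, x):  # x: (batch, channels, time)
            x = nn.functional.pad(x, (self.pad, 0))  # pad the past only
            return self.conv(x)

    layer = CausalConv1d(8, 16, 5)        # 8 features -> 16 filters, 5-step window
    out = layer(torch.randn(32, 8, 100))  # -> (32, 16, 100)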

gaj


Total Posts: 25
Joined: Apr 2018
 
Posted: 2018-08-21 06:00
> Random forests are the closest thing to a free lunch in all of ML

Why do you say they're a free lunch? Using random forests out of the box always results in massive overfitting for me. Granted, I haven't tried very hard to tinker with the parameters, and I don't have a good intuition about what's going on under the hood.

Strange


Total Posts: 1453
Joined: Jun 2004
 
Posted: 2018-08-21 06:38
On the advice of a friend, I decided to use a binary "like"/"no like" metric instead of a continuous returns prediction. This lets me frame it as a classification problem, and I am going to be looking for, more or less, tree paths that tell me "you want to own this thing if it has X, Y, Z". It's more of a hidden-relationship-discovery exercise than anything else, since I can throw out anything counter-intuitive that comes up, or validate it using other methods or additional data.
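
Roughly what I am doing, sketched with sklearn (X, y and the feature names are placeholders):

    from sklearn.tree import DecisionTreeClassifier, export_text

    # X: feature matrix, y: binary "like"/"no like" labels (both assumed).
    # A shallow tree keeps the paths readable.
    tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=500)
    tree.fit(X, y)

    # Human-readable paths: "you want to own this thing if X, Y, Z".
    print(export_text(tree, feature_names=feature_names))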

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

gaj


Total Posts: 25
Joined: Apr 2018
 
Posted: 2018-08-21 06:56
Binary classification only makes sense if the return distribution is symmetric: the labels capture how often you win, but not how much you win or lose. Otherwise your classifier will tell you to sell OTM options all the time (a small gain almost every time, an occasional blow-up).

sharpe_machine


Total Posts: 16
Joined: Feb 2018
 
Posted: 2018-08-21 07:15
> Binary classification only makes sense if the return distribution is symmetric

You can always construct 'balanced' batches (not only in neural networks, ofc) through some undersampling, or tweak the loss function to penalize mistakes on the minority class.
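
A minimal sketch of the loss-reweighting option in sklearn (the undersampling route would instead drop rows from the majority class; X, y assumed):

    from sklearn.linear_model import LogisticRegression

    # class_weight="balanced" reweights the loss inversely to class frequency,
    # so mistakes on the rare class cost proportionally more.
    clf = LogisticRegression(class_weight="balanced")
    clf.fit(X, y)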

Maggette


Total Posts: 1062
Joined: Jun 2007
 
Posted: 2018-08-21 11:20
I like a lot of the stuff ES said, but I'm also not so sure about the free-lunch quote regarding RFs. Like ES, I like to start with regressions: ridge or lasso if there are many features. I also almost always compare that with the results of a Huber regression to check for the impact of outliers (a quick comparison is sketched below). Then I progress to decision trees from there. If there are non-linear effects, you should see them. Then I progress to ensembles (like random forests), boosting and stacking.
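
Something like this, with sklearn (X, y assumed):

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso, HuberRegressor

    # Coefficients that move a lot between the squared-loss fits and the
    # Huber fit hint at an outlier-driven model.
    for model in (Ridge(alpha=1.0), Lasso(alpha=1e-4), HuberRegressor()):
        model.fit(X, y)
        print(type(model).__name__, np.round(model.coef_, 4))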

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

Strange


Total Posts: 1453
Joined: Jun 2004
 
Posted: 2018-08-21 14:26
@gaj
> Otherwise your classifier will tell you to sell OTM options all the time.

Or I can simply look only at the cases where it tells me to buy vol :) That's the beauty of supervised learning, right?

@Maggette Hmm. I have never played with any of the fancy regression methods (I do use OLS a lot, obviously), but they look intimidating.

I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?'

katastrofa


Total Posts: 458
Joined: Jul 2008
 
Posted: 2018-08-21 23:43
@convnets for time series

I see, thanks. I didn't know about that. But I still think that convnets wouldn't be a good fit for financial data, because they assume translational invariance.

TonyC
Nuclear Energy Trader

Total Posts: 1282
Joined: May 2004
 
Posted: 2018-08-22 20:28
Strange said:
"I decided to use a binary "like"/"no like" metric instead of continuous returns prediction"

"ordinal regression" (where the dependent variable is a rank, in this case, just two ranks) builds upon logit and probit regression, and could be used to give you a "go / no go" decision ...
... then use a random forest to explain the residuals
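
A rough sketch of the two steps, with sklearn (X and y assumed; with only two ranks, the ordinal model reduces to a plain logit):

    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestRegressor

    logit = LogisticRegression().fit(X, y)  # y: binary go / no-go labels
    p = logit.predict_proba(X)[:, 1]        # fitted probability of "go"

    # fit a forest to what the logit got wrong (its residuals)
    rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=100)
    rf.fit(X, y - p)

    score = p + rf.predict(X)               # combined go / no-go score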

flaneur/boulevardier/remittance man/energy trader

nikol


Total Posts: 553
Joined: Jun 2005
 
Posted: 2018-10-11 15:44
A favorite quote about the hammer:

"If all you have is a hammer, everything looks like a nail"
Abraham Maslow