Newbie question re modelling and identifying trading opportunities


Total Posts: 7
Joined: Jun 2020
Posted: 2020-06-15 19:11
Hi all, I'm keen to get into the quant finance field, and though I've a maths/compsci background and some software development experience, I've very little experience on the trading side. As such I thought I'd try to cut my teeth on cryptocurrencies, as the market data is readily available and the barrier to entry (with regards to infrastructure) is significantly lower than in traditional markets.

So I plugged into the top ~15 exchanges (by volume) and have books + trades (+ funding for some exchanges with swaps, and liquidations on Bitmex) for a variety of spot/futures/swaps markets, and am fully drowning in tick level data, and am now trying to make some sense of it.

First stop was looking into linear regression. As a lurker of the forum I'm aware this is a popular approach, however I'm unsure of how to apply it in a high frequency setting. Is it a valid approach to fit with, say, 5m samples, and then use this model with tick level data? I can see how this may be problematic, as you're assuming the relationship at 5m holds for higher frequencies, but if you're estimating a beta between coins, for example, then it seems like a reasonable thing to do.
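A minimal sketch of the 5-minute beta idea, assuming you already have two aligned lists of 5-minute mid prices. The names `btc_mid_5m` and `eth_mid_5m` and all the numbers are made up for illustration. It fits an ordinary least-squares beta of one coin's log returns on the other's; whether that beta carries over to tick frequency is exactly the open question, so it would be worth re-estimating at a few sampling intervals and comparing.

```python
import math

def log_returns(prices):
    """Log returns between consecutive samples."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def ols_beta(x, y):
    """Intercept and slope of the least-squares fit y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Hypothetical 5-minute mid prices for two coins (illustrative numbers only).
btc_mid_5m = [9400.0, 9412.0, 9405.0, 9420.0, 9431.0, 9418.0]
eth_mid_5m = [230.0, 230.5, 230.1, 230.9, 231.4, 230.8]

alpha, beta = ols_beta(log_returns(btc_mid_5m), log_returns(eth_mid_5m))
print(f"beta of ETH on BTC at 5m sampling: {beta:.3f}")
```

With real data you would want far more samples than this, and a standard error on the beta, before trusting it at any frequency.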

Another question I have is, how popular is it just doing super simple analyses of the kind 'last time x,y,z happened, how did the world look at t?'. For example, questions like 'if $X trades, where $X is in the upper 90th percentile of trade sizes over the last 24 hours, and mids across some set of exchanges haven't moved on average by more than 2bps for the last 10 seconds, what is the evolution of mid price on exchange Y over the next 5 seconds?'. I'm finding that a lot of the action appears to take place based on a relatively small set of ticks, and it seems that identifying these trigger events is pretty important, just I'm wondering how sophisticated you really need to be.
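The "last time x,y,z happened" question can be written as a small event study. This is only a sketch under simplifying assumptions: a single exchange instead of a set of them, a size threshold computed over the whole sample rather than a trailing 24 hours (which introduces look-ahead), and a quiet-period check that only compares the two endpoints of the window. The function names and sample data are hypothetical.

```python
import math
from bisect import bisect_right

def mid_at(mid_times, mid_values, t):
    """Last known mid at or before time t (None if no quote yet)."""
    i = bisect_right(mid_times, t) - 1
    return mid_values[i] if i >= 0 else None

def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * q / 100))
    return ordered[rank - 1]

def event_study(trades, mid_times, mid_values, size_q=90,
                quiet_s=10.0, quiet_bps=2.0, horizon_s=5.0):
    """Forward mid move (bps) after large trades that follow a quiet period."""
    threshold = percentile([size for _, size in trades], size_q)
    moves = []
    for t, size in trades:
        if size < threshold:
            continue
        before = mid_at(mid_times, mid_values, t - quiet_s)
        now = mid_at(mid_times, mid_values, t)
        if before is None or now is None:
            continue
        if abs(now / before - 1.0) * 1e4 > quiet_bps:
            continue  # the mid was already moving, so skip this event
        after = mid_at(mid_times, mid_values, t + horizon_s)
        moves.append((after / now - 1.0) * 1e4)
    return moves

# Hypothetical (time in seconds, size) trades and a mid price series.
trades = [(5, 0.5), (8, 0.3), (12, 0.2), (20, 0.4), (31, 8.0),
          (36, 0.6), (40, 0.1), (45, 0.7), (52, 0.3), (60, 9.0)]
mid_times = [0, 15, 33, 50, 58, 62]
mid_values = [100.00, 100.01, 100.05, 100.04, 100.30, 100.35]

moves = event_study(trades, mid_times, mid_values)
print(f"{len(moves)} qualifying events, forward moves in bps: {moves}")
```

Averaging `moves` over many events, with a standard error, is what turns the anecdote into something testable.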

As another example of what I'd consider dumb compared to a lot of what I read in various papers - I'm looking at some other pretty un-statsy stuff, like trying to figure out which exchanges and products have the most 'porous' books, so ones where trades often go deep into the book. The obvious strat here seems like it would be to ladder orders in the book, (possibly with size increasing as a function of distance from mid, like a sqrt(x) shape, with x = abs(mid - order_price)), and then hedge on exchanges with thicker books. It seems like a pretty reasonable approach, but as someone with no real intuition as to what will work I'm wondering if it's just too obviously dumb.
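For concreteness, here is a rough sketch of the laddering idea with size growing like sqrt(x), x = abs(mid - order_price). Normalising the distance by the tick size, so that the first level gets `base_size`, is my own assumption, as are the function name and the numbers. The hedging leg on the thicker exchange is not shown.

```python
import math

def sqrt_ladder(mid, side, tick, levels, base_size):
    """Quote `levels` prices away from mid, with size growing like sqrt(distance)."""
    sign = -1 if side == "buy" else 1
    orders = []
    for k in range(1, levels + 1):
        price = mid + sign * k * tick
        distance = abs(mid - price)
        # Assumption: distance is measured in ticks, so level 1 gets base_size.
        size = base_size * math.sqrt(distance / tick)
        orders.append((round(price, 8), round(size, 8)))
    return orders

# Illustrative numbers only: four bid levels below a mid of 9400 with a 0.5 tick.
orders = sqrt_ladder(mid=9400.0, side="buy", tick=0.5, levels=4, base_size=1.0)
for price, size in orders:
    print(price, size)
```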

Last of all, I'm currently reading 'Quantitative Trading: Algorithms, Analytics, Data, Models, Optimisation' (I chose this as the authors appear to be practitioners), and have been going over their approach to modelling the LOB as a queue. Specifically they mention this paper: 'Simulating and analyzing order book data: The queue-reactive model'. It seems fairly straightforward to implement, but I'm curious: is this going too deep into the weeds for the moment? Similarly I see a bunch of stuff applying stochastic control, which appears to get results, but I've no idea how applicable it is in practice. Even if it is, I'm sure it needs to be used in the context of knowing exactly what it is you're doing, but I am curious, as it seems an interesting avenue.

Of course the only way to know if any of this works is to try, but if anyone has any pointers to guide the learning process I'd be very grateful!


Total Posts: 478
Joined: Jan 2015
Posted: 2020-06-26 02:40
Great username!

My general experience is that most alphas are pretty robust to the details of how they're modeled. If there's something "real" underneath, then a simple approach or rule of thumb should still do a pretty good job. It won't necessarily capture as much of the alpha as a sophisticated model, but it should still be pretty reliable. If someone tells you that an alpha doesn't work at all without a very complex model, that's usually a sign that it's spurious and will fall apart out of sample. You'd be pretty surprised how many $100 million trading desks basically run on the type of simple rules like the examples you mention.

That being said, there are two major drawbacks to the rule of thumb approach. First, rules of thumb are deceptively easy to overfit, because the bias-variance tradeoff can't be evaluated. The advantage of classical statistical techniques is that the results come with error bars, which make it easy to evaluate the null hypothesis. Modern-day ML approaches like neural nets don't give confidence bounds, but they do give a fully automated pipeline from ground truth to finished product, so they can still be evaluated with empirical risk minimization and cross validation.

In contrast, the rule of thumb approach involves some totally opaque human intuition, which is inevitably biased by what the researcher has seen before. You can't just "forget" about everything outside the immediate dataset and work from scratch. So, if you're going to take this approach, you have to take pains to minimize the effective degrees of freedom and the implicit number of hypotheses tested.

It's pretty easy to think "oh, let me just try bumping this threshold a little or add this condition... Hmm, that didn't really work, let me just try a slight variant..." And then pretty soon you're overfitted like crazy on what's a seemingly simple rule. In the rule of thumb approach, come up with a simple a priori hypothesis, test it, then take it or leave it. Once you start adding bells and whistles to juice it, you pretty much gotta move to statistical methods.

The second major drawback of rules of thumb is that they don't generalize. Say your rule is to buy when X is bigger than a threshold and Y hasn't moved for 10 seconds. What about when Y hasn't moved for 30 seconds, should you lower your threshold? Or if X is way bigger than the threshold, can you weaken the condition to Y not having moved for 5 seconds? What if Y has moved, but not when you beta neutralize it? How do you combine multiple rules into a single strategy? What if you want to adjust for different trading costs? Or skew your inventory for risk purposes?

Almost by definition, the rule of thumb approach means you're leaving money on the table. Not that this is the worst thing in the world. Done is better than perfect. And there's certainly value to quickly turning around an MVP into production, getting validation in live trading, then iterating based on real-world learning.

Good questions outrank easy answers. -Paul Samuelson


Total Posts: 25
Joined: Jan 2019
Posted: 2020-06-26 04:29
Thanks EL.

Don't mean to derail the thread, but another modeling q: if you have a 5 minute and a 3 hour alpha, how do you combine them? Any pointers or references are appreciated.


Total Posts: 117
Joined: Apr 2018
Posted: 2020-06-26 05:03
Great post as usual, EL.

Interesting that you say that the rule of thumb approach is prone to overfit. I'd normally think it's the opposite, but of course it depends on how many degrees of freedom there are and how many combinations you try. There's often no regularization applied to this kind of approach. If you just pick the best one from a grid search, it's surely going to overestimate the true alpha. I like to think that ML methods are basically equivalent to trying a bunch of rules of thumb and regularizing/shrinking the ones that seem to work.
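The grid search point is easy to see in a toy simulation. This is a sketch under an artificial assumption: every rule variant has zero true edge and i.i.d. Gaussian P&L, so any in-sample Sharpe of the "winner" is pure selection bias. The grid size, sample length and number of repetitions are arbitrary.

```python
import random
import statistics

def sharpe(pnl):
    """Mean over standard deviation of a P&L series (not annualised)."""
    return statistics.mean(pnl) / statistics.stdev(pnl)

def best_of_grid(n_variants, n_obs, rng):
    """Pick the variant with the best in-sample Sharpe when the true edge is zero."""
    in_sample = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)]
                 for _ in range(n_variants)]
    best = max(range(n_variants), key=lambda i: sharpe(in_sample[i]))
    # Fresh draws for the chosen variant: its true mean is still zero.
    out_sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    return sharpe(in_sample[best]), sharpe(out_sample)

rng = random.Random(0)
results = [best_of_grid(n_variants=20, n_obs=250, rng=rng) for _ in range(100)]
avg_in = statistics.mean(r[0] for r in results)
avg_out = statistics.mean(r[1] for r in results)
print(f"average in-sample Sharpe of the winner: {avg_in:.3f}")
print(f"average out-of-sample Sharpe:           {avg_out:.3f}")
```

The gap between the two averages is the amount by which picking the best grid point overstates the edge, and it grows with the number of variants tried.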

A separate question. You said that a simple model should be able to capture some alpha if there's something real. But what if there are hidden alphas that can only be discovered by deep ML models? Once you have more than 2 or 3 layers (either neural nets or random forests or whatever), the resulting alpha is usually not interpretable by the researcher. Do you believe that such hidden alphas exist?

@longGamma: I think that's one of the biggest open problems in quant trading. If there's a good fundamental reason to assume they're uncorrelated, you can just add them up. It's not easy to measure the "correlation" since the horizons are different, so you often rely on fundamental reasoning. There may be some smart ways to resample the data and analyze the signal interaction, but it's really case by case. A lot of people just implement them as two separate strategies, and cross internally if they go in opposite directions.


Total Posts: 1307
Joined: Jun 2007
Posted: 2020-06-26 09:14
I have to admit that I would pay QUITE SOME MONEY for a book co-authored by ronin, ES, goldorak and doomanx.

The activity is going down here. True. But these nuggets make it worthwhile to stop by once a day.

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...


Total Posts: 113
Joined: Jul 2018
Posted: 2020-06-26 10:48
@gaj just because it takes a lot of compute to detect a signal, doesn't mean it's a very complicated function of the data (i.e. usually financial features aren't very deep).

Toy example: we're looking for some kind of time of day effect (movement at time T in asset set A correlated with movement at time T+t in asset set B). Across all frequencies and all asset sets A, B it would take a lot of compute (exponential time) to check every combination (and doing so while adjusting for multiple testing is another story). Yet the underlying relationship we're looking at is not complicated (just correlation). This is why firms have supercomputers yet don't employ 100 neural network 'experts'.
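A stripped-down version of this toy example, for one pair of series only: scan a range of lags and measure the correlation of a[t] with b[t + lag]. The data is synthetic, with a lead of 3 steps planted by construction, and all names are hypothetical. The full search over asset sets and frequencies would repeat this inner loop a very large number of times, which is where both the compute and the multiple-testing problem come from.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(a, b, max_lag):
    """corr(a[t], b[t + lag]) for lag = 1..max_lag."""
    return {lag: correlation(a[:-lag], b[lag:]) for lag in range(1, max_lag + 1)}

# Synthetic returns: b follows a with a 3-step delay plus noise (by construction).
rng = random.Random(1)
n = 2000
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
b = [(0.5 * a[t - 3] if t >= 3 else 0.0) + rng.gauss(0.0, 1.0) for t in range(n)]

corrs = lagged_correlations(a, b, max_lag=10)
best_lag = max(corrs, key=lambda lag: abs(corrs[lag]))
print(f"strongest lead-lag at lag {best_lag}, correlation {corrs[best_lag]:.3f}")
```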

To rephrase it: a signal can be non-interpretable (there may be no obvious cause for a given anomaly) and also mathematically simple.

Makes it seem pretty impossible to find such inefficiencies (and looking at the distribution of people who have tried to do this, it pretty much is). To quote the great Theoden: What can men do against such reckless hate?

To get you started, a good knowledge of unsupervised learning, particularly in the high dimensional case goes a long way. Beyond that all the usual candidates: robust and realistic simulation, holding out of sample data, meticulous procedures for avoiding data mining.

did you use VWAP or triple-reinforced GAN execution?


Total Posts: 1352
Joined: Jun 2005
Posted: 2020-06-26 12:37
@EL and @doomanx

Don't you use Akaike or the like to get a formal overfitting warning?
Or do the "rule of thumb" procedures give you enough assurance?

... What is a man
If his chief good and market of his time
Be but to sleep and feed? (c)


Total Posts: 113
Joined: Jul 2018
Posted: 2020-06-26 13:34
@nikol the type of inefficiencies we are discussing here (i.e. those with no causal explanation, not some well documented factor) are found via some automated system, so a rule of thumb is useless. Rule of thumb has a very important place in trading (you can think of it as a coarse type of regularisation), but specifically for this task it has no value.

Information theoretic methods (of which Akaike is one) are very important. More generally, the maximum entropy approach is particularly important for such procedures. You might take this paper as proof that this is a fruitful research direction.
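As a small illustration of the Akaike idea, here is a sketch that compares a constant-only model against a one-feature linear model. It uses the Gaussian least-squares form of AIC, n * ln(RSS / n) + 2k, which drops an additive constant that is the same for both models. The data is synthetic and the helper names are hypothetical.

```python
import math
import random

def aic_gaussian(residuals, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

def fit_line(x, y):
    """Intercept and slope of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

# Synthetic data where the feature genuinely matters (by construction).
rng = random.Random(42)
x = [float(i) for i in range(20)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

mean_y = sum(y) / len(y)
aic_const = aic_gaussian([yi - mean_y for yi in y], n_params=1)

intercept, slope = fit_line(x, y)
aic_linear = aic_gaussian([yi - (intercept + slope * xi) for xi, yi in zip(x, y)],
                          n_params=2)

print(f"AIC constant model: {aic_const:.2f}")
print(f"AIC linear model:   {aic_linear:.2f}")
```

The lower AIC wins. The 2k term is the penalty that an extra rule or parameter has to pay for itself through a better fit.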



Total Posts: 117
Joined: Apr 2018
Posted: 2020-06-26 13:45
Thanks doomanx. Agree that signals that are hard to find are not necessarily complicated.

> usually financial features aren't very deep

This seems to be the general consensus, which is my prior as well. On the other hand, popular ML applications outside of finance (such as AlphaGo, visual recognition, etc.) typically have deep, complicated functions. People like to think that ML methods can automatically discover hidden truths that are incomprehensible to humans. But in finance, where the noise is very high, it seems that we often have to restrict the search to simple, shallow functions, and ML is just a smarter version of grid search optimization.

> To get you started, a good knowledge of unsupervised learning, particularly in the high dimensional case goes a long way.

Any good reference for this?


Total Posts: 7
Joined: Jun 2020
Posted: 2020-07-03 12:40
@EL: Thanks a bunch for the in-depth response. I gotta say I've read a lot of your posts on this forum and I feel significantly wiser for having read them, I really appreciate your continued efforts! Likewise @doomanx, highly nutritious stuff!

It's good to know the crude approach isn't totally daft, but as you mentioned there are some pretty big issues with it, and I have found myself falling into the hole of overfitting through endless tweaking. I've tried to confront the lack of generalisation somewhat by creating a 'portfolio' of rules across multiple parameter values, like the last 5 secs, 10 secs, etc., but again, as you say, how to combine them? How do I reasonably interpret rule 1 saying buy and rule 2 & 3 saying sell? I think I will attempt some kind of portfolio optimisation based on backtesting these rules to try and add some structure.

As a result, however, this has got me realising that a far better idea of desirable signal characteristics would be helpful in the quest. But what is the form of an ideal signal? This may be a bit of a philosophical question, sorta like financial Platonism, but my vague, ill-informed prior is:

1. It updates per tick.
2. Also has some time element, so supposing you don't see a tick for 5 minutes, there would be some background adjustment based on time passed since a tick.
3. Is continuous, so signals strength of how much you wish to buy or sell instead of just 'buy' or 'sell'.
4. Would peak strongly around the moment to trade, and then dampen quickly afterwards, instead of peaking and persisting. (I guess this could be achieved by a two layer approach of one signal that may peak and persist, providing the buy/sell strength, and another which watches that signal for a state change from 'off' to 'on').

Of course it may be market dependent, but I would find it super edifying to hear some opinions on the general characteristics of a desirable / useful / well behaved signal (if such a thing exists).

On a totally different note, though something that others may find interesting, is that I've been trying to determine what a reasonable time to market is for some exchanges I'm considering using, and one of them I've looked at is Bitmex. This is apparently based in AWS Dublin (seemingly the ireland-c region from my experiments), so I spun up some AWS instances and checked the latency on the feeds over a couple of hours by comparing the exchange message timestamp vs the instance timestamp (I made sure to use the AWS time sync end point for this). Some results I found:

1. Book updates with a change in mid price seem to have significantly higher latency than those which don't (times are in milliseconds, sample sizes 405 and 230k respectively, so quite a diff in size admittedly):

With a change in mid price (n = 405):
Min. : 6.461
1st Qu. : 20.759
Median : 27.154
Mean : 38.985
3rd Qu. : 37.658
Max. : 339.448

Without a change in mid price (n = 230k):
Min. : 3.680
1st Qu. : 6.751
Median : 7.923
Mean : 14.073
3rd Qu. : 10.745
Max. : 413.524

2. In those cases where there is a tick, you'd be better off using the AWS london-a region:

With a change in mid price:
Min. : 9.177
1st Qu. : 17.269
Median : 20.688
Mean : 28.477
3rd Qu. : 28.835
Max. : 318.879

Without a change in mid price:
Min. : 7.633
1st Qu. : 9.883
Median : 10.579
Mean : 13.330
3rd Qu. : 11.914
Max. : 410.949

Also, somebody appears to be getting orders in at a new price level in under 5ms. Given the numbers above, would it be reasonable to think that some firms are going through private APIs? I'm hesitant to suggest that, as there are many things I could change with my setup: for one thing I use a shared instance, and I've no doubt a dedicated hardware instance with some thought put into its networking config would be far better. But I'm curious as to whether any of this looks odd. For example, I'd expect the message with the mid update to have the usual latency, and subsequent updates to be delayed as orders crowd in.
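For anyone wanting to reproduce the measurement, the summaries above boil down to something like the following sketch: subtract the exchange timestamp from the local receive timestamp for each message and summarise. The function name and the timestamps are made up, and the "inclusive" quantile method is chosen on the assumption that the figures above came from R's default `summary`.

```python
import statistics

def latency_summary(exchange_ts_ms, receive_ts_ms):
    """Min, quartiles, mean and max of receive-minus-exchange latency, in ms."""
    lat = sorted(r - e for e, r in zip(exchange_ts_ms, receive_ts_ms))
    q1, median, q3 = statistics.quantiles(lat, n=4, method="inclusive")
    return {
        "min": lat[0],
        "q1": q1,
        "median": median,
        "mean": statistics.mean(lat),
        "q3": q3,
        "max": lat[-1],
    }

# Hypothetical timestamps in milliseconds for five book updates.
exchange_ts = [1000, 2000, 3000, 4000, 5000]
receive_ts = [1007, 2006, 3010, 4008, 5030]

summary = latency_summary(exchange_ts, receive_ts)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Running it separately on the messages with and without a mid change gives the two-way split shown in the tables.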

Finally I agree with @Maggette that a co-authored book would be nice. Maybe we can get a kickstarter going?


Total Posts: 117
Joined: Apr 2018
Posted: 2020-07-05 09:44
@steamed_hams: For someone who is not yet in the industry, you have put an impressive amount of effort into getting hands-on experience with practical quant trading. You're certainly far ahead of other candidates who are memorizing interview questions to get into the industry. I have to say I'm impressed.

My 2 cents on some of your points.

> It updates per tick.

In general you want to focus on training your signal on a small number of interesting ticks. If your strategy is liquidity taking, you need to define a trigger condition on when you might take liquidity. So the model should be trained on the subset of ticks where the condition is fulfilled and ignore all the other ticks. If you're market making, your signal needs to be continuously streaming. But you still want to focus on the interesting tick events (e.g. when a price level is wiped out or when a new price level is forming, etc). The vast majority of the time when things are quiet, the signal should practically be zero.

> Is continuous, so signals strength of how much you wish to buy or sell instead of just 'buy' or 'sell'.

I think you're thinking of a signal as a "classification" problem. This is a valid approach, but I just want to point out that you could also use a regression approach. Instead of telling you to buy or sell, the signal gives you the expected price in the next N seconds. How you execute your trades based on this prediction is a separate problem.

> Would peak strongly around the moment to trade, and then dampen quickly afterwards, instead of peaking and persisting. (I guess this could be achieved by a two layer approach of one signal that may peak and persist, providing the buy/sell strength, and another which watches that signal for a state change from 'off' to 'on').

If your signal persists, then there's something wrong with your model. The model should automatically dampen the signal once the trade opportunity has passed. I have seen people doing this "dampening" artificially, but I think this is bad modelling.


Total Posts: 1352
Joined: Jun 2005
Posted: 2020-07-05 11:34

Thank you for the entropy paper.


> interpret rule 1 saying buy and rule 2 & 3 saying sell

Note that if your rules are chained, like rule3(rule2(rule1)), then it is not the same as rule1(rule2(rule3)) or rule3(rule1(rule2)), etc.

Moreover, formally speaking, the number of rules should be interpreted as the number of parameters in the trading strategy thought of as a model, hence information-theoretic methods apply, which balance the number of rules against some entropy/likelihood of the strategy.

Latency account should be part of your trading model, so if you sit in London and know the signal delay from Dublin is 10 ms, then you have to put latency uncertainty (~hist. volatility) around the Dublin price. If you sit in Dublin, the same idea applies to the London price. And so forth around the globe. You might find it advantageous to sit on the Isle of Man, between London and Dublin, to host a signal aggregator.
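One possible reading of "latency uncertainty (~hist. volatility)" is to widen the remote price by the volatility accumulated over the delay. The sketch below assumes random-walk scaling, i.e. uncertainty grows with the square root of elapsed time. That scaling is my assumption, not something stated above, and the volatility figure is illustrative.

```python
import math

def latency_price_band(price, vol_per_sqrt_second, latency_ms, n_sigmas=1.0):
    """Band around a stale remote price, assuming random-walk volatility scaling."""
    # Return volatility accumulated over the delay: sigma * sqrt(elapsed seconds).
    sigma = vol_per_sqrt_second * math.sqrt(latency_ms / 1000.0)
    half_width = n_sigmas * sigma * price
    return price - half_width, price + half_width

# Illustrative numbers: 2 bps of return volatility per sqrt-second, 10 ms delay.
low, high = latency_price_band(price=9400.0, vol_per_sqrt_second=2e-4,
                               latency_ms=10.0)
print(f"Dublin price seen from London: {low:.3f} .. {high:.3f}")
```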


Its Grisha

Total Posts: 84
Joined: Nov 2019
Posted: 2020-07-06 00:47

According to this thread, the AWS time sync service can be off by as much as tens of milliseconds because of latency to the public NTP endpoint.

As far as I know, the Bitmex gateway should be sitting in AWS Dublin, so your results are quite counter-intuitive. I haven't done the same work though, so don't want to dismiss what you are seeing outright.

"Nothing is more dangerous to the adventurous spirit within a man than a secure future."


Total Posts: 7
Joined: Jun 2020
Posted: 2020-07-07 22:49
@gaj: Thanks a lot, this really is super helpful information. It's certainly making me rethink my approach. And by continuous I did indeed mean something more along the lines of a regression than classification, so I will try to forge a way down this path. Just dwelling on some initial ideas it seems that signals based on relative action between exchanges satisfy the desired signal characteristics quite well, so I'll take a closer look at that. I suspect that'll put me squarely in the HF realm however so possibly a doomed investigation from the start, but we shall see!

@nikol: Thanks for the pointers, however I'm not sufficiently sophisticated to have rules within rules just yet :) I plan to start with combining them additively to get a final aggregated signal and then passing this through to the next step of the system to determine my desired position.
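A minimal sketch of that additive plan, assuming each rule emits a value in [-1, 1] (sell to buy) and that the aggregate is mapped linearly to a position and clipped. The equal default weights, the `scale` parameter and the function names are placeholders, not a recommendation.

```python
def aggregate_signal(rule_outputs, weights=None):
    """Weighted sum of rule outputs, each assumed to lie in [-1, 1]."""
    if weights is None:
        weights = [1.0] * len(rule_outputs)
    return sum(w * s for w, s in zip(weights, rule_outputs))

def target_position(signal, max_position, scale):
    """Map the aggregate signal to a desired position, clipped to +/- max_position."""
    raw = max_position * signal / scale
    return max(-max_position, min(max_position, raw))

# Rule 1 says buy, rules 2 and 3 say sell (hypothetical outputs).
signal = aggregate_signal([1.0, -1.0, -1.0])
position = target_position(signal, max_position=3.0, scale=3.0)
print(f"aggregate signal {signal}, desired position {position}")
```

Replacing the equal weights with ones fitted on backtest data would be where the portfolio optimisation step comes in.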

> Latency account should be part of your trading model

Good point. And given the crazy jitter I'm seeing around delivery of market data in crypto this seems like an especially pertinent consideration. It does make me wonder whether looking at tick level data is worth it, and if I'd be better off focussing on attempting to find something at the seconds to minutes frequency.

@Its Grisha: It does indeed look a bit fishy! The numbers for messages which don't contain a tick event seem to line up with expectations though. However my setup definitely isn't accurate enough to make robust conclusions at the millisecond level. Though I would expect any error to be a +/- N ms error with an individual time request rather than a fixed clock bias, and so hopefully average away when taken across multiple measurements over time?
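The distinction between the two error types can be checked with a quick simulation. All numbers here are hypothetical: zero-mean jitter on each measurement averages out over many samples, while a constant clock offset shifts the mean by its full size no matter how many samples are taken.

```python
import random
import statistics

rng = random.Random(0)
TRUE_LATENCY_MS = 8.0   # hypothetical true feed latency
CLOCK_BIAS_MS = 4.0     # hypothetical constant clock offset
JITTER_SD_MS = 5.0      # hypothetical per-measurement timing noise

# Measurements with zero-mean jitter only.
noisy = [TRUE_LATENCY_MS + rng.gauss(0.0, JITTER_SD_MS) for _ in range(10_000)]
# The same measurements with a fixed clock bias added on top.
biased = [m + CLOCK_BIAS_MS for m in noisy]

mean_noisy = statistics.mean(noisy)
mean_biased = statistics.mean(biased)
print(f"mean with jitter only:    {mean_noisy:.2f} ms")
print(f"mean with jitter + bias:  {mean_biased:.2f} ms")
```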


Total Posts: 1352
Joined: Jun 2005
Posted: 2020-07-07 23:02
I meant a chain of rules: 3 after 2 after 1.
For example, if you have
- rule1 (price > thresholdUP)
- after that, rule2 (volume > threshold2)
- after that, rule3 (bid volume > threshold3)
then execute, for example by sending a market buy order.

Permuting the rules will give different results.
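A rough sketch of why the order matters, treating the chain as a small state machine in which each rule is only checked once the previous one has fired, with at most one rule advancing per tick. The tick fields, thresholds and function name are all hypothetical.

```python
def chain_fires(ticks, rules):
    """Index of the tick on which the last rule in the chain fires, or None.

    Rules must fire in order: rule k is only checked on ticks after the one
    on which rule k-1 fired.
    """
    stage = 0
    for i, tick in enumerate(ticks):
        if rules[stage](tick):
            stage += 1
            if stage == len(rules):
                return i
    return None

# Hypothetical thresholds for the three rules.
def rule1(tick):
    return tick["price"] > 100.0

def rule2(tick):
    return tick["volume"] > 10.0

def rule3(tick):
    return tick["bid_volume"] > 50.0

ticks = [
    {"price": 101.0, "volume": 1.0, "bid_volume": 5.0},   # rule1 true
    {"price": 99.0, "volume": 12.0, "bid_volume": 5.0},   # rule2 true
    {"price": 99.0, "volume": 1.0, "bid_volume": 60.0},   # rule3 true
]

print(chain_fires(ticks, [rule1, rule2, rule3]))  # chain completes
print(chain_fires(ticks, [rule3, rule2, rule1]))  # same rules, other order
```

On the same ticks, the first ordering completes and would send the order, while the reversed ordering never gets past its first rule in time.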
