Fitting Model with Latency

Its Grisha


Total Posts: 89
Joined: Nov 2019
 
Posted: 2021-05-11 02:30
So let's say I'm fitting a model to predict mid price change in the near future.

For simplicity assume two coefficients with form: y = b1*x1 + b2*x2
Model is also linear in reality if it matters.

Let's also say feeds that provide data for x1 and x2 have known latencies L1 and L2 where L1 != L2.

Now the decision:
I can generate my dataset with idealized server timestamps, or I can generate the data with simulated arrival timestamps.

no latency:
- model fit is pure, less noise between cause and effect

with latency:
- model is fit on data that would be observed in production

Is this a try both and see situation? Or is one answer much more fundamentally sound than the other?
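
To make the two options concrete, here is a minimal sketch of how each dataset could be built. The helper `asof_values`, the toy feeds, and the latency numbers are all made up for illustration; the only idea taken from the post is that an update stamped t by the server becomes visible at t + latency.

```python
import numpy as np

def asof_values(event_ts, values, query_ts, latency=0.0):
    """Last value of a feed that is visible at each query time.

    An update stamped t by the server becomes visible at t + latency.
    Returns NaN where nothing has arrived yet.
    """
    arrival_ts = np.asarray(event_ts, dtype=float) + latency
    query_ts = np.asarray(query_ts, dtype=float)
    idx = np.searchsorted(arrival_ts, query_ts, side="right") - 1
    out = np.full(len(query_ts), np.nan)
    seen = idx >= 0
    out[seen] = np.asarray(values, dtype=float)[idx[seen]]
    return out

# Hypothetical toy feeds: server timestamps in seconds, and values.
x1_ts, x1_val = np.array([0.00, 0.10, 0.20]), np.array([1.0, 2.0, 3.0])
x2_ts, x2_val = np.array([0.00, 0.12]), np.array([10.0, 20.0])
L1, L2 = 0.05, 0.0  # assumed feed latencies, L1 != L2
decision_ts = np.array([0.11, 0.16, 0.21])

# Option A: idealized server timestamps (no latency).
X_ideal = np.column_stack([
    asof_values(x1_ts, x1_val, decision_ts),
    asof_values(x2_ts, x2_val, decision_ts),
])

# Option B: simulated arrival timestamps.
X_arrival = np.column_stack([
    asof_values(x1_ts, x1_val, decision_ts, L1),
    asof_values(x2_ts, x2_val, decision_ts, L2),
])
```

In this toy case the x1 column differs between the two matrices at the first and last decision times, which is exactly the gap between the "pure" fit and the "as observed in production" fit.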

"Nothing is more dangerous to the adventurous spirit within a man than a secure future."

ronin


Total Posts: 680
Joined: May 2006
 
Posted: 2021-05-11 09:26
It kind of depends on what you are trying to do.

The advantage of the first model is that you can look at how sensitive you are to latency. And if the latency changes, you don't have to recalibrate.

For the second model, you probably need a bit of noise. x1 and x2 are the stuff you see, but there may be a random update that has already happened and that you don't see yet. And that randomness might complicate things.

In reality, 99% of the time, the two models will agree. It's that 1% of the time that you probably want to worry about.

But here is my stupid question of the day: 'latency' and 'mid price' don't usually go together. You worry about latency because there is liquidity you want to take, or priority you want to get. Why does latency matter if all you care about is the mid price?

"There is a SIX am?" -- Arthur

rickyvic


Total Posts: 245
Joined: Jul 2013
 
Posted: 2021-05-11 15:05
If you assume some latency L, you can simulate as normal but add the cost (positive or negative) as slippage: the midprice move between t and t+L.
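
A rough sketch of that slippage adjustment, assuming a single fixed latency and a mid series sampled as-of. The function name `latency_slippage` and the toy numbers are illustrative only.

```python
import numpy as np

def latency_slippage(mid_ts, mid, signal_ts, side, latency):
    """Signed cost of acting at t + latency instead of t, in price units.

    side is +1 for a buy and -1 for a sell. Positive means the mid moved
    against the trade during the delay; negative means it moved in favour.
    """
    mid_ts = np.asarray(mid_ts, dtype=float)
    mid = np.asarray(mid, dtype=float)
    signal_ts = np.asarray(signal_ts, dtype=float)

    def mid_at(t):
        # Last mid at or before t; times before the first quote reuse the first mid.
        idx = np.searchsorted(mid_ts, t, side="right") - 1
        return mid[np.clip(idx, 0, len(mid) - 1)]

    return np.asarray(side) * (mid_at(signal_ts + latency) - mid_at(signal_ts))

# Toy example: a buy signal at t=0.5 and a sell signal at t=1.5, latency 1.0.
slip = latency_slippage(
    mid_ts=[0.0, 1.0, 2.0, 3.0],
    mid=[100.0, 100.5, 100.25, 101.0],
    signal_ts=[0.5, 1.5],
    side=[1, -1],
    latency=1.0,
)
```

Subtracting `slip` from each simulated trade's PnL per unit is one way to charge the latency cost without re-timestamping the whole dataset.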

"amicus Plato sed magis amica Veritas"

Its Grisha


Total Posts: 89
Joined: Nov 2019
 
Posted: 2021-05-12 01:44
>> But here is my stupid question of the day: 'latency' and 'mid price' don't usually go together. You worry about latency because there is liquidity you want to take, or priority you want to get. Why does latency matter if all you care about is the mid price?

@ronin Tell me if this is super misguided. I was primarily doing some hedged market making but want to try building some purely directional taker alphas.

What I envision is if I have a strong opinion on what mid does in 30 seconds I can take liquidity and factor the immediate cost of eating into the book into my forecasted returns. Even though the horizon is relatively long, I believe the execution itself will still be latency sensitive based on the book's propensity to jump.
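
As a sketch of "factor the immediate cost of eating into the book into my forecasted returns": walk the visible ask levels, compute the average fill, and subtract the impact and fee from the forecast. The function names and the fee-proportional-to-notional assumption are hypothetical, and exit costs are ignored.

```python
def avg_fill_price(levels, qty):
    """Average price paid when a market order of size qty walks the book.

    levels is a list of (price, size) pairs, best price first.
    Raises ValueError if the visible book cannot fill the whole order.
    """
    remaining, notional = qty, 0.0
    for price, size in levels:
        take = min(remaining, size)
        notional += take * price
        remaining -= take
        if remaining <= 0:
            return notional / qty
    raise ValueError("not enough visible liquidity")

def net_buy_edge(forecast_mid_change, mid, asks, qty, taker_fee_rate):
    """Forecast mid change minus the immediate cost of taking, per unit."""
    fill = avg_fill_price(asks, qty)
    impact = fill - mid          # spread crossing plus book walking
    fee = taker_fee_rate * fill  # assumption: fee proportional to notional
    return forecast_mid_change - impact - fee

# Toy book: buy 4 units against two ask levels with mid at 100.0.
asks = [(100.5, 2), (101.0, 3)]
edge = net_buy_edge(forecast_mid_change=1.0, mid=100.0, asks=asks,
                    qty=4, taker_fee_rate=0.001)
```

For the latency point, the book passed in would have to be the one expected at order arrival, not the one seen when the signal fired.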

This is the kind of venue (you may guess the space) where the feed providing x1 might be 50 ms behind the x2 feed, and that's pretty much the fastest you can get. So even on a longer horizon these kinds of considerations start to matter for how the model output evolves at the moment we need to take liquidity. At least I think they start to matter.

I come from low frequency experience in quant equity so all of this is new to me.

"Nothing is more dangerous to the adventurous spirit within a man than a secure future."

ronin


Total Posts: 680
Joined: May 2006
 
Posted: 2021-05-12 10:07
Got it.

I don't think it's misguided - it's perfectly reasonable.

But the issue is still with the mid. So say you see the mid move up - did it move because the bid moved up, or the ask? If the bid went up, the ask is probably looking more attractive relative to the mid. If the ask went up, the available ask is probably looking less attractive. And the mid is doing the same thing in both cases.

So all in, it sounds like you are on the right track. But you might need a bit more detailed view.

"There is a SIX am?" -- Arthur

Its Grisha


Total Posts: 89
Joined: Nov 2019
 
Posted: 2021-05-12 11:07
Makes sense, thanks for the tip. I thought about these considerations a bit actually, but this is a thin tick market with a very fat rebate and quite expensive taker fee.

So it's actually very rare to see the spread widen out because as a MM you can afford to be pretty wrong before quoting worse and giving up your price priority. Because of these factors I don't mind going with mid for a first pass.

As an improvement I totally agree with what you are saying though.

"Nothing is more dangerous to the adventurous spirit within a man than a secure future."

EspressoLover


Total Posts: 484
Joined: Jan 2015
 
Posted: 2021-05-12 18:19
IME mid-price delta is kosher for maker-taker thick books (i.e. books that sit at one-tick-wide spreads most of the day). The convenient part is that it blends pretty seamlessly into mid-frequency where mid-price is de rigueur.

You can always validate by fitting the same regression on mid, bid, and ask deltas. My guess is you'll get near identical coefficients. In these environments, 90% of the time the bid will immediately follow the ask and vice versa. There will be a handful of datapoints that are two ticks wide, but these are pretty rare and exert minimal influence. You should generally either fit a separate model for these points, or just set your HFT alphas to 0 when spread > 1 tick.
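
That validation could look something like the sketch below: the same no-intercept least-squares fit on bid, ask, and mid deltas, plus a gate that zeroes the alpha on wide spreads. The data is synthetic and only there to make the snippet run; `compare_targets` and `gate_alpha` are made-up names.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y = X @ b (no intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def compare_targets(X, d_bid, d_ask):
    """Fit the same regression on bid, ask and mid deltas."""
    d_mid = 0.5 * (d_bid + d_ask)
    targets = [("bid", d_bid), ("ask", d_ask), ("mid", d_mid)]
    return {name: fit_ols(X, y) for name, y in targets}

def gate_alpha(alpha, spread, tick):
    """Zero the alpha wherever the spread is wider than one tick."""
    return np.where(spread > tick * (1 + 1e-9), 0.0, alpha)

# Synthetic stand-in data: bid and ask deltas share one underlying move.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
move = X @ np.array([0.5, -0.2])
d_bid = move + 0.1 * rng.normal(size=5000)
d_ask = move + 0.1 * rng.normal(size=5000)

coefs = compare_targets(X, d_bid, d_ask)
gated = gate_alpha(np.array([1.0, 2.0]), np.array([0.01, 0.02]), tick=0.01)
```

If the three coefficient vectors in `coefs` come out materially different on real data, that would be a hint that mid is hiding something about which side moved.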

W.r.t. multi-venue latency, are you trading across venues? Or can you assume that you're only actively trading on a single exchange? Even if it's not the latter, is there a way to structure it so that each exchange runs as its own independent strategy with its own independent inventory? This isn't feasible if you need to do something like buy cheap at X then sell dear at Y. But if you can manage it, it simplifies modeling.

Assuming you're under that framework, what I would do is train the strategy on its "home exchange". Everything is translated into the home exchange's frame of reference. Market data from foreign sources has its timestamp shifted based on the data delay latency. Home exchange data uses native timestamps. (You still assume roundtrip execution latency when backtesting, but this is just a matter of delaying simulated fills.)

This minimizes the messy business of timestamp shifting. At high frequencies the bulk of your alpha will come from native, unshifted data. Of course, you'll want to rerun the training across a wide range of plausible foreign data delays. That'll give you an idea how sensitive your strategy is to market data latency.
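
A minimal sketch of that sensitivity sweep, assuming one home feature with native timestamps and one foreign feature whose timestamps get shifted by each candidate delay before refitting. All names and the toy data are illustrative, not anyone's production setup.

```python
import numpy as np

def asof(ts, values, query_ts):
    """Last value at or before each query time.

    Queries before the first timestamp reuse the first value; drop those
    rows in real use, since the value has not actually arrived yet.
    """
    idx = np.searchsorted(ts, query_ts, side="right") - 1
    return values[np.clip(idx, 0, len(values) - 1)]

def delay_sweep(home_x, foreign_ts, foreign_x, y, sample_ts, delays):
    """Refit y = b_home*home_x + b_foreign*foreign_x for each assumed delay.

    Returns a list of (delay, coefficients, r_squared) tuples.
    """
    results = []
    for d in delays:
        f = asof(foreign_ts + d, foreign_x, sample_ts)  # shift into home frame
        X = np.column_stack([home_x, f])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        r2 = 1.0 - (resid @ resid) / (y @ y)  # uncentered: no intercept in the model
        results.append((d, coef, r2))
    return results

# Toy data: the target depends on the foreign value with zero delay.
rng = np.random.default_rng(1)
foreign_ts = np.arange(1000) * 0.01
foreign_x = rng.normal(size=1000)
sample_ts = foreign_ts + 0.005
home_x = rng.normal(size=1000)
y = 0.3 * home_x + 0.7 * asof(foreign_ts, foreign_x, sample_ts)

sweep = delay_sweep(home_x, foreign_ts, foreign_x, y, sample_ts,
                    delays=[0.0, 0.01, 0.05])
```

How quickly the fit quality and the foreign coefficient decay across `delays` gives a rough read on how much of the alpha depends on the foreign data being fresh.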

Good questions outrank easy answers. -Paul Samuelson

ronin


Total Posts: 680
Joined: May 2006
 
Posted: 2021-05-13 10:33
I guess the only tip would be to be sure about what exactly your edge is, and what you are capturing.

E.g., is it latency arbitrage? Do you have speed advantage, and are you capturing a fraction of a tick per share?

Or is it some kind of quanty directional play, and you are capturing a few ticks per share? In which case, you'll probably end up giving that fraction of a tick up to the fast guys anyway, no matter what you do.

The usual beginner error is to try to do everything at once, and that never works. Just focus on doing one thing well.

"There is a SIX am?" -- Arthur

Its Grisha


Total Posts: 89
Joined: Nov 2019
 
Posted: 2021-05-13 11:53
EL & ronin, this is "quanty directional" at a single venue. The latency discrepancy actually comes not from geographical separation but one feed at the same venue lagging via batched updates.

Totally guilty of trying everything at once. We are on par with the fastest at this venue, but it is not where the bulk of price discovery happens. Initially we were trading something that looks more like latency arb. We quickly realized it's only going to scale with good cross-venue infrastructure, which we are still working on.

"Nothing is more dangerous to the adventurous spirit within a man than a secure future."

ronin


Total Posts: 680
Joined: May 2006
 
Posted: 2021-05-17 18:37
Latency arb won't scale in any meaningful sense, no matter what you do.

Quanty directional sounds like a separate strategy. There are pretty much no benefits to keeping them mixed, and there are many drawbacks. I'd split them and run them separately.

In theory, there are some benefits to letting them exchange inventory at mid. In practice, the number of times when quanty directional is saying go long and latency arb happens to be, of all things, long - yeah, that doesn't happen. Zero.

"There is a SIX am?" -- Arthur