Forums  > Software  > Test Harness Design for HFT  
     
Page 1 of 1

Its Grisha


Total Posts: 92
Joined: Nov 2019
 
Posted: 2021-03-27 05:35
Hi all, I come for advice on what appears to be a huge project. A nudge in the right direction could save a lot of wasted time here.

Right now our setup is a C++ application with userspace networking and a single-threaded execution pipeline from polling the network stack to sending the order. Slow-path threads do some numerics to update strategy trigger parameters and push telemetry. This is my first time working on a truly low-latency system, but I have a feeling we've arrived at a fairly standard design for the domain.

Strategy testing is done in parallelized Python right now. This leads to rules of thumb regarding fill rates which later prove inaccurate. It also introduces headaches and operational risk with each iteration of translating the logic to C++. Naturally, the right way seems to be running the same code in backtesting as in production, so the application needs to be tied into an exchange simulator. Preferably with testing happening faster than line rate by skipping event interarrival times.

I see a few options here:
1. Full packet captures of live data feeds: replay the packet dumps and inject simulated order acks and fills. Unclear how the simulated matching engine is to maintain state here, whether by reading the same packet streams or from normalized data.
2. Run exchange simulator off of normalized orderbook data and rebuild all of the real network interactions from scratch instead of replaying any packet captures.
3. Rip out the core trading logic from the production app and ignore the feed parsing and networking layer. Come up with our own event triggering API for testing purposes.

I imagine it's challenging to get any of these options to approach something realistic given all of the timestamping nightmares, which are further confounded by cpu-time latency contribution if going faster than line rate. To be honest it's tempting to skip this entirely and stick to research in python and A/B testing in production with small capital. But it's hard to believe any serious operation lacks this functionality.

Anyone care to share what has worked for them here? As always, thoughts are deeply appreciated.








"Nothing is more dangerous to the adventurous spirit within a man than a secure future."

prikolno


Total Posts: 94
Joined: Jul 2018
 
Posted: 2021-03-27 12:51
I've done and seen both option 1 and 2 used for strategies that remained in long-term production. From your language I suspect you prefer option 2 but aren't sure what's a customary way of addressing the latency assumptions.

With option 2 you just parameterize the queuing effects in the network stack and hardware between wire and host CPU with some sensible upper bound estimate. Usually layer 4 is offloaded so the error propagation is trivial compared to other parts (market-side) of your simulation.

I'm having issues parsing the part about "come up with our own event triggering API" under option 3 but I'm guessing it's an accommodation for this awkward design:

> single threaded execution pipeline from polling of network stack to sending the order. Slow path threads doing some numerics to update strategy trigger parameters

You can parametrically simulate the queues contributing to internal latency and parallel code paths too if I understand what you're thinking.

Not that I've had to make this trade-off in the trading platform application before, but I take the position that whatever abstractions make it easier for you to write the initial simulation are fine, even if they incur latency cost. Because you do want to get to production ASAP even if you have signals that depend heavily on simulation. It's most likely more important than optimizing for latency.

Even if you handwave the simulation, you can end up grabbing a significant market share within a month and by then improving simulation will often be on the backlog; the fastest I've seen or been involved with is about 0 to 20% ADV within a month at a top 10 exchange by volume, with a simulation-based strategy. (This is a conventional asset class, not the weird crypto stuff you hear these days.)

Option 1 sounds elegant in principle but in practice these days if you are reasonably fast, it's pretty standard that your captures may be bloated with all kinds of FCS-invalidated packets, which makes your simulation very slow in exchange for very little increase in accuracy. Then if you discard those from simulation, one may argue by induction that you should just go for option 2.


Its Grisha


Total Posts: 92
Joined: Nov 2019
 
Posted: 2021-03-27 22:20
Thanks for this, want to make sure I understand.

> With option 2 you just parameterize the queuing effects in the network stack and hardware between wire and host CPU with some sensible upper bound estimate.
This makes sense to me on the exchange side.

>You can parametrically simulate the queues contributing to internal latency and parallel code paths too if I understand what you're thinking.
If running at line rate, shouldn't this be accurately reflected simply by using the same hardware? Or are you referring to running faster than line rate?

I can implement option 2 at line rate with a good chunk of work but no serious challenge. Where the challenge comes in is running faster than line rate. There is then always a receive queue on our end, because we want to parse the feed as fast as the hardware allows, not at the real-life arrival rate.

In even simpler terms: with option 2, our matching engine implementation would run through historical market data and deliver a feed to the trading app as fast as it can generate it. The problem arrives when message X triggers an order. The matching engine can't be 50 states ahead at that point, when in reality it might only be 2 or 3 states ahead.

So it's as if at each step we want to re-synchronize the host and the trading app, figure out whether the message triggers an action, and then calculate the resulting state based on our latency parameters and the next few steps of normalized data; otherwise, continue speeding through history.
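A toy sketch of that resynchronization step, with all names invented: history is replayed as fast as possible, but when an order triggers at logical time t, the simulated book is only rolled forward to t plus the latency parameter before matching, never to the replay frontier:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Invented names throughout; "applying" an event is reduced to tracking
// its sequence number so the resync logic stands alone.
struct BookEvent { int64_t ts_ns; int seq; };

class ReplaySim {
public:
    ReplaySim(std::deque<BookEvent> history, int64_t order_latency_ns)
        : history_(std::move(history)), order_latency_ns_(order_latency_ns) {}

    // Advance the simulated book to the state in force at logical time t.
    // Returns the sequence number of the last event applied.
    int advance_to(int64_t t_ns) {
        while (!history_.empty() && history_.front().ts_ns <= t_ns) {
            last_seq_ = history_.front().seq;  // apply_to_book(...) in a real sim
            history_.pop_front();
        }
        return last_seq_;
    }

    // An order triggered by an event at trigger_ts reaches the simulated
    // exchange at trigger_ts + latency; match against *that* book state,
    // not however far ahead the replay has raced.
    int book_state_for_order(int64_t trigger_ts_ns) {
        return advance_to(trigger_ts_ns + order_latency_ns_);
    }

private:
    std::deque<BookEvent> history_;
    int64_t order_latency_ns_;
    int last_seq_ = -1;
};
```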

It seems like the level of abstraction required for this fast playback starts to defeat the purpose of recreating real network interactions, and tempts me toward option 3. What I meant there is skipping the network stack entirely and having the matching engine talk to the strategy directly by invoking its callbacks. E.g. Strategy::OnFill(order info) gets called by the simulator instead of by the full fast path. But then this starts to look like the existing Python code, just implemented in C++. So simplifying pushes us back to the starting point.
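In code, the option-3 wiring I mean would look something like this. Strategy::OnFill is from my description above; everything else here is a made-up sketch:

```cpp
#include <cassert>
#include <vector>

// Invented types for illustration only.
struct FillInfo { int order_id; double px; int qty; };

class Strategy {
public:
    virtual ~Strategy() = default;
    virtual void OnFill(const FillInfo& fill) = 0;
};

// In production the feed handler + order gateway invoke the strategy's
// callbacks; in simulation the matching engine invokes them directly,
// bypassing the network stack entirely.
class SimMatchingEngine {
public:
    explicit SimMatchingEngine(Strategy* strat) : strat_(strat) {}
    void simulate_fill(int order_id, double px, int qty) {
        strat_->OnFill({order_id, px, qty});
    }
private:
    Strategy* strat_;
};
```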

Am I thinking about this the right way? Or is the full-scale testing I'm describing typically done at real-life line rate, with only the more hand-wavy simulations running fast?

"Nothing is more dangerous to the adventurous spirit within a man than a secure future."

prikolno


Total Posts: 94
Joined: Jul 2018
 
Posted: 2021-03-28 20:37
I slightly misunderstood options 2 and 3.

> What I meant there is skipping the network stack entirely and having the matching engine talk to the strategy directly by invoking its callbacks.

Yes, skipping the network stack is fine and quite common.

Without knowing much more, it will probably look cleaner if it's not the simulated market dispatching the callbacks directly, but a manager class that holds the simulated market object(s), the strategy object, the control UI server, and other state objects like a virtual clock and manual risk settings. You synchronize everything by stepping forward the virtual clock.
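As a hypothetical sketch of that manager shape (every name here is assumed, not a known implementation): the harness owns a virtual clock and an event queue ordered by due time, and everything, market, strategy, risk, reads time from the harness rather than the wall clock:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Invented names; the market/strategy/UI objects would schedule their
// work through this harness instead of touching the wall clock.
struct TimedEvent {
    int64_t due_ns;
    std::function<void()> fire;
    bool operator>(const TimedEvent& o) const { return due_ns > o.due_ns; }
};

class SimHarness {
public:
    int64_t now_ns() const { return clock_ns_; }

    void schedule(int64_t due_ns, std::function<void()> fn) {
        queue_.push({due_ns, std::move(fn)});
    }

    // Step the virtual clock forward, firing everything that comes due in
    // timestamp order. Sim runs as fast as the CPU allows because no one
    // ever waits on physical time.
    void run_until(int64_t t_ns) {
        while (!queue_.empty() && queue_.top().due_ns <= t_ns) {
            TimedEvent ev = queue_.top();
            queue_.pop();
            clock_ns_ = ev.due_ns;
            ev.fire();
        }
        clock_ns_ = t_ns;
    }

private:
    int64_t clock_ns_ = 0;
    std::priority_queue<TimedEvent, std::vector<TimedEvent>,
                        std::greater<TimedEvent>> queue_;
};
```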

If it helps you reason about how to modularize things better, there is a scenario that I feel any design should be flexible enough to support: the scenario where any model running in the strategy during production also needs to hold an instance of the simulated market class or its internal simulated matching engine module. A (future) situation where this may come up is if the venue only has a price book ("MBL") and you can use that module to estimate queue position.
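For the MBL case, a toy sketch of what that shared module might compute (the pro-rata cancel attribution is a common heuristic but an assumption here, as are all the names): with only aggregate size per level, you track how much size sits ahead of your order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Invented names. Joining a level behind S lots, trades consume from the
// front of the queue; cancels are attributed pro-rata across the level.
class QueueEstimator {
public:
    explicit QueueEstimator(int64_t size_ahead_on_join)
        : ahead_(size_ahead_on_join) {}

    void on_trade(int64_t traded) {  // trades hit the front of the queue
        ahead_ = std::max<int64_t>(0, ahead_ - traded);
    }
    void on_cancel(int64_t canceled, int64_t level_size_before) {
        // Pro-rata assumption: cancels spread evenly across the level.
        if (level_size_before > 0)
            ahead_ -= ahead_ * canceled / level_size_before;
    }
    int64_t size_ahead() const { return ahead_; }

private:
    int64_t ahead_;
};
```

The point of the design exercise is that this same class should be instantiable both inside the backtester and inside a live model.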


EspressoLover


Total Posts: 490
Joined: Jan 2015
 
Posted: 2021-03-29 19:33
I think you may be conflating two separate problems. At least I've always thought of this in terms of two orthogonal applications. One is backtesting the strategy. Two is testing/profiling/benchmarking the code that will run in live trading.

For profiling, you want to be able to run as close to a full production environment as feasible. (But it also helps to have a reduced stack for lower fidelity but more convenient profiling.) This means matching the real-time cadence of live market data. (Sometimes it's convenient to squash long time gaps between packets, when you know for sure you'll just be idling.) This is where you run integration tests on the application, clock latencies, measure your SLAs, run Valgrind, figure out runtime bottlenecks, etc.

For backtesting, you don't need the full stack. You definitely don't want to touch the network: keep everything in a single process. Code your backtest environment so it exposes the same order entry and market data API as prod. The strategy layer shouldn't even be aware of whether it's in live or sim. Pump pre-normalized market data to a synchronous consumer. Latency buffering is handled on the simulator side based on the logical timestamps of the market data, not the physical wall clock. Simulated latencies are derived from the SLAs you measure in the profiling step. (Although always be sure to backtest under a variety of latency conditions.) A running backtest should use 100% of CPU.
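A minimal sketch of the logical-timestamp latency buffering, with invented names: the strategy sends through the same gateway interface it uses in prod, orders are stamped with the logical time of the triggering event plus the configured latency, and the simulator drains arrivals as it replays forward. No wall-clock reads anywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Invented types for illustration.
struct PendingOrder { int64_t arrives_ns; int order_id; };

class SimOrderGateway {
public:
    explicit SimOrderGateway(int64_t latency_ns) : latency_ns_(latency_ns) {}

    // The strategy calls this through the same interface it uses in prod;
    // logical_now_ns is the timestamp of the market data event it reacted to.
    void send(int order_id, int64_t logical_now_ns) {
        in_flight_.push({logical_now_ns + latency_ns_, order_id});
    }

    // The simulator drains arrivals as it steps forward through history.
    std::vector<int> arrivals_up_to(int64_t t_ns) {
        std::vector<int> out;
        while (!in_flight_.empty() && in_flight_.front().arrives_ns <= t_ns) {
            out.push_back(in_flight_.front().order_id);
            in_flight_.pop();
        }
        return out;
    }

private:
    int64_t latency_ns_;
    std::queue<PendingOrder> in_flight_;  // FIFO is fine for constant latency
};
```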

The goal in a backtest simulation is to 1) have as much fidelity to live strategy performance as possible, and 2) process at a very high throughput rate. Reconciling sim and live is hard enough, but extra hard when they're actually separate codebases. So, make sure you can use the same strategy core on both sides by keeping APIs transparent. This also helps in terms of throughput, since all the optimization effort put into live code automatically makes simulation faster.

Quantity has a quality all of its own. When you put in the effort to run fast backtests, it has a drastic impact on research productivity. It's really nice to be able to backtest an entire day in a few minutes.

Good questions outrank easy answers. -Paul Samuelson

prikolno


Total Posts: 94
Joined: Jul 2018
 
Posted: 2021-03-29 22:40
To add to EL's point, I interpreted that your goal is to design the backtesting.

But I did get a little confused too because you mentioned "line rate" several times.

Its Grisha


Total Posts: 92
Joined: Nov 2019
 
Posted: 2021-03-30 01:33
Thanks guys, thinking about this a lot more clearly now. In reality I do have two separate projects here: better backtesting environment using production core trading APIs, and integration testing environment for the full stack. Expecting to kill two birds with one stone was a mistake.

>A (future) situation where this may come up is if the venue only has a price book ("MBL") and you can use that module to estimate queue position.
@prikolno Really interesting, makes sense that production would borrow queue position estimate from the simulator.

>The strategy layer shouldn't even be aware of whether it's in live or sim.
@EL This is an elegant design constraint for the backtester, will definitely go by this rule.

I hope someday I know enough to tell you guys something you don't know. Really appreciate the info, to two somewhat lost 25 yr olds this is very useful.

"Nothing is more dangerous to the adventurous spirit within a man than a secure future."

Maggette


Total Posts: 1325
Joined: Jun 2007
 
Posted: 2021-03-30 09:12
>The strategy layer shouldn't even be aware of whether it's in live or sim.
>>@EL This is an elegant design constraint for the backtester, will definitely go by this rule.

To me that is really, really important. Outside of finance I had an application where the backtest/replay/simulation mode of my system was, almost by construction, also dependent on the wall/system clock, and not only on the timestamps within the messages.

That led to all sorts of nasty problems. All of it went away after I started sending the wall/system clock tick as an event into the system.
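In other words (a toy sketch with invented names): the clock tick becomes just another event in the stream, so replay and live differ only in who produces the events, never in how time is read.

```cpp
#include <cassert>
#include <cstdint>

// Invented types. The system's notion of "now" is set exclusively by
// ClockTick events, so a replay driver can feed historical ticks and the
// logic behaves identically to live.
enum class EventType { MarketData, ClockTick };
struct Event { EventType type; int64_t ts_ns; };

class System {
public:
    void on_event(const Event& ev) {
        if (ev.type == EventType::ClockTick) now_ns_ = ev.ts_ns;
        // ... dispatch to handlers that only ever call now_ns() ...
    }
    int64_t now_ns() const { return now_ns_; }  // never reads the wall clock
private:
    int64_t now_ns_ = 0;
};
```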

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

ronin


Total Posts: 689
Joined: May 2006
 
Posted: 2021-03-30 13:08
> But it's hard to believe any serious operation lacks this functionality.

I don't think anybody who needs it lacks it.

Last time I needed it, we hired a developer with experience of that sort of thing. He came in the morning with two USB sticks, and it was up and running by mid afternoon. Within a week, it had been tweaked for what we needed.

I wouldn't bother rediscovering hot water. Just hire whoever you need - they are out there and available.

"There is a SIX am?" -- Arthur

Its Grisha


Total Posts: 92
Joined: Nov 2019
 
Posted: 2021-03-30 21:40
@ronin
>Just hire whoever you need - they are out there and available.

Ideally yes, but we are building from scratch here with some budget constraints. But even on a tight budget, bringing a consultant in for a week who has built this many times is a compelling story.

"Nothing is more dangerous to the adventurous spirit within a man than a secure future."