Forums  > Software  > administering a busy loop  
     
Page 1 of 1
Display using:  

bullero


Total Posts: 72
Joined: Feb 2018
 
Posted: 2020-07-29 07:19
Let's imagine a really simple piece of software: you have a socket and a busy loop. For every cycle, you check if there is stuff in that socket. If stuff - read n bytes to buffer and calculate. If no stuff - do something else.

The calculation is always the same but uses parameters that are time-dependent. Therefore, you need to update the parameters periodically where the period is many orders of magnitude larger than the average time of the busy loop cycle.

The parameters are sent as a string to the software.

Just out of curiosity - how would you architect this whole thing such that the design minimizes performance drag w.r.t the busy loop?

nikol


Total Posts: 1198
Joined: Jun 2005
 
Posted: 2020-07-29 09:21
pseudo-code:

calibration loop:
-- sleep(1s)
-- if size(data_sample)>0:
---- calibration = calibrate(data_sample) // depends of the design of your calibration of model. You can also split it into two steps: full_calibration (slow and precise), incremental_update (quick but with accumulated error, like e.g. kalman or something).

itching loop:
-- sleep(1us)
-- if buffer.is_full():
---- buffer_copy = copy_and_flush(buffer) // to minimize time of buffer usage. Another way is to use 2 buffers, one is in work and another is in processing, then swap them.
---- data_sample, triggers = process(buffer_copy, calibration, portfolio) // here main thing is that calibration did not change, but data has changed, so you need urgently to check if trade trigger is flagged.

1s and 1us - are arbitrary. Use your own.
Consider loops already wrapped into task (thread). Each task add to scheduler or pool of tasks (or whatever mechanism you use).

steamed_hams


Total Posts: 5
Joined: Jun 2020
 
Posted: 2020-07-29 09:28
I think if I was doing this I'd set up another thread that receives the parameter string and passes it to your main thread via something like a socket/pipe, and your main thread can then select over the two. You're already making a syscall so I don't think adding another file descriptor will add significant overhead.

Another option is to have some shared memory between the two threads, something like a ring buffer where you have a producer thread that listens for parameter updates and adds a struct containing these to the buffer and increments some index. The main consumer thread can then on each receipt of bytes check if the ring buffer writer's index has increased, and if so read from that frame in the buffer and update its parameters. You can avoid using a mutex this way which is nice, however you may need to add some memory fences to ensure that you don't get weird race conditions (e.g. producer thread increments its index before the write is complete, so the consumer gets junk).

(edited to fix typo)

EspressoLover


Total Posts: 451
Joined: Jan 2015
 
Posted: 2020-07-29 16:28
IMO, the best approach is the simplest design that's inside your SLAs. So step 1 is to formally define your requirement. Step 2 is figuring what the easy approach is for your runtime environment. For example the the Java approach will probably cut against the grain in Golang.

One thing that was a little ambiguous: the time-dependent parameters. Do you need to use their time-synchronized value, or can you always use the latest real-time update? So for example, say params-X are injected at T+5 and params-Y at T+25. Then you're evaluating the buffer snapshot-Q at T+24. Because of race conditions, params-Y have already been ingested. Are you required to defer back to params-X because they were active at T+24? Or is it fine to use params-Y, even though they're not formally active at the time stamp? This seems pretty pedantic, but it actually has pretty deep implications on the architecture.

I mostly agree with @nikol and @steamed_hams. Two threads with shared memory is the way to go. Assumptions: Linux C++ with standard library, latest-real time injection is fine, the SLAs are 99th percentile O(10uS) and O(100ms), you're using kernel bypass (i.e. no syscall overhead), param injection times are uncorrelated with buffer update arrival times, and the param struct is O(1KB) or can be updated in atomic chunks of O(1KB). This is all pretty typical for low-latency auto traders, where trading parameters get periodically updated by a human operator.

A ring buffer is an elegant lock-free solution. But to be honest, I think you'll be just fine with mutex locking. Modern Linux uses futexes underneath pthread_mutex. Acquiring an uncontested lock takes less 50 nanoseconds or less. Not enough to meaningfully change performance. Lock contention on the hot thread should be extremely rare. The injection thread can do all its parsing work outside the lock, build a prototype param struct, then only needs to hold the lock to copy the pre-constructed object.

You're maybe talking about ~500 nano contention window per injection. Even if injections occur every 10ms, we'd only expect high-latency lock contention once per 20,000 updates. I.e. not enough to impact 99th or even 99.99th SLA.

Good questions outrank easy answers. -Paul Samuelson

nikol


Total Posts: 1198
Joined: Jun 2005
 
Posted: 2020-07-30 10:07
Since long ago, I use swap of pre-allocated buckets instead of ring buffer:
fill A-bucket - replace with B-bucket:
meanwhile
- process A-bucket and then flush.
- fill B-bucket, when it is full, replace it with A-bucket and repeat.

you can also pre-allocate 3 buckets and more.

Advantage:
- buckets are isolated
- lock is applied only tiny instance of time during swap
- (linear) buckets can expand when needed or just add more buckets or swap them more frequently

PS. Just realized a thing: basically I am talking about ring of refs to linear buckets !

NIP247


Total Posts: 550
Joined: Feb 2005
 
Posted: 2020-07-30 12:29
Nikol,
Sounds like double-buffering of your screen paint back in the days when you had one screen cycle to paint whatever and then test the beam for top of screen and switch screen address...

On your straddle, done on the puts, working the calls...

Its Grisha


Total Posts: 61
Joined: Nov 2019
 
Posted: 2020-07-30 18:50
Curious how this relates to FPGA since I've dealt with code like this (using a ring buffer as mentioned) but not FPGA.

Would it be correct to say that in place of the tight loop you have an FGPA (essentially acting as a state machine ready for message), and you are updating conditionals via PCI? How does FPGA manage the same parameter update issue e.g. an unfinished write?

bullero


Total Posts: 72
Joined: Feb 2018
 
Posted: 2020-09-08 10:18
@Its Grisha, this is all software.

Thanks for your ideas guys (and gals). I decided to do (dramatic) refactoring and implemented a standard epoll_wait :).

@EL, sounds reasonable but this is single threaded.
Previous Thread :: Next Thread 
Page 1 of 1