Serg


Total Posts: 2 
Joined: Aug 2018 


I wish to develop a random market on a locally implemented matching engine. It has many use cases, including training non-random bots and running competitions between them in this environment, but that's a different topic.
Since price (and every data-driven indicator) is a direct function of the orders and the matching algorithm, it's reasonable to model random "traders" or random orders instead of a random price. Here are a couple of directions that I'm considering:
1. Based on empirical order-by-order, full-depth data. Suppose I convert millions of data events into a form like this:
dT - time difference from the previous event
ActionType - send, replace, or cancel an order
Side - buy or sell
PriceLevel - relative distance to the best bid/ask on the given side (some actions may have a "negative distance" and may lead to executions)
Quantity - size of the order
Then at each step my random bot randomly selects an action from this list and executes it.
2. Similar to the above, but each of these fields has its own probability distribution. These distributions can still be derived from empirical data, but in most cases the resulting action will not replicate any single action in the original data.
Which model do you think is better? Both seem simple, but the first one requires handling actions like cancel / replace when no order of that size exists at that price level.
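
To make the difference between the two options concrete, here is a minimal Python sketch. The `Event` record, the toy `EVENTS` list and both function names are made up for illustration; real input would be the converted order-by-order data. Model 1 resamples whole historical events, so it keeps the joint structure between fields. Model 2 draws each field from its own empirical marginal, which in this naive form discards any dependence between fields.

```python
import random
from collections import namedtuple

# Hypothetical event record mirroring the fields listed in the post.
Event = namedtuple("Event", "dt action side level qty")

# Toy stand-in for the converted empirical events (values are made up).
EVENTS = [
    Event(0.12, "send", "buy", 0, 100),
    Event(0.05, "cancel", "sell", 2, 50),
    Event(0.30, "send", "sell", 1, 200),
    Event(0.08, "replace", "buy", 3, 10),
]

def sample_joint(events, rng):
    """Model 1: replay one whole historical event chosen uniformly at random."""
    return rng.choice(events)

def sample_marginals(events, rng):
    """Model 2: draw every field independently from its empirical marginal."""
    n_fields = len(Event._fields)
    return Event(*(rng.choice(events)[i] for i in range(n_fields)))

rng = random.Random(42)
print(sample_joint(EVENTS, rng))
print(sample_marginals(EVENTS, rng))
```

Either sampler can still produce a cancel or replace that has no matching resting order, so the simulator needs a rule for that case (for example, skip the action or redraw); which rule is appropriate is left open here.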




bullero


Total Posts: 33 
Joined: Feb 2018 


I might be able to share some experience here:
What you might want to do is model the arrival times of market orders, limit orders and cancels as Poisson processes. Limit orders arrive on either the bid or the ask side at the "i"-th level (i minimum tick sizes from the top level of the other side of the market). Market orders eat limit order volume from the opposing side. Cancels eat limit order volume from the same side.
Modeling limit order arrivals: In a "prototypical LOB", the depth near the current mid is the most active, and activity decays the further you go from the mid (that is, up or down in price). Hence, assuming a single arrival rate for all levels might be unrealistic. You might want to model the Poisson arrival rate of each level separately using a power law, so that the arrival rate decays as you move away from the mid.
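
A minimal sketch of the per-level power-law rates, in Python. The functional form `base_rate / i**exponent` and the parameter values are assumptions for illustration and would have to be fitted to data. Since independent Poisson processes superpose, the next limit order can be drawn by waiting an exponential time with the total rate and then picking the level with probability proportional to its own rate.

```python
import random

def level_rates(n_levels, base_rate, exponent):
    """Assumed power law: rate at level i (1 = nearest) is base_rate / i**exponent."""
    return [base_rate / (i ** exponent) for i in range(1, n_levels + 1)]

def next_limit_order(rates, rng):
    """Draw (waiting time, level) for the superposition of per-level Poisson processes."""
    total = sum(rates)
    wait = rng.expovariate(total)  # time to the next arrival on any level
    level = rng.choices(range(1, len(rates) + 1), weights=rates)[0]
    return wait, level

rng = random.Random(1)
rates = level_rates(n_levels=10, base_rate=2.0, exponent=1.5)
print(next_limit_order(rates, rng))
```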
Modeling market orders: Again, you model the arrival times of market orders as above, assuming some arrival rate. Here you might want to make the arrival rate depend on the time of day. It is usually highest during the first hour and picks up again toward the market close. What you want is to get that "U"-shaped volume graph. Once the code is in place, it is easy to estimate the empirical lambda function from the message data.
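
One way to simulate a time-dependent arrival rate is thinning: propose arrivals at the maximum rate and accept each with probability rate(t) / max rate. The sketch below is in Python; the quadratic U shape, the 6.5-hour session and the `low` / `high` values are assumptions for illustration, not an empirical lambda function.

```python
import random

SESSION = 6.5 * 3600  # assumed session length in seconds

def u_shaped_rate(t, low=0.5, high=2.0):
    """Quadratic U: `high` at the open and close, `low` at mid-session."""
    x = 2.0 * t / SESSION - 1.0  # maps [0, SESSION] onto [-1, 1]
    return low + (high - low) * x * x

def market_order_times(rng, low=0.5, high=2.0):
    """Thinning: propose at rate `high`, accept with probability rate(t) / high."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(high)
        if t >= SESSION:
            return times
        if rng.random() < u_shaped_rate(t, low, high) / high:
            times.append(t)

times = market_order_times(random.Random(7))
print(len(times))
```

Replacing `u_shaped_rate` with a rate estimated from the messages only requires that `high` stays an upper bound on it.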
Modeling cancels: I guess the stylized fact about cancels is that they seem to depend on the resting volume at the price level. This means that the arrival of cancels depends on the current state of the LOB. For simplicity you may assume that the arrival rate is linear in the resting volume. I suggest that you model the limit and market orders first and then add the cancels as a final touch. In my opinion it is the cancels that cause your book to be either "too deep" or "too thin", which in turn causes unrealistic price behavior in your simulations.
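
The state-dependent cancel rate can be sketched in the same style. Here `resting` is a hypothetical dict mapping level to resting volume on one side of the book, and `theta` is an assumed cancel rate per unit of resting volume; both names are made up for illustration.

```python
import random

def cancel_rates(resting, theta):
    """Per-level cancel rate, assumed proportional to the resting volume."""
    return {level: theta * vol for level, vol in resting.items() if vol > 0}

def next_cancel(resting, theta, rng):
    """Draw (waiting time, level) of the next cancel, or None for an empty side."""
    rates = cancel_rates(resting, theta)
    total = sum(rates.values())
    if total == 0:
        return None  # nothing to cancel
    wait = rng.expovariate(total)
    level = rng.choices(list(rates), weights=list(rates.values()))[0]
    return wait, level

rng = random.Random(5)
print(next_cancel({1: 300, 2: 100, 3: 0}, theta=0.5, rng=rng))
```

Because the rates depend on the book, they have to be recomputed after every event that changes the resting volume.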
Volumes: The shape of the volume distribution seems to depend quite a lot on the underlying asset. Also, note that some round sizes such as 1, 5, 10, 20 and so on seem to be disproportionately likely.
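
A simple way to get those spikes at round sizes is a mixture: with some probability pick from a short list of popular sizes, otherwise draw from a continuous distribution. The Python sketch below is only an illustration; the list `ROUND_SIZES`, the mixing weight `p_round` and the exponential body are assumptions, and the real shape should come from the asset's own data.

```python
import random

ROUND_SIZES = [1, 5, 10, 20, 50, 100]  # assumed set of "popular" order sizes

def sample_size(rng, p_round=0.6, mean_size=15.0):
    """Mixture: a round size with probability p_round, else an exponential draw."""
    if rng.random() < p_round:
        return rng.choice(ROUND_SIZES)
    # Round to a whole number of units and never return less than 1.
    return max(1, round(rng.expovariate(1.0 / mean_size)))

rng = random.Random(3)
print([sample_size(rng) for _ in range(10)])
```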


