 upatree
 Posted: 2019-10-29 23:20 I am looking at a market in which there are scheduled events that lead to pumps averaging 7% in the relatively larger instruments, and 100+% pumps on no volume in the smaller instruments (no prizes for guessing which market this is). The time of the event is known beforehand, and the event lasts a couple of minutes on average, but the instrument to be pumped is not known. My overall goal is to try to classify the instrument to be pumped before the event, or at least narrow down the possible set.

 My conjecture is that instruments with higher volumes than normal before the event occurs are more likely to end up being the pumped instrument (insiders taking positions beforehand), and I would like to test this more rigorously. In particular I’d like to define “higher volumes than normal” in a reasonable manner. However, I’m not sure what the best way to go about doing so is (aside: if there’s a way to build intuition for how to go about verifying/disproving these types of conjectures, I’d love to hear it).

 Suppose I have minute-level bars for each instrument (I currently have hourly but am working on getting minute-level), and ~300 of these events. Universe of instruments is probably on the order of ~500-10000?

 My first thought is to do something like this: for each event, and each instrument, take the ratio of the volume in the bar just before the event to its historical mean volume. Then visually inspect to see if the ratios of the pumped instruments cluster away from the others.

 Is this a reasonable approach? Too simple to do the job?
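 A minimal sketch of the ratio described in this post, assuming each instrument's bars are held as a plain list of volumes ordered in time and that the event starts at index `event_idx`. The function names, the `lookback` window of 240 bars, and the rank summary are illustrative assumptions, not something specified in the thread; the rank is just one numeric summary that could sit alongside visual inspection.

```python
import math
from statistics import mean


def pre_event_volume_ratio(volumes, event_idx, lookback=240):
    """Volume of the bar just before the event divided by the mean volume
    of up to `lookback` bars preceding that bar (lookback is an assumed default)."""
    if event_idx < 2:
        raise ValueError("need at least one baseline bar before the pre-event bar")
    pre_bar = volumes[event_idx - 1]
    start = max(0, event_idx - 1 - lookback)
    baseline = volumes[start:event_idx - 1]  # excludes the pre-event bar itself
    base_mean = mean(baseline)
    if base_mean == 0:
        return float("nan")  # no baseline activity, ratio undefined
    return pre_bar / base_mean


def rank_of_pumped(ratios, pumped):
    """Rank (1 = highest ratio) of the pumped instrument among all
    instruments with a defined ratio for one event."""
    valid = {k: v for k, v in ratios.items() if not math.isnan(v)}
    if pumped not in valid:
        return None
    ordered = sorted(valid, key=valid.get, reverse=True)
    return ordered.index(pumped) + 1
```

 If the conjecture holds, the ranks of the pumped instruments across the ~300 events should pile up near 1 rather than being spread evenly over the universe.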
 TonyC
 Posted: 2019-10-30 06:31 I would flip the problem around. Instead of calculating the volume per time unit of 1 minute, I would calculate the average time to execute, say, ten thousand cars ... And then be on the lookout for when that average time to execute 10,000 cars starts to speed up significantly before the event.

 flaneur/boulevardier/remittance man/energy trader
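 A rough sketch of the "time to execute N units" idea, assuming only minute bars are available, given as `(minute_index, volume)` pairs sorted by time. With bars rather than ticks the elapsed time can only be resolved to whole bars, and any overshoot past the target is discarded here for simplicity. The function names and the median-based speed-up ratio are assumptions for illustration.

```python
from statistics import median


def volume_clock_durations(bars, target):
    """Split bars into consecutive buckets that each accumulate at least
    `target` units; return (end_minute, elapsed_minutes) per bucket."""
    durations = []
    cum = 0.0
    bucket_start = None
    for ts, vol in bars:
        if bucket_start is None:
            bucket_start = ts
        cum += vol
        if cum >= target:
            # elapsed time counts both the first and last bar of the bucket
            durations.append((ts, ts - bucket_start + 1))
            cum = 0.0  # overshoot is dropped, a simplification
            bucket_start = None
    return durations


def speedup(durations):
    """Median elapsed time of earlier buckets divided by the elapsed time
    of the latest bucket; values above 1 mean trading has sped up."""
    elapsed = [d for _, d in durations]
    if len(elapsed) < 2:
        return None
    return median(elapsed[:-1]) / elapsed[-1]
```

 Because the ratio is taken against each instrument's own history, it is comparable across instruments whose typical volumes differ by orders of magnitude, provided `target` is scaled per instrument.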
 gmetric_Flow
 Posted: 2019-10-30 12:11 Something like volume imbalance bars or dollar imbalance bars might be useful here. Maybe take a look at this for some inspiration?
 ronin
 Posted: 2019-10-30 14:25 The keyword to google for is "vwap profile".

 Use your historical data to construct the vwap profile when there are no events, and normalize the realised traded volume around the events you are interested in to the profile. The result is a dimensionless number that you can then play with to try to correlate with variance and/or directional price moves.

 "There is a SIX am?" -- Arthur
 Craig
 Posted: 2019-10-30 20:43 To be clear, are you referring to this type of thing? https://www.tradewinger.com/vwap/
 ronin
 Posted: 2019-10-30 22:36 Oh dear, google does return a ton of rubbish. Sorry about that.

 What I mean is:
 - divide the trading day into buckets. Ten minute buckets, one minute buckets, three minute buckets - doesn't matter
 - for each bucket, average the traded volume during that bucket over the last n days.

 That is called the vwap profile. It is u-shaped. Most volume trades after the open and before the close, and there is less volume around midday.

 Look at some literature on algorithmic trading. I personally like Barry Johnson's book, but there are a few others out there. They all explain vwap in a fair bit of detail.

 Some links:
 https://www.researchgate.net/publication/237145755_VWAP_execution_and_guaranteed_VWAP
 https://www.researchgate.net/figure/Average-trading-volume-v-intraday-time-minutes-The-left-panel-is-for-2S06-and-the_fig1_309689185
 https://www.valuewalk.com/2014/03/hft-algorithmic-trading/

 "There is a SIX am?" -- Arthur
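 The two steps above can be sketched as follows, assuming bars arrive as `(day, minute_of_day, volume)` tuples. The function names, the `exclude_days` argument (used to keep event days out of the baseline) and the 10-minute default are illustrative assumptions. Note that a day with no bars at all in a bucket is skipped rather than counted as zero volume.

```python
from collections import defaultdict
from statistics import mean


def volume_profile(bars, bucket_minutes=10, exclude_days=frozenset()):
    """Average traded volume per time-of-day bucket over the non-excluded days."""
    per_day_bucket = defaultdict(float)
    for day, minute, vol in bars:
        if day in exclude_days:
            continue  # keep event days out of the "normal" profile
        per_day_bucket[(day, minute // bucket_minutes)] += vol
    by_bucket = defaultdict(list)
    for (_, bucket), vol in per_day_bucket.items():
        by_bucket[bucket].append(vol)
    return {bucket: mean(vols) for bucket, vols in by_bucket.items()}


def normalized_volume(realized, bucket, profile):
    """Realized bucket volume divided by the profile value for that bucket
    (dimensionless); None when the profile has no usable baseline."""
    expected = profile.get(bucket)
    if not expected:
        return None
    return realized / expected
```

 A value of 2.0 then means the bucket traded twice its usual volume for that time of day, whatever the instrument's absolute size.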
 Craig
 Posted: 2019-10-30 22:49 I'm glad you clarified that! Thanks.
 upatree
 Posted: 2019-10-31 01:59 Thanks everyone - super helpful stuff. Replies in order:

 @TonyC - This sounds like an interesting approach. Do you have any good references of where this sort of volume-based bucketing is applied, and why it makes sense? My rough intuition would be that each unit exchanged provides some amount of “information” about something. Therefore if we’re getting lots of units in a shorter period of time it’s an informative period. Is there a different way you would characterize it?

 Also, should I be calculating this average time to execute against a separate benchmark for each instrument? Following the car analogy, let’s say that I have one company that on average takes 1 week to execute 10000 cars, and I have another company that on average would take 10 years to execute 10000 cars (i.e. average volume is orders of magnitude smaller). In this case, it would make sense to look at the relative magnitude of change in average time to execute with respect to each company, is that correct?

 @gmetric_Flow - Thanks for the link. I’ve yet to take more than a cursory glance but volume imbalance bars sound like they might be a good measure. Not sure if I can gather all the tick-level data I need though.

 @ronin - Thanks for the paper links. I assume the book you’re referring to is Algorithmic Trading & DMA?

 I’d like to check my understanding for constructing and using the profile. Example: I would construct the profile using some time span, for example, buckets for each hour of day for the last 10 days before the day of an event. When you say normalize the realized traded volume around the events you are interested in to the profile, do you mean (x - bucket_mean) / bucket_stddev? I could then do something like regress returns on this number, correct?

 @ronin and @TonyC, what are your thoughts on each other's respective approaches? To me it sounds like one is bucketing by volume while the other buckets by time (or maybe time of day), and both seem reasonable.
Any suggestions (literature, etc.) on pros/cons?
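 For reference, the normalization asked about above, (x - bucket_mean) / bucket_stddev, could be sketched like this. Whether this z-score or a plain ratio to the bucket mean is what was meant is not settled at this point in the thread, so treat it as one candidate. `history` is assumed to hold the same bucket's volume on each of the prior non-event days, and `bucket_zscore` is a hypothetical name.

```python
from statistics import mean, stdev


def bucket_zscore(x, history):
    """(x - bucket_mean) / bucket_stddev, using the sample standard deviation.
    Returns None when there is too little history or no variation."""
    if len(history) < 2:
        return None
    sd = stdev(history)
    if sd == 0:
        return None
    return (x - mean(history)) / sd
```

 With only 10 days of history the standard deviation estimate is noisy, and volume tends to be heavily right-skewed, so applying the same calculation to log volumes may be worth trying.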
 TonyC