
Any opinions on this research?
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3087076
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3102707






No numerical experiments testing convergence. A big problem with RL is that it's not data-efficient. He doesn't show how long it takes for his agent to converge to the optimal policy. 




Hmm, the second paper shows examples. Your second statement is wrong too: the efficiency of RL depends on how it is formulated. Deep RL is indeed very data-hungry, due to the astronomical number of parameters needed there. But there is no general statement like the one you make; depending on the formulation, RL may or may not be data-efficient. The third comment is wrong too: see the second paper. 





You have a point, I was thinking about Deep RL.
Re convergence: there are numerical experiments (which report the number of MC paths used), but they only test the model in the Black-Scholes world. This is not an interesting case, because we already know how to price options there. An interesting case would be learning to price an option by observing real market data. Then the question is: how long does it take for the agent to learn the market dynamics and price the option? This is not answered in the paper.
Maybe I have too high expectations of the method. If all it's useful for is the Black-Scholes world, it's not very useful. 




Agree, real data is the most interesting case. In machine learning, people have developed a good habit of trying models on synthetic data first, something that I think is underutilized in finance. As the Q-learning used in these papers is a model-free method, it should work with real data too, and probably have a convergence speed similar to what is shown in these papers. Due to this model independence, I think your high expectations are justified, but of course it would be interesting to see how it works with real data, especially in a portfolio setting, and to see whether it will allow us to forget the volatility smile problem as a bad dream of Wall Street :) 
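As an illustration of that synthetic-data-first habit, a minimal generator might look like the sketch below (all parameters are made-up illustration values, not taken from the papers): simulate geometric Brownian motion paths, i.e. the Black-Scholes world, so the learned price can be checked against the known closed form.

```python
import numpy as np

# Minimal synthetic-data sketch (illustration only; mu, sigma and grid sizes
# are made-up values, not from the papers): simulate geometric Brownian
# motion paths -- the Black-Scholes world, where the correct option price is
# known in closed form and can serve as ground truth for a learning agent.
def gbm_paths(s0=100.0, mu=0.05, sigma=0.2, T=0.25,
              n_steps=63, n_paths=10_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    steps = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(steps, axis=1))

paths = gbm_paths()   # one row per simulated path, strictly positive prices
```

The point is only that on such data you know what the agent should converge to; on real data you don't, which is exactly the open question above.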




Strange


Total Posts: 1434 
Joined: Jun 2004 


People have priced options using neural networks, used all sorts of nonlinear regressions, and now this. It's a waste of electrons, imho 
I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?' 



If you don't have a model for the stochastic process driving the stock prices, how are you going to run your Monte Carlo simulation? 




london


Total Posts: 307 
Joined: Apr 2005 


A weakness of applying ML to return forecasting is the insufficient volume of (stationary) data, so I hear of people using synthetic data.
The part I don't understand is: how does one create synthetic data (to train on) without a model of the system?
But if I did have a _good enough_ model of the system, then why wouldn't I just use that model to trade?
This is a genuine question and I'm probably missing something blindingly obvious to many folks here. Thank you for a simple answer that my small mind might grasp. 




Q-learning is a model-free method: it does not use any distributional assumptions about returns. So you don't have to rack your brains any more choosing between, say, Heston and whatever else; it is entirely irrelevant.
Now, to test the framework, you can use synthetic data generated from a known distribution, and test your model/implementation. In this case, you know what to expect.
Hope this helps. 
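For readers less familiar with Q-learning, the "model-free" part is visible in the update rule itself: the agent improves its value estimates from observed transitions alone. Here is the generic tabular textbook update (an illustration only, not the fitted Q-iteration actually used in the papers; the state and action counts are arbitrary toy values):

```python
import numpy as np

# Textbook tabular Q-learning update (generic illustration, not the papers'
# fitted Q-iteration): the estimate Q[s, a] is nudged toward the observed
# reward plus the discounted best next-state value. No distribution of the
# underlying appears anywhere -- hence "model-free".
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * Q[s_next].max()    # bootstrapped one-step target
    Q[s, a] += alpha * (target - Q[s, a])   # move the estimate toward it
    return Q

Q = np.zeros((5, 2))                          # 5 toy states, 2 toy actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)    # only observed data goes in
```

Everything the update consumes is an observed (state, action, reward, next state) tuple; no Heston-vs-whatever choice enters anywhere.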




Strange




> Q-learning is a model-free method: it does not use any distributional assumptions about returns.

In that case, how is it better than simply using historical distributions for pricing? There are plenty of robust statistical methods that allow you to generate distributions from historical returns and use them for pricing.
Not trying to be a luddite, but (a) I just don't see any appeal in what these people are trying to do in real life, and (b) I don't see any improvement over the other ML-based pricing methods that did not stick 



Sorry, but this does sound like luddite talk. What are the "other ML-based pricing methods that did not stick"? Are you against ML in general? Do you think that linear regression is ML or not? What are your "robust statistical methods" that you think make other approaches obsolete, and how exactly do you use them for pricing? 





"QLearning is a method that is modelfree  it does not use any distributional assumptions about returns. So you don't have to break your head anymore choosing, say, between Heston and whatever else  it is entirely irrelevant."
1. How do you model the discounting factor? BSM model solves this for you by expressing everything in the riskneutral measure. Your paper assumes the agent operates in the realworld measure. But then can't just use the riskfree rate for discounting rewards.
2. The datadriven Qlearning version is described thus in the 1st paper: "The data available is given by a set of N trajectories for the underlying stock S t (expressed as a function of X t using Eq.(24)), hedge position a t , instantaneous reward R t , and the nexttime value X t+1".
a) In order to use the model, you need to know an unobservable quantity (some trader's hedge positions, i.e. their deltas) to use this method. The agent is not learning just the market dynamics, it's also learning some other trader's model (their hedge positions).
b) Xt is derived from stock prices St using some parameters mu and sigma. Where do you take their values from?
c) The model requires N such paths (where N is undoubtedly large, say 10,000 or more). In order to price a 3month options, you need 3x10,000 months worth of data. This is impossible.
If you want to build something worthwhile, solve this problem: an agent observes traded stock prices, traded bond prices and traded option prices. They have a given financing rate and can invest cash at some other rate. Then, they learn a strategy to hedge a European option at nonstandard strike. The goal is to maximise the amount of cash in the bank after the option expires or is exercised. 



Strange




> What are "other MLbased pricing methods that did not stick"? There were attempts to price options using neural nets and using robust regressions. That was fairly long time ago and it was more or less a waste of electrons. My expectation is that this method would be of equal value.
> Are you against ML in general? No. In fact, I use some simple ML approaches in some of my strategies. I just do not believe that ML can be used to price convex instruments IRL, simply because of the dimensionality issues. @london said it far more eloquently.
> What are your "robust statistical methods" that you think make other approaches obsolete, and how exactly you use them for pricing? LOL, what? Did I say anywhere that riskneutral pricing is "obsolete"?
I was simply pointing out that if you really want to use modelfree approach to pricing an option, you can estimate probability density using whatever your favorite method (including KDE, if you like to use ML) and multiply your payoff by that density. That approach is far less datahungry. 
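For concreteness, the density-times-payoff idea could be sketched as below (a toy sketch under the empirical measure, with made-up numbers throughout; it deliberately sidesteps the drift/measure question debated elsewhere in this thread):

```python
import numpy as np

# Toy sketch of the suggestion above: estimate the terminal-price density
# with a hand-rolled Gaussian KDE, then integrate payoff * density on a
# grid. All numbers are made up for illustration, and this prices under the
# empirical measure, not a risk-neutral one.
def kde_call_price(terminal_prices, strike, rate, T):
    x = np.asarray(terminal_prices, dtype=float)
    h = 1.06 * x.std() * len(x) ** (-0.2)      # Silverman-style bandwidth
    grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, 2000)
    kernel = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    density = kernel.sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))
    payoff = np.maximum(grid - strike, 0.0)     # European call payoff
    dx = grid[1] - grid[0]
    return np.exp(-rate * T) * (payoff * density).sum() * dx

# Fake "historical" sample of terminal prices, for illustration only
rng = np.random.default_rng(0)
s_T = 100.0 * np.exp(rng.normal(0.0, 0.1, size=2000))
price = kde_call_price(s_T, strike=100.0, rate=0.0, T=0.25)
```

Whether such an empirical-measure price is usable for trading is, of course, exactly what the rest of the thread argues about.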




""my expectation is that this method would be of equal value"  You obviously did not take a look at the papers I asked about, because they offer a very different approach from neural nets, and they go without estimation of a probability density. 



Strange




(a) "equal value"  as worthless as neuralnet based learning turned out to be. I have read the paper, though I can't claim to have a deep understanding of the statistics involved
(b) IMHO, it's not the matter of the ML approach, it's a matter of required training data and the resulting dimensionality of the problem.





Bravo Katastrofa, finally I see meaningful questions that go beyond pure entropy-enhancing replies in the spirit of "I did not read the papers but here is what I think about neural networks".
To your questions:
> 1. How do you model the discounting factor? The BSM model solves this for you by expressing everything in the risk-neutral measure. Your paper assumes the agent operates in the real-world measure. But then you can't just use the risk-free rate for discounting rewards.

Once you have accounted for risk in the rewards, you can discount them using a risk-free discount rate.
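As a heavily simplified illustration of what "accounting for risk in rewards" can mean, one common pattern is a mean-variance style penalty on each step's hedge-portfolio P&L. The sketch below is only loosely in the spirit of the papers, not their exact reward definition, and lambda_risk is a made-up risk-aversion number:

```python
import numpy as np

# Heavily simplified illustration (lambda_risk is a made-up risk-aversion
# parameter; this is NOT the papers' exact reward definition): penalise each
# step's hedge-portfolio P&L increment by its square, a mean-variance style
# adjustment. Rewards adjusted this way can then be discounted at the
# risk-free rate.
def risk_adjusted_rewards(pnl_increments, lambda_risk=0.01):
    pnl = np.asarray(pnl_increments, dtype=float)
    return pnl - lambda_risk * pnl ** 2

rewards = risk_adjusted_rewards([1.0, -2.0, 0.5])
# e.g. 1.0 - 0.01 * 1.0**2 = 0.99; losses are penalised more than gains helped
```

The penalty makes large swings in either direction costly, which is what lets the risk-free discount rate do the rest.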
> 2. The data-driven Q-learning version is described thus in the 1st paper: "The data available is given by a set of N trajectories for the underlying stock S_t (expressed as a function of X_t using Eq. (24)), hedge position a_t, instantaneous reward R_t, and the next-time value X_{t+1}".
> a) In order to use the model, you need to know an unobservable quantity: some trader's hedge positions, i.e. their deltas. The agent is not learning just the market dynamics, it's also learning some other trader's model (their hedge positions).

Good point. The agent learns from recorded actions (re-hedges). They should 'match' the market, otherwise the trader continuously loses money. In the worst case, you can take purely random actions, and the model will still learn the price asymptotically. There is also a version without observed rewards in the second paper.
> b) X_t is derived from the stock prices S_t using some parameters mu and sigma. Where do you take their values from?

These are hyperparameters, not parameters. There is no need to calibrate them in-sample.
> c) The model requires N such paths (where N is undoubtedly large, say 10,000 or more). In order to price a 3-month option, you need 3 x 10,000 months' worth of data. This is impossible.

No, it is not. You can use bootstrapped data for both prices and hedges. I don't see any issue here.
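A minimal sketch of such a bootstrap (with made-up toy prices; note the caveat in the comments, which matters for the discussion below):

```python
import numpy as np

# Minimal sketch of the bootstrap suggested above: resample observed
# one-step log-returns with replacement to manufacture extra pseudo-paths.
# Caveat: this treats the steps as i.i.d. draws, which is itself a
# modelling assumption.
def bootstrap_paths(prices, n_paths, n_steps, seed=0):
    p = np.asarray(prices, dtype=float)
    log_ret = np.diff(np.log(p))                 # observed one-step returns
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(log_ret), size=(n_paths, n_steps))
    return p[0] * np.exp(np.cumsum(log_ret[idx], axis=1))

hist = [100.0, 101.0, 99.5, 100.5, 102.0, 101.5]   # toy "observed" prices
paths = bootstrap_paths(hist, n_paths=1000, n_steps=63)
```

The same idea could be applied to recorded hedge positions, though resampling them jointly with prices is where it gets delicate.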
> If you want to build something worthwhile, solve this problem: an agent observes traded stock prices, traded bond prices and traded option prices. They have a given financing rate and can invest cash at some other rate. Then they learn a strategy to hedge a European option at a non-standard strike. The goal is to maximise the amount of cash in the bank after the option expires or is exercised.

Your suggestion is respectfully declined. It reminds me of a suggestion to spend the rest of one's life trying to compute the stability of a table with four legs. 




"1. How do you model the discounting factor? BSM model solves this for you by expressing everything in the riskneutral measure. Your paper assumes the agent operates in the realworld measure. But then can't just use the riskfree rate for discounting rewards.
Once you accounted for risk in rewards, you can discount them using a riskfree discount rate."
How do you account for the risk in rewards? Your answer is just "kicking the can down the road".
"2. The data-driven Q-learning version is described thus in the 1st paper: 'The data available is given by a set of N trajectories for the underlying stock S_t (expressed as a function of X_t using Eq. (24)), hedge position a_t, instantaneous reward R_t, and the next-time value X_{t+1}'.
a) In order to use the model, you need to know an unobservable quantity: some trader's hedge positions, i.e. their deltas. The agent is not learning just the market dynamics, it's also learning some other trader's model (their hedge positions).
Good point. The agent learns from recorded actions (re-hedges). They should 'match' the market, otherwise the trader continuously loses money. In the worst case, you can take purely random actions, and the model will still learn the price asymptotically. There is also a version without observed rewards in the second paper."
The second version still assumes you're observing the trades. Hence, it is not usable in practice. If I can observe someone's trades, I can also walk over to them across the trading floor and ask them what pricing model they use.
"b) Xt is derived from stock prices St using some parameters mu and sigma. Where do you take their values from?
These are hyperparameters, not parameters. No need to calibrate them insample."
You still need to choose their values. How do you do this? Hyperparameters matter. There is a lot of debate about how you can overfit via hyperparameter optimisation. And your model is not dataefficient.
"c) The model requires N such paths (where N is undoubtedly large, say 10,000 or more). In order to price a 3month options, you need 3x10,000 months worth of data. This is impossible.
No it is not not. You can use bootstapped data for both prices and hedges. Don't see any issue here."
I'm surprised that you don't! Bootstrapping assumes conditional independence  and you agent is trying to learn correlations between X_t, a_t and X_{t+1}. You still don't see an issue here? In other words: by bootstrapping, you're making a model assumption. If you use bootstrapping to train your agent, you're not modelfree.
"If you want to build something worthwhile, solve this problem: an agent observes traded stock prices, traded bond prices and traded option prices. They have a given financing rate and can invest cash at some other rate. Then, they learn a strategy to hedge a European option at nonstandard strike. The goal is to maximise the amount of cash in the bank after the option expires or is exercised.
Your suggestion is respectfully declined. Reminds me a suggestion to spend the rest of one's life trying to compute stability of a table with four legs."
I fail to see the analogy, but I gather that you prefer to stick to recasting old stuff using new buzzwords ;) 





Katastrofa, you are of course free to stick to the outdated garbage called risk-neutral models for another 40 years. Here is another paper that keeps actions unobserved and still produces a market model.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3174498




pj


Total Posts: 3402 
Joined: Jun 2004 


> outdated garbage called risk-neutral models

And, pray, why are they garbage?

The older I grow, the more I distrust the familiar doctrine that age brings wisdom
Henry L. Mencken 




Because they only compute the mean of an option hedge portfolio and ignore the risk of mis-hedges; that risk is not priced. 



pj




Not true. If your market is incomplete, you have several risk-neutral models, and thus several available means.





Yes, that is correct. So you suggest estimating risk in options by running a few incomplete-market models with different pricing measures? :) 



pj




No, I prefer sacrificing virgins. Works the same as ML. Magic. 




Let me know how it goes :)
"Economics ended up with the theory of rational expectations, which maintains that there is a single optimum view of the future, that which corresponds to it, and eventually all the market participants will converge around that view. This postulate is absurd, but it is needed in order to allow economic theory to model itself on Newtonian Physics." (G. Soros)
For a more detailed discussion, you may also find this interesting:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1530046 



Strange




Look, it's hard for someone who has never actually managed an options book to understand what actual purposes an option pricing model serves. I really think a lot of academics waste a lot of effort trying to solve problems that do not have much practical value. 


