
Most DeepMind papers are sent to conferences, where they are peer-reviewed. You can even read the reviews: https://openreview.net/group?id=ICLR.cc/2018/Conference etc. 





What do you mean to say by that? 




"As you saw, words "modelfree" have some marketing appeal :)"
Is that why you called your approach "model-free"? ;)
"My current view is that there is no strict separation between 'model-free' and 'model-based' approaches."
I agree, to some extent. Implicitly, there's always a model of reality. There's no learning without bias. Maybe we should talk about "(explicit model)-free" methods.
"Whenever you add a regularization and its role is important, you admix a 'model-based' component to a 'model-free' component."
Ditto with bootstrapping ;)
"It's been a while since I looked into Google's Atari paper, so I do not recall the details of what exactly they did with regularization. But in general, my sense is that deep learning is not a good approach for trading applications, so I have largely stopped paying much attention to it lately. I think RL is more promising."
Why the opposition of RL vs DL? The synthesis is "statistical learning". 





Yeah man, now you are getting it! Smart marketing is important in science too :) Do you know the history of the deep learning revolution?
Re RL vs DL: RL is a paradigm, DL is a method. RL focuses on the *main* task: to act/trade optimally. DL focuses on an *auxiliary* task (prediction of returns or something). It is a huge difference in paradigm. In practice, these two things can be blended, of course: see Deep RL, model-based RL, etc. 




"What do you mean to say by that?"
You seemed to imply that peer review is finished in ML. It isn't. 





No, that's not what I meant to say. What I meant is that even by the high standards of peer-reviewed journals, statistically most of the published work in the branches of science I am most familiar with (physics, statistics, ML) is either crap or 'quiet pathology' (to use the words of Landau).
If you think that everything that comes out of Google is genius and smart, you might find it interesting to read a series of blogs on RL by Ben Recht. One example he mentioned: a recent Google paper claimed that evolutionary algorithms work as well as policy gradients in RL. As Ben points out, it actually means that policy gradients are as inefficient as evolutionary algorithms, because they essentially amount to a pure random search. And so on :)
Enjoy:
http://www.argmin.net/2018/02/20/reinforce/





"you might find it interesting to read a series of blogs on RL by Ben Recht"
Yeah, I went to his talk once. He is a very eloquent guy, but he set himself up a bit of a strawman: he showed that for a simple problem, a model-based approach kicks the shit out of a model-free one. I don't think anyone at Google (or Facebook, or Microsoft) disagrees with that. And you can take the model-based approach much further, to the level of a nuclear plant or beyond (but let's not mention Chernobyl, OK?). But the problem is that the problems which e.g. DeepMind is interested in (building a real AI, basically) have no models. If you can come up with a solvable model of intelligence, pack your suitcase and hop on a plane to Stockholm ;)






Yes, but I thought here we are not as ambitious as Google, and not really after models of general intelligence? Ben's blog talks about basic RL algorithms, not "AGI".
I was very enthusiastic about DL initially, but the more I understood about it, the less I actually liked it. I am very skeptical about learning in a completely 'model-free' way. At least in finance, I think a good way is first to take the simplest semi-solvable case and solve it from start to end using only RL, but in a completely controllable setting, where the solution just reduces to a bunch of linear regressions. This is what I just did in my papers. 




I think basic RL was always meant to be a toy model for doing the big thing. 





You mean the objective of RL in general? RL is definitely on a path from Supervised Learning to AGI, but it is a tiny step. I don't think we should confuse the current state of the art in RL with AGI: they have very little in common IMHO. 




You're saying that as if you knew where the path ends ;) 





Well, the precious little I understand about AGI tells me it is quite different from modern RL. I had some fun playing an arrogant ass among arrogant traders, but I have not thought about playing a prophet yet :) 




Hi Nudnik,
Interesting approach, I thought of doing something along these lines but never got to it. Good to see someone putting in the effort!
One note: if you are using resampling, you are destroying serial correlations and therefore are not completely model-free. Still, this is close, and the best one can do.
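To make the resampling point concrete, here is a minimal sketch, not taken from the paper under discussion. It assumes returns follow a toy AR(1) process (the coefficient `phi=0.5` is purely illustrative) and compares the lag-1 autocorrelation before and after an i.i.d. bootstrap, which draws observations with replacement and ignores their time ordering.

```python
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


def simulate_ar1(n, phi, rng):
    """Toy returns r_t = phi * r_{t-1} + eps_t with standard normal noise."""
    r, out = 0.0, []
    for _ in range(n):
        r = phi * r + rng.gauss(0.0, 1.0)
        out.append(r)
    return out


def iid_bootstrap(xs, rng):
    """Resample with replacement, discarding the time ordering."""
    return [rng.choice(xs) for _ in xs]


rng = random.Random(0)
returns = simulate_ar1(20000, phi=0.5, rng=rng)  # phi is an illustrative assumption
resampled = iid_bootstrap(returns, rng)

acf_original = lag1_autocorr(returns)
acf_resampled = lag1_autocorr(resampled)
print(f"lag-1 autocorrelation, original:  {acf_original:.3f}")
print(f"lag-1 autocorrelation, resampled: {acf_resampled:.3f}")
```

The resampled series keeps the marginal distribution of the returns but loses their dependence structure, so an agent trained on it implicitly assumes independent increments. A block bootstrap would preserve short-range dependence, at the cost of choosing a block length.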
The interesting part about using artificial data is that you know what the "correct" answer is: your agent should, over time, rediscover Black-Scholes. The question is how much data you need for that: 1 year, 10 years, 100 years?
I also suspect that if you use a stochastic vol model to generate your artificial data, learning will be slower than in the GBM case, as your agent will need to observe rare events to rediscover the vol smile. How much slower the learning is, is an interesting empirical question.
The real world process is more complex than the GBM or stochastic vol cases (and unknown to us), so learning would be even slower. I suspect that the answer would be that you would need orders of magnitude more data than is available for your agent to be able to learn in the real world, but it would be great to have some numbers to back that up.
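A rough way to put numbers on the "1 year, 10 years, 100 years" question is to look at a model-aware baseline rather than the RL agent itself. The sketch below is an assumption-laden illustration, not the method from the paper: it generates daily GBM log-returns, estimates the volatility from them, plugs the estimate into the Black-Scholes call formula, and reports the relative pricing error. All parameter values (spot, strike, rate, drift, volatility, 252 trading days per year) are hypothetical. An agent that does not know the model family should need at least this much data, and presumably more.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0, k, r, sigma, t):
    """Black-Scholes price of a European call."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def gbm_log_returns(n_days, mu, sigma, rng, dt=1.0 / 252):
    """Daily log-returns of a geometric Brownian motion."""
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    return [drift + vol * rng.gauss(0.0, 1.0) for _ in range(n_days)]


def estimate_vol(log_returns, dt=1.0 / 252):
    """Annualized volatility from the sample variance of log-returns."""
    n = len(log_returns)
    mean = sum(log_returns) / n
    var = sum((x - mean) ** 2 for x in log_returns) / (n - 1)
    return math.sqrt(var / dt)


TRUE_SIGMA = 0.2  # hypothetical "true" volatility of the generating process
true_price = bs_call(100.0, 100.0, 0.02, TRUE_SIGMA, 1.0)

rng = random.Random(1)
rel_errors = {}
for years in (1, 10, 100):
    rets = gbm_log_returns(252 * years, mu=0.05, sigma=TRUE_SIGMA, rng=rng)
    sigma_hat = estimate_vol(rets)
    price_hat = bs_call(100.0, 100.0, 0.02, sigma_hat, 1.0)
    rel_errors[years] = abs(price_hat - true_price) / true_price
    print(f"{years:>3} years: sigma_hat={sigma_hat:.4f}, "
          f"relative price error={rel_errors[years]:.2%}")
```

In this baseline the standard error of the volatility estimate shrinks roughly like one over the square root of the number of observations, so each extra digit of accuracy costs about a hundred times more data. A single seeded run only illustrates this; averaging over many seeds would be needed to see the trend cleanly.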

"Earth: some bacteria and basic life forms, no sign of intelligent life" (Message from a type III civilization probe sent to the solar system circa 2016) 



Strange


Total Posts: 1501 
Joined: Jun 2004 


> I also suspect that if you use a stochastic vol model to generate your artificial data, learning will be slower than in the GBM case, as your agent will need to observe rare events to rediscover the vol smile.
The agent will also have to know about the upcoming events: NFPs, FOMCs, etc. Unless, of course, you back these things out of the current market prices. 
I don't interest myself in 'why?'. I think more often in terms of 'when?'...sometimes 'where?'. And always 'how much?' 



That would be true with real data, but if you are playing with artificial data generated by a stochastic vol model, these things do not exist. All the agent needs to figure out is the generating process and its parameters e.g. vol, vol of vol, mean reversion, etc...
Hope what I am saying is clear. 
"Earth: some bacteria and basic life forms, no sign of intelligent life" (Message from a type III civilization probe sent to the solar system circa 2016) 




Hi NeroTulip, thanks, these are all questions that I want to explore more when I have a bit more time, maybe in the summer.
Yes, it will learn BS itself, I mentioned it in the paper. So far I have done very few experiments with training on noisy hedges (randomly perturbing the optimal hedges by +/- 50%, and then using them for training). They showed only a small slowdown in learning compared with learning from optimal hedges. This gives some optimism about how it will perform with real data. 
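For readers who want to picture the noisy-hedge setup, here is a minimal sketch under stated assumptions. It is not the experiment from the paper: the optimal hedge is taken to be the Black-Scholes delta, the perturbation is assumed to be a multiplicative factor drawn uniformly from [0.5, 1.5], and all numerical inputs are hypothetical.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_delta(s, k, r, sigma, tau):
    """Black-Scholes delta of a European call with time to maturity tau."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return norm_cdf(d1)


def perturb(hedge, rng, width=0.5):
    """Scale a hedge by a uniform factor in [1 - width, 1 + width] (assumed noise model)."""
    return hedge * (1.0 + rng.uniform(-width, width))


rng = random.Random(2)
optimal = bs_delta(100.0, 100.0, 0.02, 0.2, 0.5)
noisy = [perturb(optimal, rng) for _ in range(10000)]
mean_noisy = sum(noisy) / len(noisy)

print(f"optimal hedge:        {optimal:.4f}")
print(f"mean of noisy hedges: {mean_noisy:.4f}")
```

Because this assumed noise has zero mean, the average of the perturbed hedges stays close to the optimal one. That is one possible reading of why a regression-based learner would degrade only mildly with such data, though it would not carry over to biased or state-dependent noise.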




"One note: if you are using resampling, you are destroying serial correlations and therefore are not completely modelfree."
My point exactly!
What Nudnik could do is to phone up a few hedge funds and ask them for some very old trade histories, train his agent on them and see if he can recover the pricing model used by the hedge fund. 





Katastrofa, I did not disagree with you about resampling. And on hedge funds  yes, that was the plan, but most of my contacts in industry work in equities not options, so I did not find any real data for this. If anyone has option data that you could share, please PM me. 




Gents, here is another question for you:
In the third paper I mentioned on this forum, I tried to build a multi-period version of Black-Litterman using methods of RL. One of the proposed outcomes of this model is that the effective asset price dynamics is given by a Geometric Mean Reverting (GMR) process with signals:
where mean reversion $ \kappa $ is proportional to a linear impact model parameter $ \mu $, and timedependent mean level $ \theta_t $ is a linear function of signals.
The conventional lognormal dynamics is recovered from the above in the limit $ \kappa \rightarrow 0 $, but this is a wrong limit to take, because it describes different "physics".
The GMR process has been used for commodities and real options. I was wondering whether such a process has been used by anyone in the equities space, and also whether such a process might make sense for equities in general. Any thoughts?
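For anyone who wants to experiment with this kind of dynamics, here is a minimal simulation sketch. The functional form is an assumption chosen for illustration, not necessarily the one in the paper: $ dX_t = \kappa X_t (\theta - X_t) dt + \sigma X_t dW_t $, with $ \theta $ held constant instead of being driven by signals, and with hypothetical parameter values. The scheme applies Euler steps to $ \log X_t $ so that the simulated price stays positive.

```python
import math
import random


def simulate_gmr(x0, kappa, theta, sigma, n_steps, dt, rng):
    """Euler scheme on log X for the ASSUMED dynamics
    dX = kappa * X * (theta - X) dt + sigma * X dW (theta constant)."""
    log_x = math.log(x0)
    path = [x0]
    for _ in range(n_steps):
        x = math.exp(log_x)
        # Ito correction: d(log X) = (kappa * (theta - X) - sigma^2 / 2) dt + sigma dW
        drift = kappa * (theta - x) - 0.5 * sigma ** 2
        log_x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(math.exp(log_x))
    return path


rng = random.Random(3)
path = simulate_gmr(
    x0=50.0,       # hypothetical starting price
    kappa=0.05,    # hypothetical mean-reversion parameter
    theta=100.0,   # hypothetical constant mean level
    sigma=0.2,     # hypothetical volatility
    n_steps=252 * 20,
    dt=1.0 / 252,
    rng=rng,
)

late = path[len(path) // 2:]
late_avg = sum(late) / len(late)
print(f"average price over the second half of the path: {late_avg:.2f}")
```

Under this assumed form the price is pulled towards a level slightly below `theta`, which is the qualitative behaviour one would test against an equity price series, for example via the autocorrelation of returns.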





ronin


Total Posts: 401 
Joined: May 2006 


@nudnik,
I guess I don't understand why you need hedge positions to fit to. Conceptually you can reconstruct the optimal delta hedge just from the underlying paths. How do the hedge positions help?
Your mean reverting process doesn't sound like anything from equities. This sort of thing is used to model volatile forward curves, and equities don't have volatile forward curves.

"There is a SIX am?"  Arthur 



Ronin, it is called Q-learning: learning from *actions*, i.e. hedges, when applied to trading...
My question on the GMR process for equities is whether it *outright contradicts* any known facts about equities. 





Download a price series for IBM and find out ;) 




You probably mean that any autocorrelations implied by such dynamics would be exploited to non-existence? If yes, I am not sure it is a complete answer.
Actually, I did it for the particular case of the IBM stock. Can you be more specific? 




ronin




> it is called Q-learning: learning from *actions*, i.e. hedges, when applied to trading...
The fact that it has a name doesn't really answer the question. Why do you need hedge positions? Does it converge faster when you are learning using hedge positions? How much faster? Do hedge positions introduce a bias? How much bias? How much error in hedge positions can you tolerate before you introduce a bias?
> My question on the GMR process for equities is whether it *outright contradicts* any known facts about equities.
Yes, it does. In the large price limit it compounds linearly, not exponentially. Which would imply something pretty strange about the cost of funding when stock prices are high. Can't even be bothered to think it through.

"There is a SIX am?"  Arthur 



So don't, then; save your brain cells. Any other opinions? 



