Forums  > Off-Topic  > Very Interesting paper on how we don't know shit about how deep ANNs are working  
     

Maggette


Total Posts: 964
Joined: Jun 2007
 
Posted: 2017-09-29 22:17
I don't have any idea how this is possible:
https://arxiv.org/pdf/1611.03530.pdf

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

EspressoLover


Total Posts: 240
Joined: Jan 2015
 
Posted: 2017-10-02 17:44
Really interesting paper. Thanks for posting.

Intuitively, I'd speculate that this is due to the greedy, layer-wise way deep nets are trained. The more incrementally a model is trained, the less applicable VC-dimension analysis becomes. The full parameter space of the model can shatter the training set, but deep nets aren't trained by arg-maxing over everything at once.

I'd imagine that if you think of each layer training step as a standalone problem, then the generalization error is well behaved. The "signal gradient" at each step is much larger than the "noise gradient", so continuously repeating training steps takes us in the right direction. Signal-driven variations in the fitness landscape are usually much more stable and smooth than noise-driven variations. However if the training set is pure noise, the "signal gradient" disappears and eventually the model converges to fitting the noise.
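You can poke at this with a toy version of the paper's randomization test: train the same tiny net once on labels with real structure and once on pure-noise labels, and look at training accuracy. This is just a sketch — the architecture (one tanh hidden layer, squared loss, full-batch gradient descent) and all the sizes are arbitrary choices for the demo, not anything from the paper:

```python
import math, random

random.seed(0)

def train(X, y, hidden=24, steps=2500, lr=0.25):
    """Full-batch gradient descent on a 1-hidden-layer tanh net, squared loss."""
    d, n = len(X[0]), len(X)
    W1 = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [random.gauss(0, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(steps):
        gW1 = [[0.0] * d for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        for x, t in zip(X, y):
            h = [math.tanh(sum(W1[j][k] * x[k] for k in range(d)) + b1[j])
                 for j in range(hidden)]
            e = sum(W2[j] * h[j] for j in range(hidden)) + b2 - t  # d(loss)/d(out)
            for j in range(hidden):
                gW2[j] += e * h[j]
                dz = e * W2[j] * (1 - h[j] ** 2)  # backprop through tanh
                gb1[j] += dz
                for k in range(d):
                    gW1[j][k] += dz * x[k]
            gb2 += e
        for j in range(hidden):  # averaged-gradient updates
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for k in range(d):
                W1[j][k] -= lr * gW1[j][k] / n
        b2 -= lr * gb2 / n

    def predict(x):
        h = [math.tanh(sum(W1[j][k] * x[k] for k in range(d)) + b1[j])
             for j in range(hidden)]
        return sum(W2[j] * h[j] for j in range(hidden)) + b2
    return predict

X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(12)]
y_signal = [1.0 if x[0] + x[1] > 0 else -1.0 for x in X]  # learnable rule
y_noise = [random.choice([-1.0, 1.0]) for x in X]         # pure noise

f = train(X, y_signal)
acc_signal = sum((f(x) > 0) == (t > 0) for x, t in zip(X, y_signal)) / len(X)
f = train(X, y_noise)
acc_noise = sum((f(x) > 0) == (t > 0) for x, t in zip(X, y_noise)) / len(X)
print("train accuracy, signal labels:", acc_signal)
print("train accuracy, noise labels: ", acc_noise)
```

With the model this overparameterized relative to 12 points, both runs tend to end up with high training accuracy — which is the paper's headline observation: capacity alone can't explain why the signal case also generalizes.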

I don't think the underlying mechanics are too different from vanilla boosting like AdaBoost. Similarly, you can use very large models, large enough to shatter the training set. And in the case of pure random data, the model will eventually fit the training data completely. Yet in most real-world applications, AdaBoost is surprisingly resilient to overfitting. Training incrementally and greedily seems to siphon signal off from noise.
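The "AdaBoost will eventually fit pure noise" half of that is easy to see in a toy run: exhaustive threshold stumps on 1-D points with random labels. Everything here (dataset size, round count, the stump search) is made up for the demo:

```python
import math, random

random.seed(1)

# 1-D data with pure-noise labels: the shatterable case from the discussion.
n = 16
X = sorted(random.uniform(0, 1) for _ in range(n))
y = [random.choice([-1, 1]) for _ in range(n)]

def best_stump(X, y, w):
    """Exhaustively pick the weighted-error-minimizing threshold stump."""
    best = None
    thresholds = [X[0] - 1.0] + [(X[i] + X[i + 1]) / 2 for i in range(n - 1)]
    for thr in thresholds:
        for sign in (1, -1):  # sign = label predicted for x > thr
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (sign if xi > thr else -sign) != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

w = [1.0 / n] * n   # uniform initial weights
F = []              # ensemble: list of (alpha, threshold, sign)
for _ in range(200):
    err, thr, sign = best_stump(X, y, w)
    err = max(err, 1e-10)
    if err >= 0.5:  # no weak learner with an edge left
        break
    alpha = 0.5 * math.log((1 - err) / err)
    F.append((alpha, thr, sign))
    # reweight: misclassified points gain weight, correct ones lose it
    w = [wi * math.exp(-alpha * yi * (sign if xi > thr else -sign))
         for xi, yi, wi in zip(X, y, w)]
    s = sum(w)
    w = [wi / s for wi in w]

def predict(x):
    score = sum(a * (sg if x > thr else -sg) for a, thr, sg in F)
    return 1 if score > 0 else -1

train_err = sum(predict(xi) != yi for xi, yi in zip(X, y)) / n
print("rounds used:", len(F), "training error on pure noise:", train_err)
```

The training error gets driven down to (essentially) zero on random labels, exactly as claimed — the real-data resilience to overfitting is the part this toy run can't show.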

All of this is just random speculation, of course. So, who knows.

Maggette


Total Posts: 964
Joined: Jun 2007
 
Posted: 2017-10-03 15:19
You bring up a couple of interesting points.

"Yet in most real world applications, AdaBoost is surprisingly resilient to overfitting."

Even though I've had some pleasant experiences with boosting on real-world problems at work and on some educational data sets, there was a time when I didn't use boosting (AdaBoost or gradient boosting) at all, simply because from a theoretical point of view it just adds another way to overfit your data, and I didn't have a good feeling for how it really worked.

You are right; my line of thinking back then was quite close to my thoughts after reading the posted paper.

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...