Forums  > Software  > Deep learning  
     

jslade


Total Posts: 1148
Joined: Feb 2007
 
Posted: 2014-08-12 22:54
So, after attending an interesting seminar, I can understand why there is such a hype bubble around this subject. Has anyone else worked on this topic? I generally prefer to use simple models like gradient boost, forests or good old KNN, but there seems to be some "there" there with DL.

I have identified three systems with the bare bones for developing such models. I suspect I will unfortunately have to fool with all of them. For the record, the ones I know about are
1) Theano
2) Torch7
3) Pybrain

"Learning, n. The kind of ignorance distinguishing the studious."

radikal


Total Posts: 259
Joined: Dec 2012
 
Posted: 2014-08-13 06:35
Is Pybrain still supported? I was never really able to do much of interest with it when I tried it on a few problems a few years back (compared to a standard logit/SVM toolbox).

I've never used Torch though I've read the tutorial a half dozen times and promised myself I would switch to it.

I use Theano for a few non-DL things. I've never really been able to wrap my head around the declarative syntax, which, combined with quite unusual error reporting, can make it SLIGHTLY frustrating at times. That said, there are a LOT of good resources for most NNs you'd likely want all over the web.
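
For anyone who hasn't run into it, the declarative style goes: declare a symbolic graph first, compile it afterwards. A minimal sketch from memory, just to show the flavour:

    import theano
    import theano.tensor as T

    # Declare symbolic variables -- nothing is computed yet
    x = T.dvector('x')
    w = T.dvector('w')

    # Build up a symbolic expression graph (still no computation)
    score = T.dot(w, x)
    prob = 1.0 / (1.0 + T.exp(-score))  # logistic squashing

    # Compiling the graph into a callable is a separate step; type and
    # shape errors tend to surface here, far from the line that caused
    # them, which is the "unusual error reporting" I mean
    f = theano.function([w, x], prob)

    print(f([0.5, -0.2], [1.0, 3.0]))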

I've still yet to see much compelling in the finance space out of DL; I know a few people trying to do crazy giant everything vs everything macro stuff in a semi-supervised NN arch but none of what they're doing or what I've seen elsewhere has made me think they're uncovering more than you'd get out of standard linear models, covariance decomp type stuff.

I don't doubt one could replace a few of my production svm-type systems with dropout nets and get similar performance, and that would have the added benefit that training could be made continuous, but the # of new problems created is a bit daunting.
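
To be fair, the continuous-training part doesn't even need a net. A toy sketch of the idea using scikit-learn's out-of-core SGD interface (made-up data, just to show the shape of it):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    clf = SGDClassifier(loss='log')  # logistic regression fit by SGD

    # A pretend stream of daily batches: partial_fit updates the model
    # in place, so training never has to stop and refit from scratch
    classes = np.array([0, 1])
    for day in range(250):
        X = rng.randn(100, 5)                                  # toy features
        y = (X[:, 0] + 0.1 * rng.randn(100) > 0).astype(int)   # toy labels
        clf.partial_fit(X, y, classes=classes)

    X_test = rng.randn(50, 5)
    y_test = (X_test[:, 0] > 0).astype(int)
    print(clf.score(X_test, y_test))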


There are no surprising facts, only models that are surprised by facts

jslade


Total Posts: 1148
Joined: Feb 2007
 
Posted: 2014-08-13 08:21
I have only looked at each of these slightly. Pybrain was recently touted at a seminar I attended, but I agree it is not too impressive so far. Working with badly written Python (aka most Python code) really cheeses me off, so I'm probably going to end up mostly using Torch.

I am not planning on using DL for anything specific; I am simply preposterously underemployed and trying to keep busy in a productive way until August is over and someone decides to start sending me checks again. I have a sort of random-forest/KNN melange I'm hoping to implement as well (it took 2nd place to a DL network in a computer vision contest and is generally cool), and I have a small project in matrix approximants, but I figure maybe someone here has noodled with DL, for all the publicity it gets.

"Learning, n. The kind of ignorance distinguishing the studious."

FatChoi


Total Posts: 125
Joined: Feb 2008
 
Posted: 2014-08-13 17:32
Scikit Learn?

FWIW I haven't used Theano for DL projects but the bits of it I have used have worked really well.

radikal


Total Posts: 259
Joined: Dec 2012
 
Posted: 2014-08-13 18:17
Too bad no Torch + ipython NB magic. Seems like a good GSoC project for someone.

There are no surprising facts, only models that are surprised by facts

redandtheblue


Total Posts: 356
Joined: Aug 2007
 
Posted: 2014-08-13 18:44
Have you looked at pylearn2? It's built on Theano, to make using Theano easier.
(https://github.com/lisa-lab/pylearn2)

I don't think pybrain is active any more.

radikal


Total Posts: 259
Joined: Dec 2012
 
Posted: 2014-08-13 21:13
Along the lines of pylearn2, I remember this example being quite helpful, as YAML originally terrified me.

Pylearn2 is pretty straightforward -- I wouldn't say it obviates the need to understand the under-the-hood Theano stuff, though. Also, performance-wise, I think it's a bit outdated compared to cuda-convnet2 or some of Theano's new FFT stuff, but for the sorts of problems typical in finance... I can't picture it mattering.

There are no surprising facts, only models that are surprised by facts

Steve Castle


Total Posts: 306
Joined: Sep 2010
 
Posted: 2014-08-16 00:26
found it, finally

Been looking for that link since you posted.

I thought they were all bull hockey (massive overfitting) till I read this.

Oh and I had bookmarked this discussion also, as I found it good.



in the words of one such quant ‘were on the whole either less quanted or not quanted at all’.

jslade


Total Posts: 1148
Joined: Feb 2007
 
Posted: 2014-08-16 03:37
Thanks for those links. I actually know some of the important guys in this field, though I always thought anything with "neural" in it was bunk.

FWIIW, you need to google the terms "drop out/leave out" to find out why they don't overfit any more.
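
To save you the googling: the mechanism is just randomly zeroing units during training and rescaling at test time. A toy numpy illustration of the forward pass (my own sketch, not anybody's reference code):

    import numpy as np

    rng = np.random.RandomState(42)
    p_keep = 0.5           # each hidden unit survives with probability 1/2

    h = rng.randn(10)      # some hidden-layer activations

    # Training: sample a fresh binary mask each pass and zero out units;
    # every minibatch effectively trains a different "thinned" network,
    # which is what kills the co-adaptation/overfitting
    mask = rng.binomial(1, p_keep, size=h.shape)
    h_train = h * mask

    # Test time: keep every unit but scale by p_keep, approximating an
    # average over the exponentially many thinned networks
    h_test = h * p_keep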

I'm finding that the deep architectures have extremely long fit times unless you run them on clusters with big video cards. The fact that both Torch and Theano are basically ways to get simple interpreted code to run in threads and on a GPU ought to have clued me in to this earlier. I should be able to do something with smaller data sets, though. This brings us back to why everyone abandoned neural thingees in the first place: you don't need a supercomputer to run gradient boost on a hard problem.

Another quasi-neural technique which was smoking hot right before deep learning took off is "reservoir computing." Basically, take a giant randomly weighted recurrent net, just like you would before fitting a deep learning RNN, and then don't fit it. Just fit the output nodes. Obviously this is a lot faster, and supposedly it does well on some hard problems. I'll probably play around with this for a bit as a first stage in learning about all this. Maybe some of my thesis background in random matrix theory will come in handy.
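
The whole thing fits in a page of numpy. A toy echo-state sketch (mine, with the usual knobs -- reservoir size, spectral radius, ridge penalty -- picked arbitrarily):

    import numpy as np

    rng = np.random.RandomState(0)
    n_res = 200  # reservoir size

    # Random, *fixed* recurrent weights, rescaled so the spectral radius
    # is below 1 (the "echo state" condition -- this is where the random
    # matrix theory comes in)
    W = rng.randn(n_res, n_res)
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.randn(n_res, 1)

    # Drive the reservoir with an input signal and record its states
    T_steps = 500
    u = np.sin(np.linspace(0, 20, T_steps))  # toy input signal
    x = np.zeros(n_res)
    states = np.zeros((T_steps, n_res))
    for t in range(T_steps):
        x = np.tanh(np.dot(W, x) + W_in[:, 0] * u[t])
        states[t] = x

    # Only the linear readout gets fit -- ridge regression here,
    # predicting the input one step ahead
    A, target = states[:-1], u[1:]
    W_out = np.linalg.solve(np.dot(A.T, A) + 1e-6 * np.eye(n_res),
                            np.dot(A.T, target))
    pred = np.dot(A, W_out)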

"Learning, n. The kind of ignorance distinguishing the studious."

jslade


Total Posts: 1148
Joined: Feb 2007
 
Posted: 2014-08-20 20:31
FWIIW, Torch7 is a pretty good piece of kit. Once I figured out how to install the thing (the docs on the website are not presently useful), it was simple to build new things, despite my knowing nothing about Lua. Downside: weird memory limits in the repl. Embedding Lua in a better repl (J or R) looks pretty trivial, though, if it ever becomes a serious problem.
Running some of the demos reminds me of why nobody really uses neural nets in industry applications: nontrivial ones are godawful slow to fit. A video card is in the mail.

"Learning, n. The kind of ignorance distinguishing the studious."

AB12358


Total Posts: 59
Joined: Apr 2014
 
Posted: 2014-08-21 01:13
If you're just trialing it, the AWS GPU instances may be useful to you.

jslade


Total Posts: 1148
Joined: Feb 2007
 
Posted: 2014-08-21 09:27
While I appreciate the way you can spin up a lot of instances quickly, a $120 consumer card beats what those instances put out. You only get some small fraction of their big GPU, plus the virtualization overhead (and dealing with the programming complexity of working around all this: time is money). Worst case scenario, I get smoother video.

"Learning, n. The kind of ignorance distinguishing the studious."

MadMax


Total Posts: 424
Joined: Feb 2006
 
Posted: 2014-08-21 10:35
You can run dedicated instances, and virtualization overhead is pretty small.
It is cheaper to have your own machine at home if you only need one. However, in a corporate environment, AWS will be cheaper and much more flexible than an "in-house" datacenter for most companies. We are talking about 4 to 8 weeks to provision a server in a corporation, and more than USD 120K for 4 years of service for a 16-core, 32GB RAM Windows server.

MadMax


Total Posts: 424
Joined: Feb 2006
 
Posted: 2014-08-21 10:37
My comment about virtualization overhead assumes you use the newer HVM-based instances:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html

MadMax


Total Posts: 424
Joined: Feb 2006
 
Posted: 2014-08-21 13:11
Also, if you are really keen on non-virtualized instances, SoftLayer/IBM has "Bare Metal" services. Provisioning takes longer (hours), and either you can only provision by the month, or you have to if you want a decent deal.

jslade


Total Posts: 1148
Joined: Feb 2007
 
Posted: 2014-08-21 17:19
Did you read the Netflix report on using Amazon GPUs? Not exactly a success story:
http://www.enterprisetech.com/2014/02/11/netflix-speeds-machine-learning-amazon-gpus/

FWIIW, the GPU I just bought for $120 is vastly faster than the piece of junk that kicked Amazon's butt in that comparison.

My distaste for Cloudy things comes from having to run numerics on EC2 instances for my paid work. As far as I can tell, they aren't saving customers any money over sticking a real computer in a data center. In fact, it's costing them money to pay me to wait the factor of 4 or 16 longer it takes to get the answer on an xlarge than it would on my laptop.

"Learning, n. The kind of ignorance distinguishing the studious."

sv507


Total Posts: 165
Joined: Aug 2010
 
Posted: 2014-08-21 22:42
I would go for pylearn2 (not that I have used it for anything beyond the demo). http://benanne.github.io/2014/04/05/galaxy-zoo.html mentions it has "Theano wrappers for the cuda-convnet convolution"... 3 times faster than Theano's convolution code.

Personally, I am against DL (and visual bag of words). It just seems to be a dead end. [Maybe useful for solving a current photo-tagging problem, but the future is in working on standard computer vision: edge detection, segmentation, etc.] There is a lot of hype about their developing "representations", but it seems to me that all they are good for is automating the search for thresholds etc. in visual bag-of-words models to win competitions.
You might want to look at this paper:
http://cs.nyu.edu/~zaremba/docs/understanding.pdf

MadMax


Total Posts: 424
Joined: Feb 2006
 
Posted: 2014-08-21 23:16
Sorry for the deviation from the topic:

Jslade, sorry, I think you did not get my points:

Re virtualization and performance in clouds:
- first not all clouds are virtualized. There are bare metal clouds, so no virtualization overhead there.
- virtualization overhead is not necessarily significant, it depends on which virtualization technology
- even in virtualized clouds you can run dedicated instances (at higher cost) so you don't have to compete for the resources with other users

Re costs:
- I agree that the costs are significantly lower for you or me to buy a powerful workstation and have it under the desk. For example, these guys built this beast http://en.wikipedia.org/wiki/Fastra_II in 2009 for 6K euros. I built a server-grade (memory and motherboard cost 2x more than desktop parts) 16-core, 32GB RAM machine with 3 NVIDIA cards for approx. USD 2K, and that was in 2009-2010.
- However, that's not the case for most companies. Most medium and large companies are far from cost-effective in running datacenters. First, it takes weeks to months to provision hardware. Second, in terms of TCO, a same-spec workstation costs them around 5 times more than it costs you or me. That's due to buying more expensive equipment, higher data center costs, insurance, support, etc. Also, they usually buy Windows Server OS, and licences for that are significantly more expensive than Windows Pro.


jslade


Total Posts: 1148
Joined: Feb 2007
 
Posted: 2014-08-22 02:41
@Max: Yes, yes, I know all these things, though I do not approve of any of it. I have spent years on clouds: virtualized, not virtualized, dedicated and whatever. I will only torment myself with these ridiculous contraptions when compensated by large checks, or when there is no other option. It was suggested I try Torch7 on the AWS GPU things. My answer to this is: no. Thx for the Fastra link though; that could come in handy one day.

@sv507: I am (semi) productively using Torch7 already. I spent two years playing with the Phrench thing it is based on, so the learning curve is only in understanding how to express myself in Lua. Lua is unpleasant and primitive, but it's also really simple. I actually agree with your sentiments about deep learning, but experience trumps intuition. There are also several promising neural architectures I've been meaning to play with for years, and Torch7 is a good tool for building them. I'm pretty sure it can all be wrapped into R packages if they turn out to be useful for something.



"Learning, n. The kind of ignorance distinguishing the studious."

Polter


Total Posts: 130
Joined: Jun 2008
 
Posted: 2014-08-25 19:02
Recently ran across this -- http://www.iro.umontreal.ca/~bengioy/DLbook/ -- thoughts?

jslade


Total Posts: 1148
Joined: Feb 2007
 
Posted: 2014-08-25 22:20
Bengio is definitely one of the grande fromages of the DL world. Ch. 13 will come in handy when it is done. Thanks for the link!

"Learning, n. The kind of ignorance distinguishing the studious."

Polter


Total Posts: 130
Joined: Jun 2008
 
Posted: 2014-09-02 18:51
There's also "Neural Networks and Deep Learning" by Michael Nielsen.
It seems more standard/introductory (from a brief look, so far it focuses on "regular" ANNs), although it's potentially interesting in how approachable it is:
http://neuralnetworksanddeeplearning.com/chap4.html

jslade


Total Posts: 1148
Joined: Feb 2007
 
Posted: 2014-09-03 00:44
I've been knocking off this reading list:

http://deeplearning.net/reading-list/

The most educational thing, of course, is writing code and running it on interesting data sets. Anyone who messes with this should do at least one "ground up" ANN to understand what they are. Most of the packaged code is impenetrable until you do this; at least that was my experience.
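
For the "ground up" exercise, something on this order is all it takes -- a bare two-layer net with backprop written out by hand (my own toy sketch, on XOR):

    import numpy as np

    rng = np.random.RandomState(1)

    # XOR: the classic problem a single linear layer can't solve
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.randn(2, 8); b1 = np.zeros(8)
    W2 = rng.randn(8, 1); b2 = np.zeros(1)
    lr = 0.5

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(5000):
        # forward pass
        h = sigmoid(X.dot(W1) + b1)
        out = sigmoid(h.dot(W2) + b2)
        # backward pass: squared-error loss, chain rule by hand
        d_out = (out - y) * out * (1 - out)
        d_h = d_out.dot(W2.T) * h * (1 - h)
        W2 -= lr * h.T.dot(d_out); b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T.dot(d_h);   b1 -= lr * d_h.sum(axis=0)

    print(out.round(2))  # should approach [[0], [1], [1], [0]]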

"Learning, n. The kind of ignorance distinguishing the studious."

gax


Total Posts: 18
Joined: Apr 2011
 
Posted: 2014-09-06 22:22
I've been spending a bit of my spare time reading the stuff written by Mumford and Grenander on pattern theory. Those guys take the time to write down the pdfs defining the patterns they're looking for. Once they've got the pdf, they apply Bayes to get an inference. I was wondering: is deep learning making such an approach redundant?
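
(For concreteness, the recipe as I understand it: write down a generative model p(image | pattern) and a prior p(pattern); Bayes then gives p(pattern | image) ∝ p(image | pattern) p(pattern), and the inference is done on that posterior.)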

sv507


Total Posts: 165
Joined: Aug 2010
 
Posted: 2014-09-07 00:56
That's certainly the claim of deep learning proponents: they disparage the standard approach as "hand-crafting feature descriptors" for SVMs and contrast it with deep learning doing it all automatically. Basically, there is this claim that you don't need prior knowledge; there is enough data to extract the right representation. (And I am sure everyone on this forum has had to deal with someone with no experience of financial markets telling them that they just have to feed data into their statistical model and it will learn to predict the stock market, etc.)

Personally, I think there is a lot of bull here: conv nets [which I believe are the only successful deep learning approach] are precisely designed to mimic existing computer vision techniques (basically, banks of filters).
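
To make the filter-bank point concrete, here is what a first conv layer computes, minus the learning (a toy sketch with a hand-built Sobel kernel; a trained net's first-layer filters tend to come out looking like banks of these):

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.RandomState(0)
    image = rng.rand(32, 32)  # stand-in for a grayscale image

    # A hand-designed vertical-edge kernel (Sobel) -- classic computer
    # vision. A convnet's first layer *learns* banks of oriented filters
    # that look much like this instead of being handed them.
    edge = np.array([[1.0, 0.0, -1.0],
                     [2.0, 0.0, -2.0],
                     [1.0, 0.0, -1.0]])

    response = convolve2d(image, edge, mode='valid')  # the "convolution"
    feature = np.maximum(response, 0.0)               # the rectification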

So I would argue that the only 'successful' approaches actually do use a lot of prior knowledge.

The question is, what is "successful"? Basically, you can train a deep NN to perform well on a given object-tagging problem. But that is not necessarily the goal of computer vision. I would say you need a system of interacting modules (processing colours/shape/size/depth...), and it's not quite clear how one can build such a modular system based on the high-level misclassification of object tags, which is what deep NN approaches train on.

You might want to look at this paper -- essentially, they search for the minimal pixel distortion that causes each different deep convnet approach to fail to recognise an image. [AlexNet denotes Alex Krizhevsky's net -- one of the conv net successes]
http://cs.nyu.edu/~zaremba/docs/understanding.pdf
I think this highlights the issue that deep NNs do not have a true "visual" representation (i.e. colours/texture/segmentation into separate objects, etc.), which causes them to make crazy misclassifications. To me this is rather reminiscent of the back-testing fallacy that you guys have to deal with all the time: it's "easy" to find a model that performs well in a back test.
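
The construction itself is less mysterious than it sounds. A toy version against a linear model (my own sketch -- the paper does the equivalent with box-constrained L-BFGS against a full net, but the pixel-gradient idea is the same):

    import numpy as np

    rng = np.random.RandomState(0)

    # Toy stand-in "net": logistic regression on 64 flattened pixels.
    # For a deep net you would substitute the gradient of the class
    # score with respect to the input pixels.
    w = rng.randn(64)
    x = rng.rand(64)  # an "image"

    def prob(img):
        """Model's probability that img belongs to class 1."""
        return 1.0 / (1.0 + np.exp(-np.dot(w, img)))

    print("before:", prob(x))

    # The gradient of the score w.r.t. the *pixels* is just w here; a
    # tiny structured step against it moves the output a long way,
    # while each individual pixel barely changes
    eps = 0.05
    x_adv = np.clip(x - eps * np.sign(w), 0.0, 1.0)
    print("after:", prob(x_adv))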
