Forums  > Software  > Scalalab Fslab, functional/quasi functional languages  
     

rickyvic


Total Posts: 187
Joined: Jul 2013
 
Posted: 2015-09-15 22:46
Any idea if I can partly replace Matlab/R with any of those?
The idea is to use them for a quick time to production.
In any case, R or Matlab can be called if needed, as I understand it.

The logic would be to avoid rewriting the code in a lower-level language.

"amicus Plato sed magis amica Veritas"

Hansi


Total Posts: 303
Joined: Mar 2010
 
Posted: 2015-09-15 23:10
I don't understand why the question brings up Scala on one hand and F# on the other; what are you using in production today?

ScalaLab doesn't seem to have much traction, and since I don't use Scala I'll reserve comment on it.

I don't get FsLab... what does it actually add? I've used all of those libraries, and most of them are great, but I see no point in using FsLab; it doesn't seem to add anything.

Using the R provider does give flexibility, but you're still using R, which is slow anyway, so you gain pretty much nothing other than ease of interop.

Our current data analytics app is a C# ASP.NET app with a separate in-house R engine that keeps multiple R sessions running on the server, ready to do number crunching with all packages etc. already loaded and whatever data possible cached and primed. It works out okay, not optimal but okay. I didn't use R.NET when building it because R.NET was at a pretty early, buggy stage when I built this a few years ago.

Going pure F# (plus the libraries referenced by FsLab) or Scala is most likely the most advantageous route. I'd go with F# today, but we have 130K+ LOC in our R library, and getting a dozen people to shift languages is not feasible at this point.

Maggette


Total Posts: 1138
Joined: Jun 2007
 
Posted: 2015-09-16 00:20
Hi,

maybe a little bit off topic.

I've started to use Scala a lot lately, because at the moment I'm using Spark (a non-finance application) on a Hadoop/YARN cluster.

It is quite elegant and I like coding in it... but it comes with the annoying Maven shit and SBFUCK typical of the JVM world.

As an aside: I hope MSFT will come up with a powerful, distributed in-memory/spill-over big data framework and use F# and C# for it... ha, dreaming of LINQ to Amazon S3 :)...



I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

rickyvic


Total Posts: 187
Joined: Jul 2013
 
Posted: 2015-09-16 01:14
In production today I use Matlab, the same as in research mode; feeds are written in Java or directly in Matlab via the COM interface.

You clarified a few points; in fact, going with Matlab and writing a mix of Matlab and Scala in production could probably be a solution.

The real win would be using one fast language for both production and research.

"amicus Plato sed magis amica Veritas"

rickyvic


Total Posts: 187
Joined: Jul 2013
 
Posted: 2015-09-16 09:39
Admittedly I am not a great developer. Regarding "annoying Maven Shit and SBFUCK typical for JVM world": would you be able to expand on this comment? Thanks for your help.
As for Scala, how is its performance compared to Java?

Has anybody used Jane Street's OCaml libraries? They should be good after all these years.

"amicus Plato sed magis amica Veritas"

Maggette


Total Posts: 1138
Joined: Jun 2007
 
Posted: 2015-09-16 23:24
Hi,

if you are not familiar with the Java/JVM world (and I am not used to Java), you sometimes have problems doing even the most trivial things (stuff you don't even think about when you are using Matlab): dependency conflicts, Java still having no usable native DateTime class, and so on.

If you are coming from a nice, neat framework like .NET, or write most of your applications/scripts in Matlab, R or Python (for the things they are good at), you sometimes get annoyed by this stuff.

Performance of Scala vs Java depends on the problem. In the Spark universe I am living in at the moment, Scala is IMO the way to go (Spark is written in Scala).

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

rickyvic


Total Posts: 187
Joined: Jul 2013
 
Posted: 2015-09-17 16:57
Understood, thanks; it is not trivial. In your experience, would .NET be a much more hassle-free experience?

Spark is interesting, by the way.

"amicus Plato sed magis amica Veritas"

Maggette


Total Posts: 1138
Joined: Jun 2007
 
Posted: 2015-09-18 02:56
IMHO .NET is by far the better framework, but other people might have different opinions.

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

MadMax


Total Posts: 424
Joined: Feb 2006
 
Posted: 2015-09-18 12:19
1) I would say .NET is the better framework, but if you are going to run on Linux then you will go with the JVM.

2) JVM-based "ecosystems" for distributed computing are far more developed.

rickyvic


Total Posts: 187
Joined: Jul 2013
 
Posted: 2015-09-18 13:15
In this specific case the decision is based on:

1) the pain of converting Matlab code to the target language
2) the pain of fixing bugs that are hard to find
3) performance vs C, especially in real time
4) ease of integration with external APIs
5) time to production

If performance is enormously far from C++ (say more than 3x slower), it cannot be a candidate.

Off topic: I have been told that Armadillo (C++) could be an alternative to functional languages, used together with Matlab Coder.

"amicus Plato sed magis amica Veritas"

Lebowski


Total Posts: 77
Joined: Jun 2015
 
Posted: 2015-09-18 17:20
I'm sure you likely know this, but comparing programming languages by some scalar factor like 3 isn't really the whole story; it depends on what operations you're doing. The real speed advantage of C++ comes from your ability to control things that normally happen in the background, like garbage collection. In managed languages you can avoid a lot of GC pressure just by making sensible choices about allocation and boxing, etc. I recommend you check out .NET as well. I wouldn't get too esoteric with functional languages; R/Matlab/Python have all the bases covered.

To me, the idea of having to code something over is kind of beside the point. You can do data analysis in a functional language, but if you're doing event-driven backtesting they will be suboptimal. My guess is that if you're concerned about your code being too slow, you probably want to be doing event-driven testing of strategies anyway, which would mean writing a near-production version of the strategy in C# or Java or whatever and iterating over your data, rather than doing vectorized calculations.
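The event-driven iteration Lebowski contrasts with vectorised calculation can be shown in a minimal sketch: the strategy consumes one tick at a time, exactly as it would in production. The strategy logic and price series here are invented for illustration.

```python
# Minimal event-driven backtest sketch: state is updated tick by tick,
# so the same on_tick code path could run live in production.
class MovingAverageCross:
    """Toy strategy: hold a long position while price is above the running mean."""

    def __init__(self):
        self.prices = []
        self.position = 0

    def on_tick(self, price):
        self.prices.append(price)
        mean = sum(self.prices) / len(self.prices)
        self.position = 1 if price > mean else 0
        return self.position

ticks = [100.0, 101.5, 99.0, 102.0, 103.5]  # invented price series
strat = MovingAverageCross()
positions = [strat.on_tick(p) for p in ticks]
print(positions)  # [0, 1, 0, 1, 1]
```

The point is structural: a vectorised backtest computes all positions from the whole price array at once, while this loop only ever sees the past, which removes a whole class of look-ahead bugs when moving to production.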

Rashomon


Total Posts: 202
Joined: Mar 2011
 
Posted: 2015-09-18 19:45
MadMax, did you have bad experiences with Mono?

"My hands are small, I know, but they're not yours, they are my own. And they're, not yours, they are my own." ~ Jewel

svisstack


Total Posts: 320
Joined: Feb 2014
 
Posted: 2015-09-19 00:17
>> @Rashomon: MadMax, did you have bad experiences with Mono?

Sorry for jumping in on this response, but I'm 100% aligned with MadMax's reply.

Mono is great for small projects, but it doesn't quite work the way it should, or the way it does in its original environment (.NET): there are a lot of problems with common classes and also with LINQ, for example, and it's even several .NET versions behind.


www.coinapi.io && www.cryptotick.com

Lebowski


Total Posts: 77
Joined: Jun 2015
 
Posted: 2015-09-19 02:03
I can echo svisstack's experience. Mono is pretty good, but I've run into issues with System.Linq, System.Data, and others. MS is opening up .NET and starting to work actively with the Mono community, so I wouldn't be surprised if these issues have been, or will shortly be, resolved.

Maggette


Total Posts: 1138
Joined: Jun 2007
 
Posted: 2015-09-19 16:24
Echoing svisstack and MadMax too... I tried to stick with .NET and Mono as long as possible, but big projects on Linux/Unix, and especially big data stuff, should IMHO be done using tools from the JVM universe.

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

Rashomon


Total Posts: 202
Joined: Mar 2011
 
Posted: 2015-09-20 22:42
Thanks Lebowski, svisstack, Maggette.

"My hands are small, I know, but they're not yours, they are my own. And they're, not yours, they are my own." ~ Jewel

rickyvic


Total Posts: 187
Joined: Jul 2013
 
Posted: 2015-09-21 15:57
@Lebowski thanks for your help. Probably OO is better for production, and even safer once you reach a certain project size, but time to production would be too long.
The point of iterating vs vectorising is exactly the issue of putting the code into production.
You can run vectorized code on a small dataset in prod (as small as possible), but it is not the safest/quickest option; on the other hand, rewriting exposes you to other issues.

So again, Scala or F# looked appealing because they are potentially fairly fast and easier to handle for a researcher who knows how to write good code in scripting languages. At least it looks that way.

Thanks also for the feedback on Mono; that makes the choice even harder...

"amicus Plato sed magis amica Veritas"

MadMax


Total Posts: 424
Joined: Feb 2006
 
Posted: 2015-09-21 21:17
The choice is simple:
- if you are Windows-only: F# or Scala
- if you are mixed or Linux: Scala

You might sprinkle a bit of C or C++ in limited places in production (if needed).
You might also have to use some Python, R, Julia and/or Matlab (with IPython: http://ipython.org/ or Jupyter: https://jupyter.org/), mainly in your prototyping.

rickyvic


Total Posts: 187
Joined: Jul 2013
 
Posted: 2015-09-24 19:02
Thank you,

someone I know brought up Python and the fact that a lot of production code is written in it.
Also, Cython can be used to speed things up where needed. How is concurrency with that?

The question, though, is: can you deliver Python programs packaged as an executable? Apparently so: py2exe and PyInstaller.

Any experience with Cython or PyInstaller is appreciated.
I am sure there are a lot of Python coders on this phorum.



"amicus Plato sed magis amica Veritas"

rickyvic


Total Posts: 187
Joined: Jul 2013
 
Posted: 2015-10-01 19:17
I wanted to mention the following link, which I found interesting even though it is old:

https://tr8dr.wordpress.com/2010/01/27/the-ideal-quant-environment/


"amicus Plato sed magis amica Veritas"

exodus


Total Posts: 11
Joined: Feb 2011
 
Posted: 2016-01-05 19:41
A little late to the party, but I've wrestled with exactly the same debate recently, having become all too comfortable with my Matlab-esque Python prototyping modules, which could not scale to large matrix operations even after converting everything from pandas DataFrames to numpy. The problem is the global interpreter lock, which prevents CPU-bound tasks from running in parallel threads. I tried going down the multiprocessing road, but got fed up with building low-level synchronization constructs to protect critical regions and whatnot, so I finally moved everything to a Hadoop cluster of the Hortonworks flavor and have not looked back since.

Spark is my de facto solution now for ANY application that operates on large datasets, and I routinely run batch prototyping routines on matrices with >1MM rows without so much as a hiccup. The key is tuning your cluster, and especially the Java heap size parameters in your Hadoop configuration files, to ensure YARN does not starve the other JVMs it governs when you turn on Spark. RDDs abstract the concept of an in-memory dataset across your nodes, and YARN handles the low-level resource management across machines. Spark is written in Scala, so writing your code in Scala would be fastest, though I use a combination of Java and Python (via PySpark) since I don't want to be bothered with learning Scala at the moment. You can run PySpark in IPython as well for prototyping. Also, Java 8 comes with functional-style features, which subverts some of the justification for going the Scala route to begin with. It depends on how much tolerance you have for learning new technology stacks, but to be honest I think it is worth it just for the ability to scale with the click of a button using AWS.
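The GIL issue exodus ran into can be demonstrated directly: a CPU-bound function gains nothing from threads but does parallelise across processes. This is an illustrative sketch; the workload and timings are invented, not his actual code.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n):
    # Pure-Python arithmetic: holds the GIL for its entire duration,
    # so multiple threads running this serialize rather than overlap.
    return sum(i * i for i in range(n))

def timed(executor_cls, n_tasks=4, n=200_000):
    start = time.perf_counter()
    with executor_cls(max_workers=n_tasks) as ex:
        list(ex.map(cpu_bound, [n] * n_tasks))
    return time.perf_counter() - start

if __name__ == "__main__":
    # Expect similar or worse wall time for threads vs a single worker;
    # processes sidestep the GIL and actually run in parallel.
    print(f"threads:   {timed(ThreadPoolExecutor):.3f}s")
    print(f"processes: {timed(ProcessPoolExecutor):.3f}s")
```

Note that numpy releases the GIL inside many of its own C routines, which is why vectorised numpy code can sometimes thread better than the pure-Python case above.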

jslade


Total Posts: 1182
Joined: Feb 2007
 
Posted: 2016-01-09 01:46
"Spark is my de-facto solution now for ANY application that operates on large datasets and I routinely run batch prototyping routines on matrices with >1MM rows without as much as a hiccup."

J on one node and one thread beats the pants off Spark on 8 nodes of 16 threads each (same 96G-of-RAM hardware) on the same 100m row dataset; you don't have to fuck with Java heap size parameters, and you can calculate Bessel functions (or write stochastic gradient descent) in a couple of characters. APL-style data processing is one of those things where you never look back once you've experienced it. The only downsides are that you need to pay for a license for the DB parts that make this possible, and that the row space per column is limited by the memory available on the machine (so you need a big machine).

I tried building things to replace slowpoke Matlab/R in Clojure, Lisp, OCaml, Golang, and thought about Haskell and others. For most time-oriented big data problems, array programming with a columnar DB is Radiation.
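The columnar, whole-array style jslade is pointing at can be hinted at even in plain Python: store a table as parallel column lists and apply operations to entire columns at once, rather than looping over rows. A toy sketch, not J or kdb+; the table and column names are invented.

```python
# Toy column store: a dict of parallel lists, one list per column.
trades = {
    "sym":   ["AAPL", "MSFT", "AAPL", "GOOG"],
    "price": [180.0, 410.0, 181.5, 140.0],
    "size":  [100, 50, 200, 75],
}

def where(table, mask):
    """Apply one boolean mask to every column at once (APL-style select)."""
    return {col: [v for v, keep in zip(vals, mask) if keep]
            for col, vals in table.items()}

# Build the mask from a whole column, then select across all columns.
mask = [s == "AAPL" for s in trades["sym"]]
aapl = where(trades, mask)
notional = sum(p * q for p, q in zip(aapl["price"], aapl["size"]))
print(aapl["price"], notional)  # [180.0, 181.5] 54300.0
```

In J, q, or numpy the same select is a primitive operating over contiguous column memory, which is where the single-thread speed jslade describes comes from.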

Of course, most of the time, you should just use R and be happy.

"Learning, n. The kind of ignorance distinguishing the studious."

a路径积分


Total Posts: 80
Joined: Dec 2014
 
Posted: 2016-01-10 05:44
Over the years I've come to realize that, "If you need to ask, don't do it."

The use cases of functional languages are straightforward, as are the use cases of the various other things named in this thread. There is no clear overall theoretical winner, only whatever is best for your specific application and practical limitations.

exodus


Total Posts: 11
Joined: Feb 2011
 
Posted: 2016-01-10 22:12
APL looks fascinating. I wonder whether building a NN with its native array data structures would be tractable, or faster than using one of the GPU-based libraries.

What's your take on Clojure? The claims that you can achieve all the benefits of multi-core programming without the non-deterministic, thread-related maladies are appealing in and of themselves, as are the Lisp dialect and the Java interop.

Maggette


Total Posts: 1138
Joined: Jun 2007
 
Posted: 2016-01-11 00:06
I do use Spark a lot these days and I am okay with it. And if you live in a huge company, you are not the admin of the server, and a Hortonworks distribution is installed... well, you will have to use it. Not that you have a choice.

I also lack experience with the APL-style data processing jslade is talking about, even though I like the "column-based" elements in Matlab or Python pandas.

But I do think it is easier to sell a Scala project to the rest of the IT crowd and management than some fairly incomprehensible q code... with no development environment I am aware of that I can take seriously. If you want to write AND maintain industry-standard applications, you need more than an elegant way to hammer everything into one line of code :). I have a hard time imagining doing that in q or J.

Anyway, maybe Kerf is the better kdb+ from that point of view:
http://www.kerfsoftware.com/

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...