Any advancement on compiled vector-oriented languages?
     

doomanx


Total Posts: 103
Joined: Jul 2018
 
Posted: 2020-04-29 08:20
In the end, tools like pandas/numpy aren't designed for *massive* datasets. If you get to a size where these operations become painful, it's a sign you might need to switch to doing things out of core or to distribute the work, for example by using Dask: https://pandas.pydata.org/docs/user_guide/scale.html.
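Roughly the kind of switch I mean, as a minimal sketch (file and column names are made up):

```python
# Minimal sketch: moving a pandas-style aggregation onto Dask so it can run
# out of core / in parallel. File and column names here are made up.
import dask.dataframe as dd

# Lazily read many CSVs as one partitioned dataframe instead of a single
# big in-memory pandas frame.
df = dd.read_csv("trades_2020-*.csv")

# Same API as pandas; nothing is computed yet.
notional = (df.assign(notional=df["price"] * df["size"])
              .groupby("symbol")["notional"]
              .sum())

# .compute() triggers the actual parallel execution and returns a pandas object.
print(notional.compute())
```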

did you use VWAP or triple-reinforced GAN execution?

Maggette


Total Posts: 1288
Joined: Jun 2007
 
Posted: 2020-04-29 09:02
@kloc
You are probably right. Now that I think about it, I actually convert a lot via ".values" while using pandas (probably most of the time when slicing), so I'm probably using numpy more than pandas. In addition, I use dask when the data set is large and fall back to "for-loop hell" with numba when my code is too slow. That's probably why I never felt pandas was slow: I never pushed it to the max.
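For what it's worth, the numba fallback I mean is roughly this kind of thing (a toy sketch, not code from a real project):

```python
# Toy sketch of the "for-loop hell plus numba" fallback: an explicit loop
# that would be awkward to vectorize, JIT-compiled with numba.
import numpy as np
from numba import njit

@njit
def max_drawdown(prices):
    peak = prices[0]
    worst = 0.0
    for p in prices:
        if p > peak:
            peak = p
        dd = (peak - p) / peak
        if dd > worst:
            worst = dd
    return worst

prices = np.cumprod(1.0 + 0.001 * np.random.randn(1_000_000))
print(max_drawdown(prices))
```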

@doomanx I've had very good experiences with dask so far. Spark's PySpark API is alright as well, and I use it in several projects. But dask feels easier to use on a single machine.

and by the way: your signature is awesome!! :)

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

sharpe_machine


Total Posts: 71
Joined: Feb 2018
 
Posted: 2020-04-29 11:29
Getting back to Julia... What are some examples that would motivate one to switch from Python (Numpy/Pandas, sometimes Numba) to Julia? Assuming Python is being used to create features from raw data and then feed them into a "black box" algorithm written in C++/Rust (already top performance).

prikolno


Total Posts: 83
Joined: Jul 2018
 
Posted: 2020-04-29 15:18
>What are some examples which motivate one to switch from Python (Numpy/Pandas, sometimes Numba) to Julia?

The only one I can think of is native support for multiple dispatch, which can be really nice for building features.

sharpe_machine


Total Posts: 71
Joined: Feb 2018
 
Posted: 2020-04-29 20:11
Doesn't Numba's @jit without a signature implement a sort of multiple dispatch in Python? It compiles a specialization the first time it sees a new argument-type signature and caches it for future calls. Not sure how dispatch is implemented in Julia, though.
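Roughly what I mean, as a toy sketch (the function and data are made up):

```python
# Toy sketch of numba's lazy per-signature specialization: each new
# argument-type signature triggers a separate compilation, which is then
# cached and reused, so it behaves like dispatch on argument types.
import numpy as np
from numba import njit

@njit
def scale(x, factor):
    return x * factor

scale(np.arange(5, dtype=np.int64), 2)       # compiles an (int64[:], int64) version
scale(np.arange(5, dtype=np.float64), 2.5)   # compiles a (float64[:], float64) version

# Each distinct signature got its own compiled specialization.
print(scale.signatures)
```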

smt1


Total Posts: 5
Joined: Jun 2012
 
Posted: 2020-05-01 06:26
I don't have much experience with numba, so I can't compare its dispatch system to Julia's. I mostly use cppyy with Cython and PyPy, depending on what I'm doing.

I think Julia's cleverest design choice is combining multiple dispatch and the macro/metaprogramming system with a Python-like syntax.

Like the author of https://flow.byu.edu/posts/julia-c++, I've run into occasional gotchas that absolutely kill performance in Julia if you try to write super-elegant, terse numerical code that matches C++ performance without doing a lot of profiling, but the situation seems to be steadily improving with each Julia release.

I've also found Julia's distributed/parallel support really great, as you'd expect from a modern language that didn't have to deal with legacy cruft.

Julia's multiple dispatch system has a lot of interesting consequences. I think its ecosystem of libraries will end up with a very different graph structure than something like Python's.

rickyvic


Total Posts: 232
Joined: Jul 2013
 
Posted: 2020-06-01 15:10
I find Julia very interesting, and I'm hating Python more and more over time.
However, Python has become very powerful for everything except low-latency stuff, where C and C++ are necessary IMO.

Is there any way to do asynchronous work in Julia without having to use dedicated threads?
For research I don't think there is a need to switch to Julia, but I would be interested in using it in production for real-time applications (so no rewriting of research code).

For research I am still using MATLAB and HDF5 (they work reasonably fast), but then you need to recode to C, and frankly that is really time-consuming for me.


"amicus Plato sed magis amica Veritas"

Jurassic


Total Posts: 398
Joined: Mar 2018
 
Posted: 2020-06-01 15:26
@rickyvic why the new hatred of Python?

rickyvic


Total Posts: 232
Joined: Jul 2013
 
Posted: 2020-06-03 14:39
I just can't stand the syntax, but more than anything I can't make it work fast in loops when it is necessary to use them; or rather, it is a big waste of my time to look for ways to speed that up.
Then add the versioning nightmare and the compatibility nightmare, and there you go.

But I reckon it has a huge amount of accessible functionality in many fields, and if you can work with Cython or write C code here and there it can probably be as fast as you want.
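The kind of loop I mean, next to the vectorized numpy rewrite people usually point me to (a toy sketch with made-up data):

```python
# Toy sketch: the same return calculation as an explicit Python loop and as
# a vectorized numpy expression.
import numpy as np

prices = 1.0 + 100.0 * np.random.rand(1_000_000)

# Loop version: slow, because every iteration runs in the interpreter.
rets_loop = np.empty(len(prices) - 1)
for i in range(len(prices) - 1):
    rets_loop[i] = prices[i + 1] / prices[i] - 1.0

# Vectorized version: the same computation pushed into compiled numpy code.
rets_vec = prices[1:] / prices[:-1] - 1.0

assert np.allclose(rets_loop, rets_vec)
```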

"amicus Plato sed magis amica Veritas"

nikol


Total Posts: 1321
Joined: Jun 2005
 
Posted: 2021-01-18 10:58
geez.

After scanning this thread, I concluded that Python and MATLAB keep popping up out of the pile of programming languages.

My bits:
- The old trick of memory/object pre-allocation improves things significantly. In Python, if I want to use numpy or pandas functionality, I create those objects at the very beginning; "no-no" to dynamic allocations (see the sketch after this list). Same for MATLAB. Usually the first thing I do is write setters/getters around the preallocated objects to simplify the next steps.
- I have never compared MATLAB vs Python/numpy vectorization for speed.
- I like IPython and find it quite good. MATLAB is still better as an interactive environment and because of its documentation/help capabilities. The latter is especially good because it is easy to get inline help, and clients like it.
- My main complaint against Python is the mess with scoping. Every time I lose attention, I step into this sh*t; it hits like a hammer and costs me debugging time.
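A minimal sketch of the pre-allocation point above (sizes and names are made up):

```python
# Toy sketch of the pre-allocation point: allocate the result up front and
# fill it in place, instead of growing the array inside the loop.
import numpy as np

n_steps, n_assets = 2_000, 50

# "No-no": dynamic growth; every iteration re-allocates and copies the array.
slow = np.empty((0, n_assets))
for t in range(n_steps):
    slow = np.vstack([slow, np.random.randn(1, n_assets)])

# Pre-allocated: one allocation at the beginning, then in-place writes.
fast = np.empty((n_steps, n_assets))
for t in range(n_steps):
    fast[t, :] = np.random.randn(n_assets)
```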

In fact, I was looking for the answer to another (lame) question:

Do I really need Scala to work with Spark, or is PySpark good enough? My core system is in Python. Yes, I've read all the curses..., but still.

... What is a man
If his chief good and market of his time
Be but to sleep and feed? (c)

Maggette


Total Posts: 1288
Joined: Jun 2007
 
Posted: 2021-01-18 12:53
Hi.

For almost anything you can think of, PySpark will be enough.

Spark offers different APIs: the Dataset API, the DataFrame API (which IMHO is still the most performant one, even though a DataFrame is just a Dataset[Row]) and the RDD API.

For most ETL stuff you will be fine with the Spark DataFrame API and Spark SQL. There are a lot of blogs on the web claiming that there is no difference between calling the Spark API from Python or from Scala. That is utter nonsense: if you define user-defined functions in Python, you lose a lot of Spark's DataFrame optimizations.
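A rough illustration of that UDF point (column names and data are made up):

```python
# Rough illustration: the same column computed with a Python UDF (opaque to
# the Catalyst optimizer, every row crosses the JVM/Python boundary) versus
# built-in column expressions (which stay inside the JVM and get optimized).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(100.0, 3), (250.0, 1)], ["price", "qty"])

# Python UDF version: works, but defeats most DataFrame optimizations.
notional_udf = F.udf(lambda price, qty: price * qty, DoubleType())
slow = df.withColumn("notional", notional_udf("price", "qty"))

# Built-in expression version: same result, fully optimized.
fast = df.withColumn("notional", F.col("price") * F.col("qty"))

fast.show()
```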

But you will be fine. For most practical purposes PySpark is alright and has no real downside compared to calling the API from Scala.
HTH

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

bandi_np


Total Posts: 10
Joined: Nov 2020
 
Posted: 2021-01-18 14:53
An interesting alternative is Dask, which does not rely on Scala (sorry, I missed the previous posts talking about it).

Maggette


Total Posts: 1288
Joined: Jun 2007
 
Posted: 2021-01-18 15:06
Dask is way less developed at this stage.

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

rickyvic


Total Posts: 232
Joined: Jul 2013
 
Posted: 2021-01-21 08:05
Fast forward two years and see... I am still stuck with damn MATLAB.
Python tends to win if you have a lot of manpower (very easy to source for Python) and can be really good for most things, but I still don't want to migrate all my code to gain zero advantage.

Julia is evolving but still a work in progress.

"amicus Plato sed magis amica Veritas"