Any advancement on compiled vector-oriented languages?
     

rickyvic


Total Posts: 188
Joined: Jul 2013
 
Posted: 2018-11-20 10:16
Was curious whether Julia or similar languages have been used effectively for production trading systems.
Stuck on MATLAB, R and Python for research, and then C++ is a pain to port to, especially if there is no ultra-low-latency requirement.

"amicus Plato sed magis amica Veritas"

TonyC
Nuclear Energy Trader

Total Posts: 1309
Joined: May 2004
 
Posted: 2018-11-20 12:02
kx.com (k/kdb/q) is used at a number of hft shops. and a number of exchanges. and a couple of phone companies and more than one intelligence agency.

kdb's predecessor (APL) is the core language upon which simcorp and fiserv are built.


flaneur/boulevardier/remittance man/energy trader

bullero


Total Posts: 52
Joined: Feb 2018
 
Posted: 2018-11-20 13:56
My first and only encounter with Julia was on a university campus, where it was used (for reasons unknown) for nonlinear optimization. In the particular cases I studied, the performance was more or less the same as what you would see from Python or Matlab. From a pure user-experience perspective it was really painful to use (even more so than C++), and the total time (dev + debug + run) was many times longer than with the more mature languages. Would I replace C/C++, or even C# or Java, with Julia for systems programming? No, not right now.

jslade


Total Posts: 1184
Joined: Feb 2007
 
Posted: 2018-11-20 22:38
A friend of mine who is a fellow language polyglot pointed out (bug number 45 or so) that Julia didn't automatically recompile and link new function definitions, meaning it didn't act like a normal REPL. They fixed it a year or two ago, but 'amateur hour.' Their 'benchmarks' are nonsense.

Art's latest creation, K7, is reputedly really cool and allegedly solves some problems I always wanted solved with array languages. Meanwhile I still use J and think it's just fine for what I do.


"Learning, n. The kind of ignorance distinguishing the studious."

TonyC
Nuclear Energy Trader

Total Posts: 1309
Joined: May 2004
 
Posted: 2018-11-21 03:36
if you squint and look at it sideways, k7 has features that make it look a lot like its own OS

flaneur/boulevardier/remittance man/energy trader

EspressoLover


Total Posts: 382
Joined: Jan 2015
 
Posted: 2018-11-21 04:25
If you're not latency constrained, why not just run Python in production? Google, Facebook, Spotify and a lot of other major tech firms run Python servers. With PyPy, performance is reasonably good, probably better than whatever network latency you're dealing with. Containers and/or virtualenv make it nearly as easy to deploy as pushing a compiled binary.

Alternatively, if you already have low-level code (market data parsing, order routing, account management, etc.) in C++, is there any reason you can't just split into multiple services communicating over IPC? Something like protocol buffers, avro or even just JSON, depending on how rich you need the interchange to be. If latency's not an issue, then this shouldn't be a problem. Especially if the services all run on the same host.
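
For what it's worth, a minimal sketch of that pattern in Python: a strategy process handing an order to a separate (hypothetical) C++ order-router over a local socket, with JSON as the interchange format. The socket path and message fields are made up; a real system would define its own schema (or use protobuf/avro).

```python
import json
import socket

SOCKET_PATH = "/tmp/order_router.sock"  # hypothetical path exposed by the C++ router service

def send_order(symbol: str, side: str, qty: int, price: float) -> dict:
    """Serialize an order as JSON, send it over a Unix domain socket,
    and return the router's JSON reply (one JSON object per line)."""
    msg = {"type": "new_order", "symbol": symbol, "side": side,
           "qty": qty, "price": price}
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(SOCKET_PATH)
        sock.sendall((json.dumps(msg) + "\n").encode())
        reply = sock.makefile().readline()
    return json.loads(reply)

# e.g. send_order("ESZ8", "buy", 1, 2735.25) -> {"type": "ack", ...}
```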

Good questions outrank easy answers. -Paul Samuelson

Jurassic


Total Posts: 255
Joined: Mar 2018
 
Posted: 2018-11-21 10:42
@EspressoLover what do you mean by Python in production exactly?

FDAXHunter
Founding Member

Total Posts: 8372
Joined: Mar 2004
 
Posted: 2018-11-21 17:22
One data point about using Python in production:

A friend of mine built his fund with Python as the underlying language and a good portion of the core infrastructure. (Fund is about 2 billion USD and employs about 80 people). He started 11 years ago.

He really regrets using Python and really wishes he hadn't done it. As you can imagine, it's a bit late to switch the whole tech stack now, but he laments the choice almost daily.

The Figs Protocol.

Jurassic


Total Posts: 255
Joined: Mar 2018
 
Posted: 2018-11-21 17:51
@FDAXHunter did he specify the reasons for the regrets?

mtsm


Total Posts: 237
Joined: Dec 2010
 
Posted: 2018-11-21 17:54
That sounds like an idiotic opinion to me. I am sorry, but I would not trust your friend. Most places that are somewhat tech savvy operate on a hodgepodge of shit, and it's typically fairly doable to switch stuff around; things evolve.

Whoever claims that their life is miserable based on some decision that was made 11 years ago is a bit of an idiot, I am sorry to say.

chiral3
Founding Member

Total Posts: 5100
Joined: Mar 2004
 
Posted: 2018-11-21 18:16
By that definition I've been guilty of being an idiot due to choices in codebases. Over time I've had issues with integrating modules, scalability, extensibility, latency, etc. in very large production codebases that were great initially but evolved into something miserable. Today I have a hodgepodge, but it's a smaller hodgepodge than earlier in my career, architected much better and better integrated with my data (this latter part is largely Python based). This isn't an issue for a small operation or a single strategy, but for something larger, higher frequency, cross asset, and vertically integrated, it's a matter of life and death.

Nonius is Satoshi Nakamoto. 物の哀れ

EspressoLover


Total Posts: 382
Joined: Jan 2015
 
Posted: 2018-11-21 21:05
I really wouldn't enjoy being stuck in a python codebase at that scale either. In software everybody's situation is different in weird, subtle and unpredictable ways. Who knows what kind of idiosyncratic constraints someone's under.

But in general, there's very little reason in this day and age to be locked into any one specific language or framework. In 2018, there's tons of fantastic tooling and infrastructure that makes refactoring into services simple, safe and straightforward. It's almost a stereotype of Silicon Valley that a company gets launched with a half-baked Rails app then later down the road gets refactored into a polyglot service-oriented architecture.

I'm not even a fan of super-isolated, single-function microservices. But if someone's tired of their aging, ill-suited codebase, then there's no reason they're permanently stuck with it. In the aforementioned python codebase from hell, wait till you're banging your head against the wall working on some sub-component where python's a really bad fit. Re-write it in Visual HaskellLang.js instead of hacking on the pre-existing python. Replace the python module with a dumb client which wraps the API calls to REST, RPC or message queues depending on what works best. Then just roll the existing app and the service together in Kubernetes.
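
To make the "dumb client" idea concrete, a rough sketch (the service name, endpoint and fields are hypothetical): the old module keeps its signature, but the body just forwards the call to a service that now owns the logic.

```python
import requests

SIGNAL_SERVICE_URL = "http://signal-service:8080"  # hypothetical rewrite, running in its own container

def compute_signal(symbol: str, lookback: int) -> float:
    """Same signature the rest of the python app already calls --
    the implementation now lives behind a REST endpoint."""
    resp = requests.get(f"{SIGNAL_SERVICE_URL}/signal",
                        params={"symbol": symbol, "lookback": lookback},
                        timeout=5.0)
    resp.raise_for_status()
    return resp.json()["value"]
```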

Repeat until it's no longer frustrating to work on the original python app. Or until it's been refactored out of existence. Besides migrating to containers, which you probably should be doing anyway, there's no overhead to this approach. The dev just spends his time refactoring instead of polishing a turd. Ops just deploys containers instead of artifacts. End-users see the same frontend and APIs. The primary reason this strategy might not apply is if the original codebase is so tightly coupled that there's no clean way to slice it into sub-components. In which case there are major issues with the underlying software engineering, and switching languages would just be re-arranging deck chairs on the Titanic.

This definitely doesn't apply if you're doing anything HFT or latency-critical. Anything that's ever called in the hot-path needs to live on a pet (not cattle) server, run bare metal, avoid IPC, and have a meticulously tuned API. But let's be realistic, well less than 1% of the code written on Wall Street will ever be invoked in this context.

Good questions outrank easy answers. -Paul Samuelson

ax


Total Posts: 77
Joined: Jul 2007
 
Posted: 2018-11-21 21:27
a downside of python at scale is dynamic typing and everything happening at runtime. to the original question, python has this: http://numba.pydata.org/
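
A rough illustration of what numba buys you (my own toy example, not from the numba docs): a plain python loop that gets JIT-compiled to machine code on first call, so the hot loop skips the interpreter.

```python
import numpy as np
from numba import njit

@njit
def ewma(x, alpha):
    """Exponentially weighted moving average over a 1-D float array."""
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, x.shape[0]):
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1]
    return out

# first call triggers compilation; subsequent calls run at roughly C-loop speed
# ewma(np.random.randn(1_000_000), 0.05)
```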

EspressoLover


Total Posts: 382
Joined: Jan 2015
 
Posted: 2018-11-21 21:43
But Julia's also dynamically typed. Even though they tend to be correlated, we should remember that typing and interpreted/compiled are two different things.

Going back to the original motivation for the question, there's an inherent tradeoff. Static typing is good for large, complex, long-lived systems with a lot of interconnected pieces where runtime failure is expensive. But dynamic typing makes research easier by facilitating rapid prototyping and flexible coding standards.

If the goal is to unify research and production code, then you inevitably have to land somewhere on this spectrum. But in its defense, python3's typing module does provide a pretty nice system in this context. It makes it pretty easy to barf out untyped research code, then gradually annotate it with types as it gets promoted along to full production. (Though to be fair, my understanding is that Julia takes a similar approach.)
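
As a toy illustration of that workflow (function names made up), the same routine before and after annotation; mypy or pyright can then check callers of the annotated version while the scratch version stays untouched.

```python
from typing import Optional, Sequence

# v1: research scratch code, no annotations, the checker mostly ignores it
def realized_vol(returns, window):
    return (sum(r * r for r in returns[-window:]) / window) ** 0.5

# v2: same logic, annotated before promotion to production, so wrong-typed
# callers get flagged at check time rather than at runtime
def realized_vol_typed(returns: Sequence[float], window: int,
                       annualization: Optional[float] = None) -> float:
    vol = (sum(r * r for r in returns[-window:]) / window) ** 0.5
    return vol * (annualization ** 0.5) if annualization else vol
```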

Otherwise the alternative is to go with the flow of research and production having in many ways fundamentally different technological needs. Often the best tech stack for one will have major drawbacks for the other. But if that's your philosophy, you're going to have to accept the inevitable burden of re-writes when logic graduates into production. Either way there's no silver bullet, and there's always some bullshit that comes with the research/production dichotomy.

Good questions outrank easy answers. -Paul Samuelson

chiral3
Founding Member

Total Posts: 5100
Joined: Mar 2004
 
Posted: 2018-11-21 21:45
My core is still c++ (and cuda). Millennials hate c++. Too low level for them :-)

Nonius is Satoshi Nakamoto. 物の哀れ

bullero


Total Posts: 52
Joined: Feb 2018
 
Posted: 2018-11-22 00:40
Echoing EspressoLover - use the right tool for the job. Even if you can ride a motor scooter from Boston to Bogotá that does not mean you should.

Python and the like are mainly for scientific computing. The way I see it, they allow you to prototype and write proofs of concept very fast. Yes, you can write a trading system with them, but no, you shouldn't. It is so easy to paint yourself into a corner when you can just throw together some code without static typing. Later, things get messy when a lot of people are making small modifications to code that imposes no type checking until runtime. It also puts a higher demand on the other people who have to study the code and work out what is going on. Yes, you can leave comments or write documentation, but they are usually infested with lies, damn lies. When you enforce static type checking, it's very clear what the writer means. So help the readers of the code, please. Also, compilers check your silly mistakes, which is generally a good thing.

TonyC
Nuclear Energy Trader

Total Posts: 1309
Joined: May 2004
 
Posted: 2018-11-22 05:59
> My core is still c++ (and cuda)

that's why you roll your own APL-to-CUDA cross compiler, so you don't have to learn too much new stuff that's not as intuitive or as slick as the old stuff you learned in 7th grade

flaneur/boulevardier/remittance man/energy trader

indiosmo


Total Posts: 13
Joined: Jun 2011
 
Posted: 2018-11-22 17:57
Another vote for using the appropriate tools.

I use python for stuff like processing end of day reports and other non-critical systems and it's great, allows for quick and easy changes.

For anything in the order path nothing beats the safety of the type system, along with good tests and sanitizers.

Also if you can use it, modern C++ with modern libraries like boost range, nlohmann json, fmt and spdlog can make a lot of previously laborious tasks easier.

rickyvic


Total Posts: 188
Joined: Jul 2013
 
Posted: 2018-11-28 10:18
@fdaxhunter I know why: it's a mess to debug, there are a lot of libraries whose internals you need to know to avoid bugs, and then versions and compatibility are another issue.
I live a much nicer life using Matlab, which is faster by orders of magnitude if you write the code correctly, thanks to the JIT; you install it and it runs, and you can even compile it.

It's just not a commonly used programming language and you cannot hire coders, at least not easily. Almost not a programming language IMO, but it works.

For latency-sensitive work you need to port it to low-level languages unless you are suicidal. Some use compiled functional languages, but I can't get my head around them, at least not without wasting days.

"amicus Plato sed magis amica Veritas"

prikolno


Total Posts: 45
Joined: Jul 2018
 
Posted: 2018-11-28 12:53
Echo EL +1. FWIW it's actually a lot easier to debug Python than MATLAB, for various reasons that I'll try to walk through with real world examples.

1. Significantly better testing frameworks, e.g. property-based testing.
Think of a simple test case for an order book: if you insert 2 bids priced p1 > p2, then `getBidPrice()` should return p1.

What you'd really prefer is some guarantee that this holds for all numeric prices p1 > p2.

In MATLAB, you just test some sensible numbers like p1=1 and p2=0, and in an ideal world you've also built some pseudo-inductive reasoning with other test cases, like the base case that the book accepts zeroes as prices (since the instrument may be a spread) and the case that ordering is preserved for negative prices. In many situations it's not so obvious and you just manually pick a few 'privileged' values like -1, 0, 1, type max, type min, and hope that you've covered a good part of your domain.

With a testing library like QuickCheck (Haskell) or hypothesis (Python), the test cases are randomly generated to match your distribution or type properties, so you don't have to try to exhaust the cases manually. You just specify that p1 and p2 are numeric types and the library builds the test cases for you, and it even caches the cases that have failed in the past or narrows down the simplest case where your test fails.

It's also sensible to have microbenchmarking built into the testing framework, which modern testing frameworks like pytest and criterion (Haskell) provide. You've already gone through all that effort to mock a bunch of order objects, why not also benchmark how fast your book inserts each order?
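
To make point 1 concrete, a minimal hypothesis sketch; `OrderBook` here is a trivial stand-in I made up so the example runs under pytest, your real book is obviously more involved:

```python
from hypothesis import given, assume, strategies as st

class OrderBook:                      # trivial stand-in for the real book under test
    def __init__(self):
        self.bids = []
    def insert_bid(self, price, qty):
        self.bids.append((price, qty))
    def getBidPrice(self):
        return max(p for p, _ in self.bids)

@given(st.integers(), st.integers())
def test_best_bid_is_highest_inserted_price(p1, p2):
    assume(p1 > p2)                   # hypothesis discards draws that don't satisfy the precondition
    book = OrderBook()
    book.insert_bid(price=p1, qty=1)
    book.insert_bid(price=p2, qty=1)
    assert book.getBidPrice() == p1
```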

2. Strong typing.
Back to the order book example: you really want to limit your function to accept arguments of an integer type, or a fixed-decimal price type that you wrote. Why? In MATLAB, your order book functions will work even if someone passes in orders with floating point (or even singleton array) prices, because it's weakly typed. Inevitably, some junior guy on your team will write a lot of code assuming that behavior, since the test cases will still pass. Then later someone joins your team and wants to trade 2-year treasuries or JPY, and now you have a cascade of code to clean up.
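
One sketch of how you'd get that guarantee in Python (the `TickPrice` name and fields are purely illustrative): a tiny wrapper type plus type hints, so the static checker and a runtime guard both reject float prices before they reach the book.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TickPrice:
    ticks: int                        # price expressed as an integer number of ticks

    def __post_init__(self) -> None:  # runtime guard on top of the static hint
        if not isinstance(self.ticks, int):
            raise TypeError(f"expected integer ticks, got {type(self.ticks).__name__}")

def insert_bid(book: list, price: TickPrice, qty: int) -> None:
    # mypy flags insert_bid(book, 101.5, 1) statically; TickPrice(101.5) fails at runtime
    book.append((price.ticks, qty))
```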

3. Static type hints and optional typing.
bullero's post brings me to the incredible engineering work behind Jukka's PhD thesis and by Lisa Roach, which has made it possible to resolve many of the static typing complaints about using Python in production. Many of the points in his post don't apply after Python 3.5 (PEP 484/526). Instagram, Amazon and Dropbox all have near 100% test coverage with static type hints, as you should if you're running a big firm like FDAX's friend.

As with many dynamically typed languages, you often debug a runtime error on a function `foo` involving variable `var` by tracing back to previous call sites of `foo` or `var`. It's painful to do this with grep/ag and much easier if your code base has static type hints.

4. Better scoping, no massive, polluted global namespace.
Self-explanatory.

jslade


Total Posts: 1184
Joined: Feb 2007
 
Posted: 2018-11-28 17:25
Personally I think RickyVik is right: Matlab is extremely good if you know how to use it. Python is a big step backwards for numeric work; poor REPL, opaque syntax, no first-class array type, comparatively shittier IDE, ridiculous versioning treadmill, and bad package/dependency management and package quality control. I was an early numpy fan, but having watched it since it was an LLNL project, and having looked at and fixed the things built with it, I SMDH at kids these days. By contrast, take a matlab project I tried to talk the client into doing in python: the glowing crystal in the middle of it has been chugging away for 11 years now; low maintenance, reliable and extendable, making the client money instead of software consultants. Anyone here have a working 11-year-old Python project running a fund?

No amount of type hint patches or fashionable test framework nonsense will convince me that python is a good choice for anything at this point. Unless there were some theano-type thing I couldn't do without, I'd sooner use nodejs for ephemeral software tooling than python.

Kubernetes and friends ... I know it's the way things are done now, but please at least acknowledge that the only reason such things exist is that you are a victim of a shitty package manager and a badly designed upgrade treadmill. If you built your stack with something with a sane package management system (I dunno ... Leiningen; what Matlab does is also valid and good) you would not have to use Kubernetes.

People continue to pay for Matlab for a reason.

"Learning, n. The kind of ignorance distinguishing the studious."

EspressoLover


Total Posts: 382
Joined: Jan 2015
 
Posted: 2018-11-28 19:04
@jslade

You've hit peak curmudgeon. I half expected your post to end with an invective against the integrated circuit and paean to the unsung virtues of vacuum tubes.

@rickyvic

> matlab, which is faster by orders of magnitude

Matlab's relative speed comes from using MKL. OpenBLAS is quickly catching up, and the relative difference is well less than orders of magnitude. Even if it weren't, MKL is now a free product, and you can build Numpy against it. Numba+MKL will have similar performance to Matlab, because they both compile down to pretty much identical assembly. Personally it would piss me off to pay insane licensing fees to Mathworks for performance they're getting for free from Intel.
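
If anyone wants to check what their own numpy is actually linked against, and whether big matmuls are hitting an optimized BLAS, a quick sanity check:

```python
import time
import numpy as np

np.show_config()                       # prints which BLAS/LAPACK (MKL, OpenBLAS, ...) numpy was built with

a = np.random.randn(2000, 2000)
t0 = time.perf_counter()
b = a @ a                              # the matmul is where MKL/OpenBLAS actually earns its keep
print(f"{time.perf_counter() - t0:.3f}s for a 2000x2000 matmul")
```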

While we're on the Matlab hate train, regardless of its numeric or ecosystem merits, the language itself is objectively terrible. It's what happens when a bunch of scientists decide that PL research is for wimps. There's a reason it's one of the most dreaded languages right up there with Visual Basic, Cobol and Perl.

Good questions outrank easy answers. -Paul Samuelson

jslade


Total Posts: 1184
Joined: Feb 2007
 
Posted: 2018-11-28 22:33
" the language itself is objectively terrible."

Matlab as a programming language is fine: the only thing it really needs is namespaces. Even assuming the python defects I listed were not true (they're all true), it's vastly, preposterously easier to read numeric Matlab code than Numpy code, as it is closer to mathematical notation (again: first-class array types). Mind you, I don't use it and probably never will again if I can help it (J, C and R do everything I need), but it's unquestionably a better choice than Python for numerics development, starting from green fields.

Sometimes "new" isn't better. Sometimes "new" is just another trendy bandaid on a turd. Kubernetes is a great example of a trendy bandaid on a turd. "Look, shiny thing which will allow us to slap together a bunch of shitty broken tools!"

"Learning, n. The kind of ignorance distinguishing the studious."

Maggette


Total Posts: 1151
Joined: Jun 2007
 
Posted: 2018-11-28 22:35
I have learned the hard way not to bet against that jslade dude when it comes to this stuff... in the past, more often than not, it turned out that he was right.

Still, I more or less agree with ES here. I started out as a MATLAB programmer (even won a competition in it) and had 2 projects in the German automotive sector, where MATLAB and Simulink are still king.

As always, you can do decent, stable and performant stuff with the wrong tool and the wrong architecture if you have the right people.

LAPACK and BLAS are fast. The rest of MATLAB is an interpreter at least as slow as python.

With a combination of numpy, pandas, numba/cython and the dask framework, you can write distributed solutions in python (I mean real software... with abstraction layers, services, unit tests and a sensible way to deploy stuff) that are very hard to build and maintain in the MATLAB world.
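
A tiny sketch of what that looks like with dask (the file names and columns are invented): the same pandas-style code, but chunked and scheduled across cores or a cluster instead of loaded into one process.

```python
import dask.dataframe as dd

# hypothetical per-day trade files with symbol/price/qty columns
trades = dd.read_csv("trades-2018-*.csv")
notional_by_symbol = (trades.assign(notional=trades.price * trades.qty)
                            .groupby("symbol").notional.sum())
print(notional_by_symbol.compute())   # nothing actually runs until .compute()
```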

Like I said...you probably can do it in MATLAB too...but that probably also holds true for VBA.

I think people still pay for MATLAB (at least in automotive) because of SIMULINK, and for the same reason there are still COBOL projects: people are scared of migrating.

Ich kam hierher und sah dich und deine Leute lächeln, und sagte mir: Maggette, scheiss auf den small talk, lass lieber deine Fäuste sprechen...

prikolno


Total Posts: 45
Joined: Jul 2018
 
Posted: 2018-11-28 23:47
>I'd sooner use nodejs for ephemeral software tooling
Lol, please don't.

>Python is a big step backwards for numeric work,
>No amount of type hint patches, fashionable test framework nonsense will convince me that python is a good choice for anything at this point.
To avoid confusing people, my post was not about using Python for numerical work. I was just addressing the specific point about debugging. I don't have a preference among Python, MATLAB, R or any other language for numerical work. My experience is that you should just pick whatever you can gather a critical mass of development manpower for, whether you're a one-man shop that only knows how to code in R or a team of 30 engineers who want to write everything in LuaJIT. That said, if anyone has already decided to adopt Python, here are a few things I'd suggest to address these points:

>poor repl
I subscribe to the manifesto that anything you need to do more than twice should be in a tool. And I've always found people to be fairly repetitive even in their REPL environments. In my use case there's usually a command-line executable, library method or GUI that one-shots most of my data exploration tasks, so I don't actually find a good REPL particularly important, since I spend very little time there. This policy is language-agnostic.

>no first class array type
I argue this is what separates good platforms from the average ones. You actually don't want array types to be first-class citizens, because a significant amount of desync between research and production arises from that. And obviously, if execution latency is crucial to your use case, you want to implement your own array types anyway so that there are no allocations buried in a library. Going back to the desync principle, you ideally want to avoid repeatedly porting research/prototype code that uses a third-party array type over to the types in your homebrew linear algebra library for production, so by construction array types still aren't first-class citizens.

>as it is closer to mathematical notation
I agree, but on the other hand an overwhelming amount of ML research and supporting frameworks for said research is written in Python as compared to MATLAB, so the merits are debatable.

>opaque syntax
Inherently untrue, because one is open source and the other is proprietary. Literally the simplest example I can think of is integer multiplication: 1*2 in CPython means Karatsuba's recursive algorithm (https://github.com/python/cpython/blob/master/Objects/longobject.c), and I know the cutoff for large integers is 70 digits. What's large integer multiplication in MATLAB? Fürer? Schönhage-Strassen? Coincidentally, this is also a reason why Python is the most popular language for cryptography, not MATLAB.