Morgan Stanley Hobbes language
     

jslade


Total Posts: 1202
Joined: Feb 2007
 
Posted: 2019-12-04 13:37
I found this being touted on Hacker News the other day.

https://github.com/Morgan-Stanley/hobbes

It's allegedly a successor technology to their APL+/K4 stack. I'm a bit dubious as to whether it is actually used, and would be grateful if someone who actually uses it were to chime in.

Why I am dubious:
1) it's Haskell. I've never seen a corporate project involving Haskell which wasn't a boondoggle trashfire. It is an interesting programming language, but it appears to attract more than its share of people who like the smell of their own farts, rather than serious engineers who wish to solve problems with minimal headaches. Usually when it comes out of a corporation of MS's size, it's some dudes screwing around rather than something which is actually used.
2) it uses comprehensions to process and subset data, which is generally a trashfire. There are also no obvious sorts on columns.
3) I have met MS people who use crap like OneTick and are looking for a replacement (granted this was a few years ago).

Why I am curious: obviously someone spent a lot of time on this, and a high level language you could build reliable ticker plant and data storage infrastructure out of is a cool idea.

"Learning, n. The kind of ignorance distinguishing the studious."

gmetric_Flow


Total Posts: 28
Joined: Oct 2016
 
Posted: 2019-12-04 15:28
@jslade Applause to item 1.

It looks like an eager Haskell with some bindings to C++. I guess they wanted more "readability" than Q/APL/K4/Shakti.

To quote Lambda the Ultimate:

The programming language is a variant of Haskell (HM type inference, algebraic data types, qualified types / type classes) with some adjustments to help reduce boilerplate and derive very efficient machine code.


jslade


Total Posts: 1202
Joined: Feb 2007
 
Posted: 2019-12-05 14:30
Yeah, it looks neat; I'm not an early adopter though, so I don't even want to invest the time in learning it well enough to test it unless it's being used.

kthielen


Total Posts: 2
Joined: Dec 2019
 
Posted: 2019-12-30 02:17
I made hobbes for Morgan Stanley, I can answer some of these questions.

10-20% of all daily equity trades in the US go through systems using hobbes, both on the critical trading path and off of it (e.g.: out of band real time and historical analysis, about 1TB/day of structured data is analyzed this way).

It's not used everywhere at MS, but it has proven itself useful to solve many real problems over the last ~7 years.

I don't "like the smell of my own farts" and I disagree that monad comprehensions are anything like a "trash fire" (but obviously there's nothing technical in this argument, so maybe you'd like to clarify what you mean).

My interest is exactly in solving real problems with minimal headaches, just like you. I've been around long enough to see several problems that I thought that I could solve, and my hope is only to make a useful tool.

HTH

Alfa


Total Posts: 11
Joined: Jun 2018
 
Posted: 2020-01-14 10:52
Would you mind sharing some of your thoughts on APL/K/Q/KDB+/Shakti compared to Hobbes?

And what exactly does Hobbes excel at?

Maggette


Total Posts: 1187
Joined: Jun 2007
 
Posted: 2020-01-14 11:46
@kthielen

Welcome to NP and +1 on Alfa's request. Could you elaborate a bit on where you see its strengths and potential downsides?

And a stupid one: do you have any IDE support for that stuff at MS?

Thx

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

kthielen


Total Posts: 2
Joined: Dec 2019
 
Posted: 2020-01-15 23:41
Thank you for the welcome. :)

Let me say about kdb, it's an excellent product and has several great ideas. When I was just leaving high school in the late 1990s, it left a lasting impression on me. First and foremost (as a technical matter), making no special distinction between massive datasets and small datasets (in that both are open to evaluation in simple programs in the same way) is a major simplification over the traditional model. The language itself is also very useful, covering a lot of what you'd expect in a foundation and in a way that requires nothing extraneous (e.g. code can be dense and compact, spending very little where you'd otherwise write many pages of throwaway code in C++/Java with redundant type declarations and such -- very little pretense). Arguably, the extreme overloading in k/q also helps with this, since a given expression can have many meanings and so can be practically useful in more contexts. Programs can execute very efficiently if they can be cast as vector operations (on arrays of a handful of primitive types). It's a major inspiration for hobbes and I target many of the same use-cases with hobbes.

OK, so those are some good things about kdb. How can we make this basic model of programs/databases better? That's something I have been working to answer with hobbes.

* It would be really useful if non-vectorized scalar-oriented code also executed very efficiently -- I had to rewrite a q market data feed handler in C++ many years ago and had this feeling, because the q code was much shorter and more straightforward but it generated a lot of garbage and fell far enough behind that we would lose our connection. When it comes to execution in the critical path of latency-sensitive trading systems, this is a major issue.

* Overloading is very useful, but there isn't a mechanism to introduce your own overloaded identifiers or plug new data types into existing identifiers. For example, I might want to construct a btree mapping some key type (like a uuid) to a numeric value, and then define multiplication on such a btree to scale the values in it. This raises a few issues actually, but hopefully it's clear enough to say that we need a hook into this overloading facility ourselves because it's so useful!

* Speaking of btrees actually, it would be really useful if we could efficiently store more types (of extremely large size) than just flat (and keyed) tables. I have some data and query patterns that benefit from using an interval tree e.g. correlating all simultaneously open orders (at what point today did I have the maximum number of simultaneously open orders and what were they?). I'd like to see that as logically an interval map from time to some order structure, and then structurally as a tree partitioning intervals (like one of the representations described in Wikipedia for "interval tree"). This is a basic practical problem.

* Large k and q programs are difficult to maintain and often you find misunderstandings between cooperating programmers very late in the development cycle of a program (at runtime, when trading actually starts). If we could identify these misunderstandings earlier, we could save a lot of money (and at least sleep easier).

* As expressive as k and q are, there are some kinds of problems that are very awkward to express (and could be made even denser!). In particular, data structure pattern matching is very useful, has a simple notation, and can translate into very efficient code. You can basically only write the back end of that translation in q, which makes for pretty horrible code that most people won't want to maintain. There's a similar story for parsing text, a common problem (LALR(1) parsing covers a lot of this).
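To make the open-orders question from the list above concrete, here is a minimal Python sketch. It is not hobbes code and it is not an interval tree: it swaps in a simple sweep over sorted open/close events, which answers the same "when was the peak and what was open" question for a batch of orders. The `Order` record, its field names, and the half-open `[opened, closed)` convention are all assumptions made for illustration.

```python
from typing import NamedTuple


class Order(NamedTuple):
    # Hypothetical order record; times are plain integers (e.g. ticks since midnight).
    order_id: str
    opened: int
    closed: int  # exclusive: the order is open on [opened, closed)


def max_open_orders(orders):
    """Return (peak_count, time_of_peak, ids_open_at_peak) for a batch of orders."""
    events = []
    for o in orders:
        events.append((o.opened, 1, o.order_id))  # 1 = open event
        events.append((o.closed, 0, o.order_id))  # 0 = close event
    # Sorting tuples puts close events (0) before open events (1) at the same
    # timestamp, so an order closing at t does not overlap one opening at t.
    events.sort()

    open_ids = set()
    best = (0, None, frozenset())
    for t, kind, oid in events:
        if kind == 1:
            open_ids.add(oid)
            if len(open_ids) > best[0]:
                best = (len(open_ids), t, frozenset(open_ids))
        else:
            open_ids.discard(oid)
    return best


if __name__ == "__main__":
    sample = [Order("a", 1, 5), Order("b", 2, 6), Order("c", 5, 7), Order("d", 3, 4)]
    count, when, ids = max_open_orders(sample)
    print(count, when, sorted(ids))
```

The sweep costs one sort per query, which is fine for a one-off question. A persistent interval tree, as described in the post, would presumably pay off when the same large dataset is queried repeatedly or is too large to re-sort.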

Together, these issues kind of point the way toward something like Haskell. The issues of efficiency, verification, user-defined overloading, complex data structures, all basically come down to having a type system. We want a type system somewhat like Haskell's because it works very well with type inference (most people don't want to think about or say anything about types unless they really have to) and integrates overloading via qualified types (ie: predicates on types as e.g. 'Show t' to remember that the type 't' must have a mapping to strings). Patterns and array/list comprehensions are also really useful for compactly writing a lot of common queries (maybe we can have a debate about this, I guess some folks here feel differently about comprehensions as a query syntax but I'd like to see their alternatives).
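As a rough illustration of user-extensible overloading in the spirit of 'Show t', here is a Python sketch using the standard library's `functools.singledispatch`. This is only an analogy: `singledispatch` picks an implementation at runtime from the class of the first argument, whereas qualified types are resolved by the type checker. The `Trade` type and the `show` function are hypothetical names, not part of hobbes.

```python
from dataclasses import dataclass
from functools import singledispatch


@singledispatch
def show(value) -> str:
    # Fallback: no "instance" has been registered for this type.
    raise TypeError(f"no show instance for {type(value).__name__}")


@show.register
def _(value: int) -> str:
    return str(value)


@show.register
def _(value: str) -> str:
    return f'"{value}"'


@show.register
def _(value: list) -> str:
    # Showing a list relies on show being defined for each element.
    return "[" + ", ".join(show(v) for v in value) + "]"


@dataclass(frozen=True)
class Trade:
    # Hypothetical user-defined type plugged into the existing overloaded name.
    sym: str
    price: int
    qty: int


@show.register
def _(value: Trade) -> str:
    return f"{value.sym} {value.qty}@{value.price}"


if __name__ == "__main__":
    trades = [Trade("AAPL", 150, 200), Trade("MSFT", 300, 50), Trade("IBM", 120, 100)]
    # A comprehension as a query: keep only the larger trades.
    big = [t for t in trades if t.qty >= 100]
    print(show(big))
```

The point of the sketch is the last registration: a new data type hooks into an existing overloaded identifier without touching the earlier definitions.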

OK why not just use Haskell? Basically it has lazy evaluation, C/C++ integration difficulty, and nominal typing working against it. I probably don't have to enumerate the issues with lazy evaluation in terms of difficulty reasoning about space/time (this is an interesting area for debate though, it definitely has value). The C/C++ integration is really important for all of the internal software that we have and to provide a way to move in and out of hobbes with very little friction (e.g. native C/C++ data structures should be directly interpretable without translating into a generic form as e.g. the K struct). But the killer is nominal types, because it turns out that we actually assume structural types much more frequently and we usually don't intend to distinguish types by name but only by structure (e.g. the "row type" for one of your kdb tables, probably doesn't deserve a name).
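The nominal versus structural distinction can be sketched in Python with `typing.Protocol`, which describes a type by the fields it has rather than by its name. This is an assumption-laden analogy and not how hobbes implements row types; `HasPriceQty`, `EquityFill`, and `FutureFill` are hypothetical names.

```python
from typing import NamedTuple, Protocol, runtime_checkable


@runtime_checkable
class HasPriceQty(Protocol):
    # Structural description: anything with these two fields qualifies,
    # whatever its class is called.
    price: float
    qty: int


class EquityFill(NamedTuple):
    sym: str
    price: float
    qty: int


class FutureFill(NamedTuple):
    contract: str
    expiry: str
    price: float
    qty: int


def notional(rows) -> float:
    """Sum price * qty over any rows that structurally have both fields."""
    return sum(r.price * r.qty for r in rows)


if __name__ == "__main__":
    rows = [EquityFill("X", 10.0, 3), FutureFill("ES", "2020-03", 2.5, 4)]
    # Both unrelated classes satisfy the structural description.
    print(all(isinstance(r, HasPriceQty) for r in rows))
    print(notional(rows))
```

Neither record class declares any relationship to `HasPriceQty`, yet both are accepted, which is the behaviour one usually wants for the row type of a table.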

These are some of the basic considerations that went into the design of hobbes. I could go into more detail; it gets extremely practical. Contrary to what you might expect, the type theory bits keep things simple (as far as they can be made simple).

After that, I've tried to focus on splitting up the basic features of hobbes so that it can be used with very little overhead. There's a format for type descriptions that you can get in C++ just by including one header (nothing to link). There's a format for data files, one header. There's a method for logging data structures into shared memory efficiently, one header. There's a method for TCP RPC both synchronous and asynchronous, one header. There's a structured data compression method (very useful for market data), one header. There's a JIT compiler backend (efficient register allocation, liveness analysis, encoding into x86 machine code), a couple of headers -- I have been meaning to fold the LLVM translation into this method but have been a little short on time.

There's a lot of stuff in here with real production uses and serious money at risk. It's not just mental masturbation or a dilettante adventure.

I haven't done anything in terms of IDE support, that could be a fun angle if anybody wants to work with me to put something like that together. I'm interested in making a simple tool with minimal dependencies that has a lot of practical applications. I am (I think) a reasonable, honest, straightforward and friendly person and I work well with other people. I'm not coming from a privileged background (as surprising as that might be for somebody who's worked in finance and investment banks) and I'm not pretentious. I just want to make something useful. I am eager to work with other similarly-inclined people.

Please send me a note if you're interested (my username at gmail). Be patient, I'll be patient, we can make useful tools.

Maggette


Total Posts: 1187
Joined: Jun 2007
 
Posted: 2020-01-16 09:27
Thanks for the in-depth answer. Highly appreciated!

I am not a kdb+/q user and lived/live in a different ecosystem**. Still, the things you say make a lot of sense to me.

You wanted:
- a Haskell-like experience on top of kdb+ (pattern matching, list comprehensions)
- without a restrictive name-based type system
- without lazy evaluation (which IMHO is great, or at least OK, for batch processing but adds unbearable risk and complexity for everything that is event-based/real-time/streaming data processing)
- more complex, tree-like data types
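On the lazy evaluation point, a small Python sketch shows why deferred work is harder to reason about in an event-driven setting. Python generators are only a loose stand-in for Haskell's laziness, and `make_parser` and the tick values are made up for illustration.

```python
def make_parser(log):
    """Return a parse function that records every tick it actually processes."""
    def parse(tick):
        log.append(tick)
        return tick * 2
    return parse


log = []
parse = make_parser(log)
ticks = [1, 2, 3]

# Lazy: building the generator does no parsing at all.
lazy = (parse(t) for t in ticks)
work_after_building_lazy = len(log)

# Eager: the list comprehension parses every tick immediately.
eager = [parse(t) for t in ticks]
work_after_building_eager = len(log)

# The lazy pipeline only does its work when somebody pulls a value,
# so the cost lands at consumption time rather than at arrival time.
first = next(lazy)
work_after_first_pull = len(log)

if __name__ == "__main__":
    print(work_after_building_lazy, work_after_building_eager, work_after_first_pull)
```

In a batch job the moment at which the work happens rarely matters. In a streaming system it decides whether the latency is paid when the tick arrives or at some later, less predictable point.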

Again: thanks for your input!! Stuff that has proven itself in the field is always interesting to know about.

Regarding working on the IDE support: the problem is I am already extremely overworked, even without considering my own side projects/learning projects. I don't think I would have the time to commit to another side project.


**(the whole hipster Apache jungle (Kafka, Pulsar, Spark, Flink and Akka), lots of Python and Scala, a bit of C and classical relational databases (mostly Oracle and PostgreSQL) and in-memory stuff... and lately some Fortran... back in the day lots of C# and some F#).


mrdivorce


Total Posts: 12
Joined: Jan 2017
 
Posted: 2020-01-16 10:59
@kthielen awesome post, thank you for the deep dive. I'm afraid I'm unqualified to help in any meaningful sense but I do find this stuff fascinating, it's enlightening to hear the reasoning behind a project like this.