
NeroTulip


Total Posts: 997
Joined: May 2004
 
Posted: 2016-05-06 12:19
numerai

Seems better thought out than the Winton thing on Kaggle. Has anyone looked into this? Thoughts?

"Earth: some bacteria and basic life forms, no sign of intelligent life" (Message from a type III civilization probe sent to the solar system circa 2016)

mmport80


Total Posts: 85
Joined: Jul 2010
 
Posted: 2016-05-06 13:55
The encryption angle is really interesting. I wonder how good it is at keeping all the artifacts of the actual data. Suspect it can't be very good, but who knows. If anyone has more info I'd love to read up on it.

--- http://johnorford.blogspot.com http://blog.johnorford.com

Nonius
Founding Member
Nonius Unbound
Total Posts: 12706
Joined: Mar 2004
 
Posted: 2016-05-06 21:21
Someone showed me that recently. RenTech is invested; that doesn't mean it's a good thing for a stakeholder who isn't an investor. I personally think it is the wave of the future on a lot of fronts. The argument against the approach is "we don't want a one-trick pony". Okay, well, maybe you can build a business tapping the minds of one-trick ponies, be it trading or whatever.

Chiral is Tyler Durden

chiral3
Founding Member

Total Posts: 5023
Joined: Mar 2004
 
Posted: 2016-05-06 21:46
http://www.huffingtonpost.com/francis-pedraza/forget-john-galt-who-is-r_b_7891762.html

Nonius is Satoshi Nakamoto. 物の哀れ

EspressoLover


Total Posts: 254
Joined: Jan 2015
 
Posted: 2016-05-06 22:19
I'm pretty skeptical. I'd give this fund less than a 5% chance of having over $500mn AUM in 5 years.

The business proposition is to have a bunch of quants independently work on signals, then combine them together. Except that's what Two Sigma already does. And it doesn't need to encrypt its data because it has non-competes and deferred comp. Lack of domain knowledge is still a major handicap for almost any problem. Even with bleeding edge machine learning, handing some smart people a bunch of unlabeled features is not a recipe for success.

I'm sure in some sense Numerai's signals will have some marginal information to the rest of the market. But in quant trading the relationship between signal strength and profit is extremely non-linear. If you can achieve the same R-squared as the best participant, you have a multi-billion business. At 90% you have a multi-million dollar business. At 75% you have nothing.

Numerai may be able to harvest some unique signals. But there's no way that some smart hackers with no market knowledge will independently develop anything close to the bread-and-butter alphas which have basically been refined and passed back and forth between quant shops for the better part of two decades. Which I think speaks to why RenTech is interested.

As a standalone trading vehicle, they're untenable. But invest a couple million in a managed account where you can track their positions in real-time. (Or maybe even trick them into revealing the raw signals). You've just leased a potentially orthogonal alpha stream for Medallion. At 2/20 this costs you less than the annual comp of the receptionist.

chiral3
Founding Member

Total Posts: 5023
Joined: Mar 2004
 
Posted: 2016-05-06 22:41
It reminds me of some big data attempts in biology/medicine/immunology. I remember this multi-day hackathon in NY last year where all these kids with no education (MD or PhD) worked on these datasets related to epitopes and cancer cell immunotherapy. You can't crowdsource just anything. At some point you have to understand what you're looking at or work in some bottom-up way. Not everything is a classification or machine learning problem. These attempts are like saying we can abstract any dataset and there's some hidden message in it from God that nobody's found because we've been too specific with our methods.

Nonius is Satoshi Nakamoto. 物の哀れ

polysena


Total Posts: 1047
Joined: Nov 2007
 
Posted: 2016-05-07 00:19
Applause Chiral 3.

Freedom is what I have inside me. (Leningrad and Kipelov - "Svoboda")

Nonius
Founding Member
Nonius Unbound
Total Posts: 12706
Joined: Mar 2004
 
Posted: 2016-05-07 10:01
"These attempts are like saying we can abstract any dataset and there's some hidden message in it from God that nobody's found because we've been too specific with our methods."

That reminds me of when I was in grad school I needed some summer lunch money, so I befriended this math professor who was bent on decoding messages in the Bible. Did some "work" for him to that end...on an NSF grant.

Chiral is Tyler Durden

HankScorpio


Total Posts: 465
Joined: Mar 2007
 
Posted: 2016-05-07 11:46
[threadjack]

Decoding messages in the Bible? That reminds me of this guy: https://www.youtube.com/watch?v=jo18VIoR2xU

[\threadjack]

radikal


Total Posts: 258
Joined: Dec 2012
 
Posted: 2016-05-07 16:43
Yeah, I'm also extremely skeptical -- perhaps I'm just really arrogant/deluded but my strong belief is that no amount of clueless cheap 25ish year olds + sklearn is a real threat to me. The water is warm, please please come trade.

I mean, it's GREAT for the industry if stuff like this gets bigger, clueless quant flows could be the new paper. Even a lot of algo traders I speak with have no idea what they're trading or that they are not getting the best price in the risk they're putting on. (You are synthetically just selling correlation on X basket, and you sold through the bid of what the bank's wide-fuck-you dispersion desk would show you) This gets far far worse with people who know nothing about what products are out there or how to relate values across similar products.

However, the view into this flow could be quite valuable. So, props to RenTech. (As per usual)

There are no surprising facts, only models that are surprised by facts

chiral3
Founding Member

Total Posts: 5023
Joined: Mar 2004
 
Posted: 2016-05-07 17:34
"Yeah, I'm also extremely skeptical -- perhaps I'm just really arrogant/deluded but my strong belief is that no amount of clueless cheap 25ish year olds + sklearn is a real threat to me."

That's how I've started to feel, at least as it relates to moonshots and innovation. I have been impressed by some of the ingenuity, though, but it's not organic. Like the kid that used our grid to run a bitcoin mine and tried to hide it by saying he was doing a bunch of really expensive economic simulation on a new strategy. What gave him away was that he sucked at developing trading strategies before he started bombing the performance of the gpu grid.

Nonius is Satoshi Nakamoto. 物の哀れ

jslade


Total Posts: 1101
Joined: Feb 2007
 
Posted: 2016-05-08 03:29
To be fair, what is presently being presented by numerai is an already curated data set which is known to have useful information in it. Finding that curated data set is the hard part. Finding signal in something useful is relatively easy. Risk management is also hard, and is strongly correlated with what you use for your signal finding, and knowing about the data in detail and what the market is doing.

To give some Silly Con valley perspective, this isn't the first "crowdsourced HF" idea. It is better than what has come before because they're not just giving you Quandl and saying "go for it" the way most of them do, and of course the encryption part, which is a relatively new research result, is also pretty cool. Lots of people here have really crazy ideas about what it takes to trade profitably. The mania for crowdsourced everything is really stupid. Crowds are morons, and even in a "contest" situation like this, it is hard to pick out the non-morons.

As a business, I'd say it is probably OK as a recruiting pipeline, or a way of getting cheap labor for the signal seeking piece (what about the risk piece that goes with the signal piece?) but I can't imagine them scaling out in any sane way.

"Learning, n. The kind of ignorance distinguishing the studious."

Nonius
Founding Member
Nonius Unbound
Total Posts: 12706
Joined: Mar 2004
 
Posted: 2016-05-08 16:05
interesting analysis Jslade.

btw, it's not 100% pervasive but pretty common for quant shops to in-source signal development to, actually in many cases, wholly owned subsidiaries. Personally, the concept of just sitting at a desk and finding predictability in data doesn't strike me as the most exciting job/activity/hobby. In the context of trading, that is, as one guy here once said, "a starting hypothesis".

Chiral is Tyler Durden

chiral3
Founding Member

Total Posts: 5023
Joined: Mar 2004
 
Posted: 2016-05-08 16:24
Can you talk a little more about the curated dataset? Are you saying it was curated based on known in-sample information and they are crowd sourcing algos to get better out-of-sample performance?

Nonius is Satoshi Nakamoto. 物の哀れ

goldorak


Total Posts: 1000
Joined: Nov 2004
 
Posted: 2016-05-08 19:40
@EspressoLover: > But in quant trading the relationship between signal strength and profit is extremely non-linear. If you can achieve the same R-squared as the best participant, you have a multi-billion business. At 90% you have a multi-million dollar business. At 75% you have nothing.

Your sentence is totally unintelligible to me. What do you mean by "signal strength"? R-squared on what? What is a "best participant"?

If you are not living on the edge you are taking up too much space.

svisstack


Total Posts: 303
Joined: Feb 2014
 
Posted: 2016-05-08 22:39
chiral3: I will agree that not everything is a machine learning problem, but you mentioned a GPU grid ;-) If it's only for MC then the joke's on me.

---

Let's not make too many assumptions about what they don't do now, or will do in the future, based only on what they have said in public about what they do right now.

Also, they are probably reading this thread, or will in the future, and observation changes the future, like in quantum mechanics.

It has some degree of innovation and a lot of work still ahead. I'm not familiar with the internals of the cryptographic algorithms used.
I'm skeptical that all the information is preserved in the dataset after encryption, as the exact price information has already been lost,
so to me it looks like their biggest innovation can backfire in various ways.

Time well wasted.

jslade


Total Posts: 1101
Joined: Feb 2007
 
Posted: 2016-05-08 23:29
Chiral3: "Can you talk a little more about the curated dataset? "

Not beyond what is on their website/blog.

https://medium.com/@Numerai/encrypted-data-for-efficient-markets-fffbe9743ba8

"Over the last two and a half years, working with expensive financial data from many sources at a $15 billion asset management company, I came up with a way to turn a small segment of data into a tractable binary classification problem. And I was able to create and train a machine learning algorithm on this data.

Using my model, we invested about $50 million for more than a year, and outperformed the market significantly. "


This link has some commentary by an outsider on the nature of the data.

http://fastml.com/numerai-like-kaggle-but-with-a-clean-dataset-top-ten-in-the-money-and-recurring-payouts/

Remember, it is a data set the founder guy painstakingly gathered when he was at a big fund with access to everything. His idea is he is looking for uncorrelated strats on the same data. That being the case, he should probably use different data. Maybe he has already done this and the universe of possibly useful data is in this 15 dimensional data set. I dunno.

Another thing to keep in mind, they're using logistic loss as a figure of merit. This is something that you can easily crowd source, since this is the simplest loss function that everyone in data science uses when flagging fraud or predicting click throughs. On the other hand, it seems to me not very well suited to trading problems. This is the kind of thing that makes me roll my eyes at Silicon Valley people.
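
For anyone who hasn't met it, here is a minimal sketch of the logistic loss mentioned above. The function names, the made-up probabilities, and the two return series are illustrative assumptions, not Numerai's data or scoring code. It shows one possible reading of why the metric sits awkwardly with trading: the score depends only on up/down labels and predicted probabilities, so two return series with the same labels score identically even though betting on the predictions earns different P&L.

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary logistic loss; lower is better."""
    y = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs for over-confident predictions.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def sign_bet_pnl(returns, p_pred):
    """P&L of going long when p > 0.5 and short otherwise, one unit per bet."""
    position = np.where(np.asarray(p_pred) > 0.5, 1.0, -1.0)
    return float(np.sum(position * np.asarray(returns)))

# Hypothetical predicted probabilities of an up move.
probs = [0.7, 0.4, 0.6, 0.6]
# Two hypothetical return series with the same up/down pattern but different sizes.
returns_a = [0.01, -0.01, 0.01, -0.01]
returns_b = [0.001, -0.05, 0.001, -0.05]
labels_a = [r > 0 for r in returns_a]
labels_b = [r > 0 for r in returns_b]

if __name__ == "__main__":
    # Same labels, same probabilities: the two scores are identical.
    print(log_loss(labels_a, probs), log_loss(labels_b, probs))
    # The money made by betting on those probabilities is not.
    print(sign_bet_pnl(returns_a, probs), sign_bet_pnl(returns_b, probs))
```

The size of each move never enters `log_loss`, which is fine for flagging fraud but leaves out exactly the quantity a trading book is marked on.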

"Learning, n. The kind of ignorance distinguishing the studious."

EspressoLover


Total Posts: 254
Joined: Jan 2015
 
Posted: 2016-05-09 04:05
> Your sentence is totally unintelligible to me. What do you mean by "signal strength"? R-squared on what? What is a "best participant"?

Let's say you have a backdoor into RenTech's net alpha stream on some strategy for some stock. However, the data comes over a noisy channel. Half signal, half noise. You have 50% of RenTech's R-squared (with no marginal information) on whatever the hell dependent variable they fit. (1 minute mid-price returns wouldn't be unreasonable). After shrinkage, your signal averages root(1/2) of RenTech's average magnitude. How much money can you make? To a first approximation: zero. Definitely not half what RenTech makes.
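
The root(1/2) and 50% figures can be checked with a short derivation. This is a sketch under assumptions that are not spelled out in the post: the clean alpha $a$ and the channel noise $n$ are independent, mean zero, with the same variance $\sigma^2$; you observe $x = a + n$; and $y$ is whatever dependent variable is being fit, with standard deviation $\sigma_y$ and with $n$ independent of $y$.

$$
\hat a = \frac{\operatorname{Cov}(a, x)}{\operatorname{Var}(x)}\, x = \frac{\sigma^2}{2\sigma^2}\, x = \frac{x}{2},
\qquad
\operatorname{sd}(\hat a) = \tfrac{1}{2}\sqrt{2\sigma^2} = \frac{\sigma}{\sqrt{2}}
$$

$$
\operatorname{corr}(x, y) = \frac{\operatorname{Cov}(a, y)}{\sqrt{2}\,\sigma\,\sigma_y} = \frac{\operatorname{corr}(a, y)}{\sqrt{2}}
\quad\Longrightarrow\quad
R^2_x = \tfrac{1}{2} R^2_a
$$

So under these assumptions the shrunk signal has $1/\sqrt{2}$ of the clean alpha's typical size and half its R-squared, which matches the numbers in the paragraph above.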

Why? Because you have significant adverse selection. Every trading opportunity where ||Alpha|| > TCosts falls into one of three categories:
1) RenTech's alpha does not exceed TCosts, and yours only does because of the noise. In which case the trade is negative EV.
2) Both you and RenTech want to trade. It's positive EV if you get filled. But even if you have the same non-deterministic latency (which you don't), you only get filled 50% of the time. And these types of trades just don't happen that often. RenTech doesn't need to shrink their alphas, so it's likely that they trigger well before you. Your only real opportunities are big discontinuous jumps. That game's about latency, not learning.
3) Both you and RenTech have positive EV alphas. But RenTech doesn't want to trade for inventory related reasons. Exceedingly rare. RenTech has for all intents and purposes infinite money relative to Medallion's capacity.

In this toy model, the vast majority of your trades are negative EV. Medallion's alphas are worth billions, but 50% of Medallion's alphas are worth nothing. And this toy model is very close to what Numerai is dealing with. Their only hope is to data mine marginal information that others don't have, while keeping non-marginal information out of their models. That's just not going to happen. Almost all the major sources of alpha are pretty much solved. Discovering something that RenTech, Citadel, Two Sigma, KCG, HRT, et al. don't know is nigh impossible. (And whatever is new to discover is almost certainly not a well-defined feature in a "curated dataset"). Collectively the major quant shops have near perfect LOB alpha, near perfect residual reversion, near perfect order flow models, etc. Even being wildly optimistic, Numerai may get 2-3% marginal information relative to any major traders, and capture 80% of the non-marginal information. But without anything special on the monetization or infrastructure side, that simply doesn't mean squat.
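
The toy model above can be sketched as a small Monte Carlo. Everything specific here is an assumption made for illustration: unit-variance Gaussian alpha and noise, a cost threshold of one alpha standard deviation, a flat 50% fill rate on trades both sides want, the clean-alpha trader always getting filled, and P&L measured as the clean alpha net of costs. Category 3 is left out since the post calls it exceedingly rare.

```python
import numpy as np

def simulate(n=1_000_000, cost=1.0, fill_prob=0.5, seed=0):
    """Toy adverse-selection model; all parameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(n)               # clean alpha, unit variance
    noisy = alpha + rng.standard_normal(n)       # half signal, half noise
    shrunk = 0.5 * noisy                         # linear shrinkage back toward alpha

    clean_wants = np.abs(alpha) > cost           # clean-alpha trader would trade
    i_want = np.abs(shrunk) > cost               # noisy-alpha trader would trade
    same_side = np.sign(shrunk) == np.sign(alpha)

    # Expected edge of my trade net of costs, measured against the clean alpha.
    my_edge = np.sign(shrunk) * alpha - cost

    cat1 = i_want & ~clean_wants                 # category 1: only noise got me over the bar
    competing = i_want & clean_wants & same_side # category 2: we race for the same fill
    uncontested = i_want & ~competing            # nobody races me for these

    filled = rng.random(n) < fill_prob           # coin flip on contested trades
    my_pnl = my_edge[uncontested].sum() + my_edge[competing & filled].sum()
    clean_pnl = (np.abs(alpha[clean_wants]) - cost).sum()

    return {
        "shrunk_std": float(shrunk.std()),               # size of the shrunk signal
        "cat1_share": float(cat1.sum() / i_want.sum()),  # share of my trades in category 1
        "cat1_mean_edge": float(my_edge[cat1].mean()),   # average edge on those trades
        "pnl_ratio": float(my_pnl / clean_pnl),          # my P&L relative to the clean trader
    }

if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.4f}")
```

The exact numbers move with the assumed `cost` and `fill_prob`. Lowering `fill_prob` is a crude stand-in for the point that the clean-alpha trader usually triggers first.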

goldorak


Total Posts: 1000
Joined: Nov 2004
 
Posted: 2016-05-09 07:38
Very interesting thread.

Thank you ES for making your line of thought more verbose. I am totally with you on this reasoning.




If you are not living on the edge you are taking up too much space.

a路径积分


Total Posts: 80
Joined: Dec 2014
 
Posted: 2016-05-09 13:30
[emotions]
I'm generally short anything where the founder's face takes up 25% of the webpage's screen real estate in a highly-edited PR video even as the company is in its infancy. I'd double down on that bet after watching https://www.youtube.com/watch?v=Tt84oawDTlI or seeing the self-drawn comparisons to Elon Musk. It reminds me of a working paper literally called "Narcissism Is a Bad Sign" where they found that stock prices underperformed as their CEOs' signatures or pictures became bigger on their annual reports.
[/emotions]

Good for him though, I think he has one substantial advantage over Quantopian in the crowd-sourced hedge fund market segment. The interesting idea to me is the skew in psychological preference for short-term reward. By turning the reward function into a microtransaction, not the long-drawn "maybe we will fund you" legal blackhole route that Quantopian has taken, I think he has a leg up in user acquisitions. Not to mention it's a reward effectively drawn from a random distribution at that... It's the kind of magic formula that gets lab mice hammering at a button nonstop for the chance of food. A good portion of Kaggle participants are motivated by the same reasons MMORPG players grind hours without end to get the same item with a slightly nicer hue or color or 0.1% more damage. I will wager that he will go on to take the growth (in user acquisitions or supposedly wide scale applicability of his encryption algorithm) argument to raising a comfortable margin of venture capital.

[emotion]
Surely, we're not going to speak of this in the same breath as Two Sigma right? The guy deserves better... after all, he is confident of pulling off an alpha signal strategy at 100% ADV. https://twitter.com/richardcraib/status/722102327843418114
[/emotion]

ronin


Total Posts: 219
Joined: May 2006
 
Posted: 2016-05-09 15:03
Decoding messages in the Bible has a long and distinguished history, almost entirely populated by crackpots.

https://en.wikipedia.org/wiki/Kabbalah

Whereas the Numerai attempt at generating a useful signal looks a bit more like this: https://www.youtube.com/watch?v=no_elVGGgW8

Mr. Burns: This is a thousand monkeys working at a thousand typewriters. Soon, they'll have finished the greatest novel known to man.

[reads a page]

Mr. Burns: All right, let's see... "It was the best of times, it was the BLURST of times?" You stupid monkey.



"People say nothing's impossible, but I do nothing every day" --Winnie The Pooh

Nonius
Founding Member
Nonius Unbound
Total Posts: 12706
Joined: Mar 2004
 
Posted: 2016-05-09 15:53
my fav Simpson's quote actually....!

This math guy was a total crackpot, but I didn't care as long as he paid me. (apropos, I've a mathematician friend in the states who is getting PAID to do research on behalf of some porn-king who has a new theory of gravity).

anyway, yeah, I guess this numerai stuff sucks.

Chiral is Tyler Durden

ronin


Total Posts: 219
Joined: May 2006
 
Posted: 2016-05-09 16:41
> mathematician friend in the states who is getting PAID to do research on behalf of some porn-king who has a new theory of gravity

That sounds amazing.

Why do I never come across opportunities like that...?



"People say nothing's impossible, but I do nothing every day" --Winnie The Pooh

Nonius
Founding Member
Nonius Unbound
Total Posts: 12706
Joined: Mar 2004
 
Posted: 2016-05-09 17:32
I suppose concentrating a lot on things like HFT might hurt. He's got some sort of storefront advertising Mathematician for Hire. I paid him a bit of money to do some calculus I couldn't be bothered to do (and which I couldn't get an automated answer on wolframalpha). He was also paid to create all the 3 d graphs in some math book.

My brother used to be a serial (and mostly failed) entrepreneur. He was building things like, when people used CDs, CD label printers. Sounds almost horse and buggy now. Anyway, he lived in SoCal and mentioned to me this cottage industry of machine shops in the greater LA area that make bespoke widgets for aerospace, hardware etc. So, it sort of makes sense to have this kind of math solving cottage industry, nowadays.

Chiral is Tyler Durden

ronin


Total Posts: 219
Joined: May 2006
 
Posted: 2016-05-09 20:40

> He was also paid to create all the 3 d graphs in some math book.

Sounds like every nerd's natural progression - from math books to porn kings. His undergrad school can use him for marketing...

"People say nothing's impossible, but I do nothing every day" --Winnie The Pooh