Forums  > Software  > database capacity & where you store (home PC or cloud)  
     

JTDerp


Total Posts: 44
Joined: Nov 2013
 
Posted: 2016-11-05 20:35
If you're running your own database & blackbox setup, may I pick your brain for a minute?

I'm moving toward a scratch-built backtesting & live trading bot, and on the database side the plan is to store 3 months' worth of tick-level data across 30 U.S. futures instruments (plus 8-10 years of 1-hour OHLC bars). I figured roughly $300/mo shows up due to the total monthly up/down transfer quota between a colo'd server and a cloud solution like AWS. So, for the sake of minimizing costs without "too much" slow-down in data exchange (it would be happening at night anyway), I'm leaning toward building a home server.
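For reference, the back-of-envelope behind that figure (the ~$0.09/GB egress rate is my own assumption from AWS's published tiers, not a quote):

```python
# Rough reconstruction of the ~$300/mo transfer estimate.
# ASSUMPTION: ~$0.09/GB for AWS data transfer out (2016 list-price
# ballpark; actual rates are tiered and change over time).
egress_usd_per_gb = 0.09
monthly_budget_usd = 300.0

implied_transfer_gb = monthly_budget_usd / egress_usd_per_gb
print("Implied transfer: ~%d GB/month" % implied_transfer_gb)  # ~3333 GB
```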

If you've been down this road and can share a bit of info on your data request schema, as well as the current storage capacity required and where it's stored (home PC and/or cloud, e.g. AWS), please do. If using a home server, what are your hardware specs (total HDD capacity, RAM, CPU)?

Thanks in advance.

The clouded mind seeks; the emptied mind finds.

ryankennedyio


Total Posts: 12
Joined: Nov 2016
 
Posted: 2016-11-05 23:51
Have a look at Cassandra for the database.

I put together a home server. Mount a few hard drives (/mnt/cas0, /mnt/cas1, etc.) and then use each mount point as the data volume for its own Docker Cassandra instance. Effectively you end up with a Cassandra cluster on one box.
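Roughly, the per-drive layout looks like this (just a sketch; container names, the seed address, and mount paths are placeholders):

```python
import subprocess

# Illustrative sketch: one Cassandra container per mounted drive.
# The official `cassandra` image keeps its data in /var/lib/cassandra,
# so giving each container a different mount spreads the cluster
# across the physical disks.
MOUNTS = ["/mnt/cas0", "/mnt/cas1", "/mnt/cas2"]
SEED = "172.17.0.2"  # address of the first container; yours will differ

for i, mount in enumerate(MOUNTS):
    cmd = ["docker", "run", "-d", "--name", "cas%d" % i,
           "-v", "%s:/var/lib/cassandra" % mount,
           "-e", "CASSANDRA_CLUSTER_NAME=home"]
    if i > 0:
        # Later nodes join the first node's ring via the seed list.
        cmd += ["-e", "CASSANDRA_SEEDS=%s" % SEED]
    cmd.append("cassandra")
    subprocess.check_call(cmd)
```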

You can then replicate that on AWS or your colo instance using data-centre replication features. From memory, I don't think you can set a "time" for DC replication (you mention doing that at night); it just happens continuously.

I wrote a blog post about my thoughts when I was making the same decision (hope it's OK to link, it's just too lengthy to post here). I have included an example schema too, so I hope that's useful.
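To give a rough idea of the shape (a simplified sketch, not the exact schema from the post): one partition per (symbol, day) keeps partitions bounded and makes "give me SYMBOL for DATE" a single-partition read.

```python
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Simplified tick schema for illustration only; column names and
# replication settings here are assumptions, not the post's version.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS ticks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS ticks.trades (
        symbol text,
        day    date,
        ts     timestamp,
        price  double,
        size   int,
        PRIMARY KEY ((symbol, day), ts)
    )
""")
```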


I roughly followed this for building my server. Incredibly cheap.
I added 4x 480GB SSDs, a 40GB SSD for the OS, and a big HDD for misc stuff.

EspressoLover


Total Posts: 245
Joined: Jan 2015
 
Posted: 2016-11-05 23:53
Store the data local to where you need it. If you're doing research on machines at home, then store the data at home. If you're using EC2 for research, then store the data on AWS. That cost estimate seems way too high. I can't see that data-set being more than 500 GB after compression. That's only $15/month on S3.

S3 transfer IN is free. As long as you're doing research on AWS, why would you need any significant data transfer OUT? Major data transfer should really only involve colo->cloud (live capture to market data archive), and that's covered under free IN transfer. I can't imagine an application where the trading system needs local access to the entire historical data set; cloud->colo should only be small caching transfers. Maybe a few dollars a month in transfer costs at most.
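Quick numbers (the ~$0.03/GB-month S3 rate is a ballpark of 2016 list pricing, not gospel):

```python
# Back-of-envelope storage cost for the compressed data set.
# ASSUMPTION: ~$0.03/GB-month S3 standard pricing (2016 ballpark).
data_gb = 500
s3_usd_per_gb_month = 0.03

print("S3 storage: ~$%.0f/month" % (data_gb * s3_usd_per_gb_month))  # ~$15

# Transfer IN is free, so colo -> S3 capture costs nothing in
# bandwidth; small cloud -> colo cache pulls are a few dollars at most.
```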

If you're going to go home server, then just get a file server with the bare minimum on CPU, memory, everything besides disk. This is just a file server for WORM data that's primarily sequentially accessed. You don't need anything like a traditional DB. Get a cheap 4U+ like an old-gen 2900. It's cheaper to buy more smaller disks and RAID10 over them. Since it's in your house, rack-space isn't a constraint. Pay up for SAS reliability, because swapping out failed disks gets old fast. But you don't need anything more than minimal rotation speed. A single user running backtests hardly uses any I/O. Backup to Glacier, as that's by far the simplest, cheapest reliable method.
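And "backup to Glacier" really is a couple of lines; a sketch with boto3 (the vault name and archive path are placeholders):

```python
import boto3

# Minimal Glacier backup sketch. Create the vault once (console or
# create_vault()) before uploading; names/paths here are examples.
glacier = boto3.client("glacier")

with open("/data/ticks-2016-10.tar.gz", "rb") as f:
    resp = glacier.upload_archive(
        vaultName="marketdata-backup",
        archiveDescription="tick archive 2016-10",
        body=f,
    )

print("Archive ID:", resp["archiveId"])
```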

jslade


Total Posts: 1095
Joined: Feb 2007
 
Posted: 2016-11-06 20:08
You're talking about an amount of data that can fit in memory on a laptop; you can use pretty much anything for this.

"Learning, n. The kind of ignorance distinguishing the studious."

JTDerp


Total Posts: 44
Joined: Nov 2013
 
Posted: 2016-11-07 03:15
Thanks for your detailed answers, gentlemen!

Espresso, as far as why I'd need *outgoing* transfer from AWS - I have no idea. I think there's some confusion about purpose & architecture in the discussion between myself & the developer I've contracted with... the mention of tick data for 30 futures markets over 3 months seems to create the notion that the required capacity will swell quickly. I've been asking similar questions of the creator of the QtPyLib (qtpylib.io) package, and he provided a starting point for estimating DB size: "As for ticks... Each tick record in the database takes up 54 bytes of data. An average Futures contract can get 250-300 ticks/day from IB. This checks out to ~300MB of storage requirements per futures contract per month (~80MB for stocks).

By that calculation, you'll need about 10GB per month to store one month's worth of both minute and tick data for 30 symbols. Multiply that by the number of months you want to keep and that's the storage you'll need for market data storage. Multiply that again by ~2 and that's your server requirements."

Difference being that I'm connecting to Rithmic's datafeed, which is unfiltered, as opposed to Interactive Brokers in QtPyLib's case - theirs is filtered snapshots every 250ms - so my tick counts (and storage needs) should run considerably higher.
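Running his numbers (note: the quoted "250-300 ticks/day" only reconciles with ~300MB/month at 54 bytes/tick if it's read as 250-300k ticks per trading day, which I assume was the intent):

```python
# Reconciling the QtPyLib author's estimate.
# ASSUMPTION: "250-300 ticks/day" means 250-300k ticks per trading day
# (a dropped 'k'); the literal reading gives only ~0.3 MB/month.
bytes_per_tick = 54
ticks_per_day = 275000   # midpoint of the assumed 250k-300k range
trading_days = 21

mb_per_contract = bytes_per_tick * ticks_per_day * trading_days / 1e6
print("~%.0f MB per contract per month" % mb_per_contract)   # ~312 MB

symbols = 30
print("~%.1f GB/month for %d symbols" % (symbols * mb_per_contract / 1000.0, symbols))
# ~9.4 GB -- matching his ~10GB/month once minute bars are included.
```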


'Round these parts, the operative phrase is "Amateurs, Dude." :)

The clouded mind seeks; the emptied mind finds.

JTDerp


Total Posts: 44
Joined: Nov 2013
 
Posted: 2016-11-07 03:57
Ryan, thanks very much for going into the hardware specs & linked tutorials - very helpful. I've built PCs from scratch several times, but I've never needed to set up RAID0 across several drives, let alone design a build as a server instead of a desktop.

The clouded mind seeks; the emptied mind finds.

ryankennedyio


Total Posts: 12
Joined: Nov 2016
 
Posted: 2016-11-07 12:09
Apologies if what I suggested seems overkill -- it was actually very easy to set up.

The great thing about the database setup I mentioned is that you don't actually need to mess around with RAID: the Cassandra pseudo-cluster distributes data across the hard drives automatically. Very fast to set up, and then only 1 or 2 command lines to manage it now and again.
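For the occasional check, `nodetool status` inside any container does it, or from Python with the DataStax driver (local address assumed):

```python
from cassandra.cluster import Cluster

# Quick health check: each Docker instance registers as its own node,
# so if every node is up, data is spreading across the mounted drives.
cluster = Cluster(["127.0.0.1"])
cluster.connect()

for host in cluster.metadata.all_hosts():
    print(host.address, "up" if host.is_up else "down")

cluster.shutdown()
```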

You can set that linked build up as a desktop too - just get a big enough case and put Ubuntu desktop (or similar) on it. That said, unless you need lots of cores/lots of RAM, it's probably best to just go with a cheap desktop build, as others have mentioned.