Storing EOD data in PostgreSQL
     

flare9x


Total Posts: 2
Joined: Nov 2016
 
Posted: 2016-11-25 05:15
Data: EOD
Data sources: .csv files
1. Import the .csv files into PostgreSQL.
2. Perform calculations (e.g. use columns A and B to make C), perhaps doing the calculation in Python and storing the final result in SQL.
3. Multiple data sources (options, price, volume, etc.), all matched up per date.
4. Search scripts to pull data from the database, perhaps written in Python.
5. Automate the .csv-to-SQL load as a daily process (see the sketch below).

The above is of course open to suggestions.

Looking to figure out the best way to achieve the above workflow and how to organise the structure of the SQL database... one table, multiple tables, etc.?
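
For steps 1 and 5, something like this minimal psycopg2 + COPY sketch is roughly what I'm picturing (the table name, file name and connection string are just placeholders):

# Rough sketch: load one day's CSV into PostgreSQL with COPY.
# Assumes psycopg2 is installed and that a table named eod_bars already
# exists with columns matching the CSV header (all names are placeholders).
import psycopg2

def load_csv(path, dsn="dbname=marketdata user=postgres"):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur, open(path) as f:
        cur.copy_expert(
            "COPY eod_bars FROM STDIN WITH (FORMAT csv, HEADER true)", f
        )

if __name__ == "__main__":
    load_csv("eod_2016-11-25.csv")  # e.g. run once per day from cron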

Thanks

FDAXHunter
Founding Member

Total Posts: 8353
Joined: Mar 2004
 
Posted: 2016-11-25 11:52
As a structure for the table I would suggest:

For this sort of thing, a single table is usually sufficient. And because it's daily data, you can't really do anything wrong with regard to data types (which is why I'm leaving them up to you).

[Timestamp - Start] // The beginning of the day. In its simplest form this will be 00:00:00
[Timestamp - End] // This allows you to have variable length daily bars (this can be a use case for example with EOD data that you calculate yourself).
[Timestamp - Calculation] // This might be irrelevant, but if you calculate the data yourself, this is when the data was calculated and persisted.
[Instrument ID Key] // Your way of storing the instrument identification. Can be simple for stocks, more complex for options.
[Price Open]
[Price High]
[Price Low]
[Price Close]
[Total Quantity] // also known as volume
[Open Interest]
[Open Interest Release Timestamp] // Open interest is not normally calculated and updated same day.
[Data Source] // Allows you to store different sources for the same data.
[Record Create - Timestamp] // Debugging and traceability/maintenance
[Record Modify - Timestamp] // Debugging and traceability/maintenance
[Source Data - Raw] // It's actually feasible to store the raw data from the source as a string in the table for daily data. It won't take up much space, but can help tremendously with debugging and traceability problems at a later date.
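
A rough sketch of how that column list might translate into PostgreSQL DDL, run from Python. The table name, column names, types and primary key below are only placeholders, since the data types are left up to you:

# One possible translation of the suggested schema into DDL, executed from
# Python with psycopg2. Names, types and the primary key are placeholders.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS eod_bars (
    ts_start        timestamptz NOT NULL,
    ts_end          timestamptz NOT NULL,
    ts_calculated   timestamptz,
    instrument_id   text        NOT NULL,
    price_open      numeric,
    price_high      numeric,
    price_low       numeric,
    price_close     numeric,
    total_quantity  bigint,
    open_interest   bigint,
    oi_release_ts   timestamptz,
    data_source     text,
    record_created  timestamptz DEFAULT now(),
    record_modified timestamptz DEFAULT now(),
    source_data_raw text,
    PRIMARY KEY (instrument_id, ts_start, data_source)
);
"""

with psycopg2.connect("dbname=marketdata user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)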

The Figs Protocol.

EspressoLover


Total Posts: 245
Joined: Jan 2015
 
Posted: 2016-11-26 17:57
FDAX's schema is great if you decide to go forward. But I'd ask yourself first: do you even need a database? If so, why? You could end up doing a lot less work by skipping points 4) and 5). Just store the raw (compressed) CSVs, and use the scripts from 2) to perform the calculations on demand. If you need point 3), just use GNU join. Pipe the calculator scripts directly into the consuming application. Yes, that means you re-run the calcs on every access, but processor power and SSD access are *way* cheaper than developer effort. No need to deal with any of the headaches of a persistence layer.
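
A minimal sketch of that flat-file route, assuming pandas, gzipped CSVs, and made-up file and column names (date, symbol, A, B):

# Flat-file alternative: read the compressed CSVs on demand and join them
# on date instead of persisting to a database. All names are placeholders.
import pandas as pd

prices = pd.read_csv("prices_eod.csv.gz", parse_dates=["date"])
options = pd.read_csv("options_eod.csv.gz", parse_dates=["date"])

# The GNU-join step: line the sources up per date (and symbol).
merged = prices.merge(options, on=["date", "symbol"], how="inner")

# Point 2: derive column C from A and B on the fly, then hand the frame
# straight to the consuming application.
merged["C"] = merged["A"] / merged["B"]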

I'd really doubt if you need SQL. Do you need ACID transactions? Indexing? Scalability? Probably not. It's a WORM dataset small enough to fit into L3 cache, with a max of under a hundred dozen simultaneous readers. When in doubt - KISS.

contango_and_cash


Total Posts: 72
Joined: Sep 2015
 
Posted: 2016-11-28 18:39
Look into to_sql and read_sql within pandas if you're keen on using Python.
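
A minimal round-trip with those two calls, assuming SQLAlchemy and placeholder table, column and connection names:

# pandas round-trip: CSV -> PostgreSQL -> DataFrame. Names are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://postgres@localhost/marketdata")

df = pd.read_csv("eod_2016-11-25.csv", parse_dates=["date"])
df.to_sql("eod_bars", engine, if_exists="append", index=False)

back = pd.read_sql("SELECT * FROM eod_bars WHERE date = '2016-11-25'", engine)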

In line with EspressoLover, spend some time thinking about whether you need a database.


sv507


Total Posts: 165
Joined: Aug 2010
 
Posted: 2016-12-18 20:31
I found pandas to_sql very slow (hours vs. minutes) compared to the Postgres COPY command (I don't remember why it was so much slower). Personally I would definitely go for the DB option. I agree it's WORM, but that's precisely the advantage of the DB... so much easier to slice and dice and do all the queries.
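
One way to keep pandas but get COPY-like speed is to stream the DataFrame through an in-memory CSV buffer into psycopg2's copy_expert; a sketch with placeholder table and connection names:

# Bulk-load a DataFrame via COPY instead of row-by-row to_sql inserts.
import io
import pandas as pd
import psycopg2

def copy_frame(df, table, dsn):
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    buf.seek(0)
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.copy_expert(f"COPY {table} FROM STDIN WITH (FORMAT csv)", buf)

copy_frame(pd.read_csv("eod_2016-11-25.csv"), "eod_bars",
           "dbname=marketdata user=postgres")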

Lebowski


Total Posts: 66
Joined: Jun 2015
 
Posted: 2016-12-18 23:36
FDAXHunter's table structure is quite comprehensive given the question posed, and some of the details he's given, such as the OI release timestamp, are the type of wisdom you only gain by implementing this stuff. It's by no means necessary, but perhaps worth mentioning: if your applications of this data need the "Record Modify" timestamp, you might consider event sourcing as a more comprehensive pattern for dealing with the potential for modifications.
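
For a rough idea of what that could look like here (a loose sketch with made-up names, not a full event-sourcing framework): append a new version of a bar instead of updating it in place, and read the latest version per key.

# Append-only events table plus a view of the current state. Names are
# placeholders; this is a sketch of the idea, not a framework.
import psycopg2

EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS eod_bar_events (
    event_id      bigserial PRIMARY KEY,
    instrument_id text        NOT NULL,
    ts_start      timestamptz NOT NULL,
    recorded_at   timestamptz NOT NULL DEFAULT now(),
    payload       jsonb       NOT NULL  -- the full bar as delivered/corrected
);
"""

LATEST_VIEW = """
CREATE OR REPLACE VIEW eod_bars_current AS
SELECT DISTINCT ON (instrument_id, ts_start) *
FROM eod_bar_events
ORDER BY instrument_id, ts_start, recorded_at DESC;
"""

with psycopg2.connect("dbname=marketdata user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(EVENTS_DDL)
        cur.execute(LATEST_VIEW)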

Tradenator


Total Posts: 1585
Joined: Sep 2006
 
Posted: 2016-12-21 13:18
Sorry to be thick, but wouldn't you want a date stamp in the schema as well, or is that included within all the various timestamps? Blush

I played with a postgres database years ago, and decided to just stick with csv files for daily time series data. They are nice and simple.

svisstack


Total Posts: 303
Joined: Feb 2014
 
Posted: 2016-12-21 14:27
@tradenator: a timestamp includes the date.
@lebowski: everything depends on your specific requirements.

Time well wasted.

cordura21


Total Posts: 228
Joined: Aug 2009
 
Posted: 2017-01-09 17:21
I guess you have a schema that doesn't change. But for flexibility, isn't it better to use a variable/value structure? What I mean is, instead of:
-Open | High | Low | Close
Something like:
record 1: Open | 16
record 2: High | 17
record 3: Low | 15
record 4: Close | 16
Cord

Vespertilio homo est cientificus

FDAXHunter
Founding Member

Total Posts: 8353
Joined: Mar 2004
 
Posted: 2017-01-21 19:50
That may sound like it gives you flexibility, but for highly structured data (which this is), an "Attribute-Value" layout is a really bad idea. You will run into countless operational issues.
Entire careers have been lost to Attribute-Value or Entity-Attribute-Value ("EAV") schemas.
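
To illustrate the kind of operational pain involved (hypothetical table and attribute names): even reading a single bar back out of an attribute-value layout forces a pivot like this on every query.

# Reconstructing one OHLC row from an attribute-value table. Names are
# hypothetical; compare with a plain "SELECT ... FROM eod_bars WHERE ...".
EAV_READBACK = """
SELECT
    MAX(CASE WHEN attribute = 'open'  THEN value END) AS price_open,
    MAX(CASE WHEN attribute = 'high'  THEN value END) AS price_high,
    MAX(CASE WHEN attribute = 'low'   THEN value END) AS price_low,
    MAX(CASE WHEN attribute = 'close' THEN value END) AS price_close
FROM eod_attributes
WHERE instrument_id = %s AND ts_start = %s
GROUP BY instrument_id, ts_start;
"""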

There are hundreds of references detailing why it's bad. I just googled a couple without really reading them:


EAV - The Good and Bad
OTLT/EAV design - why do people hate it?

The Figs Protocol.