What stack are you using for Logs/Monitor/Alerts

EspressoLover


Total Posts: 258
Joined: Jan 2015
 
Posted: 2018-01-08 11:53
I'm curious what other NP'ers who are running automated trading systems are using when it comes to logging, monitoring, and alerts. I'm poking my nose into this topic since I want to upgrade my current setup to something shinier. I haven't really put much effort into this side of things. Up until now, I've pretty much gotten by dumping output to stdout, piping it to log files, then regularly checking things with grep/sed/awk after shelling into the production machine.

However, I have a baby at home and am doing a lot of trading in a different timezone. So, I'd like to make it easier to step away, plus offload some of the responsibility to a non-technical person on my team. It'd be interesting to hear what solutions other people are using in this area. Particularly any good open source or relatively cheap software that can just be plugged in and turned on. It's hard to do research in this area, since everything's so web-dev focused. Off the top of my head, here's a rough outline of what I'm looking at (critique or suggestions definitely welcome):

- Log in application to syslog instead of stdout (see the sketch after this list)
- Logstash for syncing logs from prod to archive
- Nagios to let me know if the server blows up or quoter dies
- Logstash/Splunk to pub/sub trading events from the quoter output
- Pagerduty to blow up my phone in case shit hits the fan
- Some sort of web frontend for easy monitoring: refresh PnL, positions, trades, other strat-specific stats.
- Bonus points if that frontend could also plot intraday PnL, etc. Unfortunately, I can't really find any project that does this out of the box. Would be nice if Graphite or Kibana could be easily shoehorned into doing this...
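
On the syslog bullet, here's a minimal sketch of what swapping stdout for syslog could look like in Python. The handler is standard library; the logger name, facility choice, and format are just illustrative assumptions:

    import logging
    import logging.handlers

    # Send log records to the local syslog daemon instead of stdout.
    # /dev/log is the usual Unix socket; rsyslog/syslog-ng pick it up there.
    handler = logging.handlers.SysLogHandler(
        address="/dev/log",
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    handler.setFormatter(logging.Formatter(
        "quoter[%(process)d]: %(levelname)s %(message)s"))

    log = logging.getLogger("quoter")  # hypothetical app logger name
    log.setLevel(logging.INFO)
    log.addHandler(handler)

    log.info("order sent id=%s px=%s qty=%s", "abc123", 100.25, 200)

From there the syslog daemon can forward to a central box, and Logstash can pick up whatever files it writes.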

Good questions outrank easy answers. -Paul Samuelson

goldorak


Total Posts: 1001
Joined: Nov 2004
 
Posted: 2018-01-08 12:47
I would be interested to see proposals on that topic too actually!

As a side note, and this will be my very modest contribution: my experience over the years has taught me that the worst always happens silently... somewhere deep inside the systems, and it stays undiscovered and unlogged until sh... hits the fan. As a result I have become a supporter of jobs that always, systematically and politely, say: "I am done now".
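
A minimal sketch of that pattern in Python (the wrapper name and log wording are mine, purely illustrative). The point is that the "I am done now" line fires on every code path, so its absence becomes something a monitor can alert on:

    import logging
    import sys

    log = logging.getLogger("jobs")

    def run_politely(name, job):
        # Whatever happens inside the job, always announce completion.
        # A supervisor can then alert on the *absence* of this line.
        try:
            job()
            log.info("%s: I am done now (status=ok)", name)
        except Exception:
            log.exception("%s: I am done now (status=failed)", name)
            sys.exit(1)

    run_politely("eod_reconciliation", lambda: None)  # hypothetical job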

If you are not living on the edge you are taking up too much space.

Maggette


Total Posts: 980
Joined: Jun 2007
 
Posted: 2018-01-08 13:05
Non-trading applications, but probably still interesting:
we use the ELK stack a lot. And Grafana.

In addition, we have started to blur the line between data output and logs a bit.

We have some metadata (like counts, averages, medians, standard deviations and ranges of values and of processing times, deltas of these values to historical data, "hasFinished" flags with timestamps, etc.) as structured data that is written to a database.

This makes it straightforward to create supervisor and sanity-check jobs, as well as jobs that generate technical reports and dashboards (see the sketch at the end of this post).

Here we use Scala, Python, the database itself and (I am embarrassed) Jenkins for alerts and e-mails.
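
A sketch of the idea in Python with sqlite3; the table and column names are made up for illustration, and any database works the same way:

    import sqlite3
    import time

    db = sqlite3.connect("job_metadata.db")
    db.execute("""CREATE TABLE IF NOT EXISTS job_runs (
        job TEXT, run_ts REAL, row_count INTEGER,
        mean_proc_ms REAL, has_finished INTEGER)""")

    # The job writes its own metadata as structured data, not just log lines.
    db.execute("INSERT INTO job_runs VALUES (?, ?, ?, ?, ?)",
               ("nightly_import", time.time(), 125000, 4.2, 1))
    db.commit()

    # A supervisor job can then do sanity checks as plain queries, e.g.
    # flag a run whose row count deviates >50% from the historical average.
    latest, avg = db.execute("""SELECT row_count,
            (SELECT AVG(row_count) FROM job_runs WHERE job = 'nightly_import')
        FROM job_runs WHERE job = 'nightly_import'
        ORDER BY run_ts DESC LIMIT 1""").fetchone()
    if avg and abs(latest - avg) / avg > 0.5:
        print("sanity check failed: row_count deviates from history")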

I came here, saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

victorin


Total Posts: 6
Joined: Oct 2010
 
Posted: 2018-01-09 12:53
This is what we use for monitoring our infrastructure (Micro$oft .NET and Python shop).
Besides an automated trading systems marketplace, we run a traditional online brokerage business. I don't have any reference to compare scales against, but we're generating around 100GB of logs per day and 15,000 realtime metrics.

Everything is open source and runs on auto-pilot, with almost no maintenance needed :)

Logging
=====

At the application level we verbosely log almost everything using log4net. We started out saving logs into daily rolling text files, but soon hit some disk issues, so we created an appender that fires and forgets all log traces to RabbitMQ.
These traces are consumed by a small process that indexes them into Elasticsearch.

We're evaluating a migration from log4net to Serilog to create structured log traces, which are way better to deal with at the monitoring stage.
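
For what it's worth, the consumer side of that pipeline is only a few lines. A hedged Python sketch using pika and recent versions of the official elasticsearch client; the queue and index names are placeholders:

    import json

    import pika
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    def on_message(channel, method, properties, body):
        # Each message is one log trace; index it, then ack.
        es.index(index="logs", document=json.loads(body))
        channel.basic_ack(delivery_tag=method.delivery_tag)

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.queue_declare(queue="log-traces", durable=True)
    ch.basic_consume(queue="log-traces", on_message_callback=on_message)
    ch.start_consuming()

Because the appender fires and forgets, the application never blocks on disk or on Elasticsearch; the queue absorbs the bursts.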

Metrics
=====

We used Graphite for a couple of years. A pain in the ass to install, but super powerful in terms of transforming, combining and performing computations on series data.

Again we faced some scalability issues (disk bottleneck), and moved to InfluxDB (on Windows!). Extremely easy to install, and Telegraf (InfluxDB's plugin-driven agent for collecting and reporting metrics) is just awesome.

The downside is that InfluxDB is way worse in terms of metric aggregation and computation. We've had a hard time trying to replicate the real-time dashboards we had with the same Graphite metrics.
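
If anyone wants to try it, getting a metric into InfluxDB is just an HTTP POST of line protocol. A minimal Python sketch against the 1.x write endpoint; the database, measurement, and tag names are made up:

    import time

    import requests

    # InfluxDB line protocol: measurement,tag=value field=value timestamp(ns)
    point = "quoter_latency,host=prod01 p99_us=850 %d" % time.time_ns()

    resp = requests.post(
        "http://localhost:8086/write",
        params={"db": "trading", "precision": "ns"},
        data=point,
    )
    resp.raise_for_status()  # InfluxDB answers 204 No Content on success

In practice Telegraf does this for you: its input plugins collect the metrics and batch the writes.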

Monitoring
======

Grafana. Period.

Very easy to install, and super powerful. You can add multiple datasources (we've used Graphite, InfluxDB and Elasticsearch) and combine metrics into real-time dashboards. You can visualize time series, create single-panel metrics with alerting colors and much, much more.

We do use Kibana for some manual Elasticsearch log analysis, but once you identify a thing you want to monitor in real time, don't hesitate: Grafana.

Alerting
=====

Years ago we developed a home-made alerting system based on some database flags. I've done some tests using Grafana's alerting module and it's quite impressive. Out of the box you get email alerting (attaching fancy image charts), but the great thing is that you have plenty of channel plugins (like Telegram, Slack and so on).

For telephone/SMS we use traditional SMS providers, but if I had to choose now I'd have a look at Twilio.
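
Those channel plugins are mostly just webhooks under the hood, so it's also easy to fire one yourself. A sketch of a Slack alert via an incoming webhook in Python; the webhook URL is a placeholder you'd get from Slack:

    import requests

    # Placeholder: create an incoming webhook in Slack and paste its URL here.
    SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

    def alert(message):
        # Slack incoming webhooks accept a simple JSON payload.
        resp = requests.post(SLACK_WEBHOOK, json={"text": message}, timeout=5)
        resp.raise_for_status()

    alert(":rotating_light: quoter heartbeat missing for 60s")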

My two cents!