Most common techniques for multivariable anomaly detection

Jurassic


Total Posts: 358
Joined: Mar 2018
 
Posted: 2020-03-11 15:42
There seems to be a dizzying number of multivariable anomaly detection techniques. Do you guys recommend any particular one? Simplicity appreciated.

Its Grisha


Total Posts: 44
Joined: Nov 2019
 
Posted: 2020-03-11 16:01
Worked on this problem a while back. My advice would be to first make sure your multivariate anomalies aren't primarily driven by single variables. I tried out a bunch of fancy techniques and none of them ended up outperforming a simple median absolute deviation approach with a few heuristics.
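For readers who haven't used it, here is a minimal sketch of a per-variable median absolute deviation (MAD) check. It is an illustration only, not the setup described in the post: the column names, the sample data, the 0.6745 scaling constant and the 3.5 cutoff (a common convention for the "modified z-score") are all assumptions, and the heuristics mentioned above are not reproduced.

```python
from statistics import median


def mad_scores(values):
    """Modified z-scores: 0.6745 * (x - median) / MAD for one variable."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: at least half the values equal the median.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(columns, threshold=3.5):
    """Check each variable on its own.

    columns: dict mapping a variable name to a list of floats.
    Returns {row_index: [names of variables that look anomalous there]}.
    """
    flagged = {}
    for name, values in columns.items():
        for i, z in enumerate(mad_scores(values)):
            if abs(z) > threshold:
                flagged.setdefault(i, []).append(name)
    return flagged


# Hypothetical data: two variables, one obvious spike in "temp" at row 4.
data = {
    "temp": [20.1, 19.8, 20.3, 20.0, 35.0, 20.2, 19.9],
    "pressure": [1.01, 1.00, 1.02, 1.01, 1.00, 1.02, 1.01],
}
flagged = flag_anomalies(data)
print(flagged)
```

If most of the rows you care about already show up here, driven by one variable each, a joint multivariate model may not add much.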

That being said, seeing the results of fancy techniques allows for understanding the simple heuristics quickly. The isolation forest algorithm is pretty intuitive and useful for that purpose.

https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf?q=isolation-forest
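A minimal sketch of running an isolation forest, assuming scikit-learn and NumPy are installed. The synthetic data, the two planted outliers and the parameter values are made up for illustration and are not from the post or the paper.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): 200 "normal" 2-D points plus two planted outliers.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([normal, outliers])

# random_state is fixed only to make the sketch reproducible.
model = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
model.fit(X)

scores = model.score_samples(X)  # lower score = easier to isolate = more anomalous
labels = model.predict(X)        # -1 = flagged as anomaly, 1 = treated as normal

# Rows ranked from most to least anomalous; inspect the top few by hand.
ranking = np.argsort(scores)
print(ranking[:5], scores[ranking[:5]])
```

Looking at which rows rank highest, and which variables make them stand out, is one way to use the output to come up with simpler per-variable rules.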

Maggette


Total Posts: 1233
Joined: Jun 2007
 
Posted: 2020-03-11 16:10
IMHO you need to be more specific.

That kind of depends on
1) what exactly you define as "multivariate anomaly detection": A1) multiple time series that can be used to predict a single one, A2) multiple time series where you have reason to believe they all depend on each other, or A3) multiple time series that define your system, and you want to get a feeling for whether the state of the whole system is an outlier.
2) what you define as simple: B1) the algorithm, like linear regression vs. a DL ANN, B2) the modelling process (specification, training, feature selection...), or B3) the software stack.
3) whether you have labeled data, i.e. states that already happened and that you have labeled as "this is an outlier". Are we talking about an unsupervised or a supervised prediction problem?

I have experience with A1 and A2. Keep in mind that B1 (a simple model) and a simple data pipeline/training process can be orthogonal. I did have very, very good results detecting broken stuff by pumping quite raw input into XGBoost and even a DL ANN hybrid, but also (more often) failed miserably with that approach.

I never came across a "plug and play" solution. And everybody who told me he had one was lying or clueless.


First: are you really, really sure you need a multivariate solution, and can't just look at each time series on its own?
Thanks

Edit: ditto Grisha here. I had a case with several hundred time series (Kafka topics). At the end of the day it was governed by a few topics. In my case that was trivial, because a lot of the topics were derived from these major topics. CUSUM on these topics was sufficient.
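A minimal sketch of a two-sided tabular CUSUM on a single series, for anyone unfamiliar with it. The target, slack and threshold values and the sample series are assumptions chosen for illustration, not the settings used in the case above.

```python
def cusum(values, target, slack, threshold):
    """Two-sided tabular CUSUM.

    Accumulates deviations from `target` beyond `slack` in each direction
    and raises an alarm when either sum exceeds `threshold`.
    Returns the list of indices at which an alarm was raised.
    """
    hi = 0.0  # accumulated upward drift
    lo = 0.0  # accumulated downward drift
    alarms = []
    for i, x in enumerate(values):
        hi = max(0.0, hi + (x - target - slack))
        lo = max(0.0, lo + (target - x - slack))
        if hi > threshold or lo > threshold:
            alarms.append(i)
            hi = lo = 0.0  # restart both sums after an alarm
    return alarms


# Hypothetical series: level around 10, then a shift to about 11 from index 6.
series = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 11.0, 11.2, 10.9, 11.1, 11.0]
alarms = cusum(series, target=10.0, slack=0.25, threshold=2.0)
print(alarms)
```

In a setup with a few governing series, one such detector per series would be run independently; in practice the target and slack are usually estimated from a period known to be normal.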

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...