Looking for a head of machine learning engineering

Nonius
Founding Member
Nonius Unbound
Total Posts: 12799
Joined: Mar 2004
 
Posted: 2020-08-23 09:36
I'm looking to hire a relatively senior person to lead DevOps/software engineering/data engineering for a hedge fund quant/ML/AI lab. The person should keep improving our ML architecture (ETL, data and ML pipelines, AutoML, and going seamlessly from research to production) and be super hands-on. The role has two junior direct reports.

We work in AWS, so AWS skillz are super important. Our tech stack includes:
1. Kubernetes
2. Terraform
3. Airflow and/or Prefect for scheduling
4. GitLab/JupyterHub for repos
5. Redshift, Postgres, TimescaleDB, Mongo
6. All the usual open-source ML/AI Python libraries
7. We used to use Spark clusters through Databricks; we would like the hire to build a sort of quick-and-dirty version of their notebook interface so we don't bleed money to them.
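
Item 3 is what ties the pipeline together: Airflow and Prefect both model a pipeline as a directed acyclic graph (DAG) of tasks and start a task only once everything upstream of it has finished. Below is a minimal, stdlib-only sketch of that ordering logic using Python's `graphlib` (3.9+). It is not Airflow's or Prefect's API. The `run_pipeline` helper and the task names are hypothetical.

```python
import graphlib
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
) -> List[str]:
    """Run every task after all of its upstream tasks; return the execution order.

    `tasks` maps a task name to a no-argument callable; `upstream` maps a task
    name to the names it depends on. Raises graphlib.CycleError on a cycle.
    """
    # Refuse dependencies on tasks that were never defined.
    unknown = {dep for deps in upstream.values() for dep in deps} | set(upstream)
    unknown -= set(tasks)
    if unknown:
        raise ValueError(f"dependencies on undefined tasks: {sorted(unknown)}")

    sorter = graphlib.TopologicalSorter(upstream)
    for name in tasks:
        sorter.add(name)  # also registers tasks that have no dependencies
    sorter.prepare()  # detects cycles before anything runs

    executed: List[str] = []
    while sorter.is_active():
        # Everything in this batch is independent; a real scheduler could
        # hand the batch to parallel workers. Here it runs serially.
        for name in sorted(sorter.get_ready()):
            tasks[name]()
            executed.append(name)
            sorter.done(name)
    return executed
```

A task that raises stops the loop, which amounts to a fail-fast policy. Real schedulers add retries, backfills, and persisted per-task state on top of this core ordering.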

PM me if interested.

Chiral is Tyler Durden

Maggette


Total Posts: 1251
Joined: Jun 2007
 
Posted: 2020-08-24 08:32
"7. We used to use Spark clusters through Databricks; we would like the hire to build a sort of quick-and-dirty version of their notebook interface so we don't bleed money to them."

Sorry for the hijack, but may I ask what part of the Databricks functionality you are actually using?

I came here and saw you and your people smiling, and said to myself: Maggette, screw the small talk, better let your fists do the talking...

Nonius
 
Posted: 2020-08-24 10:33
We *were* using their functionality for spinning up clusters easily, plus their notebooks.

Maggette
 
Posted: 2020-08-24 14:15
Ok. Thx.

Nonius
 
Posted: 2020-08-25 16:42
No problem, broski. You use Databricks and shit?

Maggette
 
Posted: 2020-08-26 12:48
Nope. One of my clients uses HDP, another one Cloudera.

The last of my "big data" clients (VPP, energy trading) didn't use any platform. Some AWS, though because of regulatory constraints half of the work was done on premises. But no platform.

Spark on YARN, Pulsar, Flink. Airflow for job scheduling. Data engineering mostly in Scala. Machine learning (Keras/scikit-learn) and some Gurobi solver calls via Python. BI/data science exploration via Jupyter notebooks on Postgres or Parquet files.

Persistence is 90% Parquet files plus a Postgres for reporting and monitoring. Grafana for dashboards. No HDFS. We ignore the fact that S3 is not a real file system, and we're OK with that so far, but I think it could become an issue.
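
One place where "S3 is not a real file system" bites: an object store has no atomic directory rename, so the classic "write to a temp dir, then rename" commit becomes a copy plus delete, and a reader can see a half-written partition. A common workaround, and the same idea as the `_SUCCESS` marker files Hadoop/Spark jobs write on job commit, is to write data objects under unique keys and publish a marker only after all of them exist. Readers then skip any partition without the marker. The sketch below simulates the object store with a plain dict. The class, helper names, and key layout are hypothetical, and the byte strings stand in for real Parquet file contents.

```python
import uuid
from typing import Dict, List

SUCCESS_MARKER = "_SUCCESS"


class InMemoryObjectStore:
    """Stand-in for an object store like S3: flat keys, whole-object puts, no rename."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def write_partition(store: InMemoryObjectStore, prefix: str, chunks: List[bytes]) -> None:
    """Write each chunk (placeholder for a Parquet file) under a unique key, marker last."""
    for chunk in chunks:
        # Unique keys: no temp-file rename and no overwriting a concurrent writer's output.
        store.put(f"{prefix}/part-{uuid.uuid4().hex}.parquet", chunk)
    # The marker goes in only after every data object exists.
    store.put(f"{prefix}/{SUCCESS_MARKER}", b"")


def read_partition(store: InMemoryObjectStore, prefix: str) -> List[bytes]:
    """Return a committed partition's data objects, or [] if there is no marker yet."""
    keys = store.list_keys(prefix + "/")
    marker = f"{prefix}/{SUCCESS_MARKER}"
    if marker not in keys:
        return []  # in-flight or crashed write: treat the partition as absent
    return [store.get(k) for k in keys if k != marker]
```

This only covers the first write of a partition. Rewriting an already-committed partition needs more care, for example writing to a fresh prefix and switching readers over, because leftover files from a crashed attempt would otherwise be picked up.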

Most things are dockerized and run on Kubernetes, but we aren't bulletproof there. I don't know shit about that and didn't contribute much beyond exposing a URL on my applications for the health checks. Deployment via GitLab CI.
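
For context on the health-check URL: a Kubernetes liveness or readiness probe of type `httpGet` polls a path on the container and treats any status from 200 up to (but not including) 400 as healthy. Below is a minimal stdlib sketch of such an endpoint served on a background thread. The `/healthz` path and port 8080 are assumptions (a common convention, not something from the post), and a real service would check its own dependencies before answering 200.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

HEALTH_PATH = "/healthz"  # assumed path; must match the probe's httpGet path


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Ignore any query string when matching the path.
        if self.path.split("?", 1)[0] == HEALTH_PATH:
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep frequent probe requests out of the logs


def start_health_server(port: int = 8080) -> ThreadingHTTPServer:
    """Serve health checks on a daemon thread so the main application keeps running."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

On the Kubernetes side, the container's `livenessProbe` (or `readinessProbe`) would then use `httpGet` with the same path and port.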

mgh95


Total Posts: 2
Joined: Jun 2020
 
Posted: 2020-08-31 01:07
@Nonius take a look at comet.ml (in particular, https://www.comet.ml/docs/python-sdk/pyspark/) and dbt-spark (https://github.com/fishtown-analytics/dbt-spark), which should integrate cleanly with your Airflow + Prefect usage. I have professional experience with both comet.ml and dbt (no affiliation with either), and from experience they'll save you from selling your kidneys to Databricks; it sounds like they may work for you. I don't think I'm senior enough for this position, though I use almost all of the tooling you list in your OP. If you're looking to fill mid-level roles, PM me.

Nonius
 
Posted: 2020-09-04 07:47
Thanks mgh95, I'll take a look. Looks interesting... and yes, Databricks is very expensive.

Hey Maggette, yeah, we use Parquet files a lot, and I'm a big fan of Grafana for dashboards.

On Spark, I think we are probably using a sledgehammer to kill a fly, but we were using it for a lot of distributed computing.
