 Nonius
Founding Member, Nonius Unbound
Total Posts: 12800
Joined: Mar 2004
I'm looking to hire a relatively senior person to lead DevOps, software engineering and data engineering for a hedge fund quant/ML/AI lab. The person should continue to improve the architecture we have for ML (ETL, data and ML pipelines, AutoML, and moving seamlessly from research to production) and be super hands-on. Two junior direct reports.
We work in AWS, so AWS skillz are super important. The tech stack includes:
1. Kubernetes
2. Terraform
3. Airflow and/or Prefect for scheduling
4. GitLab/JupyterHub for repos
5. Redshift, Postgres, TimescaleDB, Mongo
6. All the usual open-source ML/AI Python libraries
7. we used to use Spark Clusters through DataBricks; would like the hire to do a sort of quick and dirty version of their notebook interface so we don't bleed money to them.
PM me if interested.
Chiral is Tyler Durden
 Maggette
Total Posts: 1288
Joined: Jun 2007
"7. we used to use Spark Clusters through DataBricks; would like the hire to do a sort of quick and dirty version of their notebook interface so we don't bleed money to them."
Sorry for the hijack, but may I ask what part of the Databricks functionality you are actually using?
I came here and saw you and your people smiling,
and said to myself: Maggette, screw the small talk,
better let your fists do the talking...
 Nonius
We *were* using their functionality to spin up clusters easily, plus their notebooks.
 Maggette
Ok. Thx.
 Nonius
No problem, broski. You use Databricks and shit?
 Maggette
Nope. One of my clients uses HDP, another one Cloudera.
The last of my "big data" clients (VPP, energy trading) didn't use any platform. Some AWS, although for regulatory reasons half of the work was done on premises. But no platform.
Spark on YARN, Pulsar, Flink. Airflow for job scheduling. Data engineering mostly in Scala. Machine learning (Keras/scikit-learn) and some Gurobi solver calls via Python. BI/data science exploration via Jupyter notebooks on Postgres or Parquet files.
Persistence: 90% Parquet files plus a Postgres for reporting and monitoring. Grafana for dashboards. No HDFS. We ignore the fact that S3 is not a real file system, and we're OK with that so far, but I think it can become an issue.
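One reason "S3 is not a real file system" matters: a common file-based commit pattern writes output under a temporary name and then renames it into place, relying on the rename being atomic so readers never see a half-written file. The sketch below shows that pattern on a local POSIX filesystem with Python's `os.replace`; the function name `atomic_write_bytes` is hypothetical. On S3, a directory "rename" (for example through Hadoop's s3a connector) is implemented as per-object copies followed by deletes, so it is neither atomic nor cheap, which is why Hadoop ships dedicated S3A committers instead of relying on rename.

```python
import os
import tempfile


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Hypothetical helper: readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory (same filesystem) so the
    # final rename is a single atomic metadata operation on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before publishing
        # os.replace atomically swaps the temp file into place, overwriting any old file.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

On an object store there is no equivalent single-step rename, so this guarantee does not carry over as-is.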
Most things are dockerized and run on Kubernetes, but we aren't bulletproof there. I don't know shit about that and didn't contribute much beyond exposing a URL in my applications for the health checks. Deployment via GitLab CI.
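Exposing "a URL for the health checks" usually means an HTTP endpoint that a Kubernetes liveness or readiness probe (configured with `httpGet`) polls; a response code from 200 up to 399 counts as healthy. Below is a minimal standard-library sketch; the `/healthz` path, port 8080 and the names `HealthHandler` and `start_health_server` are assumptions for illustration, and a real service would typically check its own dependencies before answering 200.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /healthz with 200 "ok"; every other path gets 404."""

    def do_GET(self):
        if self.path == "/healthz":  # hypothetical probe path
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Suppress per-request logging; probes fire every few seconds.
        pass


def start_health_server(port: int = 8080) -> HTTPServer:
    """Serve health checks on a daemon thread so the main application keeps running."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `start_health_server()` at application startup and point the probe's `httpGet` path and port at `/healthz` and 8080.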
 mgh95
Total Posts: 2
Joined: Jun 2020
@Nonius take a look at comet.ml (in particular, https://www.comet.ml/docs/python-sdk/pyspark/) and dbt-spark (https://github.com/fishtown-analytics/dbt-spark), which should integrate cleanly with your Airflow + Prefect usage. I have professional experience with both comet.ml and dbt (no affiliation with either) and can say from experience that they will save you from selling your kidneys to Databricks; it sounds like they may work for you. I don't think I'm senior enough for this position, though I use almost all of the tooling you list in your OP, so if you're looking to fill mid-level roles, PM me.
 Nonius
Thanks mgh95, I'll take a look. Looks interesting... yes, Databricks is very expensive.
Hey Maggette, yeah, we use Parquet files a lot and I'm a big fan of Grafana for dashboards.
On Spark, I think we are probably taking a sledgehammer to kill a fly, but we were using it for a lot of distributed computing.
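If the distributed jobs are embarrassingly parallel and fit on one large machine, a process pool from Python's standard library is one lighter-weight alternative to a Spark cluster. The sketch below splits a hypothetical CPU-bound task (`partial_sum`, a sum of square roots standing in for real work) into independent chunks and runs them with `concurrent.futures.ProcessPoolExecutor`; the function names and default chunk count are illustrative assumptions, not anything from the thread.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Tuple


def chunk_bounds(n: int, chunks: int) -> List[Tuple[int, int]]:
    """Split [0, n) into at most `chunks` contiguous, non-overlapping ranges."""
    step = max(1, -(-n // chunks))  # ceiling division, at least 1
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def partial_sum(bounds: Tuple[int, int]) -> float:
    """Hypothetical CPU-bound unit of work: sum of sqrt(i) for i in [start, stop)."""
    start, stop = bounds
    return sum(math.sqrt(i) for i in range(start, stop))


def parallel_sum(n: int, chunks: int = 8, workers: Optional[int] = None) -> float:
    """Run each chunk in a separate worker process and combine the results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunk_bounds(n, chunks)))


# Usage (keep the call under `if __name__ == "__main__":`, which spawn-based
# platforms such as macOS and Windows require):
#     total = parallel_sum(10_000_000)
```

Unlike Spark, this gives no cluster-level fault tolerance or out-of-core data handling; it only helps when both the data and the compute fit on a single box.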