The data platform is currently built on Python, PostgreSQL, dbt, Dagster, and cloud services (AWS & GCP). You'll have the opportunity to expand and transform these services to support our ambitious growth plan.
You'll play a crucial role in building the tools and processes that allow us to:
Continuously collect large datasets from various sources; filter, sort, process, store, and route data into our training pipelines, R&D experiments, and analytics solutions. Importantly, we expect to leverage AI-agent pipelines to ingest messy data locked away in documents and images.
Support data access for our R&D team by contributing to our ETL processes (APIs, dbt, PostgreSQL) and to pnx, our core data-access library in Python.
Expand our data monitoring and data-quality controls using pipelines, models, dashboards, alerts, tracing products, and more.
Efficiently train new models, evaluate them, and release them.
Serve frequently updated models reliably and efficiently, both for all our customers and for internal needs.
You'll report to our Head of Software and work closely with our Head of Data Science.
As you've probably gathered, this job is as challenging as it is rewarding. We don't expect you to know everything from day one; the position itself will keep evolving. You'll have the opportunity to learn a lot and to teach us a lot in return.
More importantly, we expect you to rapidly take ownership of large portions of these crucial systems, and therefore to be responsible for central parts of our production infrastructure.
